I recently completed a peer review for an academic journal, and as I was submitting it I wondered how Claude would perform if given the task. Since the article was anonymised, I didn’t think there’d be any problems uploading the PDF and asking Claude to review it. And, I had already submitted my review, so I wouldn’t be influenced by the output.
Note that I’ve left out some parts of my own submission, so my side of the comparison is abridged in this post, while Claude’s responses are provided verbatim and in full.
This is the prompt I used (substituting the relevant details of the paper): “You are an expert in TOPIC X in POPULATION Y. You are also a researcher with 15 years of experience in academic writing, and in publishing academic articles in peer-reviewed journals. I would like you to review the attached article and tell me what the strengths and weaknesses are. I would also like you to make a recommendation about accepting or rejecting the paper, based on your understanding of what makes a good academic paper.”
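For anyone who would rather script this than use the chat interface, here’s a minimal sketch of the same request via the Anthropic Python SDK. To be clear, this is my illustration rather than what I actually did (I simply uploaded the PDF in the Claude interface): the model name, the pypdf extraction step, and the idea of pasting the article text into the prompt are all assumptions.

```python
# A rough sketch of scripting the same review via the Anthropic API.
# Assumptions: the `anthropic` and `pypdf` packages are installed,
# ANTHROPIC_API_KEY is set in the environment, and "paper.pdf" is the
# anonymised manuscript. The model name is illustrative.
import anthropic
from pypdf import PdfReader

# Extract the manuscript text so it can be included in the prompt.
paper_text = "\n".join(
    page.extract_text() or "" for page in PdfReader("paper.pdf").pages
)

prompt = (
    "You are an expert in TOPIC X in POPULATION Y. You are also a researcher "
    "with 15 years of experience in academic writing, and in publishing "
    "academic articles in peer-reviewed journals. I would like you to review "
    "the attached article and tell me what the strengths and weaknesses are. "
    "I would also like you to make a recommendation about accepting or "
    "rejecting the paper, based on your understanding of what makes a good "
    "academic paper.\n\n--- ARTICLE ---\n" + paper_text
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative; substitute any Claude model
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```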
Strengths (Claude’s response):
- The paper addresses a timely and relevant topic – the use of TOPIC X in POPULATION Y. This is an emerging issue that warrants discussion in the field.
- The authors provide useful background information on TOPIC X to contextualize the topic for readers.
- The paper presents original survey data collected by the authors on attitudes and use of TOPIC X among POPULATION Y. This helps characterize the current landscape.
- The discussion section relates the survey findings to prior literature and makes recommendations for POPULATION Y. This provides helpful insights.
- The writing is clear and well-organized overall. The authors use an appropriate academic tone and style.
Strengths (MR submission):
- The author(s) have provided a timely and reasonable overview of the general state of TOPIC X in POPULATION Y.
As you can see, Claude was more expressive than I was in identifying the strengths of the paper, and I agree with the general thrust of Claude’s feedback. However, points 1 and 2 are basically the same as my single point. Claude’s point 3 is just a summary of what was done, and doesn’t really say anything about it being done well. Point 4 was actually wrong (the discussion was poor, and Claude notes a concern in the next section). And, while I can appreciate that the writing was clear and well-organised, I expect this to be the baseline, rather than a strength. Having said that, I think that Claude’s response is basically OK, if a little forgiving about what was done well.
Weaknesses (Claude’s response):
- The sample size for the survey is relatively small which limits generalizability of the findings.
- More details on the survey methodology would strengthen the paper – how was the sample recruited? What were the response rates?
- The recommendations section relies heavily on text generated by ChatGPT without sufficient critical analysis by the authors (note: for some reason, the authors decided to use ChatGPT to generate the recommendations for the article). This raises questions about the depth of the discussion.
- There are missed opportunities to relate the findings to theory and frameworks on TOPIC X.
- The conclusion is quite brief and does not adequately summarize the key takeaways.
Weaknesses (MR submission, with some redaction):
- Do the authors want to comment on the fact that this represents a very limited perspective in terms of the population sampled?
- The methods are poorly described. In particular, the data analysis is limited. I suggest the complete removal of the ChatGPT analysis and recommendations.
- Much of the discussion reads like an uncritical literature review.
I went into a lot more detail about the weaknesses of the paper, providing examples and rationales for my thinking.
As you can see from the lists above, both Claude and I raised concerns about the sample, the methods, and the quality of the discussion. Claude also mentioned the conclusion as a weakness, which I didn’t bother to do.
Recommendation (Claude’s response):
- Overall this is a worthwhile contribution that provides timely insights into an emerging issue. However, I would recommend major revisions to strengthen the methodology, enhance the critical analysis and discussion, and improve the conclusion. With these revisions, the paper could make a nice addition to the literature and provide value to readers. Having said that, in its current form, I would likely recommend rejecting the paper for publication, and encourage the authors to resubmit after making substantial revisions.
Recommendation (MR submission):
- Even if the concerns around methods and analysis were addressed, I’m not sure that this article adds much to the conversation. There’s very little here that isn’t fairly well-known about the topic.
- And, I chose the option to Reject the paper.
Claude seems conflicted in its recommendation; can a paper be ‘a worthwhile contribution’ if it’s rejected? Maybe? It could also be the case that Claude’s understanding of the journal submission process is flawed. I don’t know of any journal editorial processes that reject a paper while also asking the author to make major revisions.
In summary
- Both Claude and I thought the introduction was comprehensive, and provided a clear and well-written overview of the topic.
- Both Claude and I thought the major weaknesses of the article were related to concerns around the sample, poor description of the methods, an uncritical analysis of the data, and concern around the use of ChatGPT to generate the recommendations.
- Both Claude and I recommended that the editor reject the article, and (here’s the important part) we both had the same reasons to support that conclusion.
On the one hand, these are still early days and I obviously can’t say anything about how well LLMs would perform across a range of topics (i.e. are they reliable?). But, on the other hand, these are still early days, and Claude did a remarkable job of reviewing the paper. While it’s true that I included far more detail in my review, I have no doubt that Claude would have gone into the same (actually, more) detail with further prompting.
As someone who has worked at all levels in the editorial team of multiple academic journals, I quite like the idea of having language models to support the review process. The amount of time it takes editors to find reviewers is significant, and Claude did this one in seconds, generating a better review than many I’ve seen in my various editorial roles. And, it did it on the first take, with no further prompting. I can easily imagine how good the review could have been if I’d experimented with different prompts and asked it to explain some of its choices. I recognise the usual caveats about bias, reliability, validity, etc. But this is a generic LLM that hasn’t been fine-tuned, and I think future research-focused LLMs are going to be phenomenal in this area.
In the short term, I see enormous potential for academics to take on more peer reviews, with support from language models. I could have done a quick read-through of the paper, identifying the major points of concern, and then asked Claude to elaborate on those, with justification. Much of the time I spend on peer reviews is taken up by writing out the rationale and argument for my position.
However, as we build more trust in language models, and we start seeing that their outputs have significant overlap with our thinking, I have to ask why we’d spend any time on the review in the first place. And then I have to ask why editors would spend any time sending papers out to review.
There’s a lot to reflect on here. All these blog posts and articles, with their obligatory statements about how AI will only supplement or augment more traditional roles in the academy, may turn out to be wrong. I’m moving towards the position that these models are already capable of replacing a lot of what we do, and that they’re only going to get better at it.
What do you think? Tell me why I’m wrong.