Adventures in Replication – Reviewers don’t want to believe disappointing replication results

Trying to publish replication results is difficult.  Even when the original evidence is very weak or uncertain, reviewers tend to look for reasons to explain away a smaller effect in the replication.  If nothing comes to mind, reviewers may even make something up.  Here’s an actual review I received:

Depending upon where your participants are from, it is quite possible that cultural differences could explain some of the differences between your results and those by B&E.  If indeed they hail from the Dominican Republic, as you state on page 4, then a viable hypothesis might be that power primes have less of an effect (on performance, or perhaps in general) on individuals with more interdependent self-construals than on individuals with more independent self-construals.

What the manuscript had actually stated is that the students were from Dominican University, my home university in River Forest, Illinois.  That’s not in the Dominican Republic.  The reviewer’s major concern over the replication was based on assuming we were in a different country and then not bothering to check.

When I pointed out the error to the editor, they assured me that the reviewer’s misunderstanding did not substantively influence the decision to reject the manuscript.

Of course, reviewers can make mistakes.  But in the reviews I’ve collected, this type of thing seems surprisingly common, and the mistakes all seem to lean towards the reviewer finding reasons to discount the research.  Coincidence, or an example of motivated reasoning?

Here are a couple of other examples:

I would like to see confidence intervals reported for all statistics (including manipulation checks), not only sometimes. Also, the authors do not report effect sizes.
This was in a manuscript that featured effect sizes and confidence intervals for all variables, including manipulation checks.  It’s never been clear to me what paper the reviewer was discussing.  I asked the editor, and they said that this was a ‘presentational issue’ and that this issue was not critical to rejection of the manuscript.  What might have been critical, though, was that the reviewer had paid so little attention to the manuscript that they didn’t realize it was chock full of effect sizes and confidence intervals (even in the abstract).
And then,
I think replication is extremely important to our field but I also think it's important that we use our limited resources (and journal pages) for replication studies of greater importance than this.
This was for a paper submitted as an online-only replication report; the data had already been collected.  So the resources had already been expended, and no space in the printed journal would be wasted.
And then,
The two weakest studies use a novel DV that has not been used in the literature (at least the authors provide no citations for this measure).
The DV the reviewer described was mirror-tracing, a measure that is very well-established in the literature.  The manuscript included several references to it.  The manuscript also showed that the DV had the expected relationships with gender and age, with citations to the original studies documenting those relationships.  What’s especially funny is that this comment would apply to the original research, which used two novel measures of motor skill without any evidence of reliability or validity.
There are more, but this gives the flavor.  Publishing replication work is a truly uphill battle.

I'm a teacher, researcher, and gadfly of neuroscience. My research interests are in the neural basis of learning and memory, the history of neuroscience, computational neuroscience, bibliometrics, and the philosophy of science. I teach courses in neuroscience, statistics, research methods, learning and memory, and happiness. In my spare time I'm usually tinkering with computers, writing programs, or playing ice hockey.
