Why do some replication studies fail to produce the expected results? There are lots of possible reasons: the expectation might have been poorly founded, the replication study could have been under-powered, there could be some unknown moderator, etc. Sure, but let’s be real for a moment. We all know that one explanation that will surely leap to mind is that the replicators screwed up, that they were not competent or capable enough to obtain the same results as the original researchers.
This worry over competence is often unspoken, but a few have dared to be explicit in questioning the competence of replicators who fail to confirm original findings. Probably the most notorious example is a working paper by social cognitive neuroscientist Jason Mitchell entitled “On the evidentiary emptiness of failed replications”. Here’s a taste:
Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.
You have to read the whole piece to appreciate just how pithily Mitchell demonstrates his basic lack of understanding of science. For one, he doesn’t seem to know about positive and negative controls, time-honored tools that help rule out bungling for both expected and unexpected findings. Second, Mitchell doesn’t seem to understand that an equally vast number of practical mistakes can lead to a “positive” finding (remember the loose fiber-optic cable behind the faster-than-light neutrinos?). If we have different standards of evidence for data we like vs. data we do not…we’re not actually doing science.
I’ve mulled Mitchell’s missive quite a bit. It’s very wrong, but wrong in a way that forced me to think deeply about how science protects itself from bunglers. His work is part of the reason I brought positive controls from my neuroscience lab into my psychology research. All of my recent work has included positive control studies to verify that my students and I can observe expected effects (see Cusack et al., 2015; Moery & Calin-Jageman, 2016; Sanchez et al., 2017). So, thanks to Mitchell, we have repeatedly demonstrated that we’re not bunglers.
Where’s the news? Well, there is a new pre-print out by Protzko & Schooler that examines whether there is any connection between researcher competence and replication success. Specifically, they examine 4 registered replication reports (RRRs), in which different labs from around the world all conduct the same replication protocol. This is a great data set because the participating labs differ considerably, notably in prior research experience and impact. Protzko & Schooler therefore asked whether a lab’s research skill is related to replication success. As a rough measure of research skill, they used each PI’s h-index. That’s not a perfect proxy, but surely a highly cited researcher is more experienced than one who has rarely been published or cited. The analysis reveals no consistent relationship between research impact and replication success, not in any of the RRRs individually nor overall. There is enough data that all but the very weakest relationships can be ruled out. Full disclosure: one of the RRRs was the replication of the facial-feedback hypothesis (Wagenmakers et al., 2016); I was part of a group at Dominican that contributed to this project (I must have been one of the dots on the far left of the impact scale for that analysis).
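To make the logic of that analysis concrete, here is a minimal sketch of the kind of test involved: correlating each participating PI’s h-index with the effect size that lab observed. All the numbers below are fabricated for illustration (the real data and analysis are in the Protzko & Schooler pre-print); the point is just that a near-zero correlation is what "no relationship between impact and replication success" looks like.

```python
from math import sqrt

def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Fabricated example data: h-index of each lab's PI, and the standardized
# effect size that lab observed in the replication protocol.
h_index = [2, 5, 9, 14, 22, 31, 40]
effect = [0.10, -0.05, 0.08, 0.02, -0.03, 0.06, 0.01]

r = pearson_r(h_index, effect)
print(f"r = {r:.3f}")  # a value near zero mirrors the pre-print's null result
```

In the real analysis one would also put a confidence interval around r; the point about "ruling out all but the very weakest relationships" is that with enough labs, that interval becomes narrow enough to exclude even modest correlations.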
So… on both empirical and logical grounds it seems that science is safe from bunglers. Failed replications are disappointing, but we should probably be less quick to judge the replicator and more willing to thoughtfully suss out more substantive reasons for discrepant findings.