Replication: ‘Psychological Science’ does the right thing

I have been enjoying Bob’s series of posts about replication. (Go to our home page and scroll down to see links and a few lines of text about each of the 5 posts, with titles starting ‘Adventures in Replication’.) Actually, “enjoying” isn’t quite the right word. “Appreciating”, perhaps, and “sharing the frustration”?

Bob tells of the replicator’s frustrations when a journal just doesn’t want to know, or a referee or editor just doesn’t seem to get it. Science relies on replication to decide what we can really trust, right? Not always in psychology, it seems.

Anyway, now for some good news. Psychological Science, which the Association for Psychological Science describes as its ‘flagship’ journal, recently announced a new category of article that it would publish: Preregistered Direct Replications (PDRs).

Editor-in-chief, Steve Lindsay, writes that “PDR articles report high-quality, preregistered, direct replications of studies published in Psychological Science.”

He also writes that “One of the motivations for adding PDRs to Psychological Science is the belief that a journal is responsible for the works it publishes (as per Sanjay Srivastava’s, 2012, ‘Pottery Barn rule’ blog post)”. (The Pottery Barn rule says that ‘You break it, then you own it and you pay for it’. We don’t have pottery barns in Australia, not by that name anyway, but I can guess.)

Preregistration is required for PDRs, and submission for review when the study is at the proposal stage is strongly encouraged, although not (yet) required. Of course, review at that early stage can lead to improvements to the proposal, and acceptance of the proposal gives assurance of publication, subject only to careful implementation of the preregistered plan, and full disclosure.

The advent of PDRs is just the latest step taken by Psychological Science, which has been a pioneer in the encouragement of Open Science. Recall the introduction by previous editor-in-chief Eric Eich of new disclosure requirements and Open Science badges, and the encouragement of the new statistics rather than NHST–supported by publication of my tutorial article.

So, consider encouraging your groups of students seeking worthwhile research projects to scour Psychological Science looking for interesting findings that deserve replication. Or consider doing a PDR yourself. You may find something surprising, but even if you don’t you can make a valuable contribution to science.

And here’s a challenge: Can you sniff out a published effect that is reported as statistically significant, or even highly statistically significant, but for which a larger replication finds evidence that the effect is tiny or maybe zero? Bob may be the reigning world champion at doing that: He and his students are, I believe, currently running at about 9 of their 10 replication projects yielding zero or tiny estimates of effects.

Geoff

Posted in Open Science, Replication, The New Statistics

Adventures in Replication – Reviewers don’t want to believe disappointing replication results

Trying to publish replication results is difficult.  Even when the original evidence is very weak or uncertain, reviewers tend to look for reasons to explain away a smaller effect in the replication.  If nothing comes to mind, reviewers may even make something up.  Here’s an actual review I received:

Depending upon where your participants are from, it is quite possible that cultural differences could explain some of the differences between your results and those by B&E.  If indeed they hail from the Dominican Republic, as you state on page 4, then a viable hypothesis might be that power primes have less of an effect (on performance, or perhaps in general) on individuals with more interdependent self-construals than on individuals with more independent self-construals.

What the manuscript had actually stated is that the students were from Dominican University, my home university in River Forest, Illinois.  That’s not in the Dominican Republic.  The reviewer’s major concern over the replication was based on assuming we were in a different country and then not bothering to check.

When I pointed out the error to the editor, they assured me that the reviewer’s misunderstanding did not substantively influence the decision to reject the manuscript.

Of course, reviewers can make mistakes.  But in the reviews I’ve collected this type of thing seems surprisingly common, and the mistakes all seem to lean towards the reviewer finding reasons to discount the research.  Coincidence or an example of motivated reasoning?

Here are a couple of other examples:

I would like to see confidence intervals reported for all statistics (including manipulation checks), not only sometimes. Also, the authors do not report effect sizes.

This was in a manuscript that featured effect sizes and confidence intervals for all variables, including manipulation checks.  It’s never been clear to me what paper the reviewer was discussing.  I asked the editor, and they said that this was a ‘presentational issue’ and not critical to the rejection of the manuscript.  What might have been critical, though, was that the reviewer had paid so little attention to the manuscript that they didn’t realize it was chock full of effect sizes and confidence intervals (even in the abstract).

And then,

I think replication is extremely important to our field but I also think it's important that we use our limited resources (and journal pages) for replication studies of greater importance than this.

This was for a paper submitted as an online-only replication report; the data had already been collected.  So the resources had already been expended, and no space in the printed journal would have been wasted.

And then,

The two weakest studies use a novel DV that has not been used in the literature (at least the authors provide no citations for this measure).

The DV the reviewer described was mirror-tracing, a measure that is very well established in the literature.  The manuscript included several references.  It also showed that the DV had the expected relationships with gender and age, with citations to the original studies documenting those relationships.  What’s especially funny is that this comment would apply to the original research, which used two novel measures of motor skill without any evidence of reliability and validity.

There are more, but this gives the flavor.  Publishing replication work is a truly uphill battle.

Posted in NHST, Replication, The New Statistics

Beyond p values – Dispatches from the ASA symposium on statistical inference

The next couple of posts will be about my experience at the ASA symposium on statistical inference: A World Beyond p < .05.

The first session featured Steve Goodman and John Ioannidis (who Skyped in from Australia).  One highlight was Goodman’s explanation of why p values continue to be so prevalent.  He argued that p values are like a currency–we can trade them in for useful things (grants, papers, promotions).  Their value lies primarily in our common belief in them, more than in their specific mathematical underpinnings.  Good analysis.

 

Posted in NHST, Teaching, The New Statistics

Brain Stimulation – Can we trust the empirical record?

Brain stimulation research has been exploding in neuroscience.  First came the rapid adoption of Transcranial Magnetic Stimulation (TMS), a technique in which powerful magnetic fields are used to create inductive currents within the skull.  More recently, Direct Current Stimulation (DCS) has burst onto the scene, a technique where current is simply pushed through the skull (it’s not much more sophisticated than strapping a small battery to your head).  These techniques have launched literally thousands of studies, as researchers have been drawn by the allure of treating mental disorders by cheaply tweaking brain function.  Best of all, these techniques offer the promise of personalized medicine, as the stimulation locations, magnitudes, and frequencies can be adjusted for each patient to obtain the best results.

Two recently published papers throw some cold water on TMS (Héroux, Taylor, & Gandevia, 2015) and DCS research (Héroux, Loo, Taylor, & Gandevia, 2017). Both papers survey researchers in the field, asking about their success in replicating published results and their perceptions that others in the field use Questionable Research Practices.

  • Researchers reported that they frequently used previous sample sizes to determine their own sample sizes, a practice that is problematic given that sample sizes in brain stimulation research are known to be too small (see the rough sketch after this list).
  • For TMS, only 20% reported using power analysis to determine sample sizes.  For DCS, 60% reported that they sometimes do this, but a random sample of 100 papers found only 6% mentioned power analysis for sample-size determination.
  • Depending on the protocol, about 30-50% of respondents reported being unable to replicate previously published findings.  Most, though, had chosen to give up on the protocol rather than publish negative results.
  • Although relatively few researchers admitted to engaging in questionable research practices themselves, many reported that they believed others in the field were under-reporting, p-hacking, and the like.
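To give a rough sense of why borrowing sample sizes from an already under-powered literature is a problem, here is a minimal Python sketch using statsmodels. The effect size and the per-group n are hypothetical values chosen purely for illustration; they are not taken from the Héroux surveys.

```python
# Hypothetical illustration of the sample-size problem: how much power does
# a small "borrowed" sample size actually buy, and what would an a priori
# power analysis call for instead? All numbers are invented for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

assumed_d = 0.4   # a plausibly modest true effect size (Cohen's d) -- an assumption
typical_n = 12    # per-group n copied from a previous small study -- an assumption

# Power of a two-sample t test with the borrowed sample size
power_now = analysis.power(effect_size=assumed_d, nobs1=typical_n,
                           alpha=0.05, alternative='two-sided')

# Per-group n an a priori power analysis would recommend for 80% power
needed_n = analysis.solve_power(effect_size=assumed_d, alpha=0.05,
                                power=0.80, alternative='two-sided')

print(f"Power with n = {typical_n} per group: {power_now:.2f}")  # roughly 0.15
print(f"n per group needed for 80% power: {needed_n:.0f}")       # roughly 100
```

Copying the previous study’s n simply inherits whatever power problem that study had.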

I must admit that reading these papers is a bit difficult: the exact wording of the surveys is not made entirely clear, and the way results are reported is sometimes very confusing (it can be tough to tell whether the percentages given are per respondent, per paper, per respondent per technique, etc.).  Still, the overall message seems quite clear: researchers within the field can have trouble reproducing findings and perceive others as using questionable research practices.  Thus, the fairly sunny literature on these techniques may be highly misleading, as the doubts and failed replications don’t seem to be making it into print.  The team that conducted the surveys points out that their data help explain how so many papers on these techniques can show statistically significant results despite very low power–probably because they are just the tip of the iceberg.

It’s a shame to type this up.  I remember rTMS exploding onto the scene when I was in grad school, and the equal excitement when DCS posters started showing up at conferences.  The public is hungry for remedies, and there are already tons of clinics offering these therapies, many of them operating outside the U.S.  It seems likely that the literature on brain stimulation is hopelessly biased, and that bias is offering the scientific veneer needed for these clinics to fleece many desperate patients.  We can and should do better.

References

Héroux, M. E., Loo, C. K., Taylor, J. L., & Gandevia, S. C. (2017). Questionable science and reproducibility in electrical brain stimulation research. PLOS ONE, 12(4), e0175635. https://doi.org/10.1371/journal.pone.0175635
Héroux, M. E., Taylor, J. L., & Gandevia, S. C. (2015). The Use and Abuse of Transcranial Magnetic Stimulation to Modulate Corticospinal Excitability in Humans. PLOS ONE, 10(12), e0144151. https://doi.org/10.1371/journal.pone.0144151
Posted in Applied research, Open Science, Replication

Enthusiasm for teaching and learning

It’s a joy to be with faculty who are deeply enthusiastic about teaching and about student learning. I’m just back from AusPLAT (Australian Psychology Learning and Teaching), the first Australian conference on psychology learning and teaching, held under the auspices of the Australian Psychological Society. Only about 60-70 folks participated, but the focus and enthusiasm were great. Also, having just survived a frostier than usual winter in Victoria, it was nice to be up north in Ipswich, Queensland, in clear sunshine.

I had been given the title ‘The Joy of Stats’ for my talk, and I was very happy with this. It’s a title stolen from a fabulous BBC documentary. I focused on what in my experience is one of the really great things about taking an estimation and Open Science approach from the very start–from class 1 with beginning undergraduates: It all simply makes sense, and that gives pleasure.

I’d love to have more empirical evaluation of the curriculum followed in ITNS–surely that will come. In the meantime, I can enjoy my classroom experience: students say more positive things and report much less confusion and negative emotion than in the old days of hacking through the weird logic of NHST. In my experience, the new ways can indeed bring joy!

The slides for my talk are here. I’m always looking for pithy and memorable ways to make key points. Here’s one of the last slides, with two messages that are not new, but perhaps make the points with a bit of zip.

The first expresses what seems to me the bottom-line logic that compels choice of The New Statistics. The second tries to illustrate that a CI gives a much different, and more justifiable, message than the equivalent p value. Yes, researchers should develop the habit of mentally converting any p value (and point estimate) they see into the approximate equivalent CI. Woe, a long interval signals much uncertainty, but that’s the truth!
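For readers who would like to do that conversion rather than just eyeball it, here is a minimal sketch of the standard normal-approximation back-calculation in Python. The estimate and p value below are invented, and the trick assumes an approximately normal sampling distribution; it is the generic textbook recipe, not anything specific to ESCI.

```python
# Back-calculate an approximate 95% CI from a point estimate and its
# two-sided p value, assuming an approximately normal sampling distribution.
# The numbers below are invented, purely for illustration.
from scipy.stats import norm

def approx_ci_from_p(estimate, p, level=0.95):
    """Recover the implied SE from the p value, then build the CI."""
    z_obs = norm.ppf(1 - p / 2)               # |z| corresponding to the p value
    se = abs(estimate) / z_obs                # implied standard error
    z_crit = norm.ppf(1 - (1 - level) / 2)    # 1.96 for a 95% CI
    return estimate - z_crit * se, estimate + z_crit * se

# Example: an effect reported as 0.30 with p = .04
low, high = approx_ci_from_p(0.30, 0.04)
print(f"approximate 95% CI: [{low:.2f}, {high:.2f}]")  # roughly [0.01, 0.59]
```

A just-significant p value comes back as a long interval whose lower limit barely clears zero, which is exactly the uncertainty the p value was hiding.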

I salute all faculty who are enthusiastic about their teaching, and dedicated to doing it better, always better…
Geoff

Posted in Uncategorized

Adventures in Replication: p values and Illusions of Incompatibility

Here’s an idea I run into a lot in peer reviews of replication studies:

If the original study found p < .05 but the replication found p > .05, then the results are incompatible and additional research is needed to explain the difference.

Poor p values.  I’m sure they want to be able to tell us when results are incompatible, but they just can’t.  The little beasts are too erratic (Cumming, 2008).  And just because two p values live on opposite sides of alpha doesn’t mean the results that brought them into this world are notably different (Gelman & Stern, 2006).  It’s seductive, but if you compare p values to check consistency, you will end up with illusions of incompatibility.

Here’s an example which I hope makes the issue very clear.  The original study found p < .05.  Three subsequent replications found p > .05.  Comparing statistical significance (falsely) suggests the results are  incompatible and that we need to start thinking about what caused the difference.  Right?  Wrong, as you can see in this forest plot:

The original study isn’t notably incompatible with the replication results because the original study is incredibly uninformative.  Even though p < .05 (the CI does exclude the null hypothesis of 0), the CI suggests anywhere from a very large effect down to an incredibly small effect.  The replication results all suggest an incredibly small effect.  That’s no contradiction, just disappointment!  Trying to “explain the difference” is a fool’s errand–the differences in results are not clearly more than would be expected from sampling error.
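To make the Gelman and Stern point concrete, here is a small Python sketch with invented estimates and standard errors (not the actual studies in the forest plot): one ‘significant’ original, one ‘nonsignificant’ replication, and then a direct test of the difference between the two estimates.

```python
# Invented illustration of Gelman & Stern (2006): an original with p < .05 and
# a replication with p > .05 need not differ significantly from each other.
import math
from scipy.stats import norm

def two_sided_p(est, se):
    """Two-sided p value for an estimate with a given standard error."""
    return 2 * norm.sf(abs(est) / se)

orig_est, orig_se = 0.50, 0.24   # small original study
rep_est, rep_se = 0.08, 0.07     # large replication

print(f"original:    p = {two_sided_p(orig_est, orig_se):.3f}")  # about .04
print(f"replication: p = {two_sided_p(rep_est, rep_se):.3f}")    # about .25

# Now test the *difference* between the two estimates directly
diff = orig_est - rep_est
diff_se = math.sqrt(orig_se**2 + rep_se**2)
print(f"difference:  p = {two_sided_p(diff, diff_se):.3f}")      # about .09
```

The two results sit on opposite sides of .05, yet the difference between them is itself nowhere near conclusive.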

Unfortunately, most of the reviewers treat differences in statistical significance as a reliable indication of incompatibility.  Here’s an extreme example:

First, I would like to understand how one can continue to assume that a replication experiment that leads to significantly different results can still be called a “direct” or “precise” replication in all respects. If you really believe in the informative value of significance tests – which is obviously at the heart of such replication work – should you then not resort to the alternative assumption that original and replication experiments must have been not  equivalent? Was this a miracle, or is it simply an index of changing (though perhaps subtle) boundary conditions?

Want your head to explode?  The forest plot above presents (most of) the data the reviewer was writing about!  The reviewer is so ardently misled by p values that they believe there simply must be a substantive difference between the sets of studies.  This is really amazing because this set of studies is the most precisely direct set of replications I’ve been able to complete.  The original studies were all done online, and I was able to obtain the exact same survey and use it with the exact same population within a year of the original studies being done.  So strong is this reviewer’s confidence in the ability of p values to do what they cannot that they wag the whole dog by the tail.

Some notes:

  • I’m not hating on this reviewer.  I’d have made the same mistake 5 years ago… maybe even 2 years ago.  It takes a methodological awakening.  I wish, though, that prominent journals that pretend to welcome replications would select reviewers with just a bit less over-confidence in p values.
  • I didn’t include the forest plot in the paper, just tables with CIs.  Maybe the figure would have helped.
  • There’s a bit more to this paper.  I replicated 4 different experimental protocols (and 2 completely worked!).  This reviewer was writing about the 2 protocols where replications showed essentially no effect.  So it was really 2 forest plots, but both with the same pattern.
  • Yes, the reviewer apparently thinks that 2 significant results (with 2 different protocols) not replicating would require a miracle.

References

Cumming, G. (2008). Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better. Perspectives on Psychological Science, 3(4), 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x
Gelman, A., & Stern, H. (2006). The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant. The American Statistician, 60(4), 328–331. https://doi.org/10.1198/000313006X152649
Posted in ITNS, NHST, Replication, Statistical graphics, The New Statistics

Something a bit different: maintaining memories

I’m wandering off the topic of the New Statistics today just to mention that my lab has published a new paper that characterizes the changes in gene expression that accompany storing and maintaining a new long-term memory (Conte et al., 2017).  This is my main research passion, and this paper is a big milestone for my lab, as we push to develop a comprehensive understanding of how transcription supports the encoding, maintenance, and subsequent decay of long-term memory.

There is a bit of a New Stats angle here.  For one, we of course used the estimation approach throughout (though, yes, we also reported p values).  More importantly, though, because of the New Statistics we have really improved our entire scientific process: we optimized our protocol to maximize effect size, ran much larger studies to obtain more informative estimates, sought direct replication in independent samples, pre-registered our sample-size and analysis plans, and shared all our data and processing scripts.  It is seeing what a big difference these practices have made within my own lab that makes me believe so strongly in spreading the New Stats and Open Science more broadly across the life and behavioral sciences.  Onward!

References

Conte, C., Herdegen, S., Kamal, S., Patel, J., Patel, U., Perez, L., … Calin-Jageman, I. (2017). Transcriptional correlates of memory maintenance following long-term sensitization of Aplysia californica. Learning & Memory, 24, 502–515. https://doi.org/10.1101/lm.045450.117
Posted in Open Science, Uncategorized

Adventures in Replication: Your replication appears to be somewhat underpowered

Many journals now proclaim their openness to replication research.  Behind the scenes, though, replication manuscripts are often met with impossible demands and/or insane double-standards.

Here’s an example from an editor at a prominent social psychology journal:

“the studies appear to be somewhat underpowered. This is (as reviewer 1 notes) because you estimated sample sizes from a power analysis based on the (very likely overestimated) d from the B & E study.”

Here’s why this makes my head explode:

  • The original study had 36 participants (18/group).
  • We conducted 5 replications that encompassed almost 700 participants!  (Two studies had only 2x the original sample size, but all the others had > 2.5x.)
  • The reviewers (and the editors) suggest that our power analysis should have assumed the original study was very wrong!  (A hypothetical sketch of that logic follows this list.)
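Here, for what it’s worth, is the logic of that complaint as a hypothetical Python sketch (the effect sizes are invented, not B & E’s actual values): power the replication on the published d, then ask what power that plan delivers if the true effect is much smaller.

```python
# Hypothetical illustration: plan a replication from a published d, then see
# what power the plan actually delivers if the true effect is smaller.
# Effect sizes here are invented, not the actual Burgmer & Englich values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

published_d = 0.9   # a large d, the kind a significant n = 18/group study yields
true_d = 0.3        # a plausibly smaller true effect -- an assumption

# Per-group n a standard power analysis recommends, taking the published d at face value
planned_n = analysis.solve_power(effect_size=published_d, alpha=0.05, power=0.80)
print(f"n per group planned from the published d: {planned_n:.0f}")  # roughly 20

# Power that plan actually delivers if the true effect is much smaller
actual_power = analysis.power(effect_size=true_d, nobs1=planned_n, alpha=0.05)
print(f"power if the true d is {true_d}: {actual_power:.2f}")        # roughly 0.15
```

Fair enough as arithmetic, but that logic cuts far harder against an original study of 18 per group than against replications run at two to two-and-a-half times its size.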

It’s been 2 years but I still sometimes wake up in the middle of the night thinking about this.  I sit bolt upright and bellow out incredulous ripostes into the darkness: “So now you care about sample size!!” or “YOUR UNDERSTANDING OF POWER IS SOMEWHAT UNDERPOWERED!”.  My wife, G-d bless her, has become used to this and has not sought to have me committed (that I know of).

My occasional night rage is mellow compared to how I felt when the review first came back.  I probably didn’t handle it perfectly.  I bashed out a very snarky reply and hit send.  Here are the highlights:

It is fortunate that computers were designed to process data without regard for veracity or your email would have broken the internet….I’ve published enough to know that this review is an outlier on the bananas scale

I guess I got what I was looking for, because the editor did send back a very irritated response:

I see no need for any further correspondence about this decision.  If you feel that you must reply, please refrain from inflammatory, ad hominem comments such as the formulation in your email that indirectly questions my veracity, or the characterization of the letter as “bananas.”  Such comments have absolutely no place in professional discourse.

This tells me that people confused about power are also confused about what an ad hominem attack is.

I’m sure things have changed for the better… right?  We did, at least, find a home for our somewhat underpowered replication results at PLOS ONE (Cusack, Vezenkova, Gottschalk, & Calin-Jageman, 2015).

References

Cusack, M., Vezenkova, N., Gottschalk, C., & Calin-Jageman, R. J. (2015). Direct and Conceptual Replications of Burgmer & Englich (2012): Power May Have Little to No Effect on Motor Performance. PLOS ONE, 10(11), e0140806. https://doi.org/10.1371/journal.pone.0140806
Posted in Replication, The New Statistics

The joy of many disciplines

One of the great things about working in psychology, or statistics, or–just imagine!–both, is that you can get to play in the backyards of many other folks. As science becomes more and more fragmented, and many researchers feel that their best strategy is to aim for expertise in some highly specialised sub-field, it’s worth taking a moment to enjoy cross- or multi-disciplinary research.

I’ve been lucky enough, over the decades, to publish journal articles with colleagues in computer science, linguistics, education, cell biology, philosophy, ecology, artificial intelligence, statistics, history & philosophy of science, health sciences, and maybe one or two other disciplines that don’t spring to mind right now.

In UTNS, my first book about the new statistics, I included boxed examples from numerous disciplines to illustrate the main argument that anyone using NHST, in whatever discipline, should think hard about making the change to TNS. Maybe as a result I still get emails–almost always positive, and often highly enthusiastic–from folks from a stunning breadth of backgrounds. Recent examples include someone working on assessment of risk in financial markets and another in a ministry for justice.

In ITNS, our introductory book, we mainly focus on psychology–including numerous sub-fields–and education, but we’ve included some examples from further afield. We certainly believe that our arguments and methods are widely applicable across many disciplines. So we’re delighted when teachers in other disciplines express interest in ITNS.

My most recent example is archaeology. I gave an invited talk to the Archaeology Department at La Trobe University. My slides for the talk are here.

It was a lucky fluke that I had recently posted about archaeology and Open Science, so I could include some archaeology examples. I’m happy to say that the response was highly positive. I was asked specifically about chi-square, so I could immediately open the Two proportions page of ESCI intro chapters 10-16, plug in the numbers for an archaeology example, then display the chi-square analysis and our recommended better way based on proportions.
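For anyone curious what that looks like outside ESCI, here is a rough Python sketch with invented counts. ESCI uses a more refined interval for the difference between two proportions; the simple Wald-style interval below is only meant to convey the idea.

```python
# Invented archaeology-style counts: contexts at two sites that contain a
# particular pottery type. Compare the usual chi-square test with an estimate
# of the difference between the two proportions and a simple (Wald) 95% CI.
# ESCI's recommended interval is more refined; this is just a rough sketch.
import math
from scipy.stats import chi2_contingency, norm

found_a, n_a = 34, 60   # site A: 34 of 60 contexts contain the type
found_b, n_b = 18, 55   # site B: 18 of 55

# Traditional chi-square test on the 2x2 table
table = [[found_a, n_a - found_a],
         [found_b, n_b - found_b]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Estimation approach: difference between the proportions, with a 95% CI
p_a, p_b = found_a / n_a, found_b / n_b
diff = p_a - p_b
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z = norm.ppf(0.975)
print(f"difference = {diff:.2f}, 95% CI [{diff - z * se:.2f}, {diff + z * se:.2f}]")
```

The chi-square test delivers a yes/no verdict; the estimation approach says how large the difference between the sites might plausibly be, which is usually the more useful question.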

Bob and I dearly hope that ITNS will be of use to folks in lots of disciplines. We love hearing from anyone interested in using ITNS, and particularly so from teachers or researchers outside psychology.

Geoff

Posted in ITNS, Teaching

Adventures in Replication: Scientific journals are not scientific

The essence of science is seeking and weighing evidence on both sides of a proposition.  One might think, then, that when a scientific journal publishes a research paper it thereby acquires a special interest in publishing subsequent replications or commentary on that topic.   We might call this the “principle of eating your own dog food”.  Or maybe the “you published it, you own it” policy.  Or perhaps just the “scientific journals should publish science” doctrine.

If only, if only.  Looking at the scars I’ve incurred dragging 6 replication papers across the publication line, my conclusion is that most journals reject these notions in practice if not in policy.

The main tactic for avoiding the publication of embarrassing replication results is the old saw of interest and importance.  Apparently, interest and importance have a very strange life cycle for an editor. They are at their peak when a new paper on a topic is submitted with p < .05.  This high level of interest and importance leads to publication, press releases, and breathy popular-press coverage.  A few months later, though, when a replication result comes in indicating that the original result may have been overly optimistic, the life cycle of interest and importance reaches a sudden and dramatic end.  Editors and reviewers now find the topic sooooo boring and trivial; they cannot fathom why their readers would want to continue reading about it.  After all, who could possibly find scientific value in learning that a previous result was unreliable?  Who would their readers be, a bunch of scientists?

The extreme double-standard of what counts as interesting from original to replication manuscript means that at least some journals are acting simply as printed monuments to confirmation bias.  They are no more scientific or self-correcting than Vogue or GQ (that may be unfair to Vogue and GQ–for all I know they may be more welcoming of contrary viewpoints and data).

I’m not enjoying being so cynical, but looking through the reviews I’ve collected from my replication work, the pattern is pretty clear.  For each replication I’ve conducted I have first submitted to the journal that originally published the research (except for 2 studies completed for specific journals).

So far, only 1 replication paper I’ve submitted has made it past the ‘interest’ bar at the original journal–that was at Social Psychological and Personality Science (kudos to SPPS, though see below).

Below are some examples of editors’ comments I’ve received.  In each case, the comments come from the journal that originally published the research.  In each case, the rejection is of a replication manuscript that reports multiple, high-powered, pre-registered replications, almost always with the exact materials used in the original.  In most cases, the replications also included positive controls to demonstrate researcher competence, and varied conditions and/or participant pools to ensure that the finding of little-to-no effect was robust across multiple conditions.  In other words, these replications are as close to air-tight as humanly possible.  Of course, there is no such thing as a perfect replication–but if your epistemic standards are so high as to find these efforts unacceptable, well then you are likely too skeptical to be sure you’re even reading this blog (hail to the Evil Genius!).

Here are some highlights in my adventures in having replications be rejected from the journals that published the original paper:

  • Journal of Experimental Social Psychology.  I conducted replications of this study (Price, Ottati, Wilson, & Kim, 2015) which originally showed that manipulations of task difficulty produce large changes in open-mindedness.  The original study was covered extensively in the popular press.  The replications showed little-to-no effect of task difficulty manipulations (though other aspects of the original research did replicate quite well).  The editor listed importance as the first criterion for rejection:
    • “the case for why this is an important replication effort from a scientific perspective is much less clear”
    • What am I missing?  If the original study was scientifically important enough for JESP, isn’t the fact that key experiments are unreliable equally scientifically important?
    • I just received this rejection: submitted in April, rejected in August on the basis of a review by the original author (who recommended publication) and by 1 additional reviewer (who recommended rejection).  I appealed for a third review, and that is now pending.
  • Science.  Working with a student and collaborators at two other institutions, I conducted replications of this study (Gervais & Norenzayan, 2012) which originally showed that manipulations of analytic thinking decrease religious belief.  The replications of one study, conducted across multiple sites, showed little-to-no effect, and in the meantime additional studies showed that the manipulations used in the original research have no validity.  We submitted a 300-word note on the replication results to Science.  The submission was not reviewed.  The editor, Gilbert Chin, wrote back this form letter:
    • “Because your manuscript was not given a high priority rating during the initial screening process, we have decided not to proceed to in-depth review. The overall view is that the scope and focus of your paper make it more appropriate for a more specialized journal.”
    • I wrote back “I guess science as a whole is self-correcting, but Science the journal is not.”
    • No response
    • The paper was eventually published in PLOS ONE (Sanchez, Sundermeier, Gray, & Calin-Jageman, 2017) after also being rejected from Psychological Science and Social Psychology without review.
  • Social Psychological and Personality Science – Working with a team of students I conducted replications of this paper (Burgmer & Englich, 2012) which reported studies showing that feelings of power produce large increases in motor skill.  We submitted a set of replications to SPPS–each showing little-to-no effect despite strong manipulation check data.  The handling editor, Gerben van Kleef, cited importance/theoretical contribution as the primary reason the paper could not be accepted:
    • “What was actually most critical in my decision (and perhaps I did not make this sufficiently clear in my letter) is the requirement that papers published in SPPS should make a compelling theoretical and empirical contribution to the literature. Reporting evidence suggesting that a particular effect may be difficult to replicate or may be weaker than earlier studies suggested, even if demonstrated beyond doubt, is only half of the story. Something new must be added subsequently. (Note that I am referring just to SPPS policies now. Other journals may hold different standards.)”
    • This response is somewhat about interest/importance, but it also shades into another criterion that insulates journals from publishing replications of previous papers–the demand to go beyond and demonstrate some new and interesting theoretical development.  That’s a great criterion for a new paper.  But if you are replicating a study and find that the data presented are unreliable, there’s nothing there to go beyond.  Somehow, showing a theory to be erroneous is not a theoretical contribution.
    • This was only the second replication project I had worked on.  Some of the reviewer comments were really lazy (one complained that I hadn’t included CIs or effect sizes in the paper… which were the main things we had reported!).  But there were also some good suggestions, and I ended up extending the series of replications further, though still finding little to no effect.
    • The paper was then rejected from JPSP with a set of reviews that were absolutely bananas.  More on that later.
    • It finally found a home in PLOS ONE (Cusack, Vezenkova, Gottschalk, & Calin-Jageman, 2015), the last refuge of the replicator.
    • I didn’t give up on SPPS and sent another replication in about a year later (Moery & Calin-Jageman, 2016).  This second replication also found little-to-no effect, but the editors were quite clear that they felt an obligation to consider publication of a paper challenging a previous SPPS manuscript.  Kudos!

References

Burgmer, P., & Englich, B. (2012). Bullseye! Social Psychological and Personality Science, 4(2), 224–232. https://doi.org/10.1177/1948550612452014
Cusack, M., Vezenkova, N., Gottschalk, C., & Calin-Jageman, R. J. (2015). Direct and Conceptual Replications of Burgmer & Englich (2012): Power May Have Little to No Effect on Motor Performance. PLOS ONE, 10(11), e0140806. https://doi.org/10.1371/journal.pone.0140806
Gervais, W. M., & Norenzayan, A. (2012). Analytic Thinking Promotes Religious Disbelief. Science, 336(6080), 493–496. https://doi.org/10.1126/science.1215647
Moery, E., & Calin-Jageman, R. J. (2016). Direct and Conceptual Replications of Eskine (2013). Social Psychological and Personality Science, 7(4), 312–319. https://doi.org/10.1177/1948550616639649
Price, E., Ottati, V., Wilson, C., & Kim, S. (2015). Open-Minded Cognition. Personality and Social Psychology Bulletin, 41(11), 1488–1504. https://doi.org/10.1177/0146167215600528
Sanchez, C., Sundermeier, B., Gray, K., & Calin-Jageman, R. J. (2017). Direct replication of Gervais & Norenzayan (2012): No evidence that analytic thinking decreases religious belief. PLOS ONE, 12(2), e0172636. https://doi.org/10.1371/journal.pone.0172636
Posted in Replication