Confirmatory Research – A special issue of JESP

Catching up a bit, but in November of 2016 the Journal of Experimental Social Psychology published a special issue dedicated just to confirmatory research.

The whole issue is well worth reading:

  • There’s an excellent guide to pre-registration (ostensibly for social psychologists, but really for anyone). (van ’t Veer & Giner-Sorolla, 2016)
  • Lots of interesting pre-registered studies, like this one. (McCarthy, Coley, Wagner, Zengel, & Basham, 2016)
  • Many Labs 3 is published, which is completely fascinating. (Ebersole et al., 2016)
  • And some fascinating commentaries, including this one about using MTurk samples. (DeVoe & House, 2016)


DeVoe, S. E., & House, J. (2016, November). Replications with MTurkers who are naïve versus experienced with academic studies: A comment on Connors, Khamitov, Moroz, Campbell, and Henderson (2015). Journal of Experimental Social Psychology.
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016, November). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology.
McCarthy, R. J., Coley, S. L., Wagner, M. F., Zengel, B., & Basham, A. (2016, November). Does playing video games with violent content temporarily increase aggressive inclinations? A pre-registered experimental study. Journal of Experimental Social Psychology.
van ’t Veer, A. E., & Giner-Sorolla, R. (2016, November). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology.
Posted in Open Science, Replication

A cool new journal is Open

APS (The Association for Psychological Science) recently launched its sixth journal: Advances in Methods and Practices in Psychological Science. A dreadful mouthful of a title–why not drop ‘Advances in’ for a start–but it looks highly promising. Maybe it will become known as AIMPIPS?

The foundation editor is Dan Simons, perhaps most widely known for the invisible gorilla, although he’s done lots besides.

The announcement of the new journal is here and the main site for the journal is here.

It’s worth browsing the Submission Guidelines, especially the ‘General Journal Information’ and ‘Statistics’ sections. Also the stuff about ‘Registered Replication Reports’ and ‘Registered Reports’–peer review before data collection, gasp! Open Science is in just about every sentence. Such guidelines could hardly have been dreamt of a few years ago. Here’s wishing it every success in helping make research better–and more open.


Posted in Open Science, Replication

Publishing unexpected results as a moral obligation for scientists


A revised European Code of Conduct for Research Integrity now specifically calls on researchers and publishers to not bury negative results.  Specifically, the guidelines formulate this principle for publication and dissemination:

  • Authors and publishers consider negative results to be as valid as positive findings for publication and dissemination.

My only complaint is this continued red herring of ‘positive’ and ‘negative’ results.  There are results.  We should explore if the results are reliable and valid, but your particular liking of the result shouldn’t enter into the equation.  Still, a big step forward.

I found out about this through this news blurb:

Posted in Open Science

What the datasaurus tells us: Data pictures are cool

In various places in ITNS, especially Chapter 11 (Correlation), we discuss how important it is to make good pictures of data, to reveal what’s really going on. Calculating a few summary statistics–or even CIs–often just doesn’t do the job.
Many statistics textbooks use Anscombe’s Quartet to make the point: the 4 scatterplots below all have the same (or very nearly the same) means and SDs of both X and Y, and also the same correlation between X and Y.
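The quartet’s published values make this easy to verify for yourself. Here’s a minimal sketch, using NumPy, that computes the summary statistics for all four datasets:

```python
import numpy as np

# Anscombe's Quartet: four datasets with (nearly) identical summary
# statistics but radically different scatterplots.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation
    print(f"{name}: mean(x)={x.mean():.2f}  mean(y)={y.mean():.2f}  "
          f"SD(x)={x.std(ddof=1):.2f}  SD(y)={y.std(ddof=1):.2f}  r={r:.3f}")
```

Every row prints the same means (9.00, 7.50), the same SDs, and r of about .82–but plot the four X–Y pairs and the point of the exercise becomes obvious: identical statistics, wildly different pictures.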

Now some clever folks have worked out how to generate an unlimited number of datasets all with these same summary statistics (or very close) but weirdly different shapes and patterns. One of the pics is the dotted outline of the datasaurus, below. Click here to see a dozen pics cycle through. (At that site, click on the changing pic to go to the paper that describes what’s in the engine room.)

We may never trust a Pearson r value again, and that may not be a terrible thing!
P.S. Thanks to Francis S. Gilbert for the heads up.

Posted in Statistical graphics

Now for some good news: SIPS

The Society for the Improvement of Psychological Science (SIPS) held its first meeting last year, with around 100 good folks attending. Working groups have been–would you believe–working hard since then. The second meeting is 30 July to 1 August, in Charlottesville VA. More than 250 are expected to participate.

Today’s announcement:
“The program for the SIPS 2017 conference is up!

“We have lots of great workshops, hackathons, and unconference sessions in the works. Don’t know what that means? Don’t worry, we’re not sure either. Come join us and help us create it!

“There’s still time to register – click here to start!

“-The SIPS program committee
“Alexa Tullett, John Sakaluk, Michèle Nuijten, and Brian Nosek”

It’s so exciting to see so many people working on ways that things can be done better, especially by building tools to make the new ways actually easier.

Posted in Open Science, Stats tools

Methodological awakening: backlash against the backlash

Science has had some rough times lately, no doubt.  No need to rehearse the many findings indicating that we have some problems that need fixing.  As Will Gervais put it, we’re in the midst of a “methodological awakening”.  The good news is this is normal.  Science is always self-critical, searching, contested.  Techniques are always being re-vamped, refined, improved.  Eternal crisis–that’s the scientific way.

Predictably, though, some researchers worry that the tumult will tarnish the golden image of science.  There are already lots of political forces eager to discount scientific findings, to dismiss conclusions contrary to their ideology or bottom line.  Maybe science reformers are adding fuel to this fire, providing unwitting support to those who thrive in the post-truth era.  If that’s the case, perhaps science reform should be done out of the public eye, in more restrained terms.

Blech.  I have no patience for this arse-covering backlash against scientific reform.  First, it is pointless–you’re not going to stop or slow down mendacity by regulating scientific discourse.  Second, you can’t defend science by corrupting it.  Science is trustworthy because it is constantly reforming, improving, putting cherished ideas to the test.  Science stands apart from politics in not worrying about its image; in attending to the truth, the whole truth, and nothing but the truth.  Those who seek to preserve the image of science by policing it will ruin both science and its image.

Unfortunately, a lot of smart people disagree with me on this one.  At the Sackler convention on reproducibility I attended in March, I was shocked that even some of the conference organizers were eager to minimize the problems and warn us against pushing for reform in a way that might damage the reputation of science.  Predictably, this seems to have become one of the big take-aways of the conference.  Specifically, here’s a piece from The Atlantic summarizing some of the conference from the perspective that the ‘reproducibility crisis’ is soon to become an arrow in the quiver of the anti-science political factions:

So– the internal backlash against reform is alive and potentially growing.  Hopefully some reflection will quell this line of thinking.  We shouldn’t let the forces arrayed against science panic us into betraying the core principles we’re committed to.  Science = constant reform, and let the chips fall where they may.

Posted in Open Science, Replication

Replication problems are not competence problems

Why do some replication studies fail to produce the expected results?  There are lots of possible reasons: the expectation might have been poorly founded, the replication study could have been under-powered, there could be some unknown moderator, etc.  Sure, but let’s be real for a moment.  We all know that one explanation that will surely leap to mind is that the replicators screwed up, that they were not competent or capable enough to obtain the same results as the original researchers.

This worry over competence is often unspoken, but a few have dared to be explicit in questioning the competence of replicators who fail to confirm original findings.  Probably the most notorious is a working paper by social cognitive neuroscientist Jason Mitchell entitled “On the evidentiary emptiness of failed replications”.  Here’s a taste:

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.


You have to read the whole piece to appreciate just how pithily Mitchell demonstrates his basic lack of understanding of science.  For one, he doesn’t seem to know about positive and negative controls, time-honored tools that help rule out bungling for both expected and unexpected findings.  Second, Mitchell doesn’t seem to understand that there is an equally vast number of practical mistakes that can lead to a “positive” finding (remember that loose fiber-optic cable and the faster-than-light neutrinos?).  If we have different standards of evidence for data we like vs. data we do not… we’re not actually doing science.

I’ve mulled Mitchell’s missive quite a bit.  It’s very wrong, but wrong in a way that forced me to think deeply about how science protects itself from bunglers.  His work is part of the reason I brought positive controls from my neuroscience lab into my psychology research.  All of my recent work has included positive control studies to verify that my students and I can observe expected effects (see Cusack et al., 2015; Moery & Calin-Jageman, 2016; Sanchez et al., 2017).  So, thanks to Mitchell, we have repeatedly demonstrated that we’re not bunglers.

Where’s the news?  Well, there is a new pre-print out by Protzko & Schooler that examines whether there is any connection between researcher competence and replication success.  Specifically, they examine 4 registered replication reports (RRRs), in which different labs from around the world all conduct the same replication protocol.  This is a great data set because the participating labs are all quite different–notably in prior research experience and impact.  Protzko & Schooler therefore examined whether a lab’s research skill is related to its replication success.  As a rough measure of research skill, each PI’s h-index was used–not perfect, but surely a researcher with high impact is more experienced than one who has rarely been published and/or cited.  The analysis reveals no consistent relationship between research impact and replication success–not in any of the RRRs individually, nor overall.  There is enough data that all but the very weakest relationships can be ruled out.  Full disclosure: one of the RRRs was the replication of the facial-feedback hypothesis (Wagenmakers et al., 2016); I was part of a group at Dominican that contributed to this project (I must have been one of the dots on the far left of the impact scale for that analysis).
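To make the logic of that analysis concrete, here’s a toy sketch with entirely made-up numbers (not the Protzko & Schooler data): each lab PI’s h-index is correlated with the effect size that lab obtained in the replication. The sample size, distributions, and seed are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration: 20 labs, each with a PI h-index and a
# replication effect size (Cohen's d). The two are drawn independently,
# so any correlation here is pure sampling noise.
n_labs = 20
h_index = rng.integers(1, 60, size=n_labs).astype(float)  # made-up h-indices
effect = rng.normal(0.15, 0.10, size=n_labs)              # made-up lab effect sizes

r = np.corrcoef(h_index, effect)[0, 1]
print(f"r(h-index, effect size) = {r:.2f} across {n_labs} labs")
```

Because the two variables are generated independently, r hovers near zero–which is exactly the pattern Protzko & Schooler report for the real RRR data.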

So… on both empirical and logical grounds it seems that science is safe from bunglers.  Failed replications are disappointing, but we should probably be less quick to judge the replicator and more willing to thoughtfully suss out more substantive reasons for discrepant findings.

Posted in Replication

Don’t fool yourself: Facilitated Communication continues to be a cautionary tale

When I (Bob) was an undergrad, I took methods/stats in the psychology department.  I wasn’t a psych major, but I wanted to take a class on brain and behavior, and I was told I had to take methods/stats first.  At the time, I had no plans to pursue a career in science–I was majoring in philosophy and obsessed with computer programming.  I had no idea what a methods/stats class was even about.

In the first week of the class, the professor (William Hayes!) wheeled in a TV and VCR cart and popped in a tape (no–there were no projectors or computers in the classroom; we watched the tape on what was probably a 20-inch tube TV).  I had no idea my life was about to change.

The video we watched was an episode of Frontline called “Prisoners of Silence”, providing an overview and exposé of Facilitated Communication (FC).  If you want your life to be forever changed, too, you can watch the video here (and you won’t even need a VCR):

This documentary has stuck with me, and I now show my methods/stats students the film at the start of each semester.  It’s an incredible lesson in how easy it is to fool oneself, and how powerful simple scientific techniques can be to help dissolve self-delusion: all that is needed is the courage and wisdom to put our ideas to a rigorous test.  For me, this was my ah-ha moment of understanding what stats/methods is all about, and the beginning of my life-long journey to try hard not to fool myself.  I’m sure it hasn’t always been successful, but there it is.

This is all a long preamble to mentioning that Facilitated Communication is again back in the news.  And, incredibly, it seems to have fooled some prominent philosophers–folks who one would hope would be skilled at detecting the many problems with FC.   As usual, FC is surrounded by controversy and tragedy.  This summary, by excellent science journalist Daniel Engber at Slate is well worth reading:


Posted in Uncategorized

The persistence of NHST: “Wilfully stupid”?

I recently gave a research talk to Psychology at La Trobe, my old University–although I now live an hour out of the city and rarely visit the campus. I decided to turn things around from my previous few talks: Instead of starting with Open Science then discussing the move from NHST to the new statistics, I decided to start with the NHST story.

The really weird thing is that NHST and p values have for more than half a century been subject to cogent and severe criticism, which has received hardly any considered rebuttal, and yet NHST and p values simply persist. Really weird!

Here are my Powerpoint slides for the talk:
Cumming LTU Psychology colloquium 13 Apr 17

And here’s Slide 11 with a quote from each decade to illustrate how long-standing and outspoken the criticism of NHST has been. Yes, cherry-picked quotes, but there are truckloads more, and hardly anyone has tried to seriously answer them.

Then I quoted from psychologist Michael Oakes (1986): “Psychologists… have, for the last 40 years or so, been almost wilfully stupid. What explanations can be offered for their failure to acknowledge, at a much earlier date, the cogency of these arguments [against NHST]?”

“Wilfully stupid”!? It seems that rational argument has not persuaded researchers to change. NHST is somehow addictive. Let’s hope things are different now, in the age of Open Science.

Perhaps in addition it’s worth trying another approach–that’s why I put forward the dance of the p values, and now significance roulette, as attempts to dramatize a particularly striking weakness of p values and NHST. Will they help researchers throw away the security blanket at last?
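For readers who’d like to see the dance for themselves, here’s a minimal simulation sketch (not the original demonstration): repeated exact replications of a modestly powered two-group experiment, each yielding its own p value. The effect size, group size, and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# The 'dance of the p values', in miniature: run many exact replications
# of a two-group experiment with a fixed true effect (d = 0.5, n = 32 per
# group, roughly 50% power) and watch how wildly p jumps around.
d, n, reps = 0.5, 32, 25
p_values = []
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)   # control group scores
    treat = rng.normal(d, 1.0, n)       # treatment group scores
    p_values.append(stats.ttest_ind(treat, control).pvalue)

p_values = np.sort(np.array(p_values))
print(f"smallest p: {p_values[0]:.4f}   largest p: {p_values[-1]:.4f}")
print(f"replications with p < .05: {np.sum(p_values < .05)} of {reps}")
```

Even though every replication samples from exactly the same populations, the p values sprawl from tiny to large–the same experiment can easily deliver p = .001 one week and p = .40 the next.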

Posted in NHST, Replication, The New Statistics

Castles made of sand in the land of cancer research

Not all problems with scientific practice are statistical.  Sometimes, methods and protocols are introduced and accepted without sufficient vetting and quality control.  Hopefully this is rare, but in the biological sciences there is an ongoing worry that too many ‘accepted’ techniques might not be well founded.  Here is one striking example from the world of cancer research, where it turns out that a particular cell line thought to have been a useful tool for studying breast cancer is not really representative of breast cancer at all.  OK, so mistakes happen–but sadly hundreds of papers have been published using this cell line, many of them after data were available suggesting it was not representative of breast cancer, and it is still taking surprisingly long for the problem to be widely recognized.  Fortunately, steps are being taken to make sure that cell-line work includes more careful checks, and the NIH is even requiring these plans to be part of funding applications.

Here’s the story, in Slate by Richard Harris, an NPR correspondent and author of a new book about reproducibility in science:

Posted in Replication