Don’t fool yourself: Facilitated Communication continues to be a cautionary tale

When I (Bob) was an undergrad, I took methods/stats in the psychology department.  I wasn’t a psych major, but I wanted to take a class on brain and behavior, and I was told I had to take methods/stats first.  At the time, I had no plans to pursue a career in science–I was majoring in philosophy and obsessed with computer programming.  I had no idea what a methods/stats class was even about.

In the first week of the class, the professor (William Hayes!) wheeled in a TV and VCR cart and popped in a tape (no–there were no projectors or computers in the classroom; we watched the tape on what was probably a 20-inch tube TV).  I had no idea my life was about to change.

The video we watched was an episode of Frontline called “Prisoners of Silence,” an overview and exposé of Facilitated Communication (FC).  If you want your life to be forever changed, too, you can watch the video here (and you won’t even need a VCR):

This documentary has stuck with me, and I now show my methods/stats students the film at the start of each semester.  It’s an incredible lesson in how easy it is to fool oneself, and how powerful simple scientific techniques can be to help dissolve self-delusion: all that is needed is the courage and wisdom to put our ideas to a rigorous test.  For me, this was my ah-ha moment of understanding what stats/methods is all about, and the beginning of my life-long journey to try hard not to fool myself.  I’m sure it hasn’t always been successful, but there it is.

This is all a long preamble to mentioning that Facilitated Communication is back in the news.  And, incredibly, it seems to have fooled some prominent philosophers–folks who one would hope would be skilled at detecting the many problems with FC.  As usual, FC is surrounded by controversy and tragedy.  This summary, by excellent science journalist Daniel Engber at Slate, is well worth reading:


Posted in Uncategorized

The persistence of NHST: “Wilfully stupid”?

I recently gave a research talk to Psychology at La Trobe, my old University–although I now live an hour out of the city and rarely visit the campus. I decided to turn things around from my previous few talks: Instead of starting with Open Science then discussing the move from NHST to the new statistics, I decided to start with the NHST story.

The really weird thing is that NHST and p values have for more than half a century been subject to cogent and severe criticism, which has received hardly any considered rebuttal, and yet NHST and p values simply persist. Really weird!

Here are my Powerpoint slides for the talk:
Cumming LTU Psychology colloquium 13 Apr 17

And here’s Slide 11 with a quote from each decade to illustrate how long-standing and outspoken the criticism of NHST has been. Yes, cherry-picked quotes, but there are truckloads more, and hardly anyone has tried to seriously answer them.

Then I quoted from psychologist Michael Oakes (1986): “Psychologists… have, for the last 40 years or so, been almost wilfully stupid. What explanations can be offered for their failure to acknowledge, at a much earlier date, the cogency of these arguments [against NHST]?”

“Wilfully stupid”!? It seems that rational argument has not persuaded researchers to change. NHST is somehow addictive. Let’s hope things are different now, in the age of Open Science.

Perhaps in addition it’s worth trying another approach–that’s why I put forward the dance of the p values, and now significance roulette, as attempts to dramatize a particularly striking weakness of p values and NHST. Will they help researchers throw away the security blanket at last?

Posted in NHST, Replication, The New Statistics

Castles made of sand in the land of cancer research

Not all problems with scientific practice are statistical.  Sometimes, methods and protocols are introduced and accepted without sufficient vetting and quality control.  Hopefully this is rare, but in the biological sciences there is an ongoing worry that too many ‘accepted’ techniques might not be well founded.  Here is one striking example from the world of cancer research, where it turns out that a particular cell line thought to have been a useful tool for studying breast cancer is not really representative of breast cancer at all.  Ok, so mistakes happen–but sadly hundreds of papers have been published using this cell line, many of them after data were available suggesting the cell line was not representative of breast cancer, and it is still taking surprisingly long for the problem to be widely recognized.  Fortunately, steps are being taken to make sure that cell line work includes more careful checks, and the NIH is even requiring that plans for such checks be part of funding applications.

Here’s the story, in Slate by Richard Harris, an NPR correspondent and author of a new book about reproducibility in science:

Posted in Replication

p intervals: Replicate and p is likely to be *very* different!

The Significance Roulette videos (here and here) are based on the probability distribution of the p value, in various situations. There’s more to the second video than I mentioned in my recent post about it. The video pictures the distribution of replication p, which is the p value of a single replication experiment, following an initial experiment that gave some particular initial p value.

In the video I illustrated how that distribution can be used to find the p interval, meaning the 80% prediction interval for replication p, for some particular initial p value. (All p values are two-tailed.) Here are some example p intervals:

So, if you run an experiment that gives p = .05 and then replicate–everything just the same, but with new samples–you have an 80% chance of getting p within the interval (.0002, .65). There is fully a 10% chance that p is greater than .65 and a 10% chance that it is less than .0002. An amazingly long interval! Just about any p value is possible! And that’s all true, no matter what the N, the power, or the true effect size!

For an initial p of .001 (***), the interval ends are lower, as we’d expect, but the interval is once again very long. In addition, the chance of not even obtaining p < .05 (*) is fully 17%, about 1 in 6.
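
For readers who want to check numbers like these, here is a minimal sketch in Python. It is not taken from the video or the article; it rests on a simplifying assumption that should be stated loudly: the test is treated as a z test, and, given only the initial result, the replication z is taken to be normally distributed around the initial z with standard deviation sqrt(2) (one unit of sampling variance for each of the two experiments). The function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def replication_p_cdf(x, initial_p):
    """P(replication p <= x) for two-tailed p values, under the z-test assumption."""
    z_init = STD.inv_cdf(1 - initial_p / 2)      # z that corresponds to the initial p
    rep = NormalDist(mu=z_init, sigma=sqrt(2))   # ASSUMED distribution of replication z
    c = STD.inv_cdf(1 - x / 2)                   # |z| that gives a two-tailed p of x
    return (1 - rep.cdf(c)) + rep.cdf(-c)        # p <= x exactly when |z| >= c


def replication_p_quantile(prob, initial_p):
    """Invert the CDF above by bisection (the CDF increases with x)."""
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        if replication_p_cdf(mid, initial_p) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def p_interval(initial_p, level=0.80):
    """Central prediction interval for replication p (equal tails)."""
    tail = (1 - level) / 2
    return (replication_p_quantile(tail, initial_p),
            replication_p_quantile(1 - tail, initial_p))


def chance_not_significant(initial_p, alpha=0.05):
    """Probability that the replication gives p > alpha."""
    return 1 - replication_p_cdf(alpha, initial_p)


low, high = p_interval(0.05)
print(f"initial p = .05: 80% p interval is roughly ({low:.4f}, {high:.2f})")
print(f"initial p = .001: chance of replication p > .05 is about "
      f"{chance_not_significant(0.001):.2f}")
```

Under these assumptions the output should land close to the interval and the 17% figure quoted above. Note that nothing in the calculation uses N, power, or the true effect size: the initial p value is the only input.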

That’s all another way of seeing the enormous unreliability of p. The world would be better if we all simply stopped using p values!




Posted in NHST, Replication

Significance Roulette 2

In my post of a couple of days ago I gave the link to Significance Roulette 1, a video that explains how to generate the roulette wheel for a ‘typical experiment’, by which I meant an independent groups experiment, N = 32 in each group, with half a standard deviation difference between the population means. In other words, the population effect size is assumed to be 0.5, traditionally considered a medium-sized effect.
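
As a rough illustration of what such a wheel looks like, here is a small Python sketch. It is not the method used in the video; it assumes a normal (z) approximation to the independent-groups test, and it assumes the wheel has 38 equally likely slots (the number mentioned in the Significance Roulette 1 post), each labelled with the p value at the middle of its slice of the sampling distribution. The names `roulette_wheel` and `power` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def two_tailed_p(z):
    """Two-tailed p value for an observed z."""
    return 2 * (1 - STD.cdf(abs(z)))


def roulette_wheel(effect_size=0.5, n_per_group=32, slots=38):
    """Return `slots` equally likely p values for the 'typical experiment'.

    ASSUMPTION: the observed z is normal with mean effect_size * sqrt(n/2)
    and standard deviation 1 (a z test rather than a t test).
    """
    sampling = NormalDist(mu=effect_size * sqrt(n_per_group / 2), sigma=1)
    # Cut the sampling distribution of z into equally likely slices and
    # label each slice with the p value at its midpoint quantile.
    zs = [sampling.inv_cdf((i + 0.5) / slots) for i in range(slots)]
    return sorted(two_tailed_p(z) for z in zs)


def power(effect_size=0.5, n_per_group=32, alpha=0.05):
    """Chance of p < alpha under the same approximation."""
    sampling = NormalDist(mu=effect_size * sqrt(n_per_group / 2), sigma=1)
    crit = STD.inv_cdf(1 - alpha / 2)
    return (1 - sampling.cdf(crit)) + sampling.cdf(-crit)


wheel = roulette_wheel()
print("smallest and largest p on the wheel:", wheel[0], wheel[-1])
print("slots with p < .05:", sum(p < 0.05 for p in wheel), "of", len(wheel))
print("approximate power:", round(power(), 2))
```

Spinning the wheel then amounts to picking one slot at random, for example with `random.choice(wheel)`. Under this approximation only about half the slots show p < .05, even though the effect is real.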

However, assuming a known population effect size is unrealistic: If we knew that, we’d hardly have reason to run the experiment! Can we do better? The second video, Significance Roulette 2, explains how we can. Give me just the p value from an initial experiment, and I will generate the roulette wheel that represents the probability distribution of replication p, meaning the p value we would get if we made a single replication of the original experiment–just the same but with new samples.

The remarkable thing is that we don’t need to know the sample sizes, the power, or the population effect size for the initial experiment. All we need is the p value from the initial experiment, and assurance that the replication is just the same as the original, but with new samples.

The shortcut to the Significance Roulette 2 video is:

The article in which I explain the calculations is here.

Once again I conclude that we should never trust a p value, we should not use p values at all, and that there are much better ways.



Posted in ITNS, NHST, Replication, The New Statistics

Significance Roulette 1

If you run an experiment, obtain p = .05, then repeat the experiment–exactly the same but with a new sample–what p value are you likely to get? The answer, surprisingly, is just about any value! In other words, the sampling variability of the p value is enormous, although most people don’t appreciate this.

Years ago I became fascinated with the unreliability of p, and wrote a paper exploring it, with simulations and pictures and formulas. (I can’t post the paper itself here, but if you would like it and have trouble getting access, email me at and I’ll send you the pdf.)

I also posted to YouTube the first video of the dance of the p values, which illustrated the great variability of p and argued that this variability is one more reason for using confidence intervals and not using p values at all. Later I posted a second version of the dance, and also made a version to go with ITNS.

For the last year or two I’ve been playing with another approach to illustrating the enormous variability of p–significance roulette. It turns out that running a typical experiment to obtain a p value is equivalent to spinning a particular roulette wheel marked with 38 p values that range from *** (p < .001) to values up near 1. I’ve just posted a video. I hope you enjoy it. Even more, I hope it helps persuade people that it’s crazy to use p values, and that there are much better ways.

The shortcut to the Significance Roulette 1 video is:



Posted in ITNS, NHST, Replication, The New Statistics

The long road towards clinical trials registries – Sackler Colloquium on Reproducibility Field Report 4

Science only works if we have the whole story. This is especially important in clinical trials, where the results of these studies are used to guide medical practice.  Unfortunately, getting the whole story can be difficult–there are strong incentives to bury negative results.  This radically distorts the published information on treatment effectiveness.

How to fix this problem?  Clinical trials registries–(hopefully) mandatory databases of *all* clinical trials (hopefully) with the data once complete.  Sounds easy, right?  Unfortunately, implementing this straightforward idea has been surprisingly difficult and time-consuming.

At the Sackler colloquium I (Bob) attended in early March, Roberta Scherer reviewed the history of clinical trials registries.  It stretches back much further than I realized, as shown in this graph Scherer presented from Dickersin & Rennie (2012):

There has been progress during this long history–clinical trials databases are now available for researchers throughout the world, they are being increasingly used, and some journals have successfully implemented and monitored processes to require registration prior to publication.  Unfortunately, as we have discussed, there is still lots of work to do.  Compliance remains low in some fields, the information in the registry is often incomplete, sharing of the data still is not uniform, and researchers often end up reporting their work in ways that differ from their registered protocols without noting the change (see this older post on the COMPARE project led by Ben Goldacre).  Finally, only 54% of registered trials are not published–which is good because now this is discoverable, but bad because it means synthesizing the literature would still require trolling these partial and incomplete registries to try to scrape together the whole story.  Still a long, long way to go.

Dickersin, K., & Rennie, D. (2012). The evolution of trial registries and their use to assess the clinical trial enterprise. JAMA, 307(17), 1861–4.

Posted in Uncategorized

Replication is the new black, and not only in Psychology: Economics too

There are good folks in many disciplines who are working to encourage Open Science practices. Here’s an example from economics: A website that promotes replication.

The Network is run by Bob Reed, at the University of Canterbury in New Zealand (earthquake central), and Maren Duvendack in the U.K.   The Network has several hundred members.

It’s fun to browse the Guest Blogs–I discovered all sorts of interesting links and discussions about replication and Open Science stuff.

News & Events is also worth a squiz. And that’s not even mentioning the big pile of info about replication work in economics.



P.S. When we were writing ITNS, our copy editor queried a few expressions that turned out to be Australianisms. We dropped some, but one that survived was ‘squiz’. We reckoned that readers would be savvy enough to guess correctly, or could even resort to googling it if they insisted.

Posted in Open Science, Replication

A conference about–wait for it–the p value! But other things too.

In March 2016 the American Statistical Association (ASA) posted online a policy statement about the p value. You can see it here. This was remarkable–for one thing because it was the first time the ASA had made a public pronouncement about a particular statistical technique or concept.

A remark I like was made by Ron Wasserstein (the ASA executive director) while discussing the statement:

“In the post p<0.05 era, scientific argumentation is not based on whether a p value is small enough or not.  Attention is paid to effect sizes and confidence intervals.”

Of course, I read that as endorsement of the new statistics!

I’ve just had word from my colleague Michael Lew, of Melbourne University, about a conference that ASA is running in October, in Bethesda MD. The conference description starts by mentioning the ASA statement on the p value, but then describes the aim as discussing numerous aspects of how research and statistical inference should be conducted in the 21st century. Open Science issues figure prominently.

So it’s not really a conference about the p value, I’m relieved to say. It looks wonderful. I won’t be there, but perhaps you might consider it? ASA makes clear that the target audience is lots of different kinds of folks, not just statisticians.

I should mention that Michael, who is mentioned in ITNS, was a member of the expert working group tasked by the ASA to develop the statement about the p value. It took many months, numerous drafts, and some robust discussions–which included some very prominent statisticians. It was fascinating to watch. Best was that the final statement was well expressed, not too long, and fairly strong in its critique of the p value and, especially, how it is typically used.

Here’s to “the post p<0.05 era”!


Posted in NHST, Open Science, The New Statistics

Science Spin – Sackler Colloquium on Reproducibility Field Report 3

The conference on reproducibility I (Bob) attended in early March was so invigorating I figured I would spread these posts out.  Here’s the next installment.

Another good talk on the first day was from Isabelle Boutron, an MD PhD at Oxford who has extensively researched scientific spin: making claims about research results which go beyond what is clearly indicated in the data.  According to Boutron: “Publication bias is showing only the tip of the iceberg.  Spin is trying to make the iceberg look beautiful.”

Boutron reviewed evidence that over half of all applied medical papers have at least some “spin”–some attempt to claim stronger findings than clearly demonstrated in the data.  Strategies documented include:

  • Interpretation of non-significance as indicating comparability
  • Mining for significance in subgroups, focus on secondary outcomes, etc.
  • “Approached significance” phrasing
  • Overly forceful language relative to the finding, e.g., “demonstrated safety” for a merely non-significant difference in safety

Does spin matter?  Yes–clinicians who examined an abstract with spin found the results more promising than those who examined the abstract without spin (but with same data).  Moreover, spin from the scientist is propagated and amplified if the study makes its way into the popular press.  Although there is a strong contribution from the press in distorting research results, more distortions seem to occur when the paper and/or original press release already has some spin.

One interesting tidbit–does press coverage matter to a researcher’s career?  Probably.  Boutron mentioned a study by Phillips (1991, NEJM, I think) which looked at citations to articles that were either a) covered in the NYT, b) not covered but otherwise similar, or c) selected for coverage but never actually covered because of a strike at the NYT (what a cool control!).  The study concluded that being written up in the NYT boosts citations by 73%.  Fascinating!

Overall, Boutron made a convincing case that scientists are overhyping their research in a way that is likely to have negative consequences.  Although I expected no positive solutions to this, Boutron highlighted one possible way forward: a website which de-spins health-science news.  Here’s a post just about this cool resource:

Posted in Uncategorized