The Journal of Physiology Adopts Better Practices and Excoriates p Values
The Journal of Physiology is embracing a number of Open Science practices and has just published an editorial highly critical of the p value. Yay!
In less brief
Simon Gandevia, of the NeuRA research institute in Sydney, has been awarded Honorary Membership of The Century Citation Club of JPhysiol, having published more than 100 articles (phew!) in that journal. He was invited to write an editorial, which was published in January. Simon reflected on changes in the Journal and the discipline—massive advances in techniques, larger teams, longer articles, stronger links to clinical applications—and closed by explaining how unreliable and misleading p can be, especially in relation to replications.
Simon’s editorial, statistical aspects
Simon described how JPhysiol had, since around 2010, encouraged authors to adopt better statistical techniques and reporting practices, and had published how-to guides to help. However, little changed, and so in 2018 the Journal mandated more complete reporting of research—to facilitate replication—and especially of data and statistical analyses. In other words, adoption of key Open Science practices.
Simon then used his lovely figure, above, to demonstrate that an initial p value (the p found in an initial study, shown on the horizontal axis) gives very little information about the p an exact replication will find. The solid line (left axis) tells us that initial p = .05 gives only a 50:50 chance that a replication finds p < .05. Initial p = .01 gives only about a 67% chance of replication p < .05.
The vertical grey bars (right axis) are the 80% prediction intervals for replication p. These come from my p intervals article (Cumming, 2008, here or here). If initial p = .05, there’s an 80% chance that p in an exact replication lies in (.0002, .65), with a 10% chance it lies below that interval and 10% above. For initial p = .01, the prediction interval is (.00001, .41). The intervals are so long! A replication can, alas, give just about any p value, so no single p is to be trusted.
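Those probabilities and intervals can be reproduced by simulation. A minimal Monte Carlo sketch, assuming the model underlying Cumming (2008): with a flat prior on the true effect, the replication z score, given an initial z score z1, is normally distributed with mean z1 and standard deviation √2, and all p values are two-tailed. (The function names below are illustrative, not from the editorial.)

```python
import random
from statistics import NormalDist

nd = NormalDist()  # standard normal

def two_tailed_p(z):
    """Two-tailed p value for a z score."""
    return 2 * (1 - nd.cdf(abs(z)))

def z_from_p(p):
    """z score corresponding to a two-tailed p value."""
    return nd.inv_cdf(1 - p / 2)

def simulate_replication_p(p_initial, n=200_000, seed=1):
    """Simulate n exact-replication p values, given an initial p.

    Flat-prior model: replication z ~ Normal(z1, sqrt(2)),
    where z1 is the z score of the initial study.
    """
    rng = random.Random(seed)
    z1 = z_from_p(p_initial)
    return [two_tailed_p(rng.gauss(z1, 2 ** 0.5)) for _ in range(n)]

ps = sorted(simulate_replication_p(0.05))
frac = sum(p < 0.05 for p in ps) / len(ps)   # chance replication "succeeds"
lo = ps[len(ps) // 10]                        # 10th percentile of replication p
hi = ps[9 * len(ps) // 10]                    # 90th percentile of replication p
print(frac, lo, hi)  # roughly 0.50, and an 80% interval near (.0002, .65)
```

Running the same simulation with `p_initial=0.01` gives a success fraction near .67 and an interval near (.00001, .41), matching the figure.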
Simon also discussed the advantages of replacing NHST “with presentation of effect sizes and confidence intervals (ESCI). This… would avoid the phoney dichotomy of significant vs. insignificant…”. Indeed! He mentioned a recent article of his, one of a number in JPhysiol in which authors have taken such an estimation approach. That’s progress!
Simon’s editorial is definitely worth a read.
Simon’s editorial: A response and our reply
In reply, Brent Raiteri made a number of points and, in particular, argued that there are typically so many unknowns about a replication that it’s not possible to calculate any value for replication p. Simon kindly invited three of us to join him in a reply to Raiteri, which has just come online. We clarified a couple of points and Simon prepared an amended version of his figure that makes clear that two-tailed p values are used throughout. (It’s this amended figure that I include above.) We also noted that the values in the figure don’t rely on the assumption—unrealistic, but often made—that the initial study estimated exactly the size of the effect in the population.
We noted that the figure assumes that sampling variability is the only cause for differences between initial and replication p, so the probabilities and prediction intervals depicted represent a best case—replication p may, in practice, be even more unreliable. We agreed with Raiteri that usually there are further differences between initial and replication studies, perhaps sometimes sufficient to justify Raiteri’s claim that it’s not possible to calculate replication p values.
The Journal of Physiology
You probably know that JPhysiol is one of the longest-running and most highly regarded journals in the biological sciences. Founded in 1878, it has published classic research from numerous Nobel Laureates, as well as other leading scientists. Many neuroscience courses still ask students to read classic JPhysiol articles, perhaps about the sodium pump, and other fundamental discoveries. It’s especially pleasing to see such a journal adopt and promote Open Science practices.
Research Quality at NeuRA
Within his institute, Simon has long championed improved research practices. The Research Quality page of NeuRA introduces the Reproducibility & Quality Sub-Committee that Simon convenes. It’s active in promoting Open Science practices by NeuRA’s researchers.
On that page, scroll down to see the video of a March 2021 talk by Simon:
Research Quality and Reproducibility: Why You Should Be Worried
- At about 8.45, note a nice story about Sir John Eccles being acutely aware of having published incorrect findings, then finding a way to do better—which led to his Nobel Prize.
- At about 22.30, see the Quality Output Checklist and Content Assessment (QuOCCA), an instrument for assessing the transparency, data analysis, and reporting practices of a draft journal manuscript.
A little lower on that page is a video of a talk that I, at Simon’s invitation, gave at NeuRA in December 2019:
Improving the Trustworthiness of Neuroscience Research
I included two demonstrations of the unreliability of replication p values:
- At about 11.00 see the dance of the p values (or search YouTube for ‘dance of the p values’).
- At about 13.00 I move on to explain then demonstrate significance roulette (or search YouTube for ‘significance roulette’ to find two videos).
A warm salute to Simon and the editors at JPhysiol for bringing that august journal into the world of Open Science and better statistics.