Journal Articles Without p Values
Once we have a CI, a p value adds nothing, and is likely to mislead and to tempt the writer or readers to fall back into mere dichotomous decision making (boo!). So let’s simply use estimation and never report p values, right? Of course, that’s what we advocate in UTNS and ITNS–almost always p values are simply not needed and may be dangerous.
Bob and I are occasionally asked for examples of published articles that use CIs, but don’t publish any p values. Good question. I know there are such articles out there, but–silly me–I haven’t been keeping a list.
I’d love to have a list–please let me know of any that you notice (or, even better, that you have published). (Make a comment below, or email email@example.com )
Here are a few notes from recent emails on the topic:
First, if we look at the big picture, it is clear that estimation is on the rise. CIs are now reported in the majority of papers both at Psych Science (which enjoins their use) *and* at JEP:General (which encourages them) (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0175583#pone-0175583-g001 ). That suggests some good momentum. However, you’ll see that there is no corresponding decline in NHST, so that means folks are primarily reporting CIs alongside p values. It’s unclear, then, if this change in reporting is producing the desired change in thinking (Fidler et al. 2004 found that in medical journals CIs are reported but essentially ignored).
As for specific examples…
· Adam Claridge-Chang’s lab developed cool software for making difference plots, and their work now reports both p values and CIs but focuses primarily on the CIs. The work is in neurobiology… so probably not the best example for psych graduate students.
o Eriksson, A., Anand, P., Gorson, J., Grijuc, C., Hadelia, E., Stewart, J. C., Holford, M., and Claridge-Chang, A. (2018), “Using Drosophila behavioral assays to characterize terebrid venom-peptide bioactivity,” Scientific Reports, 8, 1–13. https://doi.org/10.1038/s41598-018-33215-2.
· In my lab, we first started reporting CIs along with p values but focusing discussion on the CIs. For our latest paper we finally omitted the p values altogether, and yet the sky didn’t fall (Perez et al., 2018). This is behavioral neuroscience, so a bit closer to psych, but still probably not really a great example for psych students.
o Perez, L., Patel, U., Rivota, M., Calin-Jageman, I. E., and Calin-Jageman, R. J. (2018), “Savings memory is accompanied by transcriptional changes that persist beyond the decay of recall,” Learning & Memory, 25, 1–5. https://doi.org/10.1101/lm.046250.117.25.
· I don’t read much in Psych Science, but here’s one paper that caught my eye that seems sensitive primarily to effect-size issues:
o Hirsh-Pasek, K., Adamson, L. B., Bakeman, R., Owen, M. T., Golinkoff, R. M., Pace, A., Yust, P. K. S., and Suma, K. (2015), “The Contribution of Early Communication Quality to Low-Income Children’s Language Success,” Psychological Science, 26, 1071–1083. https://doi.org/10.1177/0956797615581493.
· Overall, where I see the most progress is in the continued rise of meta-analysis and/or large data sets that are pushing effect-size estimates to the forefront of the discussion. For example,
o This recent paper examining screen time and mental health in teens. It uses a huge data set, so the question is not “is it significant” but “how strong could the relationship be”. They do a cool multiverse analysis, too.
§ Orben, A., and Przybylski, A. K. (2019), “The association between adolescent well-being and digital technology use,” Nature Human Behaviour, Springer US. https://doi.org/10.1038/s41562-018-0506-1.
o Or the big discussion on Twitter about whether a significant ego-depletion finding of d = .10 means anything.
More from Bob
I just came across this interesting article in Psych Science:
Nave, G., Jung, W. H., Karlsson Linnér, R., Kable, J. W., and Koellinger, P. D. (2018), “Are Bigger Brains Smarter? Evidence From a Large-Scale Preregistered Study,” Psychological Science, 095679761880847. https://doi.org/10.1177/0956797618808470.
This paper has p values alongside confidence intervals. But the sample size is enormous (13,000 brain scans) so basically everything is significant and the real focus is on the effect sizes.
It strikes me that this would be a great paper for debate about interpreting effect sizes. After controlling for other factors, the researchers find a relationship between fluid intelligence and total brain volume, but it is very weak: r = .19, 95% CI [.17, .22], with just 2% added variance in the regression analysis. The researchers describe this as “solid”. I think it would be interesting to have students debate: does this mean anything, and why or why not?
There are also some good measurement points to make in this paper: they compared their total brain volume measure to one extracted by the group that collected the scans and found r = .91… which seems astonishingly low for using the exact same data set. If about 20% of the variance in this measure is error, it makes me wonder if a relationship with 2% of the variance could be thought of as meaningful.
Oh, and the authors also break total brain volume down into white matter, gray matter, and CSF. They find the corrected correlation with fluid intelligence to be r = 0.13, r = 0.06, and r = 0.05, with all 3 being statistically significant. This, to me, shows how little statistical significance means. It also makes me even more worried about interpreting these weak relationships… I can’t think of any good reason why having more CSF would be associated with higher intelligence. I suspect that their corrections for confounding variables were not perfect (how could they be?) and that the r = .05 represents remaining bias in the analysis. If so, that means we should be subtracting an additional chunk of variance from the 2% estimate.
Oh yeah, and Figure 1 shows that their model doesn’t really fit very well to the extremes of brain volume.
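For readers who want to see why even r = .05 comes out "significant" at this sample size, here is a minimal sketch in Python. It is not the authors' analysis: it assumes a rounded sample size of 13,000 (taken from the "13,000 brain scans" mentioned above) and uses the textbook Fisher z approximation for a simple bivariate correlation, ignoring the covariate corrections in the paper. The function name `fisher_ci` is just an illustrative choice, and the intervals it prints will not exactly reproduce the published ones.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r using the Fisher z transform.

    Textbook approximation for a simple bivariate correlation; it does
    not account for covariates or any other model adjustments.
    """
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Assumption: rounded sample size, not the exact N used in the paper.
N = 13_000

# Correlations quoted in the post (total volume, then the three components).
for r in (0.19, 0.13, 0.06, 0.05):
    lower, upper = fisher_ci(r, N)
    excludes_zero = lower > 0 or upper < 0
    print(f"r = {r:.2f}  CI [{lower:.3f}, {upper:.3f}]  "
          f"r^2 = {r * r:.2%}  excludes zero: {excludes_zero}")
```

Under these assumptions every interval sits clear of zero, yet the smallest correlations account for well under 1% of the variance, which is the point of looking at the estimate and its interval instead of the significance verdict.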
In this post, I note that an estimated 1% of papers in a set of HCI research papers report CIs but don’t seem to have signs of NHST or dichotomous decision making, but I don’t have any citations.
So, between us (ha, almost all Bob), Bob and I can paint a somewhat encouraging picture of progress in adoption of the new statistics, while not being able to pinpoint many articles that don’t include p values at all. As I say, let us know of any you come across. Thanks!