Yay! I salute the editors and everyone else who toiled for more than a year to create this wonderful collection of TAS articles. Yes, let’s move to a “post p<.05 world” as quickly as we can.
Much to applaud
Numerous good practices are identified in multiple articles. For example:
- Recognise that there’s much more to statistical analysis and interpretation of results than any mere inference calculation.
- Promote Open Science practices across many disciplines.
- Move beyond dichotomous decision making, whatever ‘bright line’ criterion it’s based on.
- Examine the assumptions of any statistical model, and be aware of limitations.
- Work to change journal policies, incentive structures for researchers, and statistical education.
And much, much more. I won’t belabour all this valuable advice.
The editorial is a good summary
I said that in this blog post. The slogan is ATOM: “Accept uncertainty, and be Thoughtful, Open, and Modest.” Scan the second part of the editorial for a brief dot point summary of each of the 43 articles.
Estimation to the fore
I still think our (Bob’s) article sets out the most-likely-achievable and practical way forward, based on the new statistics (estimation and meta-analysis) and Open Science. Anderson provides another good discussion of estimation, with a succinct explanation of confidence intervals (CIs) and their interpretation.
What’s (largely) missing: Bayes and bootstrapping
The core issue, imho, is moving beyond dichotomous decision making to estimation. Bob and I, and many others, advocate CI approaches, but Bayesian estimation and bootstrapping are also valuable techniques, likely to become more widely used. It’s a great shame these are not strongly represented.
There are articles that advocate a role for Bayesian ideas, but I can’t see any article that focuses on explaining and advocating the Bayesian new statistics, based on credible intervals. The closest is probably Ruberg et al., but their discussion is complicated and technical, and focussed specifically on decision making for drug approval.
I suspect Bayesian estimation is likely to prove an effective and widely-applicable way to move beyond NHST. In my view, the main limitation at the moment is the lack of good materials and tools, especially for introducing the techniques to beginners. Advocacy and a beginners’ guide would have been a valuable addition to the TAS collection.
Bootstrapping to generate interval estimates can avoid some assumptions, and thus increase robustness and expand the scope of estimation. An article focussing on explaining and advocating bootstrapping for estimation would have been another valuable addition.
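To give a flavour of how little machinery is needed, here is a minimal sketch of a percentile bootstrap interval for a mean, using only the Python standard library. The function name `bootstrap_ci`, its parameters, and the sample values are all made up for illustration; the percentile method is just one of several bootstrap intervals, and not necessarily the one any TAS article would recommend.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(data) (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement, recompute the statistic each time,
    # and sort the resulting estimates.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - level
    # Pick the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical sample, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.8, 12.5, 9.4, 11.0]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% bootstrap interval [{low:.2f}, {high:.2f}]")
```

No normality assumption is made about the data here, which is the point; the price is that with a sample this small the interval can be noticeably too narrow.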
The big delusion: Neo-p approaches
I and many others have long argued that we should simply not use NHST or p values at all. Or should use them only in rare situations where they are necessary—if these ever occur. For me, the biggest disappointment with the TAS collection is that a considerable number of articles present some version of the following argument: “Yes, there are problems with p values as they have been used, but what we should do is:
- Use .005 rather than .05 as the criterion for statistical significance, or
- teach about them better, or
- think about p values in the following different way, or
- replace them with this modified version of p, or
- supplement them in the following way, or
There seems to be an assumption that p values should—or at least will—be retained in some way. Why? I suspect that none of the proposed neo-p approaches is likely to become very widely used. However, they blunt the core message that it’s perfectly possible to move on from any form of dichotomous decision making, and simply not use NHST or p values at all. To this extent they are an unfortunate distraction.
p as a decaf CI
One example of neo-p as a needless distraction is the contribution of Betensky. She argues correctly and cogently that (1) merely changing a p threshold, for example from .05 to .005, is a poor strategy, and (2) interpretation of any p value needs to consider the context, in particular N and the estimated effect size. Knowing all that, she correctly explains, permits calculation of the CI, which provides a sound basis for interpretation. Therefore, she concludes, a p value, when considered in context in this way, does provide information about the strength of evidence. That’s true, but why not simply calculate and interpret the CI? Once we have the CI, a p value adds nothing, and is likely to mislead by encouraging dichotomisation.
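To see why the CI carries everything the p value does, here is a rough sketch of going from a point estimate and its two-sided p value back to the CI. It assumes a normal (z) approximation and a null value of zero, which is my simplification rather than Betensky's own calculation; `ci_from_p` is a hypothetical helper name.

```python
from statistics import NormalDist

def ci_from_p(estimate, p_two_sided, level=0.95):
    """Recover an approximate CI from an estimate and its two-sided p value.

    Assumes a z test against a null value of 0, a nonzero estimate,
    and 0 < p < 1 (illustrative simplifications).
    """
    norm = NormalDist()
    # The observed |z| implied by the p value.
    z_obs = norm.inv_cdf(1 - p_two_sided / 2)
    # Since z = estimate / SE, the standard error follows directly.
    se = abs(estimate) / z_obs
    # Critical value for the requested confidence level (about 1.96 for 95%).
    z_crit = norm.inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * se, estimate + z_crit * se

# Hypothetical result: an effect of 2.0 units with p = .01.
low, high = ci_from_p(2.0, 0.01)
print(f"95% CI [{low:.2f}, {high:.2f}]")
```

The same estimate with p = .05 gives an interval whose lower limit sits at zero, which is all that "just significant" ever meant. The interval shows the plausible range of effect sizes; the p value only reports where zero falls relative to it.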
I’ll mention just one further article that caught my attention. Billheimer contends that “observables are fundamental, and that the goal of statistical modeling should be to predict future observations, given the current data and other relevant information” (abstract, p.291). Rather than estimating a population parameter, we should calculate from the data a prediction interval for a data point, or sample mean, likely to be given by a replication. This strategy keeps the focus on observables and replicability, and facilitates comparisons of competing theories, in terms of the predictions they make.
This strikes me as an interesting approach, although Billheimer gives a fairly technical analysis to support his argument. A simpler approach to using predictions would be to calculate the 95% CI, then interpret this as being, in many situations, on average, approximately an 83% prediction interval. That’s one of the several ways to think about a CI that we explain in ITNS.
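Where does 83% come from? Here is a small sketch under simplifying assumptions: a normally distributed population with known SD, and a replication with the same N as the original study. The difference between two independent sample means then has SD of √2 × SE, so the chance that the replication mean lands inside the original 95% CI can be computed directly and checked by simulation. The function name and settings are mine, for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

norm = NormalDist()
z = norm.inv_cdf(0.975)  # about 1.96, the 95% critical value

# Analytic capture rate: P(|M2 - M1| <= z * SE) where SD(M2 - M1) = sqrt(2) * SE.
analytic = 2 * norm.cdf(z / sqrt(2)) - 1

def simulate_capture(n=20, trials=20000, seed=1):
    """Proportion of replication means falling inside the original 95% CI."""
    rng = random.Random(seed)
    se = 1 / sqrt(n)  # population SD assumed known and equal to 1
    hits = 0
    for _ in range(trials):
        m1 = rng.gauss(0, se)  # original sample mean
        m2 = rng.gauss(0, se)  # replication sample mean
        if abs(m2 - m1) <= z * se:
            hits += 1
    return hits / trials

print(f"analytic: {analytic:.3f}, simulated: {simulate_capture():.3f}")
```

Under these assumptions the analytic value is about .834. With the SD estimated from the data, and especially with small N, the capture rate will differ somewhat, hence the "on average, approximately".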
I haven’t read every article in detail. I could easily be mistaken, or have missed things. Please let me know.
I suggest (maybe slightly tongue-in-cheek):
- Read the editorial, and skim the 43 brief summaries.
- Read the comment and editorial in Nature.
- Read our article and use it as a blueprint for future practice!