Joining the fractious debate over how to do science best
At the end of the month (March 2019) the American Statistical Association will publish a special issue on statistical inference “after p values”. The goal of the issue is to focus on the statistical “dos” rather than statistical “don’ts”. Across these articles there are some common themes, but also some pretty sharp disagreements about how best to proceed. Moreover, there is some very strong disagreement about the whole notion of bashing p values and the wisdom of the ASA putting together this special issue (see here, for example).
Fractious argument is the norm in the world of statistical inference, hence the old joke that the plural of “statistician” is a “quarrel”. And why not? Debates about statistical inference get to the heart of epistemology and the philosophy of science–they represent the ongoing normative struggle to articulate how to do science best. Sharp disagreement over the nature of science is the norm–it has always been part of the scientific enterprise and it always will be. It is this intense conversation that has helped define and defend the boundaries of science.
Geoff has long been involved in debates over statistical inference and how to do science best, but this is new to me (Bob). I’m proud of the contribution we submitted to the ASA–I think it’s the best piece I’ve ever written. But I have to say that I go into the debate over inference (and science in general) with some trepidation. First, it is intrinsically gutsy to think you have something to say about how to do science best. Second, I’m the smallest of small-frys in the world of neuroscience–so it’s not like I have notable success at doing science to point to as a support for my claims. Finally, this ongoing debate has a long history and is populated by giants I look up to, most of whom (unlike me) have specialized in studying these topics. In my case, I’ve been learning on the go for the past ten years or so, starting from a foundation that involved plenty of graduate-level stats, but which didn’t even equip me to properly understand the difference between Bayesian and frequentist approaches to statistics.
As I wade into this fraught debate, I thought it might help me to reflect a bit on my own meta-epistemology–to articulate some basic premises that I hold to in terms of thinking about how to fruitfully engage in debate over inference and the philosophy of science. These premises are not only my operating rules, but also my philosophical courage–they explain why I think a noob like me can and should be part of the debate, and why I encourage more of my colleagues in the neurosciences and psychological sciences to tune in and jump in.
There are no knock-out punches in philosophy. This comes from one of my amazing philosophy mentors, Gene Cline. It has taken me a long time to both understand and embrace what he meant. As a young undergrad philosophy major I was eager to demolish–to embarrass Descartes’ naive dualism, to rain hell on Chalmers’ supposedly hard problems of consciousness, and to expose the circular bloviation of Kant’s claims about the categorical imperative. Gene (gradually) helped me understand, though, that if you can’t see any sense in someone’s philosophical position then you’re probably not engaging thoughtfully with their ideas, concerns, or premises (cf. Eco’s Island of the Day Before). It’s easy to dismiss straw-person or exaggerated versions of someone’s position, but if you interpret generously and take seriously their best arguments, you’ll find that no deep philosophical debate is easily settled. I initially found this infuriating, but I’ve come to embrace it. So I now look with healthy skepticism at those who offer knock-out punches (e.g., Morey, Hoekstra, Rouder, Lee, & Wagenmakers, 2015). I hope, in discussing my ideas with others, to a) take their claims and concerns seriously, taking on the best possible argument for their position, and b) not offer my criticisms as a sure and damning refutation… these only seem to exist when we’re not really listening to each other.
Inference works great until it doesn’t. As Hume pointed out long ago, there is no logical glue holding together the inference engine. Inference assumes that the past will be a good guide to the future, but there is no external basis for this premise, nor could there be (A rare knockout punch in philosophy? Well, even this is still debated). Even if we don’t mind the circularity of induction, we still have to respect the fact that past is not always prelude: inference works great, until it doesn’t (cf. Mark Twain’s amazing discussion in Life on the Mississippi). So whatever system of inference we want to support we should be clear-eyed that it will be imperfect and subject to error, and that when/how it breaks down will not always be predictable. This is really important in terms of how we evaluate different approaches to statistical inference–none will be perfect under all circumstances, so evaluations must proceed in terms of strengths/weaknesses and boundary conditions. The fact that an approach works poorly in one circumstance is not always a reason to condemn it. We can thoughtfully make use of tools that are dangerous under some circumstances.
We don’t all want the same things. Science is diverse and we’re not all playing the game in exactly the same way or for the same ends. I see this every year on the floor of the Society for Neuroscience conference, where over 30,000 neuroscientists meet to discuss their latest research. The scope of the enterprise is hard to imagine, and the diversity in terms of what people are trying to do is staggering. That’s ok. We can still have boundaries between science and pseudoscience without having complete homogeneity of statistical, inferential, and scientific approaches. So beware of people telling you what you, as a scientist, want to know. Beware of someone condemning all use of a statistical approach because it doesn’t tell them what they want to know. That’s my take on a good blog post by Daniel Lakens.
Nullius in verba* Ok – so we have to tread cautiously. But that does not devolve us into sophomoric inferential relativism (everyone’s right in some way; trophies for all!). We can still make distinctions and recognize differences. How? Well, to the extent that there is any “ground truth” in science it is the ability to establish procedures for reliably observing an effect. We could be wrong about what the effect means. But we’re not doing science if we can’t produce procedures that others can use to verify our observations. This is embodied in the founding of the Royal Society, which selected the motto Nullius in verba, which means “take no one’s word for it” or “see for yourself” (hat tip to a fantastic presentation by Cristobal Young on this). We can evaluate scientific fields for their ability to be generative this way–to establish effects that can be reliably observed and then dissected (not so fast, Psi research). We can also evaluate systems of inference in this way–for their ability (predicted or actual) to help scientists develop procedures to reliably observe effects. By this yardstick some methods of inference will be demonstrably bad (conducting noisy studies and then publishing the statistically significant results as fact while discarding the rest–bad!). But we should expect there to be multiple reasonable approaches to inference, as well as constant space for potential improvement (though usually with other tradeoffs). Oh yeah–this is a very slippery yardstick. It is not easy to discern or predict the fruitfulness of an inferential approach, and there can be strong disagreement about what counts as reliably establishing an effect.
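To make the “demonstrably bad” case concrete, here is a minimal simulation sketch of what happens when noisy studies are filtered by statistical significance. Everything in it is an illustrative assumption rather than anything from real data: the function name simulate_studies, a true effect of 0.2 standard deviations, 20 participants per group, a known standard deviation of 1 (so a simple z-test can stand in for a t-test), and a two-sided cutoff of 1.96.

```python
import random
import statistics


def simulate_studies(true_effect=0.2, n_per_group=20, n_studies=5000, seed=1):
    """Simulate two-group studies and apply a 'publish only if significant' filter.

    All parameter values are hypothetical and chosen only for illustration.
    Returns (all_estimates, published_estimates).
    """
    rng = random.Random(seed)
    # Standard error of a difference in means when each group has sd = 1 (assumed known).
    se = (2 / n_per_group) ** 0.5
    all_estimates = []
    published = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        all_estimates.append(diff)
        # Two-sided z-test at the conventional 0.05 level.
        if abs(diff / se) > 1.96:
            published.append(diff)
    return all_estimates, published


all_estimates, published = simulate_studies()
print("true effect:                 0.2")
print("mean estimate, all studies: ", round(statistics.fmean(all_estimates), 3))
print("mean estimate, 'published': ", round(statistics.fmean(published), 3))
print("share of studies 'published':", round(len(published) / len(all_estimates), 3))
```

Under these assumptions the average across all studies should sit close to the true effect, while the average of the “published” subset should be substantially larger, because only estimates that happened to land far from zero clear the significance bar. Someone trying to see the effect for themselves would be chasing an inflated number.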
This emphasis on replicability as essential to science cuts a tiny bit against my above point that not all scientists want the same thing. Moreover, in the negative reaction to the replication crisis, I’ve seen some commentaries where there seems to be little concern or regard for the standard of establishing verifiable effects. This, to my mind, stretches scientific pluralism past the breaking point: if you’re not bothered by a lack of replicability of your research, you’re not interested in science.
Authority will only get you so far. The debate over inference has a long history. It’s important not to ignore that. But it is equally important not to use historical knowledge as a cudgel; appeals to authority are not a substitute for good argument. Maybe it is my outside perception, but I feel like quotes from Fisher or Jeffreys or Meehl are sometimes weaponized to end discussion rather than contribute to it.
Ok – so those are my current ideas for how to approach arguments about science and statistical inference: a) embrace real statistical pluralism without letting go of norms and evaluation; b) ground evaluation (as much as possible) in what we think can best foster generative (reproducible) research; c) listen and take the best of what others have to offer; and d) try not to lean too heavily on the Fisher quotes.
At the moment, I’ve landed on estimation as the best approach for the statistical issues I face. I’m confident enough in that choice that I feel good advocating for the use of estimation for other scientists with similar goals. In advocating for estimation, I’m not going to claim a knock-out punch against p values or other approaches, or that the goals estimation can help with are the only legitimate goals to have. Moreover, in advocating for estimation, my goal is not hegemony. Hegemony of misusing p values is where we are currently at, and we don’t need to replace one imperial rule with another. I am helping a journal re-orient its author guidelines towards estimation (with or in place of p values)—but my goal is a diverse landscape of publication options in neuroscience, one where there are outlets for different but fruitful approaches to inference.
Ok – those are my thoughts for now on how to fruitfully debate about statistical inference. I’m sure I have a lot to learn. I’m looking forward to the special issue that will soon be out from the ASA and the debate that will surely ensue.
*Thanks to Boris Barbour for pointing out I misquoted the Royal Society Motto in the original post.
- Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2015). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 103–123. doi:10.3758/s13423-015-0947-8