APS Publicises the TAS Articles

The APS has just released this Research Spotlight item:

Suddenly it’s more than 5 years since APS took some important and very early steps to promote Open Science and new-statistics practices. Back in January 2014, then Editor-in-Chief of Psychological Science, Eric Eich, explained in an editorial a whole set of new policies: offering of OS badges, requirements for more complete reporting of methods and data, encouragement of the new statistics, discouragement of NHST, and more. Steve Lindsay, who followed Eric, has introduced further policies to encourage replication and improve the trustworthiness of research published in Psychological Science. Several of the new policies have spread to other APS journals.

The Research Spotlight item also notes some of the other steps APS has taken in support of OS and statistical improvements, including publication of this tutorial article, and this set of six videos.

APS Conventions have hosted many symposia and workshops on Open Science, on the new statistics, and on other modern statistical practices. You can rush to sign up (click at left then scroll down) for Tamarah and Bob’s great workshop at this year’s Convention in D.C.:

All strength to the APS in its long-standing and continuing support for Open Science and better statistical methods.


Ditching Statistical Significance: The Most Talked-About Paper Ever?

Well, that might be a stretch, but in relation to the Nature Comment that Bob and I signed in support of, Altmetric tweeted:

John Ioannidis published this criticism of the Comment, with the subtitle Do Not Abandon Significance. Much of what he writes is sensible, and in agreement with the Comment, but in my opinion he doesn’t make a concerted or convincing case for not abandoning statistical significance. He hardly seems to attempt to assemble such a case, despite that subtitle.

The authors of the Comment, joined by Andrew Gelman, replied. Their reply, titled Abandoning statistical significance is both sensible and practical, strikes me as succinct, clear, and convincing.

I’m still working through the 43 TAS articles that prompted the Nature Comment. I’ll report.

In the meantime I continue to be so happy that statistical significance may at last be receiving its comeuppance. The battalion of scholars who have published swingeing critiques of NHST since around 1950 may at last be vindicated!

Now we just need all this great progress to filter through to instructors of the intro stats course, so that they can feel emboldened to adopt ITNS. Then we’ll know that things have really changed for the better!


Moving Beyond p < .05: The Latest

A couple of days ago, the three authors of the Nature Comment accompanying the special issue of TAS on moving beyond p < .05 sent the update below, which includes lots of links.

We are writing with a brief update on the Nature comment “Retire statistical significance” that you signed. The comment has so far been subject to spirited discussion in both traditional and social media (see below).

We believe keeping this discussion going by approaching colleagues and sharing links on social media is the only way a reform in statistical practice will come about. Your continued support in this is most welcome and appreciated! Together, our chance to make this happen has perhaps never been better.

With kind regards and many thanks,
Valentin, Sander, Blake

Retire statistical significance
https://www.nature.com/articles/d41586-019-00857-9
The American Statistician – Statistical inference in the 21st century: a world beyond p < 0.05
https://www.tandfonline.com/toc/utas20/73/sup1

Media mentions:
Retraction Watch
https://retractionwatch.com/2019/03/21/time-to-say-goodbye-to-statistically-significant-and-embrace-uncertainty-say-statisticians

Andrew Gelman’s blog (discussion with 400 comments)
https://statmodeling.stat.columbia.edu/2019/03/20/retire-statistical-significance-the-discussion


The Guardian
https://www.theguardian.com/commentisfree/2019/mar/24/the-guardian-view-on-statistics-in-sciences-gaming-the-unknown




Browse Gelman’s blog and its comments to get an idea of the range of views held by, especially, statisticians. There may be a danger that the imperative to move forward from p values gets lost in the noise. We really do need to keep focus! (Yay for the new statistics! Estimation, CIs, meta-analysis to the fore! Let p values wither and die a natural death!)


Moving to a World Beyond “p < 0.05”

The 43 articles in The American Statistician discussing what researchers should do in a “post p<.05” world are now online. See here for a list of them all, with links to each article.

The collection starts with an editorial:

Go here to get the full editorial as a pdf.

Bob and I commented on earlier drafts of the editorial, as did some other authors of articles in the collection. I think the published version is great, even if, as usual, I’d like it to have gone a bit further towards virtually always ending the use of p values. But there are very welcome strong recommendations for the use of estimation, as well as many other wise words.

We’re pleased that the authors of the editorial elected to refer to our article (more on that below) in 5 places, and to quote our words in 4 of those (see pp. 3 (twice), 9, and 10 in the pdf). A very strong theme of the editorial is that researchers should always ‘embrace uncertainty’, which is also a major theme of our article, among others.

Go here for the pdf of our article.

Yes, my name is listed as a co-author, but Bob wrote the article, and an excellent job he did of it too! If all researchers followed our (his) advice, the world would be a much better place–says he modestly. But have a squiz and see what you think.

The editorial includes (pp. 10-18 in the pdf) a brief dot point summary of each article. The summary we contributed of ours is:

1. Ask quantitative questions and give quantitative answers.

2. Countenance uncertainty in all statistical conclusions, seeking ways to quantify, visualize, and interpret the potential for error.

3. Seek replication, and use quantitative methods to synthesize across data sets as a matter of course.

4. Use Open Science practices to enhance the trustworthiness of research results.

5. Avoid, wherever possible, any use of p values or NHST.

Here’s to the onward march of Open Science and better statistical practice!


Ditching Statistical Significance?!

Nature (!) has just published an editorial discussing and advocating that statistical significance should be ditched. For me, that’s the stuff of dreams, but I have lived to see it happen! I’m so happy!

Here’s one para from the editorial:

The ‘call for scientists to abandon statistical significance’, by Valentin Amrhein, Sander Greenland, and Blake McShane, is the Comment that is also in this week’s Nature. Its title is Scientists rise up against statistical significance, and it is supported by 854 scientists from 52 countries. See a list of these good folks here.

The Comment has been developed over past weeks, with revisions in response to suggestions (and exhortations) from many, including Bob and me. Of course I’d like the final version to be stronger, and to call for an end to any use of p values, or their use only in rare cases when we don’t (yet) have good alternatives. But it is pretty strong, and has much wise advice that would lead to much improved practice if widely followed.

The 854 signatures were collected in a very few days, during which the Comment itself was strictly embargoed. Bob and I were very happy to sign.

The ‘series of related articles’ mentioned above, and published by the American Statistical Association, has not yet appeared, but should do so any moment. That should also be a massive game changer. Let’s hope.

Sometimes, ‘ditching’ can be great progress!



Last month I (Bob) visited a local elementary school for a “Science Alliance” visit. This is a program in our community to bring local scientists into the classroom. I brought the Cartoon Network simulator I have been developing (Calin-Jageman, 2017, 2018). This simulator is simple enough that kids can use it, but complex enough to generate some really cool network behaviors (reflex chains, oscillations, etc.). The simulation can be hooked up to a cheap USB robot, so kids can design the ‘brains’ of the robot, giving it the characteristics they want (fearful, to run away from being touched; aggressive, to track light; etc.).

The kids *loved* the activity–the basic ideas were easy to grasp and they were quickly exploring, trying things out, and sharing results with each other. They made their Finches chirp and dance, and in the process discovered recurrent loops and the importance of inhibition.

In developing Cartoon Network, my inspiration was Logo, the programming language developed by Seymour Papert and colleagues at MIT. I was a “Logo kid”–it was basically the only thing you *could* do in the computer lab my elementary school installed when I was in second grade. Logo was *fun*–you could draw things, make animations…it was a world I wanted to explore. But Logo didn’t make it terribly easy–as you went along you would need/want key programming concepts. I clearly remember sitting in the classroom writing a program to draw my name and being frustrated at having to re-write the commands to make a B at the end of my name when I had already typed them out for the B at the beginning of my name. The teacher came by and introduced me to functions, and I remember being so happy about the idea of a “to b” function. I immediately grasped that I could write functions for every letter once and then be able to have the turtle type anything I wanted in no time at all. Pretty soon I had a “Logo typewriter” that I was soooo proud of. I could viscerally appreciate the time I had saved, as I could quickly make messages to print out that would have taken me a whole class period to code ‘by letter’.

Years later I read Mindstorms, Papert’s explanation of the philosophy behind Logo. This remains, to my mind, one of the most important books on pedagogy, teaching, and technology. Papert applied Piaget’s model of children as scientists (he had trained with Piaget). He believed that if you can make a microworld that is fun to explore, children will naturally need, discover, and understand deep concepts embedded in that world. That’s what I was experiencing back in 2nd grade–I desperately needed functions, and so the idea of them stuck with me in a way that they never would in an artificial “hello world” type of programming exercise. Having grown up a “Logo kid”, reading Mindstorms was especially exciting–I could recognize myself in the examples, and connect my childhood experiences to the deeper ideas about learning that Papert used to structure my experience.

Papert warned that microworlds must be playful and open-ended. Most importantly, a microworld should not be reduced to a ‘drill and skill’ environment where kids have to come up with *the* answer. Sadly, he saw computers and technologies being used that way–to program the kids rather than having the kids program the computers. Even sadder, almost all the “kids can code” initiatives out there have lost this open-ended sense of exploration–they are mostly a series of specific challenges, each with one right answer. They do not inspire much joy or excitement; their success is measured in the number of kids pushed through. (Yes, there are some exceptions, like Minecraft coding, etc.…but most of the kids-can-code initiatives are just terrible, nothing like what Papert had in mind.)

So, what does all this have to do with statistics? Well, the idea of a microworld still makes a lot of sense and is also applicable to statistics education. Geoff’s dance of the means has become rightly famous, I would suggest, because it is a microworld users can explore to sharpen their intuitions about sampling, p values, CIs, and the like. Richard Morey and colleagues recently ran a study in which you could sample from a null distribution to help make a judgement about a possible effect. And, in general, the use of simulations is burgeoning in helping researchers explore and better understand analyses (Dorothy Bishop has some great blog posts about this). Thinking of these examples makes me wonder, though–can we do even better? Can we produce a fun and engaging microworld for the exploration of inferential statistics, one that would help scientists of all ages gain deep insight into the concepts at play? I have a couple of ideas…but nothing very firm yet, and even less time to start working on them. But still, coming up with a Logo of inference is definitely on my list of projects to take on.
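For readers who’d like a first taste of such a microworld without any special software, here’s a minimal sketch in Python (the function and parameter names are mine, not from any of the tools mentioned above): draw sample after sample from a known population, compute a 95% CI each time, and count how often the interval captures the true mean.

```python
import random
import statistics

def ci_capture_rate(mu=50.0, sigma=10.0, n=20, reps=1000, seed=1):
    """Simulate the 'dance of the CIs': draw many samples of size n and
    count how often the 95% CI for the mean captures the true mean mu."""
    rng = random.Random(seed)
    t_crit = 2.093  # critical t for df = 19 (i.e., n = 20), two-tailed 95%
    captures = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if m - t_crit * se <= mu <= m + t_crit * se:
            captures += 1
    return captures / reps

# In the long run, about 95% of the intervals capture mu -- but any single
# interval either does or doesn't, which is the intuition the dances teach.
print(ci_capture_rate())
```

Playing with `n`, `sigma`, and `reps` is where the microworld flavor comes in: the long-run capture rate stays near .95 while individual intervals bounce around alarmingly.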

I’m going to end with 3 examples of thank-you cards I received from the 3rd grade class I visited. All the cards were amazing–they genuinely made my week. I posted these to Twitter but thought I’d archive them here as well.

This kid has some great ideas for the future of AI

“I never knew neurons were a thing at all”–the joy of discovery
“Your job seems awesome and you are the best at it”—please put this kid on my next grant review panel.
  1. Calin-Jageman, R. (2017). Cartoon Network: A tool for open-ended exploration of neural circuits. Journal of Undergraduate Neuroscience Education, 16(1), A41–A45. https://www.ncbi.nlm.nih.gov/pubmed/29371840
  2. Calin-Jageman, R. (2018). Cartoon Network Update: New Features for Exploring of Neural Circuits. Journal of Undergraduate Neuroscience Education, 16(3), A195–A196. https://www.ncbi.nlm.nih.gov/pubmed/30254530

The Multiverse! Dances, and More, From Pierre in Paris

Our Open Science superego tells us that we must preregister our data analysis plan, follow that plan exactly, then emphasise just those results as most believable. Death to cherry-picking! Yay!

The Multiverse

But one of the advantages of open data is that other folks can apply different analyses to our data, perhaps uncovering interesting things. What if we’d like to explore systematically a whole space of analysis possibilities ourselves, to give a fully rounded picture of what our research might be revealing?

The figure below shows (a) traditional cherry-picking–boo!, (b) OCD following of fine Open Science practice–hooray!, and (c) off-the-wall anything goes–hmmm.

That figure is from a recent (in fact, forthcoming) paper by Pierre in Paris and colleagues. The paper is here, and the reference is below at the end (Dragicevic et al., 2019). The abstract below outlines the story.

The Multiverse, Live

Pierre and colleagues not only discuss the multiverse idea in that paper, but here they give neat interactive tools that allow any reader of several example papers to do the exploration themselves. Hover the mouse, or click, to explore the outcome of different analyses.


I suggest that Sections 4 and 5 of the paper are especially worth reading. Section 4 discusses what are called explorable multiverse analysis reports (EMARs), with a focus on mapping out just what a rich range of possibilities there often is for alternative analyses.

Then Section 5 grapples with the (large) practical difficulties of building, reviewing, and using EMARs, with the aim of increasing insight into research results. Cherry-picking risks always need to be at the forefront of our thinking. Preregistration of certain proposed uses of an EMAR could be possible, which might somewhat reduce cherry-picking risks.

Play Multiverse on Twitter

Matthew Kay, one of the team, gave a great overview in 8 posts to Twitter. See the posts, and a bunch of GIFs in action here.



Pierre Dragicevic, Yvonne Jansen, Abhraneel Sarma, Matthew Kay, & Fanny Chevalier (2019). Increasing the Transparency of Research Papers with Explorable Multiverse Analyses. CHI 2019 – The ACM CHI Conference on Human Factors in Computing Systems, May 2019, Glasgow, United Kingdom. doi:10.1145/3290605.3300295

Joining the fractious debate over how to do science best

At the end of the month (March 2019) the American Statistical Association will publish a special issue of The American Statistician on statistical inference “after p values”. The goal of the issue is to focus on the statistical “dos” rather than statistical “don’ts”. Across these articles there are some common themes, but also some pretty sharp disagreements about how best to proceed. Moreover, there is some very strong disagreement about the whole notion of bashing p values and the wisdom of the ASA putting together this special issue (see here, for example).

Fractious argument is the norm in the world of statistical inference, hence the old joke that the plural of “statistician” is a “quarrel”. And why not? Debates about statistical inference get to the heart of epistemology and the philosophy of science–they represent the ongoing normative struggle to articulate how to do science best. Sharp disagreement over the nature of science is the norm–it has always been part of the scientific enterprise and it always will be. It is this intense conversation that has helped define and defend the boundaries of science.

Geoff has long been involved in debates over statistical inference and how to do science best, but this is new to me (Bob). I’m proud of the contribution we submitted to the ASA–I think it’s the best piece I’ve ever written. But I have to say that I go into the debate over inference (and science in general) with some trepidation. First, it is intrinsically gutsy to think you have something to say about how to do science best. Second, I’m the smallest of small fry in the world of neuroscience–so it’s not like I have notable success at doing science to point to as support for my claims. Finally, this ongoing debate has a long history and is populated by giants I look up to, most of whom (unlike me) have specialized in studying these topics. In my case, I’ve been learning on the go for the past ten years or so, starting from a foundation that involved plenty of graduate-level stats, but which didn’t even equip me to properly understand the difference between Bayesian and frequentist approaches to statistics.

As I wade into this fraught debate, I thought it might help me to reflect a bit on my own meta-epistemology–to articulate some basic premises that I hold to in terms of thinking about how to fruitfully engage in debate over inference and the philosophy of science. These premises are not only my operating rules, but also my philosophical courage–they explain why I think a noob like me can and should be part of the debate, and why I encourage more of my colleagues in the neurosciences and psychological sciences to tune in and jump in.

There are no knock-out punches in philosophy. This comes from one of my amazing philosophy mentors, Gene Cline. It has taken me a long time both to understand and to embrace what he meant. As a young undergrad philosophy major I was eager to demolish–to embarrass Descartes’ naive dualism, to rain hell on Chalmers’s supposedly hard problem of consciousness, and to expose the circular bloviation of Kant’s claims about the categorical imperative. Gene (gradually) helped me understand, though, that if you can’t see any sense in someone’s philosophical position then you’re probably not engaging thoughtfully with their ideas, concerns, or premises (cf. Eco’s Island of the Day Before). It’s easy to dismiss straw-person or exaggerated versions of someone’s position, but if you interpret generously and take seriously their best arguments, you’ll find that no deep philosophical debate is easily settled. I initially found this infuriating, but I’ve come to embrace it. So I now look with healthy skepticism at those who offer knock-out punches (e.g., Morey, Hoekstra, Rouder, Lee, & Wagenmakers, 2015). I hope, in discussing my ideas with others, to (a) take their claims and concerns seriously, taking on the best possible argument for their position, and (b) not offer my criticisms as a sure and damning refutation…these only seem to exist when we’re not really listening to each other.

Inference works great until it doesn’t. As Hume pointed out long ago, there is no logical glue holding the inference engine together. Inference assumes that the past will be a good guide to the future, but there is no external basis for this premise, nor could there be (a rare knockout punch in philosophy? Well, even this is still debated). Even if we don’t mind the circularity of induction, we still have to respect the fact that past is not always prelude: inference works great, until it doesn’t (cf. Mark Twain’s amazing discussion in Life on the Mississippi). So whatever system of inference we want to support, we should be clear-eyed that it will be imperfect and subject to error, and that when/how it breaks down will not always be predictable. This is really important in terms of how we evaluate different approaches to statistical inference–none will be perfect under all circumstances, so evaluations must proceed in terms of strengths/weaknesses and boundary conditions. The fact that an approach works poorly in one circumstance is not always a reason to condemn it. We can thoughtfully make use of tools that under some circumstances are dangerous.

We don’t all want the same things. Science is diverse and we’re not all playing the game in exactly the same way or for the same ends. I see this every year on the floor of the Society for Neuroscience conference, where over 30,000 neuroscientists meet to discuss their latest research. The scope of the enterprise is hard to imagine, and the diversity in terms of what people are trying to do is staggering. That’s ok. We can still have boundaries between science and pseudoscience without having complete homogeneity of statistical, inferential, and scientific approaches. So beware of people telling you what you, as a scientist, want to know. Beware of someone condemning all use of a statistical approach because it doesn’t tell them what they want to know. That’s my take on a good blog post by Daniel Lakens.

Nullius in verba.* Ok – so we have to tread cautiously. But that need not devolve into sophomoric inferential relativism (everyone’s right in some way; trophies for all!). We can still make distinctions and recognize differences. How? Well, to the extent that there is any “ground truth” in science it is the ability to establish procedures for reliably observing an effect. We could be wrong about what the effect means. But we’re not doing science if we can’t produce procedures that others can use to verify our observations. This is embodied in the founding of the Royal Society, which selected the motto Nullius in verba, meaning “take no one’s word for it” or “see for yourself” (hat tip to a fantastic presentation by Cristobal Young on this). We can evaluate scientific fields for their ability to be generative this way–to establish effects that can be reliably observed and then dissected (not so fast, Psi research). We can also evaluate systems of inference in this way–for their ability (predicted or actual) to help scientists develop procedures to reliably observe effects. By this yardstick some methods of inference will be demonstrably bad (conducting noisy studies and then publishing the statistically significant results as fact while discarding the rest—bad!). But we should expect there to be multiple reasonable approaches to inference, as well as constant space for potential improvement (though usually with other tradeoffs). Oh yeah–this is a very slippery yardstick. It is not easy to discern or predict the fruitfulness of an inferential approach, and there can be strong disagreement about what counts as reliably establishing an effect.

This emphasis on replicability as essential to science cuts a tiny bit against my above point that not all scientists want the same thing. Moreover, in the negative reaction to the replication crisis, I’ve seen some commentaries where there seems to be little concern or regard for the standard of establishing verifiable effects. This, to my mind, stretches scientific pluralism past the breaking point: if you’re not bothered by a lack of replicability of your research, you’re not interested in science.

Authority will only get you so far. The debate over inference has a long history. It’s important not to ignore that. But it is equally important not to use historical knowledge as a cudgel; appeals to authority are not a substitute for good argument. Maybe it is my outsider’s perception, but I feel like quotes from Fisher or Jeffreys or Meehl are sometimes weaponized to end discussion rather than contribute to it.

Ok – so those are my current ideas for how to approach arguments about science and statistical inference: (a) embrace real statistical pluralism without letting go of norms and evaluation; (b) ground evaluation (as much as possible) in what we think can best foster generative (reproducible) research; (c) listen and take the best of what others have to offer; and (d) try not to lean too heavily on the Fisher quotes.

At the moment, I’ve landed on estimation as the best approach for the statistical issues I face. I’m confident enough in that choice that I feel good advocating for the use of estimation for other scientists with similar goals. In advocating for estimation, I’m not going to claim a knock-out punch against p values or other approaches, or that the goals estimation can help with are the only legitimate goals to have. Moreover, in advocating for estimation, my goal is not hegemony. The misuse of p values holds hegemony now, and we don’t need to replace one imperial rule with another. I am helping a journal re-orient its author guidelines towards estimation (with or in place of p values)—but my goal is a diverse landscape of publication options in neuroscience, one where there are outlets for different but fruitful approaches to inference.

Ok – those are my thoughts for now on how to fruitfully debate statistical inference. I’m sure I have a lot to learn. I’m looking forward to the special issue that will soon be out from the ASA and the debate that will surely ensue.

*Thanks to Boris Barbour for pointing out I misquoted the Royal Society Motto in the original post.

  1. Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2015). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 103–123. doi:10.3758/s13423-015-0947-8

Journal Articles Without p Values

Once we have a CI, a p value adds nothing, and is likely to mislead and to tempt the writer or readers to fall back into mere dichotomous decision making (boo!). So let’s simply use estimation and never report p values, right? Of course, that’s what we advocate in UTNS and ITNS–almost always p values are simply not needed and may be dangerous.
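One way to see why the p value adds nothing: under a normal approximation, the 95% CI and the two-sided p value are computed from exactly the same two numbers (the estimate and its standard error), and the interval excluding the null is equivalent to p < .05. Here is a quick sketch in Python (my illustration, not code from UTNS or ITNS):

```python
from statistics import NormalDist

def ci_and_p(mean, se, null=0.0, level=0.95):
    """Return the 95% CI and two-sided p value computed from the same
    estimate and standard error (large-sample z approximation)."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for a 95% CI
    ci = (mean - z * se, mean + z * se)
    p = 2 * (1 - NormalDist().cdf(abs(mean - null) / se))
    return ci, p

(lo, hi), p = ci_and_p(mean=0.5, se=0.2)
# The interval excludes 0 exactly when p < .05 -- so once the CI is
# reported, the p value tells the reader nothing new.
assert (lo > 0 or hi < 0) == (p < 0.05)
```

The CI, of course, carries strictly more information: it shows the precision of the estimate and the whole range of plausible values, not just which side of an arbitrary cutoff the data fell on.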

Bob and I are occasionally asked for examples of published articles that use CIs, but don’t publish any p values. Good question. I know there are such articles out there, but–silly me–I haven’t been keeping a list.

I’d love to have a list–please let me know of any that you notice (or, even better, that you have published). (Make a comment below, or email g.cumming@latrobe.edu.au )

Here are a few notes from recent emails on the topic:

From Bob

First, if we look at the big picture, it is clear that estimation is on the rise. CIs are now reported in the majority of papers both at Psych Science (which enjoins their use) *and* at JEP: General (which encourages them) (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0175583#pone-0175583-g001). That suggests some good momentum. However, you’ll see that there is no corresponding decline in NHST, so that means folks are primarily reporting CIs alongside p values. It’s unclear, then, if this change in reporting is producing the desired change in thinking (Fidler et al., 2004, found that in medical journals CIs are reported but essentially ignored).

As for specific examples…

- Adam Claridge-Chang’s lab developed cool software for making difference plots, and their work now reports both p values and CIs but focuses primarily on the CIs. The work is in neurobiology…so probably not the best example for psych graduate students.
  - Eriksson, A., Anand, P., Gorson, J., Grijuc, C., Hadelia, E., Stewart, J. C., Holford, M., and Claridge-Chang, A. (2018), “Using Drosophila behavioral assays to characterize terebrid venom-peptide bioactivity,” Scientific Reports, 8, 1–13. https://doi.org/10.1038/s41598-018-33215-2

- In my lab, we first started reporting CIs along with p values but focusing discussion on the CIs. For our latest paper we finally omitted the p values altogether, and yet the sky didn’t fall (Perez et al., 2018). This is behavioral neuroscience, so a bit closer to psych, but still probably not really a great example for psych students.
  - Perez, L., Patel, U., Rivota, M., Calin-Jageman, I. E., and Calin-Jageman, R. J. (2018), “Savings memory is accompanied by transcriptional changes that persist beyond the decay of recall,” Learning & Memory, 25, 1–5. https://doi.org/10.1101/lm.046250.117

- I don’t read much in Psych Science, but here’s one paper that caught my eye that seems sensitive primarily to effect-size issues:
  - Hirsh-Pasek, K., Adamson, L. B., Bakeman, R., Owen, M. T., Golinkoff, R. M., Pace, A., Yust, P. K. S., and Suma, K. (2015), “The Contribution of Early Communication Quality to Low-Income Children’s Language Success,” Psychological Science, 26, 1071–1083. https://doi.org/10.1177/0956797615581493

- Overall, where I see the most progress is with the continued rise of meta-analysis and/or large data sets that are pushing effect-size estimates to the forefront of the discussion. For example:
  - This recent paper examining screen time and mental health in teens. It uses a huge data set, so the question is not “is it significant” but “how strong could the relationship be”. They do a cool multiverse analysis, too.
    - Orben, A., and Przybylski, A. K. (2019), “The association between adolescent well-being and digital technology use,” Nature Human Behaviour. https://doi.org/10.1038/s41562-018-0506-1
  - Or the big discussion on Twitter about whether a significant finding of ego depletion of d = .10 means anything:
    - https://twitter.com/hardsci/status/970015349499465729

More from Bob

I just came across this interesting article in Psych Science:

Nave, G., Jung, W. H., Karlsson Linnér, R., Kable, J. W., and Koellinger, P. D. (2018), “Are Bigger Brains Smarter? Evidence From a Large-Scale Preregistered Study,” Psychological Science, 095679761880847. https://doi.org/10.1177/0956797618808470.

This paper has p values alongside confidence intervals.  But the sample size is enormous (13,000 brain scans) so basically everything is significant and the real focus is on the effect sizes. 

It strikes me that this would be a great paper for a debate about interpreting effect sizes. After controlling for other factors, the researchers find a relationship between fluid intelligence and total brain volume, but it is very weak: r = .19, 95% CI [.17, .22], with just 2% added variance in the regression analysis. The researchers describe this as “solid”. I think it would be interesting to have students debate: does this mean anything, why or why not?

There are also some good measurement points to make in this paper—they compared their total brain volume measure to one extracted by the group that collected the scans and found r = .91…which seems astonishingly low for using the exact same data set. If about 20% of the variance in this measure is error, it makes me wonder if a relationship with 2% of the variance could be thought of as meaningful.

Oh, and the authors also break total brain volume down into white matter, gray matter, and CSF. They find the corrected correlations with fluid intelligence to be r = 0.13, r = 0.06, and r = 0.05, with all 3 being statistically significant. This, to me, shows how little statistical significance means. It also makes me even more worried about interpreting these weak relationships…I can’t think of any good reason why having more CSF would be associated with higher intelligence. I suspect that their corrections for confounding variables were not perfect (how could they be?) and that the r = .05 represents remaining bias in the analysis. If so, that means we should be subtracting an additional chunk of variance from the 2% estimate.
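The variance arithmetic behind these worries is just r squared. A quick back-of-envelope check (my sketch, not computations from the paper itself):

```python
def shared_variance(r):
    """Proportion of variance shared between two variables: r squared."""
    return r ** 2

# Agreement of r = .91 between the two total-brain-volume measures means
# roughly 1 - .91**2, or about 17%, of the variance is unshared --
# close to the "about 20%" error figure mentioned above.
measurement_error = 1 - shared_variance(0.91)

# The raw r = .19 with fluid intelligence corresponds to about 3.6%
# shared variance; the ~2% figure is the added variance after controls.
iq_variance = shared_variance(0.19)

print(round(measurement_error, 3), round(iq_variance, 3))  # prints: 0.172 0.036
```

Seen this way, the worry is stark: the unshared (likely error) variance in the predictor is several times larger than the variance it is claimed to explain.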

Oh yeah, and Figure 1 shows that their model doesn’t fit very well at the extremes of brain volume.

From Geoff

In this post, I note that an estimated 1% of papers in a set of HCI research papers report CIs but don’t seem to have signs of NHST or dichotomous decision making, but I don’t have any citations.

So, between us (ha, almost all Bob), Bob and I can paint a somewhat encouraging picture of progress in adoption of the new statistics, while not being able to pinpoint many articles that don’t include p values at all. As I say, let us know of any you come across. Thanks!


Teaching The New Statistics: The Action’s in D.C.

The Academy Awards are out of the way, so we can focus on what’s really important: the APS Convention, May 23-26, 2019, in Washington D.C.

For the first time in many years I won’t be there, but new-statistics action continues at the top level. After Tamarah and Bob’s great success last year, APS has invited them back. They will give an updated version of their workshop on teaching the new statistics–and, of course, Open Science.

The Convention website is here. You can register here. Note that workshops require additional registration–which is not expensive, and even less so for students. Register by 15 April for early-bird rates.

The workshops site is here, with lots of juicy goodies. But Tamarah and Bob’s will undoubtedly be the highlight!