Danny Kahneman: From p values to Nobel Prize

You meet a red-headed person who is a bit short-tempered then, later, another who is similarly touchy. You start to believe that red hair signals ‘watch out’. Really? You are leaping to a conclusion from an extremely small sample! But humans seem to have a strong tendency to draw conclusions from small samples–we tend to believe that even a tiny sample is likely to closely resemble the population it came from.

This ‘law’–actually a fallacy–was described in the classic 1971 article by Amos Tversky and Daniel Kahneman. This article presented evidence that even research psychologists trained in mathematics and statistics often severely underestimate the extent of sampling variability.

A related finding was that they tend to greatly overestimate how close the result of a replication experiment is likely to be to the original. Sound familiar? Decades later, the dance of the p values and significance roulette convey versions of the same message: replicate, and the results may well be surprisingly different, alas–especially if p values are involved.
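To get a feel for just how large that sampling variability is, here is a minimal simulation sketch (my own illustration, not anything from the 1971 paper or from ITNS): run exact replications of a modest two-group experiment and watch the p values dance.

```python
# Minimal sketch (my illustration, not from the paper): 20 exact replications
# of a two-group experiment with a true effect of Cohen's d = 0.5 and
# n = 32 per group (roughly 50% power).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, n, reps = 0.5, 32, 20

p_values = []
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)
    p_values.append(stats.ttest_ind(treatment, control).pvalue)

for p in sorted(p_values):
    print(f"p = {p:.3f}")
# Identical experiments, yet the p values range from clearly 'significant'
# to nowhere near .05 -- the dance of the p values.
```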

I mention this because I’ve just finished reading The Undoing Project: A Friendship That Changed Our Minds, by Michael Lewis. You may know some of Lewis’s earlier books, perhaps Moneyball, The Big Short, or Flash Boys.

In Undoing, Lewis tells the story of Amos Tversky and Danny Kahneman, two brilliant minds and two very different people. They talked and laughed for days and days and decades, dreamed up novel experiments, and identified all sorts of strange ways that people reason and make decisions. They also fought for Israel in several wars, their most important contributions coming from their psychological insights.

It’s a gripping tale, well told, and an insight into one highly successful way to do creative and collaborative science. After the law of small numbers came behavioral economics and in 2002, sadly after Tversky’s death, the Nobel Prize in economics.

In ITNS (pp 359-360) we discuss the example of the flying instructor who gave praise for a smooth landing by a trainee pilot, and harsh words for a rough landing. Regression to the mean can explain what the instructor observed: after praise, most often a less good landing, but after harsh words quite likely a better landing. Lewis explained that this example came straight from Kahneman’s observation of pilot training in the Israeli air force in the early days. Kahneman’s insight that regression to the mean could account for the puzzling observations led to changes in the way pilots were trained.
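A quick way to see the logic is a toy simulation (a sketch of my own, not Kahneman’s data): model each landing as a pilot’s stable skill plus random luck, give ‘praise’ after unusually smooth landings and ‘harsh words’ after unusually rough ones, then look at the next landing.

```python
# Toy model (my sketch, not Kahneman's data): each landing = stable skill + luck.
import numpy as np

rng = np.random.default_rng(2)
n_pilots = 100_000

skill = rng.normal(0, 1, n_pilots)                # stable ability, unchanged by feedback
landing1 = skill + rng.normal(0, 1, n_pilots)     # first landing
landing2 = skill + rng.normal(0, 1, n_pilots)     # next landing, same skill

praised = landing1 > 1.5       # unusually smooth first landing -> praise
criticized = landing1 < -1.5   # unusually rough first landing -> harsh words

print(f"Mean change after praise:      {np.mean(landing2[praised] - landing1[praised]):+.2f}")
print(f"Mean change after harsh words: {np.mean(landing2[criticized] - landing1[criticized]):+.2f}")
# Landings typically get worse after praise and better after harsh words,
# even though feedback has no effect at all: extreme performances simply
# regress toward each pilot's own mean.
```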

Lewis’s book could use some editing, and you can easily skip Chapter 1, but even so it’s a great read. You can get a flavor from this article in The New Yorker by Cass Sunstein and Richard Thaler–which is definitely worth a look.

Enjoy…
Geoff

P.S. Personal note. Kahneman is married to Anne Treisman, herself a distinguished psychologist, who was my doctoral advisor in Experimental Psychology at Oxford.

Lewis, M. (2016). The undoing project: A friendship that changed our minds. W. W. Norton. ISBN 978-0-393-25459-4
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105-110.

Posted in ITNS, NHST, Replication

Funnel plots, publication bias, and the power of blogs

On March 21, 2017, Uri Simonsohn posted an interesting new blog entry on funnel plots, arguing on the basis of some simulations that they are not as useful for detecting publication bias as might be thought (http://datacolada.org/58). It’s an interesting post, and worth reading. As far as I could tell, though, the problems only apply to meta-analyses that incorporate a range of studies in which researchers expect very different effect sizes… which to me seems kind of against the idea of doing the meta-analysis to begin with. So I’m not sure the circumstances in which funnel plots are misleading are very common… maybe? I’m working on a meta-analysis right now, but it compiles a large set of very similar studies, and I don’t see any possibility that researchers are customizing their sample sizes based on effect-size expectations in the way that produces trouble in the Simonsohn blog post. But that’s just my project.
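To make the concern concrete, here is a rough toy simulation of the mechanism as I understand it (my own sketch, not Simonsohn’s code): if researchers who expect larger effects also run smaller samples, a funnel plot can look asymmetric even when every study is published.

```python
# Toy mechanism (my own sketch, not Simonsohn's simulation): true effects vary
# widely across studies, and researchers expecting bigger effects run smaller
# samples. Every study is 'published', yet effect size and standard error
# end up correlated -- the classic funnel-asymmetry signature.
import numpy as np

rng = np.random.default_rng(3)
n_studies = 200

true_d = rng.uniform(0.1, 0.8, n_studies)          # very different true effects
n_per_group = np.round(50 / true_d).astype(int)    # bigger expected effect -> smaller n

est_d, se_d = [], []
for d, n in zip(true_d, n_per_group):
    g1 = rng.normal(0, 1, n)                       # control group
    g2 = rng.normal(d, 1, n)                       # treatment group
    sd_pooled = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
    d_hat = (g2.mean() - g1.mean()) / sd_pooled
    est_d.append(d_hat)
    se_d.append(np.sqrt(2 / n + d_hat ** 2 / (4 * n)))   # approximate SE of d

print(f"Correlation of effect size with its SE: {np.corrcoef(est_d, se_d)[0, 1]:.2f}")
# A clearly positive correlation, with no publication bias anywhere in sight.
```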

Anyways, there’s a meta-story here as well. Just 10 days after Simonsohn’s blog post hit the interwebs, there was a write-up about it in Nature (Cressey, 2017). Wow. The summary in Nature interviewed some folks to get their responses, but these seem to have been drawn primarily from folks who initially responded via Twitter.

The process here is pretty amazing: a researcher posts some simulations and commentary on their blog, ‘reviews’ pour in from the Twitter-verse, and 10 days later a high-profile journal summarizes the whole discourse. What a neat model for a rapid-feedback form of science in the future. The only problem is that only a few people have blogs that could garner that much attention, regardless of the quality or importance of the post. Maybe there should be a blog collective or something like that where researchers post. Maybe this is the publishing model of the future?

 

References

Cressey, D. (2017, March 31). Tool for detecting publication bias goes under spotlight. Nature. Springer Nature. https://doi.org/10.1038/nature.2017.21728
Posted in Uncategorized

To what extent do new statistical guidelines change statistical practice?

In 2012 the Psychonomic Society (PS) adopted a set of forward-thinking guidelines for the use of statistics in its journals. The guidelines stressed the use of a priori sample-size planning, the reporting of effect sizes, and the use of confidence intervals for both raw scores and standardized effect-size measures. Nice!

To what extent did these guidelines alter statistical practice? Morris and Fritz (2017) report a natural experiment undertaken to help answer this question. They analyzed papers published before and after the guidelines were released (2013 and 2015; the 2013 papers actually appeared after the guidelines were released, but all of them had been accepted for publication before the release). The papers analyzed were from journals published by PS or from a journal of similar caliber and content that was not subject to the new guidelines. In total, about 1000 articles were assessed (wow!).

What were the findings? Slow, small, but detectable change, with tremendous room for further improvement:

  • Use of a priori sample-size planning increased from 5% to 11% in PS journals but did not increase in the control journal.
  • Effect-size reporting increased from 61% to 70%, though this increase was mirrored in the control journal.
  • Use of raw-score confidence intervals increased from 11% to 18%, with no change in the control journal.
  • Confidence intervals for standardized effect sizes were reported in only 2 papers from 2015; an improvement from 0 papers in 2013, but hardly consistent with the PS guideline to include them.

The authors conclude that more must be done, but don’t offer specifics.  Suggestions?

 

References

Morris, P., & Fritz, C. (2017). Meeting the challenge of the Psychonomic Society’s 2012 Guidelines on Statistical Issues: Some success and some room for improvement. Psychonomic Bulletin & Review.
Posted in Open Science, Stats tools, The New Statistics

From NHST to the New Statistics — How do we get there?

APS just wrapped up. Geoff and I were privileged to help host a symposium on making progress moving the field away from p values towards the New Statistics. Our co-conspirators were fellow textbook author Susan Nolan, Psychological Science editor Stephen Lindsay, and stats-teaching wizard Tamarah Smith. We each offered our perspective on some of the roadblocks to abandoning the security blanket of the p value. The session was lively, with great audience feedback and discussion. I’ve posted each speaker’s slides here:

  • Introduction by Geoff Cumming — a quick recap of the long history of calling for the end of the p value, and encouragement to make this time the time we really make the change. Slides are here.
  • The Textbook Writer’s Perspective by Susan Nolan — Susan reviewed some of the inertia holding back substantive change in statistics textbooks (both in her excellent textbook and in others). She also offered hopeful insight from the sea change that has occurred in the teaching of projective tests in clinical psychology. Slides are here.
  • The Instructors’ Perspective by Tamarah Smith and Bob Calin-Jageman.  Tamarah and I considered the surprising complexities of making changes in the undergraduate statistics curriculum.  We reviewed some promising software tools for making the change (JASP, Jamovi, and a new set of extensions for SPSS) and discussed some of the strategies we had found helpful in incorporating the New Statistics into the classroom.  Slides are here.
  • The Editor’s Perspective by Stephen Lindsay.  Stephen discussed some of the very substantive policy changes implemented at Psych Science beginning with prior editor Eric Eich.  These include an emphasis on effect sizes, requirements for sample-size justification, and badges for different open-science practices.  Although these stringent requirements reduced submissions somewhat, they are having a real impact, hopefully towards better replicability.  Slides are here.

Some useful resources and papers discussed during the symposium:

  • Getting Started in the New Statistics – A set of links, guides, and resources for teaching the New Statistics. Hosted via the Open Science Framework. A crowd-sourced effort; request to be an editor to join in. https://osf.io/muy6u/
  • JASP – free, open-source alternative to SPSS.  Focuses on Bayesian statistics, but makes it pretty easy to use confidence intervals with most analyses and graphs.  https://jasp-stats.org/
  • JAMOVI – another free, open-source alternative to SPSS.  Also does pretty well with confidence intervals on some types of analyses.  https://www.jamovi.org/
  • ESPSS – a set of extensions to get SPSS to do New Statistics well. Still under development, but feel free to live on the bleeding edge.  Currently supports independent samples t-test, paired samples t-test, and correlations.  https://github.com/rcalinjageman/ESPSS
  • A fascinating paper Geoff contributed to, examining the impact of the APS guidelines on different publication practices (Giofrè, Cumming, Fresc, Boedker, & Tressoldi, 2017). (full link below)
  • Stephen’s most recent editorial enjoining new standards for data sharing in Psychological Science (Lindsay, 2017) (BRAVO!) (full link below)
  • Stephen’s earlier editorial about replication and standards at Psychological Science (Lindsay, 2015).
  • The famous “Business not as usual” editorial from Eric Eich that got things moving in a great direction at Psychological Science (Eich, 2014). (full link below)
  • Susan Nolan’s excellent undergrad statistics textbook with Thomas Heinzen: http://www.macmillanlearning.com/catalog/Product/statisticsforthebehavioralsciences-thirdedition-nolan
  • The APA’s 2.0 revision of Undergraduate Learning Goals for Psychology majors, which describes goals related to statistical/scientific thinking for psychology majors: http://www.apa.org/ed/precollege/about/psymajor-guidelines.pdf

References

Eich, E. (2014, January). Business Not as Usual. Psychological Science. SAGE Publications. https://doi.org/10.1177/0956797613512465
Giofrè, D., Cumming, G., Fresc, L., Boedker, I., & Tressoldi, P. (2017, April 17). The influence of journal submission guidelines on authors’ reporting of statistics and use of open research practices. (J. M. Wicherts, Ed.), PLOS ONE. Public Library of Science (PLoS). https://doi.org/10.1371/journal.pone.0175583
Lindsay, D. S. (2015, December). Replication in Psychological Science. Psychological Science. SAGE Publications. https://doi.org/10.1177/0956797615616374
Lindsay, D. S. (2017, April 17). Sharing Data and Materials in Psychological Science. Psychological Science. SAGE Publications. https://doi.org/10.1177/0956797617704015
Posted in NHST, Open Science, Teaching, The New Statistics

Getting the whole story: journals could be more encouraging

Even though replication is a cornerstone of the scientific method, psychology journals rarely publish direct replications (though that situation may be changing).  Why not?  Is it self-censorship, with authors not bothering to conduct or submit such studies?  Or is it that the journals discourage replications?

Here’s a paper with some answers: Martin & Clarke, 2017.

The authors scanned the editorial guidelines of over 1000 psychology journals, flagging any mention of replication. They found that only 3% explicitly encourage replication papers. What about the other journals? Two thirds didn’t mention replications at all, 33% seemed to implicitly discourage replications, and 1% explicitly discouraged replications. Yikes!

The times they are a-changing, but there’s still a long way to go.

References

Martin, G. N., & Clarke, R. M. (2017). Are Psychology Journals Anti-replication? A Snapshot of Editorial Practices. Frontiers in Psychology, 8(April), 523. https://doi.org/10.3389/fpsyg.2017.00523

Posted in Open Science, Replication

From the APS Convention in Boston

Bob and I are in Boston this weekend for the annual APS Convention. It’s great to catch up, and discuss a million things about ITNS and this blog, and our future plans. Our publisher told us yesterday that early signs from the field are super-encouraging, which is great. It can of course be a big job to persuade your colleagues to adopt a different approach to intro statistics, but people are taking on this challenge. So the future for teaching OS and the new statistics is bright.

This year there aren’t as many sessions on Open Science issues as in the past 3 years, but still lots of goodies. There’s a real buzz about MPPS (or AIMPIPS!), the new APS journal, which should really help push OS forward.

Bob and I ran a symposium yesterday. The room was packed–standing room only–by close to 100 people. Our focus was on the challenges of putting TNS and OS into practice, and strategies that can help. After our 4 brief presentations we had more than 30 minutes of excellent discussion. Many, many people see the need and are working on how they can, in practice, get real change.

Below is the outline. Bob will post the 4 sets of slides shortly. Watch this space…

Geoff

Posted in ITNS, Open Science, Teaching, The New Statistics

Confirmatory Research – A special issue of JESP

Catching up a bit, but in November of 2016 the Journal of Experimental Social Psychology published a special issue dedicated just to confirmatory research: http://www.sciencedirect.com/science/journal/00221031/67/supp/C

The whole issue is well worth reading:

  • There is an excellent guide to pre-registration (ostensibly for social psychologists, but really for anyone) (van ’t Veer & Giner-Sorolla, 2016).
  • Lots of interesting pre-registered studies, like this one (McCarthy, Coley, Wagner, Zengel, & Basham, 2016).
  • Many Labs 3 is published, which is completely fascinating (Ebersole et al., 2016).
  • And some fascinating commentaries, including this one about using MTurk samples (DeVoe & House, 2016).

References

DeVoe, S. E., & House, J. (2016, November). Replications with MTurkers who are naïve versus experienced with academic studies: A comment on Connors, Khamitov, Moroz, Campbell, and Henderson (2015). Journal of Experimental Social Psychology. Elsevier BV. https://doi.org/10.1016/j.jesp.2015.11.004
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016, November). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology. Elsevier BV. https://doi.org/10.1016/j.jesp.2015.10.012
McCarthy, R. J., Coley, S. L., Wagner, M. F., Zengel, B., & Basham, A. (2016, November). Does playing video games with violent content temporarily increase aggressive inclinations? A pre-registered experimental study. Journal of Experimental Social Psychology. Elsevier BV. https://doi.org/10.1016/j.jesp.2015.10.009
van ’t Veer, A. E., & Giner-Sorolla, R. (2016, November). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology. Elsevier BV. https://doi.org/10.1016/j.jesp.2016.03.004
Posted in Open Science, Replication

A cool new journal is Open

APS (The Association for Psychological Science) recently launched its sixth journal: Advances in Methods and Practices in Psychological Science. A dreadful mouthful of a title–why not drop ‘Advances in’ for a start–but it looks highly promising. Maybe it will become known as AIMPIPS?

The foundation editor is Dan Simons, perhaps most widely known for the invisible gorilla, although he’s done lots besides.

The announcement of the new journal is here and the main site for the journal is here.

It’s worth browsing the Submission Guidelines, especially the ‘General Journal Information’ and ‘Statistics’ sections. Also the stuff about ‘Registered Replication Reports’ and ‘Registered Reports’–peer review before data collection, gasp! Open Science is in just about every sentence. Such guidelines could hardly have been dreamt of a few years ago. Here’s wishing it every success in helping make research better–and more open.

Geoff

Posted in Open Science, Replication

Publishing unexpected results as a moral obligation for scientists

Amen!

A revised European Code of Conduct for Research Integrity now specifically calls on researchers and publishers not to bury negative results. The guidelines formulate this principle for publication and dissemination:

  • Authors and publishers consider negative results to be as valid as positive findings for publication and dissemination.

My only complaint is this continued red herring of ‘positive’ and ‘negative’ results. There are just results. We should explore whether the results are reliable and valid, but how much you happen to like a result shouldn’t enter into the equation. Still, a big step forward.

I found out about this through this news blurb:

https://www.timeshighereducation.com/news/stop-binning-negative-results-researchers-told

Posted in Open Science

What the datasaurus tells us: Data pictures are cool

In various places in ITNS, especially Chapter 11 (Correlation), we discuss how important it is to make good pictures of data, to reveal what’s really going on. Calculating a few summary statistics–or even CIs–often just doesn’t do the job.
Many statistics textbooks use Anscombe’s Quartet to make the point: the 4 scatterplots below all have the same (or very close to the same) mean and SD of both X and Y, and also the same correlation between X and Y.
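If you want to check the numbers yourself, here is a small sketch using the standard published Anscombe (1973) values; all four datasets print essentially identical means, SDs, and correlations.

```python
# The standard Anscombe (1973) quartet: four very different scatterplots,
# near-identical summary statistics.
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.array(x), np.array(y)
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name:>3}: mean_x={x.mean():.2f}  sd_x={x.std(ddof=1):.2f}  "
          f"mean_y={y.mean():.2f}  sd_y={y.std(ddof=1):.2f}  r={r:.3f}")
# Every row prints essentially the same summary numbers; only the pictures
# reveal the very different shapes.
```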

Now some clever folks have worked out how to generate an unlimited number of datasets all with these same summary statistics (or very close) but weirdly different shapes and patterns. One of the pics is the dotted outline of the datasaurus, below. Click here to see a dozen pics cycle through. (At that site, click on the changing pic to go to the paper that describes what’s in the engine room.)

We may never trust a Pearson r value again, and that may not be a terrible thing!
Geoff
P.S. Thanks to Francis S. Gilbert for the heads up.

Posted in Statistical graphics