Red, Romance, and Replication

I have a new replication paper out today, a collaboration with DU student Elle Lehmann (Lehmann & Calin-Jageman, 2017).  The OSF page for the paper, with all the materials and data, is here: (Calin-Jageman & Lehmann, 2015).

The paper replicates a set of previous findings showing that the color red dramatically increases romantic attraction for both women rating men (A. J. Elliot et al., 2010) and men rating women (A. Elliot & Niesta, 2008).  Elle and I conducted two replications: one in-person with a standard psychology participant pool, the other online with MTurk participants.  In each case we planned for an informative sample, used original materials, pre-registered our design and analysis plan, and used extensive exclusion criteria to ensure suitable participants (e.g. testing for color-blindness).  In both cases, we are sad to report that there was little-to-no effect of red on perceived attractiveness or desired sexual behavior.

Example of the types of stimuli used in red-romance studies (not the actual stimuli we used, though)

There were a few weaknesses: 1) for the in-person study we didn’t obtain nearly enough men to make a good test of the hypothesis, 2) for the online study we couldn’t control the exact parameters for the color red.  Still, we found no strong evidence that incidental red influences perceived attractiveness.

Beyond the (disappointing) replication results, there are some really interesting developments to this story:

  • Our replication work drew the attention of science journalist Dalmeet Singh who wrote a cool article summarizing the field and our contribution for Slate.  Dalmeet has made covering negative results a part of his beat–how great is that!
  • There have been some questions about these studies almost from the start.  Greg Francis highlighted the fact that the original study of women rating men by Elliot et al. (2010) is just too good to be true–every study was statistically significant despite very low power, something that ought not to regularly happen (Francis, 2013).
  • Although there have been some studies showing red effects (though often in subgroups or only with some DVs), there is a growing number of studies reporting little-to-no effect of red manipulations on attraction (Hesslinger, Goldbach, & Carbon, 2015; Peperkoorn, Roberts, & Pollet, 2016; Seibt, 2015; Lynn, Giebelhausen, Garcia, Li, & Patumanon, 2013; Kirsch, 2015), plus a whole raft of student-led precise replications that were part of the CREP project (Grahe et al., 2012).
  • To help make sense of the data, Elle and I embarked on conducting a meta-analysis.  It has turned out to be a very big project.  We hope we’re nearly ready for submission.
  • Andrew Elliot, the original investigator, was extremely helpful in assisting with this replication.  Then, as the meta-analysis progressed, he became even more involved and has now joined the project as a co-author.  The project’s still not complete yet, but I’ve really enjoyed working with him, and I’m proud that this will (hopefully) become an example of how collegial and productive replication work can be towards better and more cumulative science.
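
The "too good to be true" argument mentioned in the list above comes down to simple arithmetic: if the studies are independent, the chance that every one of them reaches significance is the product of their individual powers.  Here is a minimal Python sketch of that idea.  The power values are made up for illustration and are not the values Francis computed, and the function name is my own.

```python
from math import prod

def prob_all_significant(powers):
    """Probability that every study in a set rejects the null,
    assuming the studies are independent and each has the given power."""
    return prod(powers)

# Hypothetical power estimates for five studies (illustrative only;
# these are NOT the values from Francis, 2013).
hypothetical_powers = [0.5, 0.5, 0.6, 0.55, 0.5]
p_all = prob_all_significant(hypothetical_powers)
print(f"Chance all {len(hypothetical_powers)} studies are significant: {p_all:.3f}")
```

With powers around one half, an unbroken run of significant results should be rare, which is why such a run invites questions about what went unreported.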


Calin-Jageman, R., & Lehmann, G. (2015). Romantic Red – Registered Replications of effect of Red on Attractiveness (Elliot & Niesta, 2008; Elliot et al. 2010). Open Science Framework.
Elliot, A. J., Niesta Kayser, D., Greitemeyer, T., Lichtenfeld, S., Gramzow, R. H., Maier, M. A., & Liu, H. (2010). Red, rank, and romance in women viewing men. Journal of Experimental Psychology: General, 139(3), 399–417.
Elliot, A., & Niesta, D. (2008). Romantic red: red enhances men’s attraction to women. Journal of Personality and Social Psychology, 95(5), 1150–1164.
Francis, G. (2013). Publication bias in “Red, rank, and romance in women viewing men,” by Elliot et al. (2010). Journal of Experimental Psychology: General, 142(1), 292–296.
Grahe, J. E., Reifman, A., Hermann, A. D., Walker, M., Oleson, K. C., Nario-Redmond, M., & Wiebe, R. P. (2012). Harnessing the Undiscovered Resource of Student Research Projects. Perspectives on Psychological Science, 7(6), 605–607.
Hesslinger, V. M., Goldbach, L., & Carbon, C.-C. (2015). Men in red: A reexamination of the red-attractiveness effect. Psychonomic Bulletin & Review, 22(4), 1142–1148.
Kirsch, F. (2015). Wahrgenommene Attraktivität und sexuelle Orientierung. Springer Fachmedien Wiesbaden.
Lehmann, G. K., & Calin-Jageman, R. J. (2017). Is Red Really Romantic? Social Psychology, 48(3), 174–183.
Lynn, M., Giebelhausen, M., Garcia, S., Li, Y., & Patumanon, I. (2013). Clothing Color and Tipping. Journal of Hospitality & Tourism Research, 40(4), 516–524.
Peperkoorn, L. S., Roberts, S. C., & Pollet, T. V. (2016). Revisiting the Red Effect on Attractiveness and Sexual Receptivity. Evolutionary Psychology, 14(4), 147470491667384.
Seibt, T. (2015). Romantic Red Effect in the Attractiveness Perception. In Proceedings of The 3rd Human and Social Sciences at the Common Conference. Publishing Society.
Posted in Open Science, Replication

“The best conference…”: More about SIPS

Here’s more about the recent SIPS Conference, from my colleague Fiona Fidler who was clever enough to be there.

(Some background: Fiona, of The University of Melbourne, is among other things a psychologist, ecologist, and historian of science. Her PhD thesis, From statistical significance to effect estimation: Statistical reform in psychology, medicine and ecology is a great read and, imho, the definitive telling of the NHST story. I’m still hoping that her planned book will appear soon.)

Fiona writes:

“They should market SIPS as a cure for academic depression. It’s probably the best conference I’ve ever been to. (Prior to SIPS, the 2014 APS conference held this title for me). There’s no sense in which it is a usual conference. For one thing, there are no talks. That’s not entirely true, there were 5-min lightning talks, mostly worked on the day, inspired by the previous day’s events, or conversations had earlier that morning, mostly used to propose new, not quite fully-formed ideas to see if the crowd had any interest. The rest of the time was spent in hackathons (e.g., developing a new modular methods syllabus), workshops (e.g., writing guidelines for being an open science advocate as a reviewer), or in deep discussion about ways to improve diversity in open science. All the materials developed and much of the SIPS discussion is captured here.

“Big news items include the new SIPS journal and of course, the COS-APA partnership.

“A particular highlight for Hannah Fraser (my new ecology postdoc) was the chance to talk to Katie Corker and Simine Vazire about how SIPS was started. We think we may only be a year away from having a SIPS for ecology (clearly need a new acronym!). Brian Nosek was very happy to offer support for starting a new group, and we spent some time with him talking about the first steps for that too.

“Lots of interest in philosophy of science. There’ll be a special section in the new SIPS journal devoted to falsification in a couple of months. They’ve all just discovered Dienes’s book, and Chalmers! And there’s a group very excited about Paul Meehl’s old lectures which are now available. They were happy to hear my stories about interviewing him. Apparently, I am now a historian. I guess this is part of ageing? The whole thing did make me feel like I was about 150 years old. It’s a very young crowd. Which is of course great, but comes with the inclination to think this whole movement started in 2011.

“You should… go next year. It’s super fun.”

Posted in Open Science

Today’s news from SIPS: Getting better…

A while ago I wrote about SIPS. Today came an email following the second SIPS meeting, a couple of weeks ago at COS in Charlottesville, VA. Below is some of the email:

“We had an invigorating conference, and are humbled by the amount of energy and support from all of you, including many who were not at the conference. 200 of you attended (out of about 1,500 on this mailing list…

Mark your calendars
SIPS 2018 will be in Grand Rapids, Michigan, USA June 23-27.
(SIPS 2019 will be somewhere in Europe, exact date and location TBD.)

We’re official!
We’re incorporated! This means we’ll soon be ready to offer official SIPS membership, complete with voting rights, discounted registration to next year’s conference, and many other perks! Look out for an email soon with instructions on how to become a SIPS member – we’d love your support!

We have a journal!
SIPS has officially partnered with the open access, UC press journal Collabra: Psychology. Stay tuned for more developments…

Get Involved
Want to catch up or jump in on projects started during SIPS 2017? Take a look at the projects on the OSF page, and feel free to contact the group leaders if you’d like to get involved!”

Alas I wasn’t there, but there’s clearly a massive amount of good OS work going on. Have a squiz at the program–scroll down to p. 6 and beyond to see some of the main projects underway.

You can join the SIPS mailing list here. I think it’s fantastic that, in particular, so many grad students and post-docs are taking the initiative to work on improving our discipline in a very practical and basic way–by improving the way we do our science.


Posted in Open Science

p values and outrageous results

If you were researching a muscle-building supplement and read that a test of the supplement produced a 200% increase in muscle mass within a month, you’d be right to be skeptical.  Perhaps randomization had broken down, perhaps there was a problem of measurement, or perhaps differential dropout had skewed the results.  Of course, maybe the results will generalize, but in a way the experiment is too successful–the effect size is just too strong to fit with what we know about human physiology.  Given that extraordinary claims require extraordinary evidence, it is wise to suspect a problem with an experiment that produces an outrageous effect size, at least until additional evidence can be collected.

One of the many (many!) problems with p values is that they can disguise outrageous results.  When researchers fixate on whether p is less than alpha, they tend to not even consider the effect sizes, short-circuiting the critical step of judging whether the effect size obtained is at all reasonable.  Even worse, outrageous effect sizes produce small p values, so researchers can become even more confident in results they really ought to know are problematic.  To corrupt the Bard: p values let us obliviously suffer the slings and arrows of outrageous effect sizes.
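
To make that concrete, here is a small Python sketch showing how the p value shrinks as the standardized effect size grows, for a fixed sample size.  It assumes two equal-sized groups and uses a normal approximation to the t test; the function name and the "suspicious" cutoff are hypothetical choices for illustration, not an established rule.

```python
from math import erfc, sqrt

def approx_two_sided_p(d, n_per_group):
    """Two-sided p for a standardized mean difference d between two
    equal-sized groups, using a normal approximation to the t test."""
    z = abs(d) * sqrt(n_per_group / 2)
    return erfc(z / sqrt(2))

# Hypothetical rule of thumb: flag any |d| above this for a closer look.
SUSPICIOUS_D = 2.0

for d in (0.2, 0.5, 0.8, 3.0):
    p = approx_two_sided_p(d, n_per_group=20)
    note = "  <- check the data before celebrating" if abs(d) > SUSPICIOUS_D else ""
    print(f"d = {d:.1f}   p = {p:.2g}{note}")
```

The last row is the worrying one: the smallest p value in the table belongs to the effect size that most deserves a second look.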

This may seem hard to believe: surely p values are not so mesmerizing as to cut off all critical thinking about the data obtained.  If only.  Here are two examples I recently came across where researchers reported outrageous effect sizes without really appreciating that they were making extraordinary claims.  Apparently, peer reviewers also succumbed to p-hypnosis.

  • In one case, a respected lab published a paper showing an enormously gigantic effect of watching a violent tv show on children’s behavior.  The outrageous nature of the finding was never commented on until a graduate student reviewing the literature happened to read the paper and ‘do a double-take’ on calculating the effect size.  When the grad student contacted the original lab, it triggered a process culminating in the retraction of the paper.  The whole story is in Retraction Watch.
  • And then here’s a great blog post from Daniel Lakens about an (in)famous study of how Israeli judges alter sentencing across the day.  Lots of folks have had issues with the study, but Lakens points out that the first and primary problem is that the effect size obtained is just far, far too large to be at all driven purely by time of day.  His post is well worth a read.

So, add this to the many (many) reasons to avoid p values.  Sometimes experiments go horribly awry, leading to outrageous (and likely erroneous) results.  If you only monitor p values, you could remain oblivious and/or even celebrate from within the smouldering wreck of an experiment.


p.s. – I (Bob) have been on hiatus from blogging most of the summer… I helped organize a teaching of neuroscience conference which ended up being fairly all-consuming (though well worth it).  I’m hoping to be back to weekly posts now.

Posted in NHST

Open Science is not all the same: What archaeology can teach us

There’s no simple dot point way to adopt Open Science and improve the trustworthiness of science. A fascinating story from archaeology illustrates that reality nicely. First, the story.

Archaeologists have long studied when the out-of-Africa spreading of modern humans first reached Australia. About 47,000 years ago has been the recent conclusion. But now stunning new finds and very fancy dating analyses have pushed that date back to a mind-boggling 65,000 years, long before modern humans are believed to have entered Europe about 45,000 years ago.

It’s worth reading about the painstaking digging under the tropical sun, then the also-painstaking lab work, in this brief article in The Conversation. Dating thousands of sand grains individually! Among other startling claims, the researchers say they found the world’s oldest known edge-ground hatchets. The research was announced in Nature.

Now for Open Science. In a brief companion article in The Conversation, the archaeologists write that:

“Reproducibility is not just a minor technicality for specialists; it’s a pressing issue that affects the role of modern science in society.” (Yay! Indeed!) and that “It might come as a surprise that archaeologists are at the forefront of finding ways to improve the situation.”

The researchers described their three-pronged strategy for improving the trustworthiness of their conclusions:

1. Replication in archaeology is of course not simply a matter of running a study again with a new group of participants. The researchers did something analogous by going back to a site that had previously yielded very old artefacts, and dug deeper and wider. They used modern precise methods to document more than 10,000 stone artefacts.

2. Dating the samples. Duplicate samples were sent, blind, to an independent laboratory. That lab found results that closely matched the researchers’ own dating analyses. (Phew!)

3. Data analysis. Instead of using standard packages (hello SPSS) and reporting only the results, the researchers wrote their own scripts in R, then published the scripts, along with the data and full analyses. This is a fine example of best Open Science practice–take note Psychology!

I hope those two articles made for an enjoyable little interlude in your day.

P.S. On a personal note, I’ve been off-air for a while as my wife and I have moved house–from suburban Melbourne within bike distance of La Trobe University, to Woodend, a very friendly town of about 4,000, about an hour up the railway line (or the freeway) from central Melbourne. Early days, but so far, so very good. (And 3 of our 7 grandchildren live just around the corner.)

Posted in Applied research, Open Science

Pictures of uncertainty: Dancing with Pierre in Paris

A while back I wrote a post about Pierre Dragicevic, an HCI researcher in Paris who for years has been working to persuade researchers in his field to adopt better statistical methods.

I wrote about his wonderful talk that presents lots of different dances–not only of means, but also of p values, CIs, dichotomous decisions, and several other things. (At that link, scroll down a little to ‘Materials’ for download of the slides, list of references, and more.) Each dance is a picture of uncertainty or, rather, a movie of how uncertainty is represented for successive samples in a simulation.

Click here to see all the dances in action.

He recently let me know that he’d given the talk again. You can see this latest talk here. At about 4.50, note the great quote from Andrew Gelman:

“Statistics has been described as the science of uncertainty. But, paradoxically, statistical methods are often used to create a sense of certainty where none should exist.”

That’s the heart of the NHST vs estimation question: a p value can easily give a seductive but illusory sense of certainty, but a CI puts the extent of uncertainty in our faces.

After watching the talk again, I mentioned to him that he might have brought out more strongly one advantage of CIs over p values: a CI gives us a useful indication of its dance–CI length indicates the general width of the dance–whereas a p value says almost nothing about the dance from which it comes. He agreed, but noted that the implications of dances need to be learned and, in his experience, students can have difficulty building good intuitions about dances.

That’s a good point. In ITNS Bob and I discuss CI dances, and interpretation of a CI in terms of how its length tells about what’s likely to happen on replication. We have found this approach to work very well with students, but I’d love to see empirical investigations of how effective our approach to teaching the new statistics is, in practice. Anyone up for the challenge of doing that?
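
For readers who want to see a dance for themselves, here is a minimal Python sketch of the contrast discussed above.  It is not Pierre's simulation or the ESCI software, just a rough stand-in under stated assumptions: two normal populations, a true difference of 0.5 SD, 32 per group, and a normal approximation in place of the t distribution.  All names and parameter values are my own choices for illustration.

```python
import random
from math import erfc, sqrt
from statistics import mean, stdev

def one_replication(rng, n=32, true_diff=0.5, sd=1.0):
    """Draw two samples; return (difference, CI half-width, p value)."""
    a = [rng.gauss(0.0, sd) for _ in range(n)]
    b = [rng.gauss(true_diff, sd) for _ in range(n)]
    diff = mean(b) - mean(a)
    se = sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
    moe = 1.96 * se                      # normal approximation to the t critical value
    p = erfc(abs(diff / se) / sqrt(2))   # two-sided p, normal approximation
    return diff, moe, p

rng = random.Random(1)  # fixed seed so the dance is repeatable
results = [one_replication(rng) for _ in range(25)]
for diff, moe, p in results:
    print(f"diff = {diff:+.2f}  95% CI [{diff - moe:+.2f}, {diff + moe:+.2f}]  p = {p:.3f}")

moes = [moe for _, moe, _ in results]
ps = [p for _, _, p in results]
print(f"CI half-width ranges from {min(moes):.2f} to {max(moes):.2f}")
print(f"p ranges from {min(ps):.4f} to {max(ps):.4f}")
```

The half-widths should stay in a fairly narrow band from one replication to the next, so any single CI hints at the width of its dance, while the p values jump around far more.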


Posted in ITNS, Open Science, Replication, Statistical graphics, Teaching

Danny Kahneman: From p values to Nobel Prize

You meet a red-headed person who is a bit short-tempered then, later, another who is similarly touchy. You start to believe that red hair signals ‘watch out’. Really? You are leaping to a conclusion from an extremely small sample! But humans seem to have a strong tendency to draw conclusions from small samples–we tend to believe that even a tiny sample is likely to closely resemble the population it came from.

This ‘law of small numbers’–actually a fallacy–was described in the classic 1971 article by Amos Tversky and Daniel Kahneman. This article presented evidence that even research psychologists trained in mathematics and statistics often tend to severely underestimate the extent of sampling variability.

A related finding was that they tend to greatly overestimate how close the result of a replication experiment is likely to be to the original experiment. Sound familiar? Decades later, the dance of the p values and significance roulette convey versions of the same message–especially in relation to p values. Replicate, and the results may well be surprisingly different, alas–especially if p values are involved.
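
A quick simulation makes the point about sampling variability vivid.  The Python sketch below assumes a normally distributed population with mean 100 and SD 15 (arbitrary illustrative values) and shows how much sample means bounce around at different sample sizes.  The function name is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def spread_of_sample_means(rng, n, reps=2000, mu=100.0, sigma=15.0):
    """Standard deviation of the sample mean across many samples of size n."""
    means = [mean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    return stdev(means)

rng = random.Random(42)  # fixed seed for repeatability
for n in (4, 25, 100):
    observed = spread_of_sample_means(rng, n)
    print(f"n = {n:3d}: SD of sample means = {observed:.2f} (theory: {15.0 / sqrt(n):.2f})")
```

With tiny samples the means wander widely, which is exactly what intuition tends to underestimate.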

I mention this because I’ve just finished reading The Undoing Project: A Friendship That Changed Our Minds, by Michael Lewis. You may know some of Lewis’s earlier books, perhaps Moneyball, The Big Short, or Flash Boys.

In Undoing, Lewis tells the story of Amos Tversky and Danny Kahneman, two brilliant minds and two very different people. They talked and laughed for days and days and decades, dreamed up novel experiments, and identified all sorts of strange ways that people reason and make decisions. They also fought for Israel in several wars, their most important contributions coming from their psychological insights.

It’s a gripping tale, well told, and an insight into one highly successful way to do creative and collaborative science. After the law of small numbers came behavioral economics and in 2002, sadly after Tversky’s death, the Nobel Prize in economics.

In ITNS (pp 359-360) we discuss the example of the flying instructor who gave praise for a smooth landing by a trainee pilot, and harsh words for a rough landing. Regression to the mean can explain what the instructor observed: after praise, most often a less good landing, but after harsh words quite likely a better landing. Lewis explained that this example came straight from Kahneman’s observation of pilot training in the Israeli air force in the early days. Kahneman’s insight that regression to the mean could account for the puzzling observations led to changes in the way pilots were trained.
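
The flying instructor example is easy to reproduce in a simulation with no feedback at all.  In this minimal Python sketch, each landing score is a stable skill plus independent luck.  The cutoffs for a ‘smooth’ or ‘rough’ landing and all the names are hypothetical choices for illustration.

```python
import random
from statistics import mean

def two_landings(rng):
    """Two landing scores for one trainee: same skill, independent luck."""
    skill = rng.gauss(0, 1)
    return skill + rng.gauss(0, 1), skill + rng.gauss(0, 1)

rng = random.Random(7)  # fixed seed for repeatability
pairs = [two_landings(rng) for _ in range(20_000)]

# Average change on the next landing, with no praise or criticism in the model.
change_after_smooth = mean(nxt - first for first, nxt in pairs if first > 1.5)
change_after_rough = mean(nxt - first for first, nxt in pairs if first < -1.5)

print(f"After a smooth landing, next landing changes by {change_after_smooth:+.2f}")
print(f"After a rough landing, next landing changes by {change_after_rough:+.2f}")
```

Landings after an unusually smooth one get worse on average, and landings after an unusually rough one get better, even though nothing in the simulation responds to praise or harsh words.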

Lewis’s book could use some editing, and you can easily skip Chapter 1, but even so it’s a great read. You can get a flavor from this article in The New Yorker by Cass Sunstein and Richard Thaler–which is definitely worth a look.


P.S. Personal note. Kahneman is married to Anne Treisman, herself a distinguished psychologist, who was my doctoral advisor in Experimental Psychology at Oxford.

Lewis, M. (2016). The undoing project. A friendship that changed our minds. W. W. Norton. ISBN 978-0-393-25459-4
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105-110.

Posted in ITNS, NHST, Replication

Funnel plots, publication bias, and the power of blogs

On March 21, 2017 Uri Simonsohn published an interesting new blog post on funnel plots, arguing based on some simulations that they are not as useful for detecting publication bias as might be thought.  It’s an interesting post, and worth reading.  As far as I could tell, though, the problems only apply to meta-analyses that incorporate a range of studies in which researchers expect very different effect sizes… which to me seems kind of against the idea of doing the meta-analysis to begin with.  So I’m not sure if the circumstances in which funnel plots are misleading are very common… maybe?  I’m working on a meta-analysis right now, but it compiles a large set of very similar studies and I don’t see any possibility that researchers are customizing their sample sizes based on effect size expectations in the way that produces trouble in the Simonsohn blog post.  But that’s just my project.
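
Here is a rough Python sketch of the kind of scenario that causes trouble.  It is not Simonsohn’s simulation code, just a minimal stand-in under my own assumptions: true effects vary uniformly from d = 0.2 to d = 0.8, every researcher guesses the true effect correctly and picks the per-group n for about 80% power (normal approximation), and every study is ‘published’, so there is no publication bias at all.

```python
import random
from math import ceil, sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate_studies(rng, k=200):
    """Simulate k studies with NO selective publication."""
    studies = []
    for _ in range(k):
        true_d = rng.uniform(0.2, 0.8)     # assumed spread of true effects
        n = ceil(15.68 / true_d ** 2)      # per-group n for ~80% power: 2 * (1.96 + 0.84)^2 / d^2
        se = sqrt(2 / n)                   # approximate standard error of d
        observed_d = rng.gauss(true_d, se)
        studies.append((observed_d, se))
    return studies

rng = random.Random(2017)  # fixed seed for repeatability
studies = simulate_studies(rng)
r = pearson_r([d for d, _ in studies], [se for _, se in studies])
print(f"Correlation between effect size and standard error: r = {r:.2f}")
```

Bigger effects end up paired with bigger standard errors, the same asymmetry a funnel plot would read as publication bias.  Whether real literatures look like this depends on how strongly sample sizes track expected effects, which is the open question raised above.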

Anyways, there’s a meta-story here as well.  Just 10 days after Simonsohn’s blog post hit the interwebs, there was a write up about it in Nature (Cressey, 2017).  Wow.  The summary in Nature interviewed some folks to get their responses, but these seem to have drawn primarily from folks who initially responded via Twitter.

The process here is pretty amazing: a researcher posts some simulations and commentary on their blog, ‘reviews’ pour in from the Twitter-verse, and 10 days later a high-profile journal summarizes that whole discourse.  What a neat model for a rapid-feedback form of science in the future.  The only problem is: only a very few people have blogs that could garner that much attention, regardless of the quality/importance of the post.  Maybe there should be a blog collective or something like that where researchers post.  Maybe this is the publishing model of the future?



Cressey, D. (2017, March 31). Tool for detecting publication bias goes under spotlight. Nature. Springer Nature.
Posted in Uncategorized

To what extent do new statistical guidelines change statistical practice?

In 2012 the Psychonomic Society (PS) adopted a set of forward-thinking guidelines for the use of statistics in its journals.  The guidelines stressed the use of a priori sample-size planning, the reporting of effect sizes, and the use of confidence intervals for both raw scores and standardized effect-size measures.  Nice!

To what extent did these guidelines alter statistical practice?  Morris & Fritz (2017) report a natural experiment undertaken to help answer this question.  They analyzed papers printed before and after the guidelines were released (2013 and 2015; the 2013 papers were actually printed after the guidelines were released, but all of them were accepted for publication prior to the release).  The papers analyzed were from journals published by PS or from a journal of similar caliber and content but not subject to the new guidelines.  In total, about 1000 articles were assessed (wow!).

What were the findings? Slow, small, but detectable improvement with tremendous room for further improvement:

  • Use of a priori sample size planning increased from 5% to 11% in PS journals but did not increase in the control journal.
  • Effect size reporting increased from 61% to 70%, though this increase was mirrored in the control journal.
  • Use of raw score confidence intervals increased from 11% to 18%, with no change in the control journal.
  • Confidence intervals for standardized effect sizes were reported in only 2 papers from 2015; an improvement from 0 papers in 2013, but hardly consistent with the PS guidelines to include these.

The authors conclude that more must be done, but don’t offer specifics.  Suggestions?



Morris, P., & Fritz, C. (2017). Meeting the challenge of the Psychonomic Society’s 2012 Guidelines on Statistical Issues: Some success and some room for improvement. Psychonomic Bulletin & Review.
Posted in Open Science, Stats tools, The New Statistics

From NHST to the New Statistics — How do we get there?

APS just wrapped up.  Geoff and I were privileged to help host a symposium on making progress moving the field away from p values towards the New Statistics.  Our co-conspirators were fellow text-book author Susan Nolan, Psychological Science editor Stephen Lindsay, and stats teaching wizard Tamarah Smith.  We each offered our perspectives on some of the road-blocks to abandoning the safety blanket of the p value.  The session was lively, with great audience feedback and discussion.  I’ve posted here each speaker’s slides:

  • Introduction by Geoff Cumming — a quick recap of the long history of calling for the end of the p value, and encouragement to make this time the time we really make the change. Slides are here.
  • The Textbook Writer’s Perspective by Susan Nolan.  Susan reviewed some of the inertia holding back substantive change in statistics textbooks (both in her excellent textbook and for others).  She also offered hopeful insight from the sea-change that has occurred in the teaching of projective tests in clinical psychology.  Slides are here.
  • The Instructors’ Perspective by Tamarah Smith and Bob Calin-Jageman.  Tamarah and I considered the surprising complexities of making changes in the undergraduate statistics curriculum.  We reviewed some promising software tools for making the change (JASP, Jamovi, and a new set of extensions for SPSS) and discussed some of the strategies we had found helpful in incorporating the New Statistics into the classroom.  Slides are here.
  • The Editor’s Perspective by Stephen Lindsay.  Stephen discussed some of the very substantive policy changes implemented at Psych Science beginning with prior editor Eric Eich.  These include an emphasis on effect sizes, requirements for sample-size justification, and badges for different open-science practices.  Although these stringent requirements reduced submissions somewhat, they are having a real impact, hopefully towards better replicability.  Slides are here.

Some useful resources and papers discussed during the symposium:

  • Getting Started in the New Statistics – A set of links, guides, and resources for teaching the New Statistics.  Hosted via the Open Science Framework.  A crowd-sourced effort; request to be an editor to join in.
  • JASP – free, open-source alternative to SPSS.  Focuses on Bayesian statistics, but makes it pretty easy to use confidence intervals with most analyses and graphs.
  • JAMOVI – another free, open-source alternative to SPSS.  Also does pretty well with confidence intervals on some types of analyses.
  • ESPSS – a set of extensions to get SPSS to do New Statistics well. Still under development, but feel free to live on the bleeding edge.  Currently supports independent samples t-test, paired samples t-test, and correlations.
  • A fascinating paper Geoff contributed to examining the impact of the APS guidelines on different publication practices: (Giofrè, Cumming, Fresc, Boedker, & Tressoldi, 2017).  (full link below)
  • Stephen’s most recent editorial enjoining new standards for data sharing in Psychological Science (Lindsay, 2017) (BRAVO!) (full link below)
  • Stephen’s earlier editorial about replication and standards at Psychological Science (Lindsay, 2015).
  • The famous “Business not as usual” editorial from Eric Eich that got things moving in a great direction at Psychological Science (Eich, 2014). (full link below)
  • Susan Nolan’s excellent undergrad statistics textbook with Thomas Heinzen.
  • The APA’s 2.0 revision of Undergraduate Learning Goals for Psychology majors, which describes goals related to statistical/scientific thinking for psychology majors.


Eich, E. (2014, January). Business Not as Usual. Psychological Science. SAGE Publications.
Giofrè, D., Cumming, G., Fresc, L., Boedker, I., & Tressoldi, P. (2017, April 17). The influence of journal submission guidelines on authors’ reporting of statistics and use of open research practices. (J. M. Wicherts, Ed.), PLOS ONE. Public Library of Science (PLoS).
Lindsay, D. S. (2015, December). Replication in Psychological Science. Psychological Science. SAGE Publications.
Lindsay, D. S. (2017, April 17). Sharing Data and Materials in Psychological Science. Psychological Science. SAGE Publications.
Posted in NHST, Open Science, Teaching, The New Statistics