The joke of self-correcting science: The Andero lab and Nature Communications
In 2017, I (Bob) attended a National Academies of Science conference on reproducibility in science. Among many memorable events, I saw a talk by David Allison about his efforts to correct simple statistical mistakes in the published literature. He would notice a clear and obvious error, like a claimed interaction that wasn’t actually tested properly. He would then send a polite email to the authors to explain the situation so it could be corrected. If the authors didn’t respond, he’d move on to the editors. He went through case after case that he had pursued, and the result was almost always the same: indifference. Most of the authors he contacted either didn’t care or didn’t respond. The same was true for the editors. Despite intense efforts, actual corrections were extremely rare. Allison concluded that it was quixotic to even try: on the whole, most scientists and most journals aren’t really committed to correcting even obvious errors. (At least, that’s my memory of the talk.)
I remember finding Allison’s resignation at the end of his presentation to be depressing and cynical. But his experiences are certainly not unique. Twitter and the internet abound with similar stories that would make Kafka blush. It seems that, in most cases, pointing out a problem in a scientific paper is just shouting into the void, emails and fury that signify nothing. What’s strange is that scientists seem to shrug and accept it. Even worse, the silverbacks in the field are apt to pat you on the back if you’re upset about this state of affairs, treating you like you are childish or naive to expect more. What a strange and cynical state of affairs for a profession specifically dedicated to the truth. What would Socrates say?
The genre of “I found an error and after months of effort there is still no one doing anything about it” needs no more entries, but I’ve written one up below that involves a possible mismatch between posted data and a published figure in a memory paper. The saga is already 7 months old; so far there has been no meaningful progress (unless you count getting blocked on Twitter by the people you are trying to help as progress).
Why did I decide to stop watching NFL to write about this today? Because yesterday someone pointed out on Twitter that I had made a mistake with some data I had posted. Ugh. I don’t have to tell you that seeing the Tweet sparked a wave of anxiety–What was the mistake? Would it change the paper? Did whatever error I had made affect other studies and analyses too? I was hanging out with my daughter, but it became difficult to think about anything else. Finally, I got a chance to open my computer to sort things out. It took some digging, but I figured it out: I had accidentally posted a version of the data with an exclusion mis-coded. I had apparently found the mis-coding but had not re-posted the data. It affected only 2 participants and only for that study, and (thank G-d) the error was only in the data I had posted, not in the actual paper. I explained the situation to the person who had pointed it out, posted the correct data, and had a drink. The experience was not pleasant–but that’s what we sign up for. If you don’t want to be in agony over the correctness of your papers, please and for real: find something else to do.
Anyway, yesterday’s anxiety reminded me of this figure/data mismatch I had found back in May. If you’re interested in another unsatisfying episode of indifference to error from both researchers and editors, then read on:
My story starts in May 2021 when I (Bob) came across a news article about a cool new finding in memory: a paper in Nature Communications showed that the same drug (Osanetant) can impair fear memory in male rats yet boost the same type of memory in female rats.
Sounded cool, so I pulled the paper. I was really intrigued by the drug x sex interaction, and thought it might be a good example for my students.
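The first thing I noticed about the paper was that in the sample-size declarations, the authors declared that they had used a run-and-check approach: when results tended towards statistical significance, they increased the sample size and checked again.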
That’s a shame — both that the authors didn’t know that this approach is problematic and that nobody along the review process seemed to notice, even though Nature journals had launched a rigor checklist to great fanfare precisely to catch this sort of thing. I tweeted at Nature Communications about it right away.
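If you want to see for yourself why run-and-check is a problem, here is a rough simulation sketch. To be clear, this is not the authors' actual procedure, which I don't know in detail. Everything in it is an assumption I made up for illustration: the starting sample of 10 per group, adding 5 per group at a time, up to 3 extra rounds, treating any p between .05 and .20 as a "tendency", and using a simple z-test with a known standard deviation. There is no true effect in the simulated data, so every significant result is a false positive.

```python
import random
from statistics import NormalDist, mean


def z_test_p(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=10000, n_start=10, n_add=5, max_adds=3,
             alpha=0.05, trend=0.20, seed=1):
    """Return (fixed-n false-positive rate, run-and-check false-positive rate).

    All settings are hypothetical choices for illustration only.
    Both groups are drawn from the same distribution, so the null is true.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        p = z_test_p(a, b)

        # Honest analysis: test once at the planned sample size.
        if p < alpha:
            fixed_hits += 1

        # Run-and-check: if p shows a "tendency", add animals and test again.
        adds = 0
        while alpha <= p < trend and adds < max_adds:
            a += [rng.gauss(0, 1) for _ in range(n_add)]
            b += [rng.gauss(0, 1) for _ in range(n_add)]
            p = z_test_p(a, b)
            adds += 1
        if p < alpha:
            flexible_hits += 1

    return fixed_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    fixed_rate, flexible_rate = simulate()
    print(f"fixed n:       {fixed_rate:.3f}")
    print(f"run-and-check: {flexible_rate:.3f}")
```

The fixed-n rate should land close to the nominal 5%, while the run-and-check rate comes out higher, because every "almost significant" result gets extra chances to cross the line. How much higher depends entirely on the settings I assumed above.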
Despite this blemish I was geeked to see that the data for the paper was posted! I was able to download it, and was quickly making figures from the raw data that completely matched the published paper. So cool.
Then I ran into a bump. In one experiment, the researchers injected the drug directly into the amygdala of female rats, showing that its memory-enhancing effects are likely mediated directly by this brain structure. While the figure for the experiment (4b) showed a clear memory enhancement in the drug-injected rats compared to the vehicle-injected rats, the data posted for the paper did not. Here’s the figure from the paper (on the left) and a figure I made with esci in jamovi from the posted raw data (on the right). I stretched my figure out a bit strangely to try to match the scales. You can see the figures do not match–it seems the raw data is from completely different animals, and the pattern in the raw data is for the Osanetant-injected females to have slightly worse memory, not better.
It’s possible that I somehow misunderstood or mis-analyzed the data–but it was the same format/type as in other experiments where I could perfectly match my analyses to the published figures. I think the most likely explanation is that the wrong data was posted. I decided to write to the authors to let them know so that they could fix it (or so they could explain to me what I had understood incorrectly).
In drafting my email I decided to also ask for some data mentioned in the paper but not actually posted. For the key experiment demonstrating the sex x drug interaction (Figure 1), the caption stated “This experiment was replicated three times with similar results”. But I could not find this replication data in the data file for the published paper. I was bummed because my whole jam lately in stats class is giving different student teams different replications of the same study to analyze–it’s cool to bring real sampling variation into every classroom exercise.
On May 5, I wrote to the lab PI, Raul Andero, not someone I had ever met or corresponded with:
I recently read some of the news coverage of your paper in Nature Communication on sex differences in fear memory consolidation.
I have a keen interest in statistics, so I wanted to explore this paper as a possible example to use in the next edition of a stats textbook I’m working on.
I was wondering if you might be able to provide some assistance:
The caption to Figure 1A states that the experiment on a gender x drug interaction on memory was replicated three times with similar results. I couldn’t locate this data from within the data posted for the study. Would you be able to share that as well?
Thanks to having the data posted for the paper, I’ve been able to precisely reproduce almost all the behavioral analyses from the paper—that’s fantastic, thank you! The one figure where I am running into difficulty, though, is with figure 4b, which replicates the effect of osentant on memory in proestrus females but with IC delivery. I’m working with the data from the excel tab lablled “Proestr intra-CeA,Osanetant(4b)” — in this data I’m not able to reproduce the memory enhancement show in Figure 4b. Instead, I get a non-significant decrease in freezing behavior.
Is it possible the wrong data is posted? Or maybe I’m not understanding the data properly. I used the average of the 15 CS trials normalized to the 30s time period for each trial—but this seems to show *lower* freezing in the Osenetant group rather than higher.
You mention in the Results Report that “In case of tendencies towards statistical significancy, samples sizes were increased to ensure enough statistic power for analyses.” I posted a comment about this on Twitter, about the fact that this approach to determining sample sizes is known to greatly inflate the false-positive rate for a set of studies. I was wondering if you have any sense or maybe even records of how often/frequently you went through cycles of checking and then running additional samples. I’m curious about the behavioral results, specifically.
Thanks for any help you can provide
I received no response to this email.
After a few weeks (5/19) I emailed again to see if my initial email had been received. No response.
After another few weeks (6/11) I tried emailing the first author. No response.
I tried sending a DM to Raul Andero on Twitter. No response, but he blocked me from sending further messages.
I tried emailing again at the end of June (6/30) and in July (7/8).
By September, I thought maybe I should try contacting the editor who handled the paper, so I wrote to the chief Life Science editor at Nature Communications, Nathalie Le Bot, cc:ing Andero and the paper’s lead author:
I am writing to ask for your assistance with a data discrepancy and with data availability for a recent Nature Communications article, Florido et al. (2021) (https://www.nature.com/articles/s41467-021-22911-9#article-info)
I have discovered that the data posted for Figure 4b does not match the figure in the paper. The paper states that animals treated with Osenetant showed an increase in freezing, but the posted data shows a decrease in freezing.
The caption for Figure 1a states “This experiment was replicated three times with similar results”. But data for these replications have not been posted.
I have reached out to the communicating author, Dr. Raul Andero (cc:ed) repeatedly via Twitter and email, but have had no response. I’ve also tried contacting the lead author (Dr. Antonio Luis Florido, cc:ed), but have yet to receive a response. It’s possible my messages have not been received.
As I know that Nature Communications has been able to communicate effectively with this team, I was hoping you might be able to facilitate notifying them of this data discrepancy and to also pass along my request for the replication data, which was stipulated to be available upon request.
No response from anyone, so I tried again on 9/9.
And then, on 9/29, nearly 5 months after I first tried to flag the problem, I got this back from Nature Communications:
Thanks for your message. We are following on your request with the authors .
All the best,
And since then…. nothing. I guess Nature Communications isn’t always the best at communicating. And the only action the Andero lab seems to have taken is to block me on Twitter.
I realize that tracking down an error takes time and effort. So does data sharing. But this is part of what we sign up for–we’re supposed to care enough about our work that addressing problems and sharing our results is a priority, not something we ignore for months on end. Heck, this might not even be an error–it might be something I have misunderstood. If so, I imagine the Andero lab could gain great satisfaction from writing an email schooling me in what a stupid mistake I’ve made. That’d be fine by me. But it sure would be nice if the editors or the authors would show some sign that they care about this work.
I suppose that writing this up could be seen as “Bro-pen Science”–that I’m somehow bullying or overbearing for publicly calling out these authors for their invalid sample-size approach and/or this data/figure discrepancy. I guess. But in that case, what is the culture of interaction that is acceptable? If the authors use Twitter and the news media to promote their work, doesn’t that suggest they’re inviting public discourse about their research? Not sure I have any good answers for this. Maybe sharing stories of errors uncorrected isn’t the right approach. I’m willing to try any other approach offered to help fix our current poor practices of self-correction. What I’m not willing to accept, though, is complacency that “that’s just the way it is” (Hornsby & Range, 1989).