Transparency of reporting sort of saves the day…
I’m in the midst of an unhappy experience serving as a peer reviewer. The situation is still evolving but I thought I’d put up a short post describing (in general terms) what’s happened because I’d be happy to have some advice/input/reactions. Oh yeah, this is a post by Bob (not Geoff).
I am reviewing a paper that initially seemed quite solid. In the first round of review my main suggestion was to add more detail and transparency: to report the exact items used to measure the main construct, the exact filler items used to obscure the purpose of the experiment, any exclusions, etc.
The authors complied, but on reading the more detailed manuscript I found something really bizarre: the items used to measure the main construct changed from study to study, and items that would seem to be related to the main construct were deemed filler in one study but not the next.
Let’s say the main construct was self-esteem (it was not). In the first experiment there were several items used to measure self-esteem, all quite reasonable. But in a footnote giving the filler items I found not only genuine filler (“I like puppies”) but also items that seemed clearly related to self-esteem… things as egregious as “I have high self-esteem”. WTF? Then, in the next experiment the authors write that they measured their construct similarly but list different items, including one that had been deemed filler in the first experiment. Double-WTF! And, looking at the filler items listed in a footnote, I again found items that would seem to be related to their construct. I also found a scale that seemed clearly intended as a manipulation check but that has not been mentioned or analyzed in either version of the manuscript (under-reporting!). The remaining experiments repeat the same story: the construct is described as measured in the same way, but each time with different items and some head-scratching filler items.
There were other problems now detectable with the more complete manuscript. For example, it was revealed (in a footnote) that the statistical significance of a key experiment was contingent on the removal of a single outlier, something that had not been mentioned before! But the main problem that has me upset is what seems to be highly questionable measurement.
One easy lesson I’ve learned from this is how important it is as a reviewer to push for full and transparent reporting. Without key details on how constructs were measured, what else was measured, which participants were excluded, and so on, it would have been impossible to detect deficiencies in the evidence presented.
What has me agitated is what happens now. I sent back my concerns to the editor. If the problems are as severe as I thought (I could be wrong), I expect the paper will be rejected. But what happens next? These authors were clearly willing to submit a less-complete manuscript before. What if they submit the original version elsewhere, the one that makes it impossible to detect the absurdity of their measurement approach? The original manuscript seemed amazing; I have no doubt it could be published somewhere quite good. So has my push for transparency really saved the day, or will it just end up helping the authors better know what they should and shouldn’t include in the manuscript to get it published?
At this point, I don’t know. I’m still in the middle of this. But here are some possible outcomes:
- It’s all just a misunderstanding: The authors could reply to my review and clarify that their measurement strategy was consistent and sensible but not correctly represented in the manuscript. That’d be fine; I’d feel much less agitated.
- The authors re-analyze the data with consistent measurement and resubmit to the journal, letting the significance chips fall where they may. That’d also be fine. Rooting for this one.
- The authors shelve the project. Perhaps the authors will just give up on the manuscript. To my mind this is a terrible outcome: they have three experiments involving hundreds of participants testing an important theory. I’d really like to know what the data say when properly analyzed. The suppression of negative evidence from the literature is the most critical failure of our current scientific norms. I feel like, in some ways, once you submit a paper to review it almost *has* to be published in some way, especially with the warts revealed… wouldn’t that be useful for emptying the file drawer and also deepening how we evaluate each other’s work?
- The authors submit elsewhere, reverting to the previous manuscript that elided all the embarrassing details and gave the impression of presenting very solid evidence for the theory. I suspect this is the most likely outcome. Nothing new here. I remember an advisor in grad school who said (jokingly) that the first submission is just to reveal all the mistakes you need to cover up. I guess the frustrating thing here is how uneven transparent reporting still is. I was one of four reviewers for this paper, and I was the only one who asked for these additional details. If the authors want to go this route, I think they’ll have an easy time finding a journal that doesn’t push them for the details. How long until we plug those gaps? Why are we still reviewing for or publishing in journals that don’t take transparent reporting seriously?
I suppose I should organize a betting pool. Any predictions out there? What odds would you give these different outcomes? Also, I’d be happy to hear your comments and/or similar stories.
Last but not least, here are some questions on my mind from this experience:
- How much longer before we can consider sketchy practices like this full-out research misconduct? I mean, if you are working in this field, can you still plead ignorance? At this point shouldn’t you clearly know that flexible measurement and flexible exclusions are corrupt research practices? If this situation is as bad as I think it is, does it cross the threshold to actual misconduct? I could forgive those who engaged in this type of work in the past (and I know I did, myself), but at this point I don’t want any colleagues who would be willing to pass off this type of noise-mining as science.
- Would under-reporting elsewhere transform a marginal case of research misconduct into a clear one? Even if initially submitting a p-hacked manuscript doesn’t yet qualify as clear-cut research misconduct, would re-submitting it elsewhere, after the problems have been pointed out to you, count as research misconduct?
- Does treating the review process like a confessional exacerbate these problems? My understanding (which some have challenged on Twitter) is that the review process is confidential and that I cannot reveal or publicize knowledge I gained through the review process alone. Based on that, I don’t think I would have any public recourse if the authors were to publish a less-complete manuscript elsewhere. My basis for criticizing it would be my knowledge of which items were and were not considered filler, knowledge I would have only from the review process. So I think my hands would be tied: I wouldn’t be able to notify the editor, write a letter to the editor, post to PubPeer, etc. without breaking the confidentiality of the review. I’m not sure if journal editors are as bound by this, so perhaps the editor at the original journal could say something? I don’t know. This is all still so hypothetical at this point that I don’t plan on worrying about it yet. But if I do eventually see a non-transparent manuscript from these authors in print, I’ll have to seriously consider what my obligations and responsibilities are. It would be a terrible shame to have a prominent theory supported and cited in the literature on the basis of what I suspect to be nonsense; but I’d have to figure out how to balance that harm against my responsibility to keep the review confidential.
Ok – had to get that all out of my head. Now back to the sh-tstorm that is the fall 2019 semester. Peace, y’all.