Judging Replicability: Fiona’s repliCATS Project
Whenever we read a research article we almost certainly form a judgment of its believability. To what extent is it plausible? To what extent could it be replicated? What are the chances that the findings are true?
What features of the article drive our judgment? Likely influences include:
- A priori plausibility. To what extent is it unsurprising, reasonable?
- How large are the effects? How meaningful?
- The reputation of the authors, the standing of their lab and institution.
- The reputation of the journal.
- Interpretations, judgments, claims and conclusions made by the authors.
Open Science ideas direct our attention instead to features of the research itself, including:
- Are there multiple lines of converging evidence? Replications?
- Was the research, including the data analysis strategy, preregistered?
- Sample sizes? Quality of manipulations (IVs) and measures (DVs)?
- Results of statistical inference, especially precision (CI lengths)?
- Are we told that the research was reported in full detail? Are we assured that all relevant studies and results have been reported?
- Any signs of cherry-picking or p-hacking?
- How prone is the general research area to publish non-replicable results?
Automating the Assessment of Replicability
The SCORE project is a large DARPA attempt to find automated ways to assess the replicability of social and behavioural science research. As I understand it, teams around the world are just beginning on:
1. Running replications of a large number of published studies, to provide empirical evidence of replicability, and a reference database of studies.
2. Studying how human experts judge replicability of reported research: how well do they do, and what features (as in the lists above) guide their judgments?
3. Building AI systems to take the results from (2) and make automated assessments of the replicability of published research.
It’s big, maybe up to US$6.5M. It’s ambitious. And Fiona has multiple teams working on various aspects, all with impossibly tight timelines.
A month or so ago I spent a fascinating three days down at the University of Melbourne, for research meetings, seminars, and more, as the teams worked on their plans. Brian Nosek was in town, giving great presentations, and consulting on how his team and Fiona’s could best work together. Here’s the outline of repliCATS, from the project website:
- The repliCATS project aims to develop more accurate, better-calibrated techniques to elicit expert assessment of the reliability of social science research.
- Our approach is to adapt and deploy the IDEA protocol developed here at the University of Melbourne to elicit group judgements for the likely replicability of 3,000 research claims.
- The research we will undertake as part of the repliCATS project will include the largest ever empirical study on how scientists reason about other scientists’ work, and what factors make them trust it.
- We are building a custom online platform to deploy the IDEA protocol. This platform will have a life beyond the repliCATS project: it will be able to be used in the future to enhance expert group judgements on a wide range of topics, in a number of disciplines.
If you are interested in repliCATS, subscribe here for updates. You can also follow Fiona @fidlerfm
I’m agog to follow how it goes, and to see the insights I’m sure they will find into researchers’ judgments about replicability.