Internal Meta-Analysis: Useful or Disastrous?
A recent powerful blog post (see below) against internal meta-analysis prompts me to ask the question above. (Actually, Steve Lindsay prompted me to write this post; thanks Steve.)
In ITNS we say, on p. 243: “To carry out a meta-analysis you need a minimum of two studies, and it can often be very useful to combine just a few studies. Don’t hesitate to carry out a small-scale meta-analysis whenever you have studies it would be reasonable to combine.”
The small number of studies to be combined could be published studies, your own studies (perhaps to appear in a single journal article), or, of course, some of each. Combining your own studies is referred to as internal meta-analysis. It can be an insightful part of presenting, discussing, and interpreting a set of closely related studies. In Lai et al. (2012), for example, we used it to combine the results from three studies that used three different question wordings to investigate the intuitions of published researchers about the sampling variability of the p value. (Those studies are from the days before preregistration, but I’m confident that our analysis and reporting were straightforward and complete.)
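To make the mechanics concrete, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of three studies. The effect sizes and standard errors are made-up numbers for illustration only, not the Lai et al. (2012) results; each study is weighted by the inverse of its squared standard error, and a 95% CI is put around the pooled estimate.

```python
import math

# Hypothetical (effect size, standard error) pairs for three small studies.
# All numbers are invented for illustration.
studies = [(0.42, 0.18), (0.31, 0.22), (0.55, 0.20)]

# Fixed-effect inverse-variance pooling: weight each study by 1/SE^2.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% CI for the pooled effect, using the normal critical value 1.96.
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled effect = {pooled:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Note that the pooled standard error is smaller than that of any single study, which is exactly why combining even two or three studies can be worthwhile.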
The case against
The blog post is from the p-hacking gurus and is here. The main message is summarised in this pic:
The authors argue that even a tiny amount of p-hacking of each included study, and/or a tiny amount of selection of which studies to include, can have a dramatic biasing effect on the result of the meta-analysis. They are absolutely correct. They frame their argument largely in terms of p values and whether or not a study, or the whole meta-analysis, gives a statistically significant result.
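Their compounding-bias point can be sketched with a toy simulation (my own construction, not the authors' code). The true effect is zero. In the "hacked" condition, each study commits one mild, directional QRP: the researcher runs the outcome measure twice and reports the more positive mean. Pooling five such studies pushes the false-positive rate of the combined 95% CI well above the nominal 5%.

```python
import random
import statistics

random.seed(1)

def study_mean(n, hack):
    """One simulated study with true effect 0 (sigma = 1).
    If hack, draw the sample twice and report the more positive mean."""
    m = statistics.mean(random.gauss(0, 1) for _ in range(n))
    if hack:
        m2 = statistics.mean(random.gauss(0, 1) for _ in range(n))
        m = max(m, m2)
    return m

def pooled_excludes_zero(k=5, n=20, hack=False):
    """Fixed-effect pooling of k equal-sized studies; does the
    pooled 95% CI exclude the true value of zero?"""
    se = 1 / n**0.5                       # SE of each study mean
    means = [study_mean(n, hack) for _ in range(k)]
    pooled = statistics.mean(means)       # equal SEs, so equal weights
    pooled_se = se / k**0.5
    return abs(pooled) > 1.96 * pooled_se

sims = 2000
honest = sum(pooled_excludes_zero(hack=False) for _ in range(sims)) / sims
hacked = sum(pooled_excludes_zero(hack=True) for _ in range(sims)) / sims
print(f"false-positive rate: honest = {honest:.2f}, hacked = {hacked:.2f}")
```

The honest rate stays near the nominal 5%, while the hacked rate is several times larger, even though each individual hacked study looks only slightly too good.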
Of course, I’d prefer to see no p values at all, with the whole argument made in terms of point and interval estimates: effect sizes and CIs. Using estimation should decrease the temptation to p-hack, although estimation is of course still open to QRPs: results are distorted if analysis choices are made in order to obtain shorter CIs. Do that for every study and the CI on the result of the meta-analysis is likely to be greatly and misleadingly shortened. Bad!
Using estimation throughout should not only reduce the temptation to p-hack, but should also assist understanding of each study and of the whole meta-analysis, and so may reduce the chance that an internal meta-analysis will be as misleading as the authors illustrate.
I can’t see why the authors focus on internal meta-analysis. In any meta-analysis, a small amount of p-hacking in even a handful of the included studies can easily lead to substantial bias. At least with an internal meta-analysis, which brings together our own studies, we have full knowledge of the included studies. Of course we need to be scrupulous to avoid p-hacking any study, and any biased selection of studies, but if we do that we can proceed to carry out, interpret, and report our internal meta-analysis with confidence.
The big meta-analysis bias problem
It’s likely to be long into the future before many meta-analyses can include only carefully preregistered, non-selected studies. For the foreseeable future, many or most of the studies we need to include in a large meta-analysis will carry risks of bias. This is a big problem, probably without any convincing solution short of abandoning just about all research published more than a few years ago. Cochrane attempts to tackle the problem by having the authors of any systematic review estimate the extent of various types of bias in each included study, but such estimates are often little more than guesses.
Our ITNS statement
I stand by our statement in favour of internal meta-analysis. Note that it is made in the context of a book that introduces Open Science ideas in Chapter 1, and discusses and emphasises them in many places, including in the meta-analysis chapter. Yes, Open Science practices are vital, especially for meta-analysis! Yes, bias can compound alarmingly in meta-analysis! However, the problem may be less for a carefully conducted internal meta-analysis, rather than more.
Lai, J., Fidler, F., & Cumming, G. (2012). Subjective p intervals: Researchers underestimate the variability of p values over replication. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 8, 51-62. doi:10.1027/1614-2241/a000037