p intervals: Replicate and p is likely to be *very* different!
The Significance Roulette videos (here and here) are based on the probability distribution of the p value, in various situations. There’s more to the second video than I mentioned in my recent post about it. The video pictures the distribution of replication p, which is the p value of a single replication experiment, following an initial experiment that gave some particular initial p value.
In the video I illustrated how that distribution can be used to find the p interval, meaning the 80% prediction interval for replication p, for some particular initial p value. (All p values are two-tailed.) Here are some example p intervals:
So, if you run an experiment that gives p = .05 and then replicate (everything just the same, but with new samples), you have an 80% chance of getting p within the interval (.0002, .65). There is fully a 10% chance that p is greater than .65, and a 10% chance that it is less than .0002. An amazingly long interval! Just about any p value is possible! And that’s all true, no matter what the N, the power, or the true effect size!
For an initial p of .001 (***), the interval ends are lower, as we’d expect, but the interval is once again very long. In addition, the chance of not even obtaining p < .05 (*) in the replication is fully 17%, about 1 in 6.
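For readers who want to check these numbers, here is a minimal sketch in Python. It assumes a simple normal (z) model in which the replication z value is normally distributed around the initial z value with standard deviation √2, because the initial and replication experiments each contribute one unit of sampling variance. That model and the function names (`z_from_p`, `prob_rep_p_below`, `p_interval`) are assumptions made for illustration and are not taken from the videos.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def z_from_p(p):
    """Return the positive z value that corresponds to a two-tailed p value."""
    return STD.inv_cdf(1 - p / 2)


def prob_rep_p_below(p_cut, p_init):
    """P(replication two-tailed p < p_cut), given the initial two-tailed p.

    Assumed model: replication z ~ Normal(z_init, sqrt(2)).
    """
    rep = NormalDist(mu=z_from_p(p_init), sigma=2 ** 0.5)
    z_cut = z_from_p(p_cut)
    # Two-tailed p is below p_cut when |z| is beyond z_cut in either direction.
    return (1 - rep.cdf(z_cut)) + rep.cdf(-z_cut)


def p_interval(p_init, coverage=0.80):
    """Prediction interval for replication p, with equal probability in each tail."""
    tail = (1 - coverage) / 2

    def quantile(target):
        # Bisection: prob_rep_p_below increases steadily with p_cut.
        lo, hi = 1e-12, 1 - 1e-12
        for _ in range(200):
            mid = (lo + hi) / 2
            if prob_rep_p_below(mid, p_init) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    return quantile(tail), quantile(1 - tail)


if __name__ == "__main__":
    low, high = p_interval(0.05)
    print(f"80% p interval for initial p = .05: ({low:.4f}, {high:.2f})")
    miss = 1 - prob_rep_p_below(0.05, 0.001)
    print(f"Chance replication p >= .05 after initial p = .001: {miss:.2f}")
```

Under these assumptions the results should land close to the figures quoted above: an interval of roughly (.0002, .65) for an initial p of .05, and roughly a 17% chance of missing p < .05 after an initial p of .001. Note that N, power, and true effect size appear nowhere in the calculation; only the initial p value goes in.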
That’s all another way of seeing the enormous unreliability of p. The world would be better if we all simply stopped using p values!