The Significance Roulette videos (here and here) are based on the probability distribution of the *p* value, in various situations. There’s more to the second video than I mentioned in my recent post about it. The video pictures the distribution of replication *p*, which is the *p* value of a single replication experiment, following an initial experiment that gave some particular initial *p* value.

In the video I illustrated how that distribution can be used to find the *p* interval, meaning the 80% prediction interval for replication *p*, for some particular initial *p* value. (All *p* values are two-tailed.) Here are some example *p* intervals:

So, if you run an experiment that gives *p* = .05 and then replicate it (everything just the same, but with new samples), you have an 80% chance of getting *p* within the interval (.0002, .65). There is fully a 10% chance that *p* is greater than .65, and a 10% chance that it is less than .0002. An amazingly long interval! Just about any *p* value is possible! And that’s all true, no matter what the *N*, the power, or the true effect size!

For an initial *p* of .001 (***), the interval ends are lower, as we’d expect, but the interval is once again very long. In addition, the chance that the replication does not even give *p* < .05 (*) is fully 17%, about 1 in 6.
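For anyone who wants to check figures like these, here is a minimal Python sketch. It rests on an assumption that this post does not spell out: that the replication *z* is normally distributed around the initial *z* with standard deviation √2, which is a common normal-theory model for replication. The function names (`z_from_p`, `prob_rep_p_greater`, `p_interval`) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def z_from_p(p):
    """Positive z score matching a two-tailed p value."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def prob_rep_p_greater(p_init, threshold):
    """Chance that the replication's two-tailed p exceeds `threshold`.

    ASSUMPTION: replication z ~ Normal(initial z, sqrt(2)).
    """
    rep_z = NormalDist(mu=z_from_p(p_init), sigma=sqrt(2))
    c = z_from_p(threshold)
    # Two-tailed p > threshold exactly when |z| < c.
    return rep_z.cdf(c) - rep_z.cdf(-c)


def p_interval(p_init, coverage=0.80):
    """Prediction interval for replication p, equal chance in each tail."""
    tail = (1 - coverage) / 2

    def solve(target):
        # Bisection: find t such that P(replication p > t) = target.
        # That probability falls as t rises, so move lo up while it
        # is still above the target.
        lo, hi = 1e-12, 1 - 1e-12
        for _ in range(100):
            mid = (lo + hi) / 2
            if prob_rep_p_greater(p_init, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    return solve(1 - tail), solve(tail)


if __name__ == "__main__":
    lower, upper = p_interval(0.05)
    print(f"80% p interval for initial p = .05: ({lower:.4f}, {upper:.2f})")
    miss = prob_rep_p_greater(0.001, 0.05)
    print(f"Chance replication p > .05 after initial p = .001: {miss:.2f}")
```

Under that assumption, the sketch should come out close to the values quoted above: an interval of roughly (.0002, .65) for an initial *p* of .05, and about a 17% chance of missing *p* < .05 after an initial *p* of .001. Notice that *N*, power, and the true effect size appear nowhere in the calculation.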

That’s all another way of seeing the enormous unreliability of *p*. The world would be better if we all simply stopped using *p* values!

Geoff
