If you run an experiment, obtain p = .05, and then repeat the experiment (exactly the same, but with a new sample), what p value are you likely to get? The answer, surprisingly, is just about any value! In other words, the sampling variability of the p value is enormous, although most people don't appreciate this.
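To make this concrete, here is a minimal Python sketch (not taken from the paper or the videos) that repeats one experiment many times with fresh samples. It assumes a two-group design with n = 32 per group, a known population SD of σ = 1 so that a z test applies, and a true difference chosen so that the expected z is 1.96. Under those assumptions a typical (median) replication gives p = .05 and the study has about 50% power. The function names `run_experiment` and `replicate` are illustrative only.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a z statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def run_experiment(rng: random.Random, n: int, true_diff: float,
                   sigma: float = 1.0) -> float:
    """Simulate one two-group experiment and return its z-test p value.

    Assumes sigma is known, so a z test (not a t test) is used.
    """
    control = [rng.gauss(0.0, sigma) for _ in range(n)]
    treatment = [rng.gauss(true_diff, sigma) for _ in range(n)]
    diff = statistics.fmean(treatment) - statistics.fmean(control)
    se = sigma * (2.0 / n) ** 0.5  # standard error of the difference in means
    return two_tailed_p(diff / se)


def replicate(reps: int = 10_000, n: int = 32, seed: int = 1) -> list[float]:
    """p values from `reps` exact replications, each with a new sample."""
    rng = random.Random(seed)
    # Assumed true effect: chosen so the expected z is 1.96 (median p = .05).
    true_diff = 1.96 * (2.0 / n) ** 0.5
    return [run_experiment(rng, n, true_diff) for _ in range(reps)]


if __name__ == "__main__":
    ps = replicate()
    deciles = statistics.quantiles(ps, n=10)
    print(f"median p: {statistics.median(ps):.3f}")
    print(f"10th to 90th percentile of p: {deciles[0]:.4f} to {deciles[-1]:.3f}")
```

Under these assumptions the replication z is distributed as N(1.96, 1), so the middle 80% of replication p values should run from roughly .001 to about .5; the exact figures printed depend on the seed.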
Years ago I became fascinated with the unreliability of p and wrote a paper exploring it, with simulations, pictures, and formulas. (I can't post the paper itself here, but if you would like it and have trouble getting access, email me and I'll send you the PDF.)
I also posted the first video of the dance of the p values to YouTube. It illustrated the great variability of p and argued that this variability is one more reason to use confidence intervals and not to use p values at all. Later I posted a second version of the dance, and I also made a version to accompany ITNS (Introduction to the New Statistics).
For the last year or two I've been playing with another approach to illustrating the enormous variability of p: significance roulette. It turns out that running a typical experiment to obtain a p value is equivalent to spinning a particular roulette wheel marked with 38 p values, ranging from *** (p < .001) to values up near 1. I've just posted a video, and I hope you enjoy it. Even more, I hope it helps persuade people that it's crazy to use p values, and that there are much better ways, such as estimation with confidence intervals.
The shortcut to the Significance Roulette 1 video is: http://tiny.cc/SigRoulette1
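The video shows the actual wheel; the Python sketch below is only one plausible way to build a 38-pocket significance roulette wheel, under illustrative assumptions that may differ from the video's construction: the test statistic is z ~ N(expected_z, 1), the 38 pockets are equally likely slices of that distribution, and each pocket shows the two-tailed p at the probability midpoint of its slice. The names `roulette_wheel`, `stars`, and `spin` are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1
POCKETS = 38               # an American roulette wheel has 38 pockets


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a z statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def stars(p: float) -> str:
    """Conventional significance stars for a p value."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def roulette_wheel(expected_z: float = 1.96, pockets: int = POCKETS) -> list[float]:
    """Sorted p values for `pockets` equally likely slices of z ~ N(expected_z, 1).

    Assumption: each pocket takes the z at the probability midpoint of its slice.
    """
    wheel = []
    for i in range(pockets):
        q = (i + 0.5) / pockets                 # midpoint probability of slice i
        z = expected_z + STD_NORMAL.inv_cdf(q)  # corresponding z value
        wheel.append(two_tailed_p(z))
    return sorted(wheel)


def spin(wheel: list[float], rng: random.Random) -> float:
    """One spin: every pocket is equally likely."""
    return rng.choice(wheel)


if __name__ == "__main__":
    wheel = roulette_wheel()
    for p in wheel:
        print(f"{p:.4f} {stars(p)}")
    result = spin(wheel, random.Random())
    print(f"This spin: p = {result:.3f} {stars(result)}")
```

With expected_z = 1.96 (about 50% power at α = .05), exactly 19 of the 38 pockets show p < .05, while the least significant pocket shows p of about .84. Changing `expected_z` models a more or less powerful experiment.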