When I gave a talk at the HFESA conference, I started, of course, with an example of the damage done by NHST. My chosen article describes three road-safety examples of how reliance on statistical significance testing has cost lives.
Hauer, E. (2004). The harm done by tests of significance. Accident Analysis and Prevention, 36, 495–500. (For a PDF, Google the title.)
Hauer concluded, in the abstract, that “the pervasive use of this statistical ritual [NHST] impedes the accumulation of knowledge and is unfit for use”.
Hauer’s first example is the right turn on red (RTOR) rule: In the U.S., which drives on the right, you are usually permitted to turn right through a red light after first stopping. (That’s astonishing and scary for those of us who don’t live with the rule!) When RTOR was introduced in many states in the 1970s, a number of small studies found higher crash and injury rates with RTOR than without. But each study was small and did not reach statistical significance. So decision makers were repeatedly told by researchers that “there is no evidence of a significant hazard”. Only years later were larger studies conducted, which provided precise estimates and strong evidence of harm. Meta-analysis, which was not widely used back then, would have told the same scary story.
NHST thus led to Type II errors: there was a true effect (increased danger), but it was missed and, in effect, the null hypothesis was accepted. (Does one of the red flags of Chapter 6 spring to mind?)
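The RTOR pattern is easy to reproduce with a toy calculation. Here is a minimal sketch using made-up crash counts and a standard two-proportion z-test (nothing below comes from Hauer's actual data): suppose each of eight small studies observes 12 crashes at 1,000 RTOR intersections versus 8 at 1,000 comparison intersections. Every study on its own is "not significant", yet pooling the very same counts gives strong evidence of harm.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z statistic, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def p_two_sided(z):
    """Two-sided p-value from a z statistic (normal approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical small study: 12 vs 8 crashes per 1,000 intersections.
z_single = two_prop_z(12, 1000, 8, 1000)

# Pool identical counts from 8 such studies (96 vs 64 per 8,000).
z_pooled = two_prop_z(8 * 12, 8000, 8 * 8, 8000)

print(f"one study: z = {z_single:.2f}, p = {p_two_sided(z_single):.3f}")
print(f"pooled:    z = {z_pooled:.2f}, p = {p_two_sided(z_pooled):.3f}")
```

Each small study gives z of about 0.90 (p about .37), so "no significant hazard" is declared eight times over; the pooled data give z of about 2.54 (p about .011). Same effect size throughout, the only thing that changed is the precision.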
Hauer told very similar stories for the paving of highway shoulders (at first judged not to reduce crashes and injuries) and increases in maximum speed limits (at first judged not to increase risk). NHST was the culprit both times.
The crib death story in ITNS, p. 246, is similar. NHST leads to Type II errors, especially when meta-analysis is not used to integrate evidence.
Preparing my talk, I suddenly appreciated the double whammy effect of NHST. Most of the Open Science discussion has been about the pervasive Type I errors generated by NHST practices: p < .05 is claimed for effects that are later found not to be real, or to be much smaller than first reported.
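The Type I side is just as mechanical. Even when every null hypothesis is true, running many tests at the conventional .05 level almost guarantees some "significant" findings. A quick calculation (assuming independent tests, a simplification) shows how fast the familywise false-positive risk grows:

```python
def p_any_false_positive(k, alpha=0.05):
    """Chance of at least one p < alpha among k independent true-null tests."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    print(f"{k:3d} tests: {p_any_false_positive(k):.0%} chance of a false positive")
```

With 20 true-null tests the chance of at least one spurious p < .05 is already about 64%, and with 100 it is over 99%. Selective reporting of those "hits" is exactly how non-effects end up in the literature.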
My conclusion: Conventional practices based on NHST do damage by giving (i) lots of Type I errors and (ii) lots of Type II errors. A double whammy indeed!