1) No need for all that tortured, nonintuitive, normal/SD-dependent tradition of measuring distance from the test hypothesis: just measure the information against the test hypothesis supplied by its P-value p by converting it to the Shannon information s = -log(p) (which now has over 60 years of history under names such as “surprisal”, “logworth”, and S-value). Unlike the P-value, the S-value is additive across independent tests (as Fisher exploited), equal-interval scaled, and unbounded above, so it is hard to confuse with a posterior probability; and when base-2 logs are used it translates immediately into a coin-tossing experiment, e.g., p of 0.03 gives s = -log2(0.03) ≈ 5 bits of information against the hypothesis, the same amount of information that 5 heads in a row supply against fairness of a coin-tossing set-up. The 1-sided 5-sigma physics criterion becomes about 22 bits, or 22 heads in a row. And so on.
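To make the conversion concrete, here is a minimal sketch in Python (scipy is used only to get the 5-sigma tail area; the numbers are just the ones quoted above):

from math import log2
from scipy.stats import norm

def s_value(p):
    # Shannon information (in bits) against the tested hypothesis: s = -log2(p)
    return -log2(p)

print(s_value(0.03))                 # about 5.1 bits, i.e. roughly 5 heads in a row
p_5sigma = norm.sf(5)                # one-sided tail area beyond z = 5
print(p_5sigma, s_value(p_5sigma))   # about 2.9e-07, about 21.7 bits, i.e. roughly 22 heads in a row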

Yes, what I am saying is that The New Statistics is already old and in need of an update – you should read my 2019 TAS-supplement paper and update your book accordingly:

Greenland, S. (2019). Some misleading criticisms of P-values and their resolution with S-values. The American Statistician, 73(sup1), 106–114, open access at

http://www.tandfonline.com/doi/pdf/10.1080/00031305.2018.1529625

2). OK, so we need a new term to refer to the value asserted by H0 and used to calculate the p value. Maybe ‘reference value for p’, or ‘H0 assumed value’? I think it was Bruce Thompson who used the term ‘non-nil null’, which I suspect you would label a contradiction.

1). The longer the distance, the stronger the evidence, of course. If MoE (margin of error) is the half-length of the CI (assumed here for simplicity to be symmetric), then an H0 assumed value that’s one-third of MoE beyond the end of the 95% CI gives approx p=.01, and two-thirds gives approx p=.001. We could no doubt work out the corresponding LR values. (LR is approx 7 for the point estimate vs. an end of the 95% CI, so LR increases from 7 as we move further from the CI.) In summary, strength of evidence increases fairly quickly as we move away from the 95% CI. (A quick numerical check of these approximations is sketched after point 4 below.) The 5-sigma, etc., standard represents very, very strong evidence. But once we move much beyond, say, one MoE from an end of the CI (i.e. roughly 4-sigma), our usual model is probably not a good guide. In practice the uncertainty due to sampling variability (as accounted for by that model) is probably overshadowed by bias or other problems not captured by that model. So in most cases we’re kidding ourselves if we report exact p values below, say, .001. (Accordingly, the APA Publication Manual recommends reporting exact, rather than relative, p values, except that p<.001 is preferred to any smaller exact value of p.)

3). A fair point that ratio measures are harder to represent well and think about clearly. Squared measures similarly.

4). Fair point. In UTNS I described my version as ‘the CI-function’ and also marked the vertical axis with the corresponding p values.
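Here is that rough numerical check of the point-1 approximations, assuming the usual normal model with a symmetric 95% CI so that MoE = 1.96 × SE (Python, with scipy for the normal tail areas):

from math import exp
from scipy.stats import norm

z95 = norm.ppf(0.975)                          # about 1.96

def p_for_null_beyond_ci(frac_of_moe):
    # Two-sided p for an H0 assumed value lying this fraction of MoE beyond a 95% CI end
    z = z95 * (1 + frac_of_moe)
    return 2 * norm.sf(z)

print(p_for_null_beyond_ci(1/3))   # about .009, i.e. roughly .01
print(p_for_null_beyond_ci(2/3))   # about .001
print(p_for_null_beyond_ci(1.0))   # about .00009, the roughly 4-sigma case

# Normal-model likelihood ratio of the point estimate vs. a value z SEs away: exp(z**2 / 2)
print(exp(z95 ** 2 / 2))           # about 6.8, the "LR is approx 7" at a 95% CI end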

1) What does “some little distance” mean? The CI has to be pretty far from a parameter value to provide “strong” evidence in any sense I can think of. E.g., the 5-sigma requirement in physics corresponds to falling farther from the interval than the interval limits are from the center!

2) Please, the only correct English use of “null” is for no difference, effect, or association – check your dictionary. One of the many ways Fisher screwed stats was by misusing “null” for any tested value, just as Neyman screwed stats by calling CIs “confidence” intervals – a use which Arthur Bowley called a “confidence trick” in 1934. These abuses of English are every bit as misleading as “significance” for P<0.05.

3) I like cat's eye graphs, but I don't trust most readers to have an accurate mind's eye – especially when looking at ratio measures.

4) Nitpicking, but "Poole's P-value function"? Please, no: As Poole notes, the P-value function is not his idea – it goes back at least to Birnbaum 1961. I just think Poole's 1987 exposition (actually in two articles in the Am J Public Health that year) is the clearest and most compelling to date.

Finally, just to emphasize: If I am seriously focusing on a single association, all I would need to see is the P-value function since all CIs and P-values can be read off that. But given that's asking for a bit much, I want to see the main results from it as given by a CI and P-values. And then I also want to see at least a fit P-value or some diagnostics for the model used to create those association-focused statistics (or at least have some assurance the analyst checked the model before giving us the focal results). So in my book the P-value remains a central concept of frequentist analyses.
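To illustrate that read-off claim, here is a minimal sketch of a P-value function under a normal (Wald) approximation, using a made-up point estimate and standard error purely for illustration (Python):

import numpy as np
from scipy.stats import norm

estimate, se = 0.40, 0.15                            # hypothetical numbers, for illustration only
tested = np.linspace(estimate - 4 * se, estimate + 4 * se, 401)
pvals = 2 * norm.sf(np.abs(tested - estimate) / se)  # two-sided P at each tested parameter value

# Any P-value can be read off the curve, e.g. for the no-association value 0:
print(pvals[np.argmin(np.abs(tested - 0))])          # about .008 here

# And any CI can be read off as the set of tested values not rejected at the chosen level,
# e.g. the 95% CI is the set of values with P >= 0.05:
inside = tested[pvals >= 0.05]
print(inside.min(), inside.max())                    # approx estimate -/+ 1.96 * se, to grid resolution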

I fully agree that “nullistic conventions… need to be challenged and broken”. I agree that much of current practice needs drastic improvement, in relation to CIs as well as p values and other techniques. I agree that, if using p values, it can often be valuable and revealing to calculate them for more than one value of the null. I agree that p values around .05, corresponding to null values near an end of a 95% CI, provide only weak evidence against those null values. Yes, in typical situations, a 95% CI provides strong evidence only against null values at least some little distance from the interval.

However, I still contend that a CI is more likely to prove effective as a basis for good understanding and interpretation than one or more p values. (Or than one or more single values, each some transformation of a p value.) Yes, “p values can be calculated across the entire relevant spectrum of parameter values”. In UTNS, p. 105, I included a version of Poole’s p value function that illustrates how the p value varies across and beyond a CI. Also, in Chapter 6 of ITNS we explain how a CI can be used to eyeball the p value for any value of the null that is of interest, anywhere across or beyond the interval. A CI, especially when supplemented (either in the graph, or in the reader’s mind’s eye) with the cat’s eye figure, indicates how the relative strength of evidence against any null of interest varies as that null takes any chosen value across and beyond the interval.

We emphasise in ITNS, and in our TAS article, that an essential part of interpreting any CI is to pay attention to the full extent of the interval. So, for your example CI of [0.997, 2.59], we would want any reader to consider, in particular, the meaning in the research context of each of those interval endpoints. Yes, this is not always done, but it should be, and providing the CI is a good first step to enabling and encouraging that.
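For that example, here is a sketch of how the p value for any null of interest can be recovered from the CI itself, assuming the usual log-scale normal (Wald) approximation for a ratio measure (the original analysis may of course have used a different method):

from math import log, exp
from scipy.stats import norm

lo, hi = 0.997, 2.59
log_est = (log(lo) + log(hi)) / 2          # implied point estimate on the log scale
se = (log(hi) - log(lo)) / (2 * 1.96)      # implied standard error: CI half-width / 1.96

def p_for_null(null_ratio):
    # Approximate two-sided p value for any tested ratio value
    z = abs(log_est - log(null_ratio)) / se
    return 2 * norm.sf(z)

print(exp(log_est))      # about 1.61, the implied point estimate
print(p_for_null(1.0))   # about .051, just above .05, as the lower limit of 0.997 suggests
print(p_for_null(2.0))   # about .37, so only weak evidence against a doubling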

Geoff
