Pawel (Pav) Kalinowski and Jerry Lai completed their PhDs a few years back. A recently published Frontiers article (citation below) reports what was primarily Pav’s research on how people understand confidence intervals (CIs). The short version is “for many people, not very well, but there is hope”.
Kalinowski, P., Lai, J., & Cumming, G. (2018). A cross-sectional analysis of students’ intuitions when interpreting CIs. Frontiers in Psychology: Quantitative Psychology and Measurement, 16 February. https://doi.org/10.3389/fpsyg.2018.00112 [free download at that link]
Random sampling: The dance of the means
I’ll talk about Pav’s work shortly, but first a few words about random sampling and the cat’s eye picture of a CI. The figure below shows the dance of the means–sample means (green dots) generated by simulation of repeated sampling from a normally distributed population. (The population has mean μ = 50, as marked by the central vertical line. Population SD = 46 and the size of each sample is N = 20, although those values are not important.) The pile of means at the bottom is the mean heap–all the sample means from previous samples. The curve is the sampling distribution of the sample means–what we expect theoretically if an infinitely large number of samples is taken. The figure is from the CIjumping page of ESCI intro chapters 3-8, which is a free download from here–click on the ESCI download tab.
Most sample means fall close to μ
The figure illustrates how most sample means fall close to μ (population mean, marked by central vertical line). Progressively fewer fall further from μ, and in the long run just 5% fall outside the two outer vertical lines. The curve is a summary of this pattern. The curve has SD given the name standard error (SE). The outer vertical lines mark the central 95% of the area under the curve, and are approximately 2 x SE either side of μ.
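The pattern described above can be checked with a small simulation. This is a minimal sketch, not the ESCI code: it uses the population values quoted for the figure (μ = 50, SD = 46, N = 20), while the seed, the number of samples, and the names `dance_of_means` and `prop_outside` are assumptions made for illustration.

```python
import math
import random
import statistics

# Population values quoted for the figure; the seed and the number of
# samples are arbitrary choices made for this sketch.
MU, SIGMA, N = 50.0, 46.0, 20
N_SAMPLES = 10_000


def dance_of_means(n_samples, mu=MU, sigma=SIGMA, n=N, seed=1):
    """Return the means of n_samples random samples, each of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


se = SIGMA / math.sqrt(N)  # SD of the sampling distribution of the mean
means = dance_of_means(N_SAMPLES)

# Proportion of sample means falling more than 1.96 x SE from mu.
prop_outside = sum(abs(m - MU) > 1.96 * se for m in means) / len(means)

print(f"theoretical SE      = {se:.2f}")
print(f"SD of the mean heap = {statistics.stdev(means):.2f}")
print(f"proportion outside  = {prop_outside:.3f}")
```

With enough samples, the SD of the mean heap should settle near the theoretical SE, and the proportion of means outside the outer lines should settle near 5%.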
In real life we don’t, of course, know μ, and we have only a single mean. Given our mean, where is μ? That’s the core challenge of statistical inference. The figure tells us that our best bet is that our sample mean has fallen fairly close to μ, though it could easily have fallen a little distance from μ and, just possibly, rather further still.
The 95% CI on our sample mean has length equal to the distance between the outer verticals, which is approximately 4 x SE. Place a line of this length so it is centred on our sample mean and we have the CI. Now the critical step: In addition, centre the curve from the figure also on our sample mean–not on μ as in the figure above. Just for neat symmetry, also centre an upside down version of the curve on our mean. We get the upper picture in the figure below, which comes from the Frontiers article.
That upper picture is the cat’s eye picture on the 95% CI, with, for additional emphasis, the area spanned by the CI shown shaded. This is 95% of the total area between the two curves. The lower picture is the same for the 50% CI.
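The construction above can be sketched in a few lines. The sketch assumes the population SD is known, so the interval is based on the normal curve (a CI based on the sample SD would use the t distribution instead). The sample mean of 57.3 and the function names are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def ci_known_sigma(mean, sigma, n, level=0.95):
    """z-based CI for a mean, assuming the population SD is known."""
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


def shaded_fraction(mean, sigma, n, level):
    """Fraction of the cat's eye area lying between the CI limits."""
    se = sigma / math.sqrt(n)
    curve = NormalDist(mean, se)  # the curve, re-centred on our sample mean
    lo, hi = ci_known_sigma(mean, sigma, n, level)
    return curve.cdf(hi) - curve.cdf(lo)


# Hypothetical sample mean; sigma and n are the values quoted for the figure.
m, sigma, n = 57.3, 46.0, 20
se = sigma / math.sqrt(n)

lo95, hi95 = ci_known_sigma(m, sigma, n, 0.95)
lo50, hi50 = ci_known_sigma(m, sigma, n, 0.50)

print(f"95% CI [{lo95:.1f}, {hi95:.1f}], length = {(hi95 - lo95) / se:.2f} x SE")
print(f"50% CI [{lo50:.1f}, {hi50:.1f}], length = {(hi50 - lo50) / se:.2f} x SE")
print(f"shaded area, 95% CI: {shaded_fraction(m, sigma, n, 0.95):.2f}")
print(f"shaded area, 50% CI: {shaded_fraction(m, sigma, n, 0.50):.2f}")
```

Because the mirrored curve simply doubles both the shaded area and the total area, the shaded fraction of the cat’s eye is the same as the area under the single curve between the CI limits.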
The cat’s eye picture of a CI
The cat’s eye picture makes salient what is implicit in a CI represented, conventionally, by a mere line. It tells us that our mean has most likely fallen quite close to the unknown μ, with the ‘fatness’ or vertical extent of the cat’s eye telling us the relative likelihood, or plausibility, that various points along the CI are where μ is located. Most likely, μ is fairly close to the mean, but it could easily be out towards either end of the CI, or even, just possibly, a little beyond the end of the CI.
Note that nothing special happens exactly at the end of the CI. Just inside or just outside–virtually no difference in chances that μ lies there.
Note also that the 50% CI is about one-third the length of the 95% CI, which tells us that there is approximately a 50-50 chance that μ lies in the middle third of the 95% CI. A handy little fact to remember.
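Both notes can be checked numerically. The sketch below assumes a normal cat’s eye and works in SE units, with the sample mean at zero. The helper names (`half_width`, `relative_height`) are made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # cat's eye curve in SE units, centred on the sample mean


def half_width(level):
    """Half-length of a z-based CI, in SE units."""
    return Z.inv_cdf(0.5 + level / 2)


def relative_height(z):
    """Fatness of the cat's eye at z, relative to its fatness at the mean."""
    return Z.pdf(z) / Z.pdf(0)


moe95 = half_width(0.95)

# Nothing special happens at the limit: compare points 1% of the
# half-width either side of it.
at_limit = relative_height(moe95)
just_inside = relative_height(0.99 * moe95)
just_outside = relative_height(1.01 * moe95)

# The 50% CI as a fraction of the 95% CI, and the area of the cat's eye
# covered by the middle third of the 95% CI.
ratio_50_to_95 = half_width(0.50) / moe95
middle_third = Z.cdf(moe95 / 3) - Z.cdf(-moe95 / 3)

print(f"relative height just inside / at / just outside the limit: "
      f"{just_inside:.3f} / {at_limit:.3f} / {just_outside:.3f}")
print(f"50% CI length as a fraction of 95% CI length: {ratio_50_to_95:.3f}")
print(f"area covered by the middle third of the 95% CI: {middle_third:.3f}")
```

Under these assumptions the ratio comes out a little above one-third, and the middle third of the 95% CI covers a little under half the area, which is why “approximately 50-50” is a fair rule of thumb.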
So, when you see a 95% CI, visualise the cat’s eye, to help your intuitions about where the population effect size that you are trying to estimate is most likely to lie. All the above is a long-winded way to say that the cat’s eye is the beautiful, but usually hidden, face of a CI.
But do people understand that the chances of where μ lies vary across (and beyond) a CI? Traditional strict Frequentist dogma distinguishes only inside and outside a CI, and doesn’t permit any distinction between different points within the interval. That, however, flies in the face of the dance of the means and how sampling actually behaves. The cat’s eye tells true. To what extent do people appreciate that?
Pav defined the Subjective Likelihood Distribution (SLD) as the “cognitive representation of the relative likelihood of each point across and beyond a CI in landing on the population parameter. For example, a uniform SLD reflects the (incorrect) belief that every point inside a CI is equally likely to have landed on μ.” He used several empirical approaches to estimate the shape of the SLD of senior undergraduate and graduate students.
In very brief summary, Pav found that students’ SLD curves varied widely in shape. Some were close to flat, although many were (correctly) higher close to the sample mean than further away. Pav also identified a number of basic misconceptions about CIs that were held by some students. Some did not understand that, for example, a 99% CI must be longer than a 95% CI, because it encompasses a larger percentage of the area of the cat’s eye.
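The link between confidence level and interval length can be made concrete with a short calculation. This is a sketch assuming a normal cat’s eye, with lengths expressed in SE units. The function name `ci_length_in_se` and the particular levels listed are illustrative choices.

```python
from statistics import NormalDist


def ci_length_in_se(level):
    """Full length of a z-based CI at the given level, in SE units."""
    return 2 * NormalDist().inv_cdf(0.5 + level / 2)


levels = [0.50, 0.80, 0.90, 0.95, 0.99]
lengths = {level: ci_length_in_se(level) for level in levels}

for level in levels:
    print(f"{level:.0%} CI: length = {lengths[level]:.2f} x SE")

# How much longer the 99% CI is than the 95% CI.
ratio_99_to_95 = lengths[0.99] / lengths[0.95]
print(f"99% CI / 95% CI length ratio = {ratio_99_to_95:.2f}")
```

Capturing a larger share of the area always requires a longer interval, so the lengths increase with the level.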
Pav then interviewed some of the students at length. He identified in finer detail the correct and wrong conceptions each student held about CIs. Then he introduced the cat’s eye picture. He found encouraging initial evidence that learning about that picture helped many of the students to a better understanding of CIs.
There is enormous scope for Pav’s work to be followed up in more detail, and for the teaching implications to be explored. We believe that the cat’s eye can be extremely useful to help students, including beginners, develop better intuitions about CIs. Researchers as well! So ITNS does illustrate and discuss cat’s eye pictures.
I’d be very interested to hear of experiences from the classroom of anyone who has used cat’s eyes as part of their teaching of CIs. Thanks!
P.S. And now the most important bit, the abstract:
We explored how students interpret the relative likelihood of capturing a population parameter at various points of a CI in two studies. First, an online survey of 101 students found that students’ beliefs about the probability curve within a CI take a variety of shapes, and that in fixed choice tasks, 39%, 95% CI [30, 48] of students’ responses deviated from true distributions. For open ended tasks, this proportion rose to 85%, 95% CI [76, 90]. We interpret this as evidence that, for many students, intuitions about CIs distributions are ill-formed, and their responses are highly susceptible to question format. Many students also falsely believed that there is substantial change in likelihood at the upper and lower limits of the CI, resembling a cliff effect (Rosenthal and Gaito, 1963; Nelson et al., 1986). In a follow-up study, a subset of 24 post-graduate students participated in a 45-min semi-structured interview discussing the students’ responses to the survey. Analysis of interview transcripts identified several competing intuitions about CIs, and several new CI misconceptions. During the interview, we also introduced an interactive teaching program displaying a cat’s eye CI, that is, a CI that uses normal distributions to depict the correct likelihood distribution. Cat’s eye CIs were designed to help students understand likelihood distributions and the relationship between interval length, C% level and sample size. Observed changes in students’ intuitions following this teaching program suggest that a brief intervention using cat’s eyes can reduce CI misconceptions and increase accurate CI intuitions.