What Should We Call Our Estimate of Cohen’s δ: d-unbiased, Hedges’ g, or Something Else?

In ITNS we used ‘dunbiased’ to refer to the debiased estimate of Cohen’s δ, which is Cohen’s standardised effect size in the population. In UTNS I used ‘dunb’. But now ‘Hedges’ g’ seems to be gaining currency as a label for that debiased estimate, despite g having been introduced by Larry Hedges back in the 1980s with a different meaning.

A bit of background

Cohen’s d for two independent groups, of size n1 and n2, with means M1 and M2 and SDs of s1 and s2 is

d = (M1 − M2) / (standardizer)

where ‘standardizer’ is some SD we choose as an appropriate unit of measurement for d. The numerator (difference between the means) is the effect size of research interest in original units and d is that ES re-expressed as a number of SDs; it’s a kind of z score.

Choice of standardizer is critical: d needs to be interpretable in the context. If our data are IQ scores on a well-established test, we might choose as standardizer σ = 15, the SD in the test’s reference population. But usually we’ll need to choose an estimate, calculated from the data, as standardizer. For two groups, it’s common to assume homogeneity of variance and use sp, the pooled estimate of the population SD. If one group is a control group, we might choose the SD of that group as standardizer, thus avoiding the assumption. Other choices are possible.

Unfortunately, d is a biased estimate: it overestimates δ, especially for small samples. A simple calculation debiases d. Of course, to interpret a value of d we need to know what standardizer was used and whether the reported value has been debiased. My question: What symbol should we use for debiased d?
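As a rough illustration of that simple calculation, here is a minimal Python sketch. It assumes sp is the standardizer and applies the commonly used gamma-function correction factor with df = n1 + n2 − 2, which is closely approximated by 1 − 3/(4df − 1). The function names and the two small data sets are made up for illustration; they are not taken from ITNS, UTNS, or esci.

```python
import math
from statistics import mean, stdev


def pooled_sd(x1, x2):
    """Pooled SD (sp) of two independent groups, assuming homogeneity of variance."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample SDs, each using n - 1
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def correction_factor(df):
    """Debiasing factor: gamma(df/2) / (sqrt(df/2) * gamma((df - 1)/2))."""
    if df < 2:
        raise ValueError("df must be at least 2")
    log_ratio = math.lgamma(df / 2) - math.lgamma((df - 1) / 2)
    return math.exp(log_ratio) / math.sqrt(df / 2)


def cohens_d(x1, x2):
    """d with sp as standardizer; not adjusted for bias."""
    return (mean(x1) - mean(x2)) / pooled_sd(x1, x2)


def d_unbiased(x1, x2):
    """Debiased d: d multiplied by the correction factor for df = n1 + n2 - 2."""
    df = len(x1) + len(x2) - 2
    return correction_factor(df) * cohens_d(x1, x2)


# Hypothetical scores for two small independent groups.
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8]

print(f"d          = {cohens_d(group_a, group_b):.3f}")
print(f"d unbiased = {d_unbiased(group_a, group_b):.3f}")
```

With only five scores per group the factor is about 0.90, so the debiased value is noticeably smaller than d. A different standardizer, such as the SD of a control group, would need a different function in place of pooled_sd and its own df.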

A history of confusing labels

In 2011, in UTNS (p. 295), I wrote:

You’d think something as basic as dunb would have a well-established name and symbol, but it has neither. … In the early days the two independent groups d calculated using sp [pooled SD for the two groups, assuming homogeneity of variance] as standardizer was referred to as Hedges’ g.  For example the important book by Hedges and Olkin (1985), which is still often cited, used g in that way, and used d for what they explained as g adjusted to remove bias.  So their d is my dunb.  By contrast, leading scholars Borenstein et al. (2009) swapped the usage of d and g, so now their d is the version with bias, and Hedges’ g refers to my dunb.  Maybe hard to believe, but true.  The CMA [meta-analysis] software also uses g to refer to dunb.  In further contrast, Rosnow and Rosenthal (2009) is a recent example of other leading scholars explaining and using Hedges’ g with the traditional meaning of d standardized by sp and not adjusted to remove bias.  Yes, that’s all surprising, confusing, and unfortunate. 

Larry Hedges is one of the authors of Borenstein et al. (2009), so presumably he supported the swapping of the labels. I asked him about these issues and he kindly replied with an account of the history. In his foundational articles of 1980-82 he used g for the biased estimate, to honour meta-analysis pioneer Gene Glass, and gU for the unbiased version. Then from around 1985 he started using d for the unbiased estimate, to correspond with δ (delta, Greek ‘d’). He reports that he doesn’t know who started using g for the unbiased estimate but that, by 2009, his co-authors felt that they should go with what seemed to have become standard practice.

Where to now?

My informal impression—I could be wrong—is that ‘Hedges’ g’ is increasingly being used for debiased d.

Bob and I need to decide what we’ll do in ITNS2 and in esci. Specifically, should we stick with ‘dunbiased’, or switch to using ‘Hedges’ g’? (Whichever we choose, we’ll no doubt note that both terms are in use.)

Despite the possible messiness of a long word as subscript, I’m currently leaning towards sticking with dunbiased. My thoughts:

  1. ‘Cohen’s d’, or simply ‘d’, is overwhelmingly the term used to denote the standardized ES. It’s used to introduce and explain the idea, and in journal articles, sometimes even when debiased values are reported. Further, dunbiased signals a particular variant of d, and even explains its key property: being an unbiased estimate. A reader meeting it for the first time could probably guess its meaning reasonably well.
  2. I suspect most researchers have heard of d, have interpreted values and perhaps used d in their own research, even if they don’t know about debiasing—which anyway isn’t an issue for N more than, say, 50. Many fewer would have heard of Hedges’ g, or be able to link it to d, let alone say how it relates to d. Both those links would need to be explained, and taught. The change of letter symbol seems arbitrary; there is no way to guess.
  3. It’s common (and useful) to refer to ‘the d family’ of standardized effect size measures. How strange that the most commonly needed member of that family is labeled ‘g’.
  4. It’s a great convention that a Roman letter estimates the corresponding Greek letter. So M, s, and r estimate µ, σ, and ρ respectively. Therefore it’s great that δ is widely used for the population value of Cohen’s d. Using g for the sample value suggests we’re estimating γ, which is never used. How weird to have to explain that the best estimate of δ is g.
  5. A mathematical statistician would sidestep all the above by using “delta hat” for the estimate, but I don’t think that’s a good universal solution for psychology, or many other research fields.
  6. In medicine, SMD, for “standardised mean difference” is widely used and a reasonable acronym. However, it can refer to a population value or sample estimate, and very often we’re left to wonder whether bias has been removed.
  7. On the other hand, it’s useful for a subscript to signal how d is calculated, perhaps ds when sp is the standardizer and we assume homogeneity of variance, and dC when the SD of the Control group is standardizer and we avoid that assumption. Using d and g permits subscripts to tell us about the standardizer. However, I don’t think any strong conventions have emerged as to which subscripts tell us what.
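To put a rough number on the remark in point 2 that debiasing hardly matters once N exceeds about 50, the sketch below tabulates the usual correction factor for a few total sample sizes. It assumes two independent groups with sp as standardizer, so df = N − 2; the function name and the choice of N values are mine, for illustration only.

```python
import math


def correction_factor(df):
    """Debiasing factor: gamma(df/2) / (sqrt(df/2) * gamma((df - 1)/2))."""
    if df < 2:
        raise ValueError("df must be at least 2")
    log_ratio = math.lgamma(df / 2) - math.lgamma((df - 1) / 2)
    return math.exp(log_ratio) / math.sqrt(df / 2)


# Total N across both groups; df = N - 2 for the pooled-SD standardizer.
for total_n in (10, 20, 50, 100):
    factor = correction_factor(total_n - 2)
    shrink_pct = 100 * (1 - factor)
    print(f"N = {total_n:3d}  factor = {factor:.4f}  d shrinks by {shrink_pct:.1f}%")
```

The factor is always below 1 and moves towards 1 as N grows, so the adjustment is largest for small samples. At N = 50 it reduces d by less than 2%.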

Given all that, I’m currently preferring dunbiased. However, has g become unstoppable? If so, the complexity of d, g, and δ is just one more baffling inconsistency we have to explain to bemused students.

Please let me have your thoughts.

Geoff

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: Wiley.

Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
