Marie,

Thank you for your comment.

**Hedges’ g***

I confess I’m gradually coming around, partly because of your results and arguments, partly from discussion with Bob. If there is an underlying single σ, the pooled SD is its natural estimate.

Your main argument is strong: if variances are equal or close, *g* and *g** are essentially identical, so there is little cost in preferring *g** by default.

I take all your points about weird, irrelevant stuff influencing Glass’s *d*.

Geoff

Thank you again for this amazing feedback and your blog post!

**About the credibility of the assumption of equal population variances:** many authors have argued, as we do, that the assumption of homogeneity of variances often does not hold (see for example Erceg-Hurn & Mirosevich, 2008; Zumbo & Coulombe, 1997). In a previous paper (Delacre et al., 2017), we developed many reasons why we think equal population variances are very rare in practice. Moreover, it’s very hard to check the homogeneity-of-variances assumption, because:

– the assumption is about population parameters that we don’t know (σ1 and σ2);

– inferential statements about the homogeneity of variances based on assumption tests are unreliable, because such tests often lack the power to detect violations.
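That second point can be illustrated with a quick simulation (a hypothetical sketch, not from the preprint: it uses scipy’s Levene test, and the sample size and variance ratio are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import levene

# Two normal populations whose variances genuinely differ (ratio 2),
# sampled with a modest n per group; count how often Levene's test
# detects the violation at alpha = .05.
rng = np.random.default_rng(1)
n, reps = 20, 2000
sd1, sd2 = 1.0, np.sqrt(2.0)

rejections = 0
for _ in range(reps):
    x1 = rng.normal(0.0, sd1, n)
    x2 = rng.normal(0.0, sd2, n)
    _, p = levene(x1, x2)  # median-centered by default (Brown-Forsythe variant)
    rejections += p < 0.05

power = rejections / reps
print(power)  # well below the conventional .80 target
```

So even with a genuine twofold difference in population variances, the assumption test usually fails to flag it at these sample sizes.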

Finally, when we look at the figures in our preprint, we notice that when variances are equal across groups, Hedges’ *g* and Hedges’ *g** are either identical (Figure 2) or very close (Figure 3). The only exception is when both skewness and kurtosis are very large. Most of the time, therefore, there is little cost in choosing Hedges’ *g** by default. By contrast, Hedges’ *g* cannot be used when variances are heterogeneous.
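For readers who want to check this numerically, here is a minimal Python sketch (an illustration, not the preprint’s code; it assumes the non-pooled standardizer sqrt((s1² + s2²)/2) for *g** and, as a further assumption, uses the Welch–Satterthwaite df in the small-sample bias correction):

```python
import numpy as np

def hedges_g(x1, x2):
    # Hedges' g: Cohen's d with the pooled SD, times the small-sample
    # bias correction 1 - 3/(4*df - 1), with df = n1 + n2 - 2
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    df = n1 + n2 - 2
    return (np.mean(x1) - np.mean(x2)) / pooled_sd * (1 - 3 / (4 * df - 1))

def hedges_g_star(x1, x2):
    # Hedges' g*: standardizer sqrt((v1 + v2) / 2), which does not pool
    # and so does not assume equal population variances; the bias
    # correction here uses the Welch-Satterthwaite df (an assumption
    # of this sketch)
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    d_star = (np.mean(x1) - np.mean(x2)) / np.sqrt((v1 + v2) / 2)
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return d_star * (1 - 3 / (4 * df - 1))

rng = np.random.default_rng(0)
a = rng.normal(0.5, 1.0, 50)
b = rng.normal(0.0, 1.0, 50)    # same population SD as a
c = rng.normal(0.0, 3.0, 150)   # larger, much more variable group

print(hedges_g(a, b), hedges_g_star(a, b))  # equal variances: nearly identical
print(hedges_g(a, c), hedges_g_star(a, c))  # heterogeneous, unequal n: they part ways
```

With equal variances the two estimates are nearly identical; in the heterogeneous, unequal-n comparison the pooled standardizer is dominated by the larger group, so *g* and *g** visibly diverge.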

**About Shieh’s d and Shieh’s g:** it is indeed very interesting to note that they are not recommended, either for interpretation or for inferential purposes.

**About Glass’s d and Glass’s g:** what makes their use very complicated is the fact that their bias and variance depend on parameters that we cannot control. For example, when distributions are skewed (which is very common, according to Micceri, 1989), the bias and variance of Glass’s *g* change in ways we can neither predict nor correct.

As a consequence, even if Glass’s *g* is, on the surface, easier to interpret, I can hardly see how a measure can be very informative if we cannot control its bias and variance. I realize, as we mentioned in the preprint, that the standardizer of Hedges’ *g** is not easy to interpret per se. I think the easiest way to do so is by comparison with Hedges’ *g*.

One limitation is that Cohen’s *d* is very often interpreted against Cohen’s benchmarks, and, like many of us (I think), I don’t really like them because they are too arbitrary and don’t take context into account. Note, however, that some authors have proposed more appropriate benchmarks (see for example Funder & Ozer, 2019), which describe an effect as small, medium or large *in comparison with* commonly published effects (with a correction to compensate for publication bias). I wish I could provide a more satisfactory solution, and I hope that your blog post will be an opportunity to open debates and bring new insights.

Best regards,

Marie

Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch’s t-test instead of Student’s t-test. International Review of Social Psychology, 30(1), 92–101. https://doi.org/10.5334/irsp.82

Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: An easy way to maximize the accuracy and power of your research. American Psychologist, 63(7), 591. https://doi.org/10.1037/0003-066X.63.7.591

Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168. https://doi.org/10.1177/2515245919847202

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156–166. https://doi.org/10.1037/0033-2909.105.1.156

Zumbo, B. D., & Coulombe, D. (1997). Investigation of the robust rank-order test for non-normal populations with unequal variances: The case of reaction time. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 51(2), 139. https://doi.org/10.1037/1196-1961.51.2.139

Thanks for reporting this. We will work this out for the next release of esci.

In the meantime, users interested in using the esci package directly in R can find updated instructions and code examples here: https://osf.io/d89xg/wiki/tools:%20esci%20for%20R/

Bob

I have run into one persistent error with the subset of esci R functions that offer the option of either raw data or summary data. I have reported this error, along with a simple reproducible example (reprex), on the esci GitHub page here (https://github.com/rcalinjageman/esci/issues/7).

I would be thrilled for any guidance on how to get around this one. I am happy with a short-term band-aid solution, or even to be pointed in the right general direction so I can fork the esci library, implement a fix, and then have students load that version. Whatever is easiest in the midst of pandemic busy-ness!

Thank you so very much for the incredible resource of these powerful and user-friendly new versions of the fabulous ESCI analysis suite!

Joanna,

Welcome to the fascinating world of stats! Fascinating I hope you will find it, and I hope our pictures and simulations help make sense of it all.

‘Normal’ in ESCI lets you investigate the smooth curve of the normal distribution, which you can think of as the distribution of all the infinitely many values in a population. The standard error (SE) arrives once we start sampling from that population, so play with CIjumping: collect the dropping means into the mean heap, turn on the smooth curve on the heap, and the SE is the standard deviation of that smooth curve. See Figs 4.6 to 4.8.
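If it helps to see the same idea outside ESCI, here is a small Python simulation of the mean heap (the population mean, SD, and sample size are arbitrary choices for illustration):

```python
import numpy as np

# Build a 'mean heap': draw many samples of size n from a normal
# population, keep each sample mean, and compare the SD of that heap
# of means with the theoretical standard error, sigma / sqrt(n).
rng = np.random.default_rng(42)
mu, sigma, n, reps = 50.0, 20.0, 30, 100_000

means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(means.std(ddof=1))   # SD of the mean heap
print(sigma / np.sqrt(n))  # theoretical SE, about 3.65
```

The two numbers agree closely, which is the sense in which the SE is simply the SD of the smooth curve on the mean heap.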

Bob and I are currently working hard on the second ed. of ITNS, which will use **esci on the web** (by Gordon Moore), as well as **esci in jamovi** (for data analysis and making cool figures) by Bob. You can start playing with either of these new goodies, at the ESCI menu at our site.

Enjoy, and I hope you do find stats to be fascinating, as well as highly useful and important!

Geoff

Best – Joanna Meringoff

That article was so good.

I am a teacher of children with severe/profound intellectual disabilities. This one study has transformed education policy and perpetuated the belief that everyone can achieve high standards and grade-level work if teachers only believe in them, promoting the idea that a full-inclusion class is best for everyone. Some of my kids have not reached the developmental milestones of an 8-month-old child, yet parents have incredibly unrealistic expectations that we can make their kids typical if we only hold them to higher standards. Another interesting point: this study was conducted with many English-language learners. Wouldn’t language and literacy acquisition account for higher IQ scores from first to second grade, especially if first grade was the first time they were learning English? This is why all of my friends quit teaching: we can’t cure disability and raise a child’s IQ by 50–80 points, but apparently Congress thinks we can.

Thanks Patrick, well said. There’s a famous list of more than 500 wordings that have been used to magically dress up a p > .05 as, pretty much, significant.

https://mchankins.wordpress.com/2013/04/21/still-not-significant-2/

There has even been systematic study of the use of such tricks, with evidence that they have appeared more often over the years.

https://statmodeling.stat.columbia.edu/wp-content/uploads/2016/06/Pvalues.pdf

My question is ‘How do we know that p=.08 is not desperately running AWAY from significance?’

Yet one more reason to consign p values to the dustbin of history.

Geoff

Is there a similar phenomenon of tests with p=0.04 slouching away from significance?
