Effect Sizes for Open Science

For the last 20 years or so, many journals have emphasised the reporting of effect sizes. The new statistics also emphasised reporting CIs on those effect sizes. Now Open Science places effect sizes, CIs, and their interpretation centre stage.

Here’s a recent article with interesting discussion and much good advice about using effect sizes in an Open Science world:
Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208-225. http://dx.doi.org/10.1037/met0000126

The translational (simplified) abstract is at the bottom of this post.

Unfortunately, Psychological Methods is behind the APA paywall, so you will need to find the article via a library. (Sometimes it’s worth searching for the title of an article, in case someone has secreted the pdf somewhere online. Not in this case at the moment, I think.)

A couple of the article’s main points align with what we say in ITNS (Introduction to the New Statistics):

Give a thoughtful interpretation of effect sizes, in context
Choose effect sizes that best answer the research questions and that make most sense in the particular situation. Often interpretation is best done in the original units of measurement, assuming these have meaning–especially to likely readers. Use judgment, compare with past values found in similar contexts, give practical interpretations where possible. Where relevant, consider theoretical and possibly other aspects or implications. (And, we add in ITNS, consider the full extent of the CI in discussing and interpreting any effect size.)
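To make the point about original units and the full CI concrete, here is a minimal Python sketch. It is not from the article or from ITNS. The function name mean_diff_ci and the reaction-time numbers are made up for illustration, and the interval uses a normal approximation, which is an assumption of this sketch: with small samples, a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, variance


def mean_diff_ci(group_a, group_b, level=0.95):
    """Difference between two independent means, in original units,
    with an approximate CI (normal approximation, not t-based)."""
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference; variance() uses the n - 1 denominator.
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return diff, diff - z * se, diff + z * se


# Hypothetical reaction times in milliseconds (invented data).
control = [512, 498, 530, 541, 505, 523, 517, 534, 509, 526]
treated = [489, 502, 476, 495, 510, 481, 499, 487, 493, 505]

diff, lower, upper = mean_diff_ci(control, treated)
print(f"Difference = {diff:.1f} ms, 95% CI [{lower:.1f}, {upper:.1f}] ms")
```

The printed interval is in milliseconds, so both limits, not just the point estimate, can be interpreted in terms a reader already understands.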

Use standardised effect sizes with great care
Standardised (or units-free) effect sizes, often Cohen’s d or Pearson’s r, can be invaluable for meta-analysis, but it’s often more difficult to give them practical meaning in context. Beware glib resort to Cohen’s (or anyone else’s) benchmarks. Be very conscious of the measurement unit, which for d is the standardiser. If, as usual, that’s a standard deviation estimated from the data, it has sampling variability and will be different in a replication. (In my first book, UTNS (Understanding the New Statistics), I introduced the idea of the rubber ruler. Imagine the measuring stick to be a rubber cord, with knots at regular intervals to mark the units. Every replication results in the cord being stretched to a different extent, so the knots are further apart or closer together. The Cohen’s d value is measured in units of the varying distance between knots.)
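The rubber ruler can be seen in a small simulation. The sketch below is only an illustration, not code from the article or from either book. It assumes two independent groups drawn from normal populations with the same SD, uses the pooled sample SD as the standardiser, and all the names and population values (a true difference of 5 and a true SD of 10, so a population d of 0.5) are invented for the example.

```python
import random
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled
    sample SD as the standardiser. Returns (d, mean_diff, pooled_sd)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / pooled_sd, mean_diff, pooled_sd


def replicate(n_per_group, true_diff, true_sd, rng):
    """Simulate one replication of the same two-group study."""
    control = [rng.gauss(0.0, true_sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_diff, true_sd) for _ in range(n_per_group)]
    return cohens_d(treated, control)


rng = random.Random(2018)  # fixed seed so the run is repeatable
results = [replicate(20, 5.0, 10.0, rng) for _ in range(5)]

for i, (d, diff, sd) in enumerate(results, start=1):
    print(f"Replication {i}: difference = {diff:6.2f}, standardiser = {sd:5.2f}, d = {d:5.2f}")
```

Each replication prints a different standardiser: that is the cord being stretched to a different extent. The d values vary because both the difference and the ruler used to measure it vary from sample to sample.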

There’s also lots more of interest in this article, imho.

Translational Abstract
We present general principles of good research reporting, elucidate common misconceptions about standardized effect sizes, and provide recommendations for good research reporting. Effect sizes should directly answer their motivating research questions, be comprehensible to the average reader, and be based on meaningful metrics of their constituent variables. We illustrate our recommendations with four different empirical examples involving popular statistical methods such as ANOVA, categorical variable analysis, multiple linear regression, and simple mediation; these examples serve as a tutorial to enhance practice in the research reporting of effect sizes.
