As I explained, **ITNS2** will be accompanied by Bob’s data analysis software, **esci**, in R, and Gordon’s web-based simulations and tools, all of which are based on, and go beyond, my Excel-based **ESCI**. Together the web-based goodies comprise **esci web**, which you can open in your browser **here**. (Or use the ESCI menu above and choose **esci web** from the dropdown.) From today, **esci web** has four components, with perhaps two yet to come.

**distributions, d picture, and correlation** are visual statistical tools, developed in JavaScript. We’d love to have your feedback.

See the curves, explore *z* scores, find areas, find critical values.

What does *d* = 0.2 look like? How much overlap of distributions? What about *d* = 0.5, 1.0, 1.5, …?

What do you think is the *r* value in each of these scatterplots?

——— Don’t read on just yet. Have an eyeball of the scatterplots. What is each *r*?

——— Last chance… look back up…

OK, the correlation is .3 in all cases. True, if possibly strange. (All the data sets come from a bivariate normal distribution, and in all cases the data set correlation is .3.)

Pro tip: Eyeball, or turn on, a cross through the means, as in the lower right panel. Then eyeball the approximate comparative numbers of dots in the (top right + lower left) quadrants and the (top left + lower right) quadrants. Correlation is a tussle between the first pair (the *matched* quadrants) and the second pair (the *unmatched*).
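If you’d like to check the tussle numerically, here’s a small R sketch (my own illustration, not part of **esci web**): simulate bivariate normal data with population correlation .3 and count the dots in the two pairs of quadrants.

```r
# Sketch (not esci web code): matched vs unmatched quadrant counts for r = .3
set.seed(1)
n <- 1000
rho <- 0.3
xy <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
x <- xy[, 1]
y <- xy[, 2]
# Dots in the matched quadrants (top right + lower left), relative to the means
matched <- sum((x > mean(x) & y > mean(y)) | (x < mean(x) & y < mean(y)))
unmatched <- n - matched
matched; unmatched   # matched wins the tussle, consistent with a positive r
cor(x, y)            # close to .3
```

With a positive correlation the matched quadrants collect clearly more dots, which is exactly the eyeballing heuristic above.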

Investigate that and other cool things in **correlation**.

As I say, access **esci web** **here**, and please let us have your comments.

Enjoy,

Geoff

…as I was asked recently. A question every author loves to hear. The short answer is **ITNS**, preferably to be followed by **ITNS2**, coming in 2021 we hope. Here’s an overview:

Main changes from the first edition: **fabulous new software**:

**esci** (in R) for data analysis and great graphs with CIs, by Bob, and

**esci web** (in JavaScript) for dance simulations and tools, by Gordon Moore.

There’s even more about **Open Science**, and some new examples–timely studies that have used Open Science practices.

The first introductory textbook to combine **the new statistics** (CIs, estimation, meta-analysis) with **Open Science** practices, from the start and all through. Starts at the very beginning, but goes far enough to include meta-analysis, regression, and simple two-way designs. Basic formulas only, many pictures and interactive demos. Lots of examples. Lots of online resources to support teachers and students.

A streamlined version of **ESCI**, software that runs under Excel, is used throughout the book. More information **here**. Read Contents and Chapter 1 **here**. Publisher’s website **here**. Support materials are **here**. **ESCI intro** is **here**. Amazon page **here**.

The original book, aimed at upper year undergraduates through to researchers. Explains in detail why the dichotomous thinking of NHST is damaging and should be replaced by **the new statistics**. **Estimation **and **meta-analysis** are introduced from the start. Some is just a little technical, for example three chapters on meta-analysis. Predates Open Science. No regression. Accompanied by original **ESCI**, running under Excel. More information **here**. Publisher’s website **here**. **ESCI **is **here**. Amazon page **here**.

Whichever you choose, I hope the book, software and all the materials serve you well. Together let’s change the world, towards better research and statistical practices.

Geoff

I’m delighted to report that they have now posted a **preprint** of their results **here**. We’d love to have **your comments and suggestions**.

Max explored six approaches to calculating a CI for the DR. He used simulation to investigate their properties, especially coverage, and identified two that give excellent CIs. He provides (**here**) R code to allow any researcher to calculate the CI on the DR for their own data, for a range of measures. All Max’s simulation materials are available on OSF **here**, so anyone can recreate or extend Max’s work.

Below is Figure 1 from the preprint, as an example of how the DR and its CI may be reported in a forest plot.

In the figure, DR = 1.40 is reported along with three conventional measures of heterogeneity, all with CIs. Both the RE (Random Effects) and FE (Fixed Effect) diamonds are shown in the forest plot, so it’s easy to eyeball DR, which is simply the length of the RE diamond divided by that of the FE diamond. DR = 1 suggests little or no heterogeneity, and increasing values of DR suggest increasing heterogeneity. One vital message is given by the CI on the DR, which is [0, 3.09], so this meta-analysis, which integrates only 10 studies, can give us only a very imprecise estimate of heterogeneity.
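The arithmetic of the DR is simple enough to sketch in a couple of lines of R, with made-up CI endpoints (not the values from the figure):

```r
# Diamond Ratio arithmetic, with invented CI endpoints (not the figure's data)
re_ci <- c(0.10, 0.52)   # hypothetical 95% CI on the RE meta-analytic mean
fe_ci <- c(0.16, 0.46)   # hypothetical 95% CI on the FE meta-analytic mean
DR <- diff(re_ci) / diff(fe_ci)   # length of RE diamond / length of FE diamond
DR   # 1.4: the RE diamond is 40% longer, suggesting some heterogeneity
```

Values of DR above 1 arise because the RE model adds between-study variance to the within-study variances, lengthening the RE diamond.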

Along with the DR, the figure reports the 95% prediction interval (PI) for true effect sizes as a further estimate of heterogeneity. Borenstein et al. (2017) advocated use of the PI, which is reported here to be 0.285. The red line segment just under the RE diamond pictures that length. Informally, that segment illustrates the likely extent of spread of true effect sizes. The length of the PI is approximately 4 × *T*, where *T* is the estimated population SD of true effect sizes. The very long CI reported for *T* indicates once again a very imprecise estimate of heterogeneity.

In the preprint we conclude that the DR, and its CI, can be valuable for students as they learn about meta-analysis, and for researchers as they interpret and communicate their meta-analyses.

**It would be great to have any comments about Max’s work and the preprint. Thanks!**

Geoff

Max: mrcairns994@gmail.com; Geoff: g.cumming@latrobe.edu.au

Borenstein, M., Higgins, J. P., Hedges, L. V., & Rothstein, H. R. (2017). Basics of meta-analysis: *I*^{2} is not an absolute measure of heterogeneity. *Research Synthesis Methods, 8*, 5-18. doi:10.1002/jrsm.1230

We are now releasing Gordon’s **dances** in beta, and seek your feedback. Developed in JavaScript, **dances** opens in your browser via **this link**. ITNS2 will be accompanied by Bob’s data analysis software, **esci**, in R, and Gordon’s web-based simulations, all of which are based on, and go beyond, my Excel-based **ESCI**. The first and most important of Gordon’s simulations is **dances**, which replaces and goes beyond **CIjumping** in ESCI.

Below are four examples of **dances** bringing key statistical ideas alive. These are frozen images: It’s **way** more convincing watching the simulations dancing down the screen.

Getting started with **dances**:

- Open **dances** in a browser
- Click on the ‘**?**’ at top right in the control panel (left side of screen) to turn on popout tips, which give brief explanations when the mouse hovers over labels or controls.
- Use the three big buttons. Play as you wish. Click ‘Clear’ to start again.

Take repeated samples of size *N* = 20 from the pictured normally distributed population. Watch the pattern of values (blue open circles) jump around from sample to sample. Watch the means (green dots) from successive samples dance down the screen: So much variation, even with samples of size 20! This is the **dance of the means**.

Place 95% CIs on each of the dancing means, again with samples of *N* = 20. CIs that don’t capture the population mean, mu (blue line), are red. In the short term, red CIs seem to come very haphazardly, sometimes rarely, sometimes in clumps. In the long term, however, very very close to 95.0% of CIs will capture mu and 5.0% will be red.

This happens when CIs are all the same length, being based on the population SD, sigma, assumed known. Remarkably, it also happens when, as in the picture below, CIs vary in length because they are based on sample SDs, when sigma is assumed not known. Either way, we are seeing the **dance of the CIs**.
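That long-run claim is easy to check numerically. Here’s a quick R sketch (my own, not Gordon’s code) of the sigma-not-known case, where each CI is built from its own sample SD:

```r
# Long-run CI coverage check (a sketch, not Gordon's dances code):
# repeatedly sample N = 20 from N(mu = 50, sigma = 20), build a 95% CI
# from the sample SD, and count how often the CI captures mu.
set.seed(2)
mu <- 50; sigma <- 20; N <- 20; reps <- 10000
capture <- replicate(reps, {
  y <- rnorm(N, mu, sigma)
  moe <- qt(.975, N - 1) * sd(y) / sqrt(N)   # CI half-length from sample SD
  (mean(y) - moe) < mu && mu < (mean(y) + moe)
})
mean(capture)   # very close to .95 in the long run
```

In any short stretch of the dance the proportion bounces around, but over many repeats it settles very close to .95.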

The falling means pile up to form the **mean heap**; means in the heap keep their colour, red or green. In the long run, the mean heap shape will closely match the theoretically expected, normally distributed, sampling distribution curve.

The **central limit theorem** states that, almost whatever the shape of the population distribution, the sampling distribution of sample means will be approximately normal. Furthermore, the larger the samples, the closer the sampling distribution will be to normal.

In **dances** you can draw whatever weird shape of population distribution you choose, then take samples of some chosen size, *N*, and compare the mean heap with the normal curve.

The figure below shows that, even with my hand-drawn, highly skewed population, and samples as tiny as *N* = 3, the mean heap is much less skewed than the population, and surprisingly close in shape to the symmetric normal curve.
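The same point can be made without pictures. Here’s a small R sketch (mine, using an exponential population as a stand-in for my hand-drawn skewed one), comparing the skewness of the population with the skewness of the mean heap for *N* = 3:

```r
# CLT sketch (not dances code): a skewed population vs the mean heap for N = 3
set.seed(3)
skew <- function(v) mean(((v - mean(v)) / sd(v))^3)   # simple skewness index
pop   <- rexp(100000)                     # strongly right-skewed "population"
means <- replicate(10000, mean(rexp(3)))  # mean heap: means of samples of N = 3
skew(pop)    # roughly 2: heavy right skew
skew(means)  # noticeably smaller: the mean heap is much less skewed
```

Even at *N* = 3 the mean heap is clearly less skewed than the population; larger *N* pulls it closer still to the normal curve.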

Run a replication, exactly the same as the original experiment but with a new sample, and find that the *p* value is likely to be very different. The sampling variability of the *p* value is surprisingly large: Alas, we simply shouldn’t trust any *p* value.

The figure below shows the **dance of the CIs** and the corresponding *p* values—which vary from <.001 to more than .8! Deep blue patches mark *p*>.10, through to bright red patches for *p*<.001. This is the **dance of the p values**!

Population mean, mu, is 60, and SD, sigma, is 20. The null hypothesis is H0: mu0 = 50, so the effect size in the population is half of sigma, or Cohen’s delta = 0.50, conventionally considered to be a medium-sized effect. With *N* = 16, the power is about .50, which is typical for many research fields in psychology and some other disciplines.
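For a rough text-only version of this dance, here’s an R sketch (my own, not **dances** itself) using the parameters above:

```r
# Dance of the p values, text-only sketch (not dances code):
# mu = 60, sigma = 20, H0: mu0 = 50, N = 16, so delta = 0.5
set.seed(4)
ps <- replicate(10000, t.test(rnorm(16, mean = 60, sd = 20), mu = 50)$p.value)
mean(ps < .05)                # power: close to .5
quantile(ps, c(.1, .5, .9))   # p varies wildly from replication to replication
mean(ps < .001)               # the occasional bright-trumpet result...
mean(ps > .10)                # ...alongside plenty of deep-trombone results
```

Identical experiments, wildly different *p* values: that’s the dance.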

The running simulation is way more vivid than any picture, especially when sounds are turned on, ranging from a bright trumpet for *p*<.001 down to a deep trombone for *p*>.10.

Change *N*, or population effect size, and see generally lower or higher *p* values but, most surprisingly, in every case the values of *p* still jump around dramatically.

For videos of such dances, search YouTube for ‘**dance of the p values**’ and ‘**significance roulette**’.

Figures and dances like those shown here will come in Chapters 4, 5, and 6 in ITNS2.

Meanwhile, please have a play with Gordon’s wonderful **dances** and let us have your thoughts and suggestions. Thanks.

Geoff

The **dance of the p values** was my first go at making vivid the amazingly large sampling variability of the *p* value.

But that’s when we know the population mean. What about a more realistic situation, when all we know from the initial experiment is the *p* value? What is a close replication, just the same but with a new sample, likely to give? In other words, what’s **replication p**?

In most cases replication *p* can take just about any value.

For explanation and all the formulas, see **this paper** (cited 375 times). For the demo, search YouTube for ‘Significance Roulette’ to find two videos. Or they are **here** and **here**.

The figure above is a wheel that’s equivalent to **the distribution of replication p** following an initial experiment that gives *p* = .05.

Of course, if the initial *p* value is different we’ll need a different wheel. Below is the wheel for initial *p* = .01. More red (*p* < .001) and less deep blue (*p* > .10), but still an alarmingly wide spread of possibilities.

You don’t believe me? It took me ages to accept what the formulas and simulations were telling me. But they are correct. We really, really can’t trust a *p* value–which seems to promise certainty, a clear outcome. Far, far better to use the **confidence interval**, whose length makes the degree of uncertainty salient, even tho’ that’s often a depressing message.

Geoff

From Fiona’s review:

“Together with his overview of the replication crisis, this introduction would be useful for undergraduates or general readers.

“Fraud, bias, negligence and hype are the themes of *Science Fictions*. Some of the cases Ritchie presents… are intriguing and disturbing combinations of all four.

“This comprehensive collection of mishaps, misdeeds and tales of caution is the great strength of Ritchie’s offering.”

Then come the **really interesting bits**, including discussion of Ritchie’s interpretation of the problem and his view of what science should be and what he sees it as having become. Fiona can expertly set that all in perspective, as a scholar of the history and philosophy of science, and now metascience. Definitely worth the read. She’d recommend the book also, despite some flaws.

Geoff

Ritchie, S. (2020). *Science fictions: How fraud, bias, incompetence, and hype undermine the search for truth*. Metropolitan Books.

This post describes the approach I’m considering–if you have any feedback, I’d be very happy to have it.

**The goals**

I’m writing functions that will provide estimates with CIs for different effect sizes (means, mean differences, R, proportions, etc.)

**Goal 1** – I’d like all functions to work with both summary and raw data. For example, I want users to be able to do:

`estimate_mean(mtcars, mpg, conf.level = 0.99)`

as well as

`estimate_mean(m = 10, s = 2, n = 5)`

**Goal 2** – I’d like to try to make the functions compatible with tidyverse style and tidy evaluation (to the extent I understand either of these). So, for tidy style my goal is to use snake_case for function names. For tidy evaluation, it should be possible to send in column names unquoted, and to arbitrarily expand the number of columns to be processed. As in:

`estimate_mean(data = mtcars, y = mpg, cyl, wt, gear)`

**Goal 3** – I’d like the package to be clean and easy to use. I don’t want different functions for slightly different styles of input (raw data vs. summary data). I **do** want the auto-fill for a function to be informative.

**Goal 4** – I’d like the code to be efficient and easy to maintain. My approach has been to write a basic function that deals with summary data only, and then to use that as the basis for processing raw data, hopefully avoiding code duplication.

**Current Implementation**

To meet these goals, I’ve been tinkering around with R’s S3 “classes” and the UseMethod dispatcher that enables you to route function calls to different implementations based on the types of objects passed. I’ve found this to be quite a journey. I’ve devised a working approach (code listing is below), but I’m eager for feedback, as it feels a bit icky–I am not confident that I’m dealing with unquoted arguments well/properly.

**Detecting unquoted arguments… is there a better way?** To dispatch properly, I’d like to detect if the user has passed in a column name as an unquoted argument (rather than, say, a vector). However, there doesn’t seem to be a way to dispatch on an unquoted argument (they don’t have a natural class, and checking the class before quoting causes an error). You can quote first, but literally anything can be quoted, so you don’t end up with a substantively different object if the original argument was unquoted vs. not. There does not seem to be an ‘is.unquoted’ check or the like. So I’ve hit upon testing by using:

`qpassed <- try(eval(y), silent = TRUE)`

If y is an unquoted column name, this eval throws an error, and qpassed obtains the class “try-error”. This feels a bit icky. And it doesn’t distinguish an unquoted column name from a typo in an otherwise valid expression.
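For comparison, here’s a base-R sketch (my own toy code, not esci) that decides using missing() and substitute(), so y is never evaluated at all, and no try() is needed:

```r
# Toy alternative (not esci code): classify the input without evaluating y.
# missing() tells us whether y was supplied; substitute() captures its
# expression unevaluated, so an unquoted column name never errors.
detect_input <- function(data = NULL, y = NULL, m = NULL) {
  if (!is.null(m)) return("summary")        # summary statistics supplied
  if (missing(y)) return("summary")         # no y at all
  if (is.null(data)) return("vector")       # y should be an actual vector
  deparse(substitute(y))                    # column name, y never evaluated
}

detect_input(m = 10)                  # "summary"
detect_input(y = c(1, 2, 3))          # "vector"
detect_input(data = mtcars, y = mpg)  # "mpg"
```

It still can’t tell a misspelled column name from a deliberate symbol, but it sidesteps the try(eval()) trick entirely.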

**Is it OK to dispatch based on a calculated switch rather than what is actually passed?** In the set of functions I envision, the real key to how to dispatch is the format of y. If y is a vector, I want to dispatch to a function that estimates from a vector. If y is an unquoted column name, I want to dispatch to a function that handles a column from a data frame. If y is the beginning of a list of unquoted column names, I want to dispatch to a function that handles that list. And finally, if y is null it means summary data has been passed in, so I want to dispatch accordingly. The thing is, though, you **can’t** dispatch on y, because it might be an unquoted column name, and these don’t have a class, can’t be evaluated before quoting, and therefore won’t be dispatched properly. I was stuck, and **then** it dawned on me that I could use the dispatching function to determine what type of y has been passed, and dispatch accordingly. What’s really strange is that I could code the class of y in another variable (I called it switch) and ask UseMethod to use this calculated variable even though it wasn’t passed–and it does! And yet it does not actually pass the dispatching variable to the function that gets called! That’s not like any other type of class dispatch method I’ve ever seen… it seems to work, but feels kind of wrong.

**Is there no way to tell a vector from a single number?** I was surprised when writing the dispatch function to pass in a vector and have the class show up as numeric. It looks like R doesn’t assign a vector class to vectors. Is there some way to distinguish single numbers from vectors? (length?)

**as_label rather than quo_name** – it looks like quo_name is being retired in favor of as_label… Not really a question, just a note to myself to update my code for this.
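Here’s a toy example (mine, not esci code) of that surprising behaviour: UseMethod dispatches on a locally computed object, yet the chosen method still receives the generic’s original arguments:

```r
# Toy demo (not esci code): UseMethod dispatches on a local 'switch'
# variable, but the method still receives the generic's original arguments.
dispatch_demo <- function(x) {
  switch <- 1
  class(switch) <- if (is.character(x)) "char_input" else "other_input"
  UseMethod("dispatch_demo", switch)   # dispatch on switch, not on x
}
dispatch_demo.char_input  <- function(x) paste("string:", x)
dispatch_demo.other_input <- function(x) paste("non-string of length", length(x))

dispatch_demo("mpg")       # "string: mpg"
dispatch_demo(c(1, 2, 3))  # "non-string of length 3"
```

The two-argument form UseMethod(generic, object) is documented behaviour: method selection uses class(object), while the call is re-made with the original arguments.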

**Current Implementation: **

Here’s the dispatch function I’ve generated for estimating a single mean. It dispatches to one of 4 functions: 1) one that works with summary data (.numeric), 2) one that works with a vector (.vector), 3) one that works with a single column name of data (.character), or 4) one that works with a list of column names (.list). Although the dispatch function itself feels icky to me, I do like that these 4 functions are built stacked on top of each other (the list calls the single column, which calls the vector, which calls the numeric). I *think* that’s the way to go to avoid code duplication and make maintenance easy–but happy for feedback on this as well.

This is just a skeleton that I will use/elaborate for other functions. I am still planning on adding assertive-based input checking, and providing a rich, well-named, and consistent output object.

```
estimate_mean <- function(data = NULL,
                          y = NULL,
                          ...,
                          m = NULL,
                          s = NULL,
                          n = NULL,
                          conf.level = 0.95) {
  switch <- 1
  if (!is.null(m)) {
    # Summary data: switch keeps its implicit "numeric" class
    if (!is.null(data)) stop("You have passed summary statistics, so don't pass the 'data' parameter used for raw data.")
    if (!is.null(y)) stop("You have passed summary statistics, so don't pass the 'y' parameter used for raw data.")
  } else {
    if (!is.null(s)) stop("You have passed raw data, so don't pass the 's' parameter used for summary data.")
    if (!is.null(n)) stop("You have passed raw data, so don't pass the 'n' parameter used for summary data.")
    qpassed <- try(eval(y), silent = TRUE)
    if (!inherits(qpassed, "try-error") && is.numeric(qpassed)) {
      if (!is.null(data)) stop("You have passed y as a vector of data, so don't pass the 'data' parameter used for data frames.")
      class(switch) <- "vector"
    } else {
      dotlist <- rlang::quos(...)
      if (length(dotlist) != 0) {
        switch <- list()
      } else {
        switch <- "character"
      }
    }
  }
  UseMethod("estimate_mean", switch)
}

#' @export
estimate_mean.numeric <- function(m, s, n, conf.level = 0.95) {
  sem <- s / sqrt(n)
  moe <- qt(1 - (1 - conf.level) / 2, n - 1) * sem
  ci.low <- m - moe
  ci.high <- m + moe
  res <- list(m = m,
              sem = sem,
              moe = moe,
              ci.low = ci.low,
              ci.high = ci.high)
  res$formatted_mean <- stringr::str_interp(
    "mean = $[.2f]{m} ${conf.level*100}% CI [$[.2f]{ci.low}, $[.2f]{ci.high}]"
  )
  class(res) <- "estimate"
  return(res)
}

#' @export
estimate_mean.vector <- function(y, conf.level = 0.95) {
  y <- na.omit(y)
  m <- mean(y)
  s <- sd(y)
  n <- length(y)
  res <- estimate_mean.numeric(m, s, n, conf.level = conf.level)
  return(res)
}

#' @export
estimate_mean.character <- function(data, y, conf.level = 0.95) {
  y_enquo <- rlang::enquo(y)
  y_quoname <- rlang::quo_name(y_enquo)
  res <- estimate_mean.vector(data[[y_quoname]], conf.level = conf.level)
  return(res)
}

#' @export
estimate_mean.list <- function(data, ..., conf.level = 0.95) {
  res <- list()
  dotlist <- rlang::quos(...)
  for (y_var in dotlist) {
    y_name <- rlang::quo_name(y_var)
    res[[y_name]] <- estimate_mean.character(data = data, y = !!y_name, conf.level = conf.level)
  }
  class(res) <- "estimate list"
  return(res)
}
```

So… if any R package developers have thoughts or suggestions I’d be glad to have them. Thanks in advance.

Sometimes, of course, we can be delighted to find evidence that an effect is zero-to-negligible.

…as if good Open Science researchers would ever let their emotions intrude on their work.

Geoff

*[Update 7/4/2020 – Added reference to preprint on Cohen’s d for paired designs and put code in an actual code block]*

Lots of research questions boil down to estimating the difference between two means (*M*_{diff} = *M*_{group_of_interest} – *M*_{reference_group}). This is the ‘raw score’ effect size–it reports the difference between groups on the original scale of measurement. Usually, that’s all you need (plus an estimate of uncertainty). Sometimes, though, it’s nice to also obtain a standardized effect size, one that does not depend on the scale of measurement. In these cases, Cohen’s d is the go-to measure:

Cohen’s *d* = *M*_{diff} / *sd*_{but_which_sd?}

Cohen’s d turns out to be freaking complicated. First, there are issues with how to standardize the mean difference (which sd do you use?). This bumps up against the thorny issue of whether it is reasonable to assume equal variance. Then there’s the fact that Cohen’s d from a sample is slightly upwardly biased, so it needs to be corrected for bias, which causes some people to relabel it as Hedges’ g. And in case that wasn’t confusing enough, there’s an additional issue of how best to estimate the confidence interval of d. There are lots of solutions (some good, some bad), and most stats tools aren’t very clear on which approach they are using. That’s a surprising amount of complexity for what one would have hoped would be an easy standardization of effect size.
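To make the pieces concrete, here’s a hand-rolled sketch of the point estimate only, with made-up summary numbers (the CI is the genuinely hard part, and is what the tools below handle):

```r
# Sketch of the point estimate only, with invented summary data
m1 <- 10; m2 <- 15; s1 <- 2; s2 <- 2; n1 <- 20; n2 <- 20

# Pooled SD: the usual denominator choice under equal variance
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
d <- (m2 - m1) / sd_pooled                  # Cohen's d, pooled denominator

# Small-sample bias correction (this corrected value is the one
# sometimes relabelled Hedges' g)
J <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
d_unbiased <- d * J
d; d_unbiased   # d_unbiased is slightly smaller than d
```

Note this sketch makes the two big choices silently: pooled SD as denominator, and the approximate correction factor J; the point of the tools below is that they make those choices explicit and also give you a trustworthy CI.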

In this blog post I am not going to wade through all these complexities. Instead, I will demonstrate three different ways you can easily obtain Cohen’s d and its CI. Each of these approaches will be very transparent about the all-important choice of the denominator (Lakens, 2013). Each uses the technique of Goulet-Pelletier & Cousineau (2018), which simulation studies suggest is generally the best approach (though perhaps not for paired designs–see the section on “approaches” at the end for details). In all cases, we are going to assume equality of variance between groups/measures–it turns out that without this assumption the CI on d becomes problematic.

All three of the approaches I’ll explain are based around the esci package for R that I (Bob) am currently working on (https://github.com/rcalinjageman/esci). As of 7/3/2020 this package is a rough draft–I’m now working through it to make the code beautiful (to the extent I can). You can use it as-is with some confidence–but be warned that I am still tinkering and may yet make breaking changes to the package. I don’t have much documentation yet (does this page count?), but you can find a basic walk through the package here: https://osf.io/d89xg/wiki/tools:%20esci%20for%20R/

**Method 1 – Use esci in jamovi**

Let’s start with the easiest option: using a GUI. jamovi is a delightful point-and-click program for statistical analysis (https://www.jamovi.org/). It’s free, it’s open source, it runs on any platform (even Chromebooks), and it’s extensible with modules. I’d call it an SPSS replacement, but it is so much better than that. jamovi is built on R, so you can obtain R syntax for everything you do in jamovi (just turn on “Syntax mode”). Seriously, jamovi is great.

The esci package I’ve developed for R runs as a module in jamovi. Just: 1) run jamovi, 2) click the modules button near the top-right corner, 3) access the jamovi library, and 4) scroll down to esci and click install. You’ll now have an esci menu in your jamovi program (and it will stay there until you remove it–you only need to install a module once per machine). There are screen-by-screen instructions here: https://thenewstatistics.com/itns/esci/jesci/

Once done, you can obtain Cohen’s d for both independent and paired designs, and you can do so from raw data or from just summary data. The commands to use are:

- esci -> estimate independent mean difference (the estimation version of an independent t-test), or
- esci -> estimate paired mean difference (the estimation version of a paired t-test)

For example, here I’ve selected “estimate independent mean difference”. In the analysis page that appears I’ve selected the toggle-box for “summary data”. I then typed in the means, standard deviations, and sample sizes for my two groups. In an instant, I get output which includes Cohen’s d and its CI.

Here’s a close-up of the output for Cohen’s d:

*d*_{unbiased} = 0.91, 95% CI [0.30, 1.63]. Note that the standardized effect size is *d*_{unbiased} because the denominator used was *SD*_{pooled}, which had a value of 2.15. The standardized effect size has been corrected for bias. The bias-corrected version of Cohen’s d is sometimes also (confusingly) called Hedges’ g.

esci explains to you what denominator was used and its value (important) and it clarifies that correction for bias has been applied. One thing missing (for now) is a reference for the approach to obtaining the CI, which really matters… I’ll fix that soon. Maybe there is additional information that would be useful? If so, let me know.

**Method 2 – Obtain Cohen’s d in R from summary data with estimateStandardizedMeanDifference**

Maybe you are an R power user and you just can’t even when it comes to using a GUI for data analysis. No problem. esci is a package in R. It’s not on CRAN (and probably won’t be for some time), but you can obtain it directly from github using the devtools package. Then you can use the function estimateStandardizedMeanDifference. Here’s a detailed code example that includes everything you would need to download and install the package:

```
# Setup -------------------------------------------
# First, make sure required packages are installed.
if (!is.element("devtools", installed.packages()[, 1])) {
  install.packages("devtools", dep = TRUE)
}
if (!is.element("esci", installed.packages()[, 1])) {
  devtools::install_github(repo = "rcalinjageman/esci")
}

# Second, load the required libraries
library("esci")

# Third, get some Cohen's d
# Get d directly from summary data for a two-group design
estimate <- estimateStandardizedMeanDifference(m1 = 10,
                                               m2 = 15,
                                               s1 = 2,
                                               s2 = 2,
                                               n1 = 20,
                                               n2 = 20,
                                               conf.level = .95)
estimate

# Get d directly from summary data for a paired design
estimate <- estimateStandardizedMeanDifference(m1 = 10,
                                               m2 = 15,
                                               s1 = 2,
                                               s2 = 2,
                                               n1 = 20,
                                               n2 = 20,
                                               r = 0.80,
                                               paired = TRUE,
                                               conf.level = .95)
estimate

# Or, use raw data to estimate the raw mean difference with CI *and* d with CI
# This boring example uses mtcars
data <- mtcars
data$am <- as.factor(data$am)
levels(data$am) <- c("automatic", "manual")
estimate <- estimateMeanDifference(data, am, mpg,
                                   paired = FALSE,
                                   var.equal = TRUE,
                                   conf.level = .95,
                                   reference.group = 1)
estimate
plotEstimatedDifference(estimate)
```

Note that for the paired data I passed a flag (paired = TRUE) and *also* an r value–that’s the correlation between the paired measures. If you don’t have it, you can often calculate it from summary data and the t-test results.
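For example, if you have the paired t statistic, one way to back out r (a sketch using standard paired-t relationships and made-up numbers; check it against your own output):

```r
# Recover r for a paired design from summary data plus the paired t statistic.
# All numbers here are invented for illustration.
s1 <- 2; s2 <- 2; n <- 20        # SDs and sample size of the two measures
m_diff <- 5                      # mean of the difference scores
t_stat <- 17.68                  # paired t statistic, t = m_diff / (s_diff / sqrt(n))

s_diff <- abs(m_diff) * sqrt(n) / abs(t_stat)   # SD of the difference scores
r <- (s1^2 + s2^2 - s_diff^2) / (2 * s1 * s2)   # from var(d) = s1^2 + s2^2 - 2*r*s1*s2
r   # about .80 with these made-up numbers
```

The second line just inverts the paired-t formula to get the SD of the differences; the last line rearranges the variance-of-a-difference identity to isolate r.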

I have omitted the output here because it follows the exact same format as for jamovi above (after all, it’s running the same code under the hood).

**Method 3 – Obtain Cohen’s d and its CI from raw data with estimateMeanDifference**

Finally, let’s obtain Cohen’s d from raw data–and in the process obtain the raw-score mean difference and a nice plot that emphasizes the raw data and the effect size and its uncertainty.

Here’s a very uninspired example using the mtcars dataset–sorry it’s not very exciting, but there aren’t a lot of fun built-in datasets in R. We’ll compare the miles per gallon (mpg) for automatic vs. manual cars. The type of transmission is in the column “am” and it is coded as a numeric 0 (automatic) or 1 (manual). In this example I will make it into a factor (esci requires that a grouping variable be a factor) and relabel it to make the output easier to understand.

Again I’ve made the code complete, including everything needed to ensure esci is installed.

```
# Setup -------------------------------------------
# First, make sure required packages are installed.
if (!is.element("devtools", installed.packages()[, 1])) {
  install.packages("devtools", dep = TRUE)
}
if (!is.element("esci", installed.packages()[, 1])) {
  devtools::install_github(repo = "rcalinjageman/esci")
}

# Second, load the required libraries
library("esci")

# Now make a copy of mtcars and convert am to a labelled factor
data <- mtcars
data$am <- as.factor(data$am)
levels(data$am) <- c("automatic", "manual")

# Prepare yourself for some Cohen's d (and a nice plot)
estimate <- estimateMeanDifference(data, am, mpg,
                                   paired = FALSE,
                                   var.equal = TRUE,
                                   conf.level = .95,
                                   reference.group = 1)
estimate
plotEstimatedDifference(estimate)
```

As you can see above, we use this function by passing the dataframe (data), the grouping variable (am) and the outcome variable (mpg). The reference.group parameter is optional–it specifies which level of your grouping variable factor that should serve as the reference group when calculating the effect size (*M*_{diff} = *M*_{group_of_interest} – *M*_{reference_group}). If you leave this parameter out, esci will use the first level of your grouping variable.

Again, the output for Cohen’s d follows the same format as above, so I’m not going to go through it. But check out the cool plot:

**Approaches**

There are a number of different ways to estimate the CI on Cohen’s d. esci uses the method explained by Goulet-Pelletier & Cousineau (2018). I’ll expand this blog post at some point to explain it, but for now I strongly suggest reading the actual paper–it not only clearly explains the approach but also compares it against many other options, including the ones used in popular R packages (see the appendix)… it turns out not all R packages emit good-quality CIs for d!

I’m deeply indebted to these authors–I was able to adapt the code they provided into esci and they have repeatedly helped answer questions to improve the function.

One big issue, though – a recent preprint I found on ResearchGate suggests that **all** approaches to obtaining a CI for d may fail with paired designs (Fitts, 2020). I’m still digesting this, and waiting to see the peer-reviewed version. But it is probably worth some extra caution with CIs for a paired design–the preprint shows they can have poor capture rates when r is very strong.

**To Read**

- Goulet-Pelletier, J.-C., & Cousineau, D. (2018). A review of effect sizes and their confidence intervals, Part I: The Cohen’s d family. *The Quantitative Methods for Psychology, 14*(4), 242–265. doi: 10.20982/tqmp.14.4.p242
- Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. *Frontiers in Psychology*. doi: 10.3389/fpsyg.2013.00863

Materials, including slides, are at https://osf.io/d89xg/

See the first slide for links to the data, and for help (it’s dead easy) getting esci running in jamovi. Also available from our site: https://thenewstatistics.com/itns/esci/jesci/

Bob walked us through how **esci** (now in R) makes it easy to analyse data using estimation, for several different measures and designs. Extra data sets allowed us all to do it ourselves, and enjoy the great esci **pictures with CIs**.

He started by explaining why **teaching **this way is more fun, more successful, and leads to happier students. esci will be the cornerstone advance in the second edition of ITNS, which we’re working on right now.

Enjoy,

Geoff
