Fun with esci in R: The simple two-group design

esci is now available as a module in jamovi and as a package in R (JASP coming soon, hopefully). Let’s have some fun with esci in R!

We’ll start with a simple two-group design. Specifically, we’ll use data from Experiment 4 of Kardas and O’Brien (2018). In this study, participants watched a video explaining how to do a simple mirror-tracing task (Cusack, Vezenkova, Gottschalk, & Calin-Jageman, 2015). Participants were randomly assigned to watch the training video either 1 time or 20 times. They then predicted how they would perform on the task (0-100%) and then completed the task (0-100%). Kardas and O’Brien found that watching the training video repeatedly boosted confidence (predicted scores) but not performance.

Opening the data – R

If you haven’t installed esci yet, you can do so with:
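esci is distributed on CRAN, so a standard install works:

```r
# Install esci from CRAN (one-time setup)
install.packages("esci")
```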


Once installed, we load esci into memory and then store the Kardas & O’Brien data set bundled with esci, giving it the name mydata:

library(esci)
mydata <- esci::data_kardas_expt_4

Analyze the data in R

We are going to analyze the effect of video Exposure on Prediction scores. We can do this with the estimate_mdiff_two command. We’ll want to store the result, so tell R to store it in a new variable called estimate.

estimate <- esci::estimate_mdiff_two(
  data = mydata,
  outcome_variable = Prediction,
  grouping_variable = Exposure,
  conf_level = 0.95,
  assume_equal_variance = TRUE
)

Note that we’ve decided to assume equal variance… but it’s probably a better default not to do this, and it’s easy enough to change the command by setting assume_equal_variance to FALSE.
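A minimal sketch of the same analysis without the equal-variance assumption (estimate_welch is just an illustrative name):

```r
# Same estimate, but without assuming equal variances across groups
estimate_welch <- esci::estimate_mdiff_two(
  data = mydata,
  outcome_variable = Prediction,
  grouping_variable = Exposure,
  conf_level = 0.95,
  assume_equal_variance = FALSE
)
```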

Inspect the result

What we get back in R is a list: a complex object that contains other objects. You can inspect this object in lots of different ways, but let’s try listing the objects it contains with names():

> names(estimate)

 [1] "properties"                      "es_mean_difference_properties"  
 [3] "es_mean_difference"              "es_median_difference"           
 [5] "es_median_difference_properties" "es_smd_properties"              
 [7] "es_smd"                          "es_mean_ratio"                  
 [9] "es_mean_ratio_properties"        "es_median_ratio"                
[11] "es_median_ratio_properties"      "overview"                       
[13] "raw_data"             

We can see that our result has properties, and then a bunch of different objects that start with es — that’s short for effect size. We get a mean difference, a median difference, an smd, a mean ratio, and a median ratio. Many of these have their own properties as well. Finally, we also get an overview and raw_data.

Let’s see the overview:

> estimate$overview
  outcome_variable_name grouping_variable_name grouping_variable_level     mean  mean_LL  mean_UL median
1            Prediction               Exposure                       1 56.37795 52.90820 59.84770     60
2            Prediction               Exposure                      20 67.76224 64.49236 71.03212     71
  median_LL median_UL       sd min max q1   q3   n missing  df  mean_SE median_SE
1  53.57284  66.42716 22.07273   0 100 40 71.5 127       0 268 1.762318  3.279225
2  66.12580  75.87420 17.66669   0 100 59 81.0 143       0 268 1.660803  2.486881

You can see that overview is a table — it lists each group found in the data (1x and 20x exposure) and provides basic descriptive statistics: mean with confidence interval, median with confidence interval, standard deviation, etc.

Let’s take a look at the es_mean_difference table:

> estimate$es_mean_difference
        type outcome_variable_name grouping_variable_name effect effect_size        LL
1 Comparison            Prediction               Exposure     20    67.76224 64.492358
2  Reference            Prediction               Exposure      1    56.37795 52.908204
3 Difference            Prediction               Exposure 20 ‒ 1    11.38429  6.616553
        UL       SE  df     ta_LL    ta_UL
1 71.03212 1.660803 268 65.020985 70.50349
2 59.84770 1.762318 268 53.469143 59.28676
3 16.15202 2.421576 268  7.387331 15.38124

You can see that this table gives us the mean and confidence interval of the 20x group, of the 1x group, and of the difference between them, reporting (in row 3) the contrast between the 20x and 1x groups. The 1x group, in this case, is the reference group: we express the effect size relative to the 1x group. The mean difference in prediction scores is 11.38, 95% CI [6.62, 16.15]. We also get the standard error, degrees of freedom, and the confidence interval at two alpha (a 90% CI in this case). Clearly, watching the instructional video made a pretty big difference in predictions: it boosted confidence by over 10 points on a 0-100 scale in this sample! There is some uncertainty about the size of the effect, but overall it seems clear that more video exposure leads to more confidence.

Notice that we have some other ways of expressing the effect size. For one, we can examine median differences–probably a better idea in most cases in psychology, but not widely done.

> estimate$es_median_difference
        type outcome_variable_name grouping_variable_name effect effect_size        LL       UL       SE
1 Comparison            Prediction               Exposure     20          71 66.125803 75.87420 2.486881
2  Reference            Prediction               Exposure      1          60 53.572837 66.42716 3.279225
3 Difference            Prediction               Exposure 20 ‒ 1          11  2.933636 19.06636 4.115567
      ta_LL    ta_UL
1 66.909445 75.09055
2 54.606155 65.39385
3  4.230494 17.76951

This table is set up similarly to es_mean_difference: we again get each group and the contrast between them. There is more uncertainty here (a difference of 11 points, 95% CI [2.93, 19.07]): the data are consistent with a large median difference but also with a fairly small median difference of just 2.9 points (and values near the CI boundary are not very different in their compatibility with the data). So we’d still want to be cautious about concluding there is a meaningful median difference.

Want more ways to express this? Of course! We can also think about the ratios between the group means or medians. Here’s the ratio of medians:

> estimate$es_median_ratio
  outcome_variable_name grouping_variable_name effect effect_size       LL       UL comparison_median
1            Prediction               Exposure 20 / 1    1.183333 1.037629 1.349498                71
  reference_median
1               60

The 20x group had a median 1.18 times that of the 1x group, but the CI is broad [1.04, 1.35], so anything from a very small to a fairly large increase in the median is compatible with these data.

And, of course, psychologists remain a bit obsessed with Cohen’s d. So let’s look at the es_smd table:

> estimate$es_smd
  outcome_variable_name grouping_variable_name effect effect_size       LL        UL numerator denominator
1            Prediction               Exposure 20 ‒ 1   0.5716119 0.327274 0.8149238  11.38429    19.86031
         SE  df  d_biased
1 0.1244027 268 0.5732178

This is a fairly large effect, d = 0.57, 95% CI [0.33, 0.81], and the confidence interval is fairly narrow — we could fairly easily plan a sensitive follow-up study to help confirm and better characterize this effect.

But wait, there are lots of approaches to Cohen’s d… what is the denominator that was used and what flavor of Cohen’s d have we produced? Take a look at es_smd_properties to find out.

> estimate$es_smd_properties
$message
[1] "This standardized mean difference is called d_s because the standardizer used was s_p. d_s has been corrected for bias. Correction for bias can be important when df < 50. See the rightmost column for the biased value."

Ah, so this is d_s, because the standardizer was the pooled standard deviation. If we had set assume_equal_variance to FALSE, we’d have obtained d_avg, which uses the average of the group standard deviations as the standardizer.


We don’t just want a bunch of tables… let’s see this data.

This is easy in esci, we just pass our stored result (estimate) to an appropriate plot function. In this case, we’ll use plot_mdiff to visualize a mean or median difference:

esci::plot_mdiff(
  estimate,
  effect_size = "mean"
)

and we get this beautiful figure:

We can then customize it to our heart’s content (it’s a ggplot2 plot object).
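Since plot_mdiff returns a ggplot2 object, you can layer on the usual ggplot2 tweaks. A small sketch (the axis label and theme here are just illustrative choices):

```r
library(ggplot2)

# Build the difference plot, then customize it like any other ggplot2 object
p <- esci::plot_mdiff(estimate, effect_size = "mean")
p + ylab("Predicted performance (0-100%)") + theme_classic()
```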

Want to see the median difference instead? Here we go:

esci::plot_mdiff(
  estimate,
  effect_size = "median"
)

and we get:

Evaluating a Hypothesis

Although Kardas & O’Brien conducted several studies on video exposure, this was the first study they conducted using mirror tracing as the performance task. Therefore, they probably didn’t yet have a clear quantitative prediction to test — they weren’t really ready for hypothesis testing. Imagine, though, that you are going to conduct a replication study. Based on Kardas & O’Brien, you believe increased video exposure produces a substantive change in confidence, and you decide to define this as at least a 5 point difference in means. In other words, you’re specifying an interval null. The skeptic’s hypothesis (the null hypothesis) is that any difference in confidence will be negligible (< 5 point difference) and your hypothesis is that it will be substantive (>5 point difference).

We can visualize your prediction against the results by tweaking our call to plot_mdiff just a bit:

esci::plot_mdiff(
  estimate,
  effect_size = "mean",
  rope = c(-5, 5)
)

We’ve passed a two-element vector that defines the interval null. This is called a ROPE, or region of practical equivalence. We’ve defined the ROPE using R’s c() function, which combines its arguments into a vector: c(-5, 5) creates a vector with elements -5 and 5, and that vector is passed to the function where it expects the ROPE to be defined.

Here’s what we get:

You can see the ROPE shaded in red and pink, and you can visually compare the results with your prediction and the skeptic’s. The rules for declaring victory are simple: if the whole CI of the result is inside the ROPE, the skeptic wins; if the whole CI is outside, you win; and if there is overlap, it’s a draw. In this case, we can see that the CI on the difference is fully outside the ROPE (though not by a ton). If the ROPE had really been established a priori and a sensitive experiment designed to test the predictions, we’d now have a strong confirmatory indication that there is, indeed, a substantive effect of video exposure on confidence (well, strong statistical evidence; we’d still need to think about the internal and external validity of our study and the extent to which it supports our claim).

Want to conduct the hypothesis test a bit more formally? esci can help with the test_mdiff function, which takes arguments very similar to what we passed to plot_mdiff:

test_results <- esci::test_mdiff(
  estimate,
  effect_size = "mean",
  rope = c(-5, 5)
)

We get back a complex object, but one of its components is a table called interval_null which has this content:

                    test_type outcome_variable_name effect          rope
1 Practical significance test            Prediction 20 ‒ 1 (-5.00, 5.00)
  confidence                                                       CI
1         95 95% CI [6.616553, 16.15202]\n90% CI [7.387331, 15.38124]
              rope_compare p_result
              rope_compare p_result                                  conclusion significant
1 95% CI fully outside H_0 p < 0.05 At α = 0.05, conclude μ_diff is substantive        TRUE


In this example, we conducted a hypothesis test on mean differences, but we could just as easily work with median differences by changing the effect_size argument to “median”. How cool is it to be able to conduct interval null tests of differences in medians!? Think how sophisticated you will feel!
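For example, the median version is the same call with effect_size swapped:

```r
# Interval-null (ROPE) test on the difference in medians
esci::test_mdiff(
  estimate,
  effect_size = "median",
  rope = c(-5, 5)
)
```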


We’ve taken a quick tour of analyzing a two-group design in esci.

esci is still in development. I expect the visualization functions, like plot_mdiff, to change a bit. But the overall workflow should hopefully be stable and sensible: you generate an estimate with an estimate_ function, and you can then visualize it (plot_ functions) and/or evaluate a hypothesis with it (test_ functions). The estimate_ functions produce complex lists with all the results you need: an overview table, various es_ tables reporting different effect sizes, and _properties lists with all the nitty-gritty details. And that’s that!

  1. Cusack, M., Vezenkova, N., Gottschalk, C., & Calin-Jageman, R. J. (2015). Direct and conceptual replications of Burgmer & Englich (2012): Power may have little to no effect on motor performance (J. M. Haddad, Ed.). PLOS ONE. doi: 10.1371/journal.pone.0140806
  2. Kardas, M., & O’Brien, E. (2018). Easier seen than done: Merely watching others perform can foster an illusion of skill acquisition. Psychological Science. doi: 10.1177/0956797617740646

