statpsych for Estimation in R

One reason p values remain so popular is that they are so easy to produce — there is a whole ecosystem of software that will help you churn out endless hypothesis tests, even if the researcher doesn’t actually have any clear hypotheses to test.

Rather than p all over your papers, you might be persuaded to report and interpret effect sizes along with measures of uncertainty (confidence intervals, credible intervals, etc.). At the moment (11/2022) this can sometimes be challenging: the analysis ecosystem doesn’t always make it clear or easy to do this, especially for more complex designs.

Geoff has helped solve this problem with ESCI, a set of Excel worksheets that can provide effect sizes and confidence intervals for lots of different study designs. We’ve also released esci, a module for jamovi — this module provides effect sizes and confidence intervals for lots of different designs in an environment that is really user-friendly and yet quite powerful.

What about R? Well, if you dabble in R even a bit you’ll find a fairly bewildering embarrassment of riches. You can calculate effect sizes and CIs for yourself, use some of the base R functions, or turn to a huge array of packages on CRAN or elsewhere. While the variety of options is empowering, it can also leave you wondering if you’re taking the best approach, if you can trust the code in an adopted package, etc.

In this series of pages I’ll introduce a great option for R: the statpsych package by Doug Bonett. The statpsych package is relatively new (first released 8/2022); it adopts a frequentist approach, meaning it provides effect sizes with confidence intervals. It doesn’t give Bayesian credible intervals or resampling-based intervals (like DaBestR does).

Why is the statpsych package worth learning?

  • Extensive – statpsych covers means, medians, proportions and correlations for simple designs and some complex designs (with more on the way)
  • Easy – statpsych takes in (mostly) summary data and then outputs CIs; the function names and parameters are consistent and easy to decipher
  • Well-documented – Each function has authoritative references, often to the amazing body of work Doug Bonett has amassed on estimation statistics. With statpsych you can easily get to know what’s going on under the hood, and can find clear and direct references to help you understand the expected behavior of the confidence intervals you are generating
  • Well-vetted – Doug’s done extensive simulations for each function and there are extensive tests written for the package.

Cool, right? Let’s dive in.

Installing and Loading statpsych

statpsych is on CRAN, so you can install it in R with this simple command:

install.packages("statpsych")

You only need to run this once; statpsych will then remain installed in the R environment you are using (unless/until you decide to remove it).

To use statpsych, you have two options (R has soooo many options). The first option is to load the library, making all its functions accessible for the rest of that session/script. Once loaded, functions in statpsych are called directly. Here, for example, I am loading statpsych and then directly calling the ci.mean1 function to calculate the 95% CI on a mean:

library(statpsych)
ci.mean1(alpha = 0.05, m = 67, sd = 2.7, n = 10)

You don’t *have* to do this. Instead, you can skip the library statement, and call each function using <package>::<function>. So, for example, this also works:

statpsych::ci.mean1(alpha = 0.05, m = 67, sd = 2.7, n = 10)

As you can see, by providing a more complete name for the function (package as well as function name) you can call the function directly without any previous library statement.

I’m not going to get into which of these approaches is best; I just wanted to point out that both are options. Also, you can use the full function name even with the library statement. Personally, I like using full function names so that dependencies are crystal clear as I review my code… but do as you like.

First example: CI for a continuous variable, no grouping

Let’s start with something really simple: obtaining the effect size and CI for a continuous measurement where we have no grouping. This could be estimating the degree of depression amongst a group of patients exposed to prior trauma. It could be estimating the expression of a learning-related gene in the hippocampus of a normal rat.

For this example, I’m going to estimate the mean level of happiness at my university. The data is the first 20 responses from an online survey my stats students conducted in 2022. Part of the survey included the Satisfaction With Life Scale, a 5-item scale measured from 1 to 7 meant to gauge a participant’s satisfaction with life (a component of happiness). If you’ve never encountered this scale before, you can find out more about it here:

Before we begin, we can think a bit about the wisdom of what we’re about to do. First, the data is from a convenience sample, not a random sample. Second, the measure is on a bounded ordinal scale. Both of these facts violate the assumptions of the analysis we’re about to do… and yet in psychology it is pretty common to forge ahead anyway, so let’s put those valid qualms aside for the moment and carry on.

Here’s the code.

# Environment
#  If statpsych is not installed, install it, then load it
if (!require("statpsych")) {
  install.packages("statpsych")
  library("statpsych")
}

# Data - Satisfaction With Life Scale (SWLS) response for first 20 participants 
#  of a class-project survey at Dominican University.  SWLS is measured on a 
#  scale from 1 - 7.  
swls_data <- c(

# Descriptive statistics
#  Note the need to think carefully about NA (missing) responses
#  The calculation of valid sample size is especially important and
#   not immediately obvious when beginning R
swls_mean <- mean(swls_data, na.rm = TRUE)
swls_sd <- sd(swls_data, na.rm = TRUE)
swls_n <- sum(!is.na(swls_data))

# Estimate effect size and 95% CI
#  Note that this function wants summary data only, no raw data
#  Also, it takes alpha, not the confidence level
statpsych::ci.mean1(
  alpha = 1 - 0.95,
  m = swls_mean,
  sd = swls_sd,
  n = swls_n
)

The first part installs (if needed) and loads statpsych.

Then we load the data. In this case, with just one measure, I loaded it as a simple vector. Note that, as expected for survey data, we have some non-responses: the 19th participant did not complete the scale, so their response is represented as NA (missing).

The statpsych package deals mostly with summary data–it is up to us, the power R user, to obtain the descriptive stats to feed into statpsych. So the next section uses the mean and sd functions from base R to obtain the mean and sd of our vector of data. Note that we need to tell these functions to ignore the NA value–otherwise they will return NA. You’ll see that sample size is also a bit tricky: the length function in base R counts the NA value (as it should), so we instead need to count up all the values that are not NA. Once you think about it, you can see why a programming language needs to work this way–but it gives you a sense that R, like any programming language, can have some unexpected complexities which can lead the unwary seriously astray.
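To make the NA pitfall concrete, here is a tiny sketch using a made-up vector (not the survey data):

```r
# A toy vector with one missing response (made-up, not the survey data)
x <- c(5, 6, NA, 4)

length(x)              # 4 -- counts the NA as an element
sum(!is.na(x))         # 3 -- counts only the valid responses
mean(x)                # NA -- mean() propagates the missing value...
mean(x, na.rm = TRUE)  # 5 -- ...unless we ask it to drop NAs
```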

After all that work, we finally use statpsych to obtain the effect size, standard error and CI. The output will look like this:

     Estimate        SE       LL       UL
[1,] 5.168421 0.2719921 4.596987 5.739855

It is telling us the point estimate (the mean) is 5.2 (I’m rounding), the standard error is 0.3, and the 95% CI is [4.6, 5.7]. Easy, right? This says the data are compatible with the true population mean being between 4.6 and 5.7 (though values just beyond those limits are not that much less compatible, either). Given the scale midpoint is 4, this suggests the typical student at my university is fairly satisfied with their life–though this conclusion should be tempered by the non-random sample, the unsuitability of the measure for this type of analysis, and the lack of checks on the assumption of normality.

For the statpsych package, note that the preference is to specify not the confidence level (95%, 99%, etc.) but alpha. So, for a 95% confidence interval you give alpha = 0.05, for a 99% confidence interval you give alpha = 0.01, etc. I like to write this as a subtraction to make it fully clear what confidence level I am obtaining. Here, for example, is the 99% CI for the same data:

# Here's a 99% CI
statpsych::ci.mean1(
  alpha = 1 - 0.99,
  m = swls_mean,
  sd = swls_sd,
  n = swls_n
)

and this outputs:

     Estimate        SE       LL       UL
[1,] 5.168421 0.2719921 4.385508 5.951334

Same point estimate, same standard error, but now the CI is broader: 99% CI [4.4, 6.0] (again, I am rounding!). The CI is wider because it must span 99%, rather than 95%, of the expected sampling error.
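Where does the extra width come from? Assuming the usual t-based interval for a single mean (the mean plus or minus a critical t value times the SE, with n - 1 degrees of freedom, which is consistent with the output above), only the critical multiplier changes between the two intervals:

```r
# Critical t values for our n = 19 valid responses (df = 18)
qt(1 - 0.05 / 2, df = 18)  # about 2.10 for the 95% CI
qt(1 - 0.01 / 2, df = 18)  # about 2.88 for the 99% CI
```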

So far, we’ve just called statpsych functions, so their output is printed in the R session. We can also store the output for additional manipulation:

# And here we can store the result
#  most of the time, statpsych returns a matrix
swls_CI <- statpsych::ci.mean1(
  alpha = 1 - 0.95,
  m = swls_mean,
  sd = swls_sd,
  n = swls_n
)

# You can access the elements in the matrix with row, column notation
# The LL of the CI is row 1, column 3
swls_CI[1, 3]
# You can also use the column names
swls_CI[1, "LL"]

As you can see, statpsych has returned a matrix, a compound variable that allows access to its elements with row, column notation. The columns are named, which makes it a bit easier to write clear code when accessing elements of the results.

I often end up converting the matrix statpsych returns into a list that enables list notation to access the elements in the results. Here’s how:

# You might find it convenient to convert a row of the matrix into a list
# As you can then use list notation to access results elements
swls_CI <- as.list(swls_CI[1, ])

Note that I converted the first row only. In some statpsych functions we get back multiple rows, each representing a different approach to constructing the CI — so you’ll want to pay attention to that.

Want to know more about how this function works? Easy: in your R session you can pull up help on a function with the ? operator:

? statpsych::ci.mean1

This will bring up the documentation on the function. There is also a handy online reference here:

The documentation always includes an authoritative reference. A CI for a single mean is not exactly exotic, so the reference is to a textbook (Snedecor & Cochran, 1989, 8th edition). But there are often places where the CI and procedure might not be as familiar and the references are invaluable.

Some additional effect size measures for a quantitative variable without grouping

One thing I love about statpsych is that it is really comprehensive. Let’s find some additional ways to express the effect size for this data.

Instead of the mean, let’s estimate the *median* level of satisfaction along with a 95% CI. That’s easy-ish with the ci.median1 function. Two things, though: First, this function is one of the rare statpsych functions that takes raw data. Second, statpsych does *no* data processing, checking, or cleaning. You need to pass it only valid data (no NA values)–if you forget, it won’t necessarily choke, and it certainly won’t give you a helpful error message.

With those warnings, here is the code:

# Median and 95% CI for a quantitative variable, no grouping
# Note that this function does *not* screen out NA values -- you need to!
statpsych::ci.median1(
  alpha = 1 - 0.95,
  y = swls_data[!is.na(swls_data)]
)

It should output:

     Median        SE  LL UL
[1,]    5.2 0.2989646 4.6  6

This tells us the median SWLS in this sample is 5.2 and that the 95% CI is [4.6, 6] — the data are compatible with median satisfaction being anywhere from above moderate to quite high (and values just beyond those limits are not distinctly less compatible).

This is a CI that is *not* often taught in textbooks — you can check the documentation to get a reference. Want to know what’s actually going on under the hood, though? Well, you can find the source code on CRAN. You can also just type the name of the function without any parentheses/inputs and R will spit back out the source code.

So, if you type:

statpsych::ci.median1
You should see (in part), this:

ci.median1 <- function(alpha, y) {
 n <- length(y)
 y <- sort(y)
 z <- qnorm(1 - alpha/2)
 median <- median(y)
 c1 <- round((n - z*sqrt(n))/2)
 if (c1 < 1) {c1 = 1}
 ll <- y[c1]
 ul <- y[n - c1 + 1]
 a <- round(n/2 - sqrt(n))
 if (a < 1) {a = 1}
 ll1 <- y[a]
 ul1 <- y[n - a + 1]
 p <- pbinom(a - 1, size = n, prob = .5)
 z0 <- qnorm(1 - p)
 se <- (ul1 - ll1)/(2*z0)
 out <- t(c(median, se, ll, ul))
 colnames(out) <- c("Median", "SE", "LL", "UL")

Cool, right? You can see that the data are sorted, then the index of the lower limit (c1) is calculated (with a floor of 1), and the data point at that position becomes the lower limit of the CI. The upper limit is similar, but the index is n - c1 + 1 (which means, at the limit, it can be the top score in the set). It’s useful to know this: no matter what the confidence level, this approach cannot return any value beyond the smallest and largest scores in the data set.
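You can reproduce the order-statistic limits by hand. Here is a sketch with a small made-up sample (not the survey data), mirroring the steps in the source above:

```r
# Hand-computing the 95% CI limits around a median, as in ci.median1
y <- sort(c(3, 4, 5, 5, 6, 6, 7, 7))  # made-up scores
n <- length(y)
z <- qnorm(1 - 0.05 / 2)
c1 <- round((n - z * sqrt(n)) / 2)    # index of the lower limit
if (c1 < 1) {c1 <- 1}                 # floor of 1
c(LL = y[c1], UL = y[n - c1 + 1])     # limits are always observed scores
```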

Let’s go back to parametric statistics and express the effect size one additional way. Perhaps we have some population standard to compare our scores to. For example, the EAMMI2 project measured SWLS across many U.S. college campuses. We could compare the measure at my campus to the EAMMI mean, and we could express the difference in standard deviations (Cohen’s d).

The EAMMI2 mean was 4.47 across all campuses (N = 3132). So let’s calculate Cohen’s d between my sample and this population reference:

statpsych::ci.stdmean1(
  alpha = 1 - 0.95,
  m = swls_mean,
  sd = swls_sd,
  n = swls_n,
  h = 4.47
)

This should output:

      Estimate        SE         LL       UL
[1,] 0.5642023 0.2553337 0.08864866 1.089538

This means SWLS in the sample at my campus was 0.56 standard deviations higher than the EAMMI2 average. We have a lot of uncertainty estimating the *population* difference between my campus and the EAMMI2 population: 95% CI [0.09, 1.09] (again, I am rounding). The data are compatible with my campus being anywhere from slightly more satisfied than the EAMMI2 population to substantially more satisfied (and values just beyond these limits are not notably less compatible). Again, this is comparing two convenience samples to each other. And it is based on just the first 20 participants in the survey my students completed. Their final survey had 207 participants, and the mean over the whole data set was 4.44 (sd = 1.18) — very close to the EAMMI2 average! If we take that result to be accurate, it would suggest the CI from the first 20 participants did not capture the truth. We should expect that 5% of the time (for a 95% CI) based on sampling variation alone, and more often given the numerous other sources of error in such a haphazard comparison (model mis-specification, sampling bias, etc.).
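The point estimate itself is easy to check by hand: it is just the mean difference expressed in standard-deviation units. Here is a sketch with rounded, illustrative values (the exact sample sd isn’t shown above, so these numbers are approximate):

```r
# Cohen's d against a reference value: (mean - reference) / sd
# Rounded, illustrative values -- not the exact sample statistics
m <- 5.17  # sample mean (rounded)
s <- 1.24  # sample sd (illustrative)
h <- 4.47  # EAMMI2 reference mean
(m - h) / s  # about 0.56, in line with the Estimate column above
```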

Cohen’s d is a tricky effect size measure — more on that in a future post.

Wrapping Up

I hope that’s been a very gentle introduction to statpsych, one that you can follow along with even if you are brand new to R.

In future posts, we’ll explore the package in more depth–taking a look at comparisons between groups, complex designs, linear contrasts, and more. Stay tuned.