Criminal justice and the perils of regression analysis

Regression analysis is incredible: it can help us predict the future based on patterns observed in the past. There are many pitfalls, however, to using regression. First, the correlations observed in the past may not apply to new cases or new contexts. Second, even when the correlation holds, regression does best at predicting what the *typical* or *average* outcome will be; it usually does not do a great job at predicting individual outcomes. You can see this in ESCI: compare the CI for the predicted mean (the range that is plausible for the *typical* outcome) to the prediction interval (PI) for a specific prediction (the range that is plausible for an *individual* outcome). You’ll notice that the PI is much longer. Moreover, the PI stays long unless the relationship is very strong, and, unlike the CI, it barely shortens as the sample grows. That’s because the PI reflects not only uncertainty about the regression line but also the natural variation of individual Y values around that line.
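To make the CI versus PI contrast concrete without opening ESCI, here is a minimal Python sketch. It is an illustration under stated assumptions, not ESCI's own calculation: the data are simulated, the function names `simulate` and `interval_half_widths` are made up for this example, and a normal critical value stands in for the exact t value (a reasonable approximation only when the sample is fairly large).

```python
import math
import random
from statistics import NormalDist, fmean


def interval_half_widths(x, y, x0, level=0.95):
    """Half-widths of the CI (mean outcome) and PI (individual outcome) at x0.

    Simple linear regression of y on x. Assumption: a normal critical
    value is used as a large-sample stand-in for the exact t value.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual standard deviation: the scatter of individual Y values
    # around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Uncertainty about where the line is at x0; shrinks as n grows.
    line_term = 1 / n + (x0 - mx) ** 2 / sxx
    ci_half = z * s * math.sqrt(line_term)
    # The PI adds the full residual scatter (the "1 +"), which does not
    # shrink with n.
    pi_half = z * s * math.sqrt(1 + line_term)
    return ci_half, pi_half


def simulate(n, r, seed=0):
    """Simulated (hypothetical) data with population correlation r."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [r * xi + math.sqrt(1 - r * r) * rng.gauss(0, 1) for xi in x]
    return x, y


for n in (50, 500):
    for r in (0.3, 0.7):
        x, y = simulate(n, r)
        ci, pi = interval_half_widths(x, y, x0=1.0)
        print(f"n={n:4d} r={r}: CI half-width {ci:.2f}, PI half-width {pi:.2f}")
```

Running this with different values of `n` and `r` should show the CI tightening sharply as the sample grows while the PI hardly moves, since the residual scatter term dominates the PI.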

In such abstract terms it may be difficult to get all worked up about the nuances of regression analysis. Here’s a rather remarkable piece of science reporting, however, that examines the use of regression analysis to make individualized predictions, in this case to provide ‘risk scores’ for those in the criminal justice system. These risk scores are being used to influence many individual decisions about bail levels, sentencing levels, and parole. As the article makes clear, however, the predictions are often wildly inaccurate and potentially racially biased. It’s well worth reading as a case study in the complexities of applying regression analysis in the real world, and as an example of why statistical savvy is increasingly essential for good citizenship.


The article is “Risk scores attached to defendants unreliable, racially biased”. It appeared in the Milwaukee-Wisconsin Journal Sentinel on 5/30/2016 and was written by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner (of ProPublica). The full link is here:


