What the datasaurus tells us: Data pictures are cool
In various places in ITNS, especially Chapter 11 (Correlation), we discuss how important it is to make good pictures of data, to reveal what’s really going on. Calculating a few summary statistics, or even CIs, often just doesn’t do the job.
Many statistics textbooks use Anscombe’s Quartet to make the point: the 4 scatterplots below all have the same (or very nearly the same) mean and SD for both X and Y, and the same correlation between X and Y, yet they look strikingly different from one another.
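If you'd like to check the claim about the quartet for yourself, here is a minimal sketch in Python. It assumes the values as they are commonly reproduced from Anscombe's 1973 paper, and the helper names (pearson_r, summarize, QUARTET) are our own, for illustration only. It uses the sample SD (n - 1 in the denominator).

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's Quartet, as commonly reproduced (assumed transcription).
# Sets I-III share the same X values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation, computed from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(x, y):
    """Return the five summary statistics discussed in the text."""
    return {
        "mean_x": mean(x),
        "sd_x": stdev(x),  # sample SD
        "mean_y": mean(y),
        "sd_y": stdev(y),
        "r": pearson_r(x, y),
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(
        f"Set {name:>3}: "
        f"mean X = {s['mean_x']:.2f}, SD X = {s['sd_x']:.2f}, "
        f"mean Y = {s['mean_y']:.2f}, SD Y = {s['sd_y']:.2f}, "
        f"r = {s['r']:.3f}"
    )
```

The four printed rows should be near-identical, which is the whole point: nothing in the numbers hints at how different the four pictures are.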
Now some clever folks have worked out how to generate an unlimited number of datasets all with these same summary statistics (or very close) but weirdly different shapes and patterns. One of the pics is the dotted outline of the datasaurus, below. Click here to see a dozen pics cycle through. (At that site, click on the changing pic to go to the paper that describes what’s in the engine room.)
We may never trust a Pearson r value again, and that may not be a terrible thing!
P.S. Thanks to Francis S. Gilbert for the heads up.