Anscombe's Quartet
Posted 4 months ago | Active 4 months ago
en.wikipedia.org | Science | story
Key topics
Data Visualization
Statistics
Data Analysis
The discussion around Anscombe's Quartet highlights the importance of data visualization in understanding data, with commenters sharing related examples and experiences.
Snapshot generated from the HN discussion
Discussion activity: first comment 1d after posting; peak of 15 comments in the 24-30h window; average of 6.5 comments per period; based on 26 loaded comments.
Key moments
- Story posted: Sep 8, 2025 at 5:29 AM EDT
- First comment: Sep 9, 2025 at 8:14 AM EDT (1d after posting)
- Peak activity: 15 comments in 24-30h
- Latest activity: Sep 10, 2025 at 9:03 AM EDT
ID: 45166245 | Type: story | Last synced: 11/20/2025, 8:47:02 PM
See also:
https://en.wikipedia.org/wiki/Datasaurus_dozen
usually there are more than 2 or 3 columns in our data :(
p(data | stats) = p(stats | data) * p(data) / p(stats),
and the prior p(data) is only strong for a "blob / cloud" of points, so when the observed stats show some correlation, they tell you that you most likely have a blob with some degree of correlation.
We have just spent the five years since COVID appeared arguing about statistics, with tons of bad analysis of very complicated data fuelling political rage to this day.
The US health secretary is currently using data with "strong structure" to deny vaccines and to falsely pin the blame for everything from cancer to autism on convenient targets.
* https://en.wikipedia.org/wiki/G._E._M._Anscombe
:)
https://en.wikipedia.org/wiki/Gareth_Anscombe
:-)
https://github.com/stefmolin/data-morph
https://www.linkedin.com/posts/panela_loved-adding-ancombes-...
I recommend putting the Quintet together in one image, so that the original 4 charts, plus the new one, are all visible and interpretable together. It will be a learning aid for decades to come.
[0] https://en.wikipedia.org/wiki/Simpson's_paradox
But sometimes you are at the mercy of the data and your visualization of choice. Box plots, for example, are great at showing more than just how the data is centered, but it is possible to encounter situations where the box plots of the data remain static while the underlying data is clearly changing [0].
As always it is good to know about these things and continue to add to the arsenal (violin plots, in the example above) of tools and intuition needed to tease out the story behind the data.
[0] https://www.research.autodesk.com/publications/same-stats-di...
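To make the box-plot point concrete, here is a minimal sketch using made-up toy data (not the datasets from the linked Autodesk publication). It assumes the box is drawn from the five-number summary computed with the "inclusive" quantile method in Python's standard library; other quartile conventions may give slightly different boxes.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

# Hypothetical toy datasets with nine points each.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]   # points spread evenly
clumped       = [0, 2, 2, 2, 4, 6, 6, 6, 8]   # points piled up at 2 and 6

if __name__ == "__main__":
    print(five_number_summary(evenly_spread))
    print(five_number_summary(clumped))
```

Both calls return the same five numbers, so the two box plots would look identical even though one dataset is evenly spread and the other is clumped. A violin plot or a plain strip plot of the raw points would show the difference.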
This will require improvements to vision models, RL frameworks, etc., but it will be interesting to see how much it can broaden current abilities.
Linear correlation is just one pattern the data can have.
Unfortunately, many social science publications have reviewers who know only the basics and can't judge or accept statistically valid analysis that is outside their competence. It's "fit it to a line" or nothing.
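As a small illustration of why a correlation coefficient alone says little about the pattern, the sketch below computes the mean of y and Pearson's r for the four Anscombe datasets using only the standard library. The helper names (`pearson_r`, `summarize`) are my own; the data values are the ones published for the quartet.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def summarize():
    """Map each dataset name to (mean of y, Pearson r), rounded to 2 places."""
    return {
        name: (round(mean(ys), 2), round(pearson_r(xs, ys), 2))
        for name, (xs, ys) in quartet.items()
    }

if __name__ == "__main__":
    for name, (y_mean, r) in summarize().items():
        print(f"{name:>3}: mean(y)={y_mean}  r={r}")
```

All four sets come out with a y mean of about 7.5 and r of about 0.82, yet only set I is a noisy line: set II is a smooth curve, set III is a tight line with one outlier, and set IV gets its correlation from a single point.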
I was just thinking about it the other day. I had a bug aggregating sleep-data from an iPhone, which comes in the form of sleep-samples.
I was trying to fix it, both by prodding Claude Code to fix the problem and by looking at debug logs of the sleep-samples, but we weren't getting anywhere. I asked Claude Code to graph the samples, and BAM, saw it right away. (The problem was that HealthKit returns sleep-samples from ALL devices, not just the priority one.)
Maybe not exactly the same thing as Anscombe/Tufte were getting at, but I was reminded of it, and the value of visualising data.
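The kind of bug described here can be sketched with toy data. The snippet below is not HealthKit code: the tuple layout, device names, and helper functions are all hypothetical, and it only shows why summing samples from every source double-counts a night that two devices both recorded.

```python
from collections import defaultdict

# Hypothetical sleep samples: (source_device, start_minute, end_minute).
# Two devices recorded the same night, so their intervals overlap.
samples = [
    ("watch", 0, 240),
    ("watch", 260, 480),
    ("phone", 0, 480),
]

def total_minutes(samples):
    """Naive aggregate: sums every sample regardless of source."""
    return sum(end - start for _, start, end in samples)

def minutes_by_source(samples):
    """Per-device totals, which makes the double counting visible."""
    totals = defaultdict(int)
    for source, start, end in samples:
        totals[source] += end - start
    return dict(totals)

if __name__ == "__main__":
    print("naive total:", total_minutes(samples))
    print("by source:  ", minutes_by_source(samples))
```

The naive total is nearly double either device's own figure. A list of intervals in a debug log hides this; splitting by source, or plotting one row per device, makes it obvious.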
https://blog.revolutionanalytics.com/2017/05/the-datasaurus-...