Everything Is Correlated (2014–23)
Posted 4 months ago · Active 4 months ago
Source: gwern.net · Science · Story · High profile
Key topics: Statistics, Correlation, Causality, Data Analysis
The article 'Everything is Correlated' explores the idea that everything is correlated due to various factors, sparking a discussion on the implications of this concept for statistical analysis and causality.
Snapshot generated from the HN discussion
Discussion activity: very active. First comment after 4h; peak of 77 comments in the 0-12h window; average of 14.1 comments per period, based on 127 loaded comments.
Key moments
- Story posted: Aug 21, 2025 at 10:05 PM EDT (4 months ago)
- First comment: Aug 22, 2025 at 1:52 AM EDT (4h after posting)
- Peak activity: 77 comments in the 0-12h window
- Latest activity: Aug 27, 2025 at 6:55 AM EDT (4 months ago)
Everything Is Correlated - https://news.ycombinator.com/item?id=19797844 - May 2019 (53 comments)
(It said "Related" before, of course: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que....)
> Since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation — every sun, every planet, their orbits, their composition and their economic and social history from, say, one small piece of fairy cake.
Given different T_zero configurations of matter and energy, T_current would be different, and there are many pathways that could lead to the same physical configuration (position + energies etc.) with different (Universe minus cake) configurations.
Also, we are assuming there are no non-deterministic processes happening at all.
Why? We learn about the past by looking at the present all the time. We also learn about the future by looking at the present.
> Also, we are assuming there are no non-deterministic processes happening at all.
Depends on the kind of non-determinism. If there's randomness, you 'just' deal with probability distributions instead. Since you have measurement error regardless, you need to do that anyway.
There are other forms of non-determinism, of course.
We infer about the past based partly on material evidence we can only subjectively and partially get acquainted with, through thick cultural biases. And the actual material suggestions should not stray too far from our already integrated internal narrative, without which we will ignore them or actively fight them.
The future is pure phantasm, bound only by our imagination and by what we take to be the unchallengeable fundamentals of what the world allows according to our inner model of it.
At least, that's one possible interpretation of these thoughts when attention focuses on the present.
Having observed the sun go up and down reliably over my lifetime so far, I infer that it will keep doing so for quite a while, and that it has done so before I was born, too. Not much culture about it.
Every scientific measurement has elements of this.
And to be honest, we only learn about the present through a thick layer of interpretation and inference, too. The past and future aren't that special in that regard.
It becomes even more fun, when you add the finite propagation time of signals into the mix: here on earth we can never learn about the moon in the present, only how it was about 1s ago.
The original comment was in a different spirit, or at least that's how I interpreted it. It was more implying that by looking at a very small slice of reality you should in theory be able to reconstruct the whole universe, because every particle and space quantum is being influenced (to the tiniest degree) by every other particle in the universe. That will not work if you don't know all the rules, if there is no determinism, or if you don't know T_zero.
I don't know why you want to know T_zero. I assume T means time here, not temperature or so?
If you have eg randomness as your non-determinism, you can still build probability distributions of the rest of the universe and the rest of time. (And honestly, even if you have determinism in the laws, you always have measurement errors. Even in classical mechanics.)
After all, Feynman showed this is in principle possible, even with local nondeterminism.
(this being a text medium with a high probability of another commenter misunderstanding my intent, I must end this with a note that I am, of course, BSing :)
[1]: Arguments are ongoing about whether the universe has "real" numbers (in the mathematical sense) or not. However, it is undeniable that the Planck constants still provide a practical barrier to any hypothetical real-valued numbers in the universe, making them in practice inaccessible.
> Bohm employed the hologram as a means of characterising implicate order, noting that each region of a photographic plate in which a hologram is observable contains within it the whole three-dimensional image, which can be viewed from a range of perspectives.
> That is, each region contains a whole and undivided image.
> "There is the germ of a new notion of order here. This order is not to be understood solely in terms of a regular arrangement of objects (e.g., in rows) or as a regular arrangement of events (e.g., in a series). Rather, a total order is contained, in some implicit sense, in each region of space and time."
> "Now, the word 'implicit' is based on the verb 'to implicate'. This means 'to fold inward' ... so we may be led to explore the notion that in some sense each region contains a total structure 'enfolded' within it."
No?
You can have two independent random walks. E.g., flip a coin, gain a dollar or lose a dollar. Do that two times in parallel. Your two account balances will change over time, but they won't be correlated.
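A minimal sketch in Python of that coin-flip example (synthetic data, not from the thread). It checks the increments rather than the running balances, since any single pair of cumulative balance paths can show a large spurious sample correlation even when the walks are independent:

```python
# Two independent +-$1 coin-flip "accounts". The increments are independent
# by construction, so their sample correlation is ~0.
import numpy as np

rng = np.random.default_rng(0)
n_flips = 100_000

flips_a = rng.choice([-1, 1], size=n_flips)  # +-$1 per flip, account A
flips_b = rng.choice([-1, 1], size=n_flips)  # +-$1 per flip, account B

balance_a = np.cumsum(flips_a)               # running balance of account A
balance_b = np.cumsum(flips_b)               # running balance of account B

# Correlation of the increments: close to zero, as independence implies.
print(np.corrcoef(flips_a, flips_b)[0, 1])
```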
Statistics, once developed, just happened to be a useful method. But given the abuse of those methods, and the proliferation of stupidity disguised as intelligence, it's always fitting to question them, this time with this observation about correlation noise.
You need logic and fundamental knowledge about domains first. Just counting things without understanding them in at least one or two other ways is a tempting invitation to misleading conclusions.
https://www.youtube.com/watch?v=VEIrQUXm_hY
https://www.youtube.com/watch?v=0xeMak4RqJA
And they were much, much worse off for it. Logic does not let you learn anything new. All logic allows you to do is restate what you already know. Fundamental knowledge comes from experience or experiments, which need to be interpreted through a statistical lens because observations are never perfect.
Before statistics, our alternatives for understanding the world were (a) rich people sitting down and thinking deeply about how things could be, (b) charismatic people standing up and giving sermons on how they would like things to be, or (c) clever people guessing things right every now and then.
With statistics, we have to a large degree mechanised the process of learning how the world works, and anyone sensible can participate, and they can know with reasonable certainty whether they are right or wrong. It was impossible to prove a philosopher or a clergyman wrong!
That said, I think I agree with your overall point. One of the strengths of statistical reasoning is what's sometimes called intercomparison, the fact that we can draw conclusions from differences between processes without understanding anything about those processes. This is also a weakness because it makes it easy to accidentally or intentionally manipulate results.
I cannot see the problem with that. To get to meaningful results we often calculate with simplified models, which are known to be false in a strict sense. We use Newton's laws; we analyze electric networks based on simplifications; a bank year used to be 360 days! Works well.
What did I miss?
One rule of thumb for interpreting (presumably Pearson) correlation coefficients is given in [0] and states that correlations with magnitude 0.3 or less are negligible, in which case most of the bins in that histogram correspond to cases that aren't considered meaningful.
[0]: https://pmc.ncbi.nlm.nih.gov/articles/PMC3576830/table/T1/
EDIT: I also get the feeling that you think it’s okay to do an incorrect hypothesis test (c > 0), as long as you also look at the effect size. I don’t think it is. You need to test the c > 0.3 hypothesis to get a mathematically sound hypothesis test. How many papers do that?
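For reference, a test of H0: rho <= 0.3 (rather than rho = 0) can be run with the standard Fisher z-transformation. A minimal sketch in Python, with illustrative sample values rather than anything from the thread:

```python
# Test H0: rho <= 0.3 vs H1: rho > 0.3 via the Fisher z-transformation.
import numpy as np
from scipy import stats

r, n = 0.42, 200          # observed sample correlation and sample size (illustrative)
rho0 = 0.3                # the "negligible correlation" threshold under test

z_obs = np.arctanh(r)     # Fisher z of the observed correlation
z_null = np.arctanh(rho0) # Fisher z of the null value
se = 1.0 / np.sqrt(n - 3) # approximate standard error of Fisher z

z_stat = (z_obs - z_null) / se
p_value = stats.norm.sf(z_stat)  # one-sided: evidence that rho > 0.3
print(f"z = {z_stat:.2f}, one-sided p = {p_value:.4f}")
```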
In that A/B testing scenario, I think if someone wants to test whether the difference is zero, that's fine, but if the effect size is small, they shouldn't claim that there's any meaningful difference. I believe the pharma literature calls this scenario equivalence testing.
Assuming a positive difference in means is desirable, I think testing for a null hypothesis of a change of at least some positive value (e.g., +5% of control) is a better idea. I believe the pharma literature calls this scenario superiority testing.
I believe superiority testing is preferable to equivalence testing, and in professional settings, I have made this case to managers. I have not succeeded in persuading them, and thus do the equivalence testing they request.
I don't think the idea of a zero null hypothesis is necessarily mathematically unsound. In cases like the difference in means, a zero null hypothesis is well-posed. However, I agree with you that there are better practices, like a null hypothesis incorporating a nonzero effect.
I don't entirely agree with the arguments Gwern puts forth in the Implications section because some of them seem at odds with one another. Betting on sparsity would imply neglecting some of the correlations he's arguing are so essential to capture. The bit about algorithmic bias strikes me as a bizarre proposition to include with little supporting evidence, especially when there are empirical examples of algorithmic bias.
What I find lacking about Gwern's piece is that it's a bit like lighting a match to widespread statistical practice, and then walking away. Yes, I think null hypothesis statistical testing is widely overused, and that statistical significance alone is not a good determinant of what constitutes a "discovery". I agree that modeling is hard, and that "everything is correlated" is, to an extent, true because the correlations are not literally or exactly zero. But if you're going to take the strong stance that null hypothesis statistical testing is meaningless, I believe you need to provide some kind of concrete alternative. I don't think Gwern's piece explicitly advocates an alternative, and it only hints the alternative might be causal inference. Asking people who may not have much statistics training to leap from frequentist concepts taught in high school to causal inference would be a big ask. If Gwern isn't asking that, then I'd want to know what a suggested alternative would be. Notably, Gwern does not mention testing for nonzero positive effects (e.g., in the vein of the "c > 0.3" case above). If there isn't an alternative, I'm not sure what the argument is. Don't use statistics, perhaps? It's tough to say.
> I don't think the idea of a zero null hypothesis is necessarily mathematically unsound. In cases like the difference in means, a zero null hypothesis is well-posed. However, I agree with you that there are better practices, like a null hypothesis incorporating a nonzero effect.
I don’t think a zero null hypothesis is mathematically unsound of course. But I think it is unsound to do one and then look at the effect size as a known quantity. It’s not a known quantity, it’s a point estimate with a lot of uncertainty. The real underlying correlation may well be a lot lower than the point estimate.
And of course it’s hard to get people in charge interested in better hypothesis testing. That testing will result in fewer conclusions being drawn / fewer papers being published. It’s just another symptom of the core issue: it’s quite convenient to be able to buy the conclusions you want with money.
You didn't really miss anything. The article is incomplete, and wrongly suggests that something like "false" even exists in statistics. But really something is only false "with an x% probability of it actually being true nonetheless". Meaning that you have to "statistic harder" if you want to get x down. Usually the best way to do that is to increase the number of tries/samples N. What the article gets completely wrong is that for sufficiently large N you don't have to care anymore, and might as well use false/true as absolutes, because you pass the threshold of "will happen once within the lifetime of a bazillion universes" or something.
The problem is, of course, that lots and lots of statistics are done with a low N. The social sciences, medicine, and economics are necessarily always in the very-low-N range, and therefore always have problematic statistics. And they try to "statistic harder" without being able to increase N, thereby just massaging their numbers enough to prove a desired conclusion. Or they increase N a little, claiming to have escaped the low-N problem.
I do not think it is accurate to portray the author as someone who does not understand asymptotic statistics.
Nope. The correct way is rather something like "the measurements/polls/statistics x ± ε are consistent with this parameter's true value being zero", where x is your measured value and ε is some measurement error, accuracy, or statistical deviation. x will never really be zero, but zero can be within the interval [x - ε, x + ε].
For example, eat a lot and you will gain weight, gain weight and you will feel more hungry and will likely eat more.
Or exercise more and it becomes easier to exercise.
Earning money becomes easier as you have more money.
Public speaking becomes easier as you do it more and the more you do it, the easier it becomes.
Etc...
That's saying the same thing twice :)
Only if you don't injure yourself while exercising.
But I suspect that being able to figure out causation doesn't matter much from a survival or reproduction perspective because cause and effect are just labels.
Reality in a self-perpetuating cycle is probably more like: Condition A is 70% responsible and Condition B is 30% responsible for a problem, but they feed back on and exacerbate each other... You could argue that Condition A is the cause and Condition B is the effect because B < A, but that's not quite right IMO. Also, it's not quite right to say that because A happened first, A is the cause of a severe problem... The problem would never have gotten so bad without the feedback from B.
Please explain.
Similarly, for every chemical and nuclear reaction, when something is gained, something else is lost. For example, when two ions bond covalently by sharing electrons, a new molecule is gained, but the two ions are no longer what they previously were. So there is a correlation between gain of reaction products and loss of reactants.
But perhaps such analogies cannot be found everywhere in theoretical physics. Perhaps such a non-correlation would be a sign of a novel discovery, or a sign that a theory is physically invalid. It could be a signal of something for sure.
How do I reconcile "for every chemical and nuclear reaction, when something is gained, something else is lost" with catalysts increasing rate but not being consumed themselves?
In fact you can show there are an uncountably infinite number of broken symmetries in nature, so it is mathematically possible to concoct a parallel number of cases where nature does not have some "zero sum game" by Noether's Theorem.
Your statement is just cherry picking a few and then (uncountably infinitely) overgeneralizing.
No decision making, no min-maxing actors etc.
Btw, when you have a single optimising actor, then moving along the efficiency barrier is also a set of trade-offs, which can be made constant-sum, if you set up your conversion factors just right; even if the optimisation itself is otherwise variable sum. (As a silly illustration: to produce more guns, you need to produce less butter.) But that observation doesn't really prove anything.
Catalysts increase reaction rate just as a train runs faster on a track. Is a railway a catalyst?
Are symmetries broken in nature or just models of nature? Or are you referring to accepted theories in theoretical physics, which was the entire point here?
People interpret "statistically significant" to mean "notable"/"meaningful". I detected a difference, and statistics say that it matters. That's the wrong way to think about things.
Significance testing only tells you the probability that the measured difference is a "good measurement". With a certain degree of confidence, you can say "the difference exists as measured".
Whether the measured difference is significant in the sense of "meaningful" is a value judgement that we / stakeholders should impose on top of that, usually based on the magnitude of the measured difference, not the statistical significance.
It sounds obvious, but this is one of the most common fallacies I observe in industry and a lot of science.
For example: "This intervention causes an uplift in [metric] with p<0.001. High statistical significance! The uplift: 0.000001%." Meaningful? Probably not.
And if we increase N enough we will be able to find these 'good measurements' and 'statistically significant differences' everywhere.
Worse still if we did not agree in advance what hypotheses we were testing, and go looking back through historical data to find 'statistically significant' correlations.
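A minimal sketch in Python of that data-dredging point (synthetic data, not from any real study): test enough unrelated variable pairs and roughly alpha of them come out "significant".

```python
# Dredge pure noise for "significant" correlations: with 1,000 unrelated
# variable pairs and alpha = 0.05, roughly 5% are "significant" by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_pairs, n_obs, alpha = 1000, 100, 0.05

false_hits = 0
for _ in range(n_pairs):
    x = rng.normal(size=n_obs)
    y = rng.normal(size=n_obs)          # independent of x by construction
    r, p = stats.pearsonr(x, y)
    false_hits += p < alpha

print(f"{false_hits} of {n_pairs} null pairs were 'significant'")  # ~50 expected
```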
One interesting thing to keep in mind is that Ronald Fisher did most of his work before the publication of Kolmogorov's probability axioms (1933). There's a real sense in which the statistics used in social sciences diverged from mathematics before the rise of modern statistics.
So there's a lot of tradition going back to the 19th century that's misguided, wrong, or maybe just not best practice.
N being big means that small real effects can plausibly be detected as being statistically significant.
It doesn't mean that a larger proportion of measurements are falsely identified as being statistically significant. That will still occur at a 5% frequency or whatever your alpha value is, unless your null is misspecified.
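A minimal sketch in Python (illustrative numbers) of both halves of that claim: power for a small real effect grows with N, while the false-positive rate under a true null stays near alpha at every N.

```python
# Power rises with n; the false-positive rate under a true null stays ~alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, true_effect = 0.05, 2000, 0.1   # effect in sd units

for n in (50, 500, 5000):
    fp = tp = 0
    for _ in range(n_sims):
        null_sample = rng.normal(0.0, 1.0, n)          # no real effect
        alt_sample = rng.normal(true_effect, 1.0, n)   # small real effect
        fp += stats.ttest_1samp(null_sample, 0.0).pvalue < alpha
        tp += stats.ttest_1samp(alt_sample, 0.0).pvalue < alpha
    print(f"n={n}: false-positive rate={fp/n_sims:.3f}, power={tp/n_sims:.3f}")
```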
But even though you know the measurement can't be exactly 0.000 (with infinitely many decimal places) a priori, you don't know if your measurement is any good a priori or whether you're measuring the right thing.
In a finite or countable number of trials you won't see a measure zero event.
> they're estimating the probability of rejecting the null if the null was true.
Right, but the null hypothesis is usually false and so it's a weird thing to measure. It's a proxy for the real thing you want, which is the probability of your hypothesis being true given the data. These are just some of the reasons why many statisticians consider the tradition of null hypothesis testing to be a mistake.
Significance does not tell you this. The p-value can be arbitrarily close to 0 while the probability of the null hypothesis being true is simultaneously arbitrarily close to one.
(effect size) / (noise / sqrt(n))
Note that bigger test statistic means smaller p-value.
So very low p-values usually come from bigger effects or from very large sample sizes (n). That's why you can technically get p<0.001 with a microscopic effect, but only if you have astronomical sample sizes. In most empirical studies, though, p<0.001 does suggest the effect is going to be large because there are practical limits on the sample size.
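A minimal sketch in Python (illustrative numbers) of how the formula above translates into sample-size requirements: inverting z = effect / (noise / sqrt(n)) for the n needed to clear p < 0.001.

```python
# How big does n have to be before a microscopic effect clears p < 0.001?
from scipy import stats

alpha = 0.001
z_crit = stats.norm.isf(alpha)       # one-sided critical value, ~3.09
noise = 1.0                          # sd of individual observations

for effect in (0.1, 0.01, 0.001):    # effect sizes in sd units
    n_required = (z_crit * noise / effect) ** 2
    print(f"effect={effect}: need n of roughly {n_required:,.0f}")
# effect=0.1 -> ~955; effect=0.01 -> ~95,500; effect=0.001 -> ~9.5 million
```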
I'm regularly working with datasets in the hundreds of thousands to millions, and that's small fry compared with what's out there.
The use of regression, for me at least, is not getting that p-gotcha for a paper, but as a posh pivot table that accounts for all the variables at once.
For example, I've encountered the belief that just recording something at ultra-high temporal resolution gives you "millions of datapoints". This then (seemingly) has all sorts of effects on the breakdown of statistics and hypothesis testing.
In reality, the replicability of the entire setup, the day it was performed, the person doing it, etc. means the n for the day is probably closer to 1. So to ensure replicability you’d have to at least do it on separate days, with separately prepared samples. Otherwise, how can you eliminate the chance that your ultra finicky sample just happened to vibe with that day’s temperature and humidity?
But they don't teach you in statistics what exactly "n" means, probably because a hundred years ago it was much more literal in nature: 100 samples meant I counted 100 mice, 100 peas, or 100 surveys.
For example, if your phenomenon is observable at 50 Hz, maybe even 10 Hz, then any higher temporal resolution does not give you new information, because any two adjacent datapoints in the time-series are extremely correlated. Going the other way, at a very low sampling frequency you'd just get the mean, which might not reveal anything of interest.
If you bin 100 Hz data at 50 Hz, are they the same? Is the Fourier spectrum the same? If you have samples of different resolution you must choose the lowest common denominator for a fair statistical comparison. Otherwise, recordings from a potato and from an advanced instrument would always be "statistically different", which doesn't make sense.
If you don't find "anything", the old adage goes "the absence of evidence is not the evidence of absence", so statistics don't really fail here. You can only conclude that your method is not sensitive enough.
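A minimal sketch in Python of the oversampling point above. The AR(1) process and the effective-sample-size approximation n_eff = n * (1 - rho) / (1 + rho) are my own illustrative assumptions, not anything from the thread:

```python
# Heavily oversampled, strongly autocorrelated data carry far fewer "effective"
# samples than raw points suggest.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                        # "a hundred thousand datapoints" from one recording
rho = 0.999                        # lag-1 autocorrelation of an oversampled signal

x = np.empty(n)
x[0] = rng.normal()
noise = rng.normal(size=n) * np.sqrt(1 - rho**2)
for t in range(1, n):              # AR(1) process: x_t = rho * x_{t-1} + noise_t
    x[t] = rho * x[t - 1] + noise[t]

lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
n_eff = n * (1 - lag1) / (1 + lag1)   # common AR(1) effective-sample-size formula
print(f"lag-1 autocorrelation ~ {lag1:.4f}, effective sample size ~ {n_eff:,.0f}")
```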
There’s a lot of folks out there though who learned the mechanics of linear regression in a bootcamp or something without gaining an appreciation for the underlying theories, and those folks are looking for low p-value and as long as they get it it’s good enough.
I saw this link yesterday and could barely believe it, but I guess these folks really live among us.
https://stats.stackexchange.com/questions/185507/what-happen...
As an example, read just about any health or nutrition research article referenced in popular media and there's very often a pretty weak effect size even though they've achieved "statistical significance." People then end up making big changes to their lifestyles and habits based on research that really does not justify those changes.
[1] https://www.youtube.com/watch?v=lG4VkPoG3ko
> Using Effect Size—or Why the P Value Is Not Enough
> Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude –not just, does a treatment affect people, but how much does it affect them.
– Gene V. Glass
When wielded correctly, statistical significance is a useful guide both to what's a real signal worth further investigation, and it filters out meaningless effect sizes.
A bigger problem even when statistical significance is used right is publication bias. If, out of 100 experiments, we only get to see the 7 that were significant, we already have a false:true ratio of 5:2 in the results we see – even though all are presented as true.
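A minimal sketch of the arithmetic behind that 5:2 figure, under the illustrative assumptions that nearly all 100 experiments test true nulls at alpha = 0.05 and that only significant results get shown:

```python
# Where the ~5:2 false:true ratio among published results comes from.
n_experiments = 100
alpha = 0.05

expected_false_positives = n_experiments * alpha   # ~5 chance "discoveries"
published = 7                                      # the significant results we see
implied_true_positives = published - expected_false_positives

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"implied true positives among the 7 published: {implied_true_positives:.0f}")
# -> roughly 5 false to 2 true among the results we actually get to see
```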
Econometrics cares not only about statistical significance but also usefulness/economic usefulness.
Causal inference builds on base statistics and ML, but its strength lies in how it uses design and assumptions to isolate causality. Tools like sensitivity analysis, robustness checks, and falsification tests help assess whether the causal story holds up. My one beef is that these tools still lean heavily on the assumption that the underlying theoretical model is correctly specified. In other words, causal inference helps stress-test assumptions, but it doesn’t always provide a clear way to judge whether one theoretical framework is more valid than another!
This is such a bizarre sentence. The way it's tossed in, not explained in any way, not supported by references, etc. I guess the implication being made is something like "because there is a hidden latent variable that determines criminality and we can never escape from correlations with it, it's ok to use "is_black" in our black-box model which decides if someone is going to get parole"? Ridiculous. Does this really "throw doubt" on whether we should care about this?
The concerns about how models work are deeper than the statistical challenges of creating or interpreting them. For one thing, all the degrees of freedom we include in our model selection process allow us to construct models which do anything that we want. If we see a parole model which includes "likes_hiphop" as an explanatory variable we ought to ask ourselves who decided that should be there and whether there was an agenda at play beyond "producing the best model possible."
These concerns about everything being correlated actually warrant a much more careful understanding of the political ramifications of how and what we choose to model, and based on which variables, because they tell us that in almost any non-trivial case a model is necessarily, at least in part, a political object, almost certainly consciously or subconsciously decorated with some conception of how the world is or ought to be explained.
It reads naturally in context and is explained by the foregoing text. For example, the phrase "these theoretical & empirical considerations" refers to theoretical and empirical considerations described above. The basic idea is that, because everything correlates with everything else, you can't just look at correlations and infer that they're more than incidental. The political implications are not at all "weird", and follow naturally. The author observes that social scientists build complex models and observe huge amounts of variables, which allows them to find correlations that support their hypothesis; but these correlations, exactly because they can be found everywhere, are not anywhere near as solid evidence as they are presented as being.
> Like I guess the implication being made is something like "because there is a hidden latent variable that determines criminality and we can never escape from correlations with it, its ok to use "is_black" in our black box model which decides if someone is going to get parole?
No, not at all. The implication is that we cannot conclude that the black box model actually has an "is_black" variable, even if it is observed to have disparate impact on black people.
Nothing in the statistical observation that variables tend to be correlated suggests we should somehow reject the moral perspective that it's desirable for a model to be based on causal rather than merely correlated variables, even if finding such variables is difficult or even impossible to do perfectly. And it's certainly also _meaningful_ to do so, even if there are statistical challenges. A model based on "socioeconomic status" has a totally different social meaning than one based on race, even if we cannot fully disentangle the two statistically. He is mixing up statistical, social, moral, and even philosophical questions in a way which is, in my opinion, misleading.
Perfect is the enemy of good. That it would be desirable to construct a model based on causal variables is self-evident, but we don't have those, and if a correlative model can demonstrably improve people's material conditions, even if conditioned on variables you find "distasteful", what is your argument that such a model shouldn't be used?
Ironically, your "likes_hiphop" example would appear to be an unusually clean case of a variable that is likely to exert causal influence.
What do you think the causal effect of listening to lyrics like "Prolly leave my fuckin' show in a cop car" might be, on an impressionable teenage boy say?
From one of the most-streamed hip-hop songs of all time:
https://genius.com/Post-malone-rockstar-lyrics
https://newsroom.spotify.com/2024-05-20/best-hip-hop-songs-1...
>A model based on "socioeconomic status" has a totally different social meaning than one based on race, even if we cannot fully disentangle the two statistically.
I see no evidence Gwern disagrees with this claim. He just seems to be arguing the "cannot fully disentangle the two statistically" part.
The vast, vast majority of people understand the difference between media and real life. I mean I wouldn't go so far as to suggest that Post Malone is "good," either "morally" or aesthetically, but I don't think there is a strong case for lyrics, tv, or video games having a strong effect on violent behavior. But if it were the case it would be good to identify it accurately. There is plenty of violent "rock" music too, after all. The Columbine shooters weren't listening to hip hop.
Many people commit crimes. I'll bet criminals are more likely to listen to hip hop than the population at large is.
>it seems that the causal power of media is small
If the causal power of media is small, why are you concerned with Gwern's article? Even if he made claims that are blatantly racist, it wouldn't matter much, since the causal power of media is small.
>The vast, vast majority of people understand the difference between media and real life.
Suppose 99% understand that, and 1% don't. That can still be a big relative increase in the rate of crimes which do serious harm.
If you read the message of the song lyrics I linked, the clear implication (very common with this sort of music) is that criminal behavior will make lots of women want to have sex with you. This can easily be a self-fulfilling prophecy. Women listen to the lyrics and think to themselves "criminals sound cool and rebellious; criminal behavior is kinda hot -- all the other women are going for criminals; perhaps I will as well". Men who are trying to become attractive to women listen to the lyrics, and engage in crime alongside the other things they are doing which make them more attractive. Thus the prophecy becomes self-fulfilling, to society's detriment.
Anyways, as an exercise, ask ChatGPT to generate a list of top gangster rap artists. Then pick a few at random and ask if they've run into trouble with the law. There's a much higher rate of lawbreakers in this group than the population at large.
I should note that if Gwern's observations about correlations are true, then a negative result should be taken seriously, since positive correlations should be easy to find. Absence of strong correlations should reasonably be taken as a sign that a definitive connection is hard to come by. Of course, any good research in this field will attempt to control for confounds and if you ask me personally, I'm not optimistic about that prospect. But to the extent that this research says anything at all, the case isn't strong.
I'm not even saying you are per se wrong - it does seem reasonable that media that glorifies lawlessness might increase lawlessness. But if it does, it clearly only does so in a small population which also share a lot of other factors (like poverty, for example). Given that most humans enjoy hip hop without negative consequences, focusing on it as a potential intervention seems off base. A ban on hip-hop would be very unlikely to reduce crime, but a decrease in poverty would probably do so (accepting that we can't really figure out how to do that). A focus on hip hop is extremely flaccid.
This "crime is rare" argument is a fully general way to argue that almost nothing causes crime. Because crime is rare, by your logic, we know e.g. that gun ownership and poverty cannot be proximal causes, since gun ownership and poverty are both common.
Ultimately this line of reasoning is simply innumerate. Rare events can have common causes. For example, even if house fires are rare, they can still be caused by e.g. careless cigarette smokers, even if careless cigarette smokers are common.
>focusing on it as a potential intervention seems off base.
I never said it should be focused on as an intervention or it should be banned. All I said was likes_hiphop is "a variable that is likely to exert causal influence".
>A focus on hip hop is extremely flaccid.
Tell that to the guy who introduced "likes_hiphop" as a topic of discussion :-)
As much as I do think that good, parsimonious social science modeling _requires_ theoretical commitments, the test is whether TFA would say the same thing about political cause du jour - say, `is_white` in hiring in an organization that does outreach to minority communities.
Statistical analyses provide a reason to believe one hypothesis over another, but any scientist will extend that with an experimental approach.
Most of the examples given in this blog post refer to medical, sociological or behavioral studies, where properly controlled experiments are hard to perform, and as such are frequently under-powered to reveal true cause-effect associations.
While the methods alone cannot fix it all ("You can’t fix by analysis what you bungled by design" [1] after all), it gets somewhat closer to unbiased results.
[1]: https://www.degruyterbrill.com/document/doi/10.4159/97806740...
For dark-mode, we rely on https://invertornot.com/ to dynamically decide whether to fade or negate/invert. (Background: https://gwern.net/invertornot ) The service uses a small NN and is not always correct, as in these cases. Sorry.
I have filed them as errors with InvertOrNot, and will manually set them to invert.
https://gwern.net/dropcap
That said, holistic supposition can certainly be traced back as far as the dawn of writing. Here the focus on the more modern/contemporary era is legitimate, to keep the focus delimited to a more specific concern, but it somewhat obscures this fact. Maybe it's already acknowledged in the document; I haven't read it all yet.
At the same time, as I've been forced to wrestle with it more in my work, I've increasingly felt that it's sort of empty and unhelpful. "Crud" does happen in patterns, like a kind of statistical cosmic background radiation — it's not meaningless. Sometimes it's important to understand it, and treating it as such gets no one anywhere. Sometimes the associations are difficult to explain easily when you try to pick it apart, and other times I think they're key to understanding uncontrolled confounds that should be controlled for.
As much as this background association is present too, it's not always there. Sometimes things do have zero association.
Also, trying to come up with a "meaningful" effect size that's not zero is pretty arbitrary and subjective.
There's probably more productive ways of framing the phenomenon.
Any time we wade into solipsism, however, I think it's important to remember that statistical analysis is a tool, not an arbiter of Truth. We are trying to exist in the mess of a world we live in, and we're going to be using every possible advanced tool we have in our arsenal to do that, and the standard model of science is probably the best tool we have. At the same time, we should often return to that solipsism and remember that where we can improve the model, we should.
I read a lot of papers that painstakingly show a correlation in the data, but then their theory about the correlation is a complete non sequitur.
Wait. Sir Arthur Conan Doyle lived at basically the exact same time as this Karl Pearson.
Is that why the Sherlock Holmes stories had handwriting analysis so frequently? Was there just pop science going around at the time that like, let's find correlations between anything and anything, and we can see that a criminal mastermind like Moriarty would certainly cross their T's this way and not that way?
Also, correlation is often taken to mean linear correlation. For example, two nonlinearly related datasets can be perfectly (rank) correlated while their linear correlation is close to zero.
People often attach undue meaning to correlation.
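A minimal sketch in Python (synthetic data) of the rank-vs-linear point: a perfectly monotone but nonlinear relationship has Spearman correlation exactly 1, while the Pearson correlation is far below 1 (pushing Pearson all the way toward zero takes more extreme examples).

```python
# Perfect rank correlation vs. much weaker linear correlation.
import numpy as np
from scipy import stats

x = np.linspace(0, 20, 500)
y = np.exp(x)                       # monotone, but wildly nonlinear

pearson_r, _ = stats.pearsonr(x, y)
spearman_r, _ = stats.spearmanr(x, y)
print(f"Pearson r ~ {pearson_r:.2f}, Spearman rho = {spearman_r:.2f}")
# Spearman is exactly 1.0; Pearson is much smaller (~0.5 here).
```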
I noticed this when doing a scatter plot of two variables: there were several "lines" of dots.
This generally implies that subsets of the two variables may have correlations or there is a third variable to be added.
I did some additional research and it is possible for two variables with large N to show correlation for short bursts even if both variables are random.
I mention this for two reasons:
1. I was just doing the above and saw the OP article today.
2. Despite taking multiple college-level stats classes, I don't remember this ever being mentioned.
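A minimal sketch in Python (synthetic data, not the commenter's) of that short-burst effect: two independent random series, scanned in 50-point windows, will turn up some strong-looking window correlations purely by chance.

```python
# Independent random series still show occasional strong "correlations" in
# short windows, just by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, window = 100_000, 50

x = rng.normal(size=n)
y = rng.normal(size=n)              # independent of x by construction

window_rs = [
    stats.pearsonr(x[i:i + window], y[i:i + window])[0]
    for i in range(0, n - window, window)
]
print(f"windows examined: {len(window_rs)}")
print(f"strongest short-burst |r|: {np.max(np.abs(window_rs)):.2f}")
# With ~2,000 windows of 50 points, some |r| around 0.4-0.5 appear by chance alone.
```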