Good Examples of "Causation Does Not Imply Correlation"

  • Context: Graduate 
  • Thread starter: WWGD
  • Tags: Correlation

Discussion Overview

The discussion revolves around the concept that causation does not imply correlation, particularly focusing on examples where a causal relationship exists but the correlation coefficient is zero or near zero. Participants explore various mathematical relationships and real-world scenarios to illustrate this point, including quadratic functions and other non-linear relationships.

Discussion Character

  • Exploratory
  • Debate/contested
  • Mathematical reasoning
  • Conceptual clarification

Main Points Raised

  • Some participants suggest that non-linear relationships, such as those described by Hooke's law or power dissipation in resistors, can illustrate cases where causation exists but correlation is zero.
  • Others question the correlation of specific pairs, such as (x, x^2), and discuss the implications of signed versus unsigned outputs in correlation analysis.
  • Some participants propose examples like the correlation between day of the year and temperature, noting that extreme days can yield low correlation despite a causal relationship.
  • There is mention of missing variables impacting the observed correlation, with some suggesting that lurking variables could explain the lack of correlation in certain cases.
  • Participants express discomfort with the term "correlation" being interpreted strictly as linear correlation, suggesting a broader interpretation may be necessary.
  • Some argue that causation should ultimately lead to some form of correlation if measured properly, while others maintain that zero correlation can indicate no predictive power between variables.
  • Discussion includes a reference to Anscombe's quartet as a potential teaching example, highlighting the importance of context in interpreting correlation.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the interpretation of correlation or the examples provided. Multiple competing views remain regarding the nature of correlation in the context of causation, and the discussion reflects a variety of opinions on the implications of different mathematical relationships.

Contextual Notes

Some participants note the difficulty in constructing examples that clearly illustrate the concepts discussed, and there is acknowledgment of the limitations of correlation coefficients in capturing complex relationships.

Who May Find This Useful

This discussion may be useful for students or educators in statistics, mathematics, or related fields who are exploring the nuances of correlation and causation, as well as those interested in real-world applications of these concepts.

WWGD
Science Advisor, Homework Helper
Ok, so if the causality relation between A, B is not linear, then it will go unnoticed by correlation, i.e., we may have A causing B but Corr(A, B) = 0. I am trying to find good examples to illustrate this but not coming up with much. I can think of Hooke's law, where data pairs (x, kx^2) would have zero correlation. Is this an "effective" way of illustrating the point that causation does not imply (nonzero) correlation? Any other examples?
 
If you apply a voltage across a resistor it causes power to dissipate in the resistor. The power is quadratic in the voltage so the linear correlation coefficient is zero.
 
Likes: Abhishek11235, etotheipi and WWGD
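A quick numerical check of this claim (the resistor value and the symmetric voltage sweep below are arbitrary choices for illustration, not from the post):

Python:
import numpy as np

# Applying a voltage V across a resistor causes power dissipation P = V^2 / R.
# For a sweep of voltages symmetric about zero, the Pearson (linear) correlation
# between V and P is essentially zero even though the relation is exactly causal.
R = 100.0                          # ohms, arbitrary illustrative value
V = np.linspace(-10, 10, 201)      # symmetric voltage sweep
P = V**2 / R                       # dissipated power, deterministic in V

print(np.corrcoef(V, P)[0, 1])     # ~0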
Why would (x,x^2) not have a high correlation for positive x?
 
Likes: Delta2
Something like (x,xsin(x)) would have little correlation
 
BWV said:
Why would (x,x^2) not have a high correlation for positive x?
I haven't double-checked the actual values of the correlation (difficult to do on the phone), but it's because points in a parabola do not closely resemble/fit points on a line.
 
BWV said:
Something like (x,xsin(x)) would have little correlation
Thanks. Can you find a causal relation described by such pairs?
 
Correlation(x,x^2)~0.97 for x=1:100
 
BWV said:
Why would (x,x^2) not have a high correlation for positive x?
I wasn't limiting it to positive x. The correlation is 0 for a balanced positive and negative sample.
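Both numbers are easy to check (a sketch; the ranges simply mirror the ones mentioned above):

Python:
import numpy as np

x_pos = np.arange(1, 101)            # x = 1..100, positive only
x_sym = np.arange(-50, 51)           # balanced positive and negative sample

print(np.corrcoef(x_pos, x_pos**2)[0, 1])   # ~0.97: nearly linear on positive x
print(np.corrcoef(x_sym, x_sym**2)[0, 1])   # ~0: the symmetry cancels the covariance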
 
I guess the fact that it's quadratic isn't interesting here, (x,|x|) would have similarly small correlation. Basically anytime you have a signed input, and an unsigned output whose magnitude depends on the magnitude of the input.

The correlation of the charge on an ion and the angle of curvature when it passes through a magnetic field? Actually constructing these examples is annoying.

What about something like the correlation between day of the year and temperature. Days 1 and 365 are both cold (at least in the northern hemisphere), the middle days are warm, so correlation is zero.
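A toy version of the day-of-year example, using an idealized sinusoidal temperature model (the mean, amplitude, and hemisphere convention are invented purely for illustration):

Python:
import numpy as np

day = np.arange(1, 366)
# Idealized northern-hemisphere temperature: cold near day 1 and day 365, warm mid-year.
temp = 10.0 - 15.0 * np.cos(2 * np.pi * day / 365)

print(np.corrcoef(day, temp)[0, 1])   # close to zero despite the clear seasonal causation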
 
  • #10
Office_Shredder said:
I guess the fact that it's quadratic isn't interesting here, (x,|x|) would have similarly small correlation. Basically anytime you have a signed input, and an unsigned output whose magnitude depends on the magnitude of the input.

The correlation of the charge on an ion and the angle of curvature when it passes through a magnetic field? Actually constructing these examples is annoying.

What about something like the correlation between day of the year and temperature. Days 1 and 365 are both cold (at least in the northern hemisphere), the middle days are warm, so correlation is zero.
Thanks, but it is not just any dataset, nor, like you said, is it relatively straightforward. I am looking for one describing a causal relation.
 
  • #11
Office_Shredder said:
I guess the fact that it's quadratic isn't interesting here, (x,|x|) would have similarly small correlation. Basically anytime you have a signed input, and an unsigned output whose magnitude depends on the magnitude of the input.

The correlation of the charge on an ion and the angle of curvature when it passes through a magnetic field? Actually constructing these examples is annoying.

What about something like the correlation between day of the year and temperature. Days 1 and 365 are both cold (at least in the northern hemisphere), the middle days are warm, so correlation is zero.
Oops! Realized I forgot to shift ##y=kx^2## to avoid symmetry. Consider, e.g., ##y=k(x-1)^2##. That should do it.
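Reading the shift as sampling the extension symmetrically about the new vertex (my reading; k and the sampling range are invented for illustration), the zero correlation survives while keeping x nonnegative:

Python:
import numpy as np

k = 2.0                          # spring constant, arbitrary illustrative value
x = np.linspace(0, 2, 201)       # nonnegative extensions, symmetric about x = 1
y = k * (x - 1)**2               # the shifted parabola from the post

print(np.corrcoef(x, y)[0, 1])   # ~0: zero linear correlation, fully causal relation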
 
  • #12
Using the word correlation to imply linear correlation is a little uncomfortable to me when used in the phrase, "Causation does not Imply Correlation". I always interpret "correlation" as general correlation in the converse.
 
Likes: FactChecker
  • #13
I think the examples given here all have zero general correlation.
 
  • #14
Ultimately, if measured properly, causation should result in linear correlation; some adjustment of variables will produce a linear correlation in the examples above. In the quadratic example centered at the origin, for instance, a simple look at the data will reveal the relationship, and all one has to do is take the absolute value of the input.
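A sketch of that adjustment for the origin-centered quadratic (the sample range is arbitrary): the raw input is linearly uncorrelated with the output, but the adjusted input is strongly correlated.

Python:
import numpy as np

x = np.arange(-50, 51)                    # balanced sample around the origin
y = x**2                                  # causal, but linearly uncorrelated with x

print(np.corrcoef(x, y)[0, 1])            # ~0
print(np.corrcoef(np.abs(x), y)[0, 1])    # ~0.97 after taking the absolute value of the input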
 
  • #15
Office_Shredder said:
I think the examples given here all have zero general correlation.
I think zero correlation means knowing the value of one would give you absolutely no information that is useful to predict the value of the other.
 
  • #16
For context, I may be teaching a small online class that includes this general area and was looking for examples that are "natural". I am thinking too of including Anscombe's quartet somehow. More interesting to me, but beyond the scope, is having different RVs with the same distribution: like the RVs counting heads or tails in a binomial with p=0.5.
 
  • #17
The other situation is a missing variable, where A impacts B, but does not show up statistically because the impact of C is not accounted for
 
  • #18
BWV said:
The other situation is a missing variable, where A impacts B, but does not show up statistically because the impact of C is not accounted for
I'm not sure, but encryption might be a good example.
 
  • #19
BWV said:
The other situation is a missing variable, where A impacts B, but does not show up statistically because the impact of C is not accounted for
You mean lurking variables?
 
  • #20
Jarvis323 said:
Using the word correlation to imply linear correlation is a little uncomfortable to me when used in the phrase, "Causation does not Imply Correlation". I always interpret "correlation" as general correlation in the converse.
Since this thread is in the statistics section I assumed that standard statistical correlation was implied, but you do make a good point. That isn’t the only meaning to the term.
 
Likes: Klystron
  • #22
Dale said:
Since this thread is in the statistics section I assumed that standard statistical correlation was implied, but you do make a good point. That isn’t the only meaning to the term.
I assume everything outside of General Discussion is to be interpreted technically. "Big Picture" questions, no less important/interesting than technical ones, I assume belong in GD. Edit: Unless explicitly stated otherwise. The linked content below makes me think this is the way PF is organized.
 
  • #23
"Causation implies you can make a prediction about the value" is basically a tautology, and doesn't really help much. How do I figure out if there exists an arbitrarily shaped function which results in at least ##\epsilon## predictive power?

One method of testing this is by measuring the correlation. If it exists, then a predictive function exists (even if the relationship is not causal). The fact that correlation can be zero and you can still have perfect predictive power is an interesting result in my opinion.
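One generic way to test for predictive power of arbitrary shape (a standard trick, not something proposed in the thread): bin the input and see how much of the output variance the per-bin means explain (the correlation ratio). It picks up the quadratic relation that Pearson's r misses. The bin count and noise level below are arbitrary.

Python:
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x**2 + 0.01 * rng.normal(size=x.size)   # y is almost perfectly predictable from x

print(np.corrcoef(x, y)[0, 1])              # ~0: linear correlation sees nothing

# Correlation ratio: fraction of the variance of y explained by per-bin means of y given x.
edges = np.quantile(x, np.linspace(0, 1, 21))
idx = np.clip(np.digitize(x, edges) - 1, 0, 19)
bin_means = np.array([y[idx == b].mean() for b in range(20)])
ss_res = np.sum((y - bin_means[idx])**2)
ss_tot = np.sum((y - y.mean())**2)
print(1 - ss_res / ss_tot)                  # close to 1: x has nearly perfect predictive power for y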
 
  • #24
While we're talking about correlation: does anyone know if we can consider Spearman's rho for more than 2 datasets? Edit: I know we can use the Kruskal-Wallis one-way ANOVA for something similar, but I am just curious about Spearman's rho.
 
  • #25
The Spearman coefficient is the Pearson coefficient of the pairs of integers describing the relative values (ranks) of the data sets, so if there's a Pearson coefficient there is a Spearman one.

I don't know of anything specific, but insofar as the Pearson coefficient measures how well a line fits the data, you can certainly measure e.g. the variance explained by the first PCA factor and take the square root of that. I think that won't give Pearson's coefficient, since the line returned by PCA on two variables is not the best-fit line, but I might be wrong about that.
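A quick check of the rank characterization with SciPy (the data here is arbitrary; any monotone, nonlinear relation works):

Python:
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.exp(x) + 0.1 * rng.normal(size=200)       # monotone but nonlinear in x

rho, _ = spearmanr(x, y)                         # Spearman's rho on the raw data
r_ranks, _ = pearsonr(rankdata(x), rankdata(y))  # Pearson's r on the ranks
print(rho, r_ranks)                              # the two values agree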
 
  • #26
I think mutual information might be one of the purest and most relevant measures of correlation? I guess some measures of correlation are used so frequently (e.g. in linear statistics), that it's become common to use correlation as short for whichever measure a group of people are used to working with.
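For the quadratic example, a crude histogram-based mutual information estimate (one of several estimators; this sketch uses scikit-learn's mutual_info_score on binned samples) is clearly positive even though Pearson's r is essentially zero:

Python:
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 50_000)
y = x**2                                        # deterministic causal relation

print(np.corrcoef(x, y)[0, 1])                  # ~0: no linear correlation

# Plug-in estimate of mutual information (in nats) from equal-width bins.
x_bins = np.digitize(x, np.linspace(-1, 1, 21))
y_bins = np.digitize(y, np.linspace(0, 1, 21))
print(mutual_info_score(x_bins, y_bins))        # clearly > 0: x and y are strongly dependent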
 
  • #27
Office_Shredder said:
One method of testing this is by measuring the correlation. If it exists, then a predictive function exists (even if the relationship is not causal).
One issue is that correlation is a statistical concept. If you have a stationary process with a finite set of states, then you can measure correlation, e.g. with mutual information. If you don't, then you can still have causality, but might not be able to use statistics at all.
Office_Shredder said:
The fact that correlation can be zero and you can still have perfect predictive power is an interesting result in my opinion.
I am skeptical about this.
 
  • #28
Jarvis323 said:
I think mutual information might be one of the purest and most relevant measures of correlation?
I think that is a pretty large abuse of terminology. Do you have any scientific reference that supports that claim?
 
  • #29
Dale said:
I think that is a pretty large abuse of terminology. Do you have any scientific reference that supports that claim?
To the contrary, using the term "correlation" as short for a specific type of linear statistical relationship is an abuse of terminology, although a convenient one if you are primarily using linear statistics. Correlation technically means any statistical relationship. Mutual information is a good measure here because it is one of the purest measures of statistical association. If there is any statistical relationship, then there will be mutual information.

In the context of the saying, it's also a good measure, because a statistical association doesn't imply causality, no matter if you're talking about correlation in the purest sense, or linear correlation. If you want to discuss the converse (does causality imply correlation?), I think it would be misleading and less interesting to use a narrow/restricted measure of correlation. Then again, due to the confusion with the word "correlation" becoming used so imprecisely in certain fields, it might be better just to ask if causality implies statistical association. Then it comes down to whether the process is stationary, or is the setting restricted properly so that the application of statistics is meaningful and core statistical assumptions can be made.

Likewise, in the context of all of the recent questions about causality and correlation, one should assume the broad definition of correlation (any statistical relationship) otherwise the questions are trivial, somewhat arbitrarily restrictive, and uninteresting.

Here is an early paper that might be helpful.

http://www.economics.soton.ac.uk/staff/aldrich/spurious.PDF
 
  • #30
Jarvis323 said:
To the contrary, using the term "correlation" as short for a specific type of linear statistical relationship is an abuse of terminology, although a convenient one if you are primarily using linear statistics. Correlation technically means any statistical relationship. Mutual information is a good measure here because it is one of the purest measures of statistical association. If there is any statistical relationship, then there will be mutual information.

In the context of the saying, it's also a good measure, because a statistical association doesn't imply causality, no matter if you're talking about correlation in the purest sense, or linear correlation. If you want to discuss the converse (does causality imply correlation?), I think it would be misleading and less interesting to use a narrow/restricted measure of correlation. Then again, due to the confusion with the word "correlation" becoming used so imprecisely in certain fields, it might be better just to ask if causality implies statistical association. Then it comes down to whether the process is stationary, or is the setting restricted properly so that the application of statistics is meaningful and core statistical assumptions can be made.

Likewise, in the context of all of the recent questions about causality and correlation, one should assume the broad definition of correlation (any statistical relationship) otherwise the questions are trivial, somewhat arbitrarily restrictive, and uninteresting.
Well, "any statistical relation" is hopelessly vague. Just what does that mean, and how is it measured? And I don't see why it is uninteresting (obviously it interests me, since I asked the question), because the definitions of correlation that I am aware of, Spearman's rho included, entail simultaneous change of two variables, so that it seems unintuitive to have causation without simultaneous change.
 
