Good Examples of "Causation Does Not Imply Correlation"

  • Context: Graduate 
  • Thread starter: WWGD
  • Tags: Correlation

Discussion Overview

The discussion revolves around the concept that causation does not imply correlation, particularly focusing on examples where a causal relationship exists but the correlation coefficient is zero or near zero. Participants explore various mathematical relationships and real-world scenarios to illustrate this point, including quadratic functions and other non-linear relationships.

Discussion Character

  • Exploratory
  • Debate/contested
  • Mathematical reasoning
  • Conceptual clarification

Main Points Raised

  • Some participants suggest that non-linear relationships, such as those described by Hooke's law or power dissipation in resistors, can illustrate cases where causation exists but correlation is zero.
  • Others question the correlation of specific pairs, such as (x, x^2), and discuss the implications of signed versus unsigned outputs in correlation analysis.
  • Some participants propose examples like the correlation between day of the year and temperature, noting that extreme days can yield low correlation despite a causal relationship.
  • There is mention of missing variables impacting the observed correlation, with some suggesting that lurking variables could explain the lack of correlation in certain cases.
  • Participants express discomfort with the term "correlation" being interpreted strictly as linear correlation, suggesting a broader interpretation may be necessary.
  • Some argue that causation should ultimately lead to some form of correlation if measured properly, while others maintain that zero correlation can indicate no predictive power between variables.
  • Discussion includes a reference to Anscombe's quartet as a potential teaching example, highlighting the importance of context in interpreting correlation.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the interpretation of correlation or the examples provided. Multiple competing views remain regarding the nature of correlation in the context of causation, and the discussion reflects a variety of opinions on the implications of different mathematical relationships.

Contextual Notes

Some participants note the difficulty in constructing examples that clearly illustrate the concepts discussed, and there is acknowledgment of the limitations of correlation coefficients in capturing complex relationships.

Who May Find This Useful

This discussion may be useful for students or educators in statistics, mathematics, or related fields who are exploring the nuances of correlation and causation, as well as those interested in real-world applications of these concepts.

WWGD
Science Advisor, Homework Helper
Ok, so if the causality relation between A, B is not linear, then it will go unnoticed by correlation, i.e., we may have A causing B but Corr(A, B) = 0. I am trying to find good examples to illustrate this but not coming up with much. I can think of Hooke's law, where data pairs (x, kx^2) would have zero correlation. Is this an "effective" way of illustrating the point that causation does not imply (nonzero) correlation? Any other examples?
 
If you apply a voltage across a resistor it causes power to dissipate in the resistor. The power is quadratic in the voltage so the linear correlation coefficient is zero.
 
Likes: Abhishek11235, etotheipi and WWGD
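A quick numerical check of this claim (the resistor value and the symmetric voltage sweep below are arbitrary choices for illustration, not from the post):

Python:
import numpy as np

# Applying a voltage V across a resistor causes power dissipation P = V^2 / R.
# For a sweep of voltages symmetric about zero, the Pearson (linear) correlation
# between V and P is essentially zero even though the relation is exactly causal.
R = 100.0                          # ohms, arbitrary illustrative value
V = np.linspace(-10, 10, 201)      # symmetric voltage sweep
P = V**2 / R                       # dissipated power, deterministic in V

print(np.corrcoef(V, P)[0, 1])     # ~0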
Why would (x,x^2) not have a high correlation for positive x?
 
Likes: Delta2
Something like (x,xsin(x)) would have little correlation
 
BWV said:
Why would (x,x^2) not have a high correlation for positive x?
I haven't double-checked the actual values of the correlation (difficult to do on the phone), but it's because points in a parabola do not closely resemble/fit points on a line.
 
BWV said:
Something like (x,xsin(x)) would have little correlation
Thanks. Can you find a causal relation described by such pairs?
 
Correlation(x,x^2)~0.97 for x=1:100
 
BWV said:
Why would (x,x^2) not have a high correlation for positive x?
I wasn't limiting it to positive x. The correlation is 0 for a balanced positive and negative sample.
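Both numbers are easy to check (a sketch; the ranges simply mirror the ones mentioned above):

Python:
import numpy as np

x_pos = np.arange(1, 101)            # x = 1..100, positive only
x_sym = np.arange(-50, 51)           # balanced positive and negative sample

print(np.corrcoef(x_pos, x_pos**2)[0, 1])   # ~0.97: nearly linear on positive x
print(np.corrcoef(x_sym, x_sym**2)[0, 1])   # ~0: the symmetry cancels the covariance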
 
I guess the fact that it's quadratic isn't interesting here, (x,|x|) would have similarly small correlation. Basically anytime you have a signed input, and an unsigned output whose magnitude depends on the magnitude of the input.

The correlation of the charge on an ion and the angle of curvature when it passes through a magnetic field? Actually constructing these examples is annoying.

What about something like the correlation between day of the year and temperature. Days 1 and 365 are both cold (at least in the northern hemisphere), the middle days are warm, so correlation is zero.
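A toy version of the day-of-year example, using an idealized sinusoidal temperature model (the mean, amplitude, and hemisphere convention are invented purely for illustration):

Python:
import numpy as np

day = np.arange(1, 366)
# Idealized northern-hemisphere temperature: cold near day 1 and day 365, warm mid-year.
temp = 10.0 - 15.0 * np.cos(2 * np.pi * day / 365)

print(np.corrcoef(day, temp)[0, 1])   # close to zero despite the clear seasonal causation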
 
  • #10
Office_Shredder said:
I guess the fact that it's quadratic isn't interesting here, (x,|x|) would have similarly small correlation. Basically anytime you have a signed input, and an unsigned output whose magnitude depends on the magnitude of the input.

The correlation of the charge on an ion and the angle of curvature when it passes through a magnetic field? Actually constructing these examples is annoying.

What about something like the correlation between day of the year and temperature. Days 1 and 365 are both cold (at least in the northern hemisphere), the middle days are warm, so correlation is zero.
Thanks, but it is not just any dataset, nor, like you said, is it relatively straightforward. I am looking for one describing a causal relation.
 
  • #11
Office_Shredder said:
I guess the fact that it's quadratic isn't interesting here, (x,|x|) would have similarly small correlation. Basically anytime you have a signed input, and an unsigned output whose magnitude depends on the magnitude of the input.

The correlation of the charge on an ion and the angle of curvature when it passes through a magnetic field? Actually constructing these examples is annoying.

What about something like the correlation between day of the year and temperature. Days 1 and 365 are both cold (at least in the northern hemisphere), the middle days are warm, so correlation is zero.
Oops! Realized I forgot to shift ##y=kx^2## to avoid symmetry. Consider, e.g., ##y=k(x-1)^2##. That should do it.
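Reading the shift as sampling the extension symmetrically about the new vertex (my reading; k and the sampling range are invented for illustration), the zero correlation survives while keeping x nonnegative:

Python:
import numpy as np

k = 2.0                          # spring constant, arbitrary illustrative value
x = np.linspace(0, 2, 201)       # nonnegative extensions, symmetric about x = 1
y = k * (x - 1)**2               # the shifted parabola from the post

print(np.corrcoef(x, y)[0, 1])   # ~0: zero linear correlation, fully causal relation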
 
  • #12
Using the word correlation to imply linear correlation is a little uncomfortable to me when used in the phrase, "Causation does not Imply Correlation". I always interpret "correlation" as general correlation in the converse.
 
Likes: FactChecker
  • #13
I think the examples given here all have zero general correlation.
 
  • #14
Ultimately, if measured properly, causation should result in linear correlation; some adjustment of variables will produce a linear correlation in the examples above. In the quadratic example centered at the origin, for instance, a simple look at the data will reveal the relationship, and all one has to do is take the absolute value of the input.
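A sketch of that adjustment for the origin-centered quadratic (the sample range is arbitrary): the raw input is linearly uncorrelated with the output, but the adjusted input is strongly correlated.

Python:
import numpy as np

x = np.arange(-50, 51)                    # balanced sample around the origin
y = x**2                                  # causal, but linearly uncorrelated with x

print(np.corrcoef(x, y)[0, 1])            # ~0
print(np.corrcoef(np.abs(x), y)[0, 1])    # ~0.97 after taking the absolute value of the input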
 
  • #15
Office_Shredder said:
I think the examples given here all have zero general correlation.
I think zero correlation means knowing the value of one would give you absolutely no information that is useful to predict the value of the other.
 
  • #16
For context, I may be teaching a small online class that includes this general area and was looking for examples that are "natural". I am thinking too of including Anscombe's quartet somehow. More interesting to me, but beyond the scope, is having different RVs with the same distribution: like the RVs counting heads or tails in a binomial with p=0.5.
 
  • #17
The other situation is a missing variable, where A impacts B, but does not show up statistically because the impact of C is not accounted for
 
  • #18
BWV said:
The other situation is a missing variable, where A impacts B, but does not show up statistically because the impact of C is not accounted for
I'm not sure, but encryption might be a good example.
 
  • #19
BWV said:
The other situation is a missing variable, where A impacts B, but does not show up statistically because the impact of C is not accounted for
You mean lurking variables?
 
  • #20
Jarvis323 said:
Using the word correlation to imply linear correlation is a little uncomfortable to me when used in the phrase, "Causation does not Imply Correlation". I always interpret "correlation" as general correlation in the converse.
Since this thread is in the statistics section I assumed that standard statistical correlation was implied, but you do make a good point. That isn’t the only meaning to the term.
 
Likes: Klystron
  • #22
Dale said:
Since this thread is in the statistics section I assumed that standard statistical correlation was implied, but you do make a good point. That isn’t the only meaning to the term.
I assume everything outside of General Discussion is to be interpreted technically. "Big Picture" questions, no less important/interesting than technical ones, I assume belong in GD. Edit: Unless explicitly stated otherwise. The linked content below makes me think this is the way PF is organized.
 
  • #23
"Causation implies you can make a prediction about the value" is basically a tautology, and doesn't really help much. How do I figure out if there exists an arbitrarily shaped function which results in at least ##\epsilon## predictive power?

One method of testing this is by measuring the correlation. If it exists, then a predictive function exists (even if the relationship is not causal). The fact that correlation can be zero and you can still have perfect predictive power is an interesting result in my opinion.
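One generic way to test for predictive power of arbitrary shape (a standard trick, not something proposed in the thread): bin the input and see how much of the output variance the per-bin means explain (the correlation ratio). It picks up the quadratic relation that Pearson's r misses. The bin count and noise level below are arbitrary.

Python:
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x**2 + 0.01 * rng.normal(size=x.size)   # y is almost perfectly predictable from x

print(np.corrcoef(x, y)[0, 1])              # ~0: linear correlation sees nothing

# Correlation ratio: fraction of the variance of y explained by per-bin means of y given x.
edges = np.quantile(x, np.linspace(0, 1, 21))
idx = np.clip(np.digitize(x, edges) - 1, 0, 19)
bin_means = np.array([y[idx == b].mean() for b in range(20)])
ss_res = np.sum((y - bin_means[idx])**2)
ss_tot = np.sum((y - y.mean())**2)
print(1 - ss_res / ss_tot)                  # close to 1: x has nearly perfect predictive power for y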
 
  • #24
While we're talking about correlation: does anyone know if we can consider Spearman's rho for more than 2 datasets? Edit: I know we can use the Kruskal-Wallis one-way ANOVA for something similar, but I am just curious about Spearman's rho.
 
  • #25
The Spearman coefficient is the Pearson coefficient of the pairs of integers describing the relative values (ranks) of the data sets, so if there's a Pearson coefficient there is a Spearman one.

I don't know of anything specific, but insofar as the Pearson coefficient measures how well a line fits the data, you can certainly measure e.g. the variance explained by the first PCA factor and take the square root of that. I think that won't give Pearson's coefficient, since the line returned by PCA on two variables is not the best-fit line, but I might be wrong about that.
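A quick check of the rank characterization with SciPy (the data here is arbitrary; any monotone, nonlinear relation works):

Python:
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.exp(x) + 0.1 * rng.normal(size=200)       # monotone but nonlinear in x

rho, _ = spearmanr(x, y)                         # Spearman's rho on the raw data
r_ranks, _ = pearsonr(rankdata(x), rankdata(y))  # Pearson's r on the ranks
print(rho, r_ranks)                              # the two values agree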
 
  • #26
I think mutual information might be one of the purest and most relevant measures of correlation? I guess some measures of correlation are used so frequently (e.g. in linear statistics), that it's become common to use correlation as short for whichever measure a group of people are used to working with.
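For the quadratic example, a crude histogram-based mutual information estimate (one of several estimators; this sketch uses scikit-learn's mutual_info_score on binned samples) is clearly positive even though Pearson's r is essentially zero:

Python:
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 50_000)
y = x**2                                        # deterministic causal relation

print(np.corrcoef(x, y)[0, 1])                  # ~0: no linear correlation

# Plug-in estimate of mutual information (in nats) from equal-width bins.
x_bins = np.digitize(x, np.linspace(-1, 1, 21))
y_bins = np.digitize(y, np.linspace(0, 1, 21))
print(mutual_info_score(x_bins, y_bins))        # clearly > 0: x and y are strongly dependent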
 
  • #27
Office_Shredder said:
One method of testing this is by measuring the correlation. If it exists, then a predictive function exists (even if the relationship is not causal).
One issue is that correlation is a statistical concept. If you have a stationary process with a finite set of states, then you can measure correlation, e.g. with mutual information. If you don't, then you can still have causality, but might not be able to use statistics at all.
Office_Shredder said:
The fact that correlation can be zero and you can still have perfect predictive power is an interesting result in my opinion.
I am skeptical about this.
 
  • #28
Jarvis323 said:
I think mutual information might be one of the purest and most relevant measures of correlation?
I think that is a pretty large abuse of terminology. Do you have any scientific reference that supports that claim?
 
  • #29
Dale said:
I think that is a pretty large abuse of terminology. Do you have any scientific reference that supports that claim?
To the contrary, using the term "correlation" as short for a specific type of linear statistical relationship is an abuse of terminology, although a convenient one if you are primarily using linear statistics. Correlation technically means any statistical relationship. Mutual information is a good measure here because it is one of the purest measures of statistical association. If there is any statistical relationship, then there will be mutual information.

In the context of the saying, it's also a good measure, because a statistical association doesn't imply causality, no matter if you're talking about correlation in the purest sense, or linear correlation. If you want to discuss the converse (does causality imply correlation?), I think it would be misleading and less interesting to use a narrow/restricted measure of correlation. Then again, due to the confusion with the word "correlation" becoming used so imprecisely in certain fields, it might be better just to ask if causality implies statistical association. Then it comes down to whether the process is stationary, or is the setting restricted properly so that the application of statistics is meaningful and core statistical assumptions can be made.

Likewise, in the context of all of the recent questions about causality and correlation, one should assume the broad definition of correlation (any statistical relationship) otherwise the questions are trivial, somewhat arbitrarily restrictive, and uninteresting.

Here is an early paper that might be helpful.

http://www.economics.soton.ac.uk/staff/aldrich/spurious.PDF
 
  • #30
Jarvis323 said:
To the contrary, using the term "correlation" as short for a specific type of linear statistical relationship is an abuse of terminology, although a convenient one if you are primarily using linear statistics. Correlation technically means any statistical relationship. Mutual information is a good measure here because it is one of the purest measures of statistical association. If there is any statistical relationship, then there will be mutual information.

In the context of the saying, it's also a good measure, because a statistical association doesn't imply causality, no matter if you're talking about correlation in the purest sense, or linear correlation. If you want to discuss the converse (does causality imply correlation?), I think it would be misleading and less interesting to use a narrow/restricted measure of correlation. Then again, due to the confusion with the word "correlation" becoming used so imprecisely in certain fields, it might be better just to ask if causality implies statistical association. Then it comes down to whether the process is stationary, or is the setting restricted properly so that the application of statistics is meaningful and core statistical assumptions can be made.

Likewise, in the context of all of the recent questions about causality and correlation, one should assume the broad definition of correlation (any statistical relationship) otherwise the questions are trivial, somewhat arbitrarily restrictive, and uninteresting.
Well, "any statistical relation" is hopelessly vague. Just what does that mean, and how is it measured? And I don't see why it is uninteresting (obviously it interests me, since I asked the question), because the definitions of correlation that I am aware of, Spearman's rho included, entail simultaneous change of two variables, so that it seems unintuitive to have causation without simultaneous change.
 
