Correlation and the Probability of Causation

In summary, when considering the probability of causation in a strictly controlled experiment with two outcomes, it is necessary to define the context in which the probabilities are being compared. While humans may assume P1 to be significantly higher than P2, this is based on our limited understanding and experience, and does not necessarily reflect the true probabilities. It is important to consider the underlying models and simplifications used in the analysis and be aware of the potential for incomplete information and uncertainty in making causal inferences.
  • #1
koenigcochran
I've always been told "correlation does not imply causation." However, I've never been told much about whether it can imply a probability of causation. Moreover, there seem to be competing and often misused definitions of "to cause", i.e., use in a syllogistic sense versus use in probabilistic sense. Please consider the following:

Imagine we conduct a strictly controlled experiment only once, and it has one of two outcomes:

(Outcome 1) Y is strongly correlated to X.
(Outcome 2) Y is not correlated to X at all.

Suppose the single experiment has outcome (1), and we call the probability that X causes Y, P1. Now let's go back in time, suppose it instead has outcome (2), and call the probability that X causes Y, P2.

Is P1 > P2? Why?




I realize this raises lots of questions: what do I mean by "cause"? What do we know about the experiment? What do we mean by strictly controlled?

For the sake of my curiosity, I invite you to provide your own assumptions in answer to these questions. I apologize for the vagaries, but it's the vagaries of this question that have me scratching my head! Please, feel free to point out what must be clarified, and some possible clarifications in response (point out the blanks and fill them in).

Thanks so much!
 
  • #2
Hey koenigcochran and welcome to the forums.

I think the best way to approach this is through two related ideas: the first is that you have incomplete information, and the second (which builds on the first) is that there may be a causal interaction chain between the elements involved that runs through the information you do not have.

Because of this, we often take the pessimistic viewpoint that correlation doesn't necessarily imply causation: we have to accept that the data is incomplete, and therefore also accept the possibility that there are causal mechanisms which are, in one form or another, independent of the observed data.

Treating incomplete information as the worst-case assumption is important, but it doesn't mean you can't still make conjectures based on a good line of reasoning.

What you can't do is say for sure that two things are causally linked, which means you can't jump to conclusions just because you see an obvious pattern and a 0.9+ coefficient. At the same time, there is no reason why you can't offer your own insight into why something is the way it is.

Be aware that all of science operates under uncertainty: we have gone from the Newtonian paradigm to the probabilistic/statistical paradigm, and as a result we have to carry out many forms of inference under a high degree of uncertainty.
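The warning about a "0.9+ coefficient" can be demonstrated with a small sketch (the data here is entirely synthetic): two random walks generated completely independently of each other routinely show a large correlation coefficient, even though neither influences the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random walks generated completely independently of each other.
x = np.cumsum(rng.normal(size=500))
y = np.cumsum(rng.normal(size=500))

# Despite having no causal link, trending series like these
# often produce large correlation coefficients purely by accident.
r = np.corrcoef(x, y)[0, 1]
print(f"correlation between independent walks: {r:.3f}")
```

The exact value depends on the seed, but over repeated runs large spurious correlations turn up often, which is exactly why a high coefficient alone proves nothing.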
 
  • #3
The other thing is the nature of your information.

For example, most information we deal with is more or less a simplification of something more complex. Think of it as a projection from a huge, higher-dimensional space down to something much lower.

As a result of this simplification, we are going to miss things and make bad inferences, especially if we forget what the simplification is and how it relates to the context of the un-projected, complete data as a whole.

So if you find yourself in the circumstance where you think you have the complete data, but the data is in fact a vast simplification that hides many of the internal mechanisms contributing to the final 'simplified' numeric quantities used in the analysis, remember that the simplifications, what is measured, and how these relate to each other and to the context of the experiment all have a big impact on reasoning about causality and on analyzing correlation.

In fact, it is a good idea to mention these things in a report, so that other people are aware of them and can draw their own conclusions about your analysis, favorable or not, in a constructive manner.
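One concrete way a "projection" can hide internal mechanisms is aggregation over a grouping variable the analyst never sees (Simpson's paradox). The numbers below are invented purely for illustration: within each of two hypothetical groups y decreases with x, but the groups sit at different baselines, so pooling them flips the sign of the correlation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Group 1: y decreases with x around a low baseline.
x1 = rng.uniform(0, 5, 200)
y1 = 10 - x1 + rng.normal(scale=0.5, size=200)

# Group 2: y also decreases with x, but around a higher baseline.
x2 = rng.uniform(10, 15, 200)
y2 = 25 - x2 + rng.normal(scale=0.5, size=200)

# The "simplified" data: both groups pooled, group label discarded.
x = np.concatenate([x1, x2])
y = np.concatenate([y1, y2])

print("group 1:", np.corrcoef(x1, y1)[0, 1])   # strongly negative
print("group 2:", np.corrcoef(x2, y2)[0, 1])   # strongly negative
print("pooled :", np.corrcoef(x, y)[0, 1])     # positive!
```

The pooled analysis, unaware of the hidden grouping, reports the opposite relationship from the one operating inside every group.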
 
  • #4
Hi koenigcochran!

koenigcochran said:
I've always been told "correlation does not imply causation." However, I've never been told much about whether it can imply a probability of causation. Moreover, there seem to be competing and often misused definitions of "to cause", i.e., use in a syllogistic sense versus use in probabilistic sense. Please consider the following:

Imagine we conduct a strictly controlled experiment only once, and it has one of two outcomes:

(Outcome 1) Y is strongly correlated to X.
(Outcome 2) Y is not correlated to X at all.

Suppose the single experiment has outcome (1), and we call the probability that X causes Y, P1. Now let's go back in time, suppose it instead has outcome (2), and call the probability that X causes Y, P2.

Is P1 > P2? Why?

This is a cool question, OK, my two cents on this one:

There are an infinite number of experiments in which X is the cause of Y in (Outcome 1), and another infinite number in which X is not the cause of Y. And exactly the same is true of (Outcome 2).

So calculating (and comparing) P1 and P2 only makes sense within a well-defined set of scenarios.

There is no way whatsoever to prove (or, interestingly, disprove) causality just by looking at the data; you need to understand the underlying model that generates X and Y, and then the causality is only as true as your model is.

We humans have the feeling that P1 > P2 because our "well-defined set of scenarios" is not the infinite set of mathematically possible cases but our daily experience, which is full of underlying models that we build throughout our lives to improve our chances of survival.

So if we see event Y happen right after event X (whether correlated or not), our collection of human underlying models will assume the more likely scenario to be that X caused Y. The same goes for the example you pose: we would consider P1 not just higher, but much higher, than P2, and in this context we would be right: P1 >> P2.

In short, once you define the context (whether mathematical, physical, human...) you can talk about how causality relates to correlation.
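That "well-defined set of scenarios" point can be made concrete with a small simulation. The generative model below is entirely invented for illustration: with prior probability 0.5 a "world" is causal (Y = X + noise), otherwise X and Y are independent. Simulating many such worlds and running the one-shot experiment in each gives Monte Carlo estimates of P1 and P2 within this particular scenario set.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_world(causal, n=50):
    """One single-shot experiment; returns |sample correlation|."""
    x = rng.normal(size=n)
    y = x + rng.normal(size=n) if causal else rng.normal(size=n)
    return abs(np.corrcoef(x, y)[0, 1])

strong, none_ = [], []           # causal-indicator lists per outcome
for _ in range(20000):
    causal = rng.random() < 0.5  # prior: half the worlds are causal
    r = run_world(causal)
    if r > 0.5:                  # Outcome 1: strong correlation
        strong.append(causal)
    elif r < 0.1:                # Outcome 2: essentially no correlation
        none_.append(causal)

p1 = np.mean(strong)             # estimate of P(causal | outcome 1)
p2 = np.mean(none_)              # estimate of P(causal | outcome 2)
print(f"P1 ~ {p1:.3f}, P2 ~ {p2:.3f}")
```

Within this invented scenario set P1 comes out near 1 and P2 near 0, but both numbers are properties of the assumed model, not of the data alone, which is exactly the point being made above.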
 
  • #5
koenigcochran, you have more or less described Bayesian reasoning. Yes, if you have an a priori probability for a hypothesis, and you can calculate the odds of an observation on the basis that the hypothesis is true, and again on the basis that it is false, then you can adjust your probability of the hypothesis in light of the data.
This is one reason (the main reason?) that a feasible mechanism for the cause contributes greatly to one's confidence in it. Some would say that without such a mechanism the a priori probability is zero, so it can never rise above that. OTOH, we should always allow for the possibility that we just haven't thought of the mechanism yet.
Btw, as indicated by earlier posts, we need to distinguish between cause and causal connection. There may be a common cause.
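The Bayesian update described here can be written out directly. The probabilities below are invented for illustration only: a prior of 0.10 for the causal hypothesis, with a strong correlation deemed likely (0.90) if the hypothesis is true and unlikely (0.05) if it is false.

```python
# Bayes' rule for a hypothesis H given an observation O:
# P(H|O) = P(H) P(O|H) / [ P(H) P(O|H) + P(not H) P(O|not H) ]
def posterior(prior, p_obs_if_true, p_obs_if_false):
    evidence = prior * p_obs_if_true + (1 - prior) * p_obs_if_false
    return prior * p_obs_if_true / evidence

# Observing a strong correlation (likely if causal, unlikely otherwise)
# raises the probability of the hypothesis substantially:
print(posterior(0.10, 0.90, 0.05))   # 0.666...

# ...whereas a hypothesis given zero prior probability stays at zero,
# which is why "no feasible mechanism means prior zero" is so decisive.
print(posterior(0.0, 0.90, 0.05))    # 0.0
```

The second call illustrates the point made above: a zero prior can never rise above zero, no matter how striking the data.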
 

What is correlation?

Correlation refers to the relationship between two variables. It measures how closely related two variables are and can range from a strong positive correlation (when both variables increase or decrease together) to a strong negative correlation (when one variable increases while the other decreases).

How is correlation calculated?

Correlation is typically calculated using a statistical measure called the correlation coefficient. This coefficient ranges from -1 to +1, with 0 indicating no correlation and values closer to -1 or +1 indicating a stronger correlation.
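As a minimal sketch of that definition, the Pearson correlation coefficient can be computed directly as the covariance divided by the product of the standard deviations (the data points here are made up for illustration):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance over the product of standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
```

The two extremes, +1 and -1, correspond to the variables moving in perfect lockstep, in the same or opposite directions respectively.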

What does correlation tell us about causation?

Correlation does not necessarily imply causation. Just because two variables are strongly correlated does not mean that one directly causes the other. Other factors, known as confounding variables, may be influencing the relationship between the two variables.
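A confounder is easy to simulate. In the invented model below, a variable z drives both x and y while x has no effect on y at all; the two nonetheless come out strongly correlated, and the correlation vanishes once the confounder is accounted for (here by subtracting z, which is only possible because the simulation knows the true structure):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# A confounder z drives both x and y; x has no effect on y.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)

r_xy = np.corrcoef(x, y)[0, 1]
print("corr(x, y):", r_xy)               # strongly positive (about 0.8)

# Removing the confounder's contribution leaves only the
# independent noise terms, and the association disappears.
r_res = np.corrcoef(x - z, y - z)[0, 1]
print("after removing z:", r_res)        # approximately 0
```

In real data the confounder is usually unobserved, which is precisely why the correlation between x and y alone cannot settle the causal question.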

What is the difference between correlation and causation?

Correlation refers to a relationship between two variables, while causation refers to the direct influence of one variable on another. Correlation only shows that two variables are related, while causation shows that one variable directly affects the other.

How can we determine causation from correlation?

Determining causation from correlation requires further research and analysis. Other factors and variables need to be considered to rule out potential confounding variables and establish a direct cause-and-effect relationship between the two variables.
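One standard way to rule out confounding is a randomized experiment. The simulation below (with an invented confounder u) contrasts the two settings: observationally, x and y are correlated even though the true effect of x on y is zero; when x is instead assigned at random, independently of u, the estimated effect drops to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000

# Observational data: a confounder u makes x and y correlated
# even though y does not depend on x at all.
u = rng.normal(size=n)
x_obs = u + rng.normal(size=n)
y_obs = u + rng.normal(size=n)
slope_obs = np.polyfit(x_obs, y_obs, 1)[0]

# Randomized experiment: x is assigned independently of u,
# cutting the confounding path.
x_rnd = rng.normal(size=n)
y_rnd = u + rng.normal(size=n)
slope_rnd = np.polyfit(x_rnd, y_rnd, 1)[0]

print(f"observational slope: {slope_obs:.3f}")  # about 0.5 (spurious)
print(f"randomized slope:    {slope_rnd:.3f}")  # about 0.0
```

Randomization works because it guarantees, by construction, that the treatment is independent of every confounder, observed or not.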
