# B Can this be called 'a coincidence'?

1. Dec 16, 2018

### entropy1

Suppose we have two random variables A and B. A has a truly random distribution over {0,1} with P(0)=P(1)=0.5. B has the same distribution.

Now suppose that A and B always either both show a 1 or both show a 0. That would be a strong correlation between A and B.

Now could this be called 'a coincidence'? And if not, in what way does it differ from a coincidence?

2. Dec 16, 2018

### Math_QED

How do you define "coincidence"?

3. Dec 16, 2018

### RPinPA

In probability terms, the observation would be that A and B are correlated. They are not independent. The word "coincidence" is not used.

In colloquial terms, I'd say a coincidence is an unlikely event that happens anyway. So if A and B were supposed to be independent, but in 100 trials they always showed the same number, you could say that an event with probability $0.5^{100}$ has occurred, which is possible though extremely unlikely. You still wouldn't use the term "coincidence", which has no mathematical meaning.

A Bayesian would say "based on experiment clearly A and B are not independent" and recalculate the joint probability distribution P(A, B) to reflect the observation.

4. Dec 16, 2018

### FactChecker

The way to judge whether it is a coincidence is to assume that the variables really are completely unrelated and calculate how unlikely the results you got would be under that assumption. The assumption that they are unrelated is called the "null hypothesis". This is a typical example of the most common application of probability.

There are standard probability thresholds that are typically required before one concludes (with the selected probability) that the null hypothesis is wrong: 0.05, 0.025, and 0.01. These correspond to "confidence levels" of 95%, 97.5%, and 99%, respectively. Sometimes an extremely strict level is used. For example, before a particle physicist can claim to have found a new particle, he must show that the probability of getting his results, if there were no new particle, would be less than about 1 in 3,500,000 (probability ≈ 0.0000002857). This is called the five-sigma criterion: five sigmas corresponds to roughly a 1 in 3,500,000 chance.

In your example, you say that A and B always show the same result. At the 0.05 probability level, it would require 5 trials, all with identical results, to conclude with 95% confidence that they are related. Here are some other probability levels and the number of identical results that would be required:

| Probability level | Identical results required |
|---|---|
| 0.05 | 5 |
| 0.025 | 6 |
| 0.01 | 7 |
| 1/3,500,000 ≈ 0.0000002857 | 22 |
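The thresholds above are easy to check: under the null hypothesis of independence, $n$ identical results have probability $0.5^n$, so we just need the smallest $n$ that pushes this below each level. A minimal sketch (the function name `trials_needed` is mine, not from the thread):

```python
# For each significance level alpha, find the smallest number of trials n
# such that the probability of n identical A/B results under independence,
# (0.5)**n, drops to alpha or below.
def trials_needed(alpha):
    n = 1
    while 0.5 ** n > alpha:
        n += 1
    return n

for alpha in (0.05, 0.025, 0.01, 1 / 3_500_000):
    print(alpha, trials_needed(alpha))
```

Note that the five-sigma level needs 22 identical results, since $0.5^{21} \approx 4.8 \times 10^{-7}$ is still above $2.86 \times 10^{-7}$.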

Last edited: Dec 17, 2018
5. Dec 17, 2018

### entropy1

I guess something like: that there is no relation between the outcomes of A and B, even though it may seem there is.

6. Dec 17, 2018

### Math_QED

What comes to mind when you say that is:

- Correlation between random variables
- Independence between random variables

You might want to google these two terms and decide which one you are looking for. :)

7. Dec 18, 2018

### entropy1

So I guess, since in my example A and B always yield identical outcomes, that $P(B|A) = 1$ and $P(B|\bar{A}) = 0$, which means the probability distribution of B is affected by A's outcome.

In this way, dependent variables show a different correlation than independent variables do. So does this mean that there must be a cause producing the difference? If not, what is the reason the difference occurs?

If nothing is causing a certain correlation, can one speak of dependence at all, or should one rather speak of coincidence?

Last edited: Dec 18, 2018
8. Dec 18, 2018

### FactChecker

There does not have to be a cause that you care about. For instance, the populations of both the U.S. and England increase, so they are correlated simply because both have the same trend in time. That is usually not considered a "cause" of the correlation.

No. Even in the example of the populations of the U.S. and England, if I told you the population of one without telling you the year, that would help you estimate the population of the other in that same year. I would not call that either a "cause" or a "pure coincidence" -- they are just correlated. I would reserve the term "pure coincidence" for things that are completely due to random luck. People generally use "pure coincidence" to mean just luck, and "coincidence", as in "That is quite a coincidence", to indicate that they suspect a cause-effect relationship.

9. Dec 18, 2018

### Buzz Bloom

Hi entropy:

Wikipedia gives the following definition (https://en.wikipedia.org/wiki/Coincidence):

> A coincidence is a remarkable concurrence of events or circumstances that have no apparent causal connection with one another.
An important word in the quote from your post is "always".

Now, "always" suggests an infinite number of repeated events. Since it is impractical to have an infinite number of events, I have to assume that you intended "always" to mean just an extremely large number of events, say one billion for example. If you observed one billion consecutive events in which A and B produced the same value, I would have to assume that this alone implies some currently unknown causal connection, and that therefore this would not be a coincidence. If the number of events were small, one could calculate that there is a "reasonable" probability that the sequence is coincidental.

Regards,
Buzz

10. Dec 18, 2018

### FactChecker

I agree. And this entire subject should be discussed in terms of probability and confidence levels, not in vaguely defined terms like "coincidental".

11. Dec 18, 2018

### entropy1

By "coincidental" I mean correlations without any relevant underlying cause -- just luck, chance. I then wonder whether the correlation shown by dependent random variables can be ascribed to luck. So could one speak of chance (coincidence) in the case of dependence? What then would cause the difference in correlation between dependent and independent variables, and the difference between one set of dependent variables and another?

Last edited: Dec 18, 2018
12. Dec 18, 2018

### FactChecker

There can be a mixture of dependence and zero correlation. Suppose one variable, X, is uniformly distributed on the set {0,1} and another variable, Y, completely independent of X, is uniformly distributed on the set {-1,1}. Then a third variable, Z = X*Y, is dependent on both X and Y, but uncorrelated with X and correlated with Y.
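This dependent-but-uncorrelated behavior can be checked by simulation. A quick Monte Carlo sketch (the helper `corr` is my own sample-correlation function, not from the thread):

```python
import random

# X uniform on {0,1}, Y uniform on {-1,1}, independent, and Z = X*Y.
# Z is dependent on both X and Y, yet its sample correlation with X is
# near 0, while its correlation with Y is clearly nonzero.
random.seed(0)
N = 100_000
xs = [random.choice((0, 1)) for _ in range(N)]
ys = [random.choice((-1, 1)) for _ in range(N)]
zs = [x * y for x, y in zip(xs, ys)]

def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    va = sum((ai - ma) ** 2 for ai in a) / n
    vb = sum((bi - mb) ** 2 for bi in b) / n
    return cov / (va * vb) ** 0.5

print(corr(zs, xs))  # close to 0: Z uncorrelated with X
print(corr(zs, ys))  # close to 1/sqrt(2) ≈ 0.707: Z correlated with Y
```

The theoretical values are $\mathrm{corr}(Z,X) = 0$ and $\mathrm{corr}(Z,Y) = 1/\sqrt{2}$, which the simulation approaches as N grows.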

13. Dec 19, 2018

### entropy1

What I mean is, if we have
1. $P(B|A)=0.8$ and $P(B|\bar{A})=0.2$, or:
2. $P(B|A)=0.6$ and $P(B|\bar{A})=0.4$,
then must there not be some cause producing, at the very least, the difference between (1) and (2)? If not, then (1) and (2) are both due to chance, and I tend to call the dependence/correlation a coincidence. I even think it could not properly be called a dependence then, since nothing is causing the dependence/correlation.

Last edited: Dec 19, 2018
14. Dec 19, 2018

### FactChecker

You are better off using the traditional terms of probability so that everyone understands in great detail what it means and doesn't mean. In probability, saying that A and B are "dependent" only means that the probabilities of B change when we know A or not A. It does not mean that A causes B. It just means that the probability changes. The term "coincidence" does not have a well-defined meaning in probability.

Suppose you were throwing a uniformly distributed dart at the picture below. The events A and B have no "cause-effect" relationship. They may or may not be probabilistically dependent; that depends only on the probabilities.

Last edited: Dec 19, 2018
15. Dec 19, 2018

### RPinPA

I'm not sure what your (1) and (2) signify. I'm also not sure what you mean by probability.

Case (1) says to me that the fixed probability that B occurs, among cases where A occurs, is 0.8, and that in cases where A does not occur, B occurs 0.2 of the time. These are constants of the population. By the law of large numbers, the larger the number of cases I test, the closer the observed fractions of occurrences of B will be to these two probabilities.

Case (2) is a completely different set of events with different probabilities. The cause of the difference between (1) and (2) is that you are clearly talking about different random variables in the two cases. If you meant them to describe the same events A and B, then I can't imagine what you mean by the probability changing. If I take a million outcomes where A happened, will the fraction where B also happened be closer to 0.8 or to 0.6? It can't be both.

16. Dec 19, 2018

### FactChecker

I interpret them as just two different examples where the probability of B changes depending on whether or not A occurs. Therefore, A and B are dependent in both cases.

17. Dec 19, 2018

### FactChecker

To start with, independent variables cannot be correlated, because their probabilities do not depend on each other. The converse is not true: one can rig up examples where two random variables are dependent (the value of one changes the probable results of the other) but the correlation between them works out to be exactly zero. My example in post #12 is like that.

18. Dec 19, 2018

### RPinPA

Part of the difficulty I'm having is that I'm not sure of the OP's definition of probability. Let's talk about the statement in the original question that A and B always show the same value.

Does that mean you observed this in a small number of trials? Or do you mean that there is ZERO probability that A and B differ? Those are quite different statements. One is about a sample; the other is about the entire population, the entire set of possible outcomes.

If we're really talking about what most of us mean by probability, then I'd say there has to be some underlying reason for dependence. And I think what OP is asking is, "is it possible for two random variables to be dependent without some kind of chain of causality connecting them?" Not necessarily A causing B or B causing A. So I believe my answer is "no". There is some underlying reason, though it may be inaccessible to us.

You definitely don't want to interpret $P(A|B)$ as B causing A. For instance, students are often given problems like this: Urn #1 contains 10 red balls and 1 blue ball. Urn #2 contains 2 red balls and 8 blue balls. If I draw a ball from a randomly chosen urn and it is red, what is the probability it came from urn #2?

You should find that it's much more likely the red ball came from urn #1: $P(\text{Urn 2} \mid \text{Red}) < 0.5$. That doesn't mean that drawing a red ball caused you to pick urn #1.
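The urn example works out directly with Bayes' rule, $P(U_2 \mid R) = \frac{P(R \mid U_2)\,P(U_2)}{P(R)}$. A minimal sketch, assuming each urn is chosen with probability 1/2 as the problem implies:

```python
# Bayes' rule for the urn example: Urn 1 has 10 red and 1 blue ball,
# Urn 2 has 2 red and 8 blue, and each urn is picked with probability 1/2.
p_red_given_u1 = 10 / 11
p_red_given_u2 = 2 / 10

# Total probability of drawing a red ball.
p_red = 0.5 * p_red_given_u1 + 0.5 * p_red_given_u2

# Posterior probability the red ball came from urn 2.
p_u2_given_red = (0.5 * p_red_given_u2) / p_red
print(round(p_u2_given_red, 3))  # well below 0.5
```

The posterior comes out to roughly 0.18, so a red draw points strongly toward urn #1 even though the draw did not cause the urn choice.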

19. Dec 19, 2018

### entropy1

I thought correlation was measured by $P(a_n=b_n)$. I think that is not correct after all. I have read about the Pearson correlation. Which is the right measure of correlation for binary variables?

20. Dec 19, 2018

### FactChecker

The official definition of the correlation is the expected value $\rho_{XY} = E\left[ \frac{(X-\bar X)(Y-\bar Y)}{\sigma_X \sigma_Y}\right]$, where $\bar X$ and $\bar Y$ are the means of $X$ and $Y$, respectively, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. Since $\sigma_X$ and $\sigma_Y$ are constants, this equals $E[(X-\bar X)(Y-\bar Y)]/(\sigma_X \sigma_Y)$.

If it is positive, $X$ being above its mean tends to indicate that $Y$ is also above its mean, and conversely, $X$ being below its mean tends to indicate that $Y$ is also below its mean. The formula treats $X$ and $Y$ symmetrically, so the reverse tendencies hold as well: $Y$ being above or below its mean tends to indicate the same for $X$.

Corresponding things can be said if the correlation is negative, except that $X$ above its mean then tends to imply $Y$ below its mean, and so on.
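For two 0/1 variables this Pearson formula is exactly what is usually called the phi coefficient, and it answers the binary case directly. A small sketch (the function `pearson` is my own illustration): for the thread's opening example, where A and B always agree, it comes out exactly 1.

```python
# Pearson correlation of two equal-length sequences; applied to 0/1
# variables this is the phi coefficient.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

a = [0, 1, 1, 0, 1, 0, 0, 1]
b = list(a)              # B always matches A, as in the opening post
print(pearson(a, b))     # 1.0
```

If B were always the opposite of A instead, the same formula would give -1, and for independent A and B it tends toward 0 in large samples.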

Last edited: Dec 19, 2018