# B Can this be called 'a coincidence'?

1. Dec 16, 2018

### entropy1

Suppose we have two random variables A and B. A has a truly random distribution over {0,1} with P(0)=P(1)=0.5. B has the same distribution.

Now suppose that A and B always either both show a 1 or both show a 0. That would be a strong correlation between A and B.

Now could this be called 'a coincidence'? And if not, in what way does it differ from a coincidence?

2. Dec 16, 2018

### Math_QED

How do you define "coincidence"?

3. Dec 16, 2018

### RPinPA

In probability terms, the observation would be that A and B are correlated. They are not independent. The word "coincidence" is not used.

In colloquial terms, I'd say a coincidence is an unlikely event that happens anyway. So if A and B were supposed to be independent, but in 100 trials they always showed the same number, you could say that an event with probability $0.5^{100}$ has occurred, which is possible though extremely unlikely. You still wouldn't use the term "coincidence", which has no mathematical meaning.

A Bayesian would say "based on experiment clearly A and B are not independent" and recalculate the joint probability distribution P(A, B) to reflect the observation.

4. Dec 16, 2018

### FactChecker

The way to judge whether it is a coincidence is to assume that the variables really are completely unrelated and calculate how unlikely the results you got would be under that assumption. The assumption that they are unrelated is called the "null hypothesis". This is a typical example of the most common application of probability.

There are standard probability thresholds that are typically required before one concludes (with the selected probability) that the null hypothesis is wrong: 0.05, 0.025, and 0.01. These correspond to "confidence levels" of 95%, 97.5%, and 99%, respectively. Sometimes an extremely strict level is used. For example, before a particle physicist can claim to have found a new particle, he must show that the probability of getting his results, if there were no new particle, would be less than about 1 in 3,500,000 (probability ≈ 0.0000002857). This is called the five-sigma criterion: five sigmas corresponds to roughly a 1 in 3,500,000 chance.

In your example, you say that A and B always show the same result. At the 0.05 probability level, it would require 5 trials, all with identical results, to conclude with 95% confidence that they are related. Here are some other probability levels and the number of identical results that would be required:

| Probability level | Identical results required |
|---|---|
| 0.05 | 5 |
| 0.025 | 6 |
| 0.01 | 7 |
| 1/3,500,000 ≈ 0.0000002857 | 22 |
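The thresholds above are easy to check: under the null hypothesis of independence, $n$ identical results have probability $0.5^n$, so we just need the smallest $n$ that pushes this below each level. A minimal sketch (the function name `trials_needed` is mine, not from the thread):

```python
# For each significance level alpha, find the smallest number of trials n
# such that the probability of n identical A/B results under independence,
# (0.5)**n, drops to alpha or below.
def trials_needed(alpha):
    n = 1
    while 0.5 ** n > alpha:
        n += 1
    return n

for alpha in (0.05, 0.025, 0.01, 1 / 3_500_000):
    print(alpha, trials_needed(alpha))
```

Note that the five-sigma level needs 22 identical results, since $0.5^{21} \approx 4.8 \times 10^{-7}$ is still above $2.86 \times 10^{-7}$.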

Last edited: Dec 17, 2018
5. Dec 17, 2018

### entropy1

I guess something like: that there is no relation between the outcomes of A and B, even though it may seem there is.

6. Dec 17, 2018

### Math_QED

What comes to mind when you say that is:

- Correlation between random variables
- Independence between random variables

You might want to google these two terms and decide which one you are looking for. :)

7. Dec 18, 2018

### entropy1

So I guess, since in my example A and B always yield identical outcomes, that $P(B|A) = 1$ and $P(B|\bar{A}) = 0$, which means the probability distribution of B is affected by A's outcome.

In this way, dependent variables show a different correlation than independent variables do. So does this mean that there must be a cause producing the difference? If not, what is the reason the difference occurs?

If nothing is causing a certain correlation, can one speak of dependence at all, or should one rather speak of coincidence?

Last edited: Dec 18, 2018
8. Dec 18, 2018

### FactChecker

There does not have to be a cause that you care about. For instance, the populations of both the U.S. and England increase, so they are correlated simply because both have the same trend in time. That is usually not considered a "cause" of the correlation.

No. Even in the example of the populations of the U.S. and England, if I told you the population of one without telling you the year, that would help you estimate the population of the other in that same year. I would not call that either a "cause" or a "pure coincidence" -- they are just correlated. I would reserve the term "pure coincidence" for things that are completely due to random luck. People generally use "pure coincidence" to mean just luck, and "coincidence", as in "That is quite a coincidence", to indicate that they suspect a cause-effect relationship.

9. Dec 18, 2018

### Buzz Bloom

Hi entropy:

Wikipedia gives the following definition (https://en.wikipedia.org/wiki/Coincidence):

> A coincidence is a remarkable concurrence of events or circumstances that have no apparent causal connection with one another.
An important word in the quote from your post is "always".

Now, "always" suggests an infinite number of repeated events. Since it is impractical to have an infinite number of events, I have to assume that you intended "always" to mean just an extremely large number of events, say one billion for example. If you observed one billion consecutive events in which A and B produced the same value, I would have to assume that this alone implies some currently unknown causal connection, and that therefore this would not be a coincidence. If the number of events were small, one could calculate that there is a "reasonable" probability that the sequence is coincidental.

Regards,
Buzz

10. Dec 18, 2018

### FactChecker

I agree. And this entire subject should be discussed in terms of probability and confidence levels, not in vaguely defined terms like "coincidental".

11. Dec 18, 2018

### entropy1

By "coincidental" I mean correlations without any relevant underlying cause -- just luck, chance. I then wonder whether the correlation shown by dependent random variables can be ascribed to luck. So could one speak of chance (coincidence) in the case of dependence? What then would cause the difference in correlation between dependent and independent variables, and the difference between one set of dependent variables and another?

Last edited: Dec 18, 2018
12. Dec 18, 2018

### FactChecker

There can be a mixture of dependence and zero correlation. Suppose one variable, X, is uniformly distributed on the set {0,1} and another variable, Y, completely independent of X, is uniformly distributed on the set {-1,1}. Then a third variable, Z = X*Y, is dependent on both X and Y, but uncorrelated with X and correlated with Y.
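This dependent-but-uncorrelated behavior can be checked by simulation. A quick Monte Carlo sketch (the helper `corr` is my own sample-correlation function, not from the thread):

```python
import random

# X uniform on {0,1}, Y uniform on {-1,1}, independent, and Z = X*Y.
# Z is dependent on both X and Y, yet its sample correlation with X is
# near 0, while its correlation with Y is clearly nonzero.
random.seed(0)
N = 100_000
xs = [random.choice((0, 1)) for _ in range(N)]
ys = [random.choice((-1, 1)) for _ in range(N)]
zs = [x * y for x, y in zip(xs, ys)]

def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    va = sum((ai - ma) ** 2 for ai in a) / n
    vb = sum((bi - mb) ** 2 for bi in b) / n
    return cov / (va * vb) ** 0.5

print(corr(zs, xs))  # close to 0: Z uncorrelated with X
print(corr(zs, ys))  # close to 1/sqrt(2) ≈ 0.707: Z correlated with Y
```

The theoretical values are $\mathrm{corr}(Z,X) = 0$ and $\mathrm{corr}(Z,Y) = 1/\sqrt{2}$, which the simulation approaches as N grows.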

13. Dec 19, 2018

### entropy1

What I mean is, if we have
1. $P(B|A)=0.8$ and $P(B|\bar{A})=0.2$, or:
2. $P(B|A)=0.6$ and $P(B|\bar{A})=0.4$,
then must there not be some cause producing, at the very least, the difference between (1) and (2)? If not, then (1) and (2) are both due to chance, and I tend to call the dependence/correlation a coincidence. I even think it could not properly be called a dependence then, since nothing is causing the dependence/correlation.

Last edited: Dec 19, 2018
14. Dec 19, 2018

### FactChecker

You are better off using the traditional terms of probability so that everyone understands in great detail what it means and doesn't mean. In probability, saying that A and B are "dependent" only means that the probabilities of B change when we know A or not A. It does not mean that A causes B. It just means that the probability changes. The term "coincidence" does not have a well-defined meaning in probability.

Suppose you were throwing a uniformly distributed dart at the picture below. The events A and B have no "cause-effect" relationship. They may or may not be probabilistically dependent; that depends only on the probabilities.

Last edited: Dec 19, 2018
15. Dec 19, 2018

### RPinPA

I'm not sure what your (1) and (2) signify. I'm also not sure what you mean by probability.

Case (1) says to me that the fixed probability that B occurs, among cases where A occurs, is 0.8, and that in cases where A does not occur, B occurs 0.2 of the time. These are constants of the population. By the law of large numbers, the larger the number of cases I test, the closer the observed fractions of occurrences of B will be to these two probabilities.

Case (2) is a completely different set of events with different probabilities. The cause of the difference between (1) and (2) is that you are clearly talking about different random variables in the two cases. If you meant them to describe the same events A and B, then I can't imagine what you mean by the probability changing. If I take a million outcomes where A happened, will the fraction where B also happened be closer to 0.8 or to 0.6? It can't be both.

16. Dec 19, 2018

### FactChecker

I interpret them as just two different examples where the probability of B changes depending on whether or not A occurs. Therefore, A and B are dependent in both cases.

17. Dec 19, 2018

### FactChecker

To start with, independent variables cannot be correlated, because their probabilities do not depend on each other. The converse is not true: one can rig up examples where two random variables are dependent (the value of one changes the probable results of the other) but the correlation between them works out to be exactly zero. My example in post #12 is like that.

18. Dec 19, 2018

### RPinPA

Part of the difficulty I'm having is that I'm not sure of the OP's definition of probability. Let's talk about the statement in the original question that A and B always show the same value.

Does that mean you observed this in a small number of trials? Or do you mean that there is ZERO probability that A and B differ? Those are quite different statements. One is about a sample; the other is about the entire population, the entire set of possible outcomes.

If we're really talking about what most of us mean by probability, then I'd say there has to be some underlying reason for dependence. And I think what OP is asking is, "is it possible for two random variables to be dependent without some kind of chain of causality connecting them?" Not necessarily A causing B or B causing A. So I believe my answer is "no". There is some underlying reason, though it may be inaccessible to us.

You definitely don't want to interpret $P(A|B)$ as B causing A. For instance, students are often given problems like this: Urn #1 contains 10 red balls and 1 blue ball. Urn #2 contains 2 red balls and 8 blue balls. If I draw a ball from a randomly chosen urn and it is red, what is the probability it came from urn #2?

You should find that it's much more likely the red ball came from urn #1: $P(\text{Urn 2} \mid \text{Red}) < 0.5$. That doesn't mean that drawing a red ball caused you to pick urn #1.
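The urn example works out directly with Bayes' rule, $P(U_2 \mid R) = \frac{P(R \mid U_2)\,P(U_2)}{P(R)}$. A minimal sketch, assuming each urn is chosen with probability 1/2 as the problem implies:

```python
# Bayes' rule for the urn example: Urn 1 has 10 red and 1 blue ball,
# Urn 2 has 2 red and 8 blue, and each urn is picked with probability 1/2.
p_red_given_u1 = 10 / 11
p_red_given_u2 = 2 / 10

# Total probability of drawing a red ball.
p_red = 0.5 * p_red_given_u1 + 0.5 * p_red_given_u2

# Posterior probability the red ball came from urn 2.
p_u2_given_red = (0.5 * p_red_given_u2) / p_red
print(round(p_u2_given_red, 3))  # well below 0.5
```

The posterior comes out to roughly 0.18, so a red draw points strongly toward urn #1 even though the draw did not cause the urn choice.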

19. Dec 19, 2018

### entropy1

I thought correlation was measured by $P(a_n=b_n)$. I think that is not correct after all. I have read about the Pearson correlation. Which is the right measure of correlation for binary variables?

20. Dec 19, 2018

### FactChecker

The official definition of the correlation is the expected value $\rho_{XY} = E\left[ \frac{(X-\bar X)(Y-\bar Y)}{\sigma_X \sigma_Y}\right]$, where $\bar X$ and $\bar Y$ are the means of $X$ and $Y$, respectively, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. Since $\sigma_X$ and $\sigma_Y$ are constants, this equals $E[(X-\bar X)(Y-\bar Y)]/(\sigma_X \sigma_Y)$.

If it is positive, $X$ being above its mean tends to indicate that $Y$ is also above its mean, and conversely, $X$ being below its mean tends to indicate that $Y$ is also below its mean. The formula treats $X$ and $Y$ symmetrically, so the reverse tendencies hold as well: $Y$ being above or below its mean tends to indicate the same for $X$.

Corresponding things can be said if the correlation is negative, except that $X$ above its mean then tends to imply $Y$ below its mean, and so on.
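For two 0/1 variables this Pearson formula is exactly what is usually called the phi coefficient, and it answers the binary case directly. A small sketch (the function `pearson` is my own illustration): for the thread's opening example, where A and B always agree, it comes out exactly 1.

```python
# Pearson correlation of two equal-length sequences; applied to 0/1
# variables this is the phi coefficient.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

a = [0, 1, 1, 0, 1, 0, 0, 1]
b = list(a)              # B always matches A, as in the opening post
print(pearson(a, b))     # 1.0
```

If B were always the opposite of A instead, the same formula would give -1, and for independent A and B it tends toward 0 in large samples.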

Last edited: Dec 19, 2018