Bernoulli's Urn Revisited
Define the propositions:
I : "Our urn contains N balls, identical in every respect except that M of them are red, the remaining N-M white. We have no information about the location of particular balls in the urn. They are drawn out blindfolded without replacement."
R_i : "Red on the i'th draw, i = 1, 2, ..."
Successive draws from the urn are a microcosm of the EPR experiment. For the first draw, given only the prior information I, we have
P(R_1|I) = M/N (16)
Now if we know that red was found on the first draw, then that changes the contents of the urn for the second:
P(R_2|R_1,I ) = (M-1)/(N-1) (17)
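Both (16) and (17) can be checked by brute-force counting on a small urn. The sketch below enumerates every equally likely drawing order; the values N = 5, M = 2 are illustrative choices, not from the text:

```python
from itertools import permutations
from fractions import Fraction

# Illustrative urn: N = 5 balls, M = 2 red (assumed values for the demo).
N, M = 5, 2
balls = ['R'] * M + ['W'] * (N - M)

# Every drawing order (without replacement) is equally likely.
orders = list(permutations(balls))

# P(R_1|I): fraction of orders with red on the first draw.
p_r1 = Fraction(sum(o[0] == 'R' for o in orders), len(orders))

# P(R_2|R_1,I): among orders with red first, fraction with red second.
given_r1 = [o for o in orders if o[0] == 'R']
p_r2_given_r1 = Fraction(sum(o[1] == 'R' for o in given_r1), len(given_r1))

print(p_r1)            # 2/5  = M/N, Eq. (16)
print(p_r2_given_r1)   # 1/4  = (M-1)/(N-1), Eq. (17)
```

Working with exact fractions rather than floats keeps the agreement with (16) and (17) an identity, not an approximation.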
and this conditional probability expresses the causal influence of the first draw on the second, in just the way that Bell assumed. But suppose we are told only that red was drawn on the second draw; what is now our probability for red on the first draw? Whatever happens on the second draw cannot exert any physical influence on the condition of the urn at the first draw; so presumably one who believes with Bell that a conditional probability expresses a physical causal influence would say that P(R_1|R_2,I) = P(R_1|I).
But this is patently wrong; probability theory requires that
P(R_1|R_2,I) = P(R_2|R_1,I) (18)
This is particularly obvious in the case M = 1; for if we know that the one red ball was taken in the second draw, then it is certain that it could not have been taken in the first.
In (18) the probability on the right expresses a physical causation, that on the left only an inference. Nevertheless, the probabilities are necessarily equal because, although a later draw cannot physically affect conditions at an earlier one, information about the result of the second draw has precisely the same effect on our state of knowledge about what could have been taken in the first draw, as if their order were reversed.
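The equality (18), including the stark M = 1 case, can also be exhibited by enumeration. The helper below is a hypothetical illustration (the urn sizes are again assumed, not from the text):

```python
from itertools import permutations
from fractions import Fraction

def cond_prob(N, M, event_pos, given_pos):
    """P(red at draw event_pos | red at draw given_pos), 0-indexed,
    computed by enumerating all equally likely drawing orders."""
    balls = ['R'] * M + ['W'] * (N - M)
    orders = list(permutations(balls))
    given = [o for o in orders if o[given_pos] == 'R']
    return Fraction(sum(o[event_pos] == 'R' for o in given), len(given))

# Backward inference equals the forward "causal" probability:
print(cond_prob(5, 2, 0, 1))  # P(R_1|R_2,I) = 1/4
print(cond_prob(5, 2, 1, 0))  # P(R_2|R_1,I) = 1/4

# The M = 1 case: if the one red ball went on the second draw,
# red on the first draw is impossible.
print(cond_prob(5, 1, 0, 1))  # 0
```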
Eq. (18) is only a special case of a much more general result. The probability of drawing any sequence of red and white balls (the hypergeometric distribution) depends only on the number of red and white balls, not on the order in which they appear; i.e., it is an exchangeable distribution. From this it follows by a simple calculation that for all i and j,
P(R_i|I) = P(R_j|I) = M/N (19)
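Exchangeability, and hence (19), can be confirmed directly by counting: the unconditional probability of red is the same at every draw position. The urn parameters below are illustrative assumptions:

```python
from itertools import permutations
from fractions import Fraction

N, M = 5, 2  # assumed illustrative urn
balls = ['R'] * M + ['W'] * (N - M)
orders = list(permutations(balls))  # all equally likely drawing orders

# P(R_i|I) for every draw position i is M/N, independent of i.
for i in range(N):
    p_i = Fraction(sum(o[i] == 'R' for o in orders), len(orders))
    assert p_i == Fraction(M, N)
print("P(R_i|I) = M/N for every i")
```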
That is, just as in QM, merely knowing that other draws have been made does not change our prediction for any specified draw, although it changes the hypothesis space in which the prediction is made; before there is a change in the actual prediction it is necessary to know also the results of other draws. But the joint probability is, by the product rule,
P(R_i,R_j|I) = P(R_i|R_j,I)P(R_j|I) = P(R_j|R_i,I)P(R_i|I) (20)
and so we have for all i and j ,
P(R_i|R_j,I) = P(R_j|R_i,I) (21)
and again a conditional probability which expresses only an inference is necessarily equal to one that expresses a physical causation. This would be true not only for the hypergeometric distribution, but for any exchangeable distribution. We see from this how far Karl Popper would have got with his "propensity" theory of probability, had he tried to apply it to a few simple problems.
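The full symmetry (21) holds for every pair of draws, not just the first two, and by exchangeability the common value is (M-1)/(N-1). A counting check, again with assumed illustrative urn sizes:

```python
from itertools import permutations, combinations
from fractions import Fraction

N, M = 5, 2  # assumed illustrative urn
balls = ['R'] * M + ['W'] * (N - M)
orders = list(permutations(balls))

def p_red_given_red(i, j):
    # P(red at draw i | red at draw j), draws 0-indexed.
    given = [o for o in orders if o[j] == 'R']
    return Fraction(sum(o[i] == 'R' for o in given), len(given))

# Eq. (21): inference equals "causation" for every ordered pair i, j.
for i, j in combinations(range(N), 2):
    assert p_red_given_red(i, j) == p_red_given_red(j, i) == Fraction(M - 1, N - 1)
print("P(R_i|R_j,I) = P(R_j|R_i,I) = (M-1)/(N-1) for all i, j")
```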
It might be thought that this phenomenon is a peculiarity of probability theory. On the contrary, it remains true even in pure deductive logic; for if A implies B, then not-B implies not-A. But if we tried to interpret "A implies B" as meaning "A is the physical cause of B", we could hardly accept that "not-B is the physical cause of not-A". Because of this lack of contraposition, we cannot in general interpret logical implication as physical causation, any more than we can conditional probability. Elementary facts like this are well understood in economics (Simon & Rescher, 1966; Zellner, 1984); it is high time that they were recognized in theoretical physics.
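The contraposition fact invoked above, that A implies B is logically equivalent to not-B implies not-A, can be verified exhaustively over all truth assignments:

```python
from itertools import product

# Material implication: (A -> B) is (not A) or B.
implies = lambda p, q: (not p) or q

# Contraposition: A -> B holds exactly when not-B -> not-A holds,
# for every assignment of truth values to A and B.
for A, B in product([False, True], repeat=2):
    assert implies(A, B) == implies(not B, not A)
print("A -> B  <=>  not-B -> not-A")
```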