# Bell's derivation; socks and Jaynes

DrChinese
Gold Member
I copied this from another thread, as the subject matter overlaps in some respects and it might be of interest to readers here:

DrChinese referred to Jaynes. Jaynes (1989) thought that Bell was incorrectly performing a routine factorization of joint probabilities into marginal and conditional. Apparently Jaynes did not understand that Bell was giving physical reasons (locality, realism) why it was reasonable to argue that two random variables should be conditionally *independent* given a third. When Jaynes presented his resolution of the Bell paradox at a conference, he was stunned when someone else gave a neat little proof using Fourier analysis that the singlet correlations could not be reproduced using a network of classical computers, whose communication possibilities "copy" those of the traditional Bell-CHSH experiments. I have written about this in quant-ph/0301059. Jaynes is reputed to have said "I am going to have to think about this, but I think it is going to take 30 years before we understand Stephen Gull's results, just as it has taken 20 years before we understood Bell's" (the decisive understanding having been contributed by E.T. Jaynes).

http://arxiv.org/abs/quant-ph/0301059

I like your example of Luigi and the computers. I would recommend this paper to anyone who is interested in understanding the pros AND cons of various local realistic positions - and this is a pretty strong roundup!

gill1109
Gold Member
Jaynes (1989) thought that Bell was incorrectly performing a routine factorization of joint probabilities into marginal and conditional. Apparently Jaynes did not understand that Bell was giving physical reasons (locality, realism) why it was reasonable to argue that two random variables should be conditionally *independent* given a third. When Jaynes presented his resolution of the Bell paradox at a conference, he was stunned when someone else gave a neat little proof using Fourier analysis that the singlet correlations could not be reproduced using a network of classical computers, whose communication possibilities "copy" those of the traditional Bell-CHSH experiments. I have written about this in quant-ph/0301059. Jaynes is reputed to have said "I am going to have to think about this, but I think it is going to take 30 years before we understand Stephen Gull's results, just as it has taken 20 years before we understood Bell's" (the decisive understanding having been contributed by E.T. Jaynes).

Jaynes (1989) thought that Bell was incorrectly performing a routine factorization of joint probabilities into marginal and conditional. [..]
Thanks for the comment!

Actually, Jaynes did understand that Bell was giving a physical reason for it, because he cited Bell on that. He thus thought that Bell believed that a logical dependence must be caused by a physical dependence. According to Jaynes, "Bell took it for granted that a conditional probability P(X|Y) expresses a physical causal influence, exerted by Y on X."

Now, it's still not entirely clear to me what to think of this, except for one thing: "reasonable to argue" is by far insufficient to deserve the name "theorem"...

Moreover, it appears that the locality condition as Bell formulated it is insufficient to warrant his derivation. What other conditions are required for a valid factorisation inside the integral?

PS. that's an interesting paper, and while I have only read the introduction now, I'm quite happy to see the idea of a fifth possibility which seems a bit similar to what I have been thinking: just as the PoR may be interpreted as a "loophole" principle ("it can't be done"), also the Bell theorem/paradox could relate to such a principle. But that's food for another discussion.

gill1109
Gold Member
I prefer an alternative derivation to Bell's. The essence of all local hidden variables theories is that they allow the existence (in the theory), alongside the outcomes of the actually performed measurements, also of the outcomes of the measurements which were not performed. These "counterfactual" outcomes are assigned to the same region of space-time as the factual outcomes, and locality is assumed in the sense that the outcomes in one wing of the experiment do not depend on the setting used in the other. This means that a local hidden variables theory allows us to define four random variables X1, X2, Y1 and Y2, standing for the outcomes of each of the two possible measurements ("measurement 1", "measurement 2") in each wing of the experiment (X and Y respectively). They take the values +/-1. It's easy to check that X1Y1 cannot be less than X1Y2 + X2Y2 + X2Y1 - 2. Therefore E(X1Y1) cannot be less than E(X1Y2) + E(X2Y2) + E(X2Y1) - 2. Each of these expectation values is estimated in the CHSH experiment by the corresponding average of products of measurement outcomes belonging to the corresponding pair of settings.
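The four-variable bound can be verified by brute force over all sixteen sign assignments; a minimal sketch in plain Python, just as an illustration:

```python
from itertools import product

# For +/-1-valued X1, X2, Y1, Y2, the CHSH combination
# X1*Y2 + X2*Y2 + X2*Y1 - X1*Y1 never exceeds 2, i.e.
# X1*Y1 is never less than X1*Y2 + X2*Y2 + X2*Y1 - 2.
values = [x1*y2 + x2*y2 + x2*y1 - x1*y1
          for x1, x2, y1, y2 in product([-1, 1], repeat=4)]
print(max(values), min(values))  # 2 -2
```

Averaging the pointwise bound over any joint distribution of the four variables then gives the inequality for the expectation values.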

morrobay
Gold Member
I got it from Jaynes' paper, equation (14). (I was lazy and didn't use LaTeX, so I wrote "x" instead of lambda. I'll quit doing that here.) I did say in my post that I still wanted to check to see what the corresponding equations in Bell's paper looked like. Now that you have linked to Bell's paper, let's play "spot the correspondence".

You are right that the equation I gave, equation (14) in Jaynes' paper, doesn't really have a corresponding equation in Bell's paper. But Jaynes' equation (14) is not the only equation in his paper that bears on the "factorization" issue. In fact, Jaynes' (14) is really just a "sub-expression" from his equation (12), which looks like this:

$$P(AB|ab) = \int{P(A|a, \lambda) P(B|b, \lambda) p(\lambda) d\lambda}$$

This equation is basically the same as the equation you gave from Bell's paper. Bell's equation is for the expectation value of a given pair of results that are determined by a given pair of measurement settings; Jaynes' equation is for the joint probability of a given pair of results conditional on a given pair of measurement settings. They basically say the same thing.

Jaynes' point is that to arrive at his equation in the first place, Bell has to make an assumption: he has to *assume* that the integrand can be expressed in the factored form given above. In other words, the integrand Bell writes down is not the most general one possible for the given expectation value: that would be (using Bell's notation)

$$P(a, b) = \int{A(B, a, b, \lambda) B(a, b, \lambda) p(\lambda) d\lambda}$$

The question then is whether one accepts Bell's implicit reasoning (he doesn't really go into it much; he seems to think it's obvious) to justify streamlining the integrand as he does. Jaynes does not accept that reasoning, and he gives the urn scenario as an example of why not. I agree that there is one key difference in the urn scenario: the two "measurement events" are not spacelike separated. Jaynes doesn't talk about that at all.

Edit: Bell's notation is actually a bit obscure. He says that A, B stand for "results", but he actually writes them as *functions* of the measurement settings a, b and the hidden variables $\lambda$. He doesn't seem to have a notation for the actual *outcomes* (the values of the functions given specific values for the variables). I've used A, B above to denote the outcomes as well as the functions, since Bell's notation doesn't give any other way to do it. In Jaynes' notation things are clearer; the equivalent to the above would be:

$$P(AB|ab) = \int{P(A|B, a, b, \lambda) P(B|a, b, \lambda) p(\lambda) d\lambda}$$

Edit #2: Corrected the equations above (previously I had A in the second factor in each integrand, which is incorrect). Also, Jaynes notes that there are two possible factorizations; the full way to write the equation just above would be:

$$P(AB|ab) = \int{P(A|B, a, b, \lambda) P(B|a, b, \lambda) p(\lambda) d\lambda} = \int{P(B|A, a, b, \lambda) P(A|a, b, \lambda) p(\lambda) d\lambda}$$

This is basically Jaynes' equation (15) with $\lambda$ integrated out.

What would the evaluation of this integral, the area, look like on a plot? I understand that the total area is equal to one. Is it correct to say that the y axis denotes correlations and the x axis denotes detector settings, and that the function includes cos2 or cos4? And what are the units of this area?
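One way to see what such a λ-average looks like is to evaluate it numerically for a toy local model (a hypothetical illustration, not the exact model from either paper: λ is a uniform angle on a circle and the outcomes are deterministic signs):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.uniform(0.0, 2 * np.pi, 200_000)  # hidden variable: a uniform angle

def correlation(theta):
    """Monte-Carlo estimate of E(a,b) = integral of A(a,lam)*B(b,lam)*p(lam) dlam
    for the toy model A = sign(cos(lam - a)), B = -sign(cos(lam - b)),
    taking a = 0 and b = theta."""
    A = np.sign(np.cos(lam))
    B = -np.sign(np.cos(lam - theta))
    return float(np.mean(A * B))

# columns: theta, Monte-Carlo estimate, exact local-model value 2*theta/pi - 1,
# and the quantum singlet value -cos(theta) for comparison
for theta in (0.0, np.pi / 4, np.pi / 2, np.pi):
    print(theta, correlation(theta), 2 * theta / np.pi - 1, -np.cos(theta))
```

On a plot the natural choice is the angle between the settings on the x axis and the correlation E(a,b) on the y axis; the correlation is a dimensionless number between -1 and +1, so no units are involved. The "total area equal to one" refers to the normalization ∫p(λ)dλ = 1, not to the correlation integral itself. This toy model traces out the straight-line correlation 2θ/π − 1, while quantum mechanics predicts −cos θ.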

gill1109
Gold Member
"lambda" is everything which causes statistical dependence of the outcomes at the two locations. "Integral ... p(lambda) d lambda" can be read as "the average over lambda, of the expectation value of the product of the two outcomes given lambda". There is no assumption that lambda is a real number, or two real numbers ... it can be as complicated as you like.

The point is that the measurement results are seen as functions of the measurement settings and of a heap of variables describing the quantum system and the two measurement systems.

"the average over lambda, of the expectation value of the product of the two outcomes given lambda".
To be exact, the integral P(AB|ab) is the joint probability of the outcomes A and B given detector settings a and b. Expectation value of the product can be obtained from it in the usual way:
$E(a,b) = \sum_{i,j} A_i B_j P(A_iB_j|ab) = P(-1,-1|ab) + P(1,1|ab) -P(1,-1|ab) - P(-1,1|ab)$

Also note that in Bell's original paper all randomness is encapsulated in λ, so the values of P(A|aλ) and P(B|bλ) are strictly 0 or 1. Bell's A(a,λ) and B(b,λ) are connected with P(A|aλ) and P(B|bλ):
$P(A_i|a\lambda) = 1$ if $A(a,\lambda) = A_i$, else $0$
$A(a,\lambda) = +1$ if $P(+1|a\lambda) = 1$, else $-1$

In the "Lyon and Lille" example from the socks paper there is an extra bit of "residual randomness" left over once all influences of the common factors λ and the local parameters a and b are factored out. That's why there are probability distributions instead of functions. This "residual randomness" is local and independent of a, b, and λ. It does not change anything, and the usual way to deal with it is to assimilate all such "residual randomness" into λ, as was done in Bell's EPR paper.
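The connection between the joint probabilities and the expectation value can be sketched numerically with a toy deterministic model (an illustration only; the functions A and B are hypothetical, not Bell's exact construction). The joint probabilities P(A_iB_j|ab) are obtained by averaging indicator functions over λ, and the sum Σ A_iB_j P(A_iB_j|ab) reproduces the direct average of the products:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = rng.uniform(0.0, 2 * np.pi, 100_000)  # hidden variable samples

def A(a):
    # deterministic outcome on side 1: strictly a function of a and lambda
    return np.sign(np.cos(lam - a))

def B(b):
    # deterministic outcome on side 2: strictly a function of b and lambda
    return -np.sign(np.cos(lam - b))

a, b = 0.0, np.pi / 3
# P(A_i B_j | a b): indicator functions averaged over lambda
joint = {(i, j): float(np.mean((A(a) == i) & (B(b) == j)))
         for i in (-1, 1) for j in (-1, 1)}

E_from_joint = sum(i * j * p for (i, j), p in joint.items())
E_direct = float(np.mean(A(a) * B(b)))
print(E_from_joint, E_direct)  # the two computations agree
```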

I prefer an alternative derivation to Bell's. The essence of all local hidden variables theories is that they allow the existence (in the theory), alongside the outcomes of the actually performed measurements, also of the outcomes of the measurements which were not performed. [..]
Isometricpion agreed with me in post #59 that Bell's derivation should apply to my thought experiment which I fully developed in post #75. It's somewhat combining Bell's sock illustration with his Lille-Lyon illustration, but in a way that in principle could be really tested in the living room. And I guess that the secret elements (which I put inside for the simulation) may be called "counterfactual", because the outcomes are defined by those elements. And the outcomes on one side are not affected by what happens on the other side (however their probabilities do of course depend on each other in the sense of Jaynes: they are correlated). Which implies, if I understand you right, that according to you the results cannot break Bell's original inequality. Correct? Or is there another unspoken requirement for heart attacks in Lille-Lyon, Bertlmann's socks and entangled electrons?

DrChinese
Gold Member
Isometricpion agreed with me in post #59 that Bell's derivation should apply to my thought experiment which I fully developed in post #75. [..]
What he is saying is that it can break the inequality with a smaller sample, but not by much. In fact, he says you should expect that sometimes. But in a larger randomized trial, such as Gill's example of Luigi and the computers, it is clear you cannot get such results. You will deviate fairly far from the CHSH boundary rather quickly; 30 standard deviations with N = 15000 might be typical.

The Lille-Lyon demonstration is kind of a joke to me, because it exploits the fair sampling assumption. As I am fond of saying, you could use the same logic to assert that the true speed of light is 1 meter per second rather than c. The missing ingredient is always an explanation of WHY the true value is one thing and the observed value is something else. As a scientist, I don't see how you are supposed to ignore your recorded results in favor of something which is pulled out of the air.

The first impression that the results are "spooky" is thereby supported.

However, that could be just a coincidence or a calculation error. They should ask their teachers if it's OK like this and collect more data during the rest of the semester.

PS: can someone please tell me if there are any obvious mistakes, before I simulate more data?
If it isn't too much trouble, I would like to see the code that generated your results.

Edit: If it is too long to post here, perhaps we could get in touch by e-mail.

In regard to Jaynes’ view that Bell incorrectly factored a joint probability, it may be informative to analyze the data set presented by N. David Mermin in his article “Is the moon there when nobody looks? Reality and the quantum theory.” The following is a summary of the data.

A = Same Switch; A’ = Different Switch; B = Same Color; B’ = Different Color

P(A) = 14/45; P(B) = 24/45
P(B|A) = 14/14
P(A’) = 31/45
P(B|A’) = 10/31

We can now calculate the probability of the lights flashing the same color. This should be done two ways, in order to resolve which argument is correct: Bell’s or Jaynes’.

General Multiplication Rule (Dependent Events)

1. P(A and B) = P(A)*P(B|A) = (14/45)*(14/14) = .311
2. P(A’ and B) = P(A’)*P(B|A’) = (31/45)*(10/31) = .222

P(Same color) = .311 + .222 = .533

Specific Multiplication Rule (Independent Events)

3. P(A and B) = P(A)*P(B) = (14/45)*(24/45) = .166
4. P(A’ and B) = P(A’)*P(B) = (31/45)*(24/45) = .367

P(Same Color) = .166 + .367 = .533

Wow! Both methods give the same prediction of .533. This was unexpected, and there may be an underlying reason for it. Mermin’s theoretical prediction for the lights flashing the same color is (1/3)(1) + (2/3)(1/4) = .500. The 45 runs closely match the theoretical value. However, only the general multiplication rule aligns with the theoretical calculation term for term, which tends to support Jaynes’ view. Assuming the above is correct with no mistakes, what do these findings say about Bell’s derivation using the factored form of the joint probability, and ultimately about Bell’s theorem?
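The arithmetic above is easy to check with exact fractions; a minimal sketch:

```python
from fractions import Fraction as F

P_A  = F(14, 45)   # same switch setting on both sides
P_Ap = F(31, 45)   # different switch settings
P_B  = F(24, 45)   # same colour flashed
P_B_given_A  = F(14, 14)
P_B_given_Ap = F(10, 31)

# General multiplication rule (dependent events)
dependent = P_A * P_B_given_A + P_Ap * P_B_given_Ap
# Specific multiplication rule (independence assumed)
independent = P_A * P_B + P_Ap * P_B

print(dependent, independent)  # both come out to 8/15, i.e. about .533
```

The agreement is guaranteed rather than coincidental: since P(A) + P(A’) = 1, the "independent" total collapses to P(B) = 24/45, while the "dependent" total is the law of total probability for P(B), which reproduces the same 24/45 because the conditional probabilities were computed from the same counts.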

DrChinese
Gold Member
In regards to Jaynes’ view: Bell incorrectly factored a joint probability; it may be informative to analyze the data set presented by N. David Mermin in his article: “Is the moon there when nobody looks? Reality and the quantum theory.” [..]
So let me see if I have this straight. If you apply the probability analysis (either dependent or independent in your example), you would predict .533 (actually a minimum). The quantum prediction is .5, which agrees with actual experiments.

Well, I would say Bell's point works nicely. Focusing on his factorization is a mistake. Once you know of Bell, I think it is easier to simply require that counterfactual cases must have a probability ≥ 0. Which is the requirement of realism, going back to EPR and the famous "elements of reality".
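The counterfactual requirement can be made concrete for Mermin's device by enumerating the eight possible "instruction sets" (this sketch assumes Mermin's setup: three switch positions, each pre-assigned a colour, shared by both detectors so that equal settings always agree):

```python
from itertools import combinations, product

def frac_same(instr):
    # fraction of the 3 unordered pairs of *different* settings
    # on which the pre-assigned colours agree
    pairs = list(combinations(range(3), 2))
    return sum(instr[i] == instr[j] for i, j in pairs) / len(pairs)

# all 8 instruction sets: a colour R/G for each of the three settings
fracs = [frac_same(s) for s in product("RG", repeat=3)]
print(sorted(set(fracs)))  # every instruction set agrees on at least 1/3 of pairs
```

No instruction set agrees on unequal settings less than 1/3 of the time, so any local model predicts P(same colour) ≥ (1/3)(1) + (2/3)(1/3) = 5/9 ≈ .556, whereas the quantum value is (1/3)(1) + (2/3)(1/4) = .500; it is the impossibility of assigning a probability below 1/3 to the counterfactual cases that carries the weight.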

If it isn't too much trouble, I would like to see the code that generated your results.
Edit: If it is too long to post here, perhaps we could get in touch by e-mail.
I will post my code here if my shot in the dark completely missed - but I didn't yet automate the data treatment, so I don't know yet (but I do see now that it's not clear-cut). For the moment it's simply a useful exercise for me that helps me to better understand possible issues, so that I find the right questions to ask. :tongue2:

Originally Posted by harrylin
It's somewhat combining Bell's sock illustration with his Lille-Lyon illustration, but in a way that in principle could be really tested in the living room.[..]
The Lille-Lyon demonstration is kind of a joke to me, because it exploits the fair sampling assumption. [..]
Sorry you lost me here; Bell presented that example to defend his separation of terms. What is your issue with it?

DrChinese
Gold Member
Sorry you lost me here; Bell presented that example to defend his separation of terms. What is your issue with it?
I thought you were using it to demonstrate that classical data can violate a Bell Inequality. If you weren't intending that, then my apologies. But if you were, then I will say it is not a suitable analogy. A suitable analogy would be one like particle spin or polarization.

I thought you were using it to demonstrate that classical data can violate a Bell Inequality. If you weren't intending that, then my apologies. But if you were, then I will say it is not a suitable analogy. A suitable analogy would be one like particle spin or polarization.
Bell was using it to make it plausible that classical data must obey his method of probability analysis. I mentioned in post #55 why I find both Lille-Lyon and particle spin of limited use for illustrating such things. For me Lille-Lyon is too difficult to analyse, and it doesn't include the detection aspects well. What do you find unsuited about Lille-Lyon?

In regards to Jaynes’ view: Bell incorrectly factored a joint probability; it may be informative to analyze the data set presented by N. David Mermin in his article: “Is the moon there when nobody looks? Reality and the quantum theory.” [..]
Now that you bring it up, I was going to bring up Mermin as a separate topic, but perhaps the answer to my question is very simple: can anyone tell me how his equality of 0.5 follows from (or, as he presents it, is) Bell's inequality?

So let me see if I have this straight. If you apply the probability analysis (either dependent or independent in your example), you would predict .533 (actually a minimum). [..]

The data shows that the events A and B are dependent, not independent as Bell assumed: P(A)*P(B|A) ≠ P(A)*P(B). Can you explain how Bell got it right using an invalid assumption?

Now that you bring it up, I was going to bring up Mermin as a separate topic, but perhaps the answer to my question is very simple: can anyone tell me how his equality of 0.5 follows from (or, as he presents it, is) Bell's inequality?
Rather than examining Mermin, you might want to look at Nick Herbert's exposition at quantumtantra.com/bell2.html (the link may now be broken). It's written in a style like Mermin's, but the example used is even simpler. This example was the one Bell used in talks to popular audiences, as he said it was the simplest known Bell inequality.

[..] The data shows that the events A and B are dependent not independent, an assumption made by Bell. The P(A)*P(B/A) ≠ P(A)*P(B). Can you exlain how Bell got it right using an invalid assumption?
It appears that you don't have lambda in your analysis. That is however necessary to test his assumption (see the discussion on the first page of this thread).

Rather than examining Mermin, you might want to look at Nick Herbert's exposition at quantumtantra.com/bell2.html. [..]
Thanks, that may very well provide the answer to my Mermin question, and it looks very interesting.

PS: I think that Herbert's proof deserves to be a separate topic - it looks really good and no need for a lambda!
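For what it's worth, the arithmetic behind Herbert's version can be sketched in a few lines (this assumes the photon-polarization form of the argument, where the quantum mismatch rate at relative angle θ is sin²θ; if Herbert's page uses a different setup, the numbers would change):

```python
import math

def mismatch(theta_deg):
    # quantum-predicted mismatch rate for detectors misaligned by theta degrees
    # (photon-pair case: sin^2 of the misalignment angle)
    return math.sin(math.radians(theta_deg)) ** 2

# Locality bound: the error rate at 60 degrees can be at most the sum of the
# error rates from two independent 30-degree steps.
print(mismatch(30), mismatch(60))        # about .25 and .75
print(mismatch(60) <= 2 * mismatch(30))  # False: quantum mechanics violates the bound
```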

Thanks, that may very well provide the answer to my Mermin question [..]

PS: I think that Herbert's proof deserves to be a separate topic - it looks really good and no need for a lambda!
Yes, it would be nice to have a thread on Herbert's proof.