# Bell's derivation; socks and Jaynes

## Main Question or Discussion Point

Hello,

For this little discussion I base myself on Bell's paper on Bertlmann's socks:
http://cdsweb.cern.ch/record/142461

Although I have participated in a number of discussions about Bell's theorem, I have always had the uneasy feeling that I do not fully understand the definitions of the symbols and the notation, in particular how to account for lambda in probability calculations.

So, although I intend to discuss here the validity (or not) of Jaynes's criticism of Bell's equation no. 11, I'll start at a much more basic level. Using Bell's example of socks, I think that we could write for example:

P1(pink) = 0.5

Here P1(pink) stands for the probability to observe a pink sock on the left foot on an arbitrary day. An experimental estimate of it is the number of days on which a pink sock is observed, divided by the total number of observations.

As the colour depends on Bertlmann's mood, we can then account for that mood as an unknown variable "lambda" (here I will just put X, for unknown). However, any local realistic theory that proposes such an unknown variable as explanation, still must predict the same observed result. Therefore, I suppose that if we include X as causal factor, we must still write:

P1(pink|X) = 0.5

Thus far correct?

I don't think this is quite correct. Rather, if X correlates with Bertlmann wearing a pink sock then P1(pink|X)=(P1(pink)/P(X))>0.5. Instead, $\int P_1(\text{pink}|X)\,P(X)\,dX = P_1(\text{pink}) = 0.5$ (obviously if X is a causal factor it must correlate with P1(pink)). I think what Bell is saying in equation 11 is that if one knew λ (in addition to the local conditions) there would be no residual correlations between the distributions of the measurements (after accounting for its effects).

Thanks for that clarification! I had not looked at it that way. However, X is like EPR's hidden function: Bertlmann's unknown and unpredictable mood determines what socks he will wear. X stands for the physical model, which is here an invisible random function (indeed, it happens in his head) that delivers one of {pink, not pink}. Obviously the chance to observe a Bertlmann pair of socks on Bertlmann's feet is simply 1. Then we must have, for the case that half of the time a pink sock is observed on the left foot:
P1(pink|X)=P1(pink)/P(X) = P1(pink)/1 =0.5

It's exactly the same as for a fair coin: P(head | fair coin) = 0.5.

I can imagine that someone would like to split the probability estimation up into unknown "knowns": then we can separate it into the cases in which Bertlmann decides to put a pink sock on his left leg, and the cases in which he decides to put another colour on his left leg. However, what we are interested in is the result over many days, and then we are necessarily back where we were above. Thus, I don't see any use for that.

I misinterpreted what you meant by X. I took it to mean a variable taking on values from the set of moods, some subset of which would correlate with Bertlmann wearing a pink sock instead of the mood model itself. In the latter case, I certainly agree with your results.

Edit: A couple of papers that may be relevant to this discussion: Jaynes' view of EPR, a critique of Jaynes' view

@IsometricPion: thanks for the links! I suspect that our own discussion here, which is based on http://cdsweb.cern.ch/record/142461, will show that the arXiv paper misses the point; we'll see!

Instead of running to eq. 11, I will first work out the example that Bell gave in his introduction, as he did not do so himself.
Note that in Bell's paper the pictures come after the text. I'll start with a partial re-take.

Elaborating on Bell's example of Bertlmann's socks, we could write for example:

P1(pink) = 0.5

Here P1(pink) stands for the probability to observe a pink sock on the left foot on an arbitrary day. An experimental estimate of it is the number of days on which a pink sock is observed, divided by the total number of observations.

As the colour depends on Bertlmann's mood, we can account for that mood as an unknown function "lambda" (here I will just put X, for unknown). However, any "classical" theory that proposes such a physical model, still must predict the same observed result. Therefore, if we include X as invisible cause for the outcome, we must still write:

P1(pink|X) = 0.5
(Compare: P(head | fair coin) = 0.5)

Similarly we can write for the right leg:

P2(pink|X) = 0.5

Bell remarks:
Which colour he will have on a given foot on a given day is quite unpredictable. But when you see that the first sock is pink you can already be sure that the second sock will not be pink. Observation of the first, and experience of Bertlmann, gives immediate information about the second.
The fact that "pink" on the left foot implies "not pink" on the right foot means that there is a strong correlation between the results. We can acknowledge that correlation as follows, with a slight change of notation for convenience:

P(L,R|X) ≠ P1(L|X) P2(R|X)

Here L stands for "pink on left leg", and R stands for "pink on right leg".

Ok so far?

pink = 1, not pink = -1

then:
P(LR) = 0, because L and R are always different;

formally:
P(LR) = P(L|R)*P(R); P(R) = P(L) = 1/2;
but: P(L|R) = 0 ≠ P(L); (the two socks never have the same colour)

corr = 0 - 1 = -1, full anti-correlation.

And using Bell reasoning:
P(LR) = P(L)*P(R) = 1/2 * 1/2 = 1/4;

corr = 2*1/4 - 2*1/4 = 0.

Two random socks, and completely independent, of course.
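
For readers who want to check the two numbers above, here is a minimal sketch. It assumes the coding pink = +1, not pink = -1 from this post and takes the correlation to be the expectation of the product of the two outcomes; the function name `correlation` and the dictionary layout are only illustrative.

```python
from itertools import product


def correlation(joint):
    """Expectation of left*right, with pink = +1 and not pink = -1."""
    return sum(p * left * right for (left, right), p in joint.items())


# Bertlmann's socks: the two feet never show the same colour.
anticorrelated = {(+1, -1): 0.5, (-1, +1): 0.5}

# Two independent fair socks: each of the four combinations has probability 1/4.
independent = {pair: 0.25 for pair in product((+1, -1), repeat=2)}

corr_socks = correlation(anticorrelated)
corr_independent = correlation(independent)
print(corr_socks, corr_independent)
```

The first value reproduces the full anti-correlation, the second the zero correlation of two unrelated socks.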

@alsor: it appears that in this matter we both agree with Jaynes.
However, again you ran far ahead of me and I'm not sure if everyone who, so far, didn't "see" this point of Jaynes etc., could follow you. So, I'll continue my slow pace to make sure that everyone who watches this topic can follow me and that we all agree on the basic facts as well as notation. I'll catch up with you later.

This misrepresents Bell's model of local hidden variables. Equation 11 of his paper assumes one knows the values of the hidden variables, in this case Bertlmann's mood. So, P(L|Bertlmann feels like wearing a pink sock on his right foot)=0 (since beyond his mood one also knows that he does not wear the same color socks) and P(R|Bertlmann feels like wearing a pink sock on his right foot)=1. Thus P(L|mood=right, pink)*P(R|mood=right, pink)=0*1=0.

If instead one does not know his mood (or anything about it other than that it can take on one of two sets of values), P(L)=P(L|RP)P(RP)+P(L|R¬P)P(R¬P)=0*0.5+1*0.5=0.5, by exchangeability. The problem Jaynes sees with Bell's reasoning is not his statistical or mathematical procedure/ability; rather, he thinks Bell is too restrictive in what he (Bell) considers to be valid variables for the probability distributions for a theory upholding local realism.
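
A small numerical sketch of this point, under the assumption that the mood is a two-valued variable (the labels `pink_left` and `pink_right` are made up for illustration): factorizing conditional on the mood and then averaging over the mood keeps P(L,R) = 0, whereas multiplying the unconditional marginals gives 1/4.

```python
# Hypothetical two-valued mood: which foot gets the pink sock, each with probability 1/2.
moods = {"pink_left": 0.5, "pink_right": 0.5}

# Given the mood, each outcome is certain.
p_left_pink = {"pink_left": 1.0, "pink_right": 0.0}
p_right_pink = {"pink_left": 0.0, "pink_right": 1.0}

# Factorize conditional on the mood, then average over the mood.
p_both_pink = sum(w * p_left_pink[m] * p_right_pink[m] for m, w in moods.items())

# Marginals, with the mood averaged out.
p_left = sum(w * p_left_pink[m] for m, w in moods.items())
p_right = sum(w * p_right_pink[m] for m, w in moods.items())

# Product of the marginals (what one gets by ignoring the mood altogether).
naive_product = p_left * p_right

print(p_both_pink, p_left, p_right, naive_product)
```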

Thanks for the correction; however, although indeed Bell doesn't make a blunder of that proportion, Jaynes certainly points out a subtle error in Bell's equation; according to Jaynes it is not correct.
Anyway, we're not there yet: the problem with the illustration of Bertlmann's socks is that it falls far short of capturing the complexity of the problem at hand. If the observations were always perfectly anti-correlated, there wouldn't be a riddle.

Now, I'm afraid that his next illustration of Lille and Lyon matches it even less well; thus, for this discussion I have been trying to come up with a variant of Bertlmann's socks that addresses the fact that the local conditions affect the observed correlation, but I didn't come up with a good looking one (I thought of observation of white or yellow socks in daylight/artificial light, as well as mud on his socks, but I'm not satisfied). Any better suggestion? If not, we should perhaps move on to the introduction of eq.11.

Metaphors are unnecessary and sometimes confusing, imho. Why not just refer to Bell's original formulation of a local realistic QM expectation value? Where does lambda appear and what does it refer to?

While it may appear that he defines it very precisely, different people interpret it slightly differently in the literature. Moreover, I wasn't clear about the notation. However, this discussion is already making it quite clear (I just needed a memory refresh!); we're now moving on to Bell vs Jaynes.

jtbell
Mentor
we're now moving on to Bell vs Jaynes.
I'm pretty much a spectator in these discussions, but I'd like to point out that there was a long thread here about three years ago, about Jaynes's objections to Bell:

This was before you joined PF, so you may not have seen this. It may or may not fit in with the direction you were planning to go.

It was split off from another thread, by the way, which is why it appears to start in the middle of a discussion.

Thank you! Indeed I had not seen that one... BTW I was also very much a spectator of another current thread in which I saw the suggestion to start this topic. Now I'll first check out the old thread.

Ok, I'm afraid that I will need some time to work through that old thread; and I'm very busy this week.

Still, I started reading it and I notice some disagreement about what Bell claimed to prove. There is no use getting into arguments about the meaning of "local realism" and philosophy. What the "local realist" Einstein insisted on, and what Bell claimed to be incompatible with QM, was "no spooky action at a distance". Or, as Bell put it in his first paper:
that the result of a measurement on one system be unaffected by operations on a distant system with which it has interacted in the past.
Those who deviate from that issue are shooting at straw men.

Bell puts it this way in his Bertlmann's socks paper:
What is held sacred is the principle of "local causality" - or "no action at a distance". [...] What [Einstein] could not accept was that an intervention at one place could influence, immediately, affairs at the other.
The focus of this discussion is Bell's attempt to prove that Einstein's "no action at a distance" principle is incompatible with QM, in the light of Jaynes's first criticism.

I now had a better look at it, and I think that in particular posts #26 and #31 are important. Anyway I'll give a short summary of how I now see it.

If Jaynes' criticism focuses on Bell's equation no.11 in his "socks" paper, it was perhaps due to a misunderstanding about what Bell meant (his comments were based on an earlier paper).

P(AB|a,b,x) = P(A|a,x) P(B|b,x) (Bell 11)

Here x stands for Bell's lambda, which corresponds to the circumstances that lead to a single pair correlation (in contrast to my earlier X, which causes the overall correlation for many pairs).

According to Jaynes it should be instead, for example:

P(AB|a,b,x) = P(A|B,a,b,x) P(B|a,b,x)

Perhaps Jaynes thought that Bell meant:

P(AB|a,b,X) = P(A|a,X) P(B|b,X)

in which case Jaynes claimed that:

P(AB|a,b,X) = P(A|B,a,b,X) P(B|a,b,X)

This is really tricky.

However, he really was disagreeing with the integral equation.
According to him, it should not be:

P(AB|a,b) = ∫ P(A|a,x) P(B|b,x) p(x) dx

but:

P(AB|a,b) = ∫ P(AB|a,b,x) P(x|a,b) dx

and thus:

P(AB|a,b) = ∫ P(AB|a,b,x) p(x) dx = ∫ P(A|B,a,b,x) P(B|a,b,x) p(x) dx

Is my summary of the disagreement correct?

What is the significance of little p(x) instead of P(x)?

DrChinese
Gold Member
According to Jaynes it should be instead, for example:

P(AB|a,b,x) = P(A|B,a,b,x) P(B|a,b,x)
http://bayes.wustl.edu/etj/articles/cmystery.pdf

As I read it, this is one of Jaynes's arguments. However, I think it is attacking a straw man. The essence of Bell's argument does not require the factorization so much as a definition of what realism is.

For a SINGLE photon, not a pair: does it have a well-defined polarization at 0, 120, and 240 degrees independent of the act of observation? Once you answer this in the affirmative, as any local realist must, the Bell conclusion (a contradiction between the assumption and QM's predictions) follows quickly. If you answer as no, then you already deny local realism so it is moot.
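
The step from "yes" to a contradiction can be sketched by brute force. The sketch below rests on assumptions not spelled out in this post: both photons of a pair carry the same predetermined answers (+1 or -1) for the three angles, the two observers always pick two different angles, and the quantum prediction for the match probability at a relative angle θ is cos²θ.

```python
from itertools import combinations, product
from math import cos, radians

angles = (0, 120, 240)
# The three ways to pick two different angles out of three.
angle_pairs = list(combinations(range(len(angles)), 2))


def match_rate(assignment):
    """Fraction of angle pairs for which the predetermined answers agree."""
    matches = sum(assignment[i] == assignment[j] for i, j in angle_pairs)
    return matches / len(angle_pairs)


# All 8 ways to predetermine an answer at each of the three angles.
rates = [match_rate(a) for a in product((+1, -1), repeat=3)]
min_local_match = min(rates)

# Assumed quantum prediction for angles 120 degrees apart.
qm_match = cos(radians(120)) ** 2

print(min_local_match, qm_match)
```

Every predetermined assignment gives a match rate of at least 1/3, so any mixture of them does too, while the assumed quantum value is 1/4.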

So I really don't see the significance here of Jaynes' argument. The only people that take it seriously are local realists looking for support for their position. The vast majority of scientists see it for what it is, something of a technicality with no serious implications for the Theorem whatsoever.

In other words, it would be helpful to see an example that somehow related specifically to photon polarization rather than urns (which do not seem to be much of an analogy).

Speaking of this paper, does anyone know what Jaynes is talking about at the end of page 14 and going on to page 15, concerning "time-alternation theories"? He seems to be endorsing a local realist model which makes predictions contrary to QM, and he claims that experiments performed by "H. Walther and coworkers on single atom masers are already showing some resemblance to the technology that would be required" to test such a theory. Does anyone know whether such a test has been performed in the decades since he wrote his paper?

Is my summary of the disagreement correct?

What is the significance of little p(x) instead of P(x)?
I think it is clear that λ in Bell's paper corresponds to x here (rather than X). While I am less certain of Jaynes' meaning I think it is probably x as well (since it appears on the left side of the | indicating that it is a variable in some of his equations). Jaynes is pretty consistent, so I would expect everything he denotes by λ to refer to the same thing (i.e., all his λ's should correspond with x's rather than X's).

Jaynes refers to probabilities essentially as logical statements of uncertain truth value. His P(y|Y) correspond to logical statements where Y is the predicate and y is the antecedent the truth value of which one is uncertain (the amount of (rational) belief one has that y has a value between u and v is $P(u \le y \le v|Y) = \int_u^v P(y|Y)\,dy$). He refers to any probability not of this form as p(y), since one cannot ascribe a logical statement to such a probability without more information regarding its context. Since the context here is clear and consistently applied, I think it is just a matter of formalism (i.e., there is no substantial difference). (Jaynes defines what he means by these symbols in Appendix B of Probability Theory: The Logic of Science.)

Jaynes states what he thinks are Bell's hidden assumptions:
Jaynes in Clearing Up Mysteries said:
(1)...Bell took it for granted that a conditional probability P(X Y) [sic P(X|Y)] expresses a physical causal influence, exerted by Y on X. ...

(2)The class of Bell theories does not include all local hidden variable theories...
He goes on to mention a type of local hidden variable theory he does not think Bell's theorem covers, though I do not yet understand his argument as to why it isn't covered.

I think a key point to this discussion is how to define local realism in terms of the functional dependence of probability distributions of outcomes of the EPR (thought) experiment. Once this is agreed upon (i.e., all the variables and symbols we are using are well-defined) the rest should just be a matter of mathematics (about which I think we all should be able to agree).

Yes, that's what I was asking about in my previous post. He claims that ordinary Bell tests won't be able to test this "time-alternating" model, but some other experiments could test it.

1. Urn example is a red herring. The cases are not equivalent. There is no correspondence established between A,B,a,b on one hand and R1,R2 on the other. Specifically, λ was lost along the way. Let's try to put it back.

λ is going to be the complete state of the urn before the first draw - that's our hidden random variable. A would be the location of the ball to be drawn first - a freely chosen parameter, mutually independent from λ. a=a(A,λ) is the outcome - deterministic function of A and λ. Now the state of the urn after the first draw is γ=γ(A,λ) - another deterministic function of A and λ. And finally b=b(B,γ) - is yet another deterministic function.

Now b=b(B,γ)=b(B,γ(A,λ))=b(A,B,λ), and b clearly depends on A, that is ∃A1,A2,B,λ: b(A1,B,λ) ≠ b(A2,B,λ). Here b(...) is a deterministic function and A1,A2,B,λ are merely placeholders, arguments of ∃, loop variables if you wish. There is a clear causal link: given the same initial state of the urn, the choice of ball in the first draw causally affects the results of the second draw.

In contrast, in Bell's case we have explicitly denied this causal link as violating locality: ∀A1,A2,B,λ: b(A1,B,λ) = b(A2,B,λ). Note we are not talking here about randomness, conditional probabilities, observer's state of knowledge etc., we are simply trying to establish a link between the value of a deterministic function and its parameter. See the difference?

So the two cases have different physical models behind them and it is not a surprise the results of one are not applicable to another. In fact, the urn would be a great example provided the first and second draws can be spacelike separated.
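
The dependence of b on A in the urn model, and its absence in a Bell-type local model, can be made concrete with a toy sketch. Everything here is an illustrative assumption: λ is represented as a tuple of ball colours by position, A and B are positions, and the "local" function simply ignores A.

```python
def state_after_first_draw(A, lam):
    """gamma(A, lambda): the urn with the ball at position A removed."""
    return lam[:A] + lam[A + 1:]


def urn_second_outcome(A, B, lam):
    """b(B, gamma(A, lambda)): colour at position B of the remaining urn."""
    return state_after_first_draw(A, lam)[B]


def local_second_outcome(A, B, lam):
    """Bell-type toy model: b reads only its own setting B and lambda."""
    return lam[B]


lam = ("red", "white", "white")  # hypothetical initial state of the urn

# Same B and lambda, different A: the urn outcome changes.
b_after_A0 = urn_second_outcome(0, 0, lam)
b_after_A1 = urn_second_outcome(1, 0, lam)
urn_depends_on_A = b_after_A0 != b_after_A1

# In the local toy model no choice of A ever changes b.
local_depends_on_A = any(
    local_second_outcome(A1, B, lam) != local_second_outcome(A2, B, lam)
    for A1 in range(3)
    for A2 in range(3)
    for B in range(3)
)

print(b_after_A0, b_after_A1, urn_depends_on_A, local_depends_on_A)
```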

2. Regarding time-dependence. In Bell's case λ includes everything that might possibly affect the experiment, except for settings A and B. I think it is safe to assume that absolute value of t does not matter (otherwise we're in for a rough ride). Any relative time delay in the experiment will appear as yet another random factor collectively included in λ and the integral in (12) will include integration over the whole range of it. As long as these delays are independent from choices A and B, it's within Bell's framework.

DrChinese
Gold Member
1. Urn example is a red herring. The cases are not equivalent. There is no correspondence established between A,B,a,b on one hand and R1,R2 on the other. ...
Thanks for clarifying. I still have a hard time figuring out how the choice of measurement angle (for Alice and Bob) fits in. But you don't need to try to explain that point unless you want to.

PeterDonis
Mentor
There is a clear causal link: given the same initial state of the urn, the choice of ball in the first draw causally affects the results of the second draw.
Yes, but that's not Jaynes' point. His point is that the choice of ball in the *second* draw cannot possibly causally affect the results of the *first* draw--yet logically, the two are not independent, so the probabilities don't factorize. (For example, if there is only one red ball in the urn, and we are told that a red ball was drawn on the second draw, the probability of drawing a red ball on the first draw given that data is zero.)
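
The parenthetical example can be checked by enumeration. This is only a sketch under assumed urn contents (one red and two white balls, two draws without replacement, all ordered draws equally likely); the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

urn = ("red", "white", "white")  # hypothetical contents: exactly one red ball

# All equally likely ordered draws of two distinct balls (by position).
draws = list(permutations(range(len(urn)), 2))


def prob(event):
    """Probability of an event over the equally likely ordered draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


def first_red(d):
    return urn[d[0]] == "red"


def second_red(d):
    return urn[d[1]] == "red"


p_r1 = prob(first_red)
p_r2 = prob(second_red)
p_r1_and_r2 = prob(lambda d: first_red(d) and second_red(d))

# Conditioning on the later draw changes the probability of the earlier one.
p_r1_given_r2 = p_r1_and_r2 / p_r2

print(p_r1, p_r2, p_r1_given_r2, p_r1 * p_r2)
```

The joint probability is 0 while the product of the marginals is 1/9, so the probabilities do not factorize even though the second draw cannot causally affect the first.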

DrChinese
Gold Member
I get that the Jaynes' critique has to do with the factorizing that Bell pushed, but really how does that change the Bell result in any way?

If you assume realism (simultaneous existence of particle attributes independent of the act of observation) AND locality (that the results of Alice and Bob's observations are causally independent): you easily get the Bell result a number of ways. What does factorizing have to do with this? To me, the factorizing was just a way to express the locality assumption, but certainly not central to the argument.

Although I hardly see how Jaynes' point even applies, the urn example seems so contrived.

PeterDonis
Mentor
I get that the Jaynes' critique has to do with the factorizing that Bell pushed, but really how does that change the Bell result in any way?

If you assume realism (simultaneous existence of particle attributes independent of the act of observation) AND locality (that the results of Alice and Bob's observations are causally independent): you easily get the Bell result a number of ways. What does factorizing have to do with this? To me, the factorizing was just a way to express the locality assumption, but certainly not central to the argument.
I thought that having the probabilities for the two measurements factorize was a crucial step in deriving the Bell Inequalities; without the factorization the inequalities can't be derived.

Although I hardly see how Jaynes' point even applies, the urn example seems so contrived.
He was just using it as a simple example, easy to visualize, where A is causally independent of B but the joint probability of A and B doesn't factorize into separate probabilities for A and B. (I realize I'm speaking loosely, if I need to tighten it up I'll do so.)

Yes, but that's not Jaynes' point. His point is that the choice of ball in the *second* draw cannot possibly causally affect the results of the *first* draw--yet logically, the two are not independent, so the probabilities don't factorize.
Bell stipulates that the conditional probability of outcome a is independent of the free parameter B. Jaynes says the conditional probability of one outcome R1 is not independent of another outcome R2. See the difference? If we are talking about outcomes, then of course outcomes a and b are not independent in Bell's case, but that is not the point.

In Bell's case outcome a and experiment parameters A,B are connected (or not connected as the case might be) through the deterministic function a(A,λ). If we try to introduce a similar notion in Jaynes's case, we'll see the function behaves differently, describing a different physical model.