Zafa Pi said:
Now you are asking if A's data is independent of B's setting. How is that different from asking if her data is independent of whether B is standing or not? What does it mean?
Well, think about what data Alice and Bob are collecting. They do thousands of runs. If they each choose (independently and at random) from 3 possible detector settings (say 0, 60 and 120 degrees), then after the experiment Alice and Bob can compile the data. It will look something like:
run 1 : Alice A = 1, a = 0, Bob B = 1, b = 60
run 2 : Alice A = -1, a = 120, Bob B = 1, b = 0
run 3 : Alice A = 1, a = 0, Bob B = -1, b = 0
run 4 : Alice A = 1, a = 60, Bob B = -1, b = 120
and so on and so on. Here A stands for Alice's 'result' and a stands for Alice's 'setting' - with B and b being Bob's equivalent quantities.
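To make the bookkeeping concrete, here's a minimal Python sketch of what such a run log could look like in code. Everything here is illustrative - in particular the +/-1 results are drawn at random, with no physics in them yet:

```python
import random

SETTINGS = [0, 60, 120]  # possible detector angles, in degrees

def random_run():
    """One run: each side picks a setting independently at random.
    The +/-1 results here are pure placeholders - a real experiment
    (or a model of one) supplies these."""
    a = random.choice(SETTINGS)    # Alice's setting
    b = random.choice(SETTINGS)    # Bob's setting
    A = random.choice([+1, -1])    # Alice's result (placeholder)
    B = random.choice([+1, -1])    # Bob's result (placeholder)
    return {"A": A, "a": a, "B": B, "b": b}

runs = [random_run() for _ in range(10_000)]
print(runs[0])   # e.g. {'A': 1, 'a': 0, 'B': -1, 'b': 60}
```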
From this data the joint distribution P(A,B) can be experimentally estimated. Alice and Bob might also notice that whenever they happen to choose the same setting there is a perfect correlation, and a weaker (non-zero) correlation whenever they choose different settings. So they're really looking here at subsets of the data - the results given particular settings. In other words they're looking at P(A,B | a,b).
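Estimating P(A,B | a,b) from such a log is just conditional frequency counting: restrict to the runs with settings (a,b) and tally the outcome pairs. A sketch, reusing the `runs` list from above:

```python
from collections import Counter

def joint_given_settings(runs, a, b):
    """Estimate P(A, B | a, b): restrict to runs with settings (a, b),
    then return the relative frequency of each outcome pair (A, B)."""
    subset = [r for r in runs if r["a"] == a and r["b"] == b]
    counts = Counter((r["A"], r["B"]) for r in subset)
    return {pair: n / len(subset) for pair, n in counts.items()}

print(joint_given_settings(runs, 0, 60))
# e.g. {(1, 1): 0.26, (1, -1): 0.25, (-1, 1): 0.24, (-1, -1): 0.25}
```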
They can also determine the quantities P(A | a,b) and P(B | a,b), which are the marginal distributions. Nothing special here - just analysing the measured data. So the question is whether P(A | a,b) is a function of both a and b, or just a function of a. Does Alice's result probability also depend on the setting Bob has chosen?
Theoretically we might want to make the 'locality' assumption, which is to say that Alice's result probability is independent of Bob's remote setting; that is, P(A | a,b) = P(A | a).
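Note that this is something you can actually test in the compiled data: fix Alice's setting a and compare the estimated marginal across Bob's settings b. A sketch (again reusing `runs` and `SETTINGS` from above):

```python
def alice_marginal(runs, a, b):
    """Estimate P(A = +1 | a, b) from the run log."""
    subset = [r for r in runs if r["a"] == a and r["b"] == b]
    return sum(1 for r in subset if r["A"] == +1) / len(subset)

# If Alice's marginal is independent of Bob's setting, these three
# numbers should agree for each fixed a, up to statistical noise.
for b in SETTINGS:
    print(f"P(A=+1 | a=0, b={b}) ~ {alice_marginal(runs, 0, b):.3f}")
```

Worth noting: quantum mechanics passes this marginal test (that's the no-signalling property), so this isn't where the trouble shows up - the trouble shows up in the joint distribution once hidden variables enter the picture.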
Bell goes further - he hypothesizes that there is some 'cause' for the correlation in terms of variables that we don't know about, or don't control, and that if we only knew the values of these variables we'd be able to explain the correlation. So he assumes that what we really have is a distribution of the form P(A,B | a,b,h), where h is a symbol that stands for this collection of 'hidden' variables - which could be just one variable, a whole collection of them, functions, etc.; the actual details are irrelevant.
By 'explain' we mean that we can write P(A,B | a,b,h) = P(A | a,b,h) P(B | a,b,h).
The locality assumption means that we can reduce this further to P(A,B | a,b,h) = P(A | a,h) P(B | b,h).
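To see what a factorized model looks like concretely, here's a toy deterministic example in Python: h is a predetermined +/-1 answer for every possible angle, fixed at the source and carried by both particles. (Deterministic models are the special case of P(A | a,h) where the probabilities are all 0 or 1.) This is only an illustration of the structure, not a serious physical model:

```python
import random

SETTINGS = [0, 60, 120]

def hidden_variable():
    """h: a predetermined +/-1 answer for every possible setting,
    fixed at the source and shared by both particles."""
    return {s: random.choice([+1, -1]) for s in SETTINGS}

def lhv_run():
    h = hidden_variable()
    a = random.choice(SETTINGS)   # Alice's free choice of setting
    b = random.choice(SETTINGS)   # Bob's free choice of setting
    A = h[a]   # locality: Alice's result depends only on (a, h)
    B = h[b]   # locality: Bob's result depends only on (b, h)
    return {"A": A, "a": a, "B": B, "b": b}

lhv_runs = [lhv_run() for _ in range(100_000)]
```

By construction this model gives perfect correlation whenever a = b (both sides just read off the same predetermined answer), while still factorizing as P(A | a,h) P(B | b,h).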
Bell showed that IF we make this hidden variable assumption then the data has to satisfy an inequality. [In the proof there's also an assumption that it's meaningful to talk about the statistics of the results we would have obtained had we measured at a different angle - the counterfactual assumption.]
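For the three settings above there's a particularly simple counting version of such an inequality (the Wigner-d'Espagnat form). If each pair carries predetermined answers h[0], h[60], h[120], each +/-1, then among three +/-1 values at least two must agree, so P(same | 0,60) + P(same | 60,120) + P(same | 0,120) >= 1. Here's a sketch that checks this on the toy model's data and compares it with the quantum prediction for polarization-entangled photons, P(same) = cos^2(a - b) - a standard result I'm importing, not something derived in this thread:

```python
import math

def p_same(runs, a, b):
    """Estimate P(A = B | a, b) from a run log."""
    subset = [r for r in runs if r["a"] == a and r["b"] == b]
    return sum(1 for r in subset if r["A"] == r["B"]) / len(subset)

pairs = [(0, 60), (60, 120), (0, 120)]

# Toy local hidden variable model: the bound must hold.
lhv_sum = sum(p_same(lhv_runs, a, b) for a, b in pairs)
print(f"LHV model : {lhv_sum:.3f}  (inequality demands >= 1)")

# Quantum prediction for polarization-entangled photons: cos^2(a - b).
qm_sum = sum(math.cos(math.radians(a - b)) ** 2 for a, b in pairs)
print(f"QM photons: {qm_sum:.3f}  (= 3/4, violating the bound)")
```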
The amazing thing, well it's amazing to me anyway, is that Bell has reduced the entire question to simply counting 'pings and dings' and reading 'angles'. Breathtaking.
If you really want to understand this I strongly recommend the Bertlmann's socks paper linked to by bhobba below.
bhobba said:
There is - it's different from classical correlations such as Bell mentions in his seminal paper with Bertlmann's socks:
https://cds.cern.ch/record/142461/files/198009299.pdf
Bell explains it with far greater clarity and insight than I could ever achieve.