fanieh said:
Is there no standard usage of the word "Non-locality"?
It's probably a good idea to see where the various concepts get inserted into the actual derivation - it's much clearer. I love words and have a regrettable tendency to overuse them, but they can lead one astray.
So imagine the set-up where we have Alice, Bob, and some source, as follows:
Alice <---------- source -----------> Bob
Alice and Bob each have a measurement device to measure properties of whatever it is the source produces. The outcomes are just 0 and 1 (or yes/no, or +/-, whichever convention you prefer) - it's just a binary outcome. If we let ##A## stand for the results of Alice and ##B## stand for the results of Bob, we have ##A \in \left\{ 0,1 \right\}## and ##B \in \left\{ 0,1 \right\}##.
Now Alice and Bob also each have a dial by which they can adjust the setting on their respective measurement devices. We'll suppose each of them has only 2 settings, so that Alice can pick ##a## or ##a'## and Bob can pick ##b## or ##b'##.
So we could do experiments and collect enough data to construct a joint probability distribution ##P(A,B)##. But that's not really getting us what we want - we have to remember that we have different settings, so what we're really interested in are the conditional distributions ##P(A,B|a,b)##, ##P(A,B|a',b)##, ##P(A,B|a,b')##, and ##P(A,B|a',b')##. These are all measurable quantities - there's no 'magic' here; we're just recording device settings and seeing whether our measuring machine goes 'ping' or 'pong'.
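Just to make the bookkeeping concrete, here's a minimal sketch in Python of what 'measuring' these distributions amounts to. The model inside run_trial is a made-up local one purely for illustration - the thresholds (and the names run_trial, estimate_dist) are mine, not anything from Bell:
[CODE=python]
import random
from collections import Counter

def run_trial(a_setting, b_setting):
    # One toy run: the source produces a shared hidden value lam, and each
    # wing's binary outcome depends only on its own setting and lam.
    lam = random.random()
    A = 1 if lam < (0.5 if a_setting == "a" else 0.3) else 0
    B = 1 if lam > (0.5 if b_setting == "b" else 0.2) else 0
    return A, B

def estimate_dist(a_setting, b_setting, n=100_000):
    # Relative frequencies over many runs estimate P(A,B | settings).
    counts = Counter(run_trial(a_setting, b_setting) for _ in range(n))
    return {ab: c / n for ab, c in sorted(counts.items())}

# One conditional distribution per pair of settings:
for a in ("a", "a'"):
    for b in ("b", "b'"):
        print(f"P(A,B|{a},{b}) ~", estimate_dist(a, b))
[/CODE]
The outcomes here come out correlated for exactly the reason discussed next: both wings depend on the same ##\lambda##.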
Analysing the results we see that there is some correlation, and so we'd like to construct some kind of model. A correlation cries out for explanation - so we assume there are some extra variables that explain the correlation. These are the so-called 'hidden' variables (a bit of an unfortunate nomenclature, but it's the one that's stuck). They're given the symbol ##\lambda##, and this single symbol is shorthand for what could be a set of variables - discrete or continuous, or functions, or even wavefunctions - it really doesn't matter in the slightest. They're just the things that explain why the results are correlated.
So if there are some variables we've not accounted for so far, we'd best put them in our model and write ##P(A,B | a,b,\lambda)## so our measured distribution is going to depend on the experimental settings and these extra variables. All is sweet smelling in the state of Denmark at the moment.
We then make the critical observation that if our variables ##\lambda## account for all of the correlations, then any residual fluctuation in our results must be statistically independent, so that $$P(A,B | a,b,\lambda) = P(A | a,b,\lambda)P(B | a,b,\lambda)$$ All very reasonable so far. Now we make the assumption of locality in the following sense: we assume the distribution of the results ##A## is not conditioned upon the settings at Bob, and vice versa, so that we can now write $$P(A,B | a,b,\lambda) = P(A | a,\lambda)P(B | b,\lambda)$$ It's this step where the 'locality' assumption gets inserted, and we can see it's quite specific.
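It's worth writing down what this factorization does to the measurable correlations, since those are what appear in the inequality. Scoring the outcomes as ##\pm 1## and writing ##\bar{A}(a,\lambda)## for the average of ##A## at fixed ##a## and ##\lambda## (so ##|\bar{A}(a,\lambda)| \le 1##), and similarly ##\bar{B}(b,\lambda)##, locality gives $$E(a,b) = \int d\lambda \, \rho(\lambda) \, \bar{A}(a,\lambda) \bar{B}(b,\lambda)$$ where ##\rho(\lambda)## is the distribution of the hidden variables over many runs. This is the form that gets manipulated in the rest of the derivation.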
I often make the mistake of saying 'non-local hidden variables' - and I've done so in this thread - but, as Bell points out, the hidden variables themselves can be 'non-local', which all gets very confusing. What has to remain local is the effect of switching device settings - the hidden variables cannot be such that information about device settings is transferred from place to place in a non-local (FTL) fashion. Loosely, then, there can be no mechanism by which the system (device plus measured thingy) at Alice's end 'knows' about the setting of Bob, and vice versa.
It's very important to note that no assumptions have been made about the nature of the source or what it produces - it's just some unspecified thing that produces some hoojamaflips which might be fields or particles, or something else - and, critically, no quantum mechanics at all has intruded here. We just have some source event that leads to measuring devices going 'ping' or 'pong'.
Where does the notion of 'realism' come in then? Well, that's a bit more subtle. Essentially, the derivation tacitly assumes that quantities referring to different measurement settings - such as ##P(A | a,\lambda)## and ##P(A | a',\lambda)## - can be meaningfully manipulated in the same expression. This is fine and valid provided we assume counterfactual definiteness (that is, 'realism'). This is tantamount to the assertion that things have objective properties independent of measurement - essentially a cornerstone of classical physics.
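To see concretely where counterfactual definiteness bites, consider the deterministic case where the outcomes (scored as ##\pm 1##) are functions ##A(a,\lambda)## and ##B(b,\lambda)##. Realism is what licenses writing ##A(a,\lambda)## and ##A(a',\lambda)## for the same ##\lambda## in one expression, even though only one of the two settings is ever chosen on a given run. Then, for every ##\lambda##, $$A(a,\lambda)B(b,\lambda) + A(a,\lambda)B(b',\lambda) + A(a',\lambda)B(b,\lambda) - A(a',\lambda)B(b',\lambda) = A(a,\lambda)\left[ B(b,\lambda) + B(b',\lambda) \right] + A(a',\lambda)\left[ B(b,\lambda) - B(b',\lambda) \right] = \pm 2$$ since one bracket is always ##\pm 2## and the other ##0##.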
Everything is now set up and it's just straightforward (but ingenious) manipulations of probability distributions to arrive at the celebrated inequality. The assumptions are clear (although as Dr Chinese has beautifully pointed out above there are actually a couple of other implicit assumptions that I've not mentioned in the above - like the 'no-conspiracy' assumption, for example).
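For concreteness, in the CHSH form the punchline of those manipulations is that averaging the identity above over ##\lambda## (with ##\int d\lambda \, \rho(\lambda) = 1##) immediately gives $$\left| E(a,b) + E(a,b') + E(a',b) - E(a',b') \right| \le 2$$ and it is this bound that the measured correlations of entangled pairs are found to violate.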
I don't know whether this clarifies things or just muddies them even more - I've taken most of it from Bell's masterful exposition in his Bertlmann's socks paper, and if the above is unclear I urge you to read that; it really is the best explanation I've ever seen. I can't hope to match Bell's clarity and insight, but I hope I've been clear enough to show where the notions of 'locality' and 'realism' actually impact the analysis.