@IsometricPion: thanks for the links! I suspect that our own discussion here, which is based on
http://cdsweb.cern.ch/record/142461, will show that the arXiv paper misses the point; we'll see!
Instead of running to eq. 11, I will first work out the example that Bell gave in his introduction, as he did not do so himself.
Note that in Bell's paper the pictures come after the text. I'll start with a partial retake.
Elaborating on Bell's example of Bertlmann's socks, we could write for example:
P1(pink) = 0.5
Here P1(pink) stands for the probability of observing a pink sock on the left foot on an arbitrary day. An experimental estimate of it is the number of observations in which the left sock was pink, divided by the total number of observations.
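To make the "count divided by number of observations" estimate concrete, here is a minimal Python sketch. It is only an illustration: the function name `estimate_p1_pink`, the seed, and the assumption that each day's left sock is pink with a true probability of 0.5 are mine, not Bell's.

```python
import random

def estimate_p1_pink(n_days, seed=0):
    """Estimate P1(pink) as (days with pink on the left foot) / (days observed)."""
    rng = random.Random(seed)
    pink_days = 0
    for _ in range(n_days):
        # Assumed toy model: the left sock is pink on half of all days.
        left_is_pink = rng.random() < 0.5
        pink_days += left_is_pink
    return pink_days / n_days

estimate = estimate_p1_pink(100_000)
print(estimate)  # close to 0.5 for a large number of days
```

The estimate only approaches 0.5 as the number of observed days grows; for a handful of days it can be far off.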
As the colour depends on Bertlmann's mood, we can account for that mood as an unknown function "lambda" (here I will just write X, for unknown). However, any "classical" theory that proposes such a physical model must still predict the same observed result. Therefore, if we include X as the invisible cause of the outcome, we must still write:
P1(pink|X) = 0.5
(Compare: P(head | fair coin) = 0.5)
Similarly we can write for the right leg:
P2(pink|X) = 0.5
Bell remarks:
"Which colour he will have on a given foot on a given day is quite unpredictable. But when you see that the first sock is pink you can already be sure that the second sock will not be pink. Observation of the first, and experience of Bertlmann, gives immediate information about the second."

The fact that "pink" on the left foot implies "not pink" on the right foot means that the two results are strongly correlated. We can acknowledge that correlation as follows, with a slight change of notation for convenience:
P(L,R|X) =/= P1(L|X) P2(R|X)
Here L stands for "pink on left leg", and R stands for "pink on right leg".
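As a rough numerical check of that inequality, the sketch below simulates many days of sock observations. The model is an assumption of mine for illustration only: a hidden "mood" number is drawn afresh each day and never recorded, and it decides which single foot gets the pink sock. The names `simulate_socks`, `mood`, and the seed are hypothetical. The frequencies are therefore averages over the unobserved mood, which is how X is used in this post.

```python
import random

def simulate_socks(n_days, seed=1):
    """Return observed frequencies (P1(L), P2(R), P(L,R)) over n_days."""
    rng = random.Random(seed)
    left = right = both = 0
    for _ in range(n_days):
        mood = rng.random()          # hidden cause X; the observer never sees it
        left_pink = mood < 0.5       # assumed: mood picks the foot that gets pink
        right_pink = not left_pink   # assumed: exactly one sock is pink each day
        left += left_pink
        right += right_pink
        both += left_pink and right_pink
    return left / n_days, right / n_days, both / n_days

p_left, p_right, p_joint = simulate_socks(100_000)
print("P1(L) ~", p_left)
print("P2(R) ~", p_right)
print("P(L,R) ~", p_joint)
print("P1(L) * P2(R) ~", p_left * p_right)
```

In this toy model the joint frequency of "pink on both feet" is zero, while the product of the two single-foot frequencies is about 0.25, so the joint probability clearly does not factorise.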
Ok so far?