RandallB said:
	Sorry, now you have me totally confused, and rereading your proof and prior posts is of no help.  Given this current statement, I am at a loss to understand what the purpose of the D-function was in your prior posts.
The whole idea of Bell's proof is that whether the red or the green light lights up at the Alice box is given by a probability that is determined by the "local inputs", which are two-fold: an input that comes from the "central box", and the button that Alice pushes.
That is, GIVEN these inputs, so given the message from the central box, and the choice of Alice, this gives us a probability for there to be "red" as a result (and hence, the complementary probability to have "green" of course).
Now, this can be a genuine probability, like, say, 0.6, or it can be a certainty, which comes down to the probability to be 0 (green for sure) or 1 (red for sure).  We leave this open.
So GIVEN the message from the central box (lambda1 if you want), and GIVEN the choice by Alice (X, which is A, B or C), we have a function, which is P(X,lambda1), and gives us that famous probability.
The same reasoning holds at Bob's side, where the function will be Q(Y,lambda2).
Now, D is the expectation value of the correlation function of Alice's and Bob's outcomes, when they have picked respectively X and Y, and when the message lambda1 was sent to Alice, and the message lambda2 was sent to Bob.
D is nothing else but the probability to have (red,red) times +1 plus the probability to have (green,green) times +1 plus the probability to have (red,green) times -1 plus the probability to have (green,red) times -1, under the assumption that Alice pushed X, that Bob pushed Y, that lambda1 was sent to Alice, and that lambda2 was sent to Bob.
As we assume that the "drawing" is done locally (all "common information" is already taken care of by the messages lambda1 and lambda2, so we only look at the REMAINING uncertainties), we can assume that the probability to have, say, (red,red) is given by:
P(X,lambda1) x Q(Y,lambda2).  
The probability to have (red,green) is given by:
P(X,lambda1) x (1 - Q(Y,lambda2) )
etc...
And from this, we can calculate the above D function (the expectation over the remaining probabilities, given X, Y, lambda1 and lambda2) and we find:
D(X,Y, lambda1, lambda2) = ( 1 - 2 x P(X,lambda1) ) x (1 - 2 x Q(Y, lambda2))
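If you want to check this formula numerically, here is a minimal Python sketch. The function names are mine (hypothetical), and it only assumes what was said above: independent local draws, +1 for equal colours and -1 for different colours.

```python
def expected_correlation(p_red_alice, q_red_bob):
    """Expectation of the outcome product, given P(X,lambda1) and Q(Y,lambda2).

    Assumes Alice's and Bob's draws are independent once the inputs are fixed.
    """
    p, q = p_red_alice, q_red_bob
    same = p * q + (1 - p) * (1 - q)        # (red,red) or (green,green): +1
    different = p * (1 - q) + (1 - p) * q   # (red,green) or (green,red): -1
    return same - different


def closed_form(p_red_alice, q_red_bob):
    """The product form D = (1 - 2P)(1 - 2Q)."""
    return (1 - 2 * p_red_alice) * (1 - 2 * q_red_bob)


# Compare both expressions on a few arbitrary probabilities.
for p, q in [(0.0, 0.0), (1.0, 0.0), (0.6, 0.3), (0.25, 0.9)]:
    print(p, q, expected_correlation(p, q), closed_form(p, q))
```

Both columns should agree up to rounding, and the result always stays between -1 and +1.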
Now there is a triviality, which seems to be confusing you, which I applied:
we can call a new mathematical structure: lambda = { lambda1, lambda2 }.  If lambda1 is a real number, and lambda2 is a real number, then lambda can be seen as a 2-dim vector.  If lambda1 was a text file, and lambda2 is a text file, then lambda can be seen as the concatenation of the two text files.  It is just NOTATION.
Now, if in all generality, you have a function f(x), you can ALWAYS define a function g(x,y) which is equal to f(x) for all values of y, of course.  
So if P(X,lambda1) is a function of lambda1, you can ADD lambda2 as an argument, which doesn't do anything: P'(X,lambda1,lambda2) = P(X,lambda1).  
Same for Q, we can define Q'(Y,lambda1,lambda2) = Q(Y, lambda2).
But we have the "vector" notation lambda which stands for {lambda1, lambda2}, so we can write P'(X,lambda) and Q'(Y,lambda).  They just have a "useless" argument more, but they are the same function, just as g(x,y) is in fact just f(x), and y doesn't play a role.  But if this confuses you, I will continue to write lambda1, lambda2.
So we can write:
D(X,Y, lambda1, lambda2) = ( 1 - 2 x P'(X,lambda1,lambda2) ) x (1 - 2 x Q'(Y, lambda1,lambda2))
And we can drop the ', and call P', simply P, and Q' simply Q.
So we can write:
D(X,Y, lambda1, lambda2) = ( 1 - 2 x P(X,lambda1,lambda2) ) x (1 - 2 x Q(Y, lambda1,lambda2))
Ok, so D was the expectation value of the correlation, GIVEN the choice of Alice and Bob, and GIVEN the (hidden) messages sent from the central box.
It is important to note that D is always a real number between -1 and +1.  This comes from the fact that P and Q are probabilities, and hence between 0 and 1.
Now, we assume that those messages themselves are randomly sent out with a given probability distribution.  That means, there's a certain probability Pc(lambda1,lambda2) to send out a specific couple of messages, namely {lambda1,lambda2}.
Given that Alice and Bob can't see that message, THEIR correlation function (for a given choice X and Y) will be the expectation value of D over this probability distribution of the couples (lambda1, lambda2), right ?  Bob and Alice will "average" their correlation function over the messages.
So how does this work out ?  Well, you of course have to sum each value of D(X,Y,lambda1,lambda2) multiplied by the probability that the messages sent out will be {lambda1,lambda2}.  THIS will give you the correlation function that Bob and Alice will find when they picked X and Y, in other words, C(X,Y).
So we have that:
$$C(X,Y) = \sum_{(\lambda_1,\lambda_2)} D(X,Y,\lambda_1,\lambda_2) \, P_c(\lambda_1,\lambda_2)$$
This "sum" can be an integral over whatever is the set of the couples (lambda1,lambda2).  It can be a huge set.  In the case of text files, we have to sum over all conceivable couples of text files (but some might have probability Pc=0 of course).  In the case of real numbers, we have to integrate over the plane.  It doesn't matter.
The above expression is valid for the 9 different C(X,Y) values: for C(A,A), for C(A,B),...
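To make the averaging concrete, here is a small Python sketch with a toy model. Everything in it is an assumption chosen for illustration: the messages are strings listing the buttons that light up red for sure, and only two couples of messages are ever sent, each with probability 0.5.

```python
def averaged_correlation(x, y, p_func, q_func, pc):
    """C(X,Y): sum over couples (lambda1, lambda2) of D times Pc."""
    total = 0.0
    for (lam1, lam2), weight in pc.items():
        d = (1 - 2 * p_func(x, lam1)) * (1 - 2 * q_func(y, lam2))
        total += d * weight
    return total


def red_probability(button, message):
    """Toy rule (hypothetical): red for sure if the button is named in the message."""
    return 1.0 if button in message else 0.0


# Toy distribution Pc over couples of messages (hypothetical values).
toy_pc = {("A", "A"): 0.5, ("BC", "BC"): 0.5}

for x in "ABC":
    for y in "ABC":
        c = averaged_correlation(x, y, red_probability, red_probability, toy_pc)
        print(f"C({x},{y}) = {c}")
```

In this toy model Alice and Bob always receive the same message, so C(A,A), C(B,B) and C(C,C) come out as 1.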
But we KNOW certain C values: C(A,A) = 1 for instance.  Does C(A,A) = 1 impose a condition on D or on Pc ?
Yes, it does.  This is the whole point.  Let us write out the above expression for the case C(A,A):
$$C(A,A) = 1 = \sum_{(\lambda_1,\lambda_2)} D(A,A,\lambda_1,\lambda_2) \, P_c(\lambda_1,\lambda_2)$$
Now,
$$\sum_{(\lambda_1,\lambda_2)} P_c(\lambda_1,\lambda_2) = 1$$
because it is a probability distribution.  All Pc values are between 0 and 1, and D(A,A,lambda1,lambda2) is a number between -1 and 1.  The sum for C(A,A) is therefore a weighted average of the D values, and a weighted average of numbers that are at most 1 can only be equal to 1 if ALL D(A,A,lambda1,lambda2) values are equal to 1 (at least, for those lambda1 and lambda2 for which Pc is not equal to 0).
So we know that D(A,A,lambda1, lambda2) = 1 for all lambda1 and all lambda2 that can actually be sent out (that is, for which Pc is not 0).
But we also know that D(A,A,lambda1,lambda2) = ( 1 - 2 x P(A,lambda1,lambda2) ) x (1 - 2 x Q(A, lambda1,lambda2))
So we have that:
( 1 - 2 x P(A,lambda1,lambda2) ) x (1 - 2 x Q(A, lambda1,lambda2)) = 1 for all lambda1, and lambda2.
Well, (1 - 2 x) (1 - 2 y), with x and y between 0 and 1, can only be equal to 1 in two different cases:
x = y = 1  OR
x = y = 0.
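If you don't trust this, a brute-force check is easy. The following Python sketch scans a grid of x and y values between 0 and 1 (the grid resolution is an arbitrary choice of mine) and keeps the points where the product equals 1.

```python
STEPS = 101  # grid resolution, arbitrary
grid = [i / (STEPS - 1) for i in range(STEPS)]  # 0.0, 0.01, ..., 1.0

# Keep every grid point where (1 - 2x)(1 - 2y) equals 1 up to rounding.
solutions = [
    (x, y)
    for x in grid
    for y in grid
    if abs((1 - 2 * x) * (1 - 2 * y) - 1) < 1e-12
]

print(solutions)  # [(0.0, 0.0), (1.0, 1.0)]
```

A grid scan is of course no proof, but it illustrates that only the two corners survive: each factor lies between -1 and 1, so the product can reach 1 only when both factors are +1 or both are -1.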
This means that for each couple (lambda1, lambda2) we have only 2 possibilities:
EITHER
P(A,lambda1,lambda2) = Q(A,lambda1,lambda2) = 1 
OR
P(A,lambda1,lambda2) = Q(A,lambda1,lambda2) = 0
Of course, if you take a random lambda1 and lambda2, the common value can be, say, 1, and if you take another lambda1 and lambda2, it can be 0, but in each case it is one of the two.
So this means we can split the whole set of (lambda1,lambda2) couples into two parts:
those couples that give P(A,lambda1,lambda2) = Q(A,lambda1,lambda2) = 1 and then the other couples, which necessarily give:  P(A,lambda1,lambda2) = Q(A,lambda1,lambda2) = 0.
Concerning P(A,lambda1,lambda2), we hence don't need to know precisely what are lambda1, and lambda2 (text files, numbers,...), but just whether they fall in the first part, or in the second, because in the first part, P(A,lambda1,lambda2) will be equal to 1, and in the second part, it will be 0.  In ANY case, P(A,lambda1,lambda2) = Q(A,lambda1,lambda2).
So if we know in which of the part the couple (lambda1,lambda2) falls, we know enough about it to know the value of P(A,lambda1,lambda2) and Q(A,lambda1,lambda2).  It is either 1 or 0.   So the split of the set of couples (lambda1,lambda2) comes about because of the fact that we deduced that in any case, P(A,lambda1,lambda2) = Q(A,lambda1,lambda2) can only take up 2 possible values.
Now, we apply the same reasoning to C(B,B) = 1 and then to C(C,C) = 1, and we will now have 3 partitions, each in two parts, of the set of (lambda1,lambda2) couples.  The first partition, as we showed, determines the value of P(A,lambda1,lambda2) = Q(A,lambda1,lambda2) = 0 or 1.  The second partition will determine the value of P(B,lambda1,lambda2) = Q(B,lambda1,lambda2) = 0 or 1.  And the last one will do so for P(C,lambda1,lambda2) = Q(C,lambda1,lambda2) = 0 or 1.
Now, if you apply 3 different partitions in 2 parts to any set, you will end up with at most 8 pieces.  So our entire set of couples (lambda1,lambda2) is now cut in 8 pieces, and if we know in which piece a couple falls, we know what will be the results for the 6 functions:
P(A,lambda1,lambda2), P(B,lambda1,lambda2), P(C,lambda1,lambda2), Q(A,lambda1,lambda2), Q(B,lambda1,lambda2), Q(C,lambda1,lambda2).
Each of these functions is constant over each of the 8 different pieces of the set of (lambda1,lambda2) couples (either it is 1 or it is 0).
Now, if we know these 6 values, we know also the 9 values of 
D(A,A,lambda1,lambda2), D(A,B,lambda1,lambda2), D(A,C,lambda1,lambda2) ...
D(C,C,lambda1,lambda2).
Each of these functions is CONSTANT over each of the 8 different pieces of our (lambda1,lambda2) set, because they depend on the P and Q functions which are constant.  We can call these constant values D(X,Y,firstslice), D(X,Y,secondslice) ...
D(X,Y,8thslice)
Now, pick one of these, say, D(A,B,lambda1,lambda2).  This function can take on at most 8 different values, because it is constant on each of the 8 slices.  But in fact it can take on at most 4, because the 8 slices also distinguish the value of P(C,lambda1,lambda2), and this value doesn't enter into the calculation of D(A,B,lambda1,lambda2), which only needs P(A,lambda1,lambda2) and Q(B,lambda1,lambda2).  So our 8 different "slices" will give the same result 2 by 2 (namely, two slices that only differ in P(C,lambda1,lambda2) give the same value of D).
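The 8 slices are easy to enumerate. Here is a Python sketch (the names all_slices and d_value are hypothetical, just for illustration) that lists them and prints D(A,B) on each, so you can see that two slices which differ only in the C value give the same result.

```python
from itertools import product

BUTTONS = ("A", "B", "C")


def all_slices():
    """The 8 slices: one 0/1 value of P(X) = Q(X) for each button X."""
    return [dict(zip(BUTTONS, bits)) for bits in product((1, 0), repeat=3)]


def d_value(x, y, slice_values):
    """D(X,Y) on one slice, using D = (1 - 2P(X)) (1 - 2Q(Y)) with Q = P."""
    return (1 - 2 * slice_values[x]) * (1 - 2 * slice_values[y])


for s in all_slices():
    print(s, "D(A,B) =", d_value("A", "B", s))
```

With this ordering, the first slice has all three values equal to 1, and the second one differs from it only in the C value.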
Now, if we go back to 
$$C(X,Y) = \sum_{(\lambda_1,\lambda_2)} D(X,Y,\lambda_1,\lambda_2) \, P_c(\lambda_1,\lambda_2)$$
we can split the sum over the entire set of couples (lambda1,lambda2) into sums over the 8 different slices:
$$C(X,Y) = \sum_{(\lambda_1,\lambda_2) \in \text{first slice}} D(X,Y,\lambda_1,\lambda_2) \, P_c(\lambda_1,\lambda_2)
+ \sum_{(\lambda_1,\lambda_2) \in \text{second slice}} D(X,Y,\lambda_1,\lambda_2) \, P_c(\lambda_1,\lambda_2)
+ \dots
+ \sum_{(\lambda_1,\lambda_2) \in \text{8th slice}} D(X,Y,\lambda_1,\lambda_2) \, P_c(\lambda_1,\lambda_2)$$
But within the first slice, D is constant!  And within the second slice, too...
So we can bring this outside:
$$C(X,Y) = D(X,Y,\text{firstslice}) \sum_{(\lambda_1,\lambda_2) \in \text{first slice}} P_c(\lambda_1,\lambda_2)
+ D(X,Y,\text{secondslice}) \sum_{(\lambda_1,\lambda_2) \in \text{second slice}} P_c(\lambda_1,\lambda_2)
+ \dots
+ D(X,Y,\text{8thslice}) \sum_{(\lambda_1,\lambda_2) \in \text{8th slice}} P_c(\lambda_1,\lambda_2)$$
And now the sums that remain, are nothing else but the sum of probabilities of each of the (lambda1,lambda2) couples in the first slice (which we call p1), of each of the (lambda1,lambda2) couples in the second slice (which we call p2), ...
So:
$$C(X,Y) = D(X,Y,\text{firstslice}) \, p_1 + D(X,Y,\text{secondslice}) \, p_2 + \dots + D(X,Y,\text{8thslice}) \, p_8$$
But let us look a bit deeper into D(X,Y,firstslice).  In the first slice, we have that P(A,lambda1,lambda2) = 1 = Q(A,lambda1,lambda2)  AND
P(B,lambda1,lambda2) = 1 = Q(B,lambda1,lambda2)  AND
P(C,lambda1,lambda2) = 1 = Q(C,lambda1,lambda2) 
So this means that D(X,Y,firstslice) = 1 for all X and Y !
Now in the second slice, we have that:
P(A,lambda1,lambda2) = 1 = Q(A,lambda1,lambda2)  AND
P(B,lambda1,lambda2) = 1 = Q(B,lambda1,lambda2)  AND
P(C,lambda1,lambda2) = 0 = Q(C,lambda1,lambda2) 
So this means that D(A,B,secondslice) = 1, D(A,C,secondslice) = -1, ...
Etc,...
In fact, we will find that those famous constants are just 1 or -1, and we can calculate them (using D(X,Y) = (1-2P(X)) (1-2Q(Y)) ) in each slice.  So there aren't even 4 possibilities for D, but only 2!
Given this, it means that we can calculate each of the 9 functions:
C(X,Y) as sums and differences of p1, p2, p3, ... p8.
But of course, we already know that C(A,A) = C(B,B) = C(C,C) = 1, because we imposed this.  If you do the calculation (do it as an exercise!) you will find that each time, they come out to be p1 + p2 + ... + p8 = 1.  That is because D(A,A,...) = 1 for all of the slices, and D(B,B,...) = 1 for all of the slices and D(C,C,...) = 1 for all of the slices, as we already deduced before.
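For those who prefer to let a computer do the exercise, here is a Python sketch. The slice ordering and the example probabilities are my own arbitrary choices (hypothetical); the only ingredient taken from the argument is D(X,Y) = (1-2P(X)) (1-2Q(Y)) with P = Q equal to 0 or 1 on each slice.

```python
from itertools import product

BUTTONS = ("A", "B", "C")
# Each slice fixes the common 0/1 value of P(X) = Q(X) for X = A, B, C.
SLICES = list(product((1, 0), repeat=3))


def correlations(slice_probabilities):
    """All 9 values C(X,Y) = sum over slices of D(X,Y,slice) * p_slice."""
    table = {}
    for i, x in enumerate(BUTTONS):
        for j, y in enumerate(BUTTONS):
            table[(x, y)] = sum(
                (1 - 2 * bits[i]) * (1 - 2 * bits[j]) * weight
                for bits, weight in zip(SLICES, slice_probabilities)
            )
    return table


# Example: all 8 slices equally likely (an arbitrary choice).
uniform = [0.125] * 8
for (x, y), c in correlations(uniform).items():
    print(f"C({x},{y}) = {c}")
```

Whatever probabilities p1 ... p8 you put in (as long as they sum to 1), the three diagonal values C(A,A), C(B,B) and C(C,C) come out as 1, while the off-diagonal values are sums and differences of the p's.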