Is action at a distance possible as envisaged by the EPR Paradox?

  • #1,201
JesseM said:
Right, only the relative angles matter, each angle is defined relative to an arbitrary choice of coordinate system.

Thanks Jesse, that is how I have pictured it. But... I don’t really get why we talk about entangled photons like up/down spin... if polarization is a result of spin... and they are unpolarized...??

Or is the explanation that polarized light looks something like this (where the electric field oscillates up and down perpendicular to the ray direction):

http://www.colorado.edu/physics/2000/polarization/images/electroArrow.gif

And unpolarized light looks something like this:

http://www.colorado.edu/physics/2000/polarization/images/arrowThickAnim.gif

But why are we talking about up/down spin...
 
  • #1,202
DevilsAvocado said:
Thanks Jesse, that is how I have pictured it. But... I don’t really get why we talk about entangled photons like up/down spin... if polarization is a result of spin... and they are unpolarized...??
In classical electromagnetism, "polarized" light is a beam where, if you pick the correct angle for your polarizer, 100% of the light passes through, whereas "unpolarized" means that no matter what angle you set your polarizer, the intensity is reduced when the beam passes through it.

Individual photons, on the other hand, have a quantum state which determines the probability they'll make it through a polarizer at any given angle. Thinking about it some more, I may have been mistaken to say that they'd always have a 50% chance of passing through a polarizer if their polarization hadn't been previously measured; it might be that even though no polarization measurement had ever been made, knowledge of the properties of the source would give you an initial quantum state with different probabilities at different angles. I'm not sure exactly how the initial quantum state of an entangled pair would be defined for a given type of source.

Anyway, the main point is that once a photon has passed through a polarizer at a given angle, it's guaranteed with probability 1 to pass through another polarizer at the same angle (and has probability 0 of passing through a polarizer at 90 degrees to the first), provided nothing is done to it in between, like passing it through a polarizer at a different angle. If you do that, there is now some finite probability it will pass through a polarizer at a right angle to the first. This can be seen in the very counterintuitive Dirac three-polarizer experiment: two polarizers at right angles let no light through and look black, but if you insert a third polarizer between them at an intermediate angle, light comes through all three in the area covered by the middle one.
And for photons with entangled polarizations, if one member of the pair passes through a polarizer at a given angle, then you can predict with certainty whether the other will pass through a polarizer at the same angle (or at 90 degrees relative to the first).
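As a quick numerical sketch of the three-polarizer effect (assuming Malus's-law probabilities for single photons; the function name and angles here are mine, purely for illustration):

```python
import math

def pass_prob(photon_angle, polarizer_angle):
    """Probability that a photon polarized at photon_angle passes a
    polarizer set to polarizer_angle (Malus's law applied per photon)."""
    return math.cos(math.radians(photon_angle - polarizer_angle)) ** 2

# Two crossed polarizers at 0 and 90 degrees: a photon that made it
# through the first (now polarized at 0) never passes the second.
p_crossed = pass_prob(0, 90)

# Insert a middle polarizer at 45 degrees: each passage re-polarizes
# the photon at the new angle, so the probabilities multiply.
p_with_middle = pass_prob(0, 45) * pass_prob(45, 90)

print(p_crossed)      # ~0 (zero up to floating-point rounding)
print(p_with_middle)  # ~0.25
```

So a quarter of the photons that survive the first polarizer now make it through all three, even though the outer pair alone blocks everything.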
 
  • #1,203
Thanks Jesse, I have to check out the Dirac experiment and think some more. I'll get back tomorrow.
 
  • #1,204
JesseM said:
No, they don't. The terms in the purely arithmetical inequality are of this form:
(Fraction of all triples with properties A+ and B-)
While the terms in Bell inequalities are of this form:
(Fraction of A,B samples which gave result A+, B-)
Here again you are referring to your strawman inequality, not the inequality I derived for which the terms are exactly the same as Bell's. It's not worth another response. If you are serious about pursuing this, deal with Bell's exact inequality from his original paper, not some toy version which obfuscates the issue.

JesseM said:
If it wasn't for the context I would assume I did understand what this sentence meant--that at a theoretical level we assume the existence of triples, even if we don't assume they're known to the theoretical experimenter
This is just another reason why I say you are confused. You say with the left side of your mouth that you have triples theoretically, then say on your right side that the theoretical experimenter does not have triples. And you attribute such conspiracy to Bell.

Bell did not consider two different theoretical situations. He had one theoretical situation in which properties existed simultaneously for 3 angles. His inequality is derived from this ONLY. There is no mention in his paper about a theoretical experimenter not knowing the third value.

The issue with experimenters not being able to measure simultaneously the third property is a practical issue with data gathering in real actual experiments. So your reference to Bell's later papers where he acknowledges this issue does not change the fact that it does not arise in the derivation of Bell's inequalities.

Without triples, you cannot calculate anything comparable to Bell's inequality. For Bell's derivation this problem is non-existent, because he is not considering an actual experiment but a theoretical situation: he simply assumed that a third property existed simultaneously at a third angle and proceeded to derive his inequality. So if you expect me to believe that Bell assumed a theoretical experimenter did not know the third value, and that somehow this assumption is very important for the inequality he derived even though he did not mention it, you are out of luck.

In fact, if you must suggest that Bell was dealing with measurements by a theoretical experimenter, then you must also admit that only one of the pairs mentioned by Bell, (a,b), is measured, and the other two, (a,c) and (b,c), are deduced from it by the theoretical reasoning that there is a third property at angle c! Bell was absolutely not deriving an inequality for a situation in which each pair is measured separately in a different run of the experiment. So if you actually understand Bell's work as you claim to, then this line of argumentation has no other purpose than obfuscation.

So if you don't like my talk about fractions (even though it's completely relevant to other Bell inequalities), you can instead consider the distinction between terms of this type:
(average value of a*b for all triples)
vs. terms of this type:
(average value of a*b for all triples where experimenter sampled a and b)
I have already explained to you why this distinction is artificial for the inequality I derived, and the one Bell derived. The situation may be different for your toy version in which the (a,b,c) do not mean exactly the same thing in each term. But I'm not interested in your toy version. I am only interested in Bell's inequality and the one I derived, in which the terms (a,b,c) mean exactly the same thing between terms. In Bell's inequality the "a" in the first two terms is exactly the same. Same for the "b" in the first and last terms, and same for the "c" in the last two terms. Anything else is not Bell's inequality. The only type of inequality for which your stated difference above exists is one in which the symbols differ between terms, and Bell's inequality is not such an inequality. Neither is the one I derived. In fact, earlier you seemed to understand this when you said:

JesseM said:
billschnieder said:
Fast forward then to the resulting CHSH inequality
|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2

In your opinion then, is the P(λi) the same for each of the above terms, or do you believe it doesn't matter?
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).
Are you trying to recant that admission, or is this new line of argumentation just for argument's sake?

If you think the terms in my inequality are different from Bell's, explain it using my inequality and Bell's rather than picking two strawman inequalities of your own in which the terms differ. Why do you shy away from using the directly relevant inequalities?! I refuse to discuss a contrived strawman when you could have simply used the directly relevant inequality.

In your inequality, does P(b,c) refer to "average value of b*c for all triples where experimenter sampled b and c"? If it does, then it's not hard to find a set of triples that violates your inequality. And if it doesn't, then no, the terms in your inequality don't mean the same thing as those in Bell's.

1 + <bc> >= |<ab> - <ac>|

This is only guaranteed for a situation in which a dataset of triples can be obtained. If you start off with triples like Bell, there is no problem. But if you start off with datasets of pairs, the above can only be guaranteed if the pairs can be resorted to obtain a dataset of triples. It doesn't mean you need to resort it in order to calculate the terms. It just means being able to resort the data is evidence that the symbols are equivalent. It is just another way of saying the symbols ("a", "b" and "c") mean exactly the same thing from term to term.

Once you have the triples, there is no distinction between "average value of b*c for all triples" and "average value of b*c for all triples where experimenter sampled b and c". It doesn't matter how you obtained the triples, whether you started directly with triples or resorted the separate pairs. Your distinction between the two is so ridiculous I wonder why you keep insisting on it. If an experimenter measured b and c on, say, M iterations:
- average value of b*c for all triples is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

- average value of b*c for the triples on which the experimenter measured b and c is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

Or do you expect "all" in the first case to mean the experimenter can calculate an average over values he did not measure? Note also that you are trying to force a distinction where there is none, in an attempt to imply that my inequality is different from Bell's inequality. So if you think "all" in the first case means more cases than were measured, state clearly which case corresponds to Bell's and which one to mine. Is it your claim that Bell's inequality involves averaging over unmeasured terms (an impossibility), or is it your claim that my inequality involves averaging over unmeasured terms? And when you answer that, also answer whether you think actual experimenters ever average over unmeasured terms.


JesseM said:
Sure there's a difference. Suppose our dataset consisted only of the five you mention, and that for each iteration the pair measured was as follows:

a b c
1: + + - (measured a,b)
2: + - + (measured b,c)
3: + - - (measured a,c)
4: - + - (measured a,b)
5: - - + (measured b,c)
What you present above are datasets of pairs from the measurements. We are interested in what was measured; if it wasn't measured, the experimenter does not have it and cannot calculate from it. So let us examine this. For clarity, and following from the example you were responding to, here are the three datasets of pairs:

a b
1:+ +
4:- +

b c
2:- +
5:- +

a c
3:+ -

As you can see already, it is not possible to apply this data to Bell's inequality because we can not sort it in order to obtain a dataset of triples. We can not sort by "b" because the two lists of b's are completely different, same for "a" and "c".
The first term involving ab, is calculated with only positive b terms, the second term with only negative b terms, so each symbol (a,b,c) means something different from term to term. This type of data is not guaranteed to obey Bell's inequality nor the one I derived. What your example shows clearly is the fact that it is possible to violate Bell's inequality using a dataset of pairs (my claim 3) UNLESS it is also possible to sort the dataset of pairs to generate a dataset of triples (my claim 1).

Is it your claim that Bell's inequality is supposed to apply to this kind of data as well? If that is what you believe please say so clearly.
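Using the 1 + <bc> >= |<ab> - <ac>| form discussed in this thread, a quick sketch (variable names mine) makes the point concrete: the three averages computed from only the measured pairs above violate the inequality, while averaging every term over the full set of five triples does not, at least for this dataset.

```python
# Pairs actually measured in the five iterations quoted above.
ab_pairs = [(+1, +1), (-1, +1)]   # iterations 1 and 4
bc_pairs = [(-1, +1), (-1, +1)]   # iterations 2 and 5
ac_pairs = [(+1, -1)]             # iteration 3

def avg_product(pairs):
    return sum(x * y for x, y in pairs) / len(pairs)

E_ab = avg_product(ab_pairs)   # 0.0
E_bc = avg_product(bc_pairs)   # -1.0
E_ac = avg_product(ac_pairs)   # -1.0

# The pair data violate 1 + <bc> >= |<ab> - <ac>|:
print(1 + E_bc >= abs(E_ab - E_ac))   # False

# Averaging every term over the full five triples instead (which the
# experimenter cannot do) satisfies it for this dataset:
triples = [(+1, +1, -1), (+1, -1, +1), (+1, -1, -1),
           (-1, +1, -1), (-1, -1, +1)]
n = len(triples)
ab = sum(a * b for a, b, c in triples) / n
bc = sum(b * c for a, b, c in triples) / n
ac = sum(a * c for a, b, c in triples) / n
print(1 + bc >= abs(ab - ac))         # True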
 
  • #1,205
billschnieder said:
Here again you are referring to your strawman inequality, not the inequality I derived for which the terms are exactly the same as Bell's. It's not worth another response. If you are serious about pursuing this, deal with Bell's exact inequality from his original paper, not some toy version which obfuscates the issue.
The inequality is neither a strawman nor a "toy version", as I already pointed out:
JesseM said:
You didn't make clear at the outset that "the form being discussed" was the one in his original paper. In this recent discussion of ours I was the first one to bring up a specific mathematical inequality, first in post #1171 where I quoted a paper from Bell and then again in post #1176 where I talked about

Number(A, not B) + Number(B, not C) greater than or equal to Number(A, not C)

Then in post #1179 I again referred to that inequality, showing that the purely arithmetic version of the inequality can't be violated by a series of triples, but a Bell-type inequality with the same equation can be. It wasn't until post #1182 that you brought up the inequality |ab+ac|-bc <= 1. It's not really fair that you should have total control over the terms of the discussion in this way, but as seen above I'm fine with discussing this inequality too. Still it's a bit much that you now accuse me of an attempt at obfuscation because I brought up a specific example in what had previously been an overly abstract discussion, and then I didn't immediately drop that example when you brought up a slightly different one.

Also, the inequality I mention is hardly "obscure", if you didn't have a single-minded interest in Bell's original paper only and instead looked at discussions of Bell's inequality by other authors, you'd see that this inequality is mentioned more often in introductory discussions of Bell's proof than the one in the original paper, perhaps because it's so much simpler to see how it's derived (I gave a quick derivation in post #1179 when I said 'the proof is trivial--every triplet with A+ and C- must either be of type A+B+C- or type A+B-C-, and if the former it will also contribute to the number with B+ and C-, if the latter it will also contribute to the number with A+ and B-'). I already gave a link to one website which uses it as a starting point, and wikipedia refers to this inequality as Sakurai's Bell inequality (http://en.wikipedia.org/wiki/Sakurai's_Bell_inequality) because it appeared in Sakurai's widely-used 1994 textbook on QM (the wikipedia article mentions a number of other well-known papers and books on Bell's proof that have used it).
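That trivial derivation can also be checked by brute force. Since every count is a sum of per-triple contributions, it suffices to verify the inequality for each of the eight possible property triples (a quick sketch, with booleans standing in for the properties):

```python
from itertools import product

# Verify Number(A, not B) + Number(B, not C) >= Number(A, not C)
# holds for each of the 8 possible property triples; summing these
# per-triple counts over any list of triples then preserves the
# inequality, so no set of triples can violate it.
for A, B, C in product([True, False], repeat=3):
    n_A_notB = int(A and not B)
    n_B_notC = int(B and not C)
    n_A_notC = int(A and not C)
    assert n_A_notB + n_B_notC >= n_A_notC

print("inequality holds for all 8 triple types")
```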
billschnieder said:
This is just another reason why I say you are confused. You say with the left side of your mouth that you have triples theoretically, then say on your right side that the theoretical experimenter does not have triples. And you attribute such conspiracy to Bell.

Bell did not consider two different theoretical situations. He had one theoretical situation in which properties existed simultaneously for 3 angles. His inequality is derived from this ONLY. There is no mention in his paper about a theoretical experimenter not knowing the third value.
As I said before, his original paper was written for an audience of scientists, the argument was fairly condensed and certain things were left implicit because he assumed the audience would understand. Reading the paper carefully, any physicist would understand that when he writes terms like P(a,b), he is referring to the expectation value for a pair of measurements on an entangled pair with detectors setting a and b (and each result being +1 or -1), which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements with detector settings a and b.

Note that he does refer explicitly to a pair of measurements on the first page:
Measurements can be made, say by Stern-Gerlach magnets, on selected components of the spins \sigma_1 and \sigma_2. If measurement of the component \sigma_1 \cdot a, where a is some unit vector, yields the value +1 then, according to quantum mechanics, measurement of \sigma_2 \cdot a must yield the value -1 and vice versa
Do you doubt that here he is talking about a single pair of measurements on a single pair of particles, rather than averages or "resorted" pairs of measurements taken from two distinct pairs of entangled particles, since that's the only case where the results are guaranteed to be +1 and -1? If you agree, note where he goes on to say that this implies that "the result of any such measurement must actually be predetermined"; the implication here is that if we are choosing between three measurement angles 1, 2, 3, then any given pair of entangled particles must have a triplet of "predetermined" measurement results for each angle. He goes on to say that the parameters predetermining these measurement results can be encapsulated in the variable λ, and that:
The result A of measuring \sigma_1 \cdot a is then determined by a and λ, and the result B of measuring \sigma_2 \cdot b in the same instance is determined by b and λ, and

A(a,λ) = ±1, B(b,λ) = ±1

So here he clearly is talking about a pair of measurement results (by a hypothetical experimenter or team of experimenters), given the assumption that the two results are determined by the two detector angles a and b and the value of λ which represents all the hidden variables with that single pair of entangled particles (where each specific value of λ gives a triplet of 'predetermined' results if the experimenters have three possible detector angles they're choosing from). Then he goes on to say:
If \rho(\lambda) is the probability distribution of λ then the expectation value of the product of the components \sigma_1 \cdot a and \sigma_2 \cdot b is

P(a,b) = \int d\lambda \rho(\lambda) A(a,\lambda)B(b,\lambda) (2)
So remembering that A(a,λ) and B(b,λ) each represented a "result" of "measuring" a member of an entangled pair, with detector angles a and b respectively, you can tell from this integral that he's calculating an "expectation value" (his words) for the product of a pair of measurements (by a hypothetical experimenter or team of experimenters). In general, if you have some finite number N of possible results Ri for a given measurement, and you know the probability P(Ri) for each result, the "expectation value" is just:

E = \sum_{i=1}^N R_i * P(R_i )

If you perform a large number of measurements of this type, the average result over all measurements should approach this expectation value.
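As an illustration (my own toy numbers, nothing from Bell's paper), here is a simulated two-outcome measurement whose sample average converges on the expectation value:

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# A measurement with results +1 and -1 occurring with known probabilities.
results = [+1, -1]
probs = [0.7, 0.3]

# Expectation value E = sum_i R_i * P(R_i):
E = sum(r * p for r, p in zip(results, probs))   # 0.7 - 0.3, i.e. about 0.4

# The average over a large number of simulated measurements approaches E:
n = 100_000
average = sum(random.choices(results, weights=probs, k=n)) / n
print(average)   # close to 0.4 for a run this large
```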

If we imagine that λ can only take a finite set of values, so we can write a discrete version of Bell's integral (2) above, it's more clear why it has the form of an expectation value:

\sum_{i=1}^N [A(a,\lambda_i)*B(b,\lambda_i)] * P(\lambda_i)

...so if you perform a large number of measurements with detector angles a and b, and for each trial/iteration you calculate the product of your pair of measurement results (assumed to be determined by the value of λ which is assumed to give a triplet of predetermined results for the three possible detector angles you're choosing from), then if you take the average of the product of the two measurement results over all these trials/iterations with detector angles a and b, it should approach the "expectation value". This is why the inequality 1 + P(b,c) >= |P(a,b) -P(a,c)| can be understood as a prediction that theoretical experimenters in a theoretical universe with local realist laws should see, in the limit as the number of trials/iterations with each pair of detector angles becomes very large, that

1 + (average value of product of measurement results for all particle pairs where experimenters used detector angles b and c)
>= |(average value of product of measurement results for all particle pairs where experimenters used detector angles a and b) - (average value of product of measurement results for all particle pairs where experimenters used detector angles a and c)|

Bell does make the theoretical assumption that in a local realist universe, the fact that they always get opposite results when they choose the same detector angle implies that each particle pair was associated with a λ that gave it a triple of predetermined results for all three angles a,b,c. But this is just an assumption made in the derivation of the inequality, the inequality itself deals only with expectation values for pairs of measurement results seen by the theoretical experimenters on each trial/iteration of the experiment.
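This can be illustrated with a brute-force sketch (my own construction, identifying a discrete λ with the triple of predetermined results for particle 1 and taking the partner's results to be anti-correlated): for any probability distribution ρ(λ) whatsoever, the resulting expectation values satisfy the inequality.

```python
import itertools
import random

random.seed(0)

# Identify λ with the triple of predetermined results
# (A(a,λ), A(b,λ), A(c,λ)); there are 8 possibilities.
lambdas = list(itertools.product([+1, -1], repeat=3))

def P(rho, i, j):
    """Expectation value of A(setting_i) * B(setting_j), with the
    second particle anti-correlated: B(x, lam) = -A(x, lam)."""
    return sum(p * lam[i] * (-lam[j]) for p, lam in zip(rho, lambdas))

# Try many arbitrary distributions ρ(λ):
for _ in range(1000):
    w = [random.random() for _ in lambdas]
    rho = [x / sum(w) for x in w]
    P_ab, P_ac, P_bc = P(rho, 0, 1), P(rho, 0, 2), P(rho, 1, 2)
    assert 1 + P_bc >= abs(P_ab - P_ac) - 1e-12

print("1 + P(b,c) >= |P(a,b) - P(a,c)| held for every distribution tried")
```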

If you think my interpretation of his words and equations is incorrect (and I guess you probably will since you always find some reason to disagree with whatever I say, but as I said to DevilsAvocado I'm mainly writing for the purpose of showing other readers why your claims don't make sense), then please point out precisely where, and give your own interpretation of whatever quote/equation you think I have misinterpreted.
billschnieder said:
The issue with experimenters not being able to measure simultaneously the third property is a practical issue with data gathering in real actual experiments. So your reference to Bell's later papers where he acknowledges this issue does not change the fact that it does not arise in the derivation of Bell's inequalities.
Well, see above. The assumption of triples is used in the derivation, but the final inequality he derives concerns only expectations about pairs of measurement results, which is why it can be checked against actual real-world measurement results even though we can never measure more than two angles for a given entangled pair.
billschnieder said:
Without triples, you can not calculate anything comparable to Bell's inequality. For Bell's derivation, this problem is non-existent because he is not considering an actual experiment but a theoretical situation
The theoretical situation concerns expectation values in a theoretical series of measurements. If this wasn't the case there would be no way to make a theoretical comparison with the expectation values in QM, since QM only gives expectation values for measurement results, not for any hidden variables.
billschnieder said:
In fact, if you must suggest that Bell was dealing with measurements by a theoretical experimenter, then you must also admit that only one of pairs, (a,b) mentioned by Bell is measured and the other two {(a,c), and (b,c)} are deduced from it by theoretical reasoning that there is a third property at angle c!
No. In equation (13) he deduces from the fact that the experimenters always get opposite results when they choose the same angle that A(a,λ)=-B(a,λ) (and since a can stand for any angle, it naturally follows from this that A(b,λ)=-B(b,λ) and A(c,λ)=-B(c,λ)). This means that equation (2) which I quoted earlier could be rewritten as:

P(a,b) = -\int d\lambda \rho(\lambda) A(a,\lambda)A(b,\lambda)

And by the same token, you can see from the equation for P(a,b) - P(a,c) at the top of p. 406 that he is assuming P(a,c) is derived theoretically in exactly the same way:

P(a,c) = -\int d\lambda \rho(\lambda) A(a,\lambda)A(c,\lambda)

So just like P(a,b), P(a,c) is an "expectation value" for the product of two measurements with the detectors set to angles a and c, and as I already pointed out, any "expectation value" can be understood as the average for a very large number of measurements of the desired quantity (i.e. 'the product of two measurements with detectors set to angles a and c').

Then a few lines down he writes an equation whose right side is \int d\lambda \rho(\lambda) [1 - A(b,\lambda)A(c,\lambda)] and then says "The second term on the right is P(b,c)", which indicates he is also assuming that

P(b,c) = -\int d\lambda \rho(\lambda) A(b,\lambda)A(c,\lambda)

So, what I just said about P(a,c) also applies to P(b,c).
billschnieder said:
Bell was absolutely not deriving an inequality for a situation in which each pair is measured separately in a different run of the experiment.
Oh, but he absolutely was, and if you ask any other non-crackpot who is knowledgeable about Bell's theorem (DrChinese, say) I'm sure they'll tell you the same thing. I'm pretty sure I could also find you other papers on Bell's theorem, by other physicists or perhaps Bell himself, which would make more clear that this is widely understood as the physical meaning of expectation values that appear in Bell inequalities--would you like me to try, or are you going to stick with the fundamentalist strategy of only looking at one holy text in isolation, ignoring any wider context (like the understanding of other physicists through the years) that might make more clear the meaning of any ambiguous parts?
billschnieder said:
The situation may be different for your toy version in which the (a,b,c) do not mean exactly the same thing in each term. But I'm not interested in your toy version. I am only interested in Bell's inequality and the one I derived in which the terms (a,b,c) mean exactly the same thing between terms. In Bell's inequality the the "a" in the first two terms are exactly the same.
"a" is just a detector angle rather than a result like +1 or -1, the text makes that clear, so of course it means the same thing everywhere. But P(a,b) is an expectation value (he called it that himself), which can be understood as the average value of the product of two measurements on a pair of entangled particles with detectors at angles a and b, in the limit as the number of particle pairs measured in this way goes to infinity.
billschnieder said:
The only type of inequality for which your stated difference above exists, is one in which the symbols are different between terms and Bell's inequality is not one of such.
The symbols a,b,c refer to angles and so don't have different meanings between terms, but each of P(a,b) and P(b,c) and P(a,c) is an expectation value, and to connect that to real or theoretical measurements you have to imagine P(a,b) is the average of the product of results in a run with detectors at angles a and b, P(b,c) is the average for a run with detectors at angles b and c, etc. If you argue this point you're not just arguing with me, you're arguing against the interpretation physicists have had for years about what the inequality is predicting about measurement results, an interpretation which Bell could have corrected if he disagreed with it (and if we looked through enough of his writings I bet we could find explicit confirmation this was his interpretation of the meaning of the terms as well).
billschnieder said:
Neither is the one I derived. In fact, earlier, you seem to understand this when you said:
JesseM said:
billschnieder said:
Fast forward then to the resulting CHSH inequality
|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2

In your opinion then, is the P(λi) the same for each of the above terms, or do you believe it doesn't matter?
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).
Are you trying to recant that admission, or is this new line of argumentation just for argument's sake?
Why do you think that contradicts anything I have been saying recently? If P(λi) is the same for each of the above terms, that just means the frequencies of getting different values of λi on a near-infinite run of trials with detector settings a and b should be the same as the frequencies of different values of λi on a near-infinite run of trials with detector settings a and b', and so forth. For example, if on the first run with detectors set to a and b it was true (though not known to the experimenters) that 2.3% of trials/iterations had hidden variables described by λ1 and 3.8% of trials/iterations had hidden variables described by λ2, then we are making the theoretical assumption that on the second run with detectors set to a and b' it was also true that 2.3% of trials/iterations had hidden variables described by λ1 and 3.8% of trials/iterations had hidden variables described by λ2. So in no way does this contradict the idea that each expectation value concerns a different run of trials.
billschnieder said:
If you think the terms in my inequality are different from Bell's, explain it using my inequality and Bell's rather than picking two strawman inequalities of your own in which the terms differ.
I have done that several times, whenever I point out that the terms in your inequality have a meaning of this type (with the understanding that here I use notation like b*c to refer not to the product of two detector angles, but the product of the predetermined results +1 or -1 for b and c in a given triple):

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

while the terms in Bell's inequality have a meaning of this type

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|
 
  • #1,206
(continued)

billschnieder said:
1 + <bc> >= |<ab> - <ac>|

This is only guaranteed for a situation in which a dataset of triples can be obtained. If you start off with triples like Bell, there is no problem. But if you start off with datasets of pairs, the above can only be guaranteed if the pairs can be resorted to obtain a dataset of triples.
No, there is another way besides your bizarre notions about "resorting". An inequality of this type:

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|

is obviously not guaranteed to hold for an arbitrary list of triples with a choice of which pair was measured for each triple, but it will hold if you make two additional assumptions:

1. The subset of triples (the 'run') where experimenter sampled b and c is very large (approaching infinity), and likewise for the subset where experimenter sampled a and b, and the subset where experimenter sampled a and c

2. the process that generates the list of triples for each subset has the same probability of generating a given triple (like a=+1, b=-1, c=+1) for each new entry on the list, regardless of which two measurements are made in that subset

With these two additional assumptions you do have a basis for deriving an inequality of the form I wrote, despite the fact that each term deals with averages for a different subset of triples, rather than each term being based on the same set of triples.
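A minimal simulation sketch of those two assumptions (invented distribution and run size, with partner results taken as anti-correlated as in Bell's setup): one fixed distribution over the eight predetermined triples, three large independent runs, and the inequality holds across the runs up to sampling noise.

```python
import itertools
import random

random.seed(1)

# Assumption 2: one fixed distribution over the 8 predetermined triples
# generates the particle pairs in every run.
lambdas = list(itertools.product([+1, -1], repeat=3))
weights = [random.random() for _ in lambdas]

def run_average(n, i, j):
    """Average product of outcomes over a run of n pairs measured at
    settings (i, j); the partner's result is anti-correlated."""
    draws = random.choices(lambdas, weights=weights, k=n)
    return sum(lam[i] * (-lam[j]) for lam in draws) / n

n = 200_000  # assumption 1: each run is very large
E_ab = run_average(n, 0, 1)  # run where a and b were sampled
E_ac = run_average(n, 0, 2)  # run where a and c were sampled
E_bc = run_average(n, 1, 2)  # run where b and c were sampled

# Each term comes from a different subset of triples, yet the
# inequality holds, up to sampling noise:
print(1 + E_bc >= abs(E_ab - E_ac) - 0.02)
```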
billschnieder said:
It doesn't mean you need to resort it in order to calculate the terms. It just means being able to resort the data is evidence that the symbols are equivalent. It is just another way of saying the symbols ("a", "b" and "c") mean exactly the same thing from term to term.
I still don't know what you mean by "mean exactly the same thing from term to term". a, b and c are just placeholders: for each triple, each one can take the value +1 or -1; for example, in the first triple on your list you might have a=+1, while in the second triple you might have a=-1. Do you just mean that each term deals with averages from exactly the same list of triples, rather than each term dealing with averages from a separate list of triples?
billschnieder said:
Once you have this triple, there is no distinction between "average value of b*c for all triples" and "average value of b*c for all triples where experimenter sampled b and c"
I don't get how you can say "no distinction" when I gave you a clear example of what I meant by this:
a b c
1: + + - (measured a,b)
2: + - + (measured b,c)
3: + - - (measured a,c)
4: - + - (measured a,b)
5: - - + (measured b,c)

In this case, "average value of a*b for all triples" = [(value of a*b for #1) + (value of a*b for #2) + (value of a*b for #3) + (value of a*b for #4) + (value of a*b for #5)]/5 =
[(+1) + (-1) + (-1) + (-1) + (+1)]/5 = -1/5

On the other hand, "average value of a*b for all triples for which the experimenter measured a and b" would only include triple #1 and triple #4, so it'd be [(value of a*b for #1) + (value of a*b for #4)]/2 = [(+1) + (-1)]/2 = 0.
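The two averages just described can be computed explicitly; the list below simply reproduces the five example triples and which pair was sampled on each:

```python
# The five triples from the example, with the pair actually sampled on each row.
triples = [
    (+1, +1, -1, "ab"),  # 1: measured a,b
    (+1, -1, +1, "bc"),  # 2: measured b,c
    (+1, -1, -1, "ac"),  # 3: measured a,c
    (-1, +1, -1, "ab"),  # 4: measured a,b
    (-1, -1, +1, "bc"),  # 5: measured b,c
]

# "average value of a*b for all triples" (the omniscient view, every row):
all_avg = sum(a * b for a, b, c, m in triples) / len(triples)

# "average value of a*b for all triples where experimenter sampled a and b":
ab_rows = [(a, b) for a, b, c, m in triples if m == "ab"]
sub_avg = sum(a * b for a, b in ab_rows) / len(ab_rows)

print(all_avg)  # -0.2
print(sub_avg)  # 0.0
```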
Your response consisted of saying that if the theoretical experimenter only sampled pairs, then this was really a "list of pairs", despite the fact that they were drawn from triples which we (playing the role of an omniscient being looking down on the lowly human experimenter) do know. But in that case I have no idea what you could possibly mean by the phrase "average value of b*c for all triples where experimenter sampled b and c", if you don't mean something like what I did above (you must have something definite in mind, or you hopefully wouldn't have said there was 'no difference' between this and 'average value of b*c for all triples'). So can you explain how you interpret the phrase "average value of b*c for all triples where experimenter sampled b and c", preferably with a simple example like mine above?

Anyway, I think you now understand what I mean when I say "(average value of b*c for all triples where experimenter sampled b and c)", so even if you don't like my phrasing I'll ask you not to willfully misread me by substituting in the meaning you think that phrase "should" have. Hopefully you now agree that an inequality like this:

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|

...cannot be derived from arithmetic alone, although with some additional theoretical assumptions like the one that says a given triple is equally likely to occur regardless of what the experimenter sampled, you can derive it (and that's exactly what derivations of Bell inequalities do).
billschnieder said:
It doesn't matter matter how you obtained the triples, whether you started directly with triples, or you resorted the separate pairs.
No one but you would interpret the terms of Bell's inequality in terms of "resorting" experimental data on pairs (whether theoretical experiments or actual experiments) to create triples, that's just a weird misconception you probably got from De Raedt's paper. Trust me, no mainstream physicist who has ever done their own derivation of Bell's theorem was ever thinking in terms of that kind of resorting (i.e. multiplying +1's and -1's from different trials/iterations). If they thought about how the terms would relate to experimental data at all (as opposed to just thinking of them as abstract 'expectation values' which can be compared to quantum-mechanical expectation values), they were thinking of something along the lines of my "(average value of b*c for all triples where experimenter sampled b and c)".
billschnieder said:
Your distinction between the two is so ridiculous I wonder why you keep insisting on it. If an experimenter measured a certain number of b and c, say M iterations:
- average value of b*c for all triples is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

- average value of b*c for triples for which the experimenter measure b*c is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

Or do you expect "all" in the first case to mean the experimenter can calculate an average over values he did not measure?
No, because when I say "(average value of b*c for all triples) I'm not talking about what the experimenter calculates at all, I'm just dealing with a model where we take the role of an omniscient being who knows the value of all triples even though the hypothetical experimenter does not. If you object to this, just remember that Bell's whole proof is based on figuring out some constraints on what would be calculated if we could know impossible-to-know-in-practice facts like the \rho(\lambda) (under the assumption that there is some objective truth about such things, whether experimenters know it or not).
billschnieder said:
Note also that you are trying to force a distinction where there is none, in an attempt to imply that my inequality is different from Bell's inequality. So if you think "all" in the first case means more cases than were measured
It just means "all" the triples. It doesn't matter whether the triples are assumed to represent the real truth about predetermined results for all three angles on a single trial/iteration involving a single pair of particles, or whether the triples are weird Frankenstein monsters created by stitching together measurements from two or more different pairs of particles (your idiosyncratic 'resorting' idea, which again is not what any mainstream physicists are thinking of when they write down Bell inequalities).
billschnieder said:
Is it your claim that Bell's inequality involves averaging over unmeasured terms (an impossibility), or is it your claim that my inequality involves averaging over unmeasured terms? And when you answer that, also answer whether you think actual experimenters ever average over unmeasured terms.
"no" to all of the above. Again, your inequality is of this form:

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

...but I understand that you aren't talking about triples representing all three predetermined values on a single trial/iteration (since all three can't be measured), but rather about Frankentriples created by "resorting". Meanwhile, the terms in Bell's inequality are expectation values, so for a large number of trials/iterations they can be understood as:

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|

Here the "triples" are not known by the experimenters, only the value for b and c is known on trials/iterations where b and c were sampled, etc. So, you could rewrite Bell's inequality as:

1 + (average value of b*c for trials/iterations where experimenter sampled b and c)
>= |(average value of a*b for trials/iterations where experimenter sampled a and b) - (average value of a*c for trials/iterations where experimenter sampled a and c)|

However, the assumption that there are triples associated with each particle even if we don't know them (and that the probability of a given triple occurring each time does not depend on which pair are sampled) is important to deriving the inequality.
billschnieder said:
What you present above are dataset of pairs from the measurements. We are interested in what was measured. If it wasn't measured, the experimenter does not have it and can not calculate from it.
No, but we can derive statistical constraints on what the experimenters will see based on the assumption that their results are coming from a set of preexisting triples, even if we don't know the value of all three--that's what derivations of Bell inequalities are all about.
billschnieder said:
So let us examine this. For clarity and following from the example you were responding to here are the three datasets of pairs

a b
1:+ +
4:- +

b c
2:- +
5:- +

a c
3:+ -

As you can see already, it is not possible to apply this data to Bell's inequality because we can not sort it in order to obtain a dataset of triples.
Although the assumption of triples is involved in deriving Bell's inequality, to check whether data satisfies the inequality or not we don't need a "dataset of triples", this is just your weird misconception. P(a,b) is the expectation value for the product of two measurement results with detectors set to a and b, so we'd take a dataset of pairs which each represent two measurements on a pair of entangled particles with detectors set to a and b, and calculate the average of each pair. Likewise for P(b,c) and P(a,c). That's what everyone understands a test of Bell's inequality against real data to involve; no one thinks in terms of constructing artificial Frankentriples. Think about it: if they did first use the data to construct a single list of triples and then calculate

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

it would be mathematically impossible for such an inequality to be violated by a single list of triples, and yet experimenters report violations of Bell inequalities all the time!
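The impossibility of violating the inequality with a single list of triples can be checked exhaustively. The sketch below assumes the anticorrelation convention of Bell's setup (the partner's result flips the sign, so the measured product for settings x, y is -x*y); under that assumption the bound holds row by row, hence for averages over any one list:

```python
from itertools import product

# Check every possible triple of predetermined +/-1 values. With the assumed
# anticorrelation convention, each measured product is minus the raw product.
ok = all(
    1 + (-b * c) >= abs((-a * b) - (-a * c))
    for a, b, c in product((+1, -1), repeat=3)
)
print(ok)  # True: holds for all 8 possible triples, hence for any average
```

Since the inequality holds for every individual triple, no averaging over a single common list can break it; only data that did not come from one common list can.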
billschnieder said:
This type of data is not guaranteed to obey Bell's inequality nor the one I derived.
Yes it is, you just have to add some additional assumptions beyond just the idea that each data pair was obtained from a triple of preexisting values. I mentioned the assumptions at the top of this post. And to get back to the start, this is why your (1) is wrong--Bell's inequality is not the type of purely arithmetic inequality you're thinking of, it's an inequality dealing with pairs, and additional assumptions beyond basic arithmetic are used to derive it.
 
Last edited:
  • #1,207
JesseM said:
which can be seen in the very counterintuitive Dirac three polarizers experiment where you have two polarizers at right angles that don't allow any light to get through so they look black, but then if you put another polarizer in between them, you see light coming through all three in the area covered by the middle one

Yes, this is cool and we can verify it with this simulation: http://www.lon-capa.org/~mmp/kap24/polarizers/Polarizer.htm First set the 3 polarizers to:

Ang1 = 90
Ang2 = 90
Ang3 = 0

0.0% light will get thru. Now change to:

Ang1 = 90
Ang2 = 45
Ang3 = 0

12.50% light will get thru!
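These numbers follow from Malus's law; here is a small sketch (the `transmission` helper is my own, assuming ideal polarizers and unpolarized input):

```python
import math

def transmission(angles_deg):
    """Fraction of unpolarized light passing a chain of ideal polarizers:
    the first passes 1/2, each later one cos^2 of the angle difference
    (Malus's law)."""
    frac = 0.5  # unpolarized light through the first polarizer
    for prev, cur in zip(angles_deg, angles_deg[1:]):
        frac *= math.cos(math.radians(cur - prev)) ** 2
    return frac

print(transmission([90, 90, 0]))  # ~0.0: the crossed pair blocks everything
print(transmission([90, 45, 0]))  # 0.125: inserting 45 deg lets 12.5% through
```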

JesseM said:
In classical electromagnetism, I think "polarized" light would just be a beam where if you pick the correct angle for your polarizer 100% of the light will pass through, whereas "unpolarized" would mean no matter what angle you set your polarizer, the intensity would be reduced when the beam passes through it. With individual photons, they have a quantum state which determines the probability they'll make it through a polarizer at any given angle

Of course you’re right. It was a mistake by me to bring in the wave-particle duality (http://en.wikipedia.org/wiki/Wave-particle_duality) ... we have enough "perplexity" in this thread already:smile:, sorry.

Spin of light beams is one thing. Spin of photons another...

[Image: the Poincaré sphere]


I’ll probably get back to this, but...

JesseM said:
thinking about it some more, I may have been mistaken to say that they'd always have a 50% chance of passing through a polarizer if their polarization hadn't been previously measured, it might be that even though no polarization measurement had ever been made, knowledge of the properties of the source would give you an initial quantum state that would have different probabilities at different angles, I'm not sure exactly how the initial quantum state of an entangled pair would be defined for a given type of source.

I did think this thru once more, and afaict they must always have a 50% chance, no matter what... otherwise there’s an obvious risk of FTL messaging.

Let’s say that we set Alice at 22.5º and Bob at 0º, but we decide not to measure Bob’s photons. If we run 6 pairs of entangled photons, we could get something like this for Alice:

Code:
	Angle	Corr.	Measure
--------------------------------
Alice	22.5º	?	101010

Now, if we had the possibility to do time travel, and could rewind the experiment, we would see that Bob’s measurement must have looked something like this:

Code:
	Angle	Corr.	Measure
--------------------------------
Alice	22.5º	85%	101010
Bob	0º	85%	101011

(cos^2(22.5) = 85% ≈ 5/6)

Now, if we did not always have the 50% random probability, we could get "tidy" results like this, and thereby determine whether Bob is measuring his photons or not, which would provide a mechanism for FTL messaging...

Code:
	Angle	Corr.	Measure
--------------------------------
Alice	22.5º	85%	111111
Bob	0º	85%	111110

All this is of course extremely simplified, and will only be valid on a large sampling of photons.
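The 50/50 point can be illustrated with a toy simulation of the quantum prediction. Note this is a sketch under assumptions of my own (Alice's outcome modeled as a fair coin, Bob's matching hers with probability cos² of the angle difference); the no-signalling is built into that sampling rule, and the run just confirms the bookkeeping:

```python
import random
from math import cos, radians

random.seed(1)

def alice_fraction(n, alice_deg, bob_deg, bob_measures):
    """Toy model of the quantum prediction for polarization-entangled pairs:
    Alice's outcome is a fair coin; when Bob measures, his outcome matches
    Alice's with probability cos^2(angle difference)."""
    match_p = cos(radians(alice_deg - bob_deg)) ** 2
    ones = 0
    for _ in range(n):
        a = random.random() < 0.5
        ones += a
        if bob_measures:
            b = a if random.random() < match_p else (not a)  # never reaches Alice
    return ones / n

N = 100_000
with_bob = alice_fraction(N, 22.5, 0, bob_measures=True)
without_bob = alice_fraction(N, 22.5, 0, bob_measures=False)
# Both fractions come out close to 0.50: Alice cannot tell whether Bob measured.
print(with_bob, without_bob)
```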

Agree? Or do you see any weakness in my reasoning...?
 
Last edited by a moderator:
  • #1,208
JesseM said:
P(a,b), he is referring to the expectation value for a pair of measurements on an entangled pair with detectors setting a and b (and each result being +1 or -1), which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements with detector
Note the underlined text, as we will come back to it. Now let us consider our previous discussion about this in post #857.

JesseM said:
billschnieder said:
Is the equation as it stands indicating that the numerical value represents what is obtained by measuring a specific pair of settings (ai, bi) a large number of times, or is it indicating that expectation value is what will be obtained my measuring a large number of different pairs of angles (ai,bi)?
The first, I think he's calculating the expectation value for some specific pair of settings. If he wanted to talk about the expectation value for a variety of different ai's I think he'd need to have a sum over different values of i in there.
billschnieder said:
So then, let us consider a specific pair of settings (a, b), and presume that we have calculated an expectation value from equation (2) of Bell's paper, say E(a,b). From what you have explained above, there is going to be a specific probability distribution P(λi) over which E(a,b) was obtained, since the corresponding P(AB|ab) which you obtained your E(a,b) from, was obtained by marginalizing over a specific P(λi) . Do you agree?

billschnieder said:
Fast forward to then to the resulting CHSH inequality
|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2
In your opinion then, is the P(λi) the same for each of the above terms, or do you believe it doesn't matter.
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).

billschnieder said:
So then, if it was found that it is possible in a local realist universe for P(λi) to be different for at least one of the terms in the inequality, above, then the inequality will not apply to those situations where P(λi) is not the same. In other words, the inequalities above are limited to only those cases for which a uniform P(λi) can be guaranteed between all terms within the inequality. Do you disagree?

If you remember, our previous discussion fell apart at the point where you refused to give a straight answer to the last question above.

You say, Bell is referring to the measurement of AN entangled pair with detectors set at a and b. I agree. You also say, in order to obtain the expectation value for this pair of angles, Bell integrates over all λi, so that there is a λi probability distribution. I agree also. This is precisely why I asked you all those questions earlier and you also agreed with me that this λi probability distribution must be exactly the same for all expectation value terms in Bell's inequality.

Now please pay attention and make sure you actually understand what I am saying next before you respond.
The reason why the probability distribution of λi must be the same is the following (using the equations you presented, except using E for expectation to avoid confusion with Probability notation).
E(a,b) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(b,\lambda )
E(a,c) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(c,\lambda )
E(b,c) = -\int d\lambda\rho (\lambda )A(b,\lambda )A(c,\lambda )

Note a few things about the above. There are two factorable terms inside the integral, one for each angle. You can visualize this integral in the following discrete way. We have a fixed number of λi, say (λ1, λ2, λ3, ... λn). To calculate the integral, we multiply A(a,λ1)A(b,λ1)*P(λ1) and add it to A(a,λ2)A(b,λ2)*P(λ2) ... all the way to λn. In other words, the above will not work if we did A(a,λ1)A(b,λ5)*P(λ3) or any such mixing.

Secondly, once we have our inequality:

|E(a,b) - E(a,c)| - E(b,c) <= 1

To say the probability distribution of λi must be the same means that, if we obtained E(a,b) by integrating over a series of λi values, say (λ1, λ2, λ4), the same must apply to E(a,c) and E(b,c). In other words, it is a mathematical error to use E(a,b) calculated over (λ1, λ2, λ4) with E(a,c) calculated over (λ6, λ3, λ2) and E(b,c) calculated over (λ5, λ9, λ8) in the above inequality, because in that case ρ(λi) will not be the same across the terms, the way Bell intended and we agreed he did. Note also that even if the set of λ's is the same, we still need each λ to be sampled the exact same number of times for each term.
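The discrete form of these expectation values can be sketched directly. In the illustration below, the distribution ρ(λ) and the ±1 outcome table A(angle, λi) are arbitrary inventions of mine; the point is only that when every term shares the same distribution, the inequality cannot fail:

```python
import random
random.seed(2)

# One SHARED rho(lambda_i) and one deterministic outcome table, both
# generated arbitrarily for illustration.
n_lam = 6
w = [random.random() for _ in range(n_lam)]
P = [x / sum(w) for x in w]  # shared rho(lambda_i), normalized
A = {ang: [random.choice((+1, -1)) for _ in range(n_lam)] for ang in "abc"}

def E(x, y):
    """E(x,y) = -sum_i P(lambda_i) A(x,lambda_i) A(y,lambda_i)."""
    return -sum(P[i] * A[x][i] * A[y][i] for i in range(n_lam))

# With the same P(lambda_i) in every term the bound must hold:
lhs = abs(E("a", "b") - E("a", "c")) - E("b", "c")
print(lhs <= 1 + 1e-12)  # True (the tiny tolerance covers float rounding)
```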

Now, what I have just described here are the specific experimental conditions that must apply for Bell's inequality to be applicable to data obtained from any experiment.

This brings us to the sorting I mentioned earlier which you are having difficulty with.
Suppose that in an actual experiment the experimenter also had, alongside each pair of measurements in each run, the specific value of λ for that run. He will now have a long list of pairs of +'s and -'s, plus one indexed λ each, such that for the three runs of the experiment he will have three lists which look something like the following (the actual sequence of +'s, -'s and λ's will differ):

+ - λ1
- + λ9
+ + λ6
- + λ3
...

In such a case, it will be easy to verify whether his data meets the requirement that ρ(λi) is the same for each term, as you agreed to previously. He could simply sort each of the three lists according to the λ column and compare whether the λ columns from all three runs are the same. If they are not, ρ(λi) is different and Bell's inequality cannot be applied to the data, for purely mathematical reasons: if they insisted on calculating the LHS of the inequality with that data, the inequality is not guaranteed to be obeyed.

(Note I am using the term "run" here to describe the three lists of already separated out data. ie, run one constitutes all the data used for calculating the E(a,b) term, run 2 the E(a,c) etc even though the experimenters may have been doing random switching from angle to angle.)

However, experimenters do not have the λ's, so how can they make sure their data is compatible? If it is assumed that each specific λ contains all properties that will deterministically result in the outcome, then we do not need the λ's to sort our data. We can just sort the actual result pairs so that the "a" column of the (a,b) pairs matches the "a" column of the (a,c) pairs, and the "b" and "c" columns also match. If we can do that, then we can be sure that ρ(λi) is the same for all three terms of the inequality, and Bell's inequality should apply to our data. If we can not, it means ρ(λi) is different, and the data is mathematically not compatible with the inequality.
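The sorting check just described can be sketched in code. The λ labels and outcome values below are hypothetical placeholders, imagining (counterfactually) that each recorded pair came tagged with its λ:

```python
# Three hypothetical runs; each row is (outcome1, outcome2, lambda label).
run_ab = [(+1, -1, "l1"), (-1, +1, "l9"), (+1, +1, "l6"), (-1, +1, "l3")]
run_ac = [(+1, +1, "l9"), (-1, -1, "l3"), (+1, -1, "l1"), (-1, +1, "l6")]
run_bc = [(-1, +1, "l6"), (+1, -1, "l1"), (-1, -1, "l3"), (+1, +1, "l9")]

def lam_column(run):
    """The lambda column after sorting the run on its lambda labels."""
    return [lam for *_, lam in sorted(run, key=lambda row: row[-1])]

# Runs are compatible when the sorted lambda columns all agree,
# i.e. rho(lambda) is the same across the three runs.
same_rho = lam_column(run_ab) == lam_column(run_ac) == lam_column(run_bc)
print(same_rho)  # True for this hypothetical data
```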

Let us look at this slightly differently. Consider our first list, which included the λ's. After sorting all three runs by the λ's, we will find that we only need three columns of +'s and -'s out of the 6 (2 from each run), because each column will be duplicated. This simply means that for each λ, there are 3 simultaneously existing properties at the three angles.

Now, what if instead of collecting three runs of pairs we collected a single run of triples so that the data from our experiment is
a b c
+ - + λ1
- + + λ9
+ + - λ6
- + + λ3
...

We do not need any sorting here because we can calculate all our terms from the same single run with the same ρ(λi). So we can compare ANY dataset of this type with Bell's inequality. Note, this is not the same as saying we can do the same thing even if we only measured pairs so long as triples are assumed to exist. Of course triples are assumed to exist. That is what gave us the inequalities. We are only interested now in the question of whether our dataset obtained in an experiment can fulfil the requirement of uniform ρ(λi). However, since it is not possible to measure triples in any experiment, the requirement to be able to sort the dataset applies to all datasets involving multiple runs of pairs.

Now, let us go back to the underlined text above. Since you agreed with me that ρ(λi) must be the same for each term in the inequality, how do you make sure of that in an experiment? Is that what you were alluding to with the underlined text: "which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements"? In other words, why is it important that the number of measurements be very large? Please I need a specific answer to this question, assuming you are still willing to contest this issue after my very detailed explanation above.

As an aside:
You seem to have an issue with my use of

| <ab> + <ac> | - <bc> <= 1

In which I have replaced E(a,b) in Bell's notation with <ab> in mine, where a, b represent the outcomes at angles a and b. I was referring to the fact that, in calculating the averages, the list of a's in the first term is not allowed to contain a different number of +'s and -'s from that in the second term, and the same goes for "b" and "c".
You objected and said:
"a" is just a detector angle rather than a result like +1 or -1, the text makes that clear, so of course it means the same thing everywhere. But P(a,b) is an expectation value (he called it that himself), which can be understood as the average value of the product of two measurements on a pair of entangled particles with detectors at angles a and b, in the limit as the number of particle pairs measured in this way goes to infinity.
But then later, you used exactly the same notation.
I have done that several times, whenever I point out that the terms in your inequality have a meaning of this type (with the understanding that here I use notation like b*c to refer not to the product of two detector angles, but the product of the predetermined results +1 or -1 for b and c in a given triple)
This tactic of yours, combined with an unwillingness to actually understand the opposing view and a severe case of irrelevant argumentum ad verbosium, is the reason I do not take you seriously.
 
  • #1,209
JesseM, I’m sorry to say that if it continues this way, I probably have to charge you some kind of "https://www.physicsforums.com/showpost.php?p=2825463&postcount=1192"... :biggrin:

"a severe case of irrelevant argumentum ad verbosium"​
This very fine and sophisticated grievance can only be reduced to a severe case of abusive argumentum ad hominem.

LOL! Pathetic BS is still nothing more than pathetic BS! :smile: :smile:

So, what’s up next? Well, Mr. BS already smells the defeat, and his only "hope" is semantic games and personal attacks in the disguise of silly words, and after yet another 2-3 posts the attacks will escalate significantly.

And then comes the grand finale: an "agreement not to agree."

Jesse, we take you seriously, and Mr. BS is nothing more than a pathetic joke.


argumentum ad nauseam
 
Last edited by a moderator:
  • #1,210
One more thing.
JesseM said:
This is easier to see if you suppose λ can only take a discrete set of values from 0 to N, so the integral on the right side of (2) can be replaced by the sum \sum_{i=0}^N A(a,\lambda_i)B(b,\lambda_i)P(\lambda_i).

You must agree therefore that the following is Bell's inequality.
|\sum_{i} A(a, \lambda_{i} )A(b,\lambda_{i} ) P(\lambda_{i} ) + \sum_{i} A(a, \lambda_{i} )A(c,\lambda_{i} ) P(\lambda_{i} )| - \sum_{i} A(b, \lambda_{i} )A(c,\lambda_{i} ) P(\lambda_{i} ) \leq 1

Which can be factored in this form.
|\sum_{i} P(\lambda_{i} )A(a, \lambda_{i} )\left [ A(b,\lambda_{i} ) + A(c,\lambda_{i} )\right ]| - \sum_{i} A(b, \lambda_{i} )A(c,\lambda_{i} ) P(\lambda_{i} ) \leq 1

Bell himself did a similar factorization. Therefore, if for any dataset the two equations above produce different results, it means the dataset is not compatible with Bell's inequality for purely mathematical reasons. Do you agree? If you don't, please explain clearly.
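The agreement between the two forms can be checked numerically. In the sketch below, the shared P(λ) and the ±1 outcome table are arbitrary inventions for illustration; when every term is computed from the same distribution, the two forms coincide and the bound holds:

```python
import random
random.seed(4)

# One shared P(lambda) and one outcome table A(angle, lambda), both arbitrary.
n = 5
w = [random.random() for _ in range(n)]
P = [x / sum(w) for x in w]
A = {ang: [random.choice((+1, -1)) for _ in range(n)] for ang in "abc"}

def S(x, y):
    """sum_i A(x,lambda_i) A(y,lambda_i) P(lambda_i)"""
    return sum(A[x][i] * A[y][i] * P[i] for i in range(n))

unfactored = abs(S("a", "b") + S("a", "c")) - S("b", "c")
factored = (
    abs(sum(P[i] * A["a"][i] * (A["b"][i] + A["c"][i]) for i in range(n)))
    - S("b", "c")
)

print(abs(unfactored - factored) < 1e-12)  # the two forms agree
print(unfactored <= 1 + 1e-12)             # and the bound holds
```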
 
  • #1,211
In case there is any doubt left, let us now go through Bell's paper step by step and show that the physical assumptions are peripheral to the derivation of the inequality.

We start by recognizing that Bell has defined a deterministic function A(.,.), a two-valued function with values (+1 or -1) for a single particle. This is done in equation (1) of his original paper, as follows:

Bell said:
A(a,\lambda ) = \pm 1, B(b,\lambda ) = \pm 1

Let us set up our own definitions side by side. Let us pick two arbitrary variables a', b' with values (+1 or -1). For our purpose, it is not important what the physical situation is between a' and b', or whether there is remote dependence between a' and b'. All that is important for us is that we have two such arbitrary variables, without any regard as to what physical process may be producing them. Please do not confuse our variables a' and b' with Bell's vectors (a and b); a' and b' are rather analogous to Bell's two-valued functions A(.,.) and B(.,.). We will harmonize the notation later. In our case, the analogy of Bell's equation (1) above is the following:
a' = \pm 1, b' = \pm 1


Now let us go to Bell's equation (2) where he defines his expectation values

Bell said:
E(a,b) = \int d\lambda \rho (\lambda )A(a,\lambda )B(b,\lambda )

Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) over all λ, which is essentially the expectation value. Theoretically the above makes sense: you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes, in which the frequency of realization of a specific λ is equivalent to its probability.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2),B(b,λ2) was realized 5 times, and A(a,λ3),B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are. All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above. (This is the fair sampling assumption, which is however not the focus of this post.) So then in this case, assuming discrete λ's, Bell's equation (2) is equivalent to the following simple average:
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a,\lambda _i)B(b,\lambda _i)
Since in any real experiment we do not know which λ is realized for any specific iteration, we can drop λ from the equation altogether without any impact, simply absorbing it into the specific variant of the functions A, B operating for iteration i (that is, Ai and Bi):
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a)_{i}B(b)_{i}
And we could adopt a simplified notation in which we replace the function A(a)_i with the outcome \alpha _i and B(b)_i with \beta _i. Note that the outcomes of our functions are restricted to values (+1 or -1) and we could say \alpha = \pm 1, \beta = \pm 1

To get:
E(a,b) = \frac{1}{N} \sum_{i}^{N} \alpha _{i} \beta _{i} = \langle \alpha \beta \rangle
Let us then develop our analogy involving our a' and b' to the same point. Remember our first assumption was that we had two such arbitrary variables a' and b' with values (+1 or -1). Now consider the situation in which we had a list of pairs of such variables of length N. Let us designate our list [(a',b')] to indicate that each entry in the list is a pair of (a',b') values. Let us define the expectation value of the pair product for our list as follows:
E(a',b') = \frac{1}{N} \sum_{i}^{N} a'_{i} b'_{i} = \langle a'b' \rangle
For all practical purposes, this equation is exactly the same as the previous one, and the terms a' and b' are mathematically equivalent to α and β respectively. What this shows is that the physical assumptions about the existence of hidden variables, locality etc. are not necessary to obtain an expression for the expectation value of a pair product. We have obtained the same thing just by defining two variables a', b' with values (+1 and -1) and calculating the expectation value for the paired product of a list of pairs of these variables. You could say the reason Bell obtained the same expression is because he just happened to be dealing with two functions which can have values (+1 and -1) for physical reasons, and experiments producing a list of such pairs. And he just happened to be interested in the pair product of those functions for physical reasons. But the structure of the calculation of the expectation value is determined entirely by the mathematics and not the physics. Once you have two variables with values (+1 and -1) and a list of pairs of such values, the above equations should arise no matter the process producing the values, whether physical, mystical, non-local, spooky, super-luminal, or anything you can dream about. That is why I say the physical assumptions are peripheral.
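The weighted-sum versus simple-average equivalence claimed above can be checked with the 0.3/0.5/0.2 example; the ±1 outcome values per λ below are arbitrary illustrations of mine:

```python
# Hypothetical (A, B) outcomes for each of the three lambdas.
outcomes = {"l1": (+1, -1), "l2": (-1, +1), "l3": (+1, +1)}
probs = {"l1": 0.3, "l2": 0.5, "l3": 0.2}

# Weighted sum: each lambda appears exactly once, weighted by its probability.
weighted = sum(p * outcomes[lam][0] * outcomes[lam][1] for lam, p in probs.items())

# Simple average over a 10-entry dataset whose frequencies match: 3, 5, 2.
data = ["l1"] * 3 + ["l2"] * 5 + ["l3"] * 2
simple = sum(outcomes[lam][0] * outcomes[lam][1] for lam in data) / len(data)

print(abs(weighted - simple) < 1e-12)  # True: the two calculations agree
```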

Note a few things about the above equation. a'_i and b'_i must be multiplied with each other. If we independently reorder the columns in our list so that we have different pairings of a'_i and b'_i, we will obtain the same expectation value only in the most improbable of situations. To see this, consider the simple list below

a' b'
+ -
- +
- +
+ -

<a'b'> = -1

If we rearrange the b' column so that the pairing is no longer the same, we may have something like the following were we have the same number of +'s and -'s but their pairing is different:

a' b'
+ +
- -
- -
+ +

<a'b'> = +1
Which tells us that we are dealing with an entirely different dataset.
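The reordering effect can be shown directly in code with the same columns (every product flips sign once the pairing changes):

```python
# Same +/-1 symbols in each column; only the pairing differs.
a_col = [+1, -1, -1, +1]
b_col = [-1, +1, +1, -1]        # original pairing: every pair anti-aligned
b_shuffled = [+1, -1, -1, +1]   # same symbols reordered: every pair aligned

def avg(xs, ys):
    """Average pair product <xy> over the two columns."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

print(avg(a_col, b_col))       # -1.0
print(avg(a_col, b_shuffled))  # 1.0
```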
 
  • #1,212
(continued from the last post)

So far we have dealt with pairs, just like Bell up to his equation (14). Let us then, following in Bell's footsteps, introduce the third variable (see page 406 of his original paper).
Bell said:
It follows that c is another unit vector
E(a,b) - E(a,c) = -\int d\lambda \rho (\lambda )[A(a,\lambda )A(b,\lambda )-A(a,\lambda )A(c,\lambda )]
= \int d\lambda \rho (\lambda )A(a,\lambda )A(b,\lambda )[A(b,\lambda)A(c,\lambda )-1]
using (1), whence
\left | E(a,b)-E(a,c) \right |\leq \int d\lambda \rho (\lambda ) [1 - A(b,\lambda)A(c,\lambda )]
The second term on the right is E(b,c), whence
1 + E(b,c) >= |E(a,b) - E(a,c)| ... (15)

Note a few things here: Bell factorizes at will within the integral. ρ(λ) is a factor of every term under the integral. That is why I explained in my previous detailed post that ρ(λ) must be the same for all three terms. Secondly, Bell derives the expectation value term E(b,c) by factoring out the corresponding A(b,.) and A(c,.) terms from E(a,b) and E(a,c). Therefore, E(b,c) does not contain different A(b,.) and A(c,.) terms but the exact same ones present in E(a,b) and E(a,c). In other words, in order to obtain all three expectation values E(a,b), E(a,c) and E(b,c), we ONLY need three lists of outcomes corresponding to A(a,.), A(b,.), A(c,.) or in simpler notation, we only need a single list of triples [(a',b',c')] to calculate all terms for

1 + <b'c'> >= |<a'b'> - <a'c'>|


So then, we are destined to obtain this inequality for any list of triples of two-valued variables (or outcomes of two-valued functions) where the allowed values are (+1 or -1), no matter the physical, metaphysical or mystical situation generating the triples. It is an entirely arithmetic relationship, determined solely by the fact that we are using three such two-valued variables. Suppose now that we generate from our list of triples three lists of pairs corresponding to [(a',b')], [(a',c')] and [(b',c')]; we can simply calculate our averages and be done with it. It doesn't matter if the order of pairs in the lists is randomized so long as the pairs are kept together. In this case, we can still sort them as described in my previous detailed description, to regenerate our list of triples from the three lists of pairs. However, if we were to randomize without keeping the pairs together, it will be impossible to regenerate our original list of triples from the resulting lists of pairs, and Bell's inequality will not apply to our data.
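The arithmetic claim is easy to verify by brute force. The sketch below (my own check, not from the thread) uses Bell's anticorrelation convention, E(x,y) = -<A(x)A(y)>, and confirms inequality (15) for arbitrary probability distributions over the eight possible triples of predetermined outcomes:

```python
import itertools
import random

# All 8 possible hidden states: a state lambda fixes the triple
# (A(a), A(b), A(c)) of predetermined +/-1 outcomes on one side; the other
# side is perfectly anticorrelated, B(x, lambda) = -A(x, lambda), so that
# E(x, y) = -<A(x)A(y)> as in Bell's derivation.
triples = list(itertools.product((-1, 1), repeat=3))

def E(weights, i, j):
    """E(x_i, x_j) = -sum over lambda of rho(lambda) * A_i * A_j."""
    total = sum(weights)
    return -sum(w * t[i] * t[j] for w, t in zip(weights, triples)) / total

random.seed(1)
for _ in range(5000):
    rho = [random.random() for _ in triples]   # an arbitrary rho(lambda)
    lhs = 1 + E(rho, 1, 2)                     # 1 + E(b,c)
    rhs = abs(E(rho, 0, 1) - E(rho, 0, 2))     # |E(a,b) - E(a,c)|
    assert lhs >= rhs - 1e-12                  # inequality (15) always holds
print("inequality holds for all sampled distributions")
```

The key point is that all three expectation values here are computed from the same list of triples and the same distribution rho, exactly as in Bell's derivation.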

Now the way Bell-test experiments are usually done is analogous to collecting three lists of pairs randomly, with the assumption that these three lists are representative of the three lists of pairs which we would have obtained from a list of triples, had we been able to measure at three angles simultaneously. And if each list was sufficiently long, the averages will be close to those of the ideal situation assumed by Bell. Again, remember that within each list of pairs actually measured, the individual pairs such as (a',b')_i measured together are assumed to have originated from a specific theoretical triple, (a',c')_j from another triple, and (b',c')_k from another triple. Therefore, our dataset from a real experiment is analogous to our three theoretical lists above, where we randomized the order but kept the pairs together while randomizing. This means it should be possible to regenerate our single list of triples simply by resorting the three lists of pairs while keeping the individual pairs together, as I explained previously. If we cannot do this, it means either that:
a) our data is most likely of the second kind in which randomization did not keep the pairs together or
b) each list of pairs resulted from different lists of triples and/or
c) our lists of pairs are not representative of the list of triples from which they arose

In any of these cases, Bell's inequality does not and can not apply to the data. In other words, it is simply a mathematical error to use the inequality in such situations. Also note that these represent the only scenarios in which "average value of a*b for all triples" is different from "average value of a*b for measured pairs only". And in this case, the fair sampling assumption can not hold.
 
Last edited:
  • #1,213
(reply to post #1208, part 1)
billschnieder said:
Note the underlined texts as we will come back to it. Now let us consider our previous discussion about this in post #857.
billschnieder said:
JesseM said:
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).
So then, if it was found that it is possible in a local realist universe for P(λi) to be different for at least one of the terms in the inequality, above, then the inequality will not apply to those situations where P(λi) is not the same. In other words, the inequalities above are limited to only those cases for which a uniform P(λi) can be guaranteed between all terms within the inequality. Do you disagree?
If you remember, our previous discussion fell apart at the point where you refused to give a straight answer to the last question above.
Well, no, you are completely misremembering why our previous discussion "fell apart". In fact I did give you a clear answer to this question in post #861:
When you suggest the possibility that P(λi) could be "different for at least one of the terms in the inequality", that would imply that P(λi) depends on the choice of detector settings, since each expectation value is defined relative to a particular combination of detector settings. Am I understanding correctly, or are you talking about something else?

If I am understanding you right, note that it's generally accepted that one of the assumptions needed in Bell's theorem is something called the "no-conspiracy assumption", which says the decisions about detector settings should not be correlated with the values of the hidden variables.

...

So, I agree the inequality can only be assumed to hold if the choice of detector settings and the value of the hidden variables are statistically independent (which means the probability distribution P(λi) does not change depending on the detector settings), but this is explicitly included as an assumption in the more rigorous modern derivations. If you dispute that a "conspiracy" of the type being ruled out here would in fact have some very physically implausible features so that it's reasonable to rule it out, I can give you some more detailed arguments for why it's so implausible.
Then in post #862 you said:
You are wandering off now, JesseM. Try not to pre-empt the discussion. The question I asked should have a straightforward answer. The reason why P(λi) might be different shouldn't affect the answer you give to my question. If you believe P(λi) will be different when a conspiracy is involved, then you should have no problem admitting that Bell's inequalities do not apply to situations in which there is conspiracy.
And in post #863 I responded to the last sentence (...'then you should have no problem admitting that Bell's inequalities do not apply to situations in which there is conspiracy') by saying:
Didn't I already "admit" that in my last post? Read again:
So, I agree the inequality can only be assumed to hold if the choice of detector settings and the value of the hidden variables are statistically independent (which means the probability distribution P(λi) does not change depending on the detector settings)
So, I made quite clear that my answer to your question was "yes", I agreed that the inequality can only be assumed to hold if the probability distribution P(λi) is assumed to be the same for each of the terms E(a,b), E(a,b'), E(a',b) and E(a',b'). But I additionally explained that assuming the probability distribution was the same for each term was equivalent to the no-conspiracy assumption, i.e. P(λi) = P(λi | a,b) = P(λi | a,b') = P(λi | a',b) = P(λi | a',b'). Your complaint in subsequent posts was not that I had failed to give clear answers to any of your questions, but just a complaint that you didn't like the fact that I made additional commentary about the reasoning behind my answers, commentary which I thought would help people reading the thread to better understand the issues being discussed. You wanted me to shut up and not make any additional comments I deemed relevant, and restrict myself only to short answers to your questions. For example in post #864 you made it clear that you did understand I had answered your questions, and just wanted me to snip out all the surrounding commentary about my answers:
So then, I will assume that the last few posts did not happen, and I will consider that the responses moving forward are as follows:
So then, if it was found that it is possible in a local realist universe for P(λi) to be different for at least one of the terms in the inequality, above, then the inequality will not apply to those situations where P(λi) is not the same. In other words, the inequalities above are limited to only those cases for which a uniform P(λi) can be guaranteed between all terms within the inequality. Do you disagree?
... I agree ...
Do you believe P(λi) can be different between the terms if and only if conspiracy is involved?
Yes ...

See how short and to the point this would have been. You would have saved yourself all the typing effort, and to boot, we don't have to start a new rabbit trail about the meaning of "conspiracy"!
Then later in that same post you made clear that your actual objection was to my additional explanatory commentary, and threatened to end the discussion if I wouldn't agree to restrict my comments only to short answers to your questions:
But if you now define conspiracy in a manner that I don't agree with, I will be forced to challenge it because if I don't it may appear as though I agree with that definition, then we end up 20 posts later, discussing whose definition of "conspiracy" is correct, having left the original topic. The more you write, the more things need to be challenged in your posts and the more off-topic the discussions will get. This is why I insist that the discussion be focused. I hope you will recognize and respect this, otherwise there is no point continuing this discussion.
It is certainly reasonable to expect one's discussion partner to give clear answers to the questions one asks, but it's not reasonable to expect them to restrict themselves only to short answers and make no additional commentary they think is relevant. That unreasonable expectation on your part was why the earlier discussion shut down, not because I didn't "give a straight answer" to any of the questions you asked.

Sorry to spend so much time rehashing old disagreements but I don't like being accused of refusing to answer any question, that's something I will always try my best to do. Moving on to the substance of your current post:
billschnieder said:
You say, Bell is referring to the measurement of AN entangled pair with detectors set at a and b. I agree.
So, you agree that "resorting" the data, in the way you did in post #1187, is out of the question? That no physicist would interpret a term like E(a,b) to possibly involve taking the result from a detector with setting a during a trial where the two detectors were set to a,b' and multiplying it by the result from a detector with setting b during a trial where the two detectors were set to a',b?
billschnieder said:
You also say, in order to obtain the expectation value for this pair of angles, Bell integrates over all λi, so that there is a λi probability distribution. I agree also. This is precisely why I asked you all those questions earlier and you also agreed with me that this λi probability distribution must be exactly the same for all expectation value terms in Bell's inequality.
Yes, I did agree, giving you "a straight answer" to this question even though I added some additional commentary about why it is reasonable to expect the probability distribution to be the same regardless of detector settings.
billschnieder said:
Now please pay attention and make sure you actually understand what I am saying next before you respond.
The reason why the probability distribution of λi must be the same is the following (using the equations you presented, except using E for expectation to avoid confusion with Probability notation).
E(a,b) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(b,\lambda )
E(a,c) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(c,\lambda )
E(b,c) = -\int d\lambda\rho (\lambda )A(b,\lambda )A(c,\lambda )
I don't understand how you can say that those equations are "the reason why" the probability distribution is the same. Are you suggesting that those equations can be taken as definitions of E(a,b) and E(a,c) and E(b,c), and thus it is true by definition that the probability distribution \rho (\lambda ) is the same in each case? I would say that a term like E(a,b) is understood to be defined as the expectation value for the product of two measurements on a pair of entangled particles when the detectors are set to a and b, and that Bell then tries to physically justify why we would expect E(a,b) to be given by the equation above in a local realist universe. So any feature of the equations, like \rho (\lambda ) being the same in each, cannot be justified by pointing to the equations themselves, there has to be a physical justification for it or else someone following the derivation would have no reason to agree that the equations above are actually correct in a local realist universe. Do you agree that the derivation depends on the idea that there's a physical justification for assuming \rho (\lambda ) is the same in each of those three equations, that we can't just point to the equations themselves to explain the "reason" that \rho (\lambda ) is the same?

If you disagree with that, I would just point you again to Bell's paper http://cdsweb.cern.ch/record/142461/files/198009299.pdf which I brought up earlier in post #1171 when showing that the simple Bell inequality I originally brought up was one that Bell had actually discussed. On p. 15 of the pdf file (p. 14 of the paper itself) he does bring up the other inequality we had been discussing before you refused to continue if I didn't keep my answers short:

|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2

Then on p. 16 of the pdf (p. 15 of the paper), in the "Envoi" section, he discusses possible objections one might have to his conclusion that the inequality should be obeyed in a local realist universe. And at the bottom of this page, he explicitly brings up the possibility that \rho(\lambda) could be different from term to term, and gives a physical argument for why he considers this very implausible:
Secondly, it may be that it is not permissible to regard the experimental settings a and b in the analyzers as independent variables, as we did. We supposed them in particular to be independent of the supplementary variable λ, in that a and b could be changed without changing the probability distribution \rho(\lambda). Now even if we have arranged that a and b are generated by apparently random radioactive devices, housed in separate boxes and thickly shielded, or by Swiss national lottery machines, or by elaborate computer programmes, or by apparently free willed experimental physicists, or by some combination of all of these, we cannot be sure that a and b are not significantly influenced by the same factors λ that influence A and B. But this way of arranging quantum mechanical correlations would be even more mind boggling than one in which causal chains go faster than light. Apparently separate parts of the world would be deeply and conspiratorially entangled, and our apparent free will would be entangled with them.
So, clearly he doesn't think that E(a,b) and E(a,b') can be said to have the same probability distribution on λ by definition, rather he provides a physical argument to justify this idea.
billschnieder said:
Note a few things about the above. There are two factorable terms inside the integral, one for each angle. You can visualize this integral in the following discrete way. We have a fixed number of λi, say (λ1, λ2, λ3, ... λn). To calculate the integral, we multiply A(a,λ1)A(b,λ1)*P(λ1) and add it to A(a,λ2)A(b,λ2)*P(λ2) ... all the way to λn. In other words, the above will not work if we did A(a,λ1)A(b,λ5)*P(λ3) or any such.
Agreed--note that if you mixed them up in that way you would no longer be computing an "expectation value" for the product of the two measurement results on a single pair of entangled particles, since it's assumed that on each trial with a single pair, λ takes a single value on that trial (its value is supposed to be determined by the values of all hidden variables on a given trial)
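The discrete sum described in the quote can be written out explicitly. Here is a minimal sketch (the distribution, the outcome functions, and n = 5 are all invented for illustration):

```python
import random

random.seed(2)
n = 5                                   # toy hidden-variable space lambda_1..lambda_n
weights = [random.random() for _ in range(n)]
s = sum(weights)
P = [w / s for w in weights]            # probability distribution rho(lambda_i)

# Deterministic outcome functions A(setting, lambda_i) in {+1, -1};
# filled in at random here, since any fixed assignment illustrates the point.
A = {(setting, i): random.choice((-1, 1))
     for setting in ("a", "b") for i in range(n)}

# E(a,b): each A(a, lambda_i) is paired with A(b, lambda_i) for the SAME i,
# and weighted by P(lambda_i).  Mixing indices, e.g. using
# A(a, lambda_1) * A(b, lambda_5) * P(lambda_3), would not be an
# expectation value over rho at all.
E_ab = -sum(A[("a", i)] * A[("b", i)] * P[i] for i in range(n))
print(E_ab)
```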
 
Last edited by a moderator:
  • #1,214
(reply to post #1208, part 2)


billschnieder said:
Secondly, once we have our inequality:

|E(a,b) - E(a,c)| - E(b,c) <= 1

To say the probability distribution of λi must be the same means that, if we obtained E(a,b) by integrating over a series of λi values, say (λ1, λ2, λ4), the same must apply to E(a,c) and E(b,c). In other words, it is a mathematical error to use E(a,b) calculated over (λ1, λ2, λ4), with E(a,c) calculated over (λ6, λ3, λ2) and E(b,c) calculated over (λ5, λ9, λ8) in the above inequality, because in that case ρ(λi) will not be the same across the terms, as Bell intended it to be (and as we agreed).
True, but now you're talking about a completely different sense of what it would mean for ρ(λi) to "not be the same across the terms" than what I was talking about. I wasn't talking about only adding some values of λ in the sums for each term, I was just talking about how each term could involve a different probability distribution on all possible values of λi, i.e. one might use a probability distribution P1(λ) such that P1(λ5) = 0.03% while another might use a different probability distribution P2(λ) such that P2(λ5) = 1.7%. That is what it would mean to violate the no-conspiracy assumption, it doesn't have anything to do with only adding some values of λ in the sum for each term. Even if the no-conspiracy assumption was violated, the discrete case in a local realist universe (where the result A was always completely predetermined by the value of λ and the choice of detector setting a, b, or c) where there were N possible values of λ would still look like this:

E(a,b) = - \sum_{i=1}^N A(a,\lambda_i)*A(b,\lambda_i)*P_1 (\lambda_i)
E(b,c) = - \sum_{i=1}^N A(b,\lambda_i)*A(c,\lambda_i)*P_2 (\lambda_i)
E(a,c) = - \sum_{i=1}^N A(a,\lambda_i)*A(c,\lambda_i)*P_3 (\lambda_i)

You can see that the only difference here is that the three sums have different probability distributions on λ--P1, P2, and P3--but each sum still includes every possible value of λ (i.e. λ1, λ2, λ3, ... , λN)
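To see concretely why this matters, here is a small hypothetical (my own illustration, not from the posts): if the three sums really are computed with three different distributions P1, P2, P3 on the hidden states, the inequality can fail outright:

```python
import itertools

# Hidden states: each lambda fixes the triple (A(a), A(b), A(c)) in {+1, -1},
# with the anticorrelation convention E(x, y) = -<A(x)A(y)>.
triples = list(itertools.product((-1, 1), repeat=3))

def E(P, i, j):
    """E(x_i, x_j) = -sum over lambda of P(lambda) * A_i * A_j."""
    return -sum(p * t[i] * t[j] for p, t in zip(P, triples))

def concentrated(predicate):
    """Distribution uniform over the hidden states satisfying predicate."""
    mask = [1.0 if predicate(t) else 0.0 for t in triples]
    s = sum(mask)
    return [m / s for m in mask]

# Three DIFFERENT distributions, one per term (violating no-conspiracy):
P1 = concentrated(lambda t: t[0] == -t[1])   # a = -b  ->  E(a,b) = +1
P2 = concentrated(lambda t: t[1] == t[2])    # b =  c  ->  E(b,c) = -1
P3 = concentrated(lambda t: t[0] == t[2])    # a =  c  ->  E(a,c) = -1

lhs = 1 + E(P2, 1, 2)                        # 1 + E(b,c) = 0
rhs = abs(E(P1, 0, 1) - E(P3, 0, 2))         # |E(a,b) - E(a,c)| = 2
print(lhs, rhs)   # 0.0 2.0 -- the inequality fails when rho differs per term
```

This is exactly why the no-conspiracy assumption (same ρ(λ) for every term) is needed in the derivation.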

Perhaps you are worried that even if we assume the probability distribution P(λ) is the same for each term, there could be trillions of values of λ and thus the subset of trials where we used detector angles a,b might involve a totally different collection of λi's than the subset of trials where we used detector angles b,c or the subset where we used detector angles a,c. If so, this objection is misguided, and once again the reason has to do with the law of large numbers. "Expectation values" are theoretical calculations about what the average result of some experiment would be in the limit as the number of trials goes to infinity. And one can show mathematically that if you're dealing with an experiment that only has two possible results +1 and -1, then for a reasonably large number of trials (say, 1000) the probability that the average experimental result will differ significantly from the expectation value becomes astronomically small, regardless of how many possible values can be taken by other variables "behind the scenes" which determine whether the final result is +1 or -1. This was the point I made back in post #51 on the 'Understanding Bell's Logic' thread, which you never responded to:
I'm fairly certain that the rate at which the likelihood of significant statistical fluctuations drops should not depend on the number of λn's in the integral. For example, suppose you are doing the experiment in two simulated universes, one where there are only 10 possible states for λ and one where there are 10,000 possible states for λ. If you want to figure out the number N of trials needed so that there's only a 5% chance your observed statistics will differ from the true probabilities by more than one sigma, it should not be true that N in the second simulated universe is 1000 times bigger than N in the first simulated universe! In fact, despite the thousandfold difference in possible values for λ, I'd expect N to be exactly the same in both cases. Would you disagree?

To see why, remember that the experimenters are not directly measuring the value of λ on each trial, but are instead just measuring the value of some other variable which can only take two possible values, and which value it takes depends on the value of λ. So, consider a fairly simple simulated analogue of this type of situation. Suppose I am running a computer program that simulates the tossing of a fair coin--each time I press the return key, the output is either "T" or "H", with a 50% chance of each. But suppose the programmer has perversely written an over-complicated program to do this. First, the program randomly generates a number from 1 to 1000000 (with equal probabilities of each), and each possible value is associated with some specific value of an internal variable λ; for example, it might be that if the number is 1-20 that corresponds to λ=1, while if the number is 21-250 that corresponds to λ=2 (so λ can have different probabilities of taking different values), and so forth up to some maximum λ=n. Then each possible value of λ is linked in the program to some value of another variable F, which can take only two values, 0 and 1; for example λ=1 might be linked to F=1, λ=2 might be linked to F=1, λ=3 might be linked to F=0, λ=4 might be linked to F=1, etc. Finally, on any trial where F=0, the program returns the result "H", and on any trial where F=1, the program returns the result "T". Suppose the probabilities of each λ, along with the value of F each one is linked to, are chosen such that if you take [sum over i from 1 to n] P(λ=i)*(value of F associated with λ=i), the result is exactly 0.5. Then despite the fact that there may be a very large number of possible values of λ, each with its own probability, this means that in the end the probability of seeing "H" on a given trial is 0.5, and the probability of seeing "T" on a given trial is also 0.5.

Now suppose that my friend is also using a coin-flipping program, where the programmer picked a much simpler design in which the computer's random number generator picks a digit from 1 to 2, and if it's 1 it returns the output "H" and if it's 2 it returns the output "T". Despite the differences in the internal workings of our two programs, there should be no difference in the probability either of us will see some particular statistics on a small number of trials! For example, if either of us did a set of 30 trials, the probability that we'd get more than 20 heads would be determined by the binomial distribution, which in this case says there is only an 0.049 chance of getting 20 or more heads (see the calculator http://stattrek.com/Tables/Binomial.aspx). Do you agree that in this example, the more complex internal set of hidden variables in my program makes no difference in statistics of observable results, given that both of us can see the same two possible results on each trial, with the same probability of H vs. T in both cases?
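The two simulated coin-flip programs are easy to write down. In this sketch (the 50/50 wiring of hidden states to 'H'/'T' is a simplification of the arbitrary λ-to-F mapping described above), the observable statistics don't care about the size of the hidden-variable space:

```python
import random

random.seed(3)

def complicated_flip():
    """Coin flip driven by a hidden variable lambda with 1,000,000 states.
    Here half the states are wired to 'H' and half to 'T' (a simplification),
    so the observable probability of 'H' is still exactly 0.5."""
    lam = random.randint(1, 1_000_000)      # the hidden variable
    return "H" if lam <= 500_000 else "T"

def simple_flip():
    """The 'simple' program: pick H or T directly with probability 0.5."""
    return random.choice("HT")

n = 10_000
heads_complicated = sum(complicated_flip() == "H" for _ in range(n))
heads_simple = sum(simple_flip() == "H" for _ in range(n))

# Both counts follow the same binomial(n, 0.5) distribution, clustering
# around 5000 with standard deviation ~50, regardless of the hidden machinery.
print(heads_complicated, heads_simple)
```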

For a somewhat more formal argument, just look at http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter8.pdf, particularly the equation that appears on p. 3 after the sentence that starts "By Chebyshev's inequality ..." If you examine the equation and the definition of the terms above, you can see that if we look at the average value for some random variable X after n trials (the S_n / n part), the probability that it will differ from the expectation value \mu by an amount greater than or equal to \epsilon must be smaller than or equal to \sigma^2 / n\epsilon^2, where \sigma^2 is the variance in the value of the original random variable X. And both the expectation value for X and the variance of X depend only on the probability that X takes different possible values (like the variable F in the coin example which has an 0.5 chance of taking F=0 and an 0.5 chance of taking F=1); it shouldn't matter if the value of X on each trial is itself determined by the value of some other variable λ which can take a huge number of possible values.
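The Chebyshev bound quoted above can also be checked numerically. A small sketch (n, eps, and the number of runs are arbitrary choices for illustration):

```python
import random

random.seed(4)

# Chebyshev: P(|S_n/n - mu| >= eps) <= sigma^2 / (n * eps^2).
# For a +/-1 valued variable, sigma^2 = 1 - mu**2 <= 1, so the bound
# depends only on n and eps, never on how many hidden states lie behind
# each individual outcome.
mu, n, eps = 0.0, 1000, 0.1
bound = (1 - mu**2) / (n * eps**2)          # 0.1 for these values

runs, exceed = 500, 0
for _ in range(runs):
    sample_mean = sum(random.choice((-1, 1)) for _ in range(n)) / n
    if abs(sample_mean - mu) >= eps:
        exceed += 1

# The observed exceedance rate is far below the (loose) Chebyshev bound.
print(exceed / runs, "<=", bound)
```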
billschnieder said:
Note also that even if the set of λ's is the same, we still need each λ to be sampled the exact same number of times for each term.
No, see above. The expectation value is the average value we'd expect theoretically in the limit as the number of trials approaches infinity, and my argument from post #51 of "Understanding Bell's logic" explains why, if we do say three runs with 1000 trials each for all three possible combinations of different detector settings, it'd be astronomically unlikely for the average results seen experimentally in each run to differ significantly from the expectation values (assuming that the theoretical assumptions about the laws of physics that went into deriving expressions for the expectation values are actually correct), even if there happen to be 200 googolplex possible values of λ. If you disagree, perhaps you should actually address my example with the coin-flipping program rather than just dismissing it as irrelevant like you did on the "Understanding Bell's logic" thread.
billschnieder said:
Now what I have just describe here are the specific experimental conditions that should apply for Bell's inequality to be applicable to data obtained from any experiment.
Nope, there is no need for each run to sample all values of λi (or for different runs to sample the same values of λi), just as there wouldn't be such a need in the coin-flipping simulation example where the result "heads" or "tails" on each flip depends on the value of an internal random variable λ which can take a huge number of possible values, but the total probability of getting "heads" or "tails" on each flip is still 0.5 (so the theoretical expectation value if heads=+1 and tails=-1 would be 0), and the law of large numbers still says that if you do a few hundred flips the probability that the fraction of "heads" will be significantly different from 0.5 (or the probability that the average value with heads=+1 and tails=-1 is significantly different from 0) will be astronomically small, even if you sampled only a tiny fraction of the possible values of the internal variable λ.

billschnieder said:
This brings us to the sorting I mentioned earlier which you are having difficulty with.
Suppose in any actual experiment, the experimenter also had, alongside each pair of measurements in each run, the specific value of λ for that run. He will now have a long list of pairs of +'s and -'s, plus one indexed λ each, such that for the three runs of the experiment he will have three lists which look something similar to the following, except the actual sequence of +'s, -'s and λ's will be different:

+ - λ1
- + λ9
+ + λ6
- + λ3
...

In such a case, it will be easy to verify if his data meets the requirement that ρ(λi) is the same for each term, as you agreed to previously. He could simply sort each of the three lists according to the λ column and compare whether the λ columns from all three runs are the same.
Again, you misunderstood what I meant when I agreed "ρ(λi) is the same for each term", see the discussion above starting with the paragraph that begins "True, but now you're talking about a completely different sense..." I just meant that the "true" probability distribution for a given pair of settings like a,b, which in frequentist terms can be understood as giving the fraction of trials/iterations with each value of λi that would be obtained in the limit as the number of trials/iterations with those settings went to infinity, would be identical to the "true" probability distribution for a different pair of settings like b,c. Then the law of large numbers indicates that even if you only do 3 runs with 1000 iterations each, and the λi's were completely different on each run, it's still astronomically improbable that the average values you obtain for each run will differ significantly from the "true" expectation values for each setting which can be calculated from the "true" probability distribution ρ(λi).
billschnieder said:
If they are not, ρ(λi) is different and Bell's inequality can not be applied to the data for purely mathematical reasons.
No, you're confusing the theoretical ρ(λi) which appears in the equations calculating expectation values with the actual truth about the fraction of trials/iterations with each value of λi on some finite set of runs, which might better be denoted F(λi). If the number of trials/iterations is not much larger than the number of possible values of λi, then F(λi) might well be wildly different than ρ(λi), but exactly the same would be true in my coin flip simulation example and it wouldn't change the fact that if you do 1000 simulated flips, the chance you will have gotten a number of heads significantly different than 500 is astronomically small. If you think it's actually necessary to sample every value of λi in order to be highly confident that our average result was very close to the "true" expectation value, then you're just misunderstanding how the law of large numbers works.
billschnieder said:
In other words, if they insisted to calculate the LHS of the inequality with that data, the inequality is not guaranteed to be obeyed, for purely mathematical reasons.
Even if all the theoretical assumptions used in the expectation value equations are correct, there's some small probability that experimental data won't satisfy the inequality, but for a reasonably large number of trials/iterations on each run (say, 1000), this probability becomes astronomically small (the probability that the experimental average differs by a given amount from the expectation value can be calculated using the binomial calculator at http://stattrek.com/Tables/Binomial.aspx).
 
  • #1,215
(reply to post #1208, part 3)


billschnieder said:
However, experimenters do not have the λ's so how can they make sure their data is compatible? If it is assumed that each specific λ contains all properties that will deterministically result in the outcome, then we do not need the λs to sort our data. We can just sort the actual result pairs so that the "a" column of the (a,b) pair matches the "a" column of the (a,c) pair and the "b" and "c" columns also match. If we can do that, then we can be sure that ρ(λi) is the same for all three terms of the inequality and Bell's inequality should apply to our data.
I'm not sure I follow what you mean here. Suppose we do only 4 iterations with each pair of different detector settings, and get these results (with the understanding that notation like a=+1 means 'the result with detector set to angle a was +1'):

For run with setting (a,b):
1. (a=+1, b=-1)
2. (a=-1, b=-1)
3. (a=-1, b=+1)
4. (a=+1, b=-1)

For run with setting (b,c):
1. (b=-1, c=+1)
2. (b=-1, c=-1)
3. (b=-1, c=+1)
4. (b=+1,c=-1)

For run with setting (a,c):
1. (a=+1, c=-1)
2. (a=+1, c=+1)
3. (a=-1, c=-1)
4. (a=-1, c=+1)

Then we can arrange these results into four rows of three iterations from three runs, such that in each row the value of a is the same for both iterations that sampled a, in each row the value of b is the same for both iterations that sampled b, and in each row the value of c is the same for both iterations that sampled c:

1. (a=+1, b=-1) 3. (b=-1, c=+1) 2. (a=+1, c=+1)
2. (a=-1, b=-1) 1. (b=-1, c=+1) 4. (a=-1, c=+1)
3. (a=-1, b=+1) 4. (b=+1,c=-1) 3. (a=-1, c=-1)
4. (a=+1, b=-1) 2. (b=-1, c=-1) 1. (a=+1, c=-1)

So, we could "resort" the iteration labels for the second run (middle column) such that the former third iteration was now labeled the first, the former first iteration was now labeled the second, the former fourth iteration was now labeled the third, and the former second iteration was now labeled the fourth. Likewise for the third run (right column) we could say the former second iteration was now labeled the first, the former fourth iteration was now labeled the second, the third iteration remained the third, and the former first iteration was now labeled the fourth. Is this the type of "resorting" you mean?
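For a data set this small, the resorting described above can be found by brute force. This sketch (my own illustration) searches all relabelings of the second and third runs for one that makes every row consistent:

```python
from itertools import permutations

# The three runs from the example above, encoded as +/-1 pairs.
ab_run = [(+1, -1), (-1, -1), (-1, +1), (+1, -1)]   # (a, b) per iteration
bc_run = [(-1, +1), (-1, -1), (-1, +1), (+1, -1)]   # (b, c) per iteration
ac_run = [(+1, -1), (+1, +1), (-1, -1), (-1, +1)]   # (a, c) per iteration

def consistent(bc_order, ac_order):
    """True if every row agrees on a, b and c across the three runs."""
    for (a1, b1), (b2, c2), (a3, c3) in zip(ab_run, bc_order, ac_order):
        if a1 != a3 or b1 != b2 or c2 != c3:
            return False
    return True

# Brute force is fine for 4 iterations (4! x 4! = 576 arrangements).
matches = [(p, q) for p in permutations(bc_run) for q in permutations(ac_run)
           if consistent(p, q)]
print(len(matches) > 0)   # True: a consistent resorting exists for this data
```

For realistic run lengths an exhaustive search like this would be infeasible, which is part of why (as noted below) resortability is a very special property of a data set.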

If so, I don't see how this ensures that "ρ(λi) is the same for all three terms of the inequality", or what you even mean by that. For example, isn't it possible that if the number of possible values of λ is 1000, then even though iteration #1 of the first run has been grouped in the same row as iteration #3 of the second run and iteration #2 of the third run (according to their original labels), that doesn't mean the value of λ was the same for each of these three iterations? For example, might it not have been the case that iteration #1 of the first run had λ203, iteration #3 of the second run had λ769, and iteration #2 of the third run had λ488?

As a separate issue it is of course true that if your full set of data can be resorted in this way, that's enough to guarantee mathematically that the data will obey Bell's inequality. But this is a very special case, I think it would be fairly unlikely that the full set of iterations from each run could be resorted such that every row would have the same value of a,b,c throughout, even if the data was obtained in a local realist universe that obeyed Bell's theoretical assumptions, and even if the overall averages from each run actually did obey the Bell inequality.
billschnieder said:
If we can not, it means ρ(λi) is different, and the data is mathematically not compatible with the inequality.
But again that doesn't seem to be true (if I am interpreting your meaning correctly): the prediction that experimental data is highly unlikely to violate the inequality in a local realist universe doesn't require that the values of λ matched on the three experimental runs with different pairs of detector settings. The law of large numbers means that if the equations giving the theoretical expectation values are correct, and the theoretical expectation values obey some inequality, then the probability that experimental data from a finite series of runs would violate the inequality becomes astronomically small for a reasonable number (say, a few hundred or a few thousand) of trials/iterations, even if this number is vastly smaller than the number of possible values of λ, whose value (along with the detector settings) determines the results on each trial.
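The law-of-large-numbers point can be sketched numerically. The following is a toy local hidden-variable model of my own (not from the thread): each λ predetermines anticorrelated ±1 results for settings a, b, c, the number of possible λ values vastly exceeds the number of trials, and each of the three setting pairs is sampled in a separate, independent run:

```python
import random

random.seed(0)

N_LAMBDA = 100_000   # possible hidden-variable values, far more than the trials below
N_TRIALS = 2_000     # trials per run (one run per pair of detector settings)

# Each lambda predetermines Alice's result (+1/-1) for each of the settings a, b, c;
# Bob's particle is taken to be perfectly anticorrelated, B(x, lam) = -A(x, lam).
A = {lam: tuple(random.choice([1, -1]) for _ in range(3)) for lam in range(N_LAMBDA)}

def run(i, j):
    """Empirical average of the product of the two results over one run
    where Alice samples setting i and Bob samples setting j."""
    total = 0
    for _ in range(N_TRIALS):
        lam = random.randrange(N_LAMBDA)  # lambda drawn independently of the settings
        total += A[lam][i] * (-A[lam][j])
    return total / N_TRIALS

avg_ab = run(0, 1)  # three independent runs: different trials, different lambdas
avg_bc = run(1, 2)
avg_ac = run(0, 2)

# Despite N_TRIALS being tiny compared to N_LAMBDA, and no matching of lambdas
# across runs, the empirical averages respect the Bell-type inequality
print(1 + avg_bc >= abs(avg_ab - avg_ac))
```

Here the inequality is satisfied with a large margin because this toy model's correlations are weak; the point is only that finite-sample averages track the true expectation values even though the three runs sample entirely different (and tiny) subsets of the λ values.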
billschnieder said:
Let us look at this slightly differently. Consider our first list which included the λ's. After sorting all three runs by the λ's we will find that we only need three columns of +'s and/or -'s out of the 6 (2 from each run). This is because each column will be duplicated. This simply means for each λ, there are 3 simultaneously existing properties at the angles.
Each value of λ is associated with a triplet of predetermined results for settings a,b,c, so if you could somehow know the value of λ on each trial and you knew what settings were used on that trial, that would be sufficient to tell you the results obtained on that trial. Is that basically what you're saying here, or are you making some additional point?
billschnieder said:
Now, what if instead of collecting three runs of pairs we collected a single run of triples so that the data from our experiment is
a b c
+ - + λ1
- + + λ9
+ + - λ6
- + + λ3
...

We do not need any sorting here because we can calculate all our terms from the same single run with the same ρ(λi). So we can compare ANY dataset of this type with Bell's inequality.
You could only "compare it with Bell's inequality" by changing the meaning of the terms in Bell's inequality, which deal with expectation values for experiments where the experimenter only collected a pair of results on each trial, with some specific pair of detector settings. As I've said before, it is of course true that you can prove an inequality like this in a purely mathematical way:

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

But that's not Bell's inequality! The terms in Bell's inequality have a meaning like this:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|
billschnieder said:
However, since it is not possible to measure triples in any experiment, the requirement to be able to sort the dataset applies to all datasets involving multiple runs of pairs.
No, this is not a "requirement" unless you adopt the strawman position that the inequality is supposed to be guaranteed to hold with probability 1, even for a finite number of trials. But no physicist would claim that, the claim is just that in a local realist universe the actual averages should approach the ideal expectation values as the number of trials becomes large, so in a local realist universe matching Bell's theoretical assumptions, an experiment matching his experimental conditions should have a very tiny probability of yielding data that violates the inequality.
billschnieder said:
Now, let us go back to the underlined text above. Since you agreed with me that ρ(λi) must be the same for each term in the inequality
As noted above I may have meant something different by this than you do; I was talking about the "true" probability distribution and not the actual fraction of trials/iterations with a given value of λi (I used the notation F(λi) to distinguish the second from the first).
billschnieder said:
Is that what you were alluding to with the underlined text: "which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements"? In other words, why is it important that the number of measurements be very large? Please I need a specific answer to this question, assuming you are still willing to contest this issue after my very detailed explanation above.
It's important because true probabilities are understood to be different from actual frequencies on a finite number of trials in the frequentist view, and I don't think there's any sensible way to interpret the probabilities that appear in Bell's proof in non-frequentist terms. An "expectation value" like E(a,b) would be interpreted in frequentist terms as the expected average result in the limit as the number of trials (on a run with detector settings a,b) goes to infinity, and likewise the ideal probability distribution ρ(λi) would in frequentist terms give the fraction of all trials where λ took the specific value λi, again in the limit as the number of trials goes to infinity. Then you can show theoretically that given Bell's physical assumptions, we can derive an inequality like this one:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different from the actual function F(λi) describing the fraction of trials with each value of λi (both functions would be unknown to the experimenter but they should have some true objective value which might be known to an omniscient observer). So, from this we can conclude that with a reasonably large number of trials, it'd be astronomically unlikely in a local realist universe for the experimental data to violate this inequality:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|

What specific step(s) in this reasoning do you have an objection to?
billschnieder said:
As an aside:
You seem to have an issue with my use of

| <ab> + <ac> | - <bc> <= 1

In which I have replaced E(a,b) in Bell's notation with <ab> in mine. Where a,b represent the outcomes at angles a and b and I was referring to the fact that in calculating the averages, it is not allowed for the list of a's in the first term to contain a different number of +'s and/or -'s from that in the second term, and same for "c" and "b".
OK, the phrase I bolded above now helps clarify what you meant when you said "the symbols ("a", "b" and "c") mean exactly the same thing from term to term", but there was really no way I could have been expected to deduce that without you spelling it out explicitly! Your requirement that we be able to "resort" the data from all three runs such that every row of three iterations from three runs has the same values of a,b,c throughout is a completely idiosyncratic idea no physicist ever brings up in discussions of Bell's theorem, and before post #1208 you hadn't explained it (your previous example involving 'resorting' didn't involve lining up three iterations from three runs, rather it involved creating a fake 'triple' from an iteration of the second run where a and c were measured and an iteration from the third run where b and c were measured, combining the values of a and c from the first iteration with the value of b from the second...see the end of my post #1191 for a discussion of this).
billschnieder said:
You objected and said:
JesseM said:
"a" is just a detector angle rather than a result like +1 or -1, the text makes that clear, so of course it means the same thing everywhere. But P(a,b) is an expectation value (he called it that himself), which can be understood as the average value of the product of two measurements on a pair of entangled particles with detectors at angles a and b, in the limit as the number of particle pairs measured in this way goes to infinity.
But then later, you used exactly the same notation.
The point of my objection was that I didn't understand what you meant when you said 'In Bell's inequality the the "a" in the first two terms are exactly the same.' Whenever I used notation like a*b, I always explained that this was really meant to be a shorthand for the product of two measurement results (each either +1 or -1) on a single pair of particles with detectors set to angles a and b. But that doesn't help to understand what you might mean by 'the "a" in the first two terms are exactly the same', and you didn't explain the meaning before, how was I supposed to know you were talking about reordering each list of iterations such that the value of a in the ith iteration of the run with settings a,b would always match the value of a in the ith iteration of the run with settings a,c? (assuming I have finally understood what you meant, if not please explain) Like I said this is a very idiosyncratic notion of yours and I'm not a mind reader so unless you spell it out I'm not going to know what you're talking about. I didn't assume that the "a" in your phrase 'the "a" in the first two terms are exactly the same' did refer to the detector angle, I just didn't know what it meant and was expressing confusion, and I explicitly asked you for a clarification on this in the second part of my reply (post #1206) when I said:
billschnieder said:
It doesn't mean you need to resort it in order to calculate the terms. It just means being able to resort the data is evidence that the symbols are equivalent. It is just another way of saying the symbols ("a", "b" and "c") mean exactly the same thing from term to term.
I still don't know what you mean by "mean exactly the same thing from term to term". a, b and c are just placeholders, for each triple each one can take value +1 or -1, for example in the first triple on your list you might have a=+1 while on the second triple you might have a=-1. Do you just mean that each term deals with averages from exactly the same list of triples, rather than each term dealing with averages from a separate list of triples?
billschnieder said:
This tactic of yours combined with lack of willingness to actually understand the opposing view, combined with a severe case of irrelevant argumentum ad verbosium, is the reason I do not take you seriously.
Again, this is very uncharitable, not to mention paranoid. When I express confusion about a vague phrase of yours, you act as though it's some sort of sneaky "tactic", and you imagine your posts to be such models of clear exposition that any failure to immediately grok what you are saying must reveal a "lack of willingness to actually understand the opposing view" (speaking of lack of willingness, I do try to address all your arguments as best I can, whereas you immediately dismiss anything that you don't immediately see the relevance of like my coin-flipping simulation example from the 'Understanding Bell's logic' thread...what's more, addressing all your arguments itself requires long posts, and then you interpret this too in a hostile mocking way as 'argumentum ad verbosium'). If you would move away from such a hostile/paranoid mindset, and consider that there might be some truth in what I said at the end of post #1190:
But of course the most charitable and fair assumption is that communication about complex issues like these is sometimes difficult and arguments that may seem clear to you can seem genuinely ambiguous to intelligent readers who aren't privy to all your thought processes.
...then this discussion would probably proceed a lot more smoothly and with less hostility.
 
  • #1,216
billschnieder said:
You must agree therefore that the following is Bell's inequality.
|\sum_{i} A(a,\lambda_{i})A(b,\lambda_{i})P(\lambda_{i}) + \sum_{i} A(a,\lambda_{i})A(c,\lambda_{i})P(\lambda_{i})| - \sum_{i} A(b,\lambda_{i})A(c,\lambda_{i})P(\lambda_{i}) \leq 1

Which can be factored in this form.
|\sum_{i} P(\lambda_{i})A(a,\lambda_{i})\left[A(b,\lambda_{i}) + A(c,\lambda_{i})\right]| - \sum_{i} A(b,\lambda_{i})A(c,\lambda_{i})P(\lambda_{i}) \leq 1

Bell himself did a similar factorization. Therefore if for any dataset the two equations above produce different results, it means the dataset is not compatible with Bell's inequality for purely mathematical reasons. Do you agree? If you don't please explain clearly.
If by "dataset" you mean some finite collection of experimental results, then I don't agree. The above equations are correct only insofar as they refer to the "true" probabilities and expectation values, which in frequentist terms can be understood in terms of fractions of trials with different possible results in the limit as the number of trials goes to infinity. But as I said in the following section of post #1215, Bell's proof is primarily about these ideal "true" probabilities and expectation values, then if you want to connect this with experimental data you have to invoke the law of large numbers (which is really implicit in all physical predictions involving probabilities, so physicists typically don't state this explicitly):
true probabilities are understood to be different from actual frequencies on a finite number of trials in the frequentist view, and I don't think there's any sensible way to interpret the probabilities that appear in Bell's proof in non-frequentist terms. An "expectation value" like E(a,b) would be interpreted in frequentist terms as the expected average result in the limit as the number of trials (on a run with detector settings a,b) goes to infinity, and likewise the ideal probability distribution ρ(λi) would in frequentist terms give the fraction of all trials where λ took the specific value λi, again in the limit as the number of trials goes to infinity. Then you can show theoretically that given Bell's physical assumptions, we can derive an inequality like this one:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different from the actual function F(λi) describing the fraction of trials with each value of λi (both functions would be unknown to the experimenter but they should have some true objective value which might be known to an omniscient observer). So, from this we can conclude that with a reasonably large number of trials, it'd be astronomically unlikely in a local realist universe for the experimental data to violate this inequality:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|
 
  • #1,217
The points made in your recent posts have already been pre-empted and rebutted in my posts
#1211 and #1212 so consider those as responses. You probably did not see them before developing your recent responses. If there are any points you still contest after reading those two posts, please indicate and I will re-explain in yet simpler terms.
 
  • #1,218
billschnieder said:
Now let us go to Bell's equation (2) where he defines his expectation values
Bell said:
E(a,b) = \int d\lambda \rho (\lambda )A(a,\lambda )B(b,\lambda )
Perhaps I am over-interpreting your use of the word "defines", but as I argued towards the end of post #1213 (starting with the paragraph that begins 'I don't understand how you can say...'), this equation cannot be taken as the definition of E(a,b); rather, E(a,b) is understood to be defined in a physical way as the expectation value for the product of two measurements on an entangled particle pair with detector settings a and b. This expectation value is understood as a sum of the different possible measurement outcomes weighted by their "true" probabilities:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

And here the probabilities are the "objective" ones that would correspond in frequentist terms to the frequencies in the limit as the number of trials went to infinity.
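Spelled out as code, that weighted sum is just the following (the four joint probabilities below are made-up numbers purely for illustration; they are not derived from any actual experiment):

```python
# Hypothetical joint probabilities for the four outcome combinations; they must sum to 1.
# p[(r_a, r_b)] = P(detector at setting a gives r_a, detector at setting b gives r_b)
p = {(+1, +1): 0.1, (+1, -1): 0.4, (-1, +1): 0.4, (-1, -1): 0.1}

# Expectation value: each product of results weighted by its "true" probability
E_ab = sum(r_a * r_b * prob for (r_a, r_b), prob in p.items())
print(round(E_ab, 10))  # -0.6
```

Note that nothing in this definition refers to λ at all; λ only enters later, via Bell's physical argument about what form these probabilities must take in a local realist universe.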

Bell then gives some physical arguments as to why we'd expect the expectation value to take this form:

E(a,b) = \int d\lambda \rho (\lambda )A(a,\lambda )B(b,\lambda )

And here as before, \rho(\lambda) is assumed to be the "objective" probability distribution, not something we need to measure or even make guesses about in practice. We don't need to know anything about the details of this probability distribution to derive a general inequality that is expected to apply to the "true" probabilities of different measurement results under any set of local realist laws, and then we can use the law of large numbers to conclude that if we do some sufficient number of trials, our actual experimental averages are astronomically unlikely to differ from the expectation values determined by the "true" probabilities. Once again, here's my summary of the logic from post #1215:
true probabilities are understood to be different from actual frequencies on a finite number of trials in the frequentist view, and I don't think there's any sensible way to interpret the probabilities that appear in Bell's proof in non-frequentist terms. An "expectation value" like E(a,b) would be interpreted in frequentist terms as the expected average result in the limit as the number of trials (on a run with detector settings a,b) goes to infinity, and likewise the ideal probability distribution ρ(λi) would in frequentist terms give the fraction of all trials where λ took the specific value λi, again in the limit as the number of trials goes to infinity. Then you can show theoretically that given Bell's physical assumptions, we can derive an inequality like this one:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different from the actual function F(λi) describing the fraction of trials with each value of λi (both functions would be unknown to the experimenter but they should have some true objective value which might be known to an omniscient observer). So, from this we can conclude that with a reasonably large number of trials, it'd be astronomically unlikely in a local realist universe for the experimental data to violate this inequality:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|
If you disagree with any of the above, please go back and address my specific arguments in posts #1213-1215.
billschnieder said:
Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) for all λ. Which is essentially the expectation value. Theoretically the above makes sense, where you measure each A(a,.), B(b,.) pair exactly once for a specific λ, and simply multiply with the probability of realizing that specific λ and then add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are.
The comment above is completely misguided, since the basic definition of "expectation value" in this experiment has nothing at all to do with knowing the value of λ, it is just understood to be:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

Bell argues on a theoretical basis that E(a,b) should also be given by the integral involving \rho(\lambda), but the above should be understood as the basic meaning of an "expectation value". And by the law of large numbers, if you repeat the experiment a fairly large number of times (say 1000), the chances that the fraction of trials where you got some particular result (say, +1 with setting a and +1 with setting b) is significantly different from the "true probability" of that result (in this case P(detector with setting a gets result +1, detector with setting b gets result +1)) would become astronomically small, even if the number of trials was tiny compared to the number of possible values of λ. I gave a bunch of arguments for this claim about the law of large numbers in post #1214, so if you disagree please go back and address that post. If you don't disagree, then you can see why in order to compare the inequality with experimental data we don't have to consider λ at all, we just have to use our dataset of pairs to find the average for the product of two results on each of the three combinations of different detector settings.
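As an aside, the arithmetic of the quoted 3-λ example itself is easy to check; it shows that a probability-weighted sum over λ agrees with a simple average over a dataset whose λ-frequencies are exactly representative. The ±1 products below are made-up values for illustration:

```python
# Quoted example: three hidden states with probabilities 0.3, 0.5, 0.2.
# The products A(a,λi)*B(b,λi) are made-up ±1 values for illustration.
prod = {1: +1, 2: -1, 3: -1}
p    = {1: 0.3, 2: 0.5, 3: 0.2}

# Expectation value as a probability-weighted sum over the three lambdas
E_weighted = sum(prod[i] * p[i] for i in prod)

# Simple average over a 10-point dataset whose lambda frequencies (3, 5, 2)
# exactly match the probabilities
dataset = [1]*3 + [2]*5 + [3]*2
E_simple = sum(prod[lam] for lam in dataset) / len(dataset)

print(round(E_weighted, 10), E_simple)  # -0.4 -0.4
```

The two agree only because the dataset's frequencies are exactly representative; whether a finite experimental run is representative in this sense is precisely what the law-of-large-numbers discussion above addresses.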
billschnieder said:
All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above.
They assume the averages from their data are close to the "true" expectation values E(a,b), E(b,c) and E(a,c), which can be justified by the law of large numbers, but there is no need to assume that the (unknown) frequencies of different values of λi which occurred in the particle pairs they sampled was anything like the "true" probability distribution p(λi). Do you disagree?
billschnieder said:
So then in this case, assuming discrete λ's, that Bell's equation (2) is equivalent to the following simple average:
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a,\lambda _i)B(b,\lambda _i)
How is it equivalent? It's quite possible that P(λ2) could be very different from P(λ3), for example, in which case you need to weigh the terms A(a,λ2)*B(b,λ2) and A(a,λ3)*B(b,λ3) by the probabilities of those values if you want to get an accurate expectation value. The correct discrete version would have to look like this:
E(a,b) = \sum_{i}^{N} A(a,\lambda _i)*B(b,\lambda _i)*P(\lambda_i)
billschnieder said:
Since in any real experiment we do not know which λ is realized for any specific iteration, we can drop lambda from the equation altogether without any impact, where we have simply absorbed the λ into the specific variant of the functions A,B operating for iteration i (that is Ai and Bi)
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a)_{i}B(b)_{i}
Well, the i's in λi weren't supposed to be iterations, but rather were just a way of indexing all physically possible values that the hidden variables could take on that type of experiment--there could well be more possible values of i than particles in the observable universe! So if i in the equation above is supposed to refer to iterations you've significantly changed the meaning of the index, from something theoretical to something empirical. And again, Bell's reasoning is based on the "true" or "objective" probabilities of different outcomes which give the "true" expectation value, which is different from the empirical average which you are computing above, although the law of large numbers means that the difference between the two becomes small for a reasonably large number of trials (again see post #1214 on this point). Still, it's important to distinguish theoretical from empirical, so let's use E(a,b) to be the "true" expectation value for the product of the measurements with settings a and b, and Avg(a,b) to be the empirical average of all the products of measurement results on a run with settings a and b, and then we can say that in the limit as the number of trials/iterations in a run goes to infinity, Avg(a,b) should approach E(a,b) with probability 1. In this case I would rewrite the above as:

Avg(a,b) = \frac{1}{N} \sum_{i}^{N} A(a)_{i}B(b)_{i}
billschnieder said:
And we could adopt a simplified notation in which we replace the function A(a)_i with the outcome \alpha _i and B(b)_i with \beta _i. Note that the outcomes of our functions are restricted to values (+1 or -1) and we could say \alpha = \pm 1, \beta = \pm 1

To get:
E(a,b) = \frac{1}{N} \sum_{i}^{N} \alpha _{i} \beta _{i} = \langle \alpha \beta \rangle
Which I would rewrite as:

Avg(a,b) = \frac{1}{N} \sum_{i}^{N} \alpha _{i} \beta _{i} = \langle \alpha \beta \rangle
billschnieder said:
Let us then develop our analogy involving our a' and b' to the same point. Remember our first assumption was that we had two such arbitrary variables a' and b' with values (+1 or -1). Now consider the situation in which we had a list of pairs of such variables of length N. Let us designate our list [(a',b')] to indicate that each entry in the list is a pair of (a',b') values. Let us define the expectation value of the pair product for our list as follows:
E(a',b') = \frac{1}{N} \sum_{i}^{N} a'_{i} b'_{i} = \langle a'b' \rangle
Again this doesn't work as a theoretical expectation value since i refers to some number of iterations, whereas a theoretical expectation value for an experiment which can give any one of N results R1, R2, ..., RN always has the form E(R) = \sum_{i=1}^N R_i * P(R_i). However, it does work as a way of computing the average for the product of a' and b' for a list of values, so in my notation:

Avg(a',b') = \frac{1}{N} \sum_{i}^{N} a'_{i} b'_{i} = \langle a'b' \rangle

billschnieder said:
For all practical purposes, this equation is exactly the same as the previous one and the terms a' and b' are mathematically equivalent to α and β respectively. What this shows is that the physical assumptions about existence of hidden variables, locality etc are not necessary to obtain an expression for the expectation values for a pair product.
As I said, you are not really computing an expectation value but just an average, which in the limit as the number N of iterations went to infinity would approach the true expectation value with probability 1.
billschnieder said:
We have obtained the same thing just by defining two variables a', b' with values (+1 and -1) and calculating the expectation value for the paired product of a list of pairs of these variables. You could say the reason Bell obtained the same expression is because he just happened to be dealing with two functions which can have values (+1 and -1) for physical reasons and experiments producing a list of such pairs. And he just happened to be interested in the pair product of those functions for physical reasons. But the structure of the calculation of the expectation value is determined entirely by the mathematics and not the physics. Once you have two variables with values (+1 and -1) and a list of pairs of such values, the above equations should arise no matter the process producing the values, whether physical, mystical, non-local, spooky, super-luminal, or anything you can dream about. That is why I say the physical assumptions are peripheral.
Physical assumptions are peripheral to calculating averages from experimental data, it's true, and they're also peripheral to writing down expectation values in terms of the "true" probabilities as I did when I wrote E(R) = \sum_{i=1}^N R_i * P(R_i), with the following equation as a special case of this general form:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

...but you can't derive useful inequalities like 1 + E(b,c) >= |E(a,b) - E(a,c)| from such simple definitions! For that you need to make some physical assumptions which allow you to show that the "true" expectation values can also be written in some more specific form, such as:

E(a,b) = - \sum_{i=1}^N A(a,\lambda_i)*A(b,\lambda_i)*P(\lambda_i)
E(b,c) = - \sum_{i=1}^N A(b,\lambda_i)*A(c,\lambda_i)*P(\lambda_i)
E(a,c) = - \sum_{i=1}^N A(a,\lambda_i)*A(c,\lambda_i)*P(\lambda_i)

...and then it's from these more specific forms that you derive the inequalities.
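That derivation step can be checked by brute force: for any assignment of predetermined ±1 "instruction sets" over the three settings, and any probability distribution over them, expectation values of the above form necessarily satisfy the inequality. A small self-check of my own (discrete λ, one λ per instruction set):

```python
import itertools
import random

random.seed(1)

# The 8 possible "instruction sets": predetermined results for settings a, b, c.
triples = list(itertools.product([1, -1], repeat=3))

def check(probs):
    """Given P(λ) over the 8 triples, test 1 + E(b,c) >= |E(a,b) - E(a,c)|
    with E(x,y) = -sum_i P(λi) A(x,λi) A(y,λi), as in the equations above."""
    E_ab = -sum(p * t[0] * t[1] for p, t in zip(probs, triples))
    E_ac = -sum(p * t[0] * t[2] for p, t in zip(probs, triples))
    E_bc = -sum(p * t[1] * t[2] for p, t in zip(probs, triples))
    return 1 + E_bc >= abs(E_ab - E_ac) - 1e-12   # tolerance for float rounding

# Try many random probability distributions; the inequality never fails.
for _ in range(10_000):
    w = [random.random() for _ in range(8)]
    total = sum(w)
    probs = [x / total for x in w]
    assert check(probs)
print("inequality held for 10000 random distributions")
```

This is only a numerical illustration of the algebraic fact that |A(a)A(b) - A(a)A(c)| = 1 - A(b)A(c) for ±1-valued functions, which is the step Bell's factorization exploits.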
billschnieder said:
Note a few things about the above equation. a'_i and b'_i must be multiplied with each other. If we independently reorder the columns in our list so that we have different pairings of a'_i and b'_i, we will obtain the same expectation value only in the most improbable of situations. To see this, consider the simple list below

a' b'
+ -
- +
- +
+ -

<a'b'> = -4/4 = -1

If we rearrange the b' column so that the pairing is no longer the same, we may have something like the following were we have the same number of +'s and -'s but their pairing is different:

a' b'
+ +
- -
- -
+ +

<a'b'> = 4/4 = 1
Which tells us that we are dealing with an entirely different dataset.
OK, sure, if you are allowed to resort pairs at will you can get different averages for the products of pairs. But in Bell's theorem it's assumed that all the "products of two measurement results" are each from pairs of measurements on a single pair of entangled particles, you're not allowed to resort the data in this way.
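Re-running the quoted arithmetic makes the point concrete: under the original pairing every product is −1, and after the reshuffle every product is +1, even though each column contains the same number of +'s and -'s:

```python
a_col      = [+1, -1, -1, +1]
b_original = [-1, +1, +1, -1]   # pairings as listed in the first quoted table
b_resorted = [+1, -1, -1, +1]   # same counts of +'s and -'s, different pairing

avg_original = sum(x * y for x, y in zip(a_col, b_original)) / len(a_col)
avg_resorted = sum(x * y for x, y in zip(a_col, b_resorted)) / len(a_col)
print(avg_original, avg_resorted)  # -1.0 1.0
```

Which is why the product must always be taken within a single entangled pair's two results, never across reshuffled rows.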
 
  • #1,219
billschnieder said:
(continued from the last post)

So far we have dealt with pairs, just like Bell up to his equation (14). Let us then, following in Bell's footsteps, introduce the third variable (see page 406 of his original paper).
Bell said:
It follows that c is another unit vector
E(a,b) - E(a,c) = -\int d\lambda \rho (\lambda )[A(a,\lambda )A(b,\lambda )-A(a,\lambda )A(c,\lambda )]
using (1), whence
\left | E(a,b)-E(a,c) \right |\leq \int d\lambda \rho (\lambda )[1 - A(b,\lambda)A(c,\lambda )]
The second term on the right is E(b,c), whence
1 + E(b,c) >= |E(a,b) - E(a,c)| ... (15)
Note a few things here: Bell factorizes at will within the integral. ρ(λ) is a factor of every term under the integral. That is why I explained in my previous detailed post that ρ(λ) must be the same for all three terms.
And I explained in #1213 that it doesn't make any sense to use these equations as the reason why ρ(λ) should be the same in all three terms, since the equations he writes down for E(a,b) and E(b,c) and E(a,c) are not meant to be definitions of the expectation values, but rather conclusions about how the expectation values can be written down in a universe that obeys local realist laws along with the no-conspiracy assumption. See everything in post #1213 starting with the paragraph that begins "I don't understand how you can say..."

Anyway, if we accept Bell's physical argument that in a local realist universe we should be able to write the expectation values as follows:

E(a,b) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(b,\lambda )
E(a,c) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(c,\lambda )
E(b,c) = -\int d\lambda\rho (\lambda )A(b,\lambda )A(c,\lambda )

...then we can see why the factorization he does in the equations you wrote above should be justified. But he does need to make that physical argument to justify it.

Also, there is some ambiguity in what you mean when you say "ρ(λ) must be the same for all three terms", I discussed this at the start of post #1214. I was interpreting it just as a statement that the "true" or "objective" probability distributions on different values of λ (which would give the frequencies of different values of λ that would be expected in the limit as the number of trials went to infinity) should not depend on the detector settings. If you mean something different, like that the actual finite run of trials on each detector setting should involve the same frequencies of different values of λ, then I disagree that Bell's equation implies anything of the sort since it only deals with "true" probabilities and not empirical results, but again see post #1214 for the detailed discussion on this point.
billschnieder said:
Secondly, Bell derives the expectation value term E(b,c) by factoring out the corresponding A(b,.) and A(c,.) terms from E(a,b) and E(a,c). Therefore, E(b,c) does not contain different A(b,.) and A(c,.) terms but the exact same ones present in E(a,b) and E(a,c).
I don't know why you have replaced terms like A(b,λ) with notation like A(b,.)--easier to type, or some deeper significance? Anyway, Bell is assuming that for any given value of λi, A(a,λi) is the same regardless of whether the other detector was on setting b or setting c, and so forth for A(b,λi) and A(c,λi). In other words, the result at a given detector depends only on that detector's setting and the value of all hidden variables on that trial; it doesn't depend on the other detector's setting (and we wouldn't expect it to in a local realist universe!). Is this all you're saying, or do you think the factorization has some further implications?
billschnieder said:
In other words, in order to obtain all three expectation values E(a,b), E(a,c) and E(b,c), we ONLY need three lists of outcomes corresponding to A(a,.), A(b,.), A(c,.) or in simpler notation, we only need a single list of triples [(a',b',c')] to calculate all terms for

1 + <b'c'> >= |<a'b'> - <a'c'>|
No, again it seems like you are confusing theoretical terms with empirical results. E(a,b) doesn't depend on what results we got on any finite series of trials, it's the "true" expectation value that can be defined as

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

Where each of the P's represents the "true" or "objective" probability for that pair of results, as distinguished from the fraction of some finite number of trials where that pair of results was seen (as always, in frequentist terms the objective probabilities would be the fraction of trials with that pair of results in the limit as the number of trials goes to infinity).
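As a toy illustration of this definition (the four joint probabilities below are invented numbers, not from any experiment):

```python
# Hypothetical "true" joint probabilities for the four outcome pairs
# at detector settings (a, b); the values are made up for illustration.
P = {(+1, +1): 0.1, (+1, -1): 0.4, (-1, +1): 0.4, (-1, -1): 0.1}

# "True" expectation value: each product of results weighted by its probability.
E_ab = sum(r1 * r2 * p for (r1, r2), p in P.items())
print(E_ab)  # ~ -0.6
```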
billschnieder said:
So then, we are destined to obtain this inequality for any list of triples of two-valued variables (or outcomes of two-valued functions) where the allowed values are (+1 or -1), no matter the physical, metaphysical or mystical situation generating the triples.
But that's not the situation with Bell's theorem. Rather, with Bell's theorem we have three runs with different combinations of detector settings (a,b), (b,c) and (a,c), and considering the average from each run. Bell is showing that if we know the true expectation values for each individual run, in a local realist universe they should obey:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Since each expectation value is for a different run, even if you assume that every iteration of every run is determined by a set of triples, you can't derive the above equation from arithmetic alone since each expectation value would deal with a different collection of triples. So, you do need to consider the "physical, metaphysical or mystical situation generating the triples". And once you are convinced that the above equation should hold for the true expectation values, then by the law of large numbers you can conclude that if you do 1000 trials on each run, in a local realist universe you are astronomically unlikely to see a violation of the following inequality on your data:

1 + (average for product of results on the run with settings b and c) >=
|(average for product of results on the run with settings a and b) -
(average for product of results on the run with settings a and c)|
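This can be sketched in a short simulation. A minimal sketch, assuming a local hidden-variable model in which each particle pair carries a predetermined triple drawn from a distribution that does not depend on the settings (the uniform distribution, the trial count, and the anticorrelation convention are illustrative choices, not anything taken from Bell's paper):

```python
import random

random.seed(0)  # reproducible run

# The eight possible hidden triples: predetermined +/-1 answers for
# settings a, b, c, shared by both particles of an entangled pair.
TRIPLES = [(x, y, z) for x in (+1, -1) for y in (+1, -1) for z in (+1, -1)]

def run(setting1, setting2, n=20000):
    """Average product of the two outcomes over n entangled pairs.
    Particle 1 reports its predetermined value for setting1; particle 2
    reports the opposite of its value for setting2 (anticorrelation)."""
    idx = {"a": 0, "b": 1, "c": 2}
    total = 0
    for _ in range(n):
        lam = random.choice(TRIPLES)  # same P(lambda) for every run
        total += lam[idx[setting1]] * (-lam[idx[setting2]])
    return total / n

# Three separate runs, one per setting combination, as in the text.
E_ab, E_ac, E_bc = run("a", "b"), run("a", "c"), run("b", "c")

# With the same triple distribution feeding all three runs, the data
# should (almost surely) respect 1 + E(b,c) >= |E(a,b) - E(a,c)|.
print(1 + E_bc >= abs(E_ab - E_ac))
```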

billschnieder said:
Suppose now that we generate from our list of triples three lists of pairs corresponding to [(a',b')], [(a',c')] and [(b',c')]; we can simply calculate our averages and be done with it. It doesn't matter if the order of pairs in the lists is randomized so long as the pairs are kept together. In this case, we can still sort them as described in my previous detailed description, to regenerate our list of triples from the three lists of pairs.
See my questions and arguments about your "resorting" procedure in post #1215. First I clarified what I thought you meant by this form of "resorting" at the start of the post with a simple example, perhaps you can tell me if I've got it right or not. If I have got it right, then please address my subsequent comments and questions:
If so, I don't see how this ensures that "ρ(λi) is the same for all three terms of the inequality", or what you even mean by that. For example, isn't it possible that if the number of possible values of λ is 1000, then even though iteration #1 of the first run has been grouped in the same row as iteration #3 of the second run and iteration #2 of the third run (according to their original labels), that doesn't mean the value of λ was the same for each of these three iterations? For example, might it not have been the case that iteration #1 of the first run had λ203, iteration #3 of the second run had λ769, and iteration #2 of the third run had λ488?

As a separate issue it is of course true that if your full set of data can be resorted in this way, that's enough to guarantee mathematically that the data will obey Bell's inequality. But this is a very special case, I think it would be fairly unlikely that the full set of iterations from each run could be resorted such that every row would have the same value of a,b,c throughout, even if the data was obtained in a local realist universe that obeyed Bell's theoretical assumptions, and even if the overall averages from each run actually did obey the Bell inequality.
billschnieder said:
Now the way Bell-test experiments are usually done is analogous to collecting three lists of pairs randomly, with the assumption that these three lists are representative of the three lists of pairs which we would have obtained from a list of triples, had we been able to measure at three angles simultaneously.
Yes, that's true. Since there are only eight possible distinct triples, and the value of λ on each trial completely determines the type of triple on that trial, and we assume the true probability distribution P(λ) is the same regardless of the detector settings, then with some reasonably large number of trials (say 1000) on each run we do expect that:

Fraction of trials on first run where the hidden triple was a=+1, b=-1 and c=+1

is very close to

Fraction of trials on second run where the hidden triple was a=+1, b=-1 and c=+1

and to

Fraction of trials on third run where the hidden triple was a=+1, b=-1 and c=+1

And likewise for the fractions of the other seven types of triples that occurred on each run. Do you agree this is a reasonable expectation thanks to the law of large numbers?
billschnieder said:
And if each list was sufficiently long, the averages will be close to those of the ideal situation assumed by Bell. Again, remember that within each list of pairs actually measured, the individual pairs such as (a',b')_i measured together are assumed to have originated from a specific theoretical triple, (a',c')_j from another triple, and (b',c')_k from another triple. Therefore, our dataset from a real experiment is analogous to our three theoretical lists above, where we randomized the order but kept the pairs together while randomizing. Which means, it should be possible to regenerate our single list of triples simply by resorting the three lists of pairs while keeping the individual pairs together, as I explained previously.
Even if the data was drawn from triples, and the probability of different trials didn't depend on the detector settings on each run, there's no guarantee you'd be able to exactly resort the data in the manner of my example in post #1215, where we were able to resort the data so that every row (consisting of three pairs from three runs) had the same value of a,b,c throughout. You might be able to sort it so that most rows of three pairs had the same value of a,b,c throughout, but probably not all. This would at least give a way of roughly estimating the frequencies of different types of triples, though.
billschnieder said:
If we can not do this, it means either that:
a) our data is most likely of the second kind in which randomization did not keep the pairs together or
Well, we know this does not apply in Bell tests, where every data pair is always from a single trial with a single pair of measurements on a single pair of entangled particles.
billschnieder said:
b) each list of pairs resulted from different lists of triples and/or
If the frequencies of each of the 8 types of triples differed significantly in three runs with a significant (say, 1000 or more) number of trials in each, this would imply either an astronomically unlikely statistical miracle, or that the no-conspiracy assumption is false and the true probabilities of different triples actually do change depending on the detector settings.
billschnieder said:
c) our lists of pairs are not representative of the list of triples from which they arose
Not sure I follow what you mean here. Are you suggesting that even if we had a triple like a=+1, b=-1, c=+1 we might still get result -1 with detector setting a? If so what would be the point of assuming the data arose from triples in the first place? Remember that Bell's assumption of predetermined results on each axis came from the fact that whenever both particles were measured on the same axis they always gave opposite results--in a local realist universe where the decisions about the two detector settings can have a spacelike separation, it seems impossible to explain this result otherwise (though some of Bell's later proofs dropped the assumption of always getting opposite or identical results when both experimenters used the same setting).
billschnieder said:
In any of these cases, Bell's inequality does not and can not apply to the data. In other words, it is simply a mathematical error to use the inequality in such situations.
No, the fact that Bell's inequality is observed not to work is empirical evidence that one of the assumptions used in the derivation must be false, like the assumption that local realism is true (with the conclusion of predetermined triples following from this assumption along with the observation that using the same angle always yields opposite results), or the no-conspiracy assumption. Unless you want to argue (and you probably do) that even if we assume the validity of those theoretical assumptions, this does not necessarily imply Bell's inequality should hold for the type of experiment he describes.
billschnieder said:
Also note that these represent the only scenarios in which "average value of a*b for all triples" is different from "average value of a*b for measured pairs only". And in this case, the fair sampling assumption can not hold.
What do you mean by "fair sampling assumption"? This page says "It states that the sample of detected pairs is representative of the pairs emitted", but that could be true and Bell's inequality could still fail for some other reason like a violation of the no-conspiracy assumption.
 
  • #1,220
billschnieder said:
The points made in your recent posts have already been pre-empted and rebutted in my posts
#1211 and #1212 so consider those as responses. You probably did not see them before developing your recent responses. If there are any points you still contest after reading those two posts, please indicate and I will re-explain in yet simpler terms.
Having replied to these, I saw nothing in them that could be considered a rebuttal of any of the points I made in #1213-#1215. I indicated in my replies to #1211 and #1212 where I thought various claims made in those posts had been disputed or questioned in #1213-#1215, so if you disagree with some of the things I say in my recent replies you can go back and address the corresponding arguments/questions in the earlier posts.
 
  • #1,221
JesseM said:
billschnieder said:
Now let us go to Bell's equation (2) where he defines his expectation values ...
Perhaps I am over-interpreting your use of the word "defines", but as I argued towards the end of post #1213 (starting with the paragraph that begins 'I don't understand how you can say...'), this paragraph cannot be taken as the definition of E(a,b), rather E(a,b) is understood to be defined in a physical way as the expectation value for the product of two measurements on an entangled particle pair with detector settings a and b.

You are grasping at straws here. First of all, I said the equation is Bell's definition of HIS expectation values for the situation he is working with.
Secondly, nobody said anything about the probabilities in the equation not being true probabilities, so you are complaining about a nonexistent issue. Thirdly, you object to my statement but go on to say the exact same thing. This is what I said after the equation:

billschnieder said:
Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) for all λ, which is essentially the expectation value. Theoretically the above makes sense: you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and then add up the subsequent terms to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)


Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities.
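The equivalence claimed in this three-λ example is easy to verify numerically. A small sketch (the individual ±1 product values are made up for illustration; only the weighting bookkeeping matters):

```python
# Hypothetical products A(a, lam_i)*B(b, lam_i) for the three lambdas.
products = {"lam1": +1, "lam2": +1, "lam3": -1}
probs    = {"lam1": 0.3, "lam2": 0.5, "lam3": 0.2}

# Probability-weighted sum: each lambda appears exactly once.
E_weighted = sum(products[l] * probs[l] for l in products)

# Simple average over a representative 10-point dataset in which each
# lambda occurs with frequency equal to its probability (3, 5, 2 times).
dataset = ["lam1"] * 3 + ["lam2"] * 5 + ["lam3"] * 2
E_simple = sum(products[l] for l in dataset) / len(dataset)

print(E_weighted, E_simple)  # both ~0.6 (up to float rounding)
```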

And this is how it is described on Wikipedia:

Wikipedia said:
http://en.wikipedia.org/wiki/Expected_value
In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or mean, or first moment) of a random variable is the integral of the random variable with respect to its probability measure.

For discrete random variables this is equivalent to the probability-weighted sum of the possible values.

For continuous random variables with a density function it is the probability density-weighted integral of the possible values.

The term "expected value" can be misleading. It must not be confused with the "most probable value." The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment.

The expected value may be intuitively understood by the law of large numbers: The expected value, when it exists, is almost surely the limit of the sample mean as sample size grows to infinity.


So when you say:
JesseM said:
This expectation value is understood as a sum of the different possible measurement outcomes weighted by their "true" probabilities:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

...

The comment above is completely misguided, since the basic definition of "expectation value" in this experiment has nothing at all to do with knowing the value of λ, it is just understood to be:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

It clearly shows that you do not understand probability or statistics. Clearly the definition of expectation value is based on a probability-weighted sum, and the law of large numbers is used as an approximation; that is why the last sentence above says that the expectation value is "almost surely the limit of the sample mean as the sample size grows to infinity".

You are trying to restrict the definition by suggesting that the expectation value is defined ONLY over the possible paired outcomes (++, --, +-, -+) and not over possible λ's, but that is naive and short-sighted, and also ridiculous, as we will see shortly. Now let us go back to the first sentence of the Wikipedia definition above and notice the last two words: "probability measure". In case you do not know what that means, a probability measure is simply any real-valued function which assigns 1 to the entire probability space and maps events into the range from 0 to 1. An expectation value can be defined over any such probability measure, not just the one you pick and choose for argumentation purposes. In Bell's equation (2),
\int d\lambda \rho (\lambda ) = 1
Therefore ρ(λ) is a probability measure over the paired products A(a,λ)A(b,λ) and Bell's equation (2) IS defining an expectation value for paired products irrespective of any physical assumptions. There is no escape for you here.
 
  • #1,222
JesseM said:
If you disagree with any of the above, please go back and address my specific arguments in posts #1213-1215
Of course I disagree with a lot of it, for reasons I have already explained above, so I do not see the need to respond specifically. Anyone following the discussion will immediately recognize this fact. For example, you argued earlier that there was a difference between "average value of b*c for all measurements" and "average value of b*c for all triples", with the former being the one applicable to Bell's inequality:

JesseM said:
Here you seem to be talking about conditions under which an inequality like this:

|(average value of a*b for all triples in which experimenter measured a and b) + (average value of a*c for all triples in which experimenter measured a and c)| - (average value of b*c for all triples in which experimenter measures b and c) <= 1

...can be derived. This is an entirely separate issue from the other point I was arguing, which was just the idea that the above inequality is not guaranteed to hold in spite of the fact that its arithmetical analogue is guaranteed:

|(average value of a*b for all triples) + (average value of a*c for all triples)| - (average value of b*c for all triples) <= 1

Anyway, if you agree that these types of inequalities are conceptually separate, that Bell's inequality was of the top type, and that a proof of the bottom one doesn't constitute a proof of the top

You continued to object despite my argument that, as far as Bell's inequality is concerned, the two are equivalent. But now, as your argument morphs to try and avoid the trap which requires ρ(λ) to be the same between terms, you are claiming that the two really are the same with a probability close to 1, because of the law of large numbers.

JesseM said:
Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different
There is no escape for you here either.

JesseM said:
billschnieder said:
All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above.
They assume the averages from their data are close to the "true" expectation values E(a,b), E(b,c) and E(a,c), which can be justified by the law of large numbers, but there is no need to assume that the (unknown) frequencies of different values of λi which occurred in the particle pairs they sampled were anything like the "true" probability distribution p(λi). Do you disagree?
Yes, I disagree. Again, here you are grasping at straws. The law of large numbers is only able to approximate the true expectation value precisely because ρ(λi) for a very large sample will almost always not be significantly different from the true probability distribution. If it differs significantly, the law of large numbers will definitely not produce the true expectation value. Just measuring an extremely large number of points does not guarantee a representative sample.

So by assuming that the expectation values are the same for a very large number of measurements, they are in effect also assuming that the probability distribution ρ(λi) in the sample is representative of the true distribution. From these silly mistakes and your recent discussion with RUTA in another thread, I am convinced that you do not understand probability and statistics. Unless you really understand it but are just trying to obfuscate.

JesseM said:
billschnieder said:
E(a,b)= \frac{1}{N} \sum_{i}^{N}A(a,\lambda _i)B(b,\lambda _i)
How is it equivalent? It's quite possible that P(λ2) could be very different from P(λ3), for example, in which case you need to weigh the terms A(a,λ2)*B(b,λ2) and A(a,λ3)*B(b,λ3) by the probabilities of those values if you want to get an accurate expectation value. The correct discrete version would have to look like this
E(a,b)= \sum_{i}^{N}A(a,\lambda _i)*B(b,\lambda _i)*P(\lambda _i)
You were not following when I explained earlier the following:
billschnieder said:
For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2),B(b,λ2) was realized 5 times, and A(a,λ3),B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are. All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above. (This is the fair sampling assumption, which is however not the focus of this post.) So then in this case, assuming discrete λ's, Bell's equation (2) is equivalent to the following simple average

So your objection above is short-sighted because, practically, in any experiment P(λ) cannot be known, so the expectation value cannot be calculated using P(λ), but it can be calculated as a simple average from a large number of samples which is representative in the sense that the relative frequencies of realizing specific λ's are not significantly different from the true probabilities of those λ's. So your "correction" above is wrong because you failed to understand the part where I explained that the realizations of the λ's are not unique. In other words, each specific λ occurs multiple times, with a relative frequency corresponding to its probability.

JesseM said:
Still, it's important to distinguish theoretical from empirical, so let's use E(a,b) to be the "true" expectation value for the product of the measurements with settings a and b, and Avg(a,b) to be the empirical average of all the products of measurement results on a run with settings a and b, and then we can say that in the limit as the number of trials/iterations in a run goes to infinity, Avg(a,b) should approach E(a,b) with probability 1.
That is a completely artificial distinction. Bell is calculating expectation values, and the only time a simple average can be substituted for the expectation value is when it is calculated over a representative/fair sample. So your insistence on relabelling the term is just grasping at straws. If you insist on pursuing this ridiculous idea, I ask that you write down the expression for the expectation value for the following example:

You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product. Once you have done that, try and swindle your way out of the fact that
a) The structure of the expression so derived does not depend on the actual value N. ie, N could be 5, 100, or infinity.
b) The expression so derived is a theoretical expression not "empirical".
c) The expression so derived is the same as the simple average of the paired products.

JesseM said:
Again this doesn't work as a theoretical expectation value since i refers to some number of iterations, whereas a theoretical expectation value for an experiment which can give any one of N results R1, R2, ..., RN
Again this is not a serious objection, because no serious person would suggest that because we used i as the iterator in one equation, it must have the exact same meaning in a different equation. I already explained, and you understood, that in the first case, where we were doing a weighted average over λ's, i was iterating over each λ, with each specific λ occurring exactly once. In the second case, which is a simple average, i is iterating over each instance in a representative sample, with the understanding that a specific λ will occur multiple times with a relative frequency corresponding to its probability, where the actual value of N does not matter so long as the relative frequencies of ALL λ's in our theoretical list are representative of the "true" probability distribution. The two expressions so calculated are exactly equivalent and both are expectation values. So there is no genuine objection here, and no way to escape either.
 
  • #1,223
JesseM said:
...but you can't derive useful inequalities like 1 + E(b,c) >= |E(a,b) - E(a,c)| from such simple definitions! For that you need to make some physical assumptions
This is what your entire argument boils down to. You are still struggling to suggest that physical assumptions are needed to derive Bell's inequality. But as I have explained, all you need are the following purely mathematical requirements:

1) a theoretical list of triples (a,b,c) of two-valued variables restricted in value to +/-1
2) Expressions of the expectation value of cyclical paired-products extracted from the list of triples E(a*b), E(a*c) and E(b*c), which I have shown convincingly to be equivalent to <ab>, <ac> and <bc> respectively.

That is all that is needed. I have shown that the expression for the expectation value E(a,b) is similar to Bell's. I will now show, using notation analogous to that at the top of page 406 of Bell's paper, that the above necessarily leads to the inequalities obtained by Bell, without any physical assumptions. Note that despite your claims, you haven't actually pointed to any point in the derivation at which a physical assumption is required.

\langle a'b' \rangle - \langle a'c' \rangle = -\frac{1}{N}\sum_{i}^{N}(a'_i b'_i - a'_i c'_i)
since b'_i = 1/b'_i (from b'_i = \pm 1) it follows that
= \frac{1}{N}\sum_{i}^{N} a'_i b'_i \left(\frac{c'_i}{b'_i} - 1\right)
and since a'_i b'_i = \pm 1, it follows that the RHS is maximal when a'_i b'_i = 1, therefore:
\left|\langle a'b' \rangle - \langle a'c' \rangle\right| \leq \frac{1}{N}\sum_{i}^{N}(1 - b'_i c'_i)
\left|\langle a'b' \rangle - \langle a'c' \rangle\right| + \langle b'c' \rangle \leq 1
Note that you can replace a' with -a', b' with -b', or c' with -c' in the above and get the full family of Bell's original inequalities.

The above mirrors exactly what Bell did at the top of page 406! Now if you continue to argue that there is a physical assumption hidden in there, please show me, using Bell's derivation on page 406 AND the derivation above, where you think I sneaked in a physical assumption in order to obtain the same expression. Note also that if you do not understand the above derivation, it means you clearly do not understand Bell's derivation at the top of page 406.
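The purely arithmetic half of this claim is easy to brute-force: for any single list of ±1 triples, the averages of the pairwise products cannot violate the inequality. A quick sketch (random lists of 50 triples are an illustrative choice):

```python
import random

def averages(triples):
    """Pairwise averages <a'b'>, <a'c'>, <b'c'> from one list of triples."""
    n = len(triples)
    ab = sum(a * b for a, b, c in triples) / n
    ac = sum(a * c for a, b, c in triples) / n
    bc = sum(b * c for a, b, c in triples) / n
    return ab, ac, bc

random.seed(1)
for _ in range(1000):
    # a random list of 50 triples of +/-1 values
    triples = [tuple(random.choice((+1, -1)) for _ in range(3))
               for _ in range(50)]
    ab, ac, bc = averages(triples)
    # arithmetic alone guarantees this for any SINGLE list of triples;
    # the tolerance only absorbs float rounding
    assert abs(ab - ac) + bc <= 1 + 1e-12

print("no violation found in 1000 random lists")
```

Note this only confirms the arithmetic identity for one list of triples; it does not settle the disputed question of whether three separately measured runs must behave like one such list.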

JesseM said:
And I explained in #1213 that it doesn't make any sense to use these equations as the reason why ρ(λ) should be the same in all three terms.
Any serious person following Bell's derivation would have noticed that the integral on the right hand side of the first equation on page 406 is obtained by subtracting two different integrals for E(a,b) and E(a,c) and joining the integral signs into a single integral over λ. In mathematics, this is normally understood by any serious student worthy of a pass grade to mean that E(a,b) and E(a,c) are defined over the same distribution of λ. Also, in the third expression (the first inequality) on page 406, Bell factors out and recombines the A(b,λ) originally from the E(a,b) term and the A(c,λ) originally from the E(a,c) term to generate a new A(b,λ)A(c,λ) product, all under the same integral over λ, and subsequently separates the RHS into two integrals over the same λ, with the first part yielding 1 and the other yielding the E(b,c) term. Any person seriously trying to understand my argument rather than just quibble would understand that the requirement for ρ(λ) to be the same between all the terms is inherent in the derivation. Duh! No doubt you do not yet recognize that your so-called objections were rebutted by Bell himself, even before you thought of them. Sorry, no escape here either.

JesseM said:
billschnieder said:
In other words, in order to obtain all three expectation values E(a,b), E(a,c) and E(b,c), we ONLY need three lists of outcomes corresponding to A(a,.), A(b,.), A(c,.) or in simpler notation, we only need a single list of triples [(a',b',c')] to calculate all terms for

1 + <b'c'> >= |<a'b'> - <a'c'>|
No, again it seems like you are confusing theoretical terms with empirical results.
...
But that's not the situation with Bell's theorem. Rather, with Bell's theorem we have three runs with different combinations of detector settings (a,b), (b,c) and (a,c), and considering the average from each run. Bell is showing that if we know the true expectation values for each individual run, in a local realist universe they should obey:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Since each expectation value is for a different run, even if you assume that every iteration of every run is determined by a set of triples, you can't derive the above equation from arithmetic alone since each expectation value would deal with a different collection of triples.
You do not understand Bell's work. Look again at page 406 and tell me how many distinct A(.,λ) type functions you see. I can identify only three, A(a,λ), A(b,λ), A(c,λ), not 6, which is what you are claiming Bell used in his derivation. The 3 expectation values E(a,b), E(a,c) and E(b,c) are merely cyclical combinations of these same terms. So you are off base here. There is no justification in Bell's work for suggesting that he is dealing with 6 separate terms corresponding to three separate runs. You have provided no proof, either mathematical or logical, to justify the ridiculous idea that Bell's inequality is derived from 6 separate terms rather than just 3.

However, as I have been pointing out to you over and over, the reason we cannot guarantee that an actual experiment will obey Bell's inequality is that actual experiments measure 6 different terms while Bell's derivation mandates the use of only 3. So at least here you seem to be seeing the light, only backwards.
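Whatever one makes of the dispute, the purely arithmetic point is checkable: three independently collected lists of pairs are not bound by the inequality the way a single list of triples is. A contrived sketch (the pair data below are invented for illustration; no such dataset could be resorted into consistent triples):

```python
# Three runs of *pairs*, each collected independently (no common list of triples),
# chosen so that each run is internally consistent pair data, yet jointly
# violating 1 + <bc> >= |<ab> - <ac>|.
run_ab = [(+1, -1)] * 100          # <ab> = -1
run_ac = [(+1, +1)] * 100          # <ac> = +1
run_bc = [(+1, -1)] * 100          # <bc> = -1

avg = lambda pairs: sum(x * y for x, y in pairs) / len(pairs)
ab, ac, bc = avg(run_ab), avg(run_ac), avg(run_bc)

assert 1 + bc < abs(ab - ac)   # 0 < 2: the inequality is violated
```

Note the b column of the (a,b) run is always -1 while the b column of the (b,c) run is always +1, so no resorting into triples is possible; that incompatibility is exactly what lets the three averages escape the bound.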
 
Last edited:
  • #1,224
JesseM said:
See my questions and arguments about your "resorting" procedure in post #1215. First I clarified what I thought you meant by this form of "resorting" at the start of the post with a simple example, perhaps you can tell me if I've got it right or not. If I have got it right, then please address my subsequent comments and questions
Yes you claimed to have "clarified" what I mean by resorting, even though I had explained with a detailed example back in post #1187 what I meant. In any case you say:

JesseM said:
I'm not sure I follow what you mean here. Suppose we do only 4 iterations with each pair of different detector settings, and get these results (with the understanding that notation like a=+1 means 'the result with detector set to angle a was +1):

For run with setting (a,b):
1. (a=+1, b=-1)
2. (a=-1, b=-1)
3. (a=-1, b=+1)
4. (a=+1, b=-1)

For run with setting (b,c):
1. (b=-1, c=+1)
2. (b=-1, c=-1)
3. (b=-1, c=+1)
4. (b=+1,c=-1)

For run with setting (a,c):
1. (a=+1, c=-1)
2. (a=+1, c=+1)
3. (a=-1, c=-1)
4. (a=-1, c=+1)

Then we can arrange these results into four rows of three iterations from three runs, such that in each row the value of a is the same for both iterations that sampled a, in each row the value of b is the same for both iterations that sampled b, and in each row the value of c is the same for both iterations that sampled c:

1. (a=+1, b=-1) 3. (b=-1, c=+1) 2. (a=+1, c=+1)
2. (a=-1, b=-1) 1. (b=-1, c=+1) 4. (a=-1, c=+1)
3. (a=-1, b=+1) 4. (b=+1,c=-1) 3. (a=-1, c=-1)
4. (a=+1, b=-1) 2. (b=-1, c=-1) 1. (a=+1, c=-1)

Let us call your three runs (runs 1, 2, 3) and calculate <ab>, <ac> and <bc> from each one.
<a1b1> = -1/2
<a2c2> = 0
<b3c3> = -1/2

Now looking at your resorted list with 6 columns: a1, b1, a2, c2, b3, c3, we can verify that
<a1b1> = <a1b3> = <a2b3> = <a2b1> = -1/2
and
<a2c2> = <a1c2> = <a2c3> = <a1c3> = 0
and
<b3c3> = <b1c3> = <b1c2> = <b3c2> = -1/2

The reason this holds is that after resorting, all the a columns are identical, just like the b columns and the c columns. So your dataset of 6 columns is in fact just a dataset of 3 columns with each column repeated once. If a dataset cannot be sorted as you did above, all those terms are not guaranteed to be the same. And if they are not the same, Bell's inequality cannot be applied to the dataset.
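The resorting procedure under discussion can be treated as an explicit consistency test. A sketch using JesseM's example data from post #1215 (the brute-force `resort` helper is my own illustration, not anything from the thread):

```python
from itertools import permutations

# JesseM's example data: three runs of four pairs each
run_ab = [(+1, -1), (-1, -1), (-1, +1), (+1, -1)]
run_bc = [(-1, +1), (-1, -1), (-1, +1), (+1, -1)]
run_ac = [(+1, -1), (+1, +1), (-1, -1), (-1, +1)]

def resort(run_ab, run_bc, run_ac):
    """Brute-force search for a row alignment in which every row agrees on a, b, c.
    Returns the list of (a, b, c) triples if one exists, else None."""
    for p_bc in permutations(run_bc):
        for p_ac in permutations(run_ac):
            if all(ab[1] == bc[0] and ab[0] == ac[0] and bc[1] == ac[1]
                   for ab, bc, ac in zip(run_ab, p_bc, p_ac)):
                return [(ab[0], ab[1], bc[1]) for ab, bc in zip(run_ab, p_bc)]
    return None

avg = lambda pairs: sum(x * y for x, y in pairs) / len(pairs)

triples = resort(run_ab, run_bc, run_ac)
if triples is not None:
    # When resorting succeeds, cross-run correlations computed from the triples
    # necessarily match the within-run ones
    assert avg([(a, b) for a, b, c in triples]) == avg(run_ab)
    assert avg([(b, c) for a, b, c in triples]) == avg(run_bc)
    assert avg([(a, c) for a, b, c in triples]) == avg(run_ac)
```

When `resort` returns None, the three runs cannot be read as samples from one common list of triples, which is the failure mode being argued about here.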

JesseM said:
If so, I don't see how this ensures that "ρ(λi) is the same for all three terms of the inequality", or what you even mean by that. For example, isn't it possible that if the number of possible values of λ is 1000, then even though iteration #1 of the first run has been grouped in the same row as iteration #3 of the second run and iteration #2 of the third run (according to their original labels), that doesn't mean the value of λ was the same for each of these three iterations?

Please, pay attention for once: every pair of outcomes at those angles is deterministically determined by the specific λ realized for that iteration. So if, for example, we had only 5 possible λ's (λ1, λ2, λ3, λ4, λ5), the only possible outcomes are (++, +-, -+, --), which means some of the λ's must result in the same outcome. Say λ5 and λ3 each deterministically result in the same outcome (++), and each of them was realized in the experiment exactly once. When you resort, it doesn't matter whether the (++) at the top of the resorted list corresponds to λ5 or λ3, for the following reasons. If in your large number of iterations λ5 and λ3 are fairly represented, you will still have the right number of (++)'s for both λ5 and λ3, and it doesn't matter if the specific (++) you got at the top is a λ5 (++) or a λ3 (++). Also, if for the three angles under consideration a, b, c a number of λ's deterministically resulted in the same outcomes for (a,b), (b,c) and (a,c), those lambdas are effectively equivalent as far as the experiment is concerned, and you could combine them, updating the combined P(λ) appropriately. Finally, as clearly explained in my posts #1211 and #1212, being able to sort the data is a test of whether the data meets the mathematical consistency required by Bell's derivation, in which the (b,c) term is derived by factoring out the b from the (a,b) term and the c from the (a,c) term and multiplying them together. Such factorization imposes a consistency requirement: unless you can do that, the inequality cannot be derived, and any data which cannot be factored likewise is mathematically incompatible with the inequality.

JesseM said:
Even if the data was drawn from triples, and the probability of different trials didn't depend on the detector settings on each run, there's no guarantee you'd be able to exactly resort the data in the manner of my example in post #1215, where we were able to resort the data so that every row (consisting of three pairs from three runs) had the same value of a,b,c throughout
That is why I cautioned you earlier not to prematurely blurt out your claim that conspiracy must be involved for ρ(λi) to be different. Now we get an admission, however reluctant, that it is possible for ρ(λi) to be different without conspiracy. You see, the less you talk (write), the less you will have to recant later, as I'm sure you are realizing.
 
  • #1,225
JesseM said:
If the frequencies of each of the 8 types of triples differed significantly in three runs with a significant (say, 1000 or more) number of trials in each, this would imply either an astronomically unlikely statistical miracle or it would imply that the no-conspiracy assumption is false and that the true probabilities of different triples actually does change depending on the detector settings.
First I would like you to explain where you pulled the 1000 number from. What rule of mathematics, statistics, or any other field of science enabled you to suggest that 1000 or more was a significantly large number of trials?
Secondly, I already explained to you in my response to your scratch lotto example that all you need to violate that requirement is for the probability of detection to vary with angle. In other words, a biased sample will do that without any conspiracy. Since the rest of the arguments above have failed, I predict that you will hang on this one and try to change the discussion to one about scratch lotto cards. Let's wait and see ...

JesseM said:
Not sure I follow what you mean here. Are you suggesting that even if we had a triple like a=+1, b=-1, c=+1 we might still get result -1 with detector setting a?
Why would you choose the most improbable of meanings? I mean that the list of pairs is not representative of the list of triples. Which clearly means that the relative frequency of each specific pair in the list of pairs is not the same as the relative frequency of the same pair in the list of triples.
JesseM said:
In any of these cases, Bell's inequality does not and can not apply to the data. In other words, it is simply a mathematical error to use the inequality in such situations.
No, the fact that Bell's inequality is observed not to work is empirical evidence that one of the assumptions used in the derivation must be false, like the assumption that local realism is true
Hehe, you are again grasping at straws here, trying to sneak in a physical assumption. I have just exhaustively and conclusively explained to you that the requirement to be able to sort the data, and for ρ(λi) to be the same across the three terms, is a mathematical requirement of Bell's derivation. In other words, Bell could not have derived his inequalities if these were false. I have also pointed out, and you agreed, that in any real experiment these mathematical requirements are not guaranteed to be obeyed. So contrary to your claim that experiments violate Bell's inequality due to the failure of some other physical assumption, which you haven't demonstrated to be material for deriving the inequality, the real reason is failure to meet the mathematical conditions that must hold for the inequality to apply to the data.

JesseM said:
billschnieder said:
If we can not do this, it means either that:
a) our data is most likely of the second kind in which randomization did not keep the pairs together or
Well, we know this does not apply in Bell tests, where every data pair is always from a single trial with a single pair of measurements on a single pair of entangled particles.
You do not understand Bell test experiments then. Contrary to your claims, it applies because experimenters are not always sure which particle on one arm corresponds to which particle on the other arm. Have you ever heard of the coincidence time window?

JesseM said:
Also note that these represent the only scenarios in which "average value of a*b for all triples" is different from "average value of a*b for measured pairs only". And in this case, the fair sampling assumption can not hold
What do you mean by "fair sampling assumption"? This page says "It states that the sample of detected pairs is representative of the pairs emitted", but that could be true and Bell's inequality could still fail for some other reason like a violation of the no-conspiracy assumption.
Another objection for objection's sake. You object, but then present a definition which is essentially what I have given.

billschnieder said:
c) our lists of pairs are not representative of the list of triples from which they arose
If you see a difference, illustrate it.

JesseM said:
Having replied to these, I saw nothing in them that could be considered a rebuttal of any of the points I made in #1213-#1215. I indicated in my replies to #1211 and #1212 where I thought various claims made in those posts had been disputed or questioned in #1213-#1215, so if you disagree with some of the things I say in my recent replies you can go back and address the corresponding arguments/questions in the earlier posts.
All I saw was quibbling, unsubstantiated claims and nothing substantive as I have illustrated in the last few posts.
 
  • #1,226
billschnieder said:
First of all, I said the equation is Bell's definition of HIS expectation values for the situation he is working with.
But then you use that to come to the absurd conclusion that in order to compare with empirical data, we need to make some assumptions about the distribution of values of λ on our three runs. We don't--Bell was writing for an audience of physicists, who would understand that whenever you talk about an "expectation value", the basic definition is always just a sum over each possible measurement result times the probability of that result, so to compare with empirical measurements you just take the average result on all your trials, nothing more. Bell obviously did not mean for his integrals to be the definitions of E(a,b) and E(b,c) and E(a,c), implying that you can only compare them with empirical data if you have actually confirmed that \rho(\lambda) was the same for each run--rather he was making an argument that the "expectation values" as conventionally understood would also be equal to those integrals.
billschnieder said:
Secondly, nobody said anything about the probabilities in the equation not being true probabilities, so you are complaining about an inexistent issue.
You understand that the "true probabilities" represent the frequencies of different outcomes in the limit as the number of trials goes to infinity, and not the actual frequencies in our finite series of trials? So for example, if one run with settings (a,b) included three trials where λ took the value λ3, while another run with settings (b,c) included no trials where it took the value λ3, this wouldn't imply that ρ(λi) differed in the integrals for E(a,b) and E(b,c)? Because your comment at the end of post #1224 suggests you are still confusing the issue of what it means for the "true probabilities" ρ(λi) to differ depending on the detector settings with what it means for the actual frequencies of different values of λi to differ on runs with different detector settings:
billschnieder said:
JesseM said:
Even if the data was drawn from triples, and the probability of different trials didn't depend on the detector settings on each run, there's no guarantee you'd be able to exactly resort the data in the manner of my example in post #1215, where we were able to resort the data so that every row (consisting of three pairs from three runs) had the same value of a,b,c throughout
That is why I cautioned you earlier not to prematurely blurt out your claim that conspiracy must be involved for ρ(λi) to be different. Now we get an admission, however reluctant, that it is possible for ρ(λi) to be different without conspiracy. You see, the less you talk (write), the less you will have to recant later, as I'm sure you are realizing.
So, kinda seems like this is not actually a dead issue. You may have noticed I discussed exactly this distinction between the "true probability distribution" ρ(λi) differing from one run to another and the actual frequencies of different λi's differing from one run to another at the very start of post #1214, but since you didn't respond I don't know if you even read that or what you thought of the distinction I was making there.
billschnieder said:
Thirdly, you object to my statement but go on to say the exact same thing. This is what I said after the equation:
Theoretically the above makes sense, where you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and then add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)
You really think that this is the "exact same thing" as what I was saying? Here your "practical" average requires us to know which value of λ occurred on each trial, and what the probability of each value was! Of course this is nothing like what I mean when I talk about comparing the theoretical expectation value to actual experimental data. Again, a definition of the expectation value involving "true probabilities" would be:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

So if you want to compare with empirical data on a run where the detector settings were a and b, it'd just be:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

...which is equivalent to just computing the product of the two measurements on each trial, and adding them all together and dividing by the number of trials to get the empirical average for the product of the two measurements on all trials in the run.
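The equivalence described here (the outcome-frequency sum vs. the simple average of per-trial products) is an algebraic identity and easy to verify. A sketch with simulated trial data (invented purely for illustration):

```python
import random
from collections import Counter

random.seed(1)
# Hypothetical run: 1000 trials, each an (A, B) outcome pair in {+1, -1}
trials = [(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(1000)]
n = len(trials)

# Outcome-frequency form: sum over the four joint results of (product * fraction)
freq = Counter(trials)
e_freq = sum(a * b * count / n for (a, b), count in freq.items())

# Simple average of the per-trial products
e_avg = sum(a * b for a, b in trials) / n

assert abs(e_freq - e_avg) < 1e-12
```

Grouping identical products before weighting by their fraction is just a reordering of the same sum, which is why the two computations always agree.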

You quote my simple equation for E(a,b) above and say:
It clearly shows that you do not understand probability or statistics. Clearly the definition of expectation value is based on probability weighted sum,
Which mine is--I'm multiplying each possible result by the probability of that result, for example the result (+1*-1) is multiplied by P(detector with setting a gets result +1, detector with setting b gets result -1)
billschnieder said:
and the law of large numbers is used as an approximation; that is why the last sentence above says that the expectation value is "almost surely the limit of the sample mean as the sample size grows to infinity"
Of course. In the limit as the number of trials goes to infinity, we would expect this:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

to approach this:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

...where all the probabilities in the second expression represent the "true probabilities", i.e. the fraction of trials with that outcome in the limit as the number of trials goes to infinity!

So, it's not clear why you think the wikipedia definition of expectation value is somehow different from mine, or that I "do not understand probability or statistics". Perhaps you misunderstood something about my definition.
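The convergence appealed to here is easy to illustrate. A sketch (the outcome probabilities below are invented for illustration):

```python
import random

random.seed(2)

# Hypothetical "true" probabilities for the four joint outcomes (summing to 1)
P = {(+1, +1): 0.4, (+1, -1): 0.1, (-1, +1): 0.1, (-1, -1): 0.4}
true_E = sum(a * b * p for (a, b), p in P.items())   # = 0.6

outcomes, weights = zip(*P.items())

def empirical_E(n):
    # Average of the per-trial products over n simulated trials
    sample = random.choices(outcomes, weights=weights, k=n)
    return sum(a * b for a, b in sample) / n

# By the law of large numbers the sample average is very unlikely to be far
# from true_E once n is large (standard error ~ 0.8/sqrt(n) here)
assert abs(empirical_E(200_000) - true_E) < 0.02
```

Nothing about λ enters the computation: only the observable ±1 results and their frequencies are needed to compare against the expectation value.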
billschnieder said:
You are trying to restrict the definition by suggesting that the expectation value is defined ONLY over the possible paired outcomes (++, --, +-, -+) and not possible λ's, but that is naive and short-sighted, and also ridiculous, as we will see shortly.
No, all expectation values are just defined as a sum over all possible results times the probability of each possible result. And in this experiment the value of λ is not a "result"; the "result" on each trial is just +1 or -1.
billschnieder said:
Now let us go back to the first sentence of the wikipedia definition above and notice the last two words, "probability measure". In case you do not know what that means, a probability measure is simply any real-valued function which assigns 1 to the entire probability space and maps events into the range from 0 to 1. An expectation value can be defined over any such probability measure, not just the one you pick and choose for argumentation purposes. In Bell's equation (2),
\int d\lambda \rho (\lambda ) = 1
Therefore ρ(λ) is a probability measure over the paired products A(a,λ)A(b,λ)
No, ρ(λ) is a probability measure over values of λ, and it happens to be true (according to Bell's physical assumptions) that the value of λ along with the detector angles completely determines the results on each trial. But you can also define a probability measure on the results themselves, that would just be a measure that assigns probabilities between 0 and 1 to each of the four possible results:

1. (detector with setting a gets result +1, detector with setting b gets result +1)
2. (detector with setting a gets result +1, detector with setting b gets result -1)
3. (detector with setting a gets result -1, detector with setting b gets result +1)
4. (detector with setting a gets result -1, detector with setting b gets result -1)

With the sum of the four probabilities equalling one. That's exactly the sort of probability measure I was assuming when I wrote down my equation:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

And when trying to compare an equation involving expectation values to actual empirical results, every physicist would understand that you don't need to even consider the question of what values λ may have taken on your experimental runs, instead you'd just compute something like this:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

...which, by the law of large numbers, is terrifically unlikely to differ significantly from the "true" expectation value if you have done a large number of trials. If you think a physicist comparing experimental data to Bell's inequality would actually have to draw any conclusions about the values of λ on the experimental trials, I guarantee you that your understanding is totally idiosyncratic and contrary to the understanding of all mainstream physicists who talk about testing Bell's inequality empirically.
billschnieder said:
Bell's equation (2) IS defining an expectation value for paired products irrespective of any physical assumptions. There is no escape for you here.
If equation (2) was supposed to be the definition of the expectation value, rather than just an expression that he would expect the expectation value (under its 'normal' meaning, the one I've given above involving only actual measurable results and the probabilities of each result) to be equal to, then why do you think he would need to make physical arguments as to why equation (2) should be the correct form? Do you deny that he did make physical arguments for the form of equation (2), like in the first paper where he wrote:
Now we make the hypothesis, and it seems one at least worth considering, that if the two measurements are made at places remote from one another the orientation of one magnet does not influence the result obtained with the other. Since we can predict in advance the result of measuring any chosen component of \sigma_2, by previously measuring the same component of \sigma_1, it follows that the result of an such measurement must actually be predetermined. Since the initial quantum mechanical wave function does not determine the result of an individual measurement, this predetermination implies the possibility of a more complete specification of the state.

Let this more complete specification be effected by means of parameters λ ... the result A of measuring \sigma_1 \cdot a is then determined by a and λ, and the result B of measuring \sigma_2 \cdot b in the same instance is determined by b and λ
Do you disagree that here the first paragraph is providing physical justification for why A is a function only of a and λ but not b, and why B is a function of b and λ but not a, along with a justification for why we should believe the result A can be completely determined by a and the hidden parameters λ in the first place? Likewise, in the paper http://cdsweb.cern.ch/record/142461/files/198009299.pdf , would you deny that this section from p. 16 of the pdf (p. 15 of the paper) is trying to provide physical justification for why the same function ρ(λ) appears in different integrals for different expectation values like E(a,b) and E(b,c)?
Secondly, it may be that it is not permissible to regard the experimental settings a and b in the analyzers as independent variables, as we did. We supposed them in particular to be independent of the supplementary variable λ, in that a and b could be changed without changing the probability distribution ρ(λ). Now even if we have arranged that a and b are generated by apparently random radioactive devices, housed in separate boxes and thickly shielded, or by Swiss national lottery machines, or by elaborate computer programmes, or by apparently free willed experimental physicists, or by some combination of all of these, we cannot be sure that a and b are not significantly influenced by the same factors λ that influence A and B. But this way of arranging quantum mechanical correlations would be even more mind boggling than one in which causal chains go faster than light. Apparently separate parts of the world would be deeply and conspiratorially entangled, and our apparent free will would be entangled with them.
If you don't disagree that these sections are attempts to provide physical justification for the form of the integrals he writes, why do you think he would feel the need to provide physical justification if he didn't have some independent meaning of "expectation values" in mind, like the meaning I talked about above involving just the different results and the probabilities of each one?
 
Last edited by a moderator:
  • #1,227
JesseM said:
You understand that the "true probabilities" represent the frequencies of different outcomes in the limit as the number of trials goes to infinity, and not the actual frequencies in our finite series of trials?

You do not understand probability either. Say I give you the following list of outcomes:

++
--
-+
+-

And ask you to calculate P(++) from it. Clearly the probability is the number of times (++) occurs in the list divided by the number of entries in the list. The list does not have an infinite number of entries; there is no need to perform an infinite number of trials in order to deduce the probability. And even if you did perform a large number of trials, you would not get exactly the true probability, which is 1/4. So your "law of large numbers" cop-out is an approximation of the true probability, not its definition. You need to learn some basic probability theory here because you are way off base.
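The two notions being argued past each other here, a frequency computed from a given finite list versus the limiting frequency over ever more trials, can be put side by side. A sketch (the sampling code is my own illustration):

```python
import random

random.seed(3)

# Relative frequency in the given 4-entry list: exact, no limit needed
outcomes = ["++", "--", "-+", "+-"]
p_list = outcomes.count("++") / len(outcomes)          # exactly 0.25

# Relative frequency in a large random sample from the same uniform source:
# close to, but generally not exactly, the limiting value 1/4
sample = [random.choice(outcomes) for _ in range(10_000)]
p_sample = sample.count("++") / len(sample)
```

The first quantity is a property of the list itself; the second only approaches 1/4 as the sample grows, which is the sense in which the law of large numbers is invoked.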

JesseM said:
But then you use that to come to the absurd conclusion that in order to compare with empirical data, we need to make some assumptions about the distribution of values of λ on our three runs. We don't--Bell was writing for an audience of physicists, who would understand that whenever you talk about an "expectation value", the basic definition is always just a sum over each possible measurement result times the probability of that result
Sorry JesseM, but that bubble has already been burst, since I proved conclusively that you do not know the meaning of "expectation value". To show how silly this adventitious argument of yours is, I asked you a simple question and I dare you to answer it:

billschnieder said:
You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product. Once you have done that, try and swindle your way out of the fact that
a) The structure of the expression so derived does not depend on the actual value N. ie, N could be 5, 100, or infinity.
b) The expression so derived is a theoretical expression not "empirical".
c) The expression so derived is the same as the simple average of the paired products.

JesseM said:
So for example, if one run with settings (a,b) included three trials where λ took the value λ3, while another run with settings (b,c) included no trials where it took the value λ3, this wouldn't imply that ρ(λi) differed in the integrals for E(a,b) and E(b,c)? Because your comment at the end of post #1224 suggests you you are still confusing the issue of what it means for the "true probabilities" ρ(λi) to differ depending on the detector settings and what it means for the actual frequencies of different values of λi to differ on runs with different detector settings
You are sorely confused. Note I use ρ(λi), not P(λi), to signify that we are dealing with a probability distribution, which is essentially a function defined over the space of all λ, with integral over all λ equal to 1.

If the (a,b) run included N iterations with three of those corresponding to λ3, then P(λ3) for our dataset = 3/N. But if in a different run of the experiment (b,c) none of the λ's was λ3, then P(λ3) = 0 for that dataset. It therefore means the probability distribution ρ(λi) cannot be the same for E(a,b) and E(b,c). If this is still too hard for you, let me simplify further.

According to Bell, E(a,b) is calculated by the following sum:

a1*b1*P(λ1) + a2*b2*P(λ2) + ... + an*bn*P(λn), where n is the total number of possible distinct lambdas. ρ(λ) is a function which maps a specific λi to its probability P(λi). By definition therefore, if the function ρ(λ) is the same for two runs of the experiment, it must produce the same P(λi) in both cases. In other words, if it produced different values of P(λi), such as 3/N in one case and 0 in another, it means ρ(λ) is necessarily different between the two, and the runs cannot be used together as a valid source of terms for comparison with Bell's inequality.
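This weighted-sum reading can be made concrete. With hypothetical deterministic outcomes (the product table and probabilities below are invented purely for illustration), two different λ-distributions fed into the same sum yield different expectation values:

```python
# Hypothetical local-deterministic model: for each lambda, the product
# A(a,lam)*B(b,lam) at fixed settings (a, b) is predetermined (tabulated by hand)
product_ab = {1: +1, 2: -1, 3: -1}

def E(rho):
    # Bell-style weighted sum over a discrete distribution rho = {lam: P(lam)}
    return sum(p * product_ab[lam] for lam, p in rho.items())

rho_run1 = {1: 0.3, 2: 0.5, 3: 0.2}   # lambda-3 realized 20% of the time
rho_run2 = {1: 0.5, 2: 0.5, 3: 0.0}   # lambda-3 never realized

e1, e2 = E(rho_run1), E(rho_run2)
# e1 = 0.3 - 0.5 - 0.2 = -0.4  vs  e2 = 0.5 - 0.5 = 0.0
```

Whether such a difference between runs reflects a real physical possibility or only sampling fluctuation is exactly what the two posters dispute; the sketch only shows that the sum itself is sensitive to the distribution.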

JesseM said:
billschnieder said:
Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) for all λ, which is essentially the expectation value. Theoretically the above makes sense, where you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and then add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are.
You really think that this is the "exact same thing" as what I was saying? Here your "practical" average requires us to know which value of λ occurred on each trial
Oh come on! At least be honest about what you claim I am saying! Why would you need to know λ for each trial if you are calculating a simple average!? Go back and answer the example I requested, for the expectation value for N pairs of real-valued numbers x and y, and if you still do not understand how ridiculous this sounds, ask again and I will explain it in yet simpler terms, assuming it is possible to simplify this any further.
 
  • #1,228
JesseM said:
So if you want to compare with empirical data on a run where the detector settings were a and b, it'd just be:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

...which is equivalent to just computing the product of the two measurements on each trial, and adding them all together and dividing by the number of trials to get the empirical average for the product of the two measurements on all trials in the run.
Despite your empty protests, you are still unable to show why the above will be different from a simple average <ab>. Oh wait, you actually agree with my statement that:

billschnieder said:
But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes
So yeah, you are saying the exact same thing after objecting to it!
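The equivalence both sides seem to accept at this point, a sum over the four joint outcomes weighted by their frequencies versus a plain average of the per-trial products, is easy to verify numerically. The ±1 pairs below are randomly generated stand-ins, not real data:

```python
import random

random.seed(0)

# Stand-in data: each trial yields a pair of +/-1 outcomes.
trials = [(random.choice([1, -1]), random.choice([1, -1])) for _ in range(10_000)]
n = len(trials)

# Form 1: sum over the four possible joint outcomes, weighted by empirical frequency.
e1 = sum(a * b * trials.count((a, b)) / n for a in (1, -1) for b in (1, -1))

# Form 2: simple average of the per-trial products.
e2 = sum(a * b for a, b in trials) / n

assert abs(e1 - e2) < 1e-12  # the two forms agree on any dataset
```

Regrouping the sum of products by outcome type is all that separates the two forms, which is why they must agree on any list of ±1 pairs.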

JesseM said:
So, it's not clear why you think the wikipedia definition of expectation value is somehow different from mine, or that I "do not understand probability or statistics"
It is different because yours restricts expectation values to only the possible outcomes (++, --, +-, -+), even though expectation values are defined for any probability measure. ρ(λ) is a probability measure over all outcomes; therefore, Bell's equation (2) is a standard mathematical expression for an expectation value, contrary to your morphing claims.

JesseM said:
No, all expectation values are just defined as a sum over all possible results times the probability of each possible result. And in this experiment the value of λ is not a "result", the "result" on each trial is just +1 or -1.
...
No, ρ(λ) is a probability measure over values of λ
Hehe, this is precisely an example of why I say you do not understand probability theory and statistics. In Bell's equation (2), the pair [A(a,λ)B(b,λ)] defines an event, the probability of the event [A(a,λ)B(b,λ)] occurring is P(λ), therefore ρ(λ) IS a probability measure over [A(a,λ)B(b,λ)] whether you like it or not. There are lots of references online. Find me one which says otherwise. No physical assumption is required to obtain this blatant mathematical definition.

JesseM said:
But you can also define a probability measure on the results themselves, that would just be a measure that assigns probabilities between 0 and 1 to each of the four possible results:

1. (detector with setting a gets result +1, detector with setting b gets result +1)
2. (detector with setting a gets result +1, detector with setting b gets result -1)
3. (detector with setting a gets result -1, detector with setting b gets result +1)
4. (detector with setting a gets result -1, detector with setting b gets result -1)

With the sum of the four probabilities equalling one. That's exactly the sort of probability measure I was assuming when I wrote down my equation:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)
This is an admission that you were wrong to suggest that Bell's equation (2) is not a valid expectation value unless physical assumptions are also made. Nobody is arguing that there are no other valid mathematical expressions for an expectation value. You were the one arguing that a mathematically defined expectation value must be the one you chose and not the one Bell chose. I'm happy you are now backtracking from that ridiculous position.

JesseM said:
If you think a physicists comparing experimental data to Bell's inequality would actually have to draw any conclusions about the values of λ on the experimental trials, I guarantee you that your understanding is totally idiosyncratic and contrary to the understanding of all mainstream physicists who talk about testing Bell's inequality empirically.
Grasping at straws here, to make it look like there is something I said which you object to. Note that you start the triumphant statement with an IF and then go ahead to hint that what you are condemning is actually something I think, but you provide no quote of mine in which I said anything of the sort. I thought this kind of tactic was relegated to talk-show TV and political punditry.
 
  • #1,229
JesseM said:
If equation (2) was supposed to be the definition of the expectation value, rather than just an expression that he would expect the expectation value (under its 'normal' meaning, the one I've given above involving only actual measurable results and the probabilities of each result) to be equal to, then why do you think he would need to make physical arguments as to why equation (2) should be the correct form? Do you deny that he did make physical arguments for the form of equation (2) ...
Duh! The whole point is that no physical assumptions are needed! This issue would be dead had you not argued vehemently that without extra physical assumptions, Bell's equation (2) will not be a standard mathematical expression for the expectation value of paired products.

You apparently did not see the following in my earlier post #1211:
billschnieder said:
You could say the reason Bell obtained the same expression is because he just happened to be dealing with two functions which can have values (+1 and -1) for physical reasons, and experiments producing a list of such pairs. And he just happened to be interested in the pair product of those functions for physical reasons. But the structure of the calculation of the expectation value is determined entirely by the mathematics and not the physics. Once you have two variables with values (+1 and -1) and a list of pairs of such values, the above equations should arise no matter the process producing the values, whether physical, mystical, non-local, spooky, super-luminal, or anything you can dream about. That is why I say the physical assumptions are peripheral.
So while it is true that Bell discussed the physical issues of local causality, those issues are peripheral as I have already explained.

JesseM said:
If you don't disagree that these sections are attempts to provide physical justification for the form of the integrals he writes, why do you think he would feel the need to provide physical justification if he didn't have some independent meaning of "expectation values" in mind, like the meaning I talked about above involving just the different results and the probabilities of each one?

Because the meaning of the expression is clear from the expression Bell wrote himself. He is multiplying the paired product A(a,λ)B(b,λ) by its probability P(λ) and integrating over all λ. That is the mathematical definition of an expectation value. You are the one trying to impose on Bell's equation a meaning he did not intend, as is evident from what he himself wrote in his original paper. You can't escape this one.

For example:

Let us define A(a,λ) = ±1 and B(b,λ) = ±1 just like Bell, and say that the functions represent the outcomes of two events at two stations, one on Earth (A) and another (B) on planet 63, and in our case λ represents non-local mystical processes which, together with certain settings on the planets, uniquely determine the outcome. We also allow, in our spooky example, the setting a on Earth to remotely affect the choice of b instantaneously, and vice versa. Note that in our example there is no source producing any entangled particles; everything is happening instantaneously.

The expectation value for the paired product of the outcomes at the two stations is exactly the same as Bell's equation (2). If you disagree, explain why it would be different or admit that the physical assumptions are completely peripheral.
 
  • #1,230
EPR is essentially about conservation (though along the line of the question the paper also raises quite different issues, e.g., "is QM a complete theory?"). For a classical pair, magnetic momentum (for instance) would be conserved along ANY direction, but also along ALL directions. In QM, only one direction at a time makes sense, so the spin projection is conserved along ANY direction but NOT ALONG ALL directions. Think of the Uncertainty Principle with reversed time, as proved in 1931 by Einstein, Tolman, and Podolsky. Bell's theorem assumes a form of realism not proven to make sense in the microcosm, at least for the type of coordinates we know (Einstein, like Schrödinger, thought that one should use other variables, but would have considered Bell's hidden variables very naive). Assuming, like Bell, a form of naive microscopic realism that would let one make sense, e.g., of spin projections along at least 3 directions, John Bell proved an inequality already known to Boole in the late nineteenth century for macroscopic properties, where only realism counts. The (nice) experiments supposed to "prove action at a distance" ONLY proved QM to be right, something that competent people did not doubt much anyway: they prove that realism and locality (absence of action at a distance, so to speak) cannot both hold true, but the only interesting question is whether realism (at least in the classical form, i.e., valid for all observables) holds true in the microcosm. A proof has just appeared in the European Journal of Physics to the effect that a Bell theorem holds true without assuming locality, en route, perhaps, to proving that (classical) realism is false.
 
  • #1,231
Bill, from reading the last two pages, this seems like a pretty straightforward example of you being mistaken and JesseM being correct. Posting in bulk isn't changing this, or obscuring that fact in any way from those of us reading this thread. I just thought you might want that reality check-in.
 
  • #1,232
nismaratwork said:
Posting in bulk isn't changing this

Yeah, and the extremely funny thing is that Bill is accusing others of writing too loooooooooong posts!?

(:biggrin:)
 
  • #1,233
charlylebeaugosse said:
A proof has just appeared in the European Journal of Physics to the effect that a Bell theorem holds true without assuming locality, en route to prove that (classical) realism is false, perhaps.

Extremely interesting! Any links?


P.S. Welcome to PF charlylebeaugosse! :wink:
 
  • #1,235
Last edited:
  • #1,236
DrChinese said:
... Also, this author has written other articles claiming that Bell leads to a rejection of what he calls "weak realism".

I don’t know... but there seems to be other things that are a little "weak" also...? Like this:
"As a consequence classical realism, and not locality, is the common source of the violation by nature of all Bell Inequalities."

I may be stupid, but I always thought one has to make a choice between locality and realism? You can’t have both, can you?

And what is this?
"We prove versions of the Bell and the GHZ theorems that do not assume locality but only the effect after cause principle (EACP) according to which for any Lorentz observer the value of an observable cannot change because of an event that happens after the observable is measured."

To me this is contradictory. If you accept nonlocality, you must accept that the (nonlocal) effect comes before the cause (at speed of light)?
 
  • #1,237
DevilsAvocado said:
I don’t know... but there seems to be other things that are a little "weak" also...? Like this:
"As a consequence classical realism, and not locality, is the common source of the violation by nature of all Bell Inequalities."

I may be stupid, but I always thought one has to make a choice between locality and realism? You can’t have both, can you?

And what is this?
"We prove versions of the Bell and the GHZ theorems that do not assume locality but only the effect after cause principle (EACP) according to which for any Lorentz observer the value of an observable cannot change because of an event that happens after the observable is measured."

To me this is contradictory. If you accept nonlocality, you must accept that the (nonlocal) effect comes before the cause (at speed of light)?

There are some signs - and this is one, GHZ being another, and there are others too - that realism flat out fails no matter what. You could also simply say that reality is contextual and get the same effect. The time symmetry interpretations as well as MWI fall into this category. Pretty much all of the Bohmian/dBBers also acknowledge contextuality.

Keep in mind that in Delayed Choice setups, you can have after the fact entanglement. So that pretty much wrecks his EACP anyway.
 
  • #1,238
DrChinese said:
Keep in mind that in Delayed Choice setups, you can have after the fact entanglement. So that pretty much wrecks his EACP anyway.

Thanks DrC. Great to have you back as the "Concierge" in this messy thread... :wink:
 
  • #1,239
DevilsAvocado said:
Thanks DrC. Great to have you back as the "Concierge" in this messy thread... :wink:

More like the con rather than the concierge. :smile:

Hey, look at my post count! Although JesseM has been smearing me lately on post length...
 
Last edited:
  • #1,240
DrChinese said:
More like the con

But not on Shutter Island, right!?

(:biggrin:)
 
  • #1,241
Message to the Casual Reader

Maybe you are confused by what’s going on in this thread. And maybe you don’t know what to think about extensive and overcomplicated mathematical formulas, claiming to be a serious "rebuttal" of Bell's inequality.

Don’t worry. You are not alone. Let's untie this spurious "Gordian knot".

As already said – all this can be understood by a gifted 10-year-old (which includes DrC & Me, where the former is gifted :smile:).

Let’s start from the beginning, with Bell's theorem:
Wikipedia – Bell's theorem

In theoretical physics, Bell's theorem (AKA Bell's inequality) is a no-go theorem, loosely stating that:
No physical theory of local hidden variables can ever reproduce all of the predictions of quantum mechanics.

It is the most famous legacy of the late physicist John S. Bell.

Bell's theorem has important implications for physics and the philosophy of science as it proves that every quantum theory must violate either locality or counterfactual definiteness.


Right there we can see that "some" in this thread have totally misinterpreted the very basics about Bell's theorem/Bell's inequality – Quantum Mechanics must violate either locality or counterfactual definiteness.

Bell's Theorem is not a diehard proof of nonlocality, never was, never will be.

Counterfactual definiteness (CFD) is another word for objective Realism, i.e. the assumption that objects, and the properties of objects, have a definite physical existence whether or not they are measured or observed.

Therefore we can say: Bell's Theorem proves that QM must violate either Locality or Realism.

If we combine Locality and Realism, we get Local Realism (LR), i.e. an object is influenced directly only by its immediate surroundings, and has an objective existence even when not measured.

Now we can see that: Bell's Theorem proves that QM violates Local Realism (LR).

Local Realism just doesn’t work with current understanding of Quantum Mechanics. Note that this is a totally different thing than faster than light (FTL) messaging.



Furthermore we can see that billschnieder, for example, is convinced that Bell's Theorem is an empirical "law of nature", and that if he can find a mathematical flaw in this "law of nature", all goes down the drain, including 45 years of hard work. Which is of course utterly silly and stupid, because it’s not a "law of nature", it’s a Theorem:
http://en.wikipedia.org/wiki/Theorem

Theorems have two components, called the hypotheses and the conclusions. The proof of a mathematical theorem is a logical argument demonstrating that the conclusions are a necessary consequence of the hypotheses, in the sense that if the hypotheses are true then the conclusions must also be true, without any further assumptions. The concept of a theorem is therefore fundamentally deductive, in contrast to the notion of a scientific theory, which is empirical.


Deductive reasoning constructs or evaluates deductive arguments, which attempts to show that a conclusion necessarily follows from a set of premises.

Quantum mechanics, on the other hand, is an empirical scientific theory, where information is gained by means of observation, experience, or experiment.

billschnieder is comparing apples and oranges, without knowing what he's doing – in a last hysterical attempt to find some "flaw" in Bell's Theorem:
billschnieder said:
For a dataset of triples, Bell's inequality can never be violated, not even by spooky action at a distance! ... In other words, it is mathematically impossible to violate the inequalities for a dataset of triples, irrespective of the physical situation generating the data, whether it is local causality or FTL.


Pretty obvious, isn’t it? He’s fighting in the dark, totally obsessed with FTL, and completely in ignorance of the other half in Local Realism.

billschnieder is also convinced that he is in possession of the highest IQ of all time. That his simple "High School Freshman Discovery" has been overlooked by thousands of extremely brilliant scientists – including Nobel Laureates – and that none of them saw this very simple "rebuttal": To violate Bell's inequality we need a dataset of TRIPLES from TWO entangled objects!

Besides being totally hilarious, it’s an inevitable fact that we are dealing with a clear case of the dreadful Dunning–Kruger effect: http://en.wikipedia.org/wiki/Dunning–Kruger_effect

Bell's Inequality is a concept, an idea, how to finally settle the long debate between Albert Einstein and Niels Bohr regarding the EPR paradox. Bell's Inequality is not one single mathematical solution – it can be defined in many ways – as DrChinese points out very well:
DrChinese said:
One of the things that it is easy to lose sight of - in our discussions about spin/polarization - is that a Bell Inequality can be created for literally dozens of attributes. Anything that can be entangled is a potential source. Of course there are the other primary observables like momentum, energy, frequency, etc. But there are secondary observables as well. There was an experiment showing "entangled entanglement", for example. Particles can be entangled which have never interacted, as we have discussed in other threads.

And in all of these cases, a realistic assumption of some kind leads to a Bell Inequality; that Inequality is tested; the realistic hypothesis is rejected; and the predictions of QM are confirmed.



There is not one single "Holy Grail of Inequality", as billschnieder assumes, and I’m going to prove it with a very simple example.

billschnieder thrives on complexity – the longer his futile equations get, the happier he gets – and that goes for his semantic games as well. billschnieder rejects everything that’s beautiful in its simplicity, where there is no room for his erratic ideas.

This example, by Nick Herbert, is known as one of the simplest proofs of Bell's Inequality (and I already know billschnieder is going to hate it :devil:):

The setup is standard: one source of entangled photon pairs, and two polarizers that we can position independently at different angles.
13z71hi.png

The entangled source is of the kind that, if both polarizers are set to 0º, we get perfect agreement, i.e. if one photon gets thru one polarizer the other photon gets thru the other polarizer, and if one is stopped the other is also stopped, i.e. 100% match and 0% discordance.

To start, we set the first polarizer at +30º, and the second polarizer at 0º:
16jlw1g.png

If we calculate that discordance (i.e. the number of measurements where we get a mismatching outcome thru,stop / stop,thru), we get 25% according to QM and experiments.

Now, if we set the first polarizer to 0º, and the second polarizer to -30º:
106jwrd.png

And calculate this discordance we will naturally get 25% according to QM, this time also.

Now let’s use some of John Bell’s brilliant logic, and ask ourselves:

– What will the discordance be if we set the polarizers to +30º and -30º ...??
2zjm5jk.png

Well that isn’t hard, is it ...!:rolleyes:?

If we assume a local reality, that nothing we do to one polarizer can affect the outcome of the other polarizer, we can formulate this simple Bell Inequality:
N(+30°, -30°) ≤ N(+30°, 0°) + N(0°, -30°)

The symbol N represents the number of discordances (mismatches).

This inequality is as good as any other you’ve seen in this thread, anybody stating different is a crackpot liar.

(The "is less than or equal to" sign is just to show that there could be compensating changes where a mismatch is converted to a match, but this is not extremely important.)

We can make this simple Bell Inequality even simpler, for let’s say a gifted 10-yearold :smile::
50% = 25% + 25%

This is the obvious local realistic assumption.

But this is wrong! According to QM and physical experiments we will now get 75% discordance!
sin²(60º) = 75%

This is completely crazy!? How can the setting of one polarizer affect the discordance of the other, if reality is local?? It just doesn’t make sense!

But John Bell demonstrated, by means of very brilliant and simple tools, that our natural assumption about a local reality is incompatible with the predictions of Quantum Mechanics, and with all physical experiments performed so far, by over 25 percentage points.

We can simplify our inequality even further and say:
25 + 25 = 50

And divide by 25, to get this extremely simple local realistic Bell Inequality:
1 + 1 = 2

How simple can it be ?:-p?

Now we can see that QM predictions and experiments violate this simple inequality:
1 + 1 = 3 !:devil:!​

Conclusion: We do not need a dataset of triples, or miles of Bayesian probability, or conspiracy theories, or any overcomplicated math whatsoever – BECAUSE IT’S ALL VERY SIMPLE AND BEAUTIFUL.
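The percentages quoted in the example above can be checked in a few lines. This is just a sketch of the QM prediction that the mismatch rate for polarizer settings differing by Δθ is sin²(Δθ):

```python
import math

def qm_mismatch(angle_a, angle_b):
    """QM-predicted mismatch (discordance) rate, as a fraction, for two polarizers
    at the given angles in degrees: sin^2 of the angle difference."""
    return math.sin(math.radians(angle_a - angle_b)) ** 2

lhs = qm_mismatch(30, -30)                       # QM: 75% mismatch at (+30, -30)
rhs = qm_mismatch(30, 0) + qm_mismatch(0, -30)   # 25% + 25% = 50%

# The local-realist bound N(+30, -30) <= N(+30, 0) + N(0, -30) is violated:
print(lhs > rhs)  # True
```

Note the code only reproduces the QM prediction; the inequality itself comes from the local-realist reasoning above, and it is the comparison between the two that does all the work.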


Hope this was helpful, and that you now clearly see who the liar in this thread is.

Thanks for the attention.
 
Last edited by a moderator:
  • #1,242
DevilsAvocado said:
Local Realism just doesn’t work with current understanding of Quantum Mechanics.






Bell's words:

"-My theorem answers some of Einstein's questions in a way that Einstein would have liked the least."


responding to Einstein's:

"-On this I absolutely stand firm. The world is not like this."
 
  • #1,243
DevilsAvocado said:
As already said – all this can be understood by a gifted 10-year-old (which includes DrC & Me, where the former is gifted :smile:).

Let’s start from the beginning, with Bell's theorem:

...

Great post!

And I am gifted, because I got a present for my birthday! (The 10 year old part represents my emotional age, by the way.)
 
  • #1,244
GeorgCantor said:
Bell's words:

"-My theorem answers some of Einstein's questions in a way that Einstein would have liked the least."


responding to Einstein's:

"-On this I absolutely stand firm. The world is not like this."

History has shown that the opinions of such men are less important than the work they leave behind. I think even dogs know at this point that Einstein was an uncompromising figure in the latter half of his life, searching for something which now seems even less likely. Should I raise a family in the manner of Dirac because he was brilliant? Bell's assertion is meaningless without his theorem, and Einstein's rebuttal is meaningless without a foundation.
 
  • #1,245
GeorgCantor said:
Bell's words:

"-My theorem answers some of Einstein's questions in a way that Einstein would have liked the least."


responding to Einstein's:

"-On this I absolutely stand firm. The world is not like this."

Georg, Sources, please?

Thank you, JenniT
 
  • #1,246
JenniT said:
Georg, Sources, please?

Thank you, JenniT



"Bell, in his first article on hidden variables and contextuality [9], wrote “the Einstein-Podolsky-Rosen paradox is resolved in the way which Einstein would have liked least.”"


Page 1 of:

"Einstein, Podolsky, Rosen, and Shannon"
Asher Peres
Department of Physics, Technion—Israel Institute of Technology, 32000 Haifa, Israel

http://arxiv.org/PS_cache/quant-ph/pdf/0310/0310010v1.pdf


The quote can also be found in "Quantum Reality" by N.Herbert with the insistence about spooky action "On this I absolutely stand firm. The world is not like this."
 
  • #1,247
nismaratwork said:
History has shown that the opinions of such men are less important than the work they leave behind. I think even dogs know at this point that Einstein was an uncompromising figure in the latter half of his life, searching for something which now seems even less likely. Should I raise a family in the manner of Dirac because he was brilliant? Bell's assertion is meaningless without his theorem, and Einstein's rebuttal is meaningless without a foundation.



You are arguing with yourself, or with an imaginary version of "me". It must be your fantasy that drives your misguided belief that I implied their work wasn't important. I said no such thing.
 
  • #1,248
GeorgCantor said:
You are arguing with yourself or an imaginary version of "me". It must be your fantasy that drives your misguided belief I implied their work wasn't important. I said no such thing.

What was your point exactly?
 
  • #1,249
billschnieder said:
You do not understand probability either. Say I give you the following list of

++
--
-+
+-

And ask you to calculate P(++) from it. Clearly the probability is the number of times (++) occurs in the list divided by the number of entries in the list.
No, you can't calculate the probability just from the information provided, not if we are talking about objective frequentist probabilities rather than subjective estimates. After all, the nature of the physical process generating this list might be such that frequency of ++ in a much greater number of trials would be something other than 0.25, and according to the frequentist definition P(++) is whatever fraction of trials would yield result ++ in the limit as the number of trials went to infinity.
billschnieder said:
So your "law of large numbers" cop-out is an approximation of the true probability, not its definition. You need to learn some basic probability theory here, because you are way off base.
Again your argument seems to involve a casual dismissal of the frequentist view of probability, when it is an extremely mainstream way of defining the notion of "probability", and regardless of whether you like it or not, it's a pretty safe bet that Bell was tacitly assuming the frequentist definitions in his proofs, since they become fairly incoherent with any more subjective definition of probability (because they deal with "probabilities" of hidden variables that would be impossible for experimenters to measure).
JesseM said:
But then you use that to come to the absurd conclusion that in order to compare with empirical data, we need to make some assumptions about the distribution of values of λ on our three runs. We don't--Bell was writing for an audience of physicists, who would understand that whenever you talk about an "expectation value", the basic definition is always just a sum over each possible measurement result times the probability of that result
billschnieder said:
Sorry JesseM but that bubble has already been burst, when I proved conclusively that you do not know the meaning of "expectation value".
So you deny that the "expectation value" for a test which can yield any of N possible results R1, R2, ..., RN would just be \sum_{i=1}^N R_i * P(R_i)? (where P(Ri) is the probability distribution function that gives the probability for each possible Ri) This is the definition of "expectation value" I used, and if you deny that this is true for a test with a finite set of possible results (like the measurement of spin for two entangled particles), then it is you who fails to understand the basic meaning of the term "expectation value". If you agree with this definition but think I have somehow been failing to use it in my own arguments, then you are misunderstanding something, please clarify.
billschnieder said:
To show how silly this adventitious argument of yours is, I asked you a simple question and dare you to answer it:

You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product.
It's impossible to write down the correct objective/frequentist expectation value unless we know the sample space of possible results (all possible pairs, which might include possibilities that don't appear on the list of N pairs) along with the objective probabilities of each result (which may be different from the frequency with which the result appears on your list, although you can estimate the objective probability based on the empirical frequency if N is large...it's better if you have some theory that gives precise equations for the probability like QM though).
billschnieder said:
Once you have done that, try and swindle your way out of the fact that
"Swindle", nice. You stay classy Bill!
billschnieder said:
a) The structure of the expression so derived does not depend on the actual value N. ie, N could be 5, 100, or infinity.
If you know the objective probabilities, then it doesn't even depend on the results that happen to appear on the list! But if you're just trying to estimate the true probabilities based on the frequencies on the list, then the accuracy of your estimates (as compared to the actual true probabilities) is likely to be higher the greater N is.
billschnieder said:
b) The expression so derived is a theoretical expression not "empirical".
If you are estimating the probabilities based on the frequencies on the list, then I would call this an empirical estimate of the expectation value, which may be different from the true expectation value. For example, if I know based on theory that a certain test has an 0.5 chance of giving result +1 and an 0.5 chance of giving result -1, then the expectation value is (+1)*(0.5) + (-1)*(0.5)=0. On the other hand, if I don't know the true probabilities of +1 and -1 and am just given a list of results with 51 results that are +1 and 49 results that are -1, then my estimate of the expectation value would be (+1)*(0.51) + (-1)*(0.49) = 0.02, close to the theoretically-derived expectation value of 0 but slightly off.
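The 51/49 example just given can be written out directly (a minimal sketch):

```python
# Theoretical expectation value when the true probabilities are known:
# P(+1) = P(-1) = 0.5, so E = (+1)*0.5 + (-1)*0.5 = 0 exactly.
e_theory = (+1) * 0.5 + (-1) * 0.5

# Empirical estimate from a finite list with 51 results of +1 and 49 of -1.
results = [+1] * 51 + [-1] * 49
e_estimate = sum(results) / len(results)

print(e_theory, e_estimate)  # 0.0 0.02 -- close to the theoretical value, but slightly off
```

The gap between the two numbers is the whole point: the empirical average estimates the expectation value, it does not define it.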
billschnieder said:
c) The expression so derived is the same as the simple average of the paired products.
Not if you know (or can calculate theoretically) the true probabilities of different results, and they are different from the fraction of trials with each result that appear on the list.
JesseM said:
So for example, if one run with settings (a,b) included three trials where λ took the value λ3, while another run with settings (b,c) included no trials where it took the value λ3, this wouldn't imply that ρ(λi) differed in the integrals for E(a,b) and E(b,c)? Because your comment at the end of post #1224 suggests you are still confusing the issue of what it means for the "true probabilities" ρ(λi) to differ depending on the detector settings with what it means for the actual frequencies of different values of λi to differ on runs with different detector settings.
billschnieder said:
You are sorely confused. Note I use ρ(λi) not P(λi) to signify that we are dealing with a probability distribution, which is essentially a function defined over the space of all λ, with integral over all λ equal to 1.
P(λi) is also a type of probability distribution; the only difference is that ρ(λ) is a continuous probability density function (based on the assumption that λ can take a continuous range of values) while P(λi) is a discrete probability distribution. I have in some posts made the simplifying assumption that λ can only take a finite set of possible values rather than being a continuous variable; it makes no real difference to Bell's argument which one we assume.
billschnieder said:
If the (a,b) run included N iterations with three of those corresponding to λ3, P(λ3) for our dataset = 3/N. But if in a different run of the experiment (b,c) none of the λ's was λ3, P(λ3) = 0 for our dataset. It therefore means the probability distribution of ρ(λi) can not be same for E(a,b) and E(b,c)
No, it doesn't mean that, because the ρ(λi) that appears in Bell's equations (along with the P(λi) that appears in the discrete version) is pretty clearly supposed to be an objective probability function of the frequentist type. Anyone who understands what it means to say that for a fair coin P(heads)=0.5 even if an actual series of 20 flips yielded 11 heads and 9 tails should be able to see the difference between the two.

Again, no one is asking you to agree that frequentist definitions are the "best" ones to use in ordinary situations where we are trying to come up with probability estimates from real data, but you can't really deny they are widely used in theoretical arguments involving probabilities, so you might at least consider whether Bell's arguments make sense when interpreted in frequentist terms. If you simply refuse to even talk about the frequentist notion of probability because you have such a burning hatred for it, then probably you're not really interested in trying to understand Bell's argument in its own terms (i.e., how Bell and other physicists would conceive the argument), but are just trying to make a rhetorical case against it based on showing that it becomes incoherent when we interpret the probabilities in non-frequentist terms.
billschnieder said:
According to Bell, E(a,b) calculated by the following sum

a1*b1*P(λ1) + a2*b2*P(λ2) + ... + an*bn*P(λn) where n is the total number of possible distinct lambdas.
Sure.
billschnieder said:
ρ(λ) is a function which maps a specific λi to its probability P(λi).
Huh? P(λi) is already a function that maps each specific λi to a probability. Bell just uses the Greek letter ρ to indicate he's talking about a probability density function on a variable λ which is assumed to be continuous. The "probability density" for a specific value of λ is then not an actual probability; instead, if you want to know the probability that λ fell in some finite range (say, between 0.4 and 0.5), you'd integrate the probability density function over that range, and that would give the probability. That's why Bell writes "It is a matter of indifference in the following whether λ denotes a single variable or a set, or even a set of functions, and whether the variables are discrete or continuous. However, we write as if λ were a single continuous parameter ... ρ(λ) is the probability distribution of λ". It's common in QM to use ρ to refer to a probability density, see here and here for example.
billschnieder said:
By definition therefore, if the function ρ(λ) is the same for two runs of the experiment, it must produce the same P(λi) for both cases. In other words, if it produced different values of P(λi) such as 3/N in one case and 0 in another, it means ρ(λ) is necessarily different between the two and the runs can not be used together as a valid source of terms for comparing with Bell's inequality.
Not if we are defining probabilities in a frequentist sense, and I think any physicist reading Bell's work would understand that in his theoretical proof he is indeed using the frequentist definition, so having the same probability distribution for different detector settings need not imply that the frequency of a given λi would actually be exactly the same for two finite runs with different detector settings (just like the claim that two fair coins both have P(heads)=0.5 does not imply that two runs of ten flips with each coin will each produce exactly five heads).
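The fair-coin point can be made concrete in a short Python sketch (the seed and the flip counts here are arbitrary choices of mine):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def heads_frequency(n_flips):
    """Empirical frequency of heads over n_flips of a fair coin (true P(heads) = 0.5)."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# Two short runs with the *same* underlying probability need not agree with
# each other, or with 0.5 (e.g. 11 heads vs 9 heads in runs of 20 flips):
run1 = heads_frequency(20)
run2 = heads_frequency(20)

# ...but the empirical frequency approaches the objective probability as the
# number of flips grows -- the frequentist limit:
long_run = heads_frequency(100_000)
```

The same distinction carries over to ρ(λi): identical objective distributions for two runs do not force identical finite-sample frequencies of each λi.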
billschnieder said:
JesseM said:
billschnieder said:
But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ, is equivalent to it's probability. ie

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are.
You really think that this is the "exact same thing" as what I was saying? Here your "practical" average requires us to know which value of λ occurred on each trial
Oh come on! At least be honest about what you claim I am saying! Why would you need to know λ for each trial if you are calculating a simple average!?
OK, I missed the bolded sentence, but I don't understand how the stuff that preceded it can possibly be consistent with the idea that the experimenter doesn't know what the λ's are. How does the experimenter know that "A(a,λ1),B(b,λ1) was realized exactly 3 times" if he has no idea whether λ1 or some other λ occurred on a given trial? How would you know whether your outcomes were "a representative set of outcomes in which the frequency of realization of a specific λ, is equivalent to it's probability" if you had no idea what the frequency was with which each specific λ was realized? Once again your explanation is totally confusing to me, and I suspect to other readers as well, but any time I misunderstand, instead of helpfully correcting me, you immediately jump down my throat and accuse me of not being "honest".

Also, what does it even mean to say that a set of outcomes is "representative" if "the frequency of realization of a specific λ, is equivalent to it's probability" when you are using a non-frequentist definition of probability? If we have a set of 3000 outcomes and we somehow know that λ1 occurred on 30 of those, are you using a definition of "probability" where that would automatically imply that the probability of λ1 given that data must be 0.01? (that's what seemed to be implied by your comment quoted at the start that 'Clearly the probability is the number of times (++) occurs in the list divided by the number of entries in the list') For a frequentist the "true" probability of λ1 could certainly be different from 0.01 since the fraction of outcomes with λ1 might approach some other value in the limit as the number of trials approached infinity, but from the way you are defining probabilities it seems like the fraction of trials where λ1 occurs is by definition said to be the "probability" of λ1, so I don't see how any set of outcomes could fail to be "representative". If you are not defining the probability of an event as just the fraction of trials in the dataset where that event occurred, please clarify your definition.
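For what it's worth, the numerical equivalence described in the quoted passage does hold when (and only when) the dataset frequencies exactly match the probabilities; a sketch with hypothetical ±1 products A(a,λi)*B(b,λi):

```python
# Hypothetical products A(a,λi)*B(b,λi) for three λ values (made-up ±1 outcomes)
products = {"λ1": +1, "λ2": -1, "λ3": +1}
probs = {"λ1": 0.3, "λ2": 0.5, "λ3": 0.2}

# Probability-weighted expectation value: Σ A(a,λ)*B(b,λ)*P(λ)
E_weighted = sum(products[l] * probs[l] for l in products)

# Simple average over a 10-point dataset whose frequencies (3/10, 5/10, 2/10)
# exactly match the probabilities (0.3, 0.5, 0.2)
dataset = [products["λ1"]] * 3 + [products["λ2"]] * 5 + [products["λ3"]] * 2
E_average = sum(dataset) / len(dataset)
```

The two calculations agree here only because the dataset was constructed to be "representative"; a real finite sample gives no such guarantee, which is the crux of the disagreement above.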

And once again, regardless of your definition, will you at least consider whether Bell's proof makes sense if the probabilities are interpreted in frequentist terms? It seems like most of your critique is based on the assumption that he is defining probabilities in terms of actual outcomes on some finite set of trials, but if he was assuming more "objective" frequentist definitions then this would be a giant strawman argument.
 
Last edited:
  • #1,250
JesseM said:
No, you can't calculate the probability just from the information provided, not if we are talking about objective frequentist probabilities rather than subjective estimates. After all, the nature of the physical process generating this list might be such that frequency of ++ in a much greater number of trials would be something other than 0.25, and according to the frequentist definition P(++) is whatever fraction of trials would yield result ++ in the limit as the number of trials went to infinity.
Who said anything about a physical process? I've given you an abstract mathematical list, and you can't bring yourself to admit that you were wrong, to the point that you are making yourself look foolish. P(++) for the list I gave you is 1/4; even a cave man can understand that level of probability theory, Jesse! Are you being serious, really?

JesseM said:
billschnieder said:
So your "law of large numbers" cop-out is an approximation of the true probability not it's definition. You need to learn some basic probability theory here because you are way off base.
Again your argument seems to involve a casual dismissal of the frequentist view of probability, when it is an extremely mainstream way of defining the notion of "probability"

Who said anything about the frequentist view? All I did was point out to you a basic mainstream fact of probability theory:

Wikipedia (http://en.wikipedia.org/wiki/Law_of_large_numbers):
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

So you are way off base and I am right to say that you do not understand probability theory.

So you deny that the "expectation value" for a test which can yield any of N possible results R1, R2, ..., RN would just be
1/N \sum_{i=1}^N R_i * P(R_i ) ?

(where P(R) is the probability distribution function that gives the probability for each possible Ri)

Again you are way off base. In probability theory, when using the probability of each R as a weight in calculating the expectation value, you do not divide the sum by N again. That will earn you an F grade. The correct expression should be:

\sum_{i=1}^{N} R_i * P(R_i)

For example, if N is 3 and the probabilities of R1, R2 and R3 are (0.3, 0.5, 0.2), the expectation value will be R1*0.3 + R2*0.5 + R3*0.2, NOT (R1*0.3 + R2*0.5 + R3*0.2)/3 !
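That probability-weighted sum is easy to sanity-check in Python (the result values R1–R3 here are hypothetical placeholders):

```python
# Expectation value as a probability-weighted sum: E = Σ R_i * P(R_i)
R = [2.0, -1.0, 4.0]   # hypothetical result values R1, R2, R3
P = [0.3, 0.5, 0.2]    # their probabilities (must sum to 1)

E = sum(r * p for r, p in zip(R, P))  # 2*0.3 - 1*0.5 + 4*0.2 = 0.9
# Note: no extra division by N -- the probabilities already do the weighting.
```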
 
Last edited by a moderator: