Is action at a distance possible as envisaged by the EPR Paradox?

  • #1,201
JesseM said:
Right, only the relative angles matter, each angle is defined relative to an arbitrary choice of coordinate system.

Thanks Jesse, that is how I have pictured it. But... I don’t really get why we talk about entangled photons like up/down spin... if polarization is a result of spin... and they are unpolarized...??

Or is the explanation that polarized light looks something like this (where the electric field oscillates up and down perpendicular to the ray direction):

http://www.colorado.edu/physics/2000/polarization/images/electroArrow.gif

And unpolarized light looks something like this:

http://www.colorado.edu/physics/2000/polarization/images/arrowThickAnim.gif

But why are we talking about up/down spin...
 
  • #1,202
DevilsAvocado said:
Thanks Jesse, that is how I have pictured it. But... I don’t really get why we talk about entangled photons like up/down spin... if polarization is a result of spin... and they are unpolarized...??
In classical electromagnetism, "polarized" light is a beam where, if you pick the correct angle for your polarizer, 100% of the light passes through, whereas "unpolarized" means that no matter what angle you set your polarizer, the intensity is reduced when the beam passes through it.

Individual photons, on the other hand, have a quantum state which determines the probability they'll make it through a polarizer at any given angle. Thinking about it some more, I may have been mistaken to say that they'd always have a 50% chance of passing through a polarizer if their polarization hadn't been previously measured; it might be that even though no polarization measurement had ever been made, knowledge of the properties of the source would give you an initial quantum state with different probabilities at different angles. I'm not sure exactly how the initial quantum state of an entangled pair would be defined for a given type of source.

Anyway, the main point is that once a photon has passed through a polarizer at a given angle, it's guaranteed with probability 1 to pass through another polarizer at the same angle (and has probability 0 of passing through a polarizer at 90 degrees to the first), provided nothing is done to it in between, like passing it through a polarizer at a different angle. If you do that, there is now some finite probability it will pass through a polarizer at a right angle to the first. This can be seen in the very counterintuitive Dirac three-polarizer experiment: two polarizers at right angles let no light through and look black, but if you insert a third polarizer between them at an intermediate angle, light comes through all three in the area covered by the middle one.
And for photons with entangled polarizations, if one member of the pair passes through a polarizer at a given angle, then you can predict with certainty whether the other will pass through a polarizer at the same angle (or at 90 degrees relative to the first).
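As a quick numerical sketch of the three-polarizer effect (assuming Malus's-law probabilities for single photons; the function name and angles here are mine, purely for illustration):

```python
import math

def pass_prob(photon_angle, polarizer_angle):
    """Probability that a photon polarized at photon_angle passes a
    polarizer set to polarizer_angle (Malus's law applied per photon)."""
    return math.cos(math.radians(photon_angle - polarizer_angle)) ** 2

# Two crossed polarizers at 0 and 90 degrees: a photon that made it
# through the first (now polarized at 0) never passes the second.
p_crossed = pass_prob(0, 90)

# Insert a middle polarizer at 45 degrees: each passage re-polarizes
# the photon at the new angle, so the probabilities multiply.
p_with_middle = pass_prob(0, 45) * pass_prob(45, 90)

print(p_crossed)      # ~0 (zero up to floating-point rounding)
print(p_with_middle)  # ~0.25
```

So a quarter of the photons that survive the first polarizer now make it through all three, even though the outer pair alone blocks everything.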
 
  • #1,203
Thanks Jesse, I have to check out the Dirac experiment and think some more. I'll get back tomorrow.
 
  • #1,204
JesseM said:
No, they don't. The terms in the purely arithmetical inequality are of this form:
(Fraction of all triples with properties A+ and B-)
While the terms in Bell inequalities are of this form:
(Fraction of A,B samples which gave result A+, B-)
Here again you are referring to your strawman inequality, not the inequality I derived for which the terms are exactly the same as Bell's. It's not worth another response. If you are serious about pursuing this, deal with Bell's exact inequality from his original paper, not some toy version which obfuscates the issue.

JesseM said:
If it wasn't for the context I would assume I did understand what this sentence meant--that at a theoretical level we assume the existence of triples, even if we don't assume they're known to the theoretical experimenter
This is just another reason why I say you are confused. You say with the left side of your mouth that you have triples theoretically, then say on your right side that the theoretical experimenter does not have triples. And you attribute such conspiracy to Bell.

Bell did not consider two different theoretical situations. He had one theoretical situation in which properties existed simultaneously for 3 angles. His inequality is derived from this ONLY. There is no mention in his paper about a theoretical experimenter not knowing the third value.

The issue with experimenters not being able to measure simultaneously the third property is a practical issue with data gathering in real actual experiments. So your reference to Bell's later papers where he acknowledges this issue does not change the fact that it does not arise in the derivation of Bell's inequalities.

Without triples, you cannot calculate anything comparable to Bell's inequality. For Bell's derivation this problem is non-existent, because he is not considering an actual experiment but a theoretical situation: he simply assumed that a third property existed simultaneously at a third angle and proceeded to derive his inequality. So if you expect me to believe that Bell assumed a theoretical experimenter did not know the third value, and that somehow this assumption is very important for the inequality he derived even though he did not mention it, you are out of luck.

In fact, if you must suggest that Bell was dealing with measurements by a theoretical experimenter, then you must also admit that only one of the pairs mentioned by Bell, (a,b), is measured, and the other two, (a,c) and (b,c), are deduced from it by the theoretical reasoning that there is a third property at angle c! Bell was absolutely not deriving an inequality for a situation in which each pair is measured separately in a different run of the experiment. So if you actually understand Bell's work as you claim to, then this line of argumentation has no other purpose than obfuscation.

So if you don't like my talk about fractions (even though it's completely relevant to other Bell inequalities), you can instead consider the distinction between terms of this type:
(average value of a*b for all triples)
vs. terms of this type:
(average value of a*b for all triples where experimenter sampled a and b)
I have already explained to you why this distinction is artificial for the inequality I derived, and the one Bell derived. The situation may be different for your toy version in which the (a,b,c) do not mean exactly the same thing in each term. But I'm not interested in your toy version. I am only interested in Bell's inequality and the one I derived, in which the terms (a,b,c) mean exactly the same thing between terms. In Bell's inequality the "a" in the first two terms is exactly the same. Same for the "b" in the first and last terms, and same for the "c" in the last two terms. Anything else is not Bell's inequality. The only type of inequality for which your stated difference above exists is one in which the symbols differ between terms, and Bell's inequality is not such an inequality. Neither is the one I derived. In fact, earlier you seemed to understand this when you said:

JesseM said:
billschnieder said:
Fast forward then to the resulting CHSH inequality
|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2

In your opinion then, is the P(λi) the same for each of the above terms, or do you believe it doesn't matter?
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).
Are you trying to recant that admission, or is this new line of argumentation just for argument's sake?

If you think the terms in my inequality are different from Bell's, explain it using my inequality and Bell's rather than picking two strawman inequalities of your own in which the terms differ. Why do you shy away from using the directly relevant inequalities?! I refuse to discuss a contrived strawman when you could have simply used the directly relevant inequality.

In your inequality, does P(b,c) refer to "average value of b*c for all triples where experimenter sampled b and c"? If it does, then it's not hard to find a set of triples that violates your inequality. And if it doesn't, then no, the terms in your inequality don't mean the same thing as those in Bell's.

1 + <bc> >= |<ab> - <ac>|

This is only guaranteed for a situation in which a dataset of triples can be obtained. If you start off with triples like Bell, there is no problem. But if you start off with datasets of pairs, the above can only be guaranteed if the pairs can be resorted to obtain a dataset of triples. It doesn't mean you need to resort it in order to calculate the terms. It just means being able to resort the data is evidence that the symbols are equivalent. It is just another way of saying the symbols ("a", "b" and "c") mean exactly the same thing from term to term.

Once you have the triples, there is no distinction between "average value of b*c for all triples" and "average value of b*c for all triples where experimenter sampled b and c". It doesn't matter how you obtained the triples, whether you started directly with triples or resorted the separate pairs. Your distinction between the two is so ridiculous I wonder why you keep insisting on it. If an experimenter measured b and c on, say, M iterations:
- average value of b*c for all triples is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

- average value of b*c for the triples on which the experimenter measured b and c is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

Or do you expect "all" in the first case to mean the experimenter can calculate an average over values he did not measure? Note also that you are trying to force a distinction where there is none, in an attempt to imply that my inequality is different from Bell's inequality. So if you think "all" in the first case means more cases than were measured, state clearly which case corresponds to Bell's and which one to mine. Is it your claim that Bell's inequality involves averaging over unmeasured terms (an impossibility), or is it your claim that my inequality involves averaging over unmeasured terms? And when you answer that, also answer whether you think actual experimenters ever average over unmeasured terms.


JesseM said:
Sure there's a difference. Suppose our dataset consisted only of the five you mention, and that for each iteration the pair measured was as follows:

a b c
1: + + - (measured a,b)
2: + - + (measured b,c)
3: + - - (measured a,c)
4: - + - (measured a,b)
5: - - + (measured b,c)
What you present above are datasets of pairs from the measurements. We are interested in what was measured; if it wasn't measured, the experimenter does not have it and cannot calculate from it. So let us examine this. For clarity, and following from the example you were responding to, here are the three datasets of pairs:

a b
1:+ +
4:- +

b c
2:- +
5:- +

a c
3:+ -

As you can see already, it is not possible to apply this data to Bell's inequality because we can not sort it in order to obtain a dataset of triples. We can not sort by "b" because the two lists of b's are completely different, same for "a" and "c".
The first term involving ab, is calculated with only positive b terms, the second term with only negative b terms, so each symbol (a,b,c) means something different from term to term. This type of data is not guaranteed to obey Bell's inequality nor the one I derived. What your example shows clearly is the fact that it is possible to violate Bell's inequality using a dataset of pairs (my claim 3) UNLESS it is also possible to sort the dataset of pairs to generate a dataset of triples (my claim 1).

Is it your claim that Bell's inequality is supposed to apply to this kind of data as well? If that is what you believe please say so clearly.
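Using the 1 + <bc> >= |<ab> - <ac>| form discussed in this thread, a quick sketch (variable names mine) makes the point concrete: the three averages computed from only the measured pairs above violate the inequality, while averaging every term over the full set of five triples does not, at least for this dataset.

```python
# Pairs actually measured in the five iterations quoted above.
ab_pairs = [(+1, +1), (-1, +1)]   # iterations 1 and 4
bc_pairs = [(-1, +1), (-1, +1)]   # iterations 2 and 5
ac_pairs = [(+1, -1)]             # iteration 3

def avg_product(pairs):
    return sum(x * y for x, y in pairs) / len(pairs)

E_ab = avg_product(ab_pairs)   # 0.0
E_bc = avg_product(bc_pairs)   # -1.0
E_ac = avg_product(ac_pairs)   # -1.0

# The pair data violate 1 + <bc> >= |<ab> - <ac>|:
print(1 + E_bc >= abs(E_ab - E_ac))   # False

# Averaging every term over the full five triples instead (which the
# experimenter cannot do) satisfies it for this dataset:
triples = [(+1, +1, -1), (+1, -1, +1), (+1, -1, -1),
           (-1, +1, -1), (-1, -1, +1)]
n = len(triples)
ab = sum(a * b for a, b, c in triples) / n
bc = sum(b * c for a, b, c in triples) / n
ac = sum(a * c for a, b, c in triples) / n
print(1 + bc >= abs(ab - ac))         # True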
 
  • #1,205
billschnieder said:
Here again you are referring to your strawman inequality, not the inequality I derived for which the terms are exactly the same as Bell's. It's not worth another response. If you are serious about pursuing this, deal with Bell's exact inequality from his original paper, not some toy version which obfuscates the issue.
The inequality is neither a strawman nor a "toy version", as I already pointed out:
JesseM said:
You didn't make clear at the outset that "the form being discussed" was the one in his original paper. In this recent discussion of ours I was the first one to bring up a specific mathematical inequality, first in post #1171 where I quoted a paper from Bell and then again in post #1176 where I talked about

Number(A, not B) + Number(B, not C) greater than or equal to Number(A, not C)

Then in post #1179 I again referred to that inequality, showing that the purely arithmetic version of the inequality can't be violated by a series of triples, but a Bell-type inequality with the same equation can be. It wasn't until post #1182 that you brought up the inequality |ab+ac|-bc <= 1. It's not really fair that you should have total control over the terms of the discussion in this way, but as seen above I'm fine with discussing this inequality too. Still it's a bit much that you now accuse me of an attempt at obfuscation because I brought up a specific example in what had previously been an overly abstract discussion, and then I didn't immediately drop that example when you brought up a slightly different one.

Also, the inequality I mention is hardly "obscure", if you didn't have a single-minded interest in Bell's original paper only and instead looked at discussions of Bell's inequality by other authors, you'd see that this inequality is mentioned more often in introductory discussions of Bell's proof than the one in the original paper, perhaps because it's so much simpler to see how it's derived (I gave a quick derivation in post #1179 when I said 'the proof is trivial--every triplet with A+ and C- must either be of type A+B+C- or type A+B-C-, and if the former it will also contribute to the number with B+ and C-, if the latter it will also contribute to the number with A+ and B-'). I already gave a link to one website which uses it as a starting point, and wikipedia refers to this inequality as Sakurai's Bell inequality (http://en.wikipedia.org/wiki/Sakurai's_Bell_inequality) because it appeared in Sakurai's widely-used 1994 textbook on QM (the wikipedia article mentions a number of other well-known papers and books on Bell's proof that have used it).
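That trivial derivation can also be checked by brute force. Since every count is a sum of per-triple contributions, it suffices to verify the inequality for each of the eight possible property triples (a quick sketch, with booleans standing in for the properties):

```python
from itertools import product

# Verify Number(A, not B) + Number(B, not C) >= Number(A, not C)
# holds for each of the 8 possible property triples; summing these
# per-triple counts over any list of triples then preserves the
# inequality, so no set of triples can violate it.
for A, B, C in product([True, False], repeat=3):
    n_A_notB = int(A and not B)
    n_B_notC = int(B and not C)
    n_A_notC = int(A and not C)
    assert n_A_notB + n_B_notC >= n_A_notC

print("inequality holds for all 8 triple types")
```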
billschnieder said:
This is just another reason why I say you are confused. You say with the left side of your mouth that you have triples theoretically, then say on your right side that the theoretical experimenter does not have triples. And you attribute such conspiracy to Bell.

Bell did not consider two different theoretical situations. He had one theoretical situation in which properties existed simultaneously for 3 angles. His inequality is derived from this ONLY. There is no mention in his paper about a theoretical experimenter not knowing the third value.
As I said before, his original paper was written for an audience of scientists, the argument was fairly condensed and certain things were left implicit because he assumed the audience would understand. Reading the paper carefully, any physicist would understand that when he writes terms like P(a,b), he is referring to the expectation value for a pair of measurements on an entangled pair with detectors setting a and b (and each result being +1 or -1), which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements with detector settings a and b.

Note that he does refer explicitly to a pair of measurements on the first page:
Measurements can be made, say by Stern-Gerlach magnets, on selected components of the spins \sigma_1 and \sigma_2. If measurement of the component \sigma_1 \cdot a, where a is some unit vector, yields the value +1 then, according to quantum mechanics, measurement of \sigma_2 \cdot a must yield the value -1 and vice versa
Do you doubt that here he is talking about a single pair of measurements on a single pair of particles, rather than averages or "resorted" pairs of measurements taken from two distinct pairs of entangled particles, since that's the only case where the results are guaranteed to be +1 and -1? If you agree, note where he goes on to say that this implies that "the result of any such measurement must actually be predetermined"; the implication here is that if we are choosing between three measurement angles 1, 2, 3, then any given pair of entangled particles must have a triplet of "predetermined" measurement results for each angle. He goes on to say that the parameters predetermining these measurement results can be encapsulated in the variable λ, and that:
The result A of measuring \sigma_1 \cdot a is then determined by a and λ, and the result B of measuring \sigma_2 \cdot b in the same instance is determined by b and λ, and

A(a,λ) = ±1, B(b,λ) = ±1

So here he clearly is talking about a pair of measurement results (by a hypothetical experimenter or team of experimenters), given the assumption that the two results are determined by the two detector angles a and b and the value of λ which represents all the hidden variables with that single pair of entangled particles (where each specific value of λ gives a triplet of 'predetermined' results if the experimenters have three possible detector angles they're choosing from). Then he goes on to say:
If \rho(\lambda) is the probability distribution of λ then the expectation value of the product of the components \sigma_1 \cdot a and \sigma_2 \cdot b is

P(a,b) = \int d\lambda \rho(\lambda) A(a,\lambda)B(b,\lambda) (2)
So remembering that A(a,λ) and B(b,λ) each represented a "result" of "measuring" a member of an entangled pair, with detector angles a and b respectively, you can tell from this integral that he's calculating an "expectation value" (his words) for the product of a pair of measurements (by a hypothetical experimenter or team of experimenters). In general, if you have some finite number N of possible results Ri for a given measurement, and you know the probability P(Ri) for each result, the "expectation value" is just:

E = \sum_{i=1}^N R_i * P(R_i )

If you perform a large number of measurements of this type, the average result over all measurements should approach this expectation value.
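As an illustration (my own toy numbers, nothing from Bell's paper), here is a simulated two-outcome measurement whose sample average converges on the expectation value:

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# A measurement with results +1 and -1 occurring with known probabilities.
results = [+1, -1]
probs = [0.7, 0.3]

# Expectation value E = sum_i R_i * P(R_i):
E = sum(r * p for r, p in zip(results, probs))   # 0.7 - 0.3, i.e. about 0.4

# The average over a large number of simulated measurements approaches E:
n = 100_000
average = sum(random.choices(results, weights=probs, k=n)) / n
print(average)   # close to 0.4 for a run this large
```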

If we imagine that λ can only take a finite set of values, so we can write a discrete version of Bell's integral (2) above, it's more clear why it has the form of an expectation value:

\sum_{i=1}^N [A(a,\lambda_i)*B(b,\lambda_i)] * P(\lambda_i)

...so if you perform a large number of measurements with detector angles a and b, and for each trial/iteration you calculate the product of your pair of measurement results (assumed to be determined by the value of λ which is assumed to give a triplet of predetermined results for the three possible detector angles you're choosing from), then if you take the average of the product of the two measurement results over all these trials/iterations with detector angles a and b, it should approach the "expectation value". This is why the inequality 1 + P(b,c) >= |P(a,b) -P(a,c)| can be understood as a prediction that theoretical experimenters in a theoretical universe with local realist laws should see, in the limit as the number of trials/iterations with each pair of detector angles becomes very large, that

1 + (average value of product of measurement results for all particle pairs where experimenters used detector angles b and c)
>= |(average value of product of measurement results for all particle pairs where experimenters used detector angles a and b) - (average value of product of measurement results for all particle pairs where experimenters used detector angles a and c)|

Bell does make the theoretical assumption that in a local realist universe, the fact that they always get opposite results when they choose the same detector angle implies that each particle pair was associated with a λ that gave it a triple of predetermined results for all three angles a,b,c. But this is just an assumption made in the derivation of the inequality, the inequality itself deals only with expectation values for pairs of measurement results seen by the theoretical experimenters on each trial/iteration of the experiment.
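This can be illustrated with a brute-force sketch (my own construction, identifying a discrete λ with the triple of predetermined results for particle 1 and taking the partner's results to be anti-correlated): for any probability distribution ρ(λ) whatsoever, the resulting expectation values satisfy the inequality.

```python
import itertools
import random

random.seed(0)

# Identify λ with the triple of predetermined results
# (A(a,λ), A(b,λ), A(c,λ)); there are 8 possibilities.
lambdas = list(itertools.product([+1, -1], repeat=3))

def P(rho, i, j):
    """Expectation value of A(setting_i) * B(setting_j), with the
    second particle anti-correlated: B(x, lam) = -A(x, lam)."""
    return sum(p * lam[i] * (-lam[j]) for p, lam in zip(rho, lambdas))

# Try many arbitrary distributions ρ(λ):
for _ in range(1000):
    w = [random.random() for _ in lambdas]
    rho = [x / sum(w) for x in w]
    P_ab, P_ac, P_bc = P(rho, 0, 1), P(rho, 0, 2), P(rho, 1, 2)
    assert 1 + P_bc >= abs(P_ab - P_ac) - 1e-12

print("1 + P(b,c) >= |P(a,b) - P(a,c)| held for every distribution tried")
```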

If you think my interpretation of his words and equations is incorrect (and I guess you probably will since you always find some reason to disagree with whatever I say, but as I said to DevilsAvocado I'm mainly writing for the purpose of showing other readers why your claims don't make sense), then please point out precisely where, and give your own interpretation of whatever quote/equation you think I have misinterpreted.
billschnieder said:
The issue with experimenters not being able to measure simultaneously the third property is a practical issue with data gathering in real actual experiments. So your reference to Bell's later papers where he acknowledges this issue does not change the fact that it does not arise in the derivation of Bell's inequalities.
Well, see above. The assumption of triples is used in the derivation, but the final inequality he derives concerns only expectations about pairs of measurement results, which is why it can be checked against actual real-world measurement results even though we can never measure more than two angles for a given entangled pair.
billschnieder said:
Without triples, you can not calculate anything comparable to Bell's inequality. For Bell's derivation, this problem is non-existent because he is not considering an actual experiment but a theoretical situation
The theoretical situation concerns expectation values in a theoretical series of measurements. If this wasn't the case there would be no way to make a theoretical comparison with the expectation values in QM, since QM only gives expectation values for measurement results, not for any hidden variables.
billschnieder said:
In fact, if you must suggest that Bell was dealing with measurements by a theoretical experimenter, then you must also admit that only one of pairs, (a,b) mentioned by Bell is measured and the other two {(a,c), and (b,c)} are deduced from it by theoretical reasoning that there is a third property at angle c!
No. In equation (13) he deduces from the fact that the experimenters always get opposite results when they choose the same angle that A(a,λ)=-B(a,λ) (and since a can stand for any angle, it naturally follows from this that A(b,λ)=-B(b,λ) and A(c,λ)=-B(c,λ)). This means that equation (2) which I quoted earlier could be rewritten as:

P(a,b) = -\int d\lambda \rho(\lambda) A(a,\lambda)A(b,\lambda)

And by the same token, you can see from the equation for P(a,b) - P(a,c) at the top of p. 406 that he is assuming P(a,c) is derived theoretically in exactly the same way:

P(a,c) = -\int d\lambda \rho(\lambda) A(a,\lambda)A(c,\lambda)

So just like P(a,b), P(a,c) is an "expectation value" for the product of two measurements with the detectors set to angles a and c, and as I already pointed out, any "expectation value" can be understood as the average for a very large number of measurements of the desired quantity (i.e. 'the product of two measurements with detectors set to angles a and c').

Then a few lines down he writes an equation whose right side is \int d\lambda \rho(\lambda) [1 - A(b,\lambda)A(c,\lambda)] and then says "The second term on the right is P(b,c)", which indicates he is also assuming that

P(b,c) = -\int d\lambda \rho(\lambda) A(b,\lambda)A(c,\lambda)

So, what I just said about P(a,c) also applies to P(b,c).
billschnieder said:
Bell was absolutely not deriving an inequality for a situation in which each pair is measured separately in a different run of the experiment.
Oh, but he absolutely was, and if you ask any other non-crackpot who is knowledgeable about Bell's theorem (DrChinese, say) I'm sure they'll tell you the same thing. I'm pretty sure I could also find you other papers on Bell's theorem, by other physicists or perhaps Bell himself, which would make more clear that this is widely understood as the physical meaning of expectation values that appear in Bell inequalities--would you like me to try, or are you going to stick with the fundamentalist strategy of only looking at one holy text in isolation, ignoring any wider context (like the understanding of other physicists through the years) that might make more clear the meaning of any ambiguous parts?
billschnieder said:
The situation may be different for your toy version in which the (a,b,c) do not mean exactly the same thing in each term. But I'm not interested in your toy version. I am only interested in Bell's inequality and the one I derived in which the terms (a,b,c) mean exactly the same thing between terms. In Bell's inequality the the "a" in the first two terms are exactly the same.
"a" is just a detector angle rather than a result like +1 or -1, the text makes that clear, so of course it means the same thing everywhere. But P(a,b) is an expectation value (he called it that himself), which can be understood as the average value of the product of two measurements on a pair of entangled particles with detectors at angles a and b, in the limit as the number of particle pairs measured in this way goes to infinity.
billschnieder said:
The only type of inequality for which your stated difference above exists, is one in which the symbols are different between terms and Bell's inequality is not one of such.
The symbols a,b,c refer to angles and so don't have different meanings between terms, but each of P(a,b) and P(b,c) and P(a,c) is an expectation value, and to connect that to real or theoretical measurements you have to imagine P(a,b) is the average of the product of results in a run with detectors at angles a and b, P(b,c) is the average for a run with detectors at angles b and c, etc. If you argue this point you're not just arguing with me, you're arguing against the interpretation physicists have had for years about what the inequality is predicting about measurement results, an interpretation which Bell could have corrected if he disagreed with it (and if we looked through enough of his writings I bet we could find explicit confirmation this was his interpretation of the meaning of the terms as well).
billschnieder said:
Neither is the one I derived. In fact, earlier, you seem to understand this when you said:
JesseM said:
billschnieder said:
Fast forward then to the resulting CHSH inequality
|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2

In your opinion then, is the P(λi) the same for each of the above terms, or do you believe it doesn't matter?
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).
Are you trying to recant that admission, or is this new line of argumentation just for argument's sake?
Why do you think that contradicts anything I have been saying recently? If P(λi) is the same for each of the above terms, that just means the frequencies of getting different values of λi on a near-infinite run of trials with detector settings a and b should be the same as the frequencies of different values of λi on a near-infinite run of trials with detector settings a and b', and so forth. For example, if on the first run with detectors set to a and b it was true (though not known to the experimenters) that 2.3% of trials/iterations had hidden variables described by λ1 and 3.8% of trials/iterations had hidden variables described by λ2, then we are making the theoretical assumption that on the second run with detectors set to a and b' it was also true that 2.3% of trials/iterations had hidden variables described by λ1 and 3.8% of trials/iterations had hidden variables described by λ2. So in no way does this contradict the idea that each expectation value concerns a different run of trials.
billschnieder said:
If you think the terms in my inequality are different from Bell's, explain it using my inequality and Bell's rather than picking two strawman inequalities of your own in which the terms differ.
I have done that several times, whenever I point out that the terms in your inequality have a meaning of this type (with the understanding that here I use notation like b*c to refer not to the product of two detector angles, but the product of the predetermined results +1 or -1 for b and c in a given triple):

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

while the terms in Bell's inequality have a meaning of this type

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|
 
  • #1,206
(continued)

billschnieder said:
1 + <bc> >= |<ab> - <ac>|

This is only guaranteed for a situation in which a dataset of triples can be obtained. If you start off with triples like Bell, there is no problem. But if you start off with datasets of pairs, the above can only be guaranteed if the pairs can be resorted to obtain a dataset of triples.
No, there is another way besides your bizarre notions about "resorting". An inequality of this type:

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|

is obviously not guaranteed to hold for an arbitrary list of triples with a choice of which pair was measured for each triple, but it will hold if you make two additional assumptions:

1. The subset of triples (the 'run') where experimenter sampled b and c is very large (approaching infinity), and likewise for the subset where experimenter sampled a and b, and the subset where experimenter sampled a and c

2. the process that generates the list of triples for each subset has the same probability of generating a given triple (like a=+1, b=-1, c=+1) for each new entry on the list, regardless of which two measurements are made in that subset

With these two additional assumptions you do have a basis for deriving an inequality of the form I wrote, despite the fact that each term deals with averages for a different subset of triples, rather than each term being based on the same set of triples.
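A minimal simulation sketch of those two assumptions (invented distribution and run size, with partner results taken as anti-correlated as in Bell's setup): one fixed distribution over the eight predetermined triples, three large independent runs, and the inequality holds across the runs up to sampling noise.

```python
import itertools
import random

random.seed(1)

# Assumption 2: one fixed distribution over the 8 predetermined triples
# generates the particle pairs in every run.
lambdas = list(itertools.product([+1, -1], repeat=3))
weights = [random.random() for _ in lambdas]

def run_average(n, i, j):
    """Average product of outcomes over a run of n pairs measured at
    settings (i, j); the partner's result is anti-correlated."""
    draws = random.choices(lambdas, weights=weights, k=n)
    return sum(lam[i] * (-lam[j]) for lam in draws) / n

n = 200_000  # assumption 1: each run is very large
E_ab = run_average(n, 0, 1)  # run where a and b were sampled
E_ac = run_average(n, 0, 2)  # run where a and c were sampled
E_bc = run_average(n, 1, 2)  # run where b and c were sampled

# Each term comes from a different subset of triples, yet the
# inequality holds, up to sampling noise:
print(1 + E_bc >= abs(E_ab - E_ac) - 0.02)
```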
billschnieder said:
It doesn't mean you need to resort it in order to calculate the terms. It just means being able to resort the data is evidence that the symbols are equivalent. It is just another way of saying the symbols ("a", "b" and "c") mean exactly the same thing from term to term.
I still don't know what you mean by "mean exactly the same thing from term to term". a, b and c are just placeholders: for each triple, each one can take the value +1 or -1; for example, in the first triple on your list you might have a=+1, while in the second triple you might have a=-1. Do you just mean that each term deals with averages from exactly the same list of triples, rather than each term dealing with averages from a separate list of triples?
billschnieder said:
Once you have this triple, there is no distinction between "average value of b*c for all triples" and "average value of b*c for all triples where experimenter sampled b and c"
I don't get how you can say "no distinction" when I gave you a clear example of what I meant by this:
a b c
1: + + - (measured a,b)
2: + - + (measured b,c)
3: + - - (measured a,c)
4: - + - (measured a,b)
5: - - + (measured b,c)

In this case, "average value of a*b for all triples" = [(value of a*b for #1) + (value of a*b for #2) + (value of a*b for #3) + (value of a*b for #4) + (value of a*b for #5)]/5 =
[(+1) + (-1) + (-1) + (-1) + (+1)]/5 = -1/5

On the other hand, "average value of a*b for all triples for which the experimenter measured a and b" would only include triple #1 and triple #4, so it'd be [(value of a*b for #1) + (value of a*b for #4)]/2 = [(+1) + (-1)]/2 = 0.
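The two averages just described can be computed explicitly; the list below simply reproduces the five example triples and which pair was sampled on each:

```python
# The five triples from the example, with the pair actually sampled on each row.
triples = [
    (+1, +1, -1, "ab"),  # 1: measured a,b
    (+1, -1, +1, "bc"),  # 2: measured b,c
    (+1, -1, -1, "ac"),  # 3: measured a,c
    (-1, +1, -1, "ab"),  # 4: measured a,b
    (-1, -1, +1, "bc"),  # 5: measured b,c
]

# "average value of a*b for all triples" (the omniscient view, every row):
all_avg = sum(a * b for a, b, c, m in triples) / len(triples)

# "average value of a*b for all triples where experimenter sampled a and b":
ab_rows = [(a, b) for a, b, c, m in triples if m == "ab"]
sub_avg = sum(a * b for a, b in ab_rows) / len(ab_rows)

print(all_avg)  # -0.2
print(sub_avg)  # 0.0
```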
Your response consisted of saying that if the theoretical experimenter only sampled pairs, then this was really a "list of pairs", despite the fact that they were drawn from triples which we (playing the role of an omniscient being looking down on the lowly human experimenter) do know. But in that case I have no idea what you could possibly mean by the phrase "average value of b*c for all triples where experimenter sampled b and c", if you don't mean something like what I did above (you must have something definite in mind, or you hopefully wouldn't have said there was 'no difference' between this and 'average value of b*c for all triples'). So can you explain how you interpret the phrase "average value of b*c for all triples where experimenter sampled b and c", preferably with a simple example like mine above?

Anyway, I think you now understand what I mean when I say "(average value of b*c for all triples where experimenter sampled b and c)", so even if you don't like my phrasing I'll ask you not to willfully misread me by substituting in the meaning you think that phrase "should" have. Hopefully you now agree that an inequality like this:

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|

...cannot be derived from arithmetic alone, although with some additional theoretical assumptions like the one that says a given triple is equally likely to occur regardless of what the experimenter sampled, you can derive it (and that's exactly what derivations of Bell inequalities do).
billschnieder said:
It doesn't matter matter how you obtained the triples, whether you started directly with triples, or you resorted the separate pairs.
No one but you would interpret the terms of Bell's inequality in terms of "resorting" experimental data on pairs (whether theoretical experiments or actual experiments) to create triples, that's just a weird misconception you probably got from De Raedt's paper. Trust me, no mainstream physicist who has ever done their own derivation of Bell's theorem was ever thinking in terms of that kind of resorting (i.e. multiplying +1's and -1's from different trials/iterations). If they thought about how the terms would relate to experimental data at all (as opposed to just thinking of them as abstract 'expectation values' which can be compared to quantum-mechanical expectation values), they were thinking of something along the lines of my "(average value of b*c for all triples where experimenter sampled b and c)".
billschnieder said:
Your distinction between the two is so ridiculous I wonder why you keep insisting on it. If an experimenter measured a certain number of b and c, say M iterations:
- average value of b*c for all triples is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

- average value of b*c for triples for which the experimenter measure b*c is:
\frac{1}{M}\sum_{i}^{M} b_{i}c_{i}

Or do you expect "all" in the first case to mean the experimenter can calculate an average over values he did not measure?
No, because when I say "(average value of b*c for all triples) I'm not talking about what the experimenter calculates at all, I'm just dealing with a model where we take the role of an omniscient being who knows the value of all triples even though the hypothetical experimenter does not. If you object to this, just remember that Bell's whole proof is based on figuring out some constraints on what would be calculated if we could know impossible-to-know-in-practice facts like the \rho(\lambda) (under the assumption that there is some objective truth about such things, whether experimenters know it or not).
billschnieder said:
Note also that you are trying to force a distinction where there is none, in an attempt to imply that my inequality is different from Bell's inequality. So if you think "all" in the first case means more cases than were measured
It just means "all" the triples. It doesn't matter whether the triples are assumed to represent the real truth about predetermined results for all three angles on a single trial/iteration involving a single pair of particles, or whether the triples are weird Frankenstein monsters created by stitching together measurements from two or more different pairs of particles (your idiosyncratic 'resorting' idea, which again is not what any mainstream physicists are thinking of when they write down Bell inequalities).
billschnieder said:
Is it your claim that Bell's inequality involves averaging over unmeasured terms (an impossibility), or is it your claim that my inequality involves averaging over unmeasured terms? And when you answer that, also answer whether you think actual experimenters ever average over unmeasured terms.
"no" to all of the above. Again, your inequality is of this form:

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

...but I understand that you aren't talking about triples representing all three predetermined values on a single trial/iteration (since all three can't be measured), but rather about Frankentriples created by "resorting". Meanwhile, the terms in Bell's inequality are expectation values, so for a large number of trials/iterations they can be understood as:

1 + (average value of b*c for all triples where experimenter sampled b and c)
>= |(average value of a*b for all triples where experimenter sampled a and b) - (average value of a*c for all triples where experimenter sampled a and c)|

Here the "triples" are not known by the experimenters, only the value for b and c is known on trials/iterations where b and c were sampled, etc. So, you could rewrite Bell's inequality as:

1 + (average value of b*c for trials/iterations where experimenter sampled b and c)
>= |(average value of a*b for trials/iterations where experimenter sampled a and b) - (average value of a*c for trials/iterations where experimenter sampled a and c)|

However, the assumption that there are triples associated with each particle even if we don't know them (and that the probability of a given triple occurring each time does not depend on which pair are sampled) is important to deriving the inequality.
billschnieder said:
What you present above are dataset of pairs from the measurements. We are interested in what was measured. If it wasn't measured, the experimenter does not have it and can not calculate from it.
No, but we can derive statistical constraints on what the experimenters will see based on the assumption that their results are coming from a set of preexisting triples, even if we don't know the value of all three--that's what derivations of Bell inequalities are all about.
billschnieder said:
So let us examine this. For clarity and following from the example you were responding to here are the three datasets of pairs

a b
1:+ +
4:- +

b c
2:- +
5:- +

a c
3:+ -

As you can see already, it is not possible to apply this data to Bell's inequality because we can not sort it in order to obtain a dataset of triples.
Although the assumption of triples is involved in deriving Bell's inequality, to check whether data satisfies the inequality or not we don't need a "dataset of triples", this is just your weird misconception. P(a,b) is the expectation value for the product of two measurement results with detectors set to a and b, so we'd take a dataset of pairs which each represent two measurements on a pair of entangled particles with detectors set to a and b, and calculate the average of each pair. Likewise for P(b,c) and P(a,c). That's what everyone understands a test of Bell's inequality against real data to involve; no one thinks in terms of constructing artificial Frankentriples. Think about it: if they did first use the data to construct a single list of triples and then calculate

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

it would be mathematically impossible for such an inequality to be violated by a single list of triples, and yet experimenters report violations of Bell inequalities all the time!
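The impossibility of violating the inequality with a single list of triples can be checked exhaustively. The sketch below assumes the anticorrelation convention of Bell's setup (the partner's result flips the sign, so the measured product for settings x, y is -x*y); under that assumption the bound holds row by row, hence for averages over any one list:

```python
from itertools import product

# Check every possible triple of predetermined +/-1 values. With the assumed
# anticorrelation convention, each measured product is minus the raw product.
ok = all(
    1 + (-b * c) >= abs((-a * b) - (-a * c))
    for a, b, c in product((+1, -1), repeat=3)
)
print(ok)  # True: holds for all 8 possible triples, hence for any average
```

Since the inequality holds for every individual triple, no averaging over a single common list can break it; only data that did not come from one common list can.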
billschnieder said:
This type of data is not guaranteed to obey Bell's inequality nor the one I derived.
Yes it is, you just have to add some additional assumptions beyond just the idea that each data pair was obtained from a triple of preexisting values. I mentioned the assumptions at the top of this post. And to get back to the start, this is why your (1) is wrong--Bell's inequality is not the type of purely arithmetic inequality you're thinking of, it's an inequality dealing with pairs, and additional assumptions beyond basic arithmetic are used to derive it.
 
Last edited:
  • #1,207
JesseM said:
which can be seen in the very counterintuitive Dirac three polarizers experiment where you have two polarizers at right angles that don't allow any light to get through so they look black, but then if you put another polarizer in between them, you see light coming through all three in the area covered by the middle one

Yes, this is cool and we can verify it with this simulation: http://www.lon-capa.org/~mmp/kap24/polarizers/Polarizer.htm First set the 3 polarizers to:

Ang1 = 90
Ang2 = 90
Ang3 = 0

0.0% light will get thru. Now change to:

Ang1 = 90
Ang2 = 45
Ang3 = 0

12.50% light will get thru!
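These numbers follow from Malus's law; here is a small sketch (the `transmission` helper is my own, assuming ideal polarizers and unpolarized input):

```python
import math

def transmission(angles_deg):
    """Fraction of unpolarized light passing a chain of ideal polarizers:
    the first passes 1/2, each later one cos^2 of the angle difference
    (Malus's law)."""
    frac = 0.5  # unpolarized light through the first polarizer
    for prev, cur in zip(angles_deg, angles_deg[1:]):
        frac *= math.cos(math.radians(cur - prev)) ** 2
    return frac

print(transmission([90, 90, 0]))  # ~0.0: the crossed pair blocks everything
print(transmission([90, 45, 0]))  # 0.125: inserting 45 deg lets 12.5% through
```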

JesseM said:
In classical electromagnetism, I think "polarized" light would just be a beam where if you pick the correct angle for your polarizer 100% of the light will pass through, whereas "unpolarized" would mean no matter what angle you set your polarizer, the intensity would be reduced when the beam passes through it. With individual photons, they have a quantum state which determines the probability they'll make it through a polarizer at any given angle

Of course you’re right. It was a mistake by me to bring in the wave-particle duality (http://en.wikipedia.org/wiki/Wave-particle_duality) ... we have enough "perplexity" in this thread already:smile:, sorry.

Spin of light beams is one thing. Spin of photons another...

[Image: the Poincaré sphere]


I’ll probably get back to this, but...

JesseM said:
thinking about it some more, I may have been mistaken to say that they'd always have a 50% chance of passing through a polarizer if their polarization hadn't been previously measured, it might be that even though no polarization measurement had ever been made, knowledge of the properties of the source would give you an initial quantum state that would have different probabilities at different angles, I'm not sure exactly how the initial quantum state of an entangled pair would be defined for a given type of source.

I did think this thru once more, and afaict they must always have a 50% chance, no matter what... otherwise there’s an obvious risk of FTL messaging.

Let’s say that we set Alice at 22.5º and Bob at 0º, but we decide not to measure Bob’s photons. If we run 6 pairs of entangled photons, we could get something like this for Alice:

Code:
	Angle	Corr.	Measure
--------------------------------
Alice	22.5º	?	101010

Now, if we had the possibility to do time travel, and could rewind the experiment, we would see that Bob’s measurement must have looked something like this:

Code:
	Angle	Corr.	Measure
--------------------------------
Alice	22.5º	85%	101010
Bob	0º	85%	101011

(cos^2(22.5) = 85% ≈ 5/6)

Now, if we did not always have the 50% random probability, we could get "tidy" results like this, and thereby determine whether Bob is measuring his photons or not, which would provide a mechanism for FTL messaging...

Code:
	Angle	Corr.	Measure
--------------------------------
Alice	22.5º	85%	111111
Bob	0º	85%	111110

All this is of course extremely simplified, and will only be valid on a large sampling of photons.
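The 50/50 point can be illustrated with a toy simulation of the quantum prediction. Note this is a sketch under assumptions of my own (Alice's outcome modeled as a fair coin, Bob's matching hers with probability cos² of the angle difference); the no-signalling is built into that sampling rule, and the run just confirms the bookkeeping:

```python
import random
from math import cos, radians

random.seed(1)

def alice_fraction(n, alice_deg, bob_deg, bob_measures):
    """Toy model of the quantum prediction for polarization-entangled pairs:
    Alice's outcome is a fair coin; when Bob measures, his outcome matches
    Alice's with probability cos^2(angle difference)."""
    match_p = cos(radians(alice_deg - bob_deg)) ** 2
    ones = 0
    for _ in range(n):
        a = random.random() < 0.5
        ones += a
        if bob_measures:
            b = a if random.random() < match_p else (not a)  # never reaches Alice
    return ones / n

N = 100_000
with_bob = alice_fraction(N, 22.5, 0, bob_measures=True)
without_bob = alice_fraction(N, 22.5, 0, bob_measures=False)
# Both fractions come out close to 0.50: Alice cannot tell whether Bob measured.
print(with_bob, without_bob)
```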

Agree? Or do you see any weakness in my reasoning...?
 
Last edited by a moderator:
  • #1,208
JesseM said:
P(a,b), he is referring to the expectation value for a pair of measurements on an entangled pair with detectors setting a and b (and each result being +1 or -1), which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements with detector
Note the underlined text, as we will come back to it. Now let us consider our previous discussion about this in post #857.

JesseM said:
billschnieder said:
Is the equation as it stands indicating that the numerical value represents what is obtained by measuring a specific pair of settings (ai, bi) a large number of times, or is it indicating that expectation value is what will be obtained my measuring a large number of different pairs of angles (ai,bi)?
The first, I think he's calculating the expectation value for some specific pair of settings. If he wanted to talk about the expectation value for a variety of different ai's I think he'd need to have a sum over different values of i in there.
billschnieder said:
So then, let us consider a specific pair of settings (a, b), and presume that we have calculated an expectation value from equation (2) of Bell's paper, say E(a,b). From what you have explained above, there is going to be a specific probability distribution P(λi) over which E(a,b) was obtained, since the corresponding P(AB|ab) which you obtained your E(a,b) from, was obtained by marginalizing over a specific P(λi) . Do you agree?

billschnieder said:
Fast forward to then to the resulting CHSH inequality
|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2
In your opinion then, is the P(λi) the same for each of the above terms, or do you believe it doesn't matter.
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).

billschnieder said:
So then, if it was found that it is possible in a local realist universe for P(λi) to be different for at least one of the terms in the inequality, above, then the inequality will not apply to those situations where P(λi) is not the same. In other words, the inequalities above are limited to only those cases for which a uniform P(λi) can be guaranteed between all terms within the inequality. Do you disagree?

If you remember, our previous discussion fell apart at the point where you refused to give a straight answer to the last question above.

You say, Bell is referring to the measurement of AN entangled pair with detectors set at a and b. I agree. You also say, in order to obtain the expectation value for this pair of angles, Bell integrates over all λi, so that there is a λi probability distribution. I agree also. This is precisely why I asked you all those questions earlier and you also agreed with me that this λi probability distribution must be exactly the same for all expectation value terms in Bell's inequality.

Now please pay attention and make sure you actually understand what I am saying next before you respond.
The reason why the probability distribution of λi must be the same is the following (using the equations you presented, except using E for expectation to avoid confusion with Probability notation).
E(a,b) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(b,\lambda )
E(a,c) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(c,\lambda )
E(b,c) = -\int d\lambda\rho (\lambda )A(b,\lambda )A(c,\lambda )

Note a few things about the above. There are two factorable terms inside the integral, one for each angle. You can visualize this integral in the following discrete way. We have a fixed number of λi, say (λ1, λ2, λ3, ... λn). To calculate the integral, we multiply A(a,λ1)A(b,λ1)*P(λ1) and add it to A(a,λ2)A(b,λ2)*P(λ2) ... all the way to λn. In other words, the above will not work if we did A(a,λ1)A(b,λ5)*P(λ3) or any such mixing.

Secondly, once we have our inequality:

|E(a,b) - E(a,c)| - E(b,c) <= 1

To say the probability distribution of λi must be the same means that, if we obtained E(a,b) by integrating over a series of λi values, say (λ1, λ2, λ4), the same must apply to E(a,c) and E(b,c). In other words, it is a mathematical error to use E(a,b) calculated over (λ1, λ2, λ4) with E(a,c) calculated over (λ6, λ3, λ2) and E(b,c) calculated over (λ5, λ9, λ8) in the above inequality, because in that case ρ(λi) will not be the same across the terms, the way Bell intended and we agreed he did. Note also that even if the set of λ's is the same, we still need each λ to be sampled the exact same number of times for each term.
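The discrete form of these expectation values can be sketched directly. In the illustration below, the distribution ρ(λ) and the ±1 outcome table A(angle, λi) are arbitrary inventions of mine; the point is only that when every term shares the same distribution, the inequality cannot fail:

```python
import random
random.seed(2)

# One SHARED rho(lambda_i) and one deterministic outcome table, both
# generated arbitrarily for illustration.
n_lam = 6
w = [random.random() for _ in range(n_lam)]
P = [x / sum(w) for x in w]  # shared rho(lambda_i), normalized
A = {ang: [random.choice((+1, -1)) for _ in range(n_lam)] for ang in "abc"}

def E(x, y):
    """E(x,y) = -sum_i P(lambda_i) A(x,lambda_i) A(y,lambda_i)."""
    return -sum(P[i] * A[x][i] * A[y][i] for i in range(n_lam))

# With the same P(lambda_i) in every term the bound must hold:
lhs = abs(E("a", "b") - E("a", "c")) - E("b", "c")
print(lhs <= 1 + 1e-12)  # True (the tiny tolerance covers float rounding)
```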

Now, what I have just described here are the specific experimental conditions that must apply for Bell's inequality to be applicable to data obtained from any experiment.

This brings us to the sorting I mentioned earlier which you are having difficulty with.
Suppose that in an actual experiment the experimenter also had, alongside each pair of measurements in each run, the specific value of λ for that run. He will now have a long list of pairs of +'s and -'s, plus one indexed λ each, such that for the three runs of the experiment he will have three lists which look something like the following (the actual sequence of +'s, -'s and λ's will differ):

+ - λ1
- + λ9
+ + λ6
- + λ3
...

In such a case, it will be easy to verify whether his data meets the requirement that ρ(λi) is the same for each term, as you agreed to previously. He could simply sort each of the three lists according to the λ column and compare whether the λ columns from all three runs are the same. If they are not, ρ(λi) is different and Bell's inequality cannot be applied to the data, for purely mathematical reasons: if they insisted on calculating the LHS of the inequality with that data, the inequality is not guaranteed to be obeyed.

(Note I am using the term "run" here to describe the three lists of already separated out data. ie, run one constitutes all the data used for calculating the E(a,b) term, run 2 the E(a,c) etc even though the experimenters may have been doing random switching from angle to angle.)

However, experimenters do not have the λ's, so how can they make sure their data is compatible? If it is assumed that each specific λ contains all properties that will deterministically result in the outcome, then we do not need the λ's to sort our data. We can just sort the actual result pairs so that the "a" column of the (a,b) pairs matches the "a" column of the (a,c) pairs, and the "b" and "c" columns also match. If we can do that, then we can be sure that ρ(λi) is the same for all three terms of the inequality, and Bell's inequality should apply to our data. If we can not, it means ρ(λi) is different, and the data is mathematically not compatible with the inequality.
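The sorting check just described can be sketched in code. The λ labels and outcome values below are hypothetical placeholders, imagining (counterfactually) that each recorded pair came tagged with its λ:

```python
# Three hypothetical runs; each row is (outcome1, outcome2, lambda label).
run_ab = [(+1, -1, "l1"), (-1, +1, "l9"), (+1, +1, "l6"), (-1, +1, "l3")]
run_ac = [(+1, +1, "l9"), (-1, -1, "l3"), (+1, -1, "l1"), (-1, +1, "l6")]
run_bc = [(-1, +1, "l6"), (+1, -1, "l1"), (-1, -1, "l3"), (+1, +1, "l9")]

def lam_column(run):
    """The lambda column after sorting the run on its lambda labels."""
    return [lam for *_, lam in sorted(run, key=lambda row: row[-1])]

# Runs are compatible when the sorted lambda columns all agree,
# i.e. rho(lambda) is the same across the three runs.
same_rho = lam_column(run_ab) == lam_column(run_ac) == lam_column(run_bc)
print(same_rho)  # True for this hypothetical data
```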

Let us look at this slightly differently. Consider our first list, which included the λ's. After sorting all three runs by the λ's, we will find that we only need three columns of +'s and -'s out of the 6 (2 from each run), because each column will be duplicated. This simply means that for each λ, there are 3 simultaneously existing properties at the three angles.

Now, what if instead of collecting three runs of pairs we collected a single run of triples so that the data from our experiment is
a b c
+ - + λ1
- + + λ9
+ + - λ6
- + + λ3
...

We do not need any sorting here because we can calculate all our terms from the same single run with the same ρ(λi). So we can compare ANY dataset of this type with Bell's inequality. Note, this is not the same as saying we can do the same thing even if we only measured pairs so long as triples are assumed to exist. Of course triples are assumed to exist. That is what gave us the inequalities. We are only interested now in the question of whether our dataset obtained in an experiment can fulfil the requirement of uniform ρ(λi). However, since it is not possible to measure triples in any experiment, the requirement to be able to sort the dataset applies to all datasets involving multiple runs of pairs.

Now, let us go back to the underlined text above. Since you agreed with me that ρ(λi) must be the same for each term in the inequality, how do you make sure of that in an experiment? Is that what you were alluding to with the underlined text: "which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements"? In other words, why is it important that the number of measurements be very large? Please I need a specific answer to this question, assuming you are still willing to contest this issue after my very detailed explanation above.

As an aside:
You seem to have an issue with my use of

| <ab> + <ac> | - <bc> <= 1

In which I have replaced E(a,b) in Bell's notation with <ab> in mine, where a, b represent the outcomes at angles a and b. I was referring to the fact that, in calculating the averages, the list of a's in the first term is not allowed to contain a different number of +'s and -'s from that in the second term, and the same goes for "b" and "c".
You objected and said:
"a" is just a detector angle rather than a result like +1 or -1, the text makes that clear, so of course it means the same thing everywhere. But P(a,b) is an expectation value (he called it that himself), which can be understood as the average value of the product of two measurements on a pair of entangled particles with detectors at angles a and b, in the limit as the number of particle pairs measured in this way goes to infinity.
But then later, you used exactly the same notation.
I have done that several times, whenever I point out that the terms in your inequality have a meaning of this type (with the understanding that here I use notation like b*c to refer not to the product of two detector angles, but the product of the predetermined results +1 or -1 for b and c in a given triple)
This tactic of yours, combined with an unwillingness to actually understand the opposing view and a severe case of irrelevant argumentum ad verbosium, is the reason I do not take you seriously.
 
  • #1,209
JesseM, I’m sorry to say that if it continues this way, I probably have to charge you some kind of "https://www.physicsforums.com/showpost.php?p=2825463&postcount=1192"... :biggrin:

"a severe case of irrelevant argumentum ad verbosium"​
This very fine and sophisticated grievance can only be reduced to a severe case of abusive argumentum ad hominem.

LOL! Pathetic BS is still nothing more than pathetic BS! :smile: :smile:

So, what’s up next? Well, Mr. BS already smells the defeat, and his only "hope" is semantic games and personal attacks in the disguise of silly words, and after yet another 2-3 posts the attacks will escalate significantly.

And then comes the grand finale: an "agreement not to agree."

Jesse, we take you seriously, and Mr. BS is nothing more than a pathetic joke.


argumentum ad nauseam
 
Last edited by a moderator:
  • #1,210
One more thing.
JesseM said:
This is easier to see if you suppose λ can only take a discrete set of values from 0 to N, so the integral on the right side of (2) can be replaced by the sum \sum_{i=0}^N A(a,\lambda_i)B(b,\lambda_i)P(\lambda_i).

You must agree therefore that the following is Bell's inequality.
|\sum_{i} A(a, \lambda_{i} )A(b,\lambda_{i} ) P(\lambda_{i} ) + \sum_{i} A(a, \lambda_{i} )A(c,\lambda_{i} ) P(\lambda_{i} )| - \sum_{i} A(b, \lambda_{i} )A(c,\lambda_{i} ) P(\lambda_{i} ) \leq 1

Which can be factored in this form.
|\sum_{i} P(\lambda_{i} )A(a, \lambda_{i} )\left [ A(b,\lambda_{i} ) + A(c,\lambda_{i} )\right ]| - \sum_{i} A(b, \lambda_{i} )A(c,\lambda_{i} ) P(\lambda_{i} ) \leq 1

Bell himself did a similar factorization. Therefore, if for any dataset the two equations above produce different results, it means the dataset is not compatible with Bell's inequality for purely mathematical reasons. Do you agree? If you don't, please explain clearly.
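The agreement between the two forms can be checked numerically. In the sketch below, the shared P(λ) and the ±1 outcome table are arbitrary inventions for illustration; when every term is computed from the same distribution, the two forms coincide and the bound holds:

```python
import random
random.seed(4)

# One shared P(lambda) and one outcome table A(angle, lambda), both arbitrary.
n = 5
w = [random.random() for _ in range(n)]
P = [x / sum(w) for x in w]
A = {ang: [random.choice((+1, -1)) for _ in range(n)] for ang in "abc"}

def S(x, y):
    """sum_i A(x,lambda_i) A(y,lambda_i) P(lambda_i)"""
    return sum(A[x][i] * A[y][i] * P[i] for i in range(n))

unfactored = abs(S("a", "b") + S("a", "c")) - S("b", "c")
factored = (
    abs(sum(P[i] * A["a"][i] * (A["b"][i] + A["c"][i]) for i in range(n)))
    - S("b", "c")
)

print(abs(unfactored - factored) < 1e-12)  # the two forms agree
print(unfactored <= 1 + 1e-12)             # and the bound holds
```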
 
  • #1,211
In case there is any doubt left, let us now go through Bell's paper step by step and show that the physical assumptions are peripheral to the derivation of the inequality.

We start by recognizing that Bell has defined a deterministic function A(.,.), a two-valued function with values (+1 or -1) for a single particle. This is done in equation (1) of his original paper, as follows:

Bell said:
A(a,\lambda ) = \pm 1, B(b,\lambda ) = \pm 1

Let us set up our own definitions side by side. Let us pick two arbitrary variables a', b' with values (+1 or -1). For our purpose, it is not important what the physical situation is between a' and b', or whether there is remote dependence between a' and b'. All that is important for us is that we have two such arbitrary variables, without any regard as to what physical process may be producing them. Please do not confuse our variables a' and b' with Bell's vectors (a and b); a' and b' are rather analogous to Bell's two-valued functions A(.,.) and B(.,.). We will harmonize the notation later. In our case, the analogy of Bell's equation (1) above is the following:
a' = \pm 1, b' = \pm 1


Now let us go to Bell's equation (2) where he defines his expectation values

Bell said:
E(a,b) = \int d\lambda \rho (\lambda )A(a,\lambda )B(b,\lambda )

Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) over all λ, which is essentially the expectation value. Theoretically the above makes sense: you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes, in which the frequency of realization of a specific λ is equivalent to its probability.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2),B(b,λ2) was realized 5 times, and A(a,λ3),B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are. All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above. (This is the fair sampling assumption, which is however not the focus of this post.) So then in this case, assuming discrete λ's, Bell's equation (2) is equivalent to the following simple average:
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a,\lambda _i)B(b,\lambda _i)
Since in any real experiment we do not know which λ is realized for any specific iteration, we can drop λ from the equation altogether without any impact, simply absorbing it into the specific variant of the functions A, B operating for iteration i (that is, Ai and Bi):
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a)_{i}B(b)_{i}
And we could adopt a simplified notation in which we replace the function A(a)_i with the outcome \alpha _i and B(b)_i with \beta _i. Note that the outcomes of our functions are restricted to values (+1 or -1) and we could say \alpha = \pm 1, \beta = \pm 1

To get:
E(a,b) = \frac{1}{N} \sum_{i}^{N} \alpha _{i} \beta _{i} = \langle \alpha \beta \rangle
Let us then develop our analogy involving our a' and b' to the same point. Remember our first assumption was that we had two such arbitrary variables a' and b' with values (+1 or -1). Now consider the situation in which we had a list of pairs of such variables of length N. Let us designate our list [(a',b')] to indicate that each entry in the list is a pair of (a',b') values. Let us define the expectation value of the pair product for our list as follows:
E(a',b') = \frac{1}{N} \sum_{i}^{N} a'_{i} b'_{i} = \langle a'b' \rangle
For all practical purposes, this equation is exactly the same as the previous one, and the terms a' and b' are mathematically equivalent to α and β respectively. What this shows is that the physical assumptions about the existence of hidden variables, locality etc. are not necessary to obtain an expression for the expectation value of a pair product. We have obtained the same thing just by defining two variables a', b' with values (+1 and -1) and calculating the expectation value for the paired product of a list of pairs of these variables. You could say the reason Bell obtained the same expression is because he just happened to be dealing with two functions which can have values (+1 and -1) for physical reasons, and experiments producing a list of such pairs. And he just happened to be interested in the pair product of those functions for physical reasons. But the structure of the calculation of the expectation value is determined entirely by the mathematics and not the physics. Once you have two variables with values (+1 and -1) and a list of pairs of such values, the above equations should arise no matter the process producing the values, whether physical, mystical, non-local, spooky, super-luminal, or anything you can dream about. That is why I say the physical assumptions are peripheral.
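The weighted-sum versus simple-average equivalence claimed above can be checked with the 0.3/0.5/0.2 example; the ±1 outcome values per λ below are arbitrary illustrations of mine:

```python
# Hypothetical (A, B) outcomes for each of the three lambdas.
outcomes = {"l1": (+1, -1), "l2": (-1, +1), "l3": (+1, +1)}
probs = {"l1": 0.3, "l2": 0.5, "l3": 0.2}

# Weighted sum: each lambda appears exactly once, weighted by its probability.
weighted = sum(p * outcomes[lam][0] * outcomes[lam][1] for lam, p in probs.items())

# Simple average over a 10-entry dataset whose frequencies match: 3, 5, 2.
data = ["l1"] * 3 + ["l2"] * 5 + ["l3"] * 2
simple = sum(outcomes[lam][0] * outcomes[lam][1] for lam in data) / len(data)

print(abs(weighted - simple) < 1e-12)  # True: the two calculations agree
```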

Note a few things about the above equation. a'_i and b'_i must be multiplied with each other. If we independently reorder the columns in our list so that we have different pairings of a'_i and b'_i, we will obtain the same expectation value only in the most improbable of situations. To see this, consider the simple list below

a' b'
+ -
- +
- +
+ -

<a'b'> = -1

If we rearrange the b' column so that the pairing is no longer the same, we may have something like the following were we have the same number of +'s and -'s but their pairing is different:

a' b'
+ +
- -
- -
+ +

<a'b'> = +1
Which tells us that we are dealing with an entirely different dataset.
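The reordering effect can be shown directly in code with the same columns (every product flips sign once the pairing changes):

```python
# Same +/-1 symbols in each column; only the pairing differs.
a_col = [+1, -1, -1, +1]
b_col = [-1, +1, +1, -1]        # original pairing: every pair anti-aligned
b_shuffled = [+1, -1, -1, +1]   # same symbols reordered: every pair aligned

def avg(xs, ys):
    """Average pair product <xy> over the two columns."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

print(avg(a_col, b_col))       # -1.0
print(avg(a_col, b_shuffled))  # 1.0
```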
 
  • #1,212
(continued from the last post)

So far we have dealt with pairs, just like Bell up to his equation (14). Let us then, following in Bell's footsteps, introduce the third variable (see page 406 of his original paper).
Bell said:
It follows that c is another unit vector
E(a,b) - E(a,c) = -\int d\lambda \rho (\lambda )[A(a,\lambda )A(b,\lambda )-A(a,\lambda )A(c,\lambda )]
= \int d\lambda \rho (\lambda )A(a,\lambda )A(b,\lambda )[A(b,\lambda)A(c,\lambda )-1]
using (1), whence
\left | E(a,b)-E(a,c) \right |\leq \int d\lambda \rho (\lambda ) [1 - A(b,\lambda)A(c,\lambda )]
The second term on the right is E(b,c), whence
1 + E(b,c) >= |E(a,b) - E(a,c)| ... (15)

Note a few things here: Bell factorizes at will within the integral. ρ(λ) is a factor of every term under the integral. That is why I explained in my previous detailed post that ρ(λ) must be the same for all three terms. Secondly, Bell derives the expectation value term E(b,c) by factoring out the corresponding A(b,.) and A(c,.) terms from E(a,b) and E(a,c). Therefore, E(b,c) does not contain different A(b,.) and A(c,.) terms but the exact same ones present in E(a,b) and E(a,c). In other words, in order to obtain all three expectation values E(a,b), E(a,c) and E(b,c), we ONLY need three lists of outcomes corresponding to A(a,.), A(b,.), A(c,.) or in simpler notation, we only need a single list of triples [(a',b',c')] to calculate all terms for

1 + <b'c'> >= |<a'b'> - <a'c'>|


So then, we are destined to obtain this inequality for any list of triples of two-valued variables (or outcomes of two-valued functions) where the allowed values are (+1 or -1), no matter the physical, metaphysical or mystical situation generating the triples. It is an entirely arithmetic relationship, determined solely by the fact that we are using three such two-valued variables. Suppose now that we generate from our list of triples three lists of pairs corresponding to [(a',b')], [(a',c')] and [(b',c')]; we can simply calculate our averages and be done with it. It doesn't matter if the order of pairs in the lists is randomized so long as the pairs are kept together. In this case, we can still sort them as described in my previous detailed description, to regenerate our list of triples from the three lists of pairs. However, if we were to randomize without keeping the pairs together, it will be impossible to regenerate our original list of triples from the resulting lists of pairs, and Bell's inequality will not apply to our data.
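The arithmetic claim is easy to verify by brute force. The sketch below (my own check, not from the thread) uses Bell's anticorrelation convention, E(x,y) = -<A(x)A(y)>, and confirms inequality (15) for arbitrary probability distributions over the eight possible triples of predetermined outcomes:

```python
import itertools
import random

# All 8 possible hidden states: a state lambda fixes the triple
# (A(a), A(b), A(c)) of predetermined +/-1 outcomes on one side; the other
# side is perfectly anticorrelated, B(x, lambda) = -A(x, lambda), so that
# E(x, y) = -<A(x)A(y)> as in Bell's derivation.
triples = list(itertools.product((-1, 1), repeat=3))

def E(weights, i, j):
    """E(x_i, x_j) = -sum over lambda of rho(lambda) * A_i * A_j."""
    total = sum(weights)
    return -sum(w * t[i] * t[j] for w, t in zip(weights, triples)) / total

random.seed(1)
for _ in range(5000):
    rho = [random.random() for _ in triples]   # an arbitrary rho(lambda)
    lhs = 1 + E(rho, 1, 2)                     # 1 + E(b,c)
    rhs = abs(E(rho, 0, 1) - E(rho, 0, 2))     # |E(a,b) - E(a,c)|
    assert lhs >= rhs - 1e-12                  # inequality (15) always holds
print("inequality holds for all sampled distributions")
```

The key point is that all three expectation values here are computed from the same list of triples and the same distribution rho, exactly as in Bell's derivation.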

Now the way Bell-test experiments are usually done is analogous to collecting three lists of pairs randomly, with the assumption that these three lists are representative of the three lists of pairs which we would have obtained from a list of triples, had we been able to measure at three angles simultaneously. And if each list was sufficiently long, the averages will be close to those of the ideal situation assumed by Bell. Again, remember that within each list of pairs actually measured, the individual pairs such as (a',b')_i measured together are assumed to have originated from a specific theoretical triple, (a',c')_j from another triple, and (b',c')_k from another triple. Therefore, our dataset from a real experiment is analogous to our three theoretical lists above, where we randomized the order but kept the pairs together while randomizing. This means it should be possible to regenerate our single list of triples simply by resorting the three lists of pairs while keeping the individual pairs together, as I explained previously. If we cannot do this, it means either that:
a) our data is most likely of the second kind in which randomization did not keep the pairs together or
b) each list of pairs resulted from different lists of triples and/or
c) our lists of pairs are not representative of the list of triples from which they arose

In any of these cases, Bell's inequality does not and can not apply to the data. In other words, it is simply a mathematical error to use the inequality in such situations. Also note that these represent the only scenarios in which "average value of a*b for all triples" is different from "average value of a*b for measured pairs only". And in this case, the fair sampling assumption can not hold.
 
Last edited:
  • #1,213
(reply to post #1208, part 1)
billschnieder said:
Note the underlined texts as we will come back to it. Now let us consider our previous discussion about this in post #857.
billschnieder said:
JesseM said:
The same probability distribution should apply to each of the four terms, but the inequality should hold regardless of the specific probability distribution (assuming the universe is a local realist one and the specific experimental conditions assumed in the derivation apply).
So then, if it was found that it is possible in a local realist universe for P(λi) to be different for at least one of the terms in the inequality, above, then the inequality will not apply to those situations where P(λi) is not the same. In other words, the inequalities above are limited to only those cases for which a uniform P(λi) can be guaranteed between all terms within the inequality. Do you disagree?
If you remember, our previous discussion fell apart at the point where you refused to give a straight answer to the last question above.
Well, no, you are completely misremembering why our previous discussion "fell apart". In fact I did give you a clear answer to this question in post #861:
When you suggest the possibility that P(λi) could be "different for at least one of the terms in the inequality", that would imply that P(λi) depends on the choice of detector settings, since each expectation value is defined relative to a particular combination of detector settings. Am I understanding correctly, or are you talking about something else?

If I am understanding you right, note that it's generally accepted that one of the assumptions needed in Bell's theorem is something called the "no-conspiracy assumption", which says the decisions about detector settings should not be correlated with the values of the hidden variables.

...

So, I agree the inequality can only be assumed to hold if the choice of detector settings and the value of the hidden variables are statistically independent (which means the probability distribution P(λi) does not change depending on the detector settings), but this is explicitly included as an assumption in the more rigorous modern derivations. If you dispute that a "conspiracy" of the type being ruled out here would in fact have some very physically implausible features so that it's reasonable to rule it out, I can give you some more detailed arguments for why it's so implausible.
Then in post #862 you said:
You are wandering off now, JesseM. Try not to pre-empt the discussion. The question I asked should have a straightforward answer. The reason why P(λi) might be different shouldn't affect the answer you give to my question. If you believe P(λi) will be different when a conspiracy is involved, then you should have no problem admitting that Bell's inequalities do not apply to situations in which there is conspiracy.
And in post #863 I responded to the last sentence (...'then you should have no problem admitting that Bell's inequalities do not apply to situations in which there is conspiracy') by saying:
Didn't I already "admit" that in my last post? Read again:
So, I agree the inequality can only be assumed to hold if the choice of detector settings and the value of the hidden variables are statistically independent (which means the probability distribution P(λi) does not change depending on the detector settings)
So, I made quite clear that my answer to your question was "yes", I agreed that the inequality can only be assumed to hold if the probability distribution P(λi) is assumed to be the same for each of the terms E(a,b), E(a,b'), E(a',b) and E(a',b'). But I additionally explained that assuming the probability distribution was the same for each term was equivalent to the no-conspiracy assumption, i.e. P(λi) = P(λi | a,b) = P(λi | a,b') = P(λi | a',b) = P(λi | a',b'). Your complaint in subsequent posts was not that I had failed to give clear answers to any of your questions, but just a complaint that you didn't like the fact that I made additional commentary about the reasoning behind my answers, commentary which I thought would help people reading the thread to better understand the issues being discussed. You wanted me to shut up and not make any additional comments I deemed relevant, and restrict myself only to short answers to your questions. For example in post #864 you made it clear that you did understand I had answered your questions, and just wanted me to snip out all the surrounding commentary about my answers:
So then, I will assume that the last few posts did not happen, and I will consider that the responses moving forward are as follows:
So then, if it was found that it is possible in a local realist universe for P(λi) to be different for at least one of the terms in the inequality, above, then the inequality will not apply to those situations where P(λi) is not the same. In other words, the inequalities above are limited to only those cases for which a uniform P(λi) can be guaranteed between all terms within the inequality. Do you disagree?
... I agree ...
Do you believe P(λi) can be different between the terms if and only if conspiracy is involved?
Yes ...

See how short and to the point this would have been. You would have saved yourself all the typing effort, and to boot, we don't have to start a new rabbit trail about the meaning of "conspiracy"!
Then later in that same post you made clear that your actual objection was to my additional explanatory commentary, and threatened to end the discussion if I wouldn't agree to restrict my comments only to short answers to your questions:
But if you now define conspiracy in a manner that I don't agree with, I will be forced to challenge it because if I don't it may appear as though I agree with that definition, then we end up 20 posts later, discussing whose definition of "conspiracy" is correct, having left the original topic. The more you write, the more things need to be challenged in your posts and the more off-topic the discussions will get. This is why I insist that the discussion be focused. I hope you will recognize and respect this, otherwise there is no point continuing this discussion.
It is certainly reasonable to expect one's discussion partner to give clear answers to the questions one asks, but it's not reasonable to expect them to restrict themselves only to short answers and make no additional commentary they think is relevant. That unreasonable expectation on your part was why the earlier discussion shut down, not because I didn't "give a straight answer" to any of the questions you asked.

Sorry to spend so much time rehashing old disagreements but I don't like being accused of refusing to answer any question, that's something I will always try my best to do. Moving on to the substance of your current post:
billschnieder said:
You say, Bell is referring to the measurement of AN entangled pair with detectors set at a and b. I agree.
So, you agree that "resorting" the data, in the way you did in post #1187, is out of the question? That no physicist would interpret a term like E(a,b) to possibly involve taking the result from a detector with setting a during a trial where the two detectors were set to a,b' and multiplying it by the result from a detector with setting b during a trial where the two detectors were set to a',b?
billschnieder said:
You also say, in order to obtain the expectation value for this pair of angles, Bell integrates over all λi, so that there is a λi probability distribution. I agree also. This is precisely why I asked you all those questions earlier and you also agreed with me that this λi probability distribution must be exactly the same for all expectation value terms in Bell's inequality.
Yes, I did agree, giving you "a straight answer" to this question even though I added some additional commentary about why it is reasonable to expect the probability distribution to be the same regardless of detector settings.
billschnieder said:
Now please pay attention and make sure you actually understand what I am saying next before you respond.
The reason why the probability distribution of λi must be the same is the following (using the equations you presented, except using E for expectation to avoid confusion with Probability notation).
E(a,b) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(b,\lambda )
E(a,c) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(c,\lambda )
E(b,c) = -\int d\lambda\rho (\lambda )A(b,\lambda )A(c,\lambda )
I don't understand how you can say that those equations are "the reason why" the probability distribution is the same. Are you suggesting that those equations can be taken as definitions of E(a,b) and E(a,c) and E(b,c), and thus it is true by definition that the probability distribution \rho (\lambda ) is the same in each case? I would say that a term like E(a,b) is understood to be defined as the expectation value for the product of two measurements on a pair of entangled particles when the detectors are set to a and b, and that Bell then tries to physically justify why we would expect E(a,b) to be given by the equation above in a local realist universe. So any feature of the equations, like \rho (\lambda ) being the same in each, cannot be justified by pointing to the equations themselves, there has to be a physical justification for it or else someone following the derivation would have no reason to agree that the equations above are actually correct in a local realist universe. Do you agree that the derivation depends on the idea that there's a physical justification for assuming \rho (\lambda ) is the same in each of those three equations, that we can't just point to the equations themselves to explain the "reason" that \rho (\lambda ) is the same?

If you disagree with that, I would just point you again to Bell's paper http://cdsweb.cern.ch/record/142461/files/198009299.pdf which I brought up earlier in post #1171 when showing that the simple Bell inequality I originally brought up was one that Bell had actually discussed. On p. 15 of the pdf file (p. 14 of the paper itself) he does bring up the other inequality we had been discussing before you refused to continue if I didn't keep my answers short:

|E(a,b) + E(a,b') + E(a',b) - E(a',b')| <= 2

Then on p. 16 of the pdf (p. 15 of the paper), in the "Envoi" section, he discusses possible objections one might have to his conclusion that the inequality should be obeyed in a local realist universe. And at the bottom of this page, he explicitly brings up the possibility that \rho(\lambda) could be different from term to term, and gives a physical argument for why he considers this very implausible:
Secondly, it may be that it is not permissible to regard the experimental settings a and b in the analyzers as independent variables, as we did. We supposed them in particular to be independent of the supplementary variable λ, in that a and b could be changed without changing the probability distribution \rho(\lambda). Now even if we have arranged that a and b are generated by apparently random radioactive devices, housed in separate boxes and thickly shielded, or by Swiss national lottery machines, or by elaborate computer programmes, or by apparently free willed experimental physicists, or by some combination of all of these, we cannot be sure that a and b are not significantly influenced by the same factors λ that influence A and B. But this way of arranging quantum mechanical correlations would be even more mind boggling than one in which causal chains go faster than light. Apparently separate parts of the world would be deeply and conspiratorially entangled, and our apparent free will would be entangled with them.
So, clearly he doesn't think that E(a,b) and E(a,b') can be said to have the same probability distribution on λ by definition, rather he provides a physical argument to justify this idea.
billschnieder said:
Note a few things about the above. There are two factorable terms inside the integral, one for each angle. You can visualize this integral in the following discrete way. We have a fixed number of λi, say (λ1, λ2, λ3, ... λn). To calculate the integral, we multiply A(a,λ1)A(b,λ1)*P(λ1) and add it to A(a,λ2)A(b,λ2)*P(λ2) ... all the way to λn. In other words, the above will not work if we did A(a,λ1)A(b,λ5)*P(λ3) or any such.
Agreed--note that if you mixed them up in that way you would no longer be computing an "expectation value" for the product of the two measurement results on a single pair of entangled particles, since it's assumed that on each trial with a single pair, λ takes a single value on that trial (its value is supposed to be determined by the values of all hidden variables on a given trial)
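The discrete sum described in the quote can be written out explicitly. Here is a minimal sketch (the distribution, the outcome functions, and n = 5 are all invented for illustration):

```python
import random

random.seed(2)
n = 5                                   # toy hidden-variable space lambda_1..lambda_n
weights = [random.random() for _ in range(n)]
s = sum(weights)
P = [w / s for w in weights]            # probability distribution rho(lambda_i)

# Deterministic outcome functions A(setting, lambda_i) in {+1, -1};
# filled in at random here, since any fixed assignment illustrates the point.
A = {(setting, i): random.choice((-1, 1))
     for setting in ("a", "b") for i in range(n)}

# E(a,b): each A(a, lambda_i) is paired with A(b, lambda_i) for the SAME i,
# and weighted by P(lambda_i).  Mixing indices, e.g. using
# A(a, lambda_1) * A(b, lambda_5) * P(lambda_3), would not be an
# expectation value over rho at all.
E_ab = -sum(A[("a", i)] * A[("b", i)] * P[i] for i in range(n))
print(E_ab)
```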
 
Last edited by a moderator:
  • #1,214
(reply to post #1208, part 2)


billschnieder said:
Secondly, once we have our inequality:

|E(a,b) - E(a,c)| - E(b,c) <= 1

To say the probability distribution of λi must be the same means that, if we obtained E(a,b) by integrating over a series of λi values, say (λ1, λ2, λ4), the same must apply to E(a,c) and E(b,c). In other words, it is a mathematical error to use E(a,b) calculated over (λ1, λ2, λ4), with E(a,c) calculated over (λ6, λ3, λ2) and E(b,c) calculated over (λ5, λ9, λ8) in the above inequality, because in that case ρ(λi) will not be the same across the terms, as Bell intended it to be (and as we agreed).
True, but now you're talking about a completely different sense of what it would mean for ρ(λi) to "not be the same across the terms" than what I was talking about. I wasn't talking about only adding some values of λ in the sums for each term, I was just talking about how each term could involve a different probability distribution on all possible values of λi, i.e. one might use a probability distribution P1(λ) such that P1(λ5) = 0.03% while another might use a different probability distribution P2(λ) such that P2(λ5) = 1.7%. That is what it would mean to violate the no-conspiracy assumption, it doesn't have anything to do with only adding some values of λ in the sum for each term. Even if the no-conspiracy assumption was violated, the discrete case in a local realist universe (where the result A was always completely predetermined by the value of λ and the choice of detector setting a, b, or c) where there were N possible values of λ would still look like this:

E(a,b) = - \sum_{i=1}^N A(a,\lambda_i)*A(b,\lambda_i)*P_1 (\lambda_i)
E(b,c) = - \sum_{i=1}^N A(b,\lambda_i)*A(c,\lambda_i)*P_2 (\lambda_i)
E(a,c) = - \sum_{i=1}^N A(a,\lambda_i)*A(c,\lambda_i)*P_3 (\lambda_i)

You can see that the only difference here is that the three sums have different probability distributions on λ--P1, P2, and P3--but each sum still includes every possible value of λ (i.e. λ1, λ2, λ3, ... , λN)
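To see concretely why this matters, here is a small hypothetical (my own illustration, not from the posts): if the three sums really are computed with three different distributions P1, P2, P3 on the hidden states, the inequality can fail outright:

```python
import itertools

# Hidden states: each lambda fixes the triple (A(a), A(b), A(c)) in {+1, -1},
# with the anticorrelation convention E(x, y) = -<A(x)A(y)>.
triples = list(itertools.product((-1, 1), repeat=3))

def E(P, i, j):
    """E(x_i, x_j) = -sum over lambda of P(lambda) * A_i * A_j."""
    return -sum(p * t[i] * t[j] for p, t in zip(P, triples))

def concentrated(predicate):
    """Distribution uniform over the hidden states satisfying predicate."""
    mask = [1.0 if predicate(t) else 0.0 for t in triples]
    s = sum(mask)
    return [m / s for m in mask]

# Three DIFFERENT distributions, one per term (violating no-conspiracy):
P1 = concentrated(lambda t: t[0] == -t[1])   # a = -b  ->  E(a,b) = +1
P2 = concentrated(lambda t: t[1] == t[2])    # b =  c  ->  E(b,c) = -1
P3 = concentrated(lambda t: t[0] == t[2])    # a =  c  ->  E(a,c) = -1

lhs = 1 + E(P2, 1, 2)                        # 1 + E(b,c) = 0
rhs = abs(E(P1, 0, 1) - E(P3, 0, 2))         # |E(a,b) - E(a,c)| = 2
print(lhs, rhs)   # 0.0 2.0 -- the inequality fails when rho differs per term
```

This is exactly why the no-conspiracy assumption (same ρ(λ) for every term) is needed in the derivation.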

Perhaps you are worried that even if we assume the probability distribution P(λ) is the same for each term, there could be trillions of values of λ and thus the subset of trials where we used detector angles a,b might involve a totally different collection of λi's than the subset of trials where we used detector angles b,c or the subset where we used detector angles a,c. If so, this objection is misguided, and once again the reason has to do with the law of large numbers. "Expectation values" are theoretical calculations about what the average result of some experiment would be in the limit as the number of trials goes to infinity. And one can show mathematically that if you're dealing with an experiment that only has two possible results +1 and -1, then for a reasonably large number of trials (say, 1000) the probability that the average experimental result will differ significantly from the expectation value becomes astronomically small, regardless of how many possible values can be taken by other variables "behind the scenes" which determine whether the final result is +1 or -1. This was the point I made back in post #51 on the 'Understanding Bell's Logic' thread, which you never responded to:
I'm fairly certain that the rate at which the likelihood of significant statistical fluctuations drops should not depend on the number of λn's in the integral. For example, suppose you are doing the experiment in two simulated universes, one where there are only 10 possible states for λ and one where there are 10,000 possible states for λ. If you want to figure out the number N of trials needed so that there's only a 5% chance your observed statistics will differ from the true probabilities by more than one sigma, it should not be true that N in the second simulated universe is 1000 times bigger than N in the first simulated universe! In fact, despite the thousandfold difference in possible values for λ, I'd expect N to be exactly the same in both cases. Would you disagree?

To see why, remember that the experimenters are not directly measuring the value of λ on each trial, but are instead just measuring the value of some other variable which can only take two possible values, and which value it takes depends on the value of λ. So, consider a fairly simple simulated analogue of this type of situation. Suppose I am running a computer program that simulates the tossing of a fair coin--each time I press the return key, the output is either "T" or "H", with a 50% chance of each. But suppose the programmer has perversely written an over-complicated program to do this. First, the program randomly generates a number from 1 to 1000000 (with equal probabilities of each), and each possible value is associated with some specific value of an internal variable λ; for example, it might be that if the number is 1-20 that corresponds to λ=1, while if the number is 21-250 that corresponds to λ=2 (so λ can have different probabilities of taking different values), and so forth up to some maximum λ=n. Then each possible value of λ is linked in the program to some value of another variable F, which can take only two values, 0 and 1; for example λ=1 might be linked to F=1, λ=2 might be linked to F=1, λ=3 might be linked to F=0, λ=4 might be linked to F=1, etc. Finally, on any trial where F=0, the program returns the result "H", and on any trial where F=1, the program returns the result "T". Suppose the probabilities of each λ, along with the value of F each one is linked to, are chosen such that if you take [sum over i from 1 to n] P(λ=i)*(value of F associated with λ=i), the result is exactly 0.5. Then despite the fact that there may be a very large number of possible values of λ, each with its own probability, this means that in the end the probability of seeing "H" on a given trial is 0.5, and the probability of seeing "T" on a given trial is also 0.5.

Now suppose that my friend is also using a coin-flipping program, where the programmer picked a much simpler design in which the computer's random number generator picks a digit from 1 to 2, and if it's 1 it returns the output "H" and if it's 2 it returns the output "T". Despite the differences in the internal workings of our two programs, there should be no difference in the probability either of us will see some particular statistics on a small number of trials! For example, if either of us did a set of 30 trials, the probability that we'd get more than 20 heads would be determined by the binomial distribution, which in this case says there is only an 0.049 chance of getting 20 or more heads (see the calculator http://stattrek.com/Tables/Binomial.aspx). Do you agree that in this example, the more complex internal set of hidden variables in my program makes no difference in statistics of observable results, given that both of us can see the same two possible results on each trial, with the same probability of H vs. T in both cases?
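The two simulated coin-flip programs are easy to write down. In this sketch (the 50/50 wiring of hidden states to 'H'/'T' is a simplification of the arbitrary λ-to-F mapping described above), the observable statistics don't care about the size of the hidden-variable space:

```python
import random

random.seed(3)

def complicated_flip():
    """Coin flip driven by a hidden variable lambda with 1,000,000 states.
    Here half the states are wired to 'H' and half to 'T' (a simplification),
    so the observable probability of 'H' is still exactly 0.5."""
    lam = random.randint(1, 1_000_000)      # the hidden variable
    return "H" if lam <= 500_000 else "T"

def simple_flip():
    """The 'simple' program: pick H or T directly with probability 0.5."""
    return random.choice("HT")

n = 10_000
heads_complicated = sum(complicated_flip() == "H" for _ in range(n))
heads_simple = sum(simple_flip() == "H" for _ in range(n))

# Both counts follow the same binomial(n, 0.5) distribution, clustering
# around 5000 with standard deviation ~50, regardless of the hidden machinery.
print(heads_complicated, heads_simple)
```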

For a somewhat more formal argument, just look at http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter8.pdf, particularly the equation that appears on p. 3 after the sentence that starts "By Chebyshev's inequality ..." If you examine the equation and the definition of the terms above, you can see that if we look at the average value for some random variable X after n trials (the S_n / n part), the probability that it will differ from the expectation value \mu by an amount greater than or equal to \epsilon must be smaller than or equal to \sigma^2 / n\epsilon^2, where \sigma^2 is the variance in the value of the original random variable X. And both the expectation value for X and the variance of X depend only on the probability that X takes different possible values (like the variable F in the coin example which has an 0.5 chance of taking F=0 and an 0.5 chance of taking F=1); it shouldn't matter if the value of X on each trial is itself determined by the value of some other variable λ which can take a huge number of possible values.
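The Chebyshev bound quoted above can also be checked numerically. A small sketch (n, eps, and the number of runs are arbitrary choices for illustration):

```python
import random

random.seed(4)

# Chebyshev: P(|S_n/n - mu| >= eps) <= sigma^2 / (n * eps^2).
# For a +/-1 valued variable, sigma^2 = 1 - mu**2 <= 1, so the bound
# depends only on n and eps, never on how many hidden states lie behind
# each individual outcome.
mu, n, eps = 0.0, 1000, 0.1
bound = (1 - mu**2) / (n * eps**2)          # 0.1 for these values

runs, exceed = 500, 0
for _ in range(runs):
    sample_mean = sum(random.choice((-1, 1)) for _ in range(n)) / n
    if abs(sample_mean - mu) >= eps:
        exceed += 1

# The observed exceedance rate is far below the (loose) Chebyshev bound.
print(exceed / runs, "<=", bound)
```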
billschnieder said:
Note also that even if the set of λ's is the same, we still need each λ to be sampled the exact same number of times for each term.
No, see above. The expectation value is the average value we'd expect theoretically in the limit as the number of trials approaches infinity, and my argument from post #51 of "Understanding Bell's logic" explains why, if we do say three runs with 1000 trials each for all three possible combinations of different detector settings, it'd be astronomically unlikely for the average results seen experimentally in each run to differ significantly from the expectation values (assuming that the theoretical assumptions about the laws of physics that went into deriving expressions for the expectation values are actually correct), even if there happen to be 200 googolplex possible values of λ. If you disagree, perhaps you should actually address my example with the coin-flipping program rather than just dismissing it as irrelevant like you did on the "Understanding Bell's logic" thread.
billschnieder said:
Now what I have just describe here are the specific experimental conditions that should apply for Bell's inequality to be applicable to data obtained from any experiment.
Nope, there is no need for each run to sample all values of λi (or for different runs to sample the same values of λi), just as there wouldn't be such a need in the coin-flipping simulation example where the result "heads" or "tails" on each flip depends on the value of an internal random variable λ which can take a huge number of possible values, but the total probability of getting "heads" or "tails" on each flip is still 0.5 (so the theoretical expectation value if heads=+1 and tails=-1 would be 0), and the law of large numbers still says that if you do a few hundred flips the probability that the fraction of "heads" will be significantly different from 0.5 (or the probability that the average value with heads=+1 and tails=-1 is significantly different from 0) will be astronomically small, even if you sampled only a tiny fraction of the possible values of the internal variable λ.

billschnieder said:
This brings us to the sorting I mentioned earlier which you are having difficulty with.
Suppose in any actual experiment, the experimenter also had, alongside each pair of measurements in each run, the specific value of λ for that run. He will now have a long list of pairs of +'s and -'s, plus one indexed λ each, such that for the three runs of the experiment he will have three lists which look something similar to the following, except the actual sequence of +'s, -'s and λ's will be different:

+ - λ1
- + λ9
+ + λ6
- + λ3
...

In such a case, it will be easy to verify if his data meets the requirement that ρ(λi) is the same for each term, as you agreed to previously. He could simply sort each of the three lists according to the λ column and compare whether the λ columns from all three runs are the same.
Again, you misunderstood what I meant when I agreed "ρ(λi) is the same for each term", see the discussion above starting with the paragraph that begins "True, but now you're talking about a completely different sense..." I just meant that the "true" probability distribution for a given pair of settings like a,b, which in frequentist terms can be understood as giving the fraction of trials/iterations with each value of λi that would be obtained in the limit as the number of trials/iterations with those settings went to infinity, would be identical to the "true" probability distribution for a different pair of settings like b,c. Then the law of large numbers indicates that even if you only do 3 runs with 1000 iterations each, and the λi's were completely different on each run, it's still astronomically improbable that the average values you obtain for each run will differ significantly from the "true" expectation values for each setting which can be calculated from the "true" probability distribution ρ(λi).
billschnieder said:
If they are not, ρ(λi) is different and Bell's inequality can not be applied to the data for purely mathematical reasons.
No, you're confusing the theoretical ρ(λi) which appears in the equations calculating expectation values with the actual truth about the fraction of trials/iterations with each value of λi on some finite set of runs, which might better be denoted F(λi). If the number of trials/iterations is not much larger than the number of possible values of λi, then F(λi) might well be wildly different than ρ(λi), but exactly the same would be true in my coin flip simulation example and it wouldn't change the fact that if you do 1000 simulated flips, the chance you will have gotten a number of heads significantly different than 500 is astronomically small. If you think it's actually necessary to sample every value of λi in order to be highly confident that our average result was very close to the "true" expectation value, then you're just misunderstanding how the law of large numbers works.
billschnieder said:
In other words, if they insisted to calculate the LHS of the inequality with that data, the inequality is not guaranteed to be obeyed, for purely mathematical reasons.
Even if all the theoretical assumptions used in the expectation value equations are correct, there's some small probability that experimental data won't satisfy the inequality, but for a reasonably large number of trials/iterations on each run (say, 1000), this probability becomes astronomically small (the probability that the experimental average differs by a given amount from the expectation value can be calculated using the binomial calculator at http://stattrek.com/Tables/Binomial.aspx).
 
  • #1,215
(reply to post #1208, part 3)


billschnieder said:
However, experimenters do not have the λ's so how can they make sure their data is compatible? If it is assumed that each specific λ contains all properties that will deterministically result in the outcome, then we do not need the λs to sort our data. We can just sort the actual result pairs so that the "a" column of the (a,b) pair matches the "a" column of the (a,c) pair and the "b" and "c" columns also match. If we can do that, then we can be sure that ρ(λi) is the same for all three terms of the inequality and Bell's inequality should apply to our data.
I'm not sure I follow what you mean here. Suppose we do only 4 iterations with each pair of different detector settings, and get these results (with the understanding that notation like a=+1 means 'the result with detector set to angle a was +1'):

For run with setting (a,b):
1. (a=+1, b=-1)
2. (a=-1, b=-1)
3. (a=-1, b=+1)
4. (a=+1, b=-1)

For run with setting (b,c):
1. (b=-1, c=+1)
2. (b=-1, c=-1)
3. (b=-1, c=+1)
4. (b=+1,c=-1)

For run with setting (a,c):
1. (a=+1, c=-1)
2. (a=+1, c=+1)
3. (a=-1, c=-1)
4. (a=-1, c=+1)

Then we can arrange these results into four rows of three iterations from three runs, such that in each row the value of a is the same for both iterations that sampled a, in each row the value of b is the same for both iterations that sampled b, and in each row the value of c is the same for both iterations that sampled c:

1. (a=+1, b=-1) 3. (b=-1, c=+1) 2. (a=+1, c=+1)
2. (a=-1, b=-1) 1. (b=-1, c=+1) 4. (a=-1, c=+1)
3. (a=-1, b=+1) 4. (b=+1,c=-1) 3. (a=-1, c=-1)
4. (a=+1, b=-1) 2. (b=-1, c=-1) 1. (a=+1, c=-1)

So, we could "resort" the iteration labels for the second run (middle column) such that the former third iteration was now labeled the first, the former first iteration was now labeled the second, the former fourth iteration was now labeled the third, and the former second iteration was now labeled the fourth. Likewise for the third run (right column) we could say the former second iteration was now labeled the first, the former fourth iteration was now labeled the second, the third iteration remained the third, and the former first iteration was now labeled the fourth. Is this the type of "resorting" you mean?
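For a data set this small, the resorting described above can be found by brute force. This sketch (my own illustration) searches all relabelings of the second and third runs for one that makes every row consistent:

```python
from itertools import permutations

# The three runs from the example above, encoded as +/-1 pairs.
ab_run = [(+1, -1), (-1, -1), (-1, +1), (+1, -1)]   # (a, b) per iteration
bc_run = [(-1, +1), (-1, -1), (-1, +1), (+1, -1)]   # (b, c) per iteration
ac_run = [(+1, -1), (+1, +1), (-1, -1), (-1, +1)]   # (a, c) per iteration

def consistent(bc_order, ac_order):
    """True if every row agrees on a, b and c across the three runs."""
    for (a1, b1), (b2, c2), (a3, c3) in zip(ab_run, bc_order, ac_order):
        if a1 != a3 or b1 != b2 or c2 != c3:
            return False
    return True

# Brute force is fine for 4 iterations (4! x 4! = 576 arrangements).
matches = [(p, q) for p in permutations(bc_run) for q in permutations(ac_run)
           if consistent(p, q)]
print(len(matches) > 0)   # True: a consistent resorting exists for this data
```

For realistic run lengths an exhaustive search like this would be infeasible, which is part of why (as noted below) resortability is a very special property of a data set.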

If so, I don't see how this ensures that "ρ(λi) is the same for all three terms of the inequality", or what you even mean by that. For example, isn't it possible that if the number of possible values of λ is 1000, then even though iteration #1 of the first run has been grouped in the same row as iteration #3 of the second run and iteration #2 of the third run (according to their original labels), that doesn't mean the value of λ was the same for each of these three iterations? For example, might it not have been the case that iteration #1 of the first run had λ203, iteration #3 of the second run had λ769, and iteration #2 of the third run had λ488?

As a separate issue it is of course true that if your full set of data can be resorted in this way, that's enough to guarantee mathematically that the data will obey Bell's inequality. But this is a very special case, I think it would be fairly unlikely that the full set of iterations from each run could be resorted such that every row would have the same value of a,b,c throughout, even if the data was obtained in a local realist universe that obeyed Bell's theoretical assumptions, and even if the overall averages from each run actually did obey the Bell inequality.
billschnieder said:
If we can not, it means ρ(λi) is different, and the data is mathematically not compatible with the inequality.
But again that doesn't seem to be true (if I am interpreting your meaning correctly): the prediction that experimental data is highly unlikely to violate the inequality in a local realist universe doesn't require that the values of λ matched on the three experimental runs with different pairs of detector settings. The law of large numbers means that if the equations giving the theoretical expectation values are correct, and the theoretical expectation values obey some inequality, then the probability that experimental data from a finite series of runs would violate the inequality becomes astronomically small for a reasonable number (say, a few hundred or a few thousand) of trials/iterations, even if this number is vastly smaller than the number of possible values of λ, whose value (along with the detector settings) determines the results on each trial.
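The law-of-large-numbers point can be sketched numerically. The following is a toy local hidden-variable model of my own (not from the thread): each λ predetermines anticorrelated ±1 results for settings a, b, c, the number of possible λ values vastly exceeds the number of trials, and each of the three setting pairs is sampled in a separate, independent run:

```python
import random

random.seed(0)

N_LAMBDA = 100_000   # possible hidden-variable values, far more than the trials below
N_TRIALS = 2_000     # trials per run (one run per pair of detector settings)

# Each lambda predetermines Alice's result (+1/-1) for each of the settings a, b, c;
# Bob's particle is taken to be perfectly anticorrelated, B(x, lam) = -A(x, lam).
A = {lam: tuple(random.choice([1, -1]) for _ in range(3)) for lam in range(N_LAMBDA)}

def run(i, j):
    """Empirical average of the product of the two results over one run
    where Alice samples setting i and Bob samples setting j."""
    total = 0
    for _ in range(N_TRIALS):
        lam = random.randrange(N_LAMBDA)  # lambda drawn independently of the settings
        total += A[lam][i] * (-A[lam][j])
    return total / N_TRIALS

avg_ab = run(0, 1)  # three independent runs: different trials, different lambdas
avg_bc = run(1, 2)
avg_ac = run(0, 2)

# Despite N_TRIALS being tiny compared to N_LAMBDA, and no matching of lambdas
# across runs, the empirical averages respect the Bell-type inequality
print(1 + avg_bc >= abs(avg_ab - avg_ac))
```

Here the inequality is satisfied with a large margin because this toy model's correlations are weak; the point is only that finite-sample averages track the true expectation values even though the three runs sample entirely different (and tiny) subsets of the λ values.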
billschnieder said:
Let us look at this slightly differently. Consider our first list which included the λ's. After sorting all three runs by the λ's we will find that we only need three columns of +'s and/or -'s out of the 6 (2 from each run). This is because each column will be duplicated. This simply means for each λ, there are 3 simultaneously existing properties at the angles.
Each value of λ is associated with a triplet of predetermined results for settings a,b,c, so if you could somehow know the value of λ on each trial and you knew what settings were used on that trial, that would be sufficient to tell you the results obtained on that trial. Is that basically what you're saying here, or are you making some additional point?
billschnieder said:
Now, what if instead of collecting three runs of pairs we collected a single run of triples so that the data from our experiment is
a b c
+ - + λ1
- + + λ9
+ + - λ6
- + + λ3
...

We do not need any sorting here because we can calculate all our terms from the same single run with the same ρ(λi). So we can compare ANY dataset of this type with Bell's inequality.
You could only "compare it with Bell's inequality" by changing the meaning of the terms in Bell's inequality, which deal with expectation values for experiments where the experimenter only collected a pair of results on each trial, with some specific pair of detector settings. As I've said before, it is of course true that you can prove an inequality like this in a purely mathematical way:

1 + (average value of b*c for all triples)
>= |(average value of a*b for all triples) - (average value of a*c for all triples)|

But that's not Bell's inequality! The terms in Bell's inequality have a meaning like this:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|
billschnieder said:
However, since it is not possible to measure triples in any experiment, the requirement to be able to sort the dataset applies to all datasets involving multiple runs of pairs.
No, this is not a "requirement" unless you adopt the strawman position that the inequality is supposed to be guaranteed to hold with probability 1, even for a finite number of trials. But no physicist would claim that, the claim is just that in a local realist universe the actual averages should approach the ideal expectation values as the number of trials becomes large, so in a local realist universe matching Bell's theoretical assumptions, an experiment matching his experimental conditions should have a very tiny probability of yielding data that violates the inequality.
billschnieder said:
Now, let us go back to the underlined text above. Since you agreed with me that ρ(λi) must be the same for each term in the inequality
As noted above I may have meant something different by this than you do; I was talking about the "true" probability distribution and not the actual fraction of trials/iterations with a given value of λi (I used the notation F(λi) to distinguish the second from the first).
billschnieder said:
Is that what you were alluding to with the underlined text: "which is equivalent to the average measurement result over a very large (approaching infinity) series of measurements"? In other words, why is it important that the number of measurements be very large? Please I need a specific answer to this question, assuming you are still willing to contest this issue after my very detailed explanation above.
It's important because true probabilities are understood to be different from actual frequencies on a finite number of trials in the frequentist view, and I don't think there's any sensible way to interpret the probabilities that appear in Bell's proof in non-frequentist terms. An "expectation value" like E(a,b) would be interpreted in frequentist terms as the expected average result in the limit as the number of trials (on a run with detector settings a,b) goes to infinity, and likewise the ideal probability distribution ρ(λi) would in frequentist terms give the fraction of all trials where λ took the specific value λi, again in the limit as the number of trials goes to infinity. Then you can show theoretically that given Bell's physical assumptions, we can derive an inequality like this one:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different from the actual function F(λi) describing the fraction of trials with each value of λi (both functions would be unknown to the experimenter but they should have some true objective value which might be known to an omniscient observer). So, from this we can conclude that with a reasonably large number of trials, it'd be astronomically unlikely in a local realist universe for the experimental data to violate this inequality:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|

What specific step(s) in this reasoning do you have an objection to?
billschnieder said:
As an aside:
You seem to have an issue with my use of

| <ab> + <ac> | - <bc> <= 1

In which I have replaced E(a,b) in Bell's notation with <ab> in mine. Where a,b represent the outcomes at angles a and b and I was referring to the fact that in calculating the averages, it is not allowed for the list of a's in the first term to contain a different number of +'s and/or -'s from that in the second term, and same for "c" and "b".
OK, the phrase I bolded above now helps clarify what you meant when you said "the symbols ("a", "b" and "c") mean exactly the same thing from term to term", but there was really no way I could have been expected to deduce that without you spelling it out explicitly! Your requirement that we be able to "resort" the data from all three runs such that every row of three iterations from three runs has the same values of a,b,c throughout is a completely idiosyncratic idea no physicist ever brings up in discussions of Bell's theorem, and before post #1208 you hadn't explained it (your previous example involving 'resorting' didn't involve lining up three iterations from three runs, rather it involved creating a fake 'triple' from an iteration of the second run where a and c were measured and an iteration from the third run where b and c were measured, combining the values of a and c from the first iteration with the value of b from the second...see the end of my post #1191 for a discussion of this).
billschnieder said:
You objected and said:
JesseM said:
"a" is just a detector angle rather than a result like +1 or -1, the text makes that clear, so of course it means the same thing everywhere. But P(a,b) is an expectation value (he called it that himself), which can be understood as the average value of the product of two measurements on a pair of entangled particles with detectors at angles a and b, in the limit as the number of particle pairs measured in this way goes to infinity.
But then later, you used exactly the same notation.
The point of my objection was that I didn't understand what you meant when you said 'In Bell's inequality the the "a" in the first two terms are exactly the same.' Whenever I used notation like a*b, I always explained that this was really meant to be a shorthand for the product of two measurement results (each either +1 or -1) on a single pair of particles with detectors set to angles a and b. But that doesn't help to understand what you might mean by 'the "a" in the first two terms are exactly the same', and you didn't explain the meaning before, how was I supposed to know you were talking about reordering each list of iterations such that the value of a in the ith iteration of the run with settings a,b would always match the value of a in the ith iteration of the run with settings a,c? (assuming I have finally understood what you meant, if not please explain) Like I said this is a very idiosyncratic notion of yours and I'm not a mind reader so unless you spell it out I'm not going to know what you're talking about. I didn't assume that the "a" in your phrase 'the "a" in the first two terms are exactly the same' did refer to the detector angle, I just didn't know what it meant and was expressing confusion, and I explicitly asked you for a clarification on this in the second part of my reply (post #1206) when I said:
billschnieder said:
It doesn't mean you need to resort it in order to calculate the terms. It just means being able to resort the data is evidence that the symbols are equivalent. It is just another way of saying the symbols ("a", "b" and "c") mean exactly the same thing from term to term.
I still don't know what you mean by "mean exactly the same thing from term to term". a, b and c are just placeholders, for each triple each one can take value +1 or -1, for example in the first triple on your list you might have a=+1 while on the second triple you might have a=-1. Do you just mean that each term deals with averages from exactly the same list of triples, rather than each term dealing with averages from a separate list of triples?
billschnieder said:
This tactic of yours combined with lack of willingness to actually understand the opposing view, combined with a severe case of irrelevant argumentum ad verbosium, is the reason I do not take you seriously.
Again, this is very uncharitable, not to mention paranoid. When I express confusion about a vague phrase of yours, you act as though it's some sort of sneaky "tactic", and you imagine your posts to be such models of clear exposition that any failure to immediately grok what you are saying must reveal a "lack of willingness to actually understand the opposing view" (speaking of lack of willingness, I do try to address all your arguments as best I can, whereas you immediately dismiss anything that you don't immediately see the relevance of like my coin-flipping simulation example from the 'Understanding Bell's logic' thread...what's more, addressing all your arguments itself requires long posts, and then you interpret this too in a hostile mocking way as 'argumentum ad verbosium'). If you would move away from such a hostile/paranoid mindset, and consider that there might be some truth in what I said at the end of post #1190:
But of course the most charitable and fair assumption is that communication about complex issues like these is sometimes difficult and arguments that may seem clear to you can seem genuinely ambiguous to intelligent readers who aren't privy to all your thought processes.
...then this discussion would probably proceed a lot more smoothly and with less hostility.
 
  • #1,216
billschnieder said:
You must agree therefore that the following is Bell's inequality.
|\sum_{i} A(a,\lambda_{i})A(b,\lambda_{i})P(\lambda_{i}) + \sum_{i} A(a,\lambda_{i})A(c,\lambda_{i})P(\lambda_{i})| - \sum_{i} A(b,\lambda_{i})A(c,\lambda_{i})P(\lambda_{i}) \leq 1

Which can be factored in this form.
|\sum_{i} P(\lambda_{i})A(a,\lambda_{i})\left[A(b,\lambda_{i}) + A(c,\lambda_{i})\right]| - \sum_{i} A(b,\lambda_{i})A(c,\lambda_{i})P(\lambda_{i}) \leq 1

Bell himself did a similar factorization. Therefore if for any dataset the two equations above produce different results, it means the dataset is not compatible with Bell's inequality for purely mathematical reasons. Do you agree? If you don't please explain clearly.
If by "dataset" you mean some finite collection of experimental results, then I don't agree. The above equations are correct only insofar as they refer to the "true" probabilities and expectation values, which in frequentist terms can be understood in terms of fractions of trials with different possible results in the limit as the number of trials goes to infinity. But as I said in the following section of post #1215, Bell's proof is primarily about these ideal "true" probabilities and expectation values, then if you want to connect this with experimental data you have to invoke the law of large numbers (which is really implicit in all physical predictions involving probabilities, so physicists typically don't state this explicitly):
true probabilities are understood to be different from actual frequencies on a finite number of trials in the frequentist view, and I don't think there's any sensible way to interpret the probabilities that appear in Bell's proof in non-frequentist terms. An "expectation value" like E(a,b) would be interpreted in frequentist terms as the expected average result in the limit as the number of trials (on a run with detector settings a,b) goes to infinity, and likewise the ideal probability distribution ρ(λi) would in frequentist terms give the fraction of all trials where λ took the specific value λi, again in the limit as the number of trials goes to infinity. Then you can show theoretically that given Bell's physical assumptions, we can derive an inequality like this one:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different from the actual function F(λi) describing the fraction of trials with each value of λi (both functions would be unknown to the experimenter but they should have some true objective value which might be known to an omniscient observer). So, from this we can conclude that with a reasonably large number of trials, it'd be astronomically unlikely in a local realist universe for the experimental data to violate this inequality:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|
 
  • #1,217
The points made in your recent posts have already been pre-empted and rebutted in my posts
#1211 and #1212 so consider those as responses. You probably did not see them before developing your recent responses. If there are any points you still contest after reading those two posts, please indicate and I will re-explain in yet simpler terms.
 
  • #1,218
billschnieder said:
Now let us go to Bell's equation (2) where he defines his expectation values
Bell said:
E(a,b) = \int d\lambda \rho (\lambda )A(a,\lambda )B(b,\lambda )
Perhaps I am over-interpreting your use of the word "defines", but as I argued towards the end of post #1213 (starting with the paragraph that begins 'I don't understand how you can say...'), this equation cannot be taken as the definition of E(a,b); rather, E(a,b) is understood to be defined in a physical way as the expectation value for the product of two measurements on an entangled particle pair with detector settings a and b. This expectation value is understood as a sum of the different possible measurement outcomes weighted by their "true" probabilities:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

And here the probabilities are the "objective" ones that would correspond in frequentist terms to the frequencies in the limit as the number of trials went to infinity.
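Spelled out as code, that weighted sum is just the following (the four joint probabilities below are made-up numbers purely for illustration; they are not derived from any actual experiment):

```python
# Hypothetical joint probabilities for the four outcome combinations; they must sum to 1.
# p[(r_a, r_b)] = P(detector at setting a gives r_a, detector at setting b gives r_b)
p = {(+1, +1): 0.1, (+1, -1): 0.4, (-1, +1): 0.4, (-1, -1): 0.1}

# Expectation value: each product of results weighted by its "true" probability
E_ab = sum(r_a * r_b * prob for (r_a, r_b), prob in p.items())
print(round(E_ab, 10))  # -0.6
```

Note that nothing in this definition refers to λ at all; λ only enters later, via Bell's physical argument about what form these probabilities must take in a local realist universe.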

Bell then gives some physical arguments as to why we'd expect the expectation value to take this form:

E(a,b) = \int d\lambda \rho (\lambda )A(a,\lambda )B(b,\lambda )

And here as before, \rho(\lambda) is assumed to be the "objective" probability distribution, not something we need to measure or even make guesses about in practice. We don't need to know anything about the details of this probability distribution to derive a general inequality that is expected to apply to the "true" probabilities of different measurement results under any set of local realist laws, and then we can use the law of large numbers to conclude that if we do some sufficient number of trials, our actual experimental averages are astronomically unlikely to differ from the expectation values determined by the "true" probabilities. Once again, here's my summary of the logic from post #1215:
true probabilities are understood to be different from actual frequencies on a finite number of trials in the frequentist view, and I don't think there's any sensible way to interpret the probabilities that appear in Bell's proof in non-frequentist terms. An "expectation value" like E(a,b) would be interpreted in frequentist terms as the expected average result in the limit as the number of trials (on a run with detector settings a,b) goes to infinity, and likewise the ideal probability distribution ρ(λi) would in frequentist terms give the fraction of all trials where λ took the specific value λi, again in the limit as the number of trials goes to infinity. Then you can show theoretically that given Bell's physical assumptions, we can derive an inequality like this one:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different from the actual function F(λi) describing the fraction of trials with each value of λi (both functions would be unknown to the experimenter but they should have some true objective value which might be known to an omniscient observer). So, from this we can conclude that with a reasonably large number of trials, it'd be astronomically unlikely in a local realist universe for the experimental data to violate this inequality:

1 + (average value of b*c for all trials where experimenter sampled b and c)
>= |(average value of a*b for all trials where experimenter sampled a and b) - (average value of a*c for all trials where experimenter sampled a and c)|
If you disagree with any of the above, please go back and address my specific arguments in posts #1213-1215.
billschnieder said:
Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) for all λ. Which is essentially the expectation value. Theoretically the above makes sense, where you measure each A(a,.), B(b,.) pair exactly once for a specific λ, and simply multiply with the probability of realizing that specific λ and then add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are.
The comment above is completely misguided, since the basic definition of "expectation value" in this experiment has nothing at all to do with knowing the value of λ, it is just understood to be:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

Bell argues on a theoretical basis that E(a,b) should also be given by the integral involving \rho(\lambda), but the above should be understood as the basic meaning of an "expectation value". And by the law of large numbers, if you repeat the experiment a fairly large number of times (say 1000), the chances that the fraction of trials where you got some particular result (say, +1 with setting a and +1 with setting b) is significantly different from the "true probability" of that result (in this case P(detector with setting a gets result +1, detector with setting b gets result +1)) would become astronomically small, even if the number of trials was tiny compared to the number of possible values of λ. I gave a bunch of arguments for this claim about the law of large numbers in post #1214, so if you disagree please go back and address that post. If you don't disagree, then you can see why in order to compare the inequality with experimental data we don't have to consider λ at all, we just have to use our dataset of pairs to find the average for the product of two results on each of the three combinations of different detector settings.
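As an aside, the arithmetic of the quoted 3-λ example itself is easy to check; it shows that a probability-weighted sum over λ agrees with a simple average over a dataset whose λ-frequencies are exactly representative. The ±1 products below are made-up values for illustration:

```python
# Quoted example: three hidden states with probabilities 0.3, 0.5, 0.2.
# The products A(a,λi)*B(b,λi) are made-up ±1 values for illustration.
prod = {1: +1, 2: -1, 3: -1}
p    = {1: 0.3, 2: 0.5, 3: 0.2}

# Expectation value as a probability-weighted sum over the three lambdas
E_weighted = sum(prod[i] * p[i] for i in prod)

# Simple average over a 10-point dataset whose lambda frequencies (3, 5, 2)
# exactly match the probabilities
dataset = [1]*3 + [2]*5 + [3]*2
E_simple = sum(prod[lam] for lam in dataset) / len(dataset)

print(round(E_weighted, 10), E_simple)  # -0.4 -0.4
```

The two agree only because the dataset's frequencies are exactly representative; whether a finite experimental run is representative in this sense is precisely what the law-of-large-numbers discussion above addresses.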
billschnieder said:
All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above.
They assume the averages from their data are close to the "true" expectation values E(a,b), E(b,c) and E(a,c), which can be justified by the law of large numbers, but there is no need to assume that the (unknown) frequencies of different values of λi which occurred in the particle pairs they sampled was anything like the "true" probability distribution p(λi). Do you disagree?
billschnieder said:
So then in this case, assuming discrete λ's, that Bell's equation (2) is equivalent to the following simple average:
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a,\lambda _i)B(b,\lambda _i)
How is it equivalent? It's quite possible that P(λ2) could be very different from P(λ3), for example, in which case you need to weigh the terms A(a,λ2)*B(b,λ2) and A(a,λ3)*B(b,λ3) by the probabilities of those values if you want to get an accurate expectation value. The correct discrete version would have to look like this:
E(a,b) = \sum_{i}^{N} A(a,\lambda _i)*B(b,\lambda _i)*P(\lambda_i)
billschnieder said:
Since in any real experiment we do not know which λ is realized for any specific iteration, we can drop lambda from the equation altogether without any impact, where we have simply absorbed the λ into the specific variant of the functions A,B operating for iteration i (that is Ai and Bi)
E(a,b) = \frac{1}{N} \sum_{i}^{N} A(a)_{i}B(b)_{i}
Well, the i's in λi weren't supposed to be iterations, but rather were just a way of indexing all physically possible values that the hidden variables could take on that type of experiment--there could well be more possible values of i than particles in the observable universe! So if i in the equation above is supposed to refer to iterations you've significantly changed the meaning of the index, from something theoretical to something empirical. And again, Bell's reasoning is based on the "true" or "objective" probabilities of different outcomes which give the "true" expectation value, which is different from the empirical average which you are computing above, although the law of large numbers means that the difference between the two becomes small for a reasonably large number of trials (again see post #1214 on this point). Still, it's important to distinguish theoretical from empirical, so let's use E(a,b) to be the "true" expectation value for the product of the measurements with settings a and b, and Avg(a,b) to be the empirical average of all the products of measurement results on a run with settings a and b, and then we can say that in the limit as the number of trials/iterations in a run goes to infinity, Avg(a,b) should approach E(a,b) with probability 1. In this case I would rewrite the above as:

Avg(a,b) = \frac{1}{N} \sum_{i}^{N} A(a)_{i}B(b)_{i}
billschnieder said:
And we could adopt a simplified notation in which we replace the function A(a)_i with the outcome \alpha _i and B(b)_i with \beta _i. Note that the outcomes of our functions are restricted to values (+1 or -1) and we could say \alpha = \pm 1, \beta = \pm 1

To get:
E(a,b) = \frac{1}{N} \sum_{i}^{N} \alpha _{i} \beta _{i} = \langle \alpha \beta \rangle
Which I would rewrite as:

Avg(a,b) = \frac{1}{N} \sum_{i}^{N} \alpha _{i} \beta _{i} = \langle \alpha \beta \rangle
billschnieder said:
Let us then develop our analogy involving our a' and b' to the same point. Remember our first assumption was that we had two such arbitrary variables a' and b' with values (+1 or -1). Now consider the situation in which we had a list of pairs of such variables of length N. Let us designate our list [(a',b')] to indicate that each entry in the list is a pair of (a',b') values. Let us define the expectation value of the pair product for our list as follows:
E(a',b') = \frac{1}{N} \sum_{i}^{N} a'_{i} b'_{i} = \langle a'b' \rangle
Again this doesn't work as a theoretical expectation value since i refers to some number of iterations, whereas a theoretical expectation value for an experiment which can give any one of N results R1, R2, ..., RN always has the form E(R) = \sum_{i=1}^N R_i * P(R_i). However, it does work as a way of computing the average for the product of a' and b' for a list of values, so in my notation:

Avg(a',b') = \frac{1}{N} \sum_{i}^{N} a'_{i} b'_{i} = \langle a'b' \rangle

billschnieder said:
For all practical purposes, this equation is exactly the same as the previous one and the terms a' and b' are mathematically equivalent to α and β respectively. What this shows is that the physical assumptions about existence of hidden variables, locality etc are not necessary to obtain an expression for the expectation values for a pair product.
As I said, you are not really computing an expectation value but just an average, which in the limit as the number N of iterations went to infinity would approach the true expectation value with probability 1.
billschnieder said:
We have obtained the same thing just by defining two variables a', b' with values (+1 and -1) and calculating the expectation value for the paired product of a list of pairs of these variables. You could say the reason Bell obtained the same expression is because he just happened to be dealing with two functions which can have values (+1 and -1) for physical reasons and experiments producing a list of such pairs. And he just happened to be interested in the pair product of those functions for physical reasons. But the structure of the calculation of the expectation value is determined entirely by the mathematics and not the physics. Once you have two variables with values (+1 and -1) and a list of pairs of such values, the above equations should arise no matter the process producing the values, whether physical, mystical, non-local, spooky, super-luminal, or anything you can dream about. That is why I say the physical assumptions are peripheral.
Physical assumptions are peripheral to calculating averages from experimental data, it's true, and they're also peripheral to writing down expectation values in terms of the "true" probabilities as I did when I wrote E(R) = \sum_{i=1}^N R_i * P(R_i), with the following equation as a special case of this general form:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

...but you can't derive useful inequalities like 1 + E(b,c) >= |E(a,b) - E(a,c)| from such simple definitions! For that you need to make some physical assumptions which allow you to show that the "true" expectation values can also be written in some more specific form, such as:

E(a,b) = - \sum_{i=1}^N A(a,\lambda_i)*A(b,\lambda_i)*P(\lambda_i)
E(b,c) = - \sum_{i=1}^N A(b,\lambda_i)*A(c,\lambda_i)*P(\lambda_i)
E(a,c) = - \sum_{i=1}^N A(a,\lambda_i)*A(c,\lambda_i)*P(\lambda_i)

...and then it's from these more specific forms that you derive the inequalities.
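That derivation step can be checked by brute force: for any assignment of predetermined ±1 "instruction sets" over the three settings, and any probability distribution over them, expectation values of the above form necessarily satisfy the inequality. A small self-check of my own (discrete λ, one λ per instruction set):

```python
import itertools
import random

random.seed(1)

# The 8 possible "instruction sets": predetermined results for settings a, b, c.
triples = list(itertools.product([1, -1], repeat=3))

def check(probs):
    """Given P(λ) over the 8 triples, test 1 + E(b,c) >= |E(a,b) - E(a,c)|
    with E(x,y) = -sum_i P(λi) A(x,λi) A(y,λi), as in the equations above."""
    E_ab = -sum(p * t[0] * t[1] for p, t in zip(probs, triples))
    E_ac = -sum(p * t[0] * t[2] for p, t in zip(probs, triples))
    E_bc = -sum(p * t[1] * t[2] for p, t in zip(probs, triples))
    return 1 + E_bc >= abs(E_ab - E_ac) - 1e-12   # tolerance for float rounding

# Try many random probability distributions; the inequality never fails.
for _ in range(10_000):
    w = [random.random() for _ in range(8)]
    total = sum(w)
    probs = [x / total for x in w]
    assert check(probs)
print("inequality held for 10000 random distributions")
```

This is only a numerical illustration of the algebraic fact that |A(a)A(b) - A(a)A(c)| = 1 - A(b)A(c) for ±1-valued functions, which is the step Bell's factorization exploits.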
billschnieder said:
Note a few things about the above equation. a'_i and b'_i must be multiplied with each other. If we independently reorder the columns in our list so that we have different pairings of a'_i and b'_i, we will obtain the same expectation value only in the most improbable of situations. To see this, consider the simple list below

a' b'
+ -
- +
- +
+ -

<a'b'> = -4/4 = -1

If we rearrange the b' column so that the pairing is no longer the same, we may have something like the following were we have the same number of +'s and -'s but their pairing is different:

a' b'
+ +
- -
- -
+ +

<a'b'> = 4/4 = 1
Which tells us that we are dealing with an entirely different dataset.
OK, sure, if you are allowed to resort pairs at will you can get different averages for the products of pairs. But in Bell's theorem it's assumed that all the "products of two measurement results" are each from pairs of measurements on a single pair of entangled particles, you're not allowed to resort the data in this way.
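Re-running the quoted arithmetic makes the point concrete: under the original pairing every product is −1, and after the reshuffle every product is +1, even though each column contains the same number of +'s and -'s:

```python
a_col      = [+1, -1, -1, +1]
b_original = [-1, +1, +1, -1]   # pairings as listed in the first quoted table
b_resorted = [+1, -1, -1, +1]   # same counts of +'s and -'s, different pairing

avg_original = sum(x * y for x, y in zip(a_col, b_original)) / len(a_col)
avg_resorted = sum(x * y for x, y in zip(a_col, b_resorted)) / len(a_col)
print(avg_original, avg_resorted)  # -1.0 1.0
```

Which is why the product must always be taken within a single entangled pair's two results, never across reshuffled rows.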
 
  • #1,219
billschnieder said:
(continued from the last post)

So far we have dealt with pairs, just like Bell up to his equation (14). Let us then, following in Bell's footsteps, introduce the third variable (see page 406 of his original paper).
Bell said:
It follows that c is another unit vector
E(a,b) - E(a,c) = -\int d\lambda \rho (\lambda )[A(a,\lambda )A(b,\lambda )-A(a,\lambda )A(c,\lambda )]
using (1), whence
\left | E(a,b)-E(a,c) \right |\leq \int d\lambda \rho (\lambda )[1 - A(b,\lambda)A(c,\lambda )]
The second term on the right is E(b,c), whence
1 + E(b,c) >= |E(a,b) - E(a,c)| ... (15)
Note a few things here: Bell factorizes at will within the integral. ρ(λ) is a factor of every term under the integral. That is why I explained in my previous detailed post that ρ(λ) must be the same for all three terms.
And I explained in #1213 that it doesn't make any sense to use these equations as the reason why ρ(λ) should be the same in all three terms, since the equations he writes down for E(a,b) and E(b,c) and E(a,c) are not meant to be definitions of the expectation values, but rather conclusions about how the expectation values can be written down in a universe that obeys local realist laws along with the no-conspiracy assumption. See everything in post #1213 starting with the paragraph that begins "I don't understand how you can say..."

Anyway, if we accept Bell's physical argument that in a local realist universe we should be able to write the expectation values as follows:

E(a,b) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(b,\lambda )
E(a,c) = -\int d\lambda\rho (\lambda )A(a,\lambda )A(c,\lambda )
E(b,c) = -\int d\lambda\rho (\lambda )A(b,\lambda )A(c,\lambda )

...then we can see why the factorization he does in the equations you wrote above should be justified. But he does need to make that physical argument to justify it.

Also, there is some ambiguity in what you mean when you say "ρ(λ) must be the same for all three terms", I discussed this at the start of post #1214. I was interpreting it just as a statement that the "true" or "objective" probability distributions on different values of λ (which would give the frequencies of different values of λ that would be expected in the limit as the number of trials went to infinity) should not depend on the detector settings. If you mean something different, like that the actual finite run of trials on each detector setting should involve the same frequencies of different values of λ, then I disagree that Bell's equation implies anything of the sort since it only deals with "true" probabilities and not empirical results, but again see post #1214 for the detailed discussion on this point.
billschnieder said:
Secondly, Bell derives the expectation value term E(b,c) by factoring out the corresponding A(b,.) and A(c,.) terms from E(a,b) and E(a,c). Therefore, E(b,c) does not contain different A(b,.) and A(c,.) terms but the exact same ones present in E(a,b) and E(a,c).
I don't know why you have replaced terms like A(b,λ) with notation like A(b,.)--easier to type, or some deeper significance? Anyway, Bell is assuming that for any given value of λi, A(a,λi) is the same regardless of whether the other detector was on setting b or setting c, and so forth for A(b,λi) and A(c,λi). In other words, the result at a given detector depends only on that detector's setting and the value of all hidden variables on that trial; it doesn't depend on the other detector's setting (and we wouldn't expect it to in a local realist universe!). Is this all you're saying, or do you think the factorization has some further implications?
billschnieder said:
In other words, in order to obtain all three expectation values E(a,b), E(a,c) and E(b,c), we ONLY need three lists of outcomes corresponding to A(a,.), A(b,.), A(c,.) or in simpler notation, we only need a single list of triples [(a',b',c')] to calculate all terms for

1 + <b'c'> >= |<a'b'> - <a'c'>|
No, again it seems like you are confusing theoretical terms with empirical results. E(a,b) doesn't depend on what results we got on any finite series of trials, it's the "true" expectation value that can be defined as

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

Where each of the P's represents the "true" or "objective" probability for that pair of results, as distinguished from the fraction of some finite number of trials where that pair of results was seen (as always, in frequentist terms the objective probabilities would be the fraction of trials with that pair of results in the limit as the number of trials goes to infinity).
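As a toy illustration of this definition (the four joint probabilities below are invented numbers, not from any experiment):

```python
# Hypothetical "true" joint probabilities for the four outcome pairs
# at detector settings (a, b); the values are made up for illustration.
P = {(+1, +1): 0.1, (+1, -1): 0.4, (-1, +1): 0.4, (-1, -1): 0.1}

# "True" expectation value: each product of results weighted by its probability.
E_ab = sum(r1 * r2 * p for (r1, r2), p in P.items())
print(E_ab)  # ~ -0.6
```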
billschnieder said:
So then, we are destined to obtain this inequality for any list of triples of two-valued variables (or outcomes of two-valued functions) where the allowed values are (+1 or -1), no matter the physical, metaphysical or mystical situation generating the triples.
But that's not the situation with Bell's theorem. Rather, with Bell's theorem we have three runs with different combinations of detector settings (a,b), (b,c) and (a,c), and considering the average from each run. Bell is showing that if we know the true expectation values for each individual run, in a local realist universe they should obey:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Since each expectation value is for a different run, even if you assume that every iteration of every run is determined by a set of triples, you can't derive the above equation from arithmetic alone since each expectation value would deal with a different collection of triples. So, you do need to consider the "physical, metaphysical or mystical situation generating the triples". And once you are convinced that the above equation should hold for the true expectation values, then by the law of large numbers you can conclude that if you do 1000 trials on each run, in a local realist universe you are astronomically unlikely to see a violation of the following inequality on your data:

1 + (average for product of results on the run with settings b and c) >=
|(average for product of results on the run with settings a and b) -
(average for product of results on the run with settings a and c)|
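This can be sketched in a short simulation. A minimal sketch, assuming a local hidden-variable model in which each particle pair carries a predetermined triple drawn from a distribution that does not depend on the settings (the uniform distribution, the trial count, and the anticorrelation convention are illustrative choices, not anything taken from Bell's paper):

```python
import random

random.seed(0)  # reproducible run

# The eight possible hidden triples: predetermined +/-1 answers for
# settings a, b, c, shared by both particles of an entangled pair.
TRIPLES = [(x, y, z) for x in (+1, -1) for y in (+1, -1) for z in (+1, -1)]

def run(setting1, setting2, n=20000):
    """Average product of the two outcomes over n entangled pairs.
    Particle 1 reports its predetermined value for setting1; particle 2
    reports the opposite of its value for setting2 (anticorrelation)."""
    idx = {"a": 0, "b": 1, "c": 2}
    total = 0
    for _ in range(n):
        lam = random.choice(TRIPLES)  # same P(lambda) for every run
        total += lam[idx[setting1]] * (-lam[idx[setting2]])
    return total / n

# Three separate runs, one per setting combination, as in the text.
E_ab, E_ac, E_bc = run("a", "b"), run("a", "c"), run("b", "c")

# With the same triple distribution feeding all three runs, the data
# should (almost surely) respect 1 + E(b,c) >= |E(a,b) - E(a,c)|.
print(1 + E_bc >= abs(E_ab - E_ac))
```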

billschnieder said:
Suppose now that we generate from our list of triples three lists of pairs corresponding to [(a',b')], [(a',c')] and [(b',c')]; we can simply calculate our averages and be done with it. It doesn't matter if the order of pairs in the lists is randomized so long as the pairs are kept together. In this case, we can still sort them as described in my previous detailed description, to regenerate our list of triples from the three lists of pairs.
See my questions and arguments about your "resorting" procedure in post #1215. First I clarified what I thought you meant by this form of "resorting" at the start of the post with a simple example, perhaps you can tell me if I've got it right or not. If I have got it right, then please address my subsequent comments and questions:
If so, I don't see how this ensures that "ρ(λi) is the same for all three terms of the inequality", or what you even mean by that. For example, isn't it possible that if the number of possible values of λ is 1000, then even though iteration #1 of the first run has been grouped in the same row as iteration #3 of the second run and iteration #2 of the third run (according to their original labels), that doesn't mean the value of λ was the same for each of these three iterations? For example, might it not have been the case that iteration #1 of the first run had λ203, iteration #3 of the second run had λ769, and iteration #2 of the third run had λ488?

As a separate issue it is of course true that if your full set of data can be resorted in this way, that's enough to guarantee mathematically that the data will obey Bell's inequality. But this is a very special case, I think it would be fairly unlikely that the full set of iterations from each run could be resorted such that every row would have the same value of a,b,c throughout, even if the data was obtained in a local realist universe that obeyed Bell's theoretical assumptions, and even if the overall averages from each run actually did obey the Bell inequality.
billschnieder said:
Now the way Bell-test experiments are usually done is analogous to collecting three lists of pairs randomly, with the assumption that these three lists are representative of the three lists of pairs which we would have obtained from a list of triples, had we been able to measure at three angles simultaneously.
Yes, that's true. Since there are only eight possible distinct triples, and the value of λ on each trial completely determines the type of triple on that trial, and we assume the true probability distribution P(λ) is the same regardless of the detector settings, then with some reasonably large number of trials (say 1000) on each run we do expect that:

Fraction of trials on first run where the hidden triple was a=+1, b=-1 and c=+1

is very close to

Fraction of trials on second run where the hidden triple was a=+1, b=-1 and c=+1

and to

Fraction of trials on third run where the hidden triple was a=+1, b=-1 and c=+1

And likewise for the fractions of the other seven types of triples that occurred on each run. Do you agree this is a reasonable expectation thanks to the law of large numbers?
billschnieder said:
And if each list was sufficiently long, the averages will be close to those of the ideal situation assumed by Bell. Again, remember that within each list of pairs actually measured, the individual pairs such as (a',b')_i measured together are assumed to have originated from a specific theoretical triple, (a',c')_j from another triple, and (b',c')_k from another triple. Therefore, our dataset from a real experiment is analogous to our three theoretical lists above, where we randomized the order but kept the pairs together while randomizing. Which means, it should be possible to regenerate our single list of triples simply by resorting the three lists of pairs while keeping the individual pairs together, as I explained previously.
Even if the data was drawn from triples, and the probability of different trials didn't depend on the detector settings on each run, there's no guarantee you'd be able to exactly resort the data in the manner of my example in post #1215, where we were able to resort the data so that every row (consisting of three pairs from three runs) had the same value of a,b,c throughout. You might be able to sort it so that most rows of three pairs had the same value of a,b,c throughout, but probably not all. This would at least give a way of roughly estimating the frequencies of different types of triples, though.
billschnieder said:
If we can not do this, it means either that:
a) our data is most likely of the second kind in which randomization did not keep the pairs together or
Well, we know this does not apply in Bell tests, where every data pair is always from a single trial with a single pair of measurements on a single pair of entangled particles.
billschnieder said:
b) each list of pairs resulted from different lists of triples and/or
If the frequencies of each of the 8 types of triples differed significantly in three runs with a significant (say, 1000 or more) number of trials in each, this would imply either an astronomically unlikely statistical miracle, or that the no-conspiracy assumption is false and the true probabilities of different triples actually do change depending on the detector settings.
billschnieder said:
c) our lists of pairs are not representative of the list of triples from which they arose
Not sure I follow what you mean here. Are you suggesting that even if we had a triple like a=+1, b=-1, c=+1 we might still get result -1 with detector setting a? If so what would be the point of assuming the data arose from triples in the first place? Remember that Bell's assumption of predetermined results on each axis came from the fact that whenever both particles were measured on the same axis they always gave opposite results--in a local realist universe where the decisions about the two detector settings can have a spacelike separation, it seems impossible to explain this result otherwise (though some of Bell's later proofs dropped the assumption of always getting opposite or identical results when both experimenters used the same setting).
billschnieder said:
In any of these cases, Bell's inequality does not and can not apply to the data. In other words, it is simply a mathematical error to use the inequality in such situations.
No, the fact that Bell's inequality is observed not to work is empirical evidence that one of the assumptions used in the derivation must be false, like the assumption that local realism is true (with the conclusion of predetermined triples following from this assumption along with the observation that using the same angle always yields opposite results), or the no-conspiracy assumption. Unless you want to argue (and you probably do) that even if we assume the validity of those theoretical assumptions, this does not necessarily imply Bell's inequality should hold for the type of experiment he describes.
billschnieder said:
Also note that these represent the only scenarios in which "average value of a*b for all triples" is different from "average value of a*b for measured pairs only". And in this case, the fair sampling assumption can not hold.
What do you mean by "fair sampling assumption"? This page says "It states that the sample of detected pairs is representative of the pairs emitted", but that could be true and Bell's inequality could still fail for some other reason like a violation of the no-conspiracy assumption.
 
  • #1,220
billschnieder said:
The points made in your recent posts have already been pre-empted and rebutted in my posts
#1211 and #1212 so consider those as responses. You probably did not see them before developing your recent responses. If there are any points you still contest after reading those two posts, please indicate and I will re-explain in yet simpler terms.
Having replied to these, I saw nothing in them that could be considered a rebuttal of any of the points I made in #1213-#1215. I indicated in my replies to #1211 and #1212 where I thought various claims made in those posts had been disputed or questioned in #1213-#1215, so if you disagree with some of the things I say in my recent replies you can go back and address the corresponding arguments/questions in the earlier posts.
 
  • #1,221
JesseM said:
billschnieder said:
Now let us go to Bell's equation (2) where he defines his expectation values ...
Perhaps I am over-interpreting your use of the word "defines", but as I argued towards the end of post #1213 (starting with the paragraph that begins 'I don't understand how you can say...'), this paragraph cannot be taken as the definition of E(a,b), rather E(a,b) is understood to be defined in a physical way as the expectation value for the product of two measurements on an entangled particle pair with detector settings a and b.

You are grasping at straws here. First of all, I said the equation is Bell's definition of HIS expectation values for the situation he is working with.
Secondly, nobody said anything about the probabilities in the equation not being true probabilities, so you are complaining about a nonexistent issue. Thirdly, you object to my statement but go on to say the exact same thing. This is what I said after the equation:

billschnieder said:
Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) for all λ, which is essentially the expectation value. Theoretically the above makes sense: you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and then add up the subsequent terms to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)


Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities.
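The equivalence claimed in this three-λ example is easy to verify numerically. A small sketch (the individual ±1 product values are made up for illustration; only the weighting bookkeeping matters):

```python
# Hypothetical products A(a, lam_i)*B(b, lam_i) for the three lambdas.
products = {"lam1": +1, "lam2": +1, "lam3": -1}
probs    = {"lam1": 0.3, "lam2": 0.5, "lam3": 0.2}

# Probability-weighted sum: each lambda appears exactly once.
E_weighted = sum(products[l] * probs[l] for l in products)

# Simple average over a representative 10-point dataset in which each
# lambda occurs with frequency equal to its probability (3, 5, 2 times).
dataset = ["lam1"] * 3 + ["lam2"] * 5 + ["lam3"] * 2
E_simple = sum(products[l] for l in dataset) / len(dataset)

print(E_weighted, E_simple)  # both ~0.6 (up to float rounding)
```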

And this is how it is described on Wikipedia:

Wikipedia said:
http://en.wikipedia.org/wiki/Expected_value
In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or mean, or first moment) of a random variable is the integral of the random variable with respect to its probability measure.

For discrete random variables this is equivalent to the probability-weighted sum of the possible values.

For continuous random variables with a density function it is the probability density-weighted integral of the possible values.

The term "expected value" can be misleading. It must not be confused with the "most probable value." The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment.

The expected value may be intuitively understood by the law of large numbers: The expected value, when it exists, is almost surely the limit of the sample mean as sample size grows to infinity.


So when you say:
JesseM said:
This expectation value is understood as a sum of the different possible measurement outcomes weighted by their "true" probabilities:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

...

The comment above is completely misguided, since the basic definition of "expectation value" in this experiment has nothing at all to do with knowing the value of λ, it is just understood to be:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

It clearly shows that you do not understand probability or statistics. Clearly the definition of expectation value is based on a probability-weighted sum, and the law of large numbers is used as an approximation; that is why the last sentence above says that the expectation value is "almost surely the limit of the sample mean as the sample size grows to infinity".

You are trying to restrict the definition by suggesting that the expectation value is defined ONLY over the possible paired outcomes (++, --, +-, -+) and not over possible λ's, but that is naive and short-sighted, and also ridiculous, as we will see shortly. Now let us go back to the first sentence of the Wikipedia definition above and notice the last two words: "probability measure". In case you do not know what that means, a probability measure is simply any real-valued function which assigns 1 to the entire probability space and maps events into the range from 0 to 1. An expectation value can be defined over any such probability measure, not just the one you pick and choose for argumentation purposes. In Bell's equation (2),
\int d\lambda \rho (\lambda ) = 1
Therefore ρ(λ) is a probability measure over the paired products A(a,λ)A(b,λ) and Bell's equation (2) IS defining an expectation value for paired products irrespective of any physical assumptions. There is no escape for you here.
 
  • #1,222
JesseM said:
If you disagree with any of the above, please go back and address my specific arguments in posts #1213-1215
Of course I disagree with a lot of it, for reasons I have already explained above, so I do not see the need to respond specifically. Anyone following the discussion will immediately recognize this fact. For example, you argued earlier that there was a difference between "average value of b*c for all measurements" and "average value of b*c for all triples", with the former being the one applicable to Bell's inequality:

JesseM said:
Here you seem to be talking about conditions under which an inequality like this:

|(average value of a*b for all triples in which experimenter measured a and b) + (average value of a*c for all triples in which experimenter measured a and c)| - (average value of b*c for all triples in which experimenter measures b and c) <= 1

...can be derived. This is an entirely separate issue from the other point I was arguing, which was just the idea that the above inequality is not guaranteed to hold in spite of the fact that its arithmetical analogue is guaranteed:

|(average value of a*b for all triples) + (average value of a*c for all triples)| - (average value of b*c for all triples) <= 1

Anyway, if you agree that these types of inequalities are conceptually separate, that Bell's inequality was of the top type, and that a proof of the bottom one doesn't constitute a proof of the top

You continued to object despite my argument that, as far as Bell's inequality is concerned, the two are equivalent. But now, as your argument morphs to try and avoid the trap which requires ρ(λ) to be the same between terms, you are claiming that the two really are the same with a probability close to 1, because of the law of large numbers.

JesseM said:
Then by the law of large numbers, you can show that the likelihood of a significant difference between the "true" expectation value E(b,c) and the experimental average (average for product of two results on all trials where detectors were set to b and c) becomes tiny as the number of trials becomes reasonably large (say, 1000), regardless of whether the ideal probability distribution ρ(λi) is very different
There is no escape for you here either.

JesseM said:
billschnieder said:
All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above.
They assume the averages from their data are close to the "true" expectation values E(a,b), E(b,c) and E(a,c), which can be justified by the law of large numbers, but there is no need to assume that the (unknown) frequencies of different values of λi which occurred in the particle pairs they sampled were anything like the "true" probability distribution p(λi). Do you disagree?
Yes, I disagree. Again, here you are grasping at straws. The law of large numbers is only able to approximate the true expectation value precisely because ρ(λi) for a very large sample will almost always not be significantly different from the true probability distribution. If it differs significantly, the law of large numbers will definitely not produce the true expectation value. Just measuring an extremely large number of points does not guarantee a representative sample.

So by assuming that the expectation values are the same for a very large number of measurements, they are in effect also assuming that the probability distribution ρ(λi) in the sample is representative of the true distribution. From these silly mistakes and your recent discussion with RUTA in another thread, I am convinced that you do not understand probability and statistics. Unless you really understand it but are just trying to obfuscate.

JesseM said:
billschnieder said:
E(a,b)= \frac{1}{N} \sum_{i}^{N}A(a,\lambda _i)B(b,\lambda _i)
How is it equivalent? It's quite possible that P(λ2) could be very different from P(λ3), for example, in which case you need to weigh the terms A(a,λ2)*B(b,λ2) and A(a,λ3)*B(b,λ3) by the probabilities of those values if you want to get an accurate expectation value. The correct discrete version would have to look like this
E(a,b)= \sum_{i}^{N}A(a,\lambda _i)*B(b,\lambda _i)*P(\lambda _i)
You were not following when I explained earlier the following:
billschnieder said:
For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2),B(b,λ2) was realized 5 times, and A(a,λ3),B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are. All they can do is assume that by measuring a large number of points, their data will be as representative as illustrated above. (This is the fair sampling assumption, which is however not the focus of this post.) So then in this case, assuming discrete λ's, Bell's equation (2) is equivalent to the following simple average

So your objection above is short-sighted because, practically, in any experiment P(λ) cannot be known, so the expectation value cannot be calculated using P(λ), but it can be calculated as a simple average from a large number of samples which is representative in the sense that the relative frequencies of realizing specific λ's are not significantly different from the true probabilities of those λ's. So your "correction" above is wrong because you failed to understand the part where I explained that the realizations of the λ's are not unique. In other words, each specific λ occurs multiple times, with a relative frequency corresponding to its probability.

JesseM said:
Still, it's important to distinguish theoretical from empirical, so let's use E(a,b) to be the "true" expectation value for the product of the measurements with settings a and b, and Avg(a,b) to be the empirical average of all the products of measurement results on a run with settings a and b, and then we can say that in the limit as the number of trials/iterations in a run goes to infinity, Avg(a,b) should approach E(a,b) with probability 1.
That is a completely artificial distinction. Bell is calculating expectation values, and the only time a simple average can be substituted for the expectation value is when it is calculated over a representative/fair sample. So your insistence on relabelling the term is just grasping at straws. If you insist on pursuing this ridiculous idea, I ask that you write down the expression for the expectation value for the following example:

You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product. Once you have done that, try and swindle your way out of the fact that
a) The structure of the expression so derived does not depend on the actual value N. ie, N could be 5, 100, or infinity.
b) The expression so derived is a theoretical expression not "empirical".
c) The expression so derived is the same as the simple average of the paired products.

JesseM said:
Again this doesn't work as a theoretical expectation value since i refers to some number of iterations, whereas a theoretical expectation value for an experiment which can give any one of N results R1, R2, ..., RN
Again this is not a serious objection, because no serious person would suggest that because we used i as the iterator in one equation, it must have the exact same meaning in a different equation. I already explained, and you understood, that in the first case, where we were doing a weighted average over λ's, i was iterating over each λ, with each specific λ occurring exactly once. In the second case, which is a simple average, i is iterating over each instance in a representative sample, with the understanding that a specific λ will occur multiple times with a relative frequency corresponding to its probability, where the actual value of N does not matter so long as the relative frequencies of ALL λ's in our theoretical list are representative of the "true" probability distribution. The two expressions so calculated are exactly equivalent and both are expectation values. So there is no genuine objection here, and no way to escape either.
 
  • #1,223
JesseM said:
...but you can't derive useful inequalities like 1 + E(b,c) >= |E(a,b) - E(a,c)| from such simple definitions! For that you need to make some physical assumptions
This is what your entire argument boils down to. You are still struggling to suggest that physical assumptions are needed to derive Bell's inequality. But as I have explained, all you need are the following purely mathematical requirements:

1) a theoretical list of triples (a,b,c) of two-valued variables restricted in value to +/-1
2) Expressions of the expectation value of cyclical paired-products extracted from the list of triples E(a*b), E(a*c) and E(b*c), which I have shown convincingly to be equivalent to <ab>, <ac> and <bc> respectively.

That is all that is needed. I have shown that the expression for the expectation value E(a,b) is similar to Bell's. I will now show, using notation analogous to that at the top of page 406 of Bell's paper, that the above necessarily leads to the inequalities obtained by Bell, without any physical assumptions. Note that despite your claims, you haven't actually pointed to any point in the derivation at which a physical assumption is required.

\langle a'b' \rangle - \langle a'c' \rangle = -\frac{1}{N}\sum_{i}^{N}(a'_i b'_i - a'_i c'_i)
since b'_i = 1/b'_i (from b'_i = \pm 1) it follows that
= \frac{1}{N}\sum_{i}^{N} a'_i b'_i \left(\frac{c'_i}{b'_i} - 1\right)
and since a'_i b'_i = \pm 1, it follows that the RHS is maximal when a'_i b'_i = 1, therefore:
\left|\langle a'b' \rangle - \langle a'c' \rangle\right| \leq \frac{1}{N}\sum_{i}^{N}(1 - b'_i c'_i)
\left|\langle a'b' \rangle - \langle a'c' \rangle\right| + \langle b'c' \rangle \leq 1
Note that you can replace a' with -a', b' with -b', or c' with -c' in the above and get the full family of Bell's original inequalities.

The above mirrors exactly what Bell did at the top of page 406! Now if you continue to argue that there is a physical assumption hidden in there, please show me, using Bell's derivation on page 406 AND the derivation above, where you think I sneaked in a physical assumption in order to obtain the same expression. Note also that if you do not understand the above derivation, it means you clearly do not understand Bell's derivation at the top of page 406.
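The purely arithmetic half of this claim is easy to brute-force: for any single list of ±1 triples, the averages of the pairwise products cannot violate the inequality. A quick sketch (random lists of 50 triples are an illustrative choice):

```python
import random

def averages(triples):
    """Pairwise averages <a'b'>, <a'c'>, <b'c'> from one list of triples."""
    n = len(triples)
    ab = sum(a * b for a, b, c in triples) / n
    ac = sum(a * c for a, b, c in triples) / n
    bc = sum(b * c for a, b, c in triples) / n
    return ab, ac, bc

random.seed(1)
for _ in range(1000):
    # a random list of 50 triples of +/-1 values
    triples = [tuple(random.choice((+1, -1)) for _ in range(3))
               for _ in range(50)]
    ab, ac, bc = averages(triples)
    # arithmetic alone guarantees this for any SINGLE list of triples;
    # the tolerance only absorbs float rounding
    assert abs(ab - ac) + bc <= 1 + 1e-12

print("no violation found in 1000 random lists")
```

Note this only confirms the arithmetic identity for one list of triples; it does not settle the disputed question of whether three separately measured runs must behave like one such list.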

JesseM said:
And I explained in #1213 that it doesn't make any sense to use these equations as the reason why ρ(λ) should be the same in all three terms.
Any serious person following Bell's derivation would have noticed that the integral on the right hand side of the first equation on page 406 is obtained by subtracting two different integrals for E(a,b) and E(a,c) and joining the integral signs into a single integral over λ. In mathematics, this is normally understood by any serious student worthy of a pass grade to mean that E(a,b) and E(a,c) are defined over the same distribution of λ. Also, in the third expression (the first inequality) on page 406, Bell factors out and recombines the A(b,λ) originally from the E(a,b) term and the A(c,λ) originally from the E(a,c) term to generate a new A(b,λ)A(c,λ) product, all under the same integral over λ, and subsequently separates the RHS into two integrals over the same λ, with the first part yielding 1 and the other yielding the E(b,c) term. Any person seriously trying to understand my argument rather than just quibble would understand that the requirement for ρ(λ) to be the same between all the terms is inherent in the derivation. Duh! No doubt you do not yet recognize that your so-called objections were rebutted by Bell himself, even before you thought of them. Sorry, no escape here either.

JesseM said:
billschnieder said:
In other words, in order to obtain all three expectation values E(a,b), E(a,c) and E(b,c), we ONLY need three lists of outcomes corresponding to A(a,.), A(b,.), A(c,.) or in simpler notation, we only need a single list of triples [(a',b',c')] to calculate all terms for

1 + <b'c'> >= |<a'b'> - <a'c'>|
No, again it seems like you are confusing theoretical terms with empirical results.
...
But that's not the situation with Bell's theorem. Rather, with Bell's theorem we have three runs with different combinations of detector settings (a,b), (b,c) and (a,c), and considering the average from each run. Bell is showing that if we know the true expectation values for each individual run, in a local realist universe they should obey:

1 + E(b,c) >= |E(a,b) - E(a,c)|

Since each expectation value is for a different run, even if you assume that every iteration of every run is determined by a set of triples, you can't derive the above equation from arithmetic alone since each expectation value would deal with a different collection of triples.
You do not understand Bell's work. Look again at page 406 and tell me how many distinct A(.,λ) type functions you see. I can identify only three, A(a,λ), A(b,λ), A(c,λ), not 6, which is what you are claiming Bell used in his derivation. The 3 expectation values E(a,b), E(a,c) and E(b,c) are merely cyclical combinations of these same terms. So you are off base here. There is no justification in Bell's work for suggesting that he is dealing with 6 separate terms corresponding to three separate runs. You have provided no proof, either mathematical or logical, to justify the ridiculous idea that Bell's inequality is derived from 6 separate terms rather than just 3.

However, as I have been pointing out to you over and over, the reason we cannot guarantee that an actual experiment will obey Bell's inequality is that actual experiments measure 6 different terms while Bell's derivation mandates the use of only 3. So at least here you seem to be seeing the light, only backwards.
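Whatever one makes of the dispute, the purely arithmetic point is checkable: three independently collected lists of pairs are not bound by the inequality the way a single list of triples is. A contrived sketch (the pair data below are invented for illustration; no such dataset could be resorted into consistent triples):

```python
# Three runs of *pairs*, each collected independently (no common list of triples),
# chosen so that each run is internally consistent pair data, yet jointly
# violating 1 + <bc> >= |<ab> - <ac>|.
run_ab = [(+1, -1)] * 100          # <ab> = -1
run_ac = [(+1, +1)] * 100          # <ac> = +1
run_bc = [(+1, -1)] * 100          # <bc> = -1

avg = lambda pairs: sum(x * y for x, y in pairs) / len(pairs)
ab, ac, bc = avg(run_ab), avg(run_ac), avg(run_bc)

assert 1 + bc < abs(ab - ac)   # 0 < 2: the inequality is violated
```

Note the b column of the (a,b) run is always -1 while the b column of the (b,c) run is always +1, so no resorting into triples is possible; that incompatibility is exactly what lets the three averages escape the bound.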
 
Last edited:
  • #1,224
JesseM said:
See my questions and arguments about your "resorting" procedure in post #1215. First I clarified what I thought you meant by this form of "resorting" at the start of the post with a simple example, perhaps you can tell me if I've got it right or not. If I have got it right, then please address my subsequent comments and questions
Yes you claimed to have "clarified" what I mean by resorting, even though I had explained with a detailed example back in post #1187 what I meant. In any case you say:

JesseM said:
I'm not sure I follow what you mean here. Suppose we do only 4 iterations with each pair of different detector settings, and get these results (with the understanding that notation like a=+1 means 'the result with detector set to angle a was +1):

For run with setting (a,b):
1. (a=+1, b=-1)
2. (a=-1, b=-1)
3. (a=-1, b=+1)
4. (a=+1, b=-1)

For run with setting (b,c):
1. (b=-1, c=+1)
2. (b=-1, c=-1)
3. (b=-1, c=+1)
4. (b=+1,c=-1)

For run with setting (a,c):
1. (a=+1, c=-1)
2. (a=+1, c=+1)
3. (a=-1, c=-1)
4. (a=-1, c=+1)

Then we can arrange these results into four rows of three iterations from three runs, such that in each row the value of a is the same for both iterations that sampled a, in each row the value of b is the same for both iterations that sampled b, and in each row the value of c is the same for both iterations that sampled c:

1. (a=+1, b=-1) 3. (b=-1, c=+1) 2. (a=+1, c=+1)
2. (a=-1, b=-1) 1. (b=-1, c=+1) 4. (a=-1, c=+1)
3. (a=-1, b=+1) 4. (b=+1,c=-1) 3. (a=-1, c=-1)
4. (a=+1, b=-1) 2. (b=-1, c=-1) 1. (a=+1, c=-1)

Let us call your three runs (runs 1, 2, 3) and calculate <ab>, <ac> and <bc> from each one.
<a1b1> = -1/2
<a2c2> = 0
<b3c3> = -1/2

Now looking at your resorted list with 6 columns: a1, b1, a2, c2, b3, c3, we can verify that
<a1b1> = <a1b3> = <a2b3> = <a2b1> = -1/2
and
<a2c2> = <a1c2> = <a2c3> = <a1c3> = 0
and
<b3c3> = <b1c3> = <b1c2> = <b3c2> = -1/2

The reason this holds is that after resorting, all the a columns are identical, just like the b columns and the c columns. So your dataset of 6 columns is in fact just a dataset of 3 columns with each column repeated once. If a dataset cannot be sorted as you did above, all those terms are not guaranteed to be the same. And if they are not the same, Bell's inequality cannot be applied to the dataset.
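The resorting procedure under discussion can be treated as an explicit consistency test. A sketch using JesseM's example data from post #1215 (the brute-force `resort` helper is my own illustration, not anything from the thread):

```python
from itertools import permutations

# JesseM's example data: three runs of four pairs each
run_ab = [(+1, -1), (-1, -1), (-1, +1), (+1, -1)]
run_bc = [(-1, +1), (-1, -1), (-1, +1), (+1, -1)]
run_ac = [(+1, -1), (+1, +1), (-1, -1), (-1, +1)]

def resort(run_ab, run_bc, run_ac):
    """Brute-force search for a row alignment in which every row agrees on a, b, c.
    Returns the list of (a, b, c) triples if one exists, else None."""
    for p_bc in permutations(run_bc):
        for p_ac in permutations(run_ac):
            if all(ab[1] == bc[0] and ab[0] == ac[0] and bc[1] == ac[1]
                   for ab, bc, ac in zip(run_ab, p_bc, p_ac)):
                return [(ab[0], ab[1], bc[1]) for ab, bc in zip(run_ab, p_bc)]
    return None

avg = lambda pairs: sum(x * y for x, y in pairs) / len(pairs)

triples = resort(run_ab, run_bc, run_ac)
if triples is not None:
    # When resorting succeeds, cross-run correlations computed from the triples
    # necessarily match the within-run ones
    assert avg([(a, b) for a, b, c in triples]) == avg(run_ab)
    assert avg([(b, c) for a, b, c in triples]) == avg(run_bc)
    assert avg([(a, c) for a, b, c in triples]) == avg(run_ac)
```

When `resort` returns None, the three runs cannot be read as samples from one common list of triples, which is the failure mode being argued about here.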

JesseM said:
If so, I don't see how this ensures that "ρ(λi) is the same for all three terms of the inequality", or what you even mean by that. For example, isn't it possible that if the number of possible values of λ is 1000, then even though iteration #1 of the first run has been grouped in the same row as iteration #3 of the second run and iteration #2 of the third run (according to their original labels), that doesn't mean the value of λ was the same for each of these three iterations?

Please, pay attention for once: every pair of outcomes at those angles is deterministically determined by the specific λ realized for that iteration. So if, for example, we had only 5 possible λ's (λ1, λ2, λ3, λ4, λ5), the only possible outcomes are (++, +-, -+, --), which means some of the λ's must result in the same outcome. Say λ5 and λ3 each deterministically result in the same outcome (++), and each of them was realized in the experiment exactly once. When you resort, it doesn't matter whether the (++) at the top of the resorted list corresponds to λ5 or λ3, for the following reasons. If in your large number of iterations λ5 and λ3 are fairly represented, you will still have the right number of (++)'s for both λ5 and λ3, and it doesn't matter if the specific (++) you got at the top is a λ5 (++) or a λ3 (++). Also, if for the three angles under consideration a, b, c a number of λ's deterministically resulted in the same outcomes for (a,b), (b,c) and (a,c), those lambdas are effectively equivalent as far as the experiment is concerned, and you could combine them, updating the combined P(λ) appropriately. Finally, as clearly explained in my posts #1211 and #1212, being able to sort the data is a test of whether the data meets the mathematical consistency required by Bell's derivation, in which the (b,c) term is derived by factoring out the b from the (a,b) term and the c from the (a,c) term and multiplying them together. Such factorization imposes a consistency requirement: unless you can do that, the inequality cannot be derived, and any data which cannot be factored likewise is mathematically incompatible with the inequality.

JesseM said:
Even if the data was drawn from triples, and the probability of different trials didn't depend on the detector settings on each run, there's no guarantee you'd be able to exactly resort the data in the manner of my example in post #1215, where we were able to resort the data so that every row (consisting of three pairs from three runs) had the same value of a,b,c throughout
That is why I cautioned you earlier not to prematurely blurt out your claim that conspiracy must be involved for ρ(λi) to be different. Now we get an admission, however reluctant, that it is possible for ρ(λi) to be different without conspiracy. You see, the less you talk (write), the less you will have to recant later, as I'm sure you are realizing.
 
  • #1,225
JesseM said:
If the frequencies of each of the 8 types of triples differed significantly in three runs with a significant (say, 1000 or more) number of trials in each, this would imply either an astronomically unlikely statistical miracle or it would imply that the no-conspiracy assumption is false and that the true probabilities of different triples actually does change depending on the detector settings.
First I would like you to explain where you pulled the 1000 number from. What rule of mathematics, statistics, or any other field of science enabled you to suggest that 1000 or more was a significantly large number of trials?
Secondly, I already explained to you in my response to your scratch lotto example that all you need to violate that requirement is for the probability of detection to vary with angle. In other words, a biased sample will do that without any conspiracy. Since the rest of the arguments above have failed, I predict that you will hang on this one and try to change the discussion to one about scratch lotto cards. Let's wait and see ...

JesseM said:
Not sure I follow what you mean here. Are you suggesting that even if we had a triple like a=+1, b=-1, c=+1 we might still get result -1 with detector setting a?
Why would you choose the most improbable of meanings? I mean that the list of pairs is not representative of the list of triples. Which clearly means that the relative frequency of each specific pair in the list of pairs is not the same as the relative frequency of the same pair in the list of triples.
JesseM said:
In any of these cases, Bell's inequality does not and can not apply to the data. In other words, it is simply a mathematical error to use the inequality in such situations.
No, the fact that Bell's inequality is observed not to work is empirical evidence that one of the assumptions used in the derivation must be false, like the assumption that local realism is true
Hehe, you are again grasping at straws here, trying to sneak in a physical assumption. I have just exhaustively and conclusively explained to you that the requirement to be able to sort the data, and for ρ(λi) to be the same across the three terms, is a mathematical requirement of Bell's derivation. In other words, Bell could not have derived his inequalities if these were false. I have also pointed out, and you agreed, that in any real experiment these mathematical requirements are not guaranteed to be obeyed. So contrary to your claim that experiments violate Bell's inequality due to the failure of some other physical assumption, which you haven't demonstrated to be material for deriving the inequality, the real reason is failure to meet the mathematical conditions that must hold for the inequality to apply to the data.

JesseM said:
billschnieder said:
If we can not do this, it means either that:
a) our data is most likely of the second kind in which randomization did not keep the pairs together or
Well, we know this does not apply in Bell tests, where every data pair is always from a single trial with a single pair of measurements on a single pair of entangled particles.
You do not understand Bell test experiments then. Contrary to your claims, it applies because experimenters are not always sure which particle on one arm corresponds to which particle on the other arm. Have you ever heard of the coincidence time window?

JesseM said:
Also note that these represent the only scenarios in which "average value of a*b for all triples" is different from "average value of a*b for measured pairs only". And in this case, the fair sampling assumption can not hold
What do you mean by "fair sampling assumption"? This page says "It states that the sample of detected pairs is representative of the pairs emitted", but that could be true and Bell's inequality could still fail for some other reason like a violation of the no-conspiracy assumption.
Another objection for objection's sake. You object, but then present a definition which is essentially what I have given.

billschnieder said:
c) our lists of pairs are not representative of the list of triples from which they arose
If you see a difference, illustrate it.

JesseM said:
Having replied to these, I saw nothing in them that could be considered a rebuttal of any of the points I made in #1213-#1215. I indicated in my replies to #1211 and #1212 where I thought various claims made in those posts had been disputed or questioned in #1213-#1215, so if you disagree with some of the things I say in my recent replies you can go back and address the corresponding arguments/questions in the earlier posts.
All I saw was quibbling, unsubstantiated claims and nothing substantive as I have illustrated in the last few posts.
 
  • #1,226
billschnieder said:
First of all, I said the equation is Bell's definition of HIS expectation values for the situation he is working with.
But then you use that to come to the absurd conclusion that in order to compare with empirical data, we need to make some assumptions about the distribution of values of λ on our three runs. We don't--Bell was writing for an audience of physicists, who would understand that whenever you talk about an "expectation value", the basic definition is always just a sum over each possible measurement result times the probability of that result, so to compare with empirical measurements you just take the average result on all your trials, nothing more. Bell obviously did not mean for his integrals to be the definitions of E(a,b) and E(b,c) and E(a,c), implying that you can only compare them with empirical data if you have actually confirmed that \rho(\lambda) was the same for each run--rather he was making an argument that the "expectation values" as conventionally understood would also be equal to those integrals.
billschnieder said:
Secondly, nobody said anything about the probabilities in the equation not being true probabilities, so you are complaining about an inexistent issue.
You understand that the "true probabilities" represent the frequencies of different outcomes in the limit as the number of trials goes to infinity, and not the actual frequencies in our finite series of trials? So for example, if one run with settings (a,b) included three trials where λ took the value λ3, while another run with settings (b,c) included no trials where it took the value λ3, this wouldn't imply that ρ(λi) differed in the integrals for E(a,b) and E(b,c)? Because your comment at the end of post #1224 suggests you are still confusing the issue of what it means for the "true probabilities" ρ(λi) to differ depending on the detector settings with what it means for the actual frequencies of different values of λi to differ on runs with different detector settings:
billschnieder said:
JesseM said:
Even if the data was drawn from triples, and the probability of different trials didn't depend on the detector settings on each run, there's no guarantee you'd be able to exactly resort the data in the manner of my example in post #1215, where we were able to resort the data so that every row (consisting of three pairs from three runs) had the same value of a,b,c throughout
That is why I cautioned you earlier not to prematurely blurt out your claim that conspiracy must be involved for ρ(λi) to be different. Now we get an admission, however reluctant, that it is possible for ρ(λi) to be different without conspiracy. You see, the less you talk (write), the less you will have to recant later, as I'm sure you are realizing.
So, kinda seems like this is not actually a dead issue. You may have noticed I discussed exactly this distinction between the "true probability distribution" ρ(λi) differing from one run to another and the actual frequencies of different λi's differing from one run to another at the very start of post #1214, but since you didn't respond I don't know if you even read that or what you thought of the distinction I was making there.
billschnieder said:
Thirdly, you object to my statement but go on to say the exact same thing. This is what I said after the equation:
Theoretically the above makes sense, where you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and then add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)
You really think that this is the "exact same thing" as what I was saying? Here your "practical" average requires us to know which value of λ occurred on each trial, and what the probability of each value was! Of course this is nothing like what I mean when I talk about comparing the theoretical expectation value to actual experimental data. Again, a definition of the expectation value involving "true probabilities" would be:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

So if you want to compare with empirical data on a run where the detector settings were a and b, it'd just be:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

...which is equivalent to just computing the product of the two measurements on each trial, and adding them all together and dividing by the number of trials to get the empirical average for the product of the two measurements on all trials in the run.
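The equivalence described here (the outcome-frequency sum vs. the simple average of per-trial products) is an algebraic identity and easy to verify. A sketch with simulated trial data (invented purely for illustration):

```python
import random
from collections import Counter

random.seed(1)
# Hypothetical run: 1000 trials, each an (A, B) outcome pair in {+1, -1}
trials = [(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(1000)]
n = len(trials)

# Outcome-frequency form: sum over the four joint results of (product * fraction)
freq = Counter(trials)
e_freq = sum(a * b * count / n for (a, b), count in freq.items())

# Simple average of the per-trial products
e_avg = sum(a * b for a, b in trials) / n

assert abs(e_freq - e_avg) < 1e-12
```

Grouping identical products before weighting by their fraction is just a reordering of the same sum, which is why the two computations always agree.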

You quote my simple equation for E(a,b) above and say:
It clearly shows that you do not understand probability or statistics. Clearly the definition of expectation value is based on probability weighted sum,
Which mine is--I'm multiplying each possible result by the probability of that result, for example the result (+1*-1) is multiplied by P(detector with setting a gets result +1, detector with setting b gets result -1)
billschnieder said:
and the law of large numbers is used as an approximation; that is why the last sentence above says that the expectation value is "almost surely the limit of the sample mean as the sample size grows to infinity"
Of course. In the limit as the number of trials goes to infinity, we would expect this:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

to approach this:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

...where all the probabilities in the second expression represent the "true probabilities", i.e. the fraction of trials with that outcome in the limit as the number of trials goes to infinity!

So, it's not clear why you think the wikipedia definition of expectation value is somehow different from mine, or that I "do not understand probability or statistics". Perhaps you misunderstood something about my definition.
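The convergence appealed to here is easy to illustrate. A sketch (the outcome probabilities below are invented for illustration):

```python
import random

random.seed(2)

# Hypothetical "true" probabilities for the four joint outcomes (summing to 1)
P = {(+1, +1): 0.4, (+1, -1): 0.1, (-1, +1): 0.1, (-1, -1): 0.4}
true_E = sum(a * b * p for (a, b), p in P.items())   # = 0.6

outcomes, weights = zip(*P.items())

def empirical_E(n):
    # Average of the per-trial products over n simulated trials
    sample = random.choices(outcomes, weights=weights, k=n)
    return sum(a * b for a, b in sample) / n

# By the law of large numbers the sample average is very unlikely to be far
# from true_E once n is large (standard error ~ 0.8/sqrt(n) here)
assert abs(empirical_E(200_000) - true_E) < 0.02
```

Nothing about λ enters the computation: only the observable ±1 results and their frequencies are needed to compare against the expectation value.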
billschnieder said:
You are trying to restrict the definition by suggesting that the expectation value is defined ONLY over the possible paired outcomes (++, --, +-, -+) and not possible λ's, but that is naive and short-sighted, and also ridiculous, as we will see shortly.
No, all expectation values are just defined as a sum over all possible results times the probability of each possible result. And in this experiment the value of λ is not a "result"; the "result" on each trial is just +1 or -1.
billschnieder said:
Now let us go back to the first sentence of the wikipedia definition above and notice the last two words, "probability measure". In case you do not know what that means, a probability measure is simply any real-valued function which assigns 1 to the entire probability space and maps events into the range from 0 to 1. An expectation value can be defined over any such probability measure, not just the one you pick and choose for argumentation purposes. In Bell's equation (2),
\int d\lambda \rho (\lambda ) = 1
Therefore ρ(λ) is a probability measure over the paired products A(a,λ)A(b,λ)
No, ρ(λ) is a probability measure over values of λ, and it happens to be true (according to Bell's physical assumptions) that the value of λ along with the detector angles completely determines the results on each trial. But you can also define a probability measure on the results themselves, that would just be a measure that assigns probabilities between 0 and 1 to each of the four possible results:

1. (detector with setting a gets result +1, detector with setting b gets result +1)
2. (detector with setting a gets result +1, detector with setting b gets result -1)
3. (detector with setting a gets result -1, detector with setting b gets result +1)
4. (detector with setting a gets result -1, detector with setting b gets result -1)

With the sum of the four probabilities equalling one. That's exactly the sort of probability measure I was assuming when I wrote down my equation:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)

And when trying to compare an equation involving expectation values to actual empirical results, every physicist would understand that you don't need to even consider the question of what values λ may have taken on your experimental runs, instead you'd just compute something like this:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

...which, by the law of large numbers, is terrifically unlikely to differ significantly from the "true" expectation value if you have done a large number of trials. If you think a physicist comparing experimental data to Bell's inequality would actually have to draw any conclusions about the values of λ on the experimental trials, I guarantee you that your understanding is totally idiosyncratic and contrary to the understanding of all mainstream physicists who talk about testing Bell's inequality empirically.
billschnieder said:
Bell's equation (2) IS defining an expectation value for paired products irrespective of any physical assumptions. There is no escape for you here.
If equation (2) was supposed to be the definition of the expectation value, rather than just an expression that he would expect the expectation value (under its 'normal' meaning, the one I've given above involving only actual measurable results and the probabilities of each result) to be equal to, then why do you think he would need to make physical arguments as to why equation (2) should be the correct form? Do you deny that he did make physical arguments for the form of equation (2), like in the first paper where he wrote:
Now we make the hypothesis, and it seems one at least worth considering, that if the two measurements are made at places remote from one another the orientation of one magnet does not influence the result obtained with the other. Since we can predict in advance the result of measuring any chosen component of \sigma_2, by previously measuring the same component of \sigma_1, it follows that the result of an such measurement must actually be predetermined. Since the initial quantum mechanical wave function does not determine the result of an individual measurement, this predetermination implies the possibility of a more complete specification of the state.

Let this more complete specification be effected by means of parameters λ ... the result A of measuring \sigma_1 \cdot a is then determined by a and λ, and the result B of measuring \sigma_2 \cdot b in the same instance is determined by b and λ
Do you disagree that here the first paragraph is providing physical justification for why A is a function only of a and λ but not b, and why B is a function of b and λ but not a, along with a justification for why we should believe the result A can be completely determined by a and the hidden parameters λ in the first place? Likewise, in the paper http://cdsweb.cern.ch/record/142461/files/198009299.pdf , would you deny that this section from p. 16 of the pdf (p. 15 of the paper) is trying to provide physical justification for why the same function ρ(λ) appears in different integrals for different expectation values like E(a,b) and E(b,c)?
Secondly, it may be that it is not permissible to regard the experimental settings a and b in the analyzers as independent variables, as we did. We supposed them in particular to be independent of the supplementary variable λ, in that a and b could be changed without changing the probability distribution ρ(λ). Now even if we have arranged that a and b are generated by apparently random radioactive devices, housed in separate boxes and thickly shielded, or by Swiss national lottery machines, or by elaborate computer programmes, or by apparently free willed experimental physicists, or by some combination of all of these, we cannot be sure that a and b are not significantly influenced by the same factors λ that influence A and B. But this way of arranging quantum mechanical correlations would be even more mind boggling than one in which causal chains go faster than light. Apparently separate parts of the world would be deeply and conspiratorially entangled, and our apparent free will would be entangled with them.
If you don't disagree that these sections are attempts to provide physical justification for the form of the integrals he writes, why do you think he would feel the need to provide physical justification if he didn't have some independent meaning of "expectation values" in mind, like the meaning I talked about above involving just the different results and the probabilities of each one?
 
Last edited by a moderator:
  • #1,227
JesseM said:
You understand that the "true probabilities" represent the frequencies of different outcomes in the limit as the number of trials goes to infinity, and not the actual frequencies in our finite series of trials?

You do not understand probability either. Say I give you the following list of outcomes:

++
--
-+
+-

And ask you to calculate P(++) from it. Clearly the probability is the number of times (++) occurs in the list divided by the number of entries in the list. The list does not have an infinite number of entries; there is no need to perform an infinite number of trials in order to deduce the probability. And even if you did perform a large number of trials, you would not get exactly the true probability, which is 1/4. So your "law of large numbers" cop-out is an approximation of the true probability, not its definition. You need to learn some basic probability theory here because you are way off base.
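The two notions being argued past each other here, a frequency computed from a given finite list versus the limiting frequency over ever more trials, can be put side by side. A sketch (the sampling code is my own illustration):

```python
import random

random.seed(3)

# Relative frequency in the given 4-entry list: exact, no limit needed
outcomes = ["++", "--", "-+", "+-"]
p_list = outcomes.count("++") / len(outcomes)          # exactly 0.25

# Relative frequency in a large random sample from the same uniform source:
# close to, but generally not exactly, the limiting value 1/4
sample = [random.choice(outcomes) for _ in range(10_000)]
p_sample = sample.count("++") / len(sample)
```

The first quantity is a property of the list itself; the second only approaches 1/4 as the sample grows, which is the sense in which the law of large numbers is invoked.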

JesseM said:
But then you use that to come to the absurd conclusion that in order to compare with empirical data, we need to make some assumptions about the distribution of values of λ on our three runs. We don't--Bell was writing for an audience of physicists, who would understand that whenever you talk about an "expectation value", the basic definition is always just a sum over each possible measurement result times the probability of that result
Sorry JesseM, but that bubble has already been burst, since I proved conclusively that you do not know the meaning of "expectation value". To show how silly this adventitious argument of yours is, I asked you a simple question and I dare you to answer it:

billschnieder said:
You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product. Once you have done that, try and swindle your way out of the fact that
a) The structure of the expression so derived does not depend on the actual value N. ie, N could be 5, 100, or infinity.
b) The expression so derived is a theoretical expression not "empirical".
c) The expression so derived is the same as the simple average of the paired products.

JesseM said:
So for example, if one run with settings (a,b) included three trials where λ took the value λ3, while another run with settings (b,c) included no trials where it took the value λ3, this wouldn't imply that ρ(λi) differed in the integrals for E(a,b) and E(b,c)? Because your comment at the end of post #1224 suggests you you are still confusing the issue of what it means for the "true probabilities" ρ(λi) to differ depending on the detector settings and what it means for the actual frequencies of different values of λi to differ on runs with different detector settings
You are sorely confused. Note I use ρ(λi), not P(λi), to signify that we are dealing with a probability distribution, which is essentially a function defined over the space of all λ, with integral over all λ equal to 1.

If the (a,b) run included N iterations with three of those corresponding to λ3, then P(λ3) for our dataset = 3/N. But if in a different run of the experiment (b,c) none of the λ's was λ3, then P(λ3) = 0 for that dataset. It therefore means the probability distribution ρ(λi) cannot be the same for E(a,b) and E(b,c). If this is still too hard for you, let me simplify further.

According to Bell, E(a,b) is calculated by the following sum:

a1*b1*P(λ1) + a2*b2*P(λ2) + ... + an*bn*P(λn), where n is the total number of possible distinct lambdas. ρ(λ) is a function which maps a specific λi to its probability P(λi). By definition therefore, if the function ρ(λ) is the same for two runs of the experiment, it must produce the same P(λi) in both cases. In other words, if it produced different values of P(λi), such as 3/N in one case and 0 in another, it means ρ(λ) is necessarily different between the two, and the runs cannot be used together as a valid source of terms for comparison with Bell's inequality.
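This weighted-sum reading can be made concrete. With hypothetical deterministic outcomes (the product table and probabilities below are invented purely for illustration), two different λ-distributions fed into the same sum yield different expectation values:

```python
# Hypothetical local-deterministic model: for each lambda, the product
# A(a,lam)*B(b,lam) at fixed settings (a, b) is predetermined (tabulated by hand)
product_ab = {1: +1, 2: -1, 3: -1}

def E(rho):
    # Bell-style weighted sum over a discrete distribution rho = {lam: P(lam)}
    return sum(p * product_ab[lam] for lam, p in rho.items())

rho_run1 = {1: 0.3, 2: 0.5, 3: 0.2}   # lambda-3 realized 20% of the time
rho_run2 = {1: 0.5, 2: 0.5, 3: 0.0}   # lambda-3 never realized

e1, e2 = E(rho_run1), E(rho_run2)
# e1 = 0.3 - 0.5 - 0.2 = -0.4  vs  e2 = 0.5 - 0.5 = 0.0
```

Whether such a difference between runs reflects a real physical possibility or only sampling fluctuation is exactly what the two posters dispute; the sketch only shows that the sum itself is sensitive to the distribution.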

JesseM said:
billschnieder said:
Note, what Bell is doing here is calculating the weighted average of the product A(a,λ)*B(b,λ) for all λ, which is essentially the expectation value. Theoretically the above makes sense, where you measure each A(a,.), B(b,.) pair exactly once for a specific λ, multiply by the probability of realizing that specific λ, and then add up subsequent ones to get your expectation value E(a,b). But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ is equivalent to its probability, i.e.

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are.
You really think that this is the "exact same thing" as what I was saying? Here your "practical" average requires us to know which value of λ occurred on each trial
Oh come on! At least be honest about what you claim I am saying! Why would you need to know λ for each trial if you are calculating a simple average!? Go back and answer the example I requested, for the expectation value for N pairs of real-valued numbers x and y, and if you still do not understand how ridiculous this sounds, ask again and I will explain it in yet simpler terms, assuming it is possible to simplify this any further.
 
  • #1,228
JesseM said:
So if you want to compare with empirical data on a run where the detector settings were a and b, it'd just be:

(+1*+1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*(fraction of trials where detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*(fraction of trials where detector with setting a gets result -1, detector with setting b gets result -1)

...which is equivalent to just computing the product of the two measurements on each trial, and adding them all together and dividing by the number of trials to get the empirical average for the product of the two measurements on all trials in the run.
Despite your empty protests, you are still unable to show why the above will be different from a simple average <ab>. Oh wait, you actually agree with my statement that:

billschnieder said:
But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes
So yeah, you are saying the exact same thing after objecting to it!
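The equivalence both sides seem to accept at this point, a sum over the four joint outcomes weighted by their frequencies versus a plain average of the per-trial products, is easy to verify numerically. The ±1 pairs below are randomly generated stand-ins, not real data:

```python
import random

random.seed(0)

# Stand-in data: each trial yields a pair of +/-1 outcomes.
trials = [(random.choice([1, -1]), random.choice([1, -1])) for _ in range(10_000)]
n = len(trials)

# Form 1: sum over the four possible joint outcomes, weighted by empirical frequency.
e1 = sum(a * b * trials.count((a, b)) / n for a in (1, -1) for b in (1, -1))

# Form 2: simple average of the per-trial products.
e2 = sum(a * b for a, b in trials) / n

assert abs(e1 - e2) < 1e-12  # the two forms agree on any dataset
```

Regrouping the sum of products by outcome type is all that separates the two forms, which is why they must agree on any list of ±1 pairs.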

JesseM said:
So, it's not clear why you think the wikipedia definition of expectation value is somehow different from mine, or that I "do not understand probability or statistics"
It is different because yours restricts expectation values to only the possible outcomes (++, --, +-, -+), even though expectation values are defined for any probability measure. ρ(λ) is a probability measure over all outcomes; therefore, Bell's equation (2) is a standard mathematical expression for an expectation value, contrary to your morphing claims.

JesseM said:
No, all expectation values are just defined as a sum over all possible results times the probability of each possible result. And in this experiment the value of λ is not a "result", the "result" on each trial is just +1 or -1.
...
No, ρ(λ) is a probability measure over values of λ
Hehe, this is precisely an example of why I say you do not understand probability theory and statistics. In Bell's equation (2), the pair [A(a,λ)B(b,λ)] defines an event, the probability of the event [A(a,λ)B(b,λ)] occurring is P(λ), therefore ρ(λ) IS a probability measure over [A(a,λ)B(b,λ)] whether you like it or not. There are lots of references online. Find me one which says otherwise. No physical assumption is required to obtain this blatant mathematical definition.

JesseM said:
But you can also define a probability measure on the results themselves, that would just be a measure that assigns probabilities between 0 and 1 to each of the four possible results:

1. (detector with setting a gets result +1, detector with setting b gets result +1)
2. (detector with setting a gets result +1, detector with setting b gets result -1)
3. (detector with setting a gets result -1, detector with setting b gets result +1)
4. (detector with setting a gets result -1, detector with setting b gets result -1)

With the sum of the four probabilities equalling one. That's exactly the sort of probability measure I was assuming when I wrote down my equation:

E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)
This is an admission that you were wrong to suggest that Bell's equation (2) is not a valid expectation value unless physical assumptions are also made. Nobody is arguing that there are no other valid mathematical expressions for an expectation value. You were the one arguing that a mathematically defined expectation value must be the one you chose and not the one Bell chose. I'm happy you are now backtracking from that ridiculous position.

JesseM said:
If you think a physicists comparing experimental data to Bell's inequality would actually have to draw any conclusions about the values of λ on the experimental trials, I guarantee you that your understanding is totally idiosyncratic and contrary to the understanding of all mainstream physicists who talk about testing Bell's inequality empirically.
Grasping at straws here, to make it look like there is something I said which you object to. Note that you start the triumphant statement with an IF and then go ahead to hint that what you are condemning is actually something I think, but you provide no quote of mine in which I said anything of the sort. I thought this kind of tactic was relegated to talk-show TV and political punditry.
 
  • #1,229
JesseM said:
If equation (2) was supposed to be the definition of the expectation value, rather than just an expression that he would expect the expectation value (under its 'normal' meaning, the one I've given above involving only actual measurable results and the probabilities of each result) to be equal to, then why do you think he would need to make physical arguments as to why equation (2) should be the correct form? Do you deny that he did make physical arguments for the form of equation (2) ...
Duh! The whole point is that no physical assumptions are needed! This issue would be dead had you not argued vehemently that without extra physical assumptions, Bell's equation (2) will not be a standard mathematical expression for the expectation value of paired products.

You apparently did not see the following in my earlier post #1211:
billschnieder said:
You could say the reason Bell obtained the same expression is because he just happened to be dealing with two functions which can have values (+1 and -1) for physical reasons, and experiments producing a list of such pairs. And he just happened to be interested in the pair product of those functions for physical reasons. But the structure of the calculation of the expectation value is determined entirely by the mathematics and not the physics. Once you have two variables with values (+1 and -1) and a list of pairs of such values, the above equations should arise no matter the process producing the values, whether physical, mystical, non-local, spooky, super-luminal, or anything you can dream about. That is why I say the physical assumptions are peripheral.
So while it is true that Bell discussed the physical issues of local causality, those issues are peripheral as I have already explained.

JesseM said:
If you don't disagree that these sections are attempts to provide physical justification for the form of the integrals he writes, why do you think he would feel the need to provide physical justification if he didn't have some independent meaning of "expectation values" in mind, like the meaning I talked about above involving just the different results and the probabilities of each one?

Because the meaning of the expression is clear from the expression Bell wrote himself. He is multiplying the paired product A(a,λ)B(b,λ) by its probability P(λ) and integrating over all λ. That is the mathematical definition of an expectation value. You are the one trying to impose on Bell's equation a meaning he did not intend, as is evident from what he himself wrote in his original paper. You can't escape this one.

For example:

Let us define A(a,λ) = ±1 and B(b,λ) = ±1 just like Bell, and say that the functions represent the outcomes of two events at two stations, one on Earth (A) and another (B) on planet 63, and in our case λ represents non-local mystical processes which, together with certain settings on the planets, uniquely determine the outcome. We also allow, in our spooky example, the setting a on Earth to remotely affect the choice of b instantaneously, and vice versa. Note that in our example there is no source producing any entangled particles; everything is happening instantaneously.

The expectation value for the paired product of the outcomes at the two stations is exactly the same as Bell's equation (2). If you disagree, explain why it would be different or admit that the physical assumptions are completely peripheral.
 
  • #1,230
EPR is essentially about conservation (though along the line of the question the paper also raises quite different issues, e.g., "is QM a complete theory?"). For a classical pair, magnetic momentum (for instance) would be conserved along ANY direction, but also along ALL directions. In QM, only one direction at a time makes sense, so the spin projection is conserved along ANY direction but NOT ALONG ALL directions. Think of the Uncertainty Principle with reversed time, as proved in 1931 by Einstein, Tolman, and Podolsky. Bell's theorem assumes a form of realism not proven to make sense in the microcosm, at least for the type of coordinates we know (Einstein, like Schrödinger, thought that one should use other variables, but would have considered Bell's hidden variables very naive). Assuming, like Bell, a form of naive microscopic realism that would let one make sense, e.g., of spin projections along at least 3 directions, John Bell proved an inequality already known to Boole in the late nineteenth century for macroscopic properties, where only realism counts. The (nice) experiments supposed to "prove action at a distance" ONLY proved QM to be right, something that competent people did not doubt much anyway: they prove that realism and locality (absence of action at a distance, so to speak) cannot both hold true, but the only interesting question is whether realism (at least in the classical form, i.e., valid for all observables) holds true in the microcosm. A proof has just appeared in the European Journal of Physics to the effect that a Bell theorem holds true without assuming locality, en route, perhaps, to proving that (classical) realism is false.
 
  • #1,231
Bill, from reading the last two pages, this seems like a pretty straightforward example of you being mistaken and JesseM being correct. Posting in bulk isn't changing this, or obscuring that fact in any way from those of us reading this thread. I just thought you might want that reality check-in.
 
  • #1,232
nismaratwork said:
Posting in bulk isn't changing this

Yeah, and the extremely funny thing is that Bill is accusing others of writing too loooooooooong posts!?

(:biggrin:)
 
  • #1,233
charlylebeaugosse said:
A proof has just appeared in the European Journal of Physics to the effect that a Bell theorem holds true without assuming locality, en route to prove that (classical) realism is false, perhaps.

Extremely interesting! Any links?


P.S. Welcome to PF charlylebeaugosse! :wink:
 
  • #1,235
Last edited:
  • #1,236
DrChinese said:
... Also, this author has written other articles claiming that Bell leads to a rejection of what he calls "weak realism".

I don’t know... but there seems to be other things that are a little "weak" also...? Like this:
"As a consequence classical realism, and not locality, is the common source of the violation by nature of all Bell Inequalities."

I may be stupid, but I always thought one has to make a choice between locality and realism? You can’t have both, can you?

And what is this?
"We prove versions of the Bell and the GHZ theorems that do not assume locality but only the effect after cause principle (EACP) according to which for any Lorentz observer the value of an observable cannot change because of an event that happens after the observable is measured."

To me this is contradictory. If you accept nonlocality, you must accept that the (nonlocal) effect comes before the cause (at speed of light)?
 
  • #1,237
DevilsAvocado said:
I don’t know... but there seems to be other things that are a little "weak" also...? Like this:
"As a consequence classical realism, and not locality, is the common source of the violation by nature of all Bell Inequalities."

I may be stupid, but I always thought one has to make a choice between locality and realism? You can’t have both, can you?

And what is this?
"We prove versions of the Bell and the GHZ theorems that do not assume locality but only the effect after cause principle (EACP) according to which for any Lorentz observer the value of an observable cannot change because of an event that happens after the observable is measured."

To me this is contradictory. If you accept nonlocality, you must accept that the (nonlocal) effect comes before the cause (at speed of light)?

There are some signs - and this is one, GHZ being another, and there are others too - that realism flat out fails no matter what. You could also simply say that reality is contextual and get the same effect. The time symmetry interpretations as well as MWI fall into this category. Pretty much all of the Bohmian/dBBers also acknowledge contextuality.

Keep in mind that in Delayed Choice setups, you can have after the fact entanglement. So that pretty much wrecks his EACP anyway.
 
  • #1,238
DrChinese said:
Keep in mind that in Delayed Choice setups, you can have after the fact entanglement. So that pretty much wrecks his EACP anyway.

Thanks DrC. Great to have you back as the "Concierge" in this messy thread... :wink:
 
  • #1,239
DevilsAvocado said:
Thanks DrC. Great to have you back as the "Concierge" in this messy thread... :wink:

More like the con rather than the concierge. :smile:

Hey, look at my post count! Although JesseM has been smearing me lately on post length...
 
Last edited:
  • #1,240
DrChinese said:
More like the con

But not on Shutter Island, right!?

(:biggrin:)
 
  • #1,241
Message to the Casual Reader

Maybe you are confused by what’s going on in this thread. And maybe you don’t know what to think about extensive and overcomplicated mathematical formulas, claiming to be a serious "rebuttal" of Bell's inequality.

Don’t worry. You are not alone. Let's untie this spurious "Gordian knot".

As already said – all this can be understood by a gifted 10-year-old (which includes DrC & Me, where the former is gifted :smile:).

Let’s start from the beginning, with Bell's theorem:
Wikipedia – Bell's theorem

In theoretical physics, Bell's theorem (AKA Bell's inequality) is a no-go theorem, loosely stating that:
No physical theory of local hidden variables can ever reproduce all of the predictions of quantum mechanics.

It is the most famous legacy of the late physicist John S. Bell.

Bell's theorem has important implications for physics and the philosophy of science as it proves that every quantum theory must violate either locality or counterfactual definiteness.


Right there we can see that "some" in this thread have totally misinterpreted the very basics about Bell's theorem/Bell's inequality – Quantum Mechanics must violate either locality or counterfactual definiteness.

Bell's Theorem is not a diehard proof of nonlocality, never was, never will be.

Counterfactual definiteness (CFD) is another word for objective Realism, i.e. the assumption that objects, and the properties of objects, have a definite physical existence whether or not they are measured or observed.

Therefore we can say: Bell's Theorem proves that QM must violate either Locality or Realism.

If we combine Locality and Realism, we get Local Realism (LR), i.e. an object is influenced directly only by its immediate surroundings, and has an objective existence even when not measured.

Now we can see that: Bell's Theorem proves that QM violates Local Realism (LR).

Local Realism just doesn’t work with current understanding of Quantum Mechanics. Note that this is a totally different thing than faster than light (FTL) messaging.



Furthermore we can see that billschnieder, for example, is convinced that Bell's Theorem is an empirical "law of nature", and that if he can find a mathematical flaw in this "law of nature", all goes down the drain, including 45 years of hard work. Which is of course utterly silly and stupid, because it’s not a "law of nature", it’s a Theorem:
http://en.wikipedia.org/wiki/Theorem

Theorems have two components, called the hypotheses and the conclusions. The proof of a mathematical theorem is a logical argument demonstrating that the conclusions are a necessary consequence of the hypotheses, in the sense that if the hypotheses are true then the conclusions must also be true, without any further assumptions. The concept of a theorem is therefore fundamentally deductive, in contrast to the notion of a scientific theory, which is empirical.


Deductive reasoning constructs or evaluates deductive arguments, which attempts to show that a conclusion necessarily follows from a set of premises.

Quantum mechanics, on the other hand, is an empirical scientific theory, where information is gained by means of observation, experience, or experiment.

billschnieder is comparing apples and oranges, without knowing what he's doing – in a last hysterical attempt to find some "flaw" in Bell's Theorem:
billschnieder said:
For a dataset of triples, Bell's inequality can never be violated, not even by spooky action at a distance! ... In other words, it is mathematically impossible to violate the inequalities for a dataset of triples, irrespective of the physical situation generating the data, whether it is local causality or FTL.


Pretty obvious, isn’t it? He’s fighting in the dark, totally obsessed with FTL, and completely in ignorance of the other half in Local Realism.

billschnieder is also convinced that he is in possession of the highest IQ of all time. That his simple "High School Freshman Discovery" has been overlooked by thousands of extremely brilliant scientists – including Nobel Laureates – and that none of them saw this very simple "rebuttal": To violate Bell's inequality we need a dataset of TRIPLES from TWO entangled objects!

Besides being totally hilarious, it’s an inevitable fact that we are dealing with a clear case of the dreadful Dunning–Kruger effect: http://en.wikipedia.org/wiki/Dunning–Kruger_effect

Bell's Inequality is a concept, an idea, how to finally settle the long debate between Albert Einstein and Niels Bohr regarding the EPR paradox. Bell's Inequality is not one single mathematical solution – it can be defined in many ways – as DrChinese points out very well:
DrChinese said:
One of the things that it is easy to lose sight of - in our discussions about spin/polarization - is that a Bell Inequality can be created for literally dozens of attributes. Anything that can be entangled is a potential source. Of course there are the other primary observables like momentum, energy, frequency, etc. But there are secondary observables as well. There was an experiment showing "entangled entanglement", for example. Particles can be entangled which have never interacted, as we have discussed in other threads.

And in all of these cases, a realistic assumption of some kind leads to a Bell Inequality; that Inequality is tested; the realistic hypothesis is rejected; and the predictions of QM are confirmed.



There is not one single "Holy Grail of Inequality", as billschnieder assumes, and I’m going to prove it with a very simple example.

billschnieder thrives on complexity – the longer his futile equations get, the happier he gets – and that goes for his semantic games as well. billschnieder rejects everything that’s beautiful in its simplicity, where there is no room for his erratic ideas.

This example, by Nick Herbert, is known as one of the simplest proofs of Bell's Inequality (and I already know billschnieder is going to hate it :devil:):

The setup is standard: one source of entangled photon pairs, and two polarizers that we can position independently at different angles.
13z71hi.png

The entangled source is of the kind that, if both polarizers are set to 0º, we get perfect agreement, i.e. if one photon gets thru one polarizer the other photon gets thru the other polarizer, and if one is stopped the other is also stopped, i.e. 100% match and 0% discordance.

To start, we set the first polarizer at +30º, and the second polarizer at 0º:
16jlw1g.png

If we calculate that discordance (i.e. the number of measurements where we get a mismatching outcome thru,stop / stop,thru), we get 25% according to QM and experiments.

Now, if we set the first polarizer to 0º, and the second polarizer to -30º:
106jwrd.png

And calculate this discordance we will naturally get 25% according to QM, this time also.

Now let’s use some of John Bell’s brilliant logic, and ask ourselves:

– What will the discordance be if we set the polarizers to +30º and -30º ...??
2zjm5jk.png

Well that isn’t hard, is it ...!:rolleyes:?

If we assume a local reality, that nothing we do to one polarizer can affect the outcome of the other polarizer, we can formulate this simple Bell Inequality:
N(+30°, -30°) ≤ N(+30°, 0°) + N(0°, -30°)

The symbol N represents the number of discordances (mismatches).

This inequality is as good as any other you’ve seen in this thread, anybody stating different is a crackpot liar.

(The "is less than or equal to" sign is just to show that there could be compensating changes where a mismatch is converted to a match, but this is not extremely important.)

We can make this simple Bell Inequality even simpler, for let’s say a gifted 10-yearold :smile::
50% = 25% + 25%

This is the obvious local realistic assumption.

But this is wrong! According to QM and physical experiments we will now get 75% discordance!
sin²(60º) = 75%

This is completely crazy!? How can the setting of one polarizer affect the discordance of the other, if reality is local?? It just doesn’t make sense!

But John Bell demonstrated, by means of very brilliant and simple tools, that our natural assumption about a local reality is incompatible with the predictions of Quantum Mechanics, and with all physical experiments performed so far, by over 25 percentage points.

We can simplify our inequality even further and say:
25 + 25 = 50

And divide by 25, to get this extremely simple local realistic Bell Inequality:
1 + 1 = 2

How simple can it be ?:-p?

Now we can see that QM predictions and experiments violate this simple inequality:
1 + 1 = 3 !:devil:!​

Conclusion: We do not need a dataset of triples, or miles of Bayesian probability, or conspiracy theories, or any overcomplicated math whatsoever – BECAUSE IT’S ALL VERY SIMPLE AND BEAUTIFUL.
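The percentages quoted in the example above can be checked in a few lines. This is just a sketch of the QM prediction that the mismatch rate for polarizer settings differing by Δθ is sin²(Δθ):

```python
import math

def qm_mismatch(angle_a, angle_b):
    """QM-predicted mismatch (discordance) rate, as a fraction, for two polarizers
    at the given angles in degrees: sin^2 of the angle difference."""
    return math.sin(math.radians(angle_a - angle_b)) ** 2

lhs = qm_mismatch(30, -30)                       # QM: 75% mismatch at (+30, -30)
rhs = qm_mismatch(30, 0) + qm_mismatch(0, -30)   # 25% + 25% = 50%

# The local-realist bound N(+30, -30) <= N(+30, 0) + N(0, -30) is violated:
print(lhs > rhs)  # True
```

Note the code only reproduces the QM prediction; the inequality itself comes from the local-realist reasoning above, and it is the comparison between the two that does all the work.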


Hope this was helpful, and that you now clearly see who the liar in this thread is.

Thanks for the attention.
 
Last edited by a moderator:
  • #1,242
DevilsAvocado said:
Local Realism just doesn’t work with current understanding of Quantum Mechanics.






Bell's words:

"-My theorem answers some of Einstein's questions in a way that Einstein would have liked the least."


responding to Einstein's:

"-On this I absolutely stand firm. The world is not like this."
 
  • #1,243
DevilsAvocado said:
As already said – all this can be understood by a gifted 10-year-old (which includes DrC & Me, where the former is gifted :smile:).

Let’s start from the beginning, with Bell's theorem:

...

Great post!

And I am gifted, because I got a present for my birthday! (The 10 year old part represents my emotional age, by the way.)
 
  • #1,244
GeorgCantor said:
Bell's words:

"-My theorem answers some of Einstein's questions in a way that Einstein would have liked the least."


responding to Einstein's:

"-On this I absolutely stand firm. The world is not like this."

History has shown that the opinions of such men are less important than the work they leave behind. I think even dogs know at this point that Einstein was an uncompromising figure in the latter half of his life, searching for something which now seems even less likely. Should I raise a family in the manner of Dirac because he was brilliant? Bell's assertion is meaningless without his theorem, and Einstein's rebuttal is meaningless without a foundation.
 
  • #1,245
GeorgCantor said:
Bell's words:

"-My theorem answers some of Einstein's questions in a way that Einstein would have liked the least."


responding to Einstein's:

"-On this I absolutely stand firm. The world is not like this."

Georg, Sources, please?

Thank you, JenniT
 
  • #1,246
JenniT said:
Georg, Sources, please?

Thank you, JenniT



"Bell, in his first article on hidden variables and contextuality [9], wrote “the Einstein-Podolsky-Rosen paradox is resolved in the way which Einstein would have liked least.”"


Page 1 of:

"Einstein, Podolsky, Rosen, and Shannon"
Asher Peres
Department of Physics, Technion—Israel Institute of Technology, 32000 Haifa, Israel

http://arxiv.org/PS_cache/quant-ph/pdf/0310/0310010v1.pdf


The quote can also be found in "Quantum Reality" by N.Herbert with the insistence about spooky action "On this I absolutely stand firm. The world is not like this."
 
  • #1,247
nismaratwork said:
History has shown that the opinions of such men are less important than the work they leave behind. I think even dogs know at this point that Einstein was an uncompromising figure in the latter half of his life, searching for something which now seems even less likely. Should I raise a family in the manner of Dirac because he was brilliant? Bell's assertion is meaningless without his theorem, and Einstein's rebuttal is meaningless without a foundation.



You are arguing with yourself, or with an imaginary version of "me". It must be your fantasy that drives your misguided belief that I implied their work wasn't important. I said no such thing.
 
  • #1,248
GeorgCantor said:
You are arguing with yourself or an imaginary version of "me". It must be your fantasy that drives your misguided belief I implied their work wasn't important. I said no such thing.

What was your point exactly?
 
  • #1,249
billschnieder said:
You do not understand probability either. Say I give you the following list of

++
--
-+
+-

And ask you to calculate P(++) from it. Clearly the probability is the number of times (++) occurs in the list divided by the number of entries in the list.
No, you can't calculate the probability just from the information provided, not if we are talking about objective frequentist probabilities rather than subjective estimates. After all, the nature of the physical process generating this list might be such that frequency of ++ in a much greater number of trials would be something other than 0.25, and according to the frequentist definition P(++) is whatever fraction of trials would yield result ++ in the limit as the number of trials went to infinity.
billschnieder said:
So your "law of large numbers" cop-out is an approximation of the true probability, not its definition. You need to learn some basic probability theory here, because you are way off base.
Again your argument seems to involve a casual dismissal of the frequentist view of probability, when it is an extremely mainstream way of defining the notion of "probability", and regardless of whether you like it or not, it's a pretty safe bet that Bell was tacitly assuming the frequentist definitions in his proofs, since they become fairly incoherent with any more subjective definition of probability (because they deal with "probabilities" of hidden variables that would be impossible for experimenters to measure).
JesseM said:
But then you use that to come to the absurd conclusion that in order to compare with empirical data, we need to make some assumptions about the distribution of values of λ on our three runs. We don't--Bell was writing for an audience of physicists, who would understand that whenever you talk about an "expectation value", the basic definition is always just a sum over each possible measurement result times the probability of that result
billschnieder said:
Sorry JesseM but that bubble has already been burst, when I proved conclusively that you do not know the meaning of "expectation value".
So you deny that the "expectation value" for a test which can yield any of N possible results R1, R2, ..., RN would just be \sum_{i=1}^N R_i * P(R_i)? (where P(Ri) is the probability distribution function that gives the probability for each possible Ri) This is the definition of "expectation value" I used, and if you deny that this is true for a test with a finite set of possible results (like the measurement of spin for two entangled particles), then it is you who fails to understand the basic meaning of the term "expectation value". If you agree with this definition but think I have somehow been failing to use it in my own arguments, then you are misunderstanding something, please clarify.
billschnieder said:
To show how silly this adventitious argument of yours is, I asked you a simple question and dare you to answer it:

You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product.
It's impossible to write down the correct objective/frequentist expectation value unless we know the sample space of possible results (all possible pairs, which might include possibilities that don't appear on the list of N pairs) along with the objective probabilities of each result (which may be different from the frequency with which the result appears on your list, although you can estimate the objective probability based on the empirical frequency if N is large...it's better if you have some theory that gives precise equations for the probability like QM though).
billschnieder said:
Once you have done that, try and swindle your way out of the fact that
"Swindle", nice. You stay classy Bill!
billschnieder said:
a) The structure of the expression so derived does not depend on the actual value N. ie, N could be 5, 100, or infinity.
If you know the objective probabilities, then it doesn't even depend on the results that happen to appear on the list! But if you're just trying to estimate the true probabilities based on the frequencies on the list, then the accuracy of your estimates (as compared to the actual true probabilities) is likely to be higher the greater N is.
billschnieder said:
b) The expression so derived is a theoretical expression not "empirical".
If you are estimating the probabilities based on the frequencies on the list, then I would call this an empirical estimate of the expectation value, which may be different from the true expectation value. For example, if I know based on theory that a certain test has an 0.5 chance of giving result +1 and an 0.5 chance of giving result -1, then the expectation value is (+1)*(0.5) + (-1)*(0.5)=0. On the other hand, if I don't know the true probabilities of +1 and -1 and am just given a list of results with 51 results that are +1 and 49 results that are -1, then my estimate of the expectation value would be (+1)*(0.51) + (-1)*(0.49) = 0.02, close to the theoretically-derived expectation value of 0 but slightly off.
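The 51/49 example just given can be written out directly (a minimal sketch):

```python
# Theoretical expectation value when the true probabilities are known:
# P(+1) = P(-1) = 0.5, so E = (+1)*0.5 + (-1)*0.5 = 0 exactly.
e_theory = (+1) * 0.5 + (-1) * 0.5

# Empirical estimate from a finite list with 51 results of +1 and 49 of -1.
results = [+1] * 51 + [-1] * 49
e_estimate = sum(results) / len(results)

print(e_theory, e_estimate)  # 0.0 0.02 -- close to the theoretical value, but slightly off
```

The gap between the two numbers is the whole point: the empirical average estimates the expectation value, it does not define it.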
billschnieder said:
c) The expression so derived is the same as the simple average of the paired products.
Not if you know (or can calculate theoretically) the true probabilities of different results, and they are different from the fraction of trials with each result that appear on the list.
JesseM said:
So for example, if one run with settings (a,b) included three trials where λ took the value λ3, while another run with settings (b,c) included no trials where it took the value λ3, this wouldn't imply that ρ(λi) differed in the integrals for E(a,b) and E(b,c)? Because your comment at the end of post #1224 suggests you are still confusing the issue of what it means for the "true probabilities" ρ(λi) to differ depending on the detector settings with what it means for the actual frequencies of different values of λi to differ on runs with different detector settings.
billschnieder said:
You are sorely confused. Note I use ρ(λi) not P(λi) to signify that we are dealing with a probability distribution, which is essentially a function defined over the space of all λ, with integral over all λ equal to 1.
P(λi) is also a type of probability distribution; the only difference is that ρ(λ) is a continuous probability density function (based on the assumption that λ can take a continuous range of values) while P(λi) is a discrete probability distribution. I have in some posts made the simplifying assumption that λ can only take a finite set of possible values rather than being a continuous variable; it makes no real difference to Bell's argument which one we assume.
billschnieder said:
If the (a,b) run included N iterations with three of those corresponding to λ3, P(λ3) for our dataset = 3/N. But if in a different run of the experiment (b,c) none of the λ's was λ3, P(λ3) = 0 for our dataset. It therefore means the probability distribution of ρ(λi) can not be same for E(a,b) and E(b,c)
No, it doesn't mean that, because the ρ(λi) that appears in Bell's equations (along with the P(λi) that appears in the discrete version) is pretty clearly supposed to be an objective probability function of the frequentist type. Anyone who understands what it means to say that for a fair coin P(heads)=0.5 even if an actual series of 20 flips yielded 11 heads and 9 tails should be able to see the difference between the two.

Again, no one is asking you to agree that frequentist definitions are the "best" ones to use in ordinary situations where we are trying to come up with probability estimates from real data, but you can't really deny they are widely used in theoretical arguments involving probabilities, so you might at least consider whether Bell's arguments make sense when interpreted in frequentist terms. If you simply refuse to even talk about the frequentist notion of probability because you have such a burning hatred for it, then probably you're not really interested in trying to understand Bell's argument in its own terms (i.e., how Bell and other physicists would conceive the argument), but are just trying to make a rhetorical case against it based on showing that it becomes incoherent when we interpret the probabilities in non-frequentist terms.
billschnieder said:
According to Bell, E(a,b) calculated by the following sum

a1*b1*P(λ1) + a2*b2*P(λ2) + ... + an*bn*P(λn) where n is the total number of possible distinct lambdas.
Sure.
billschnieder said:
ρ(λ) is a function which maps a specific λi to its probability P(λi).
Huh? P(λi) is already a function that maps each specific λi to a probability. Bell just uses the Greek letter ρ to indicate he's talking about a probability density function on a variable λ which is assumed to be continuous. The "probability density" for a specific value of λ is then not an actual probability; instead, if you want to know the probability that λ fell in some finite range (say, between 0.4 and 0.5), you'd integrate the probability density function over that range, and that would give the probability. That's why Bell writes "It is a matter of indifference in the following whether λ denotes a single variable or a set, or even a set of functions, and whether the variables are discrete or continuous. However, we write as if λ were a single continuous parameter ... ρ(λ) is the probability distribution of λ". It's common in QM to use ρ to refer to a probability density, see here and here for example.
billschnieder said:
By definition therefore, if the function ρ(λ) is the same for two runs of the experiment, it must produce the same P(λi) for both cases. In other words, if it produced different values of P(λi) such as 3/N in one case and 0 in another, it means ρ(λ) is necessarily different between the two and the runs can not be used together as a valid source of terms for comparing with Bell's inequality.
Not if we are defining probabilities in a frequentist sense, and I think any physicist reading Bell's work would understand that in his theoretical proof he is indeed using the frequentist definition, so having the same probability distribution for different detector settings need not imply that the frequency of a given λi would actually be exactly the same for two finite runs with different detector settings (just like the claim that two fair coins both have P(heads)=0.5 does not imply that two runs of ten flips with each coin will each produce exactly five heads).
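The fair-coin point can be made concrete in a short Python sketch (the seed and the flip counts here are arbitrary choices of mine):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def heads_frequency(n_flips):
    """Empirical frequency of heads over n_flips of a fair coin (true P(heads) = 0.5)."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# Two short runs with the *same* underlying probability need not agree with
# each other, or with 0.5 (e.g. 11 heads vs 9 heads in runs of 20 flips):
run1 = heads_frequency(20)
run2 = heads_frequency(20)

# ...but the empirical frequency approaches the objective probability as the
# number of flips grows -- the frequentist limit:
long_run = heads_frequency(100_000)
```

The same distinction carries over to ρ(λi): identical objective distributions for two runs do not force identical finite-sample frequencies of each λi.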
billschnieder said:
JesseM said:
billschnieder said:
But practically, you could obtain the same E(a,b) by calculating a simple average over a representative set of outcomes in which the frequency of realization of a specific λ, is equivalent to it's probability. ie

For example, if we had only 3 possible λ's (λ1, λ2, λ3) with probabilities (0.3, 0.5, 0.2) respectively. The expectation value will be
E(a,b) = 0.3*A(a,λ1)*B(b,λ1) + 0.5*A(a,λ2)*B(b,λ2) + 0.2*A(a,λ3)*B(b,λ3)

Where each outcome for a specific lambda exists exactly once. OR we can calculate it using a simple average, from a dataset of 10 data points, in which A(a,λ1),B(b,λ1) was realized exactly 3 times (3/10 = 0.3), A(a,λ2), B(b,λ2) was realized 5 times, and A(a,λ3), B(b,λ3) was realized 2 times; or any other such dataset of N entries where the relative frequencies are representative of the probabilities. Practically, this is the only way available to obtain expectation values, since no experimenter has any idea what the λ's are or how many of them there are.
You really think that this is the "exact same thing" as what I was saying? Here your "practical" average requires us to know which value of λ occurred on each trial
Oh come on! At least be honest about what you claim I am saying! Why would you need to know λ for each trial if you are calculating a simple average!?
OK, I missed the bolded sentence, but I don't understand how the stuff that preceded it can possibly be consistent with the idea that the experimenter doesn't know what the λ's are. How does the experimenter know that "A(a,λ1),B(b,λ1) was realized exactly 3 times" if he has no idea whether λ1 or some other λ occurred on a given trial? How would you know whether your outcomes were "a representative set of outcomes in which the frequency of realization of a specific λ, is equivalent to it's probability" if you had no idea what the frequency was with which each specific λ was realized? Once again your explanation is totally confusing to me, and I suspect to other readers as well, but any time I misunderstand, instead of helpfully correcting me, you immediately jump down my throat and accuse me of not being "honest".

Also, what does it even mean to say that a set of outcomes is "representative" if "the frequency of realization of a specific λ, is equivalent to it's probability" when you are using a non-frequentist definition of probability? If we have a set of 3000 outcomes and we somehow know that λ1 occurred on 30 of those, are you using a definition of "probability" where that would automatically imply that the probability of λ1 given that data must be 0.01? (that's what seemed to be implied by your comment quoted at the start that 'Clearly the probability is the number of times (++) occurs in the list divided by the number of entries in the list') For a frequentist the "true" probability of λ1 could certainly be different from 0.01 since the fraction of outcomes with λ1 might approach some other value in the limit as the number of trials approached infinity, but from the way you are defining probabilities it seems like the fraction of trials where λ1 occurs is by definition said to be the "probability" of λ1, so I don't see how any set of outcomes could fail to be "representative". If you are not defining the probability of an event as just the fraction of trials in the dataset where that event occurred, please clarify your definition.
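For what it's worth, the numerical equivalence described in the quoted passage does hold when (and only when) the dataset frequencies exactly match the probabilities; a sketch with hypothetical ±1 products A(a,λi)*B(b,λi):

```python
# Hypothetical products A(a,λi)*B(b,λi) for three λ values (made-up ±1 outcomes)
products = {"λ1": +1, "λ2": -1, "λ3": +1}
probs = {"λ1": 0.3, "λ2": 0.5, "λ3": 0.2}

# Probability-weighted expectation value: Σ A(a,λ)*B(b,λ)*P(λ)
E_weighted = sum(products[l] * probs[l] for l in products)

# Simple average over a 10-point dataset whose frequencies (3/10, 5/10, 2/10)
# exactly match the probabilities (0.3, 0.5, 0.2)
dataset = [products["λ1"]] * 3 + [products["λ2"]] * 5 + [products["λ3"]] * 2
E_average = sum(dataset) / len(dataset)
```

The two calculations agree here only because the dataset was constructed to be "representative"; a real finite sample gives no such guarantee, which is the crux of the disagreement above.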

And once again, regardless of your definition, will you at least consider whether Bell's proof makes sense if the probabilities are interpreted in frequentist terms? It seems like most of your critique is based on the assumption that he is defining probabilities in terms of actual outcomes on some finite set of trials, but if he was assuming more "objective" frequentist definitions then this would be a giant strawman argument.
 
Last edited:
  • #1,250
JesseM said:
No, you can't calculate the probability just from the information provided, not if we are talking about objective frequentist probabilities rather than subjective estimates. After all, the nature of the physical process generating this list might be such that frequency of ++ in a much greater number of trials would be something other than 0.25, and according to the frequentist definition P(++) is whatever fraction of trials would yield result ++ in the limit as the number of trials went to infinity.
Who said anything about a physical process? I've given you an abstract mathematical list, and you can't bring yourself to admit that you were wrong, to the point that you are making yourself look foolish. P(++) for the list I gave you is 1/4; even a cave man can understand that level of probability theory, Jesse! Are you being serious, really?

JesseM said:
billschnieder said:
So your "law of large numbers" cop-out is an approximation of the true probability not it's definition. You need to learn some basic probability theory here because you are way off base.
Again your argument seems to involve a casual dismissal of the frequentist view of probability, when it is an extremely mainstream way of defining the notion of "probability"

Who said anything about the frequentist view? All I did was point out to you a basic mainstream fact of probability theory:

Wikipedia (http://en.wikipedia.org/wiki/Law_of_large_numbers):
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

So you are way off base and I am right to say that you do not understand probability theory.

So you deny that the "expectation value" for a test which can yield any of N possible results R1, R2, ..., RN would just be
1/N \sum_{i=1}^N R_i * P(R_i ) ?

(where P(R) is the probability distribution function that gives the probability for each possible Ri)

Again you are way off base. In probability theory, when using the probability of each R as a weight in calculating the expectation value, you do not divide the sum by N again. That will earn you an F grade. The correct expression should be:

\sum_{i=1}^{N} R_i * P(R_i)

For example, if N is 3 and the probabilities of R1, R2 and R3 are (0.3, 0.5, 0.2), the expectation value will be R1*0.3 + R2*0.5 + R3*0.2, NOT (R1*0.3 + R2*0.5 + R3*0.2)/3 !
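That probability-weighted sum is easy to sanity-check in Python (the result values R1–R3 here are hypothetical placeholders):

```python
# Expectation value as a probability-weighted sum: E = Σ R_i * P(R_i)
R = [2.0, -1.0, 4.0]   # hypothetical result values R1, R2, R3
P = [0.3, 0.5, 0.2]    # their probabilities (must sum to 1)

E = sum(r * p for r, p in zip(R, P))  # 2*0.3 - 1*0.5 + 4*0.2 = 0.9
# Note: no extra division by N -- the probabilities already do the weighting.
```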
 
Last edited by a moderator: