billschnieder said:
According to you, the wikipedia article is wrong.
No, just that it was failing to adequately distinguish between two notions of the "mean" which could lead to certain readers (you) becoming confused. There weren't any statements that were clearly incorrect.
billschnieder said:
Why don't you correct it?
Your wish is my command. I have edited the opening section of the article to more clearly distinguish between the "sample mean" and the "population mean", and make clear that the expected value is equal to the population mean, not the sample mean:
For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers x1, x2, ..., xn is typically denoted by \bar{x}, pronounced "x bar". This mean is a type of arithmetic mean. If the data set was based on a series of observations obtained by sampling a statistical population, this mean is termed the "sample mean" to distinguish it from the "population mean". The mean is often quoted along with the standard deviation: the mean describes the central location of the data, and the standard deviation describes the spread. An alternative measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less sensitive to outliers, but less mathematically tractable.
If a series of observations is sampled from a larger population (measuring the heights of a sample of adults drawn from the entire world population, for example), or from a probability distribution which gives the probabilities of each possible result, then the larger population or probability distribution can be used to construct a "population mean", which is also the expected value for a sample drawn from this population or probability distribution. For a finite population, this would simply be the arithmetic mean of the given property for every member of the population. For a probability distribution, this would be a sum or integral over every possible value weighted by the probability of that value. It is a universal convention to represent the population mean by the symbol μ.[1] In the case of a discrete probability distribution, the mean of a discrete random variable x is given by taking the product of each possible value of x and its probability P(x), and then adding all these products together, giving \mu = \sum x P(x).[2]
The sample mean may differ from the population mean, especially for small samples, but the law of large numbers dictates that the larger the sample, the more likely it is that the sample mean will be close to the population mean.[3]
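The distinction above can be sketched numerically. A minimal example, assuming a fair six-sided die (all numbers below are illustrative): the population mean comes straight from \mu = \sum x P(x), while the sample mean is just the average of simulated rolls, which converges toward \mu as the sample grows.

```python
import random

# Population mean (expected value) of a fair six-sided die:
# mu = sum over x of x * P(x) = (1+2+3+4+5+6)/6 = 3.5
mu = sum(x * (1 / 6) for x in range(1, 7))

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Average of n simulated die rolls -- the sample mean."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

small = sample_mean(10)       # may stray far from mu, just by chance
large = sample_mean(100_000)  # very likely close to mu (law of large numbers)
```

The small sample can land well away from 3.5; the large one almost never does, which is exactly the law-of-large-numbers point being made here.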
As an experiment, let's now see if anyone edits it on the ground that it's incorrect (as opposed to edits for stylistic or other reasons). No fair editing it yourself!
billschnieder said:
It is obvious you are the one who is way off base and you know it.
So, you wish to completely ignore the quotes from various statistics texts I provided? You trust a user-edited site like wikipedia over published texts? Here they are again:
JesseM said:
(edit: See for example this book, which distinguishes the 'sample mean' \bar X from the 'population mean' \mu, and says the sample mean 'may, or may not, be an accurate estimation of the true population mean \mu. Estimates from small samples are especially likely to be inaccurate, simply by chance.' You might also look at this book, which says 'We use \mu, the symbol for the mean of a probability distribution, for the population mean', or this book, which says 'The mean of a discrete probability distribution is simply a weighted average (discussed in Chapter 4) calculated using the following formula: \mu = \sum_{i=1}^n x_i P[x_i ]').
billschnieder said:
All the grandstanding is just a way to stay afloat, not a serious argument against the well accepted meaning of expectation value.
Wikipedia: http://en.wikipedia.org/wiki/Mean
Wikipedia: http://en.wikipedia.org/wiki/Expected_value
Neither of these sources claim that the expected value is equal to the "sample mean" (i.e. the average of the results obtained on a series of trials), which is what I thought you were claiming when you said:
billschnieder said:
You are given a theoretical list of N pairs of real-valued numbers x and y. Write down the mathematical expression for the expectation value for the paired product.
...
Wow! The correct answer is <xy>
Of course if the "theoretical list" is supposed to represent a population rather than results from a series of trials, and we assume we are picking randomly from the population using a method that has an equal probability of returning any member of the list, then in that case I would agree the answer is <xy>. But once again your statement of the problem didn't provide enough information, because the list could equally well be interpreted as a sample, and in that case the expectation value for the paired product would not necessarily be equal to <xy>, since <xy> would just be the sample mean--do you disagree?
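The two readings of the list can be sketched side by side. A toy illustration with made-up numbers (the pairs below are arbitrary, not from any post in this thread): if the list is a population sampled with equal probability per member, <xy> is the expectation value; if the list is itself a sample, its average of x*y is only a sample mean.

```python
# Purely illustrative numbers: a "population" of (x, y) pairs.
# If a pair is drawn with equal probability for each member, the
# expectation value of the paired product is the population average
# of x*y, i.e. <xy>.
population = [(1.0, 2.0), (-1.0, 3.0), (2.0, -1.0), (0.5, 4.0)]

population_xy = sum(x * y for x, y in population) / len(population)

# The same kind of list read as a *sample* gives only a sample mean,
# which need not equal the population expectation -- e.g. a "sample"
# consisting of just the first two pairs:
sample = population[:2]
sample_xy = sum(x * y for x, y in sample) / len(sample)
```

Here the two averages disagree, which is the whole point: whether <xy> is "the" expectation value depends on whether the list is a population or a sample.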
JesseM said:
Again, you said nothing about "randomly picking" from a list, you just gave a list itself and asked for the probabilities of one entry on that list.
billschnieder said:
Yes, that is exactly what I did, and you answered that it was impossible to do because you wanted to use ONLY a probability approach that involved "trials".
No, I didn't; I just said that not enough information was provided. If you specify that the list is intended to be a population and that we are picking randomly from it, that's fine with me, as I already told you at the end of post #1277.
billschnieder said:
You do the same thing for dice and coins, and you have done the same thing in your famous scratch-lotto examples
In the scratch lotto example I explicitly specified that on each trial the experimenters were picking a box at random to scratch, and at some point I bet I even pedantically specified that "at random" means "equal probability of any of the three boxes". With coins and dice it's generally an implicit assumption that each result is equally probable unless the coin/die is specified to be weighted or something.
JesseM said:
Well, excuse me for thinking your question was supposed to have some relation to the topic we were discussing, namely Bell's theorem.
billschnieder said:
While discussing Bell's INEQUALITIES, Not Bell's theorem which we haven't discussed at all
Bell's theorem just says that Bell's inequalities must be obeyed in any local hidden variables theory, and since QM theoretically predicts they will be violated in some circumstances, QM is theoretically incompatible with local hidden variables. Anyway, if you want to be pedantic, we're discussing the entirety of Bell's derivation of the inequalities, and whether an analysis of the derivation implies that the inequality is only applicable under some limited circumstances (like it only being applicable to data where it is possible to "resort" in the manner you suggested). My claim is that the correct interpretation of the probabilities in Bell's derivation is that they were meant to be "limit frequentist" probabilities, and that if you look at the derivation with this interpretation in mind it all makes sense, and it shows the final inequalities do not have the sort of limited applicability you claim.
billschnieder said:
and continue to claim that Bell's equation (2) is not a standard mathematical definition for the expectation value of a paired product.
Nope, it's not. The standard mathematical definition for the expectation value of some variable x (whether it is obtained by taking a product of two other random variables A and B, or in some other way) is just a sum or integral over all possible values of x weighted by their probabilities or probability densities, i.e. either \mu = \sum_{i=1}^N x_i P(x_i) or \mu = \int x \rho(x) \, dx. You can see that this standard expression for the expectation value involves no variables besides x itself. Now depending on the nature of the specific situation we are considering, it may be that functions like P(x) or ρ(x) can themselves be shown to be equal to some functions of other variables, and this is exactly where Bell's equation (2) comes from. Here, I'll give a derivation:
If x is the product of the two measurement results A and B with detector settings a and b, then according to what I said above the "standard form" for the expectation value should be \mu = \sum_{i=1}^N x_i P(x_i), and since we know that this is an expectation value for a certain pair of detector angles a and b, and that the two measurement results A and B are themselves always equal to +1 or -1, this can be rewritten as:
(+1)*P(x=+1|a,b) + (-1)*P(x=-1|a,b) = (+1)*[P(A=+1, B=+1|a,b) + P(A=-1, B=-1|a,b)] + (-1)*[P(A=+1, B=-1|a,b) + P(A=-1, B=+1|a,b)]
Then in that last expression, each term like P(A=+1, B=+1|a,b) can be rewritten as P(A=+1, B=+1, a, b)/P(a,b). So by marginalization (and assuming for convenience that λ is discrete rather than continuous), we have:
P(A=+1, B=+1|a,b) = \sum_{i=1}^N \frac{P(A=+1, B=+1, a, b, \lambda_i )}{P(a,b)}
And P(A=+1, B=+1, a, b, λi) = P(A=+1, B=+1|a, b, λi)*P(a, b, λi) = P(A=+1, B=+1|a, b, λi)*P(λi | a, b)*P(a,b), so substituting into the above sum gives:
P(A=+1, B=+1|a,b) = \sum_{i=1}^N P(A=+1, B=+1 | a, b, \lambda_i )*P(\lambda_i | a, b)
And if we make the physical assumption that P(λi | a, b) = P(λi) (the no-conspiracy assumption, which says the probability of different values of the hidden variables is independent of the detector settings), this reduces to:
P(A=+1, B=+1|a,b) = \sum_{i=1}^N P(A=+1, B=+1 | a, b, \lambda_i )*P(\lambda_i )
Earlier I showed that the expectation value, written in its standard form, could be shown in this scenario to be equal to the expression
(+1)*[P(A=+1, B=+1|a,b) + P(A=-1, B=-1|a,b)] + (-1)*[P(A=+1, B=-1|a,b) + P(A=-1, B=+1|a,b)]
So, we can rewrite that as
(+1)*[ \sum_{i=1}^N P(A=+1, B=+1 | a, b, \lambda_i )*P(\lambda_i ) + \sum_{i=1}^N P(A=-1, B=-1 | a, b, \lambda_i )*P(\lambda_i )]+ (-1)*[ \sum_{i=1}^N P(A=+1, B=-1 | a, b, \lambda_i )*P(\lambda_i ) + \sum_{i=1}^N P(A=-1, B=+1 | a, b, \lambda_i )*P(\lambda_i )]
Or as a single sum:
\sum_{i=1}^N P(\lambda_i ) * [(+1*+1)*P(A=+1, B=+1|a,b,\lambda_i ) + (-1*-1)*P(A=-1, B=-1|a,b,\lambda_i ) + (+1*-1)*P(A=+1, B=-1|a,b,\lambda_i ) + (-1*+1)*P(A=-1, B=+1|a,b,\lambda_i )]
And naturally, if the value of a along with the specific choice of λi completely determines the value of A, and likewise the value of b along with the specific choice of λi completely determines the value of B (another physical assumption), then for any given i in the sum above, three of the conditional probabilities will be 0 and the other will be 1, so it's not hard to see (tell me if you want this step explained further) why the above can be reduced to:
\sum_{i=1}^N A(a,\lambda_i ) B(b, \lambda_i ) P(\lambda_i )
...which is just the discrete form of Bell's equation (2). So, hopefully you require no further proof that although Bell's equation (2) gives one form of the expectation value, it was not meant to contradict the idea that the expectation value can also be written in the standard form:
(+1)*P(product of A and B is +1) + (-1)*P(product of A and B is -1)
...which given the knowledge that both A and B are always either +1 or -1, and A is the result for the detector with setting a while B is the result for the detector with setting b, can be written as:
E(a,b) = (+1*+1)*P(detector with setting a gets result +1, detector with setting b gets result +1) + (+1*-1)*P(detector with setting a gets result +1, detector with setting b gets result -1) + (-1*+1)*P(detector with setting a gets result -1, detector with setting b gets result +1) + (-1*-1)*P(detector with setting a gets result -1, detector with setting b gets result -1)
...which is the equation I have been bringing up over and over. Last time I brought it up, you responded in post #1275 with:
False! The above equation does not appear in Bell's work and is not the expectation value he is calculating in equation (2).
Hopefully the above derivation shows you why Bell's equation (2) is entirely consistent with the above "standard form" of the expectation value, given the physical assumptions he was making. If you still don't agree, please show me the specific step in my derivation that you think is incorrect.
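The derivation above can also be checked numerically. A minimal sketch, assuming a toy deterministic local model: the outcome functions A(a, λ) and B(b, λ) and the distribution P(λ) below are entirely made up, just to verify the algebra that the "standard form" of the expectation value (a sum over joint outcomes weighted by their probabilities) equals the discrete form of Bell's equation (2).

```python
import math

# Hypothetical deterministic local model (purely illustrative):
lambdas = [0.0, 1.0, 2.0, 3.0]
P_lambda = {l: 0.25 for l in lambdas}  # no-conspiracy: independent of a, b

def A(a, l):
    """Outcome +1/-1 at detector with setting a, given hidden variable l."""
    return +1 if math.cos(a + l) >= 0 else -1

def B(b, l):
    """Outcome +1/-1 at detector with setting b, given hidden variable l."""
    return +1 if math.sin(b + l) >= 0 else -1

a, b = 0.3, 1.1  # arbitrary detector settings

# Discrete form of Bell's equation (2): sum_i A(a,l_i) B(b,l_i) P(l_i)
E_bell = sum(A(a, l) * B(b, l) * P_lambda[l] for l in lambdas)

# "Standard form": sum over the four joint outcomes of (product)*(probability),
# where P(A=alpha, B=beta | a,b) is obtained by marginalizing over lambda.
E_standard = 0.0
for alpha in (+1, -1):
    for beta in (+1, -1):
        p = sum(P_lambda[l] for l in lambdas
                if A(a, l) == alpha and B(b, l) == beta)
        E_standard += alpha * beta * p

assert abs(E_bell - E_standard) < 1e-12
```

Since the model is deterministic given λ, each λi contributes its weight P(λi) to exactly one of the four joint outcomes, which is why the two forms agree term by term.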
billschnieder said:
Oh so now you are saying if given a population from which you can easily calculate relative frequencies, you will still not be able to use your favorite "limit frequentist" approach to obtain estimates of true probabilities because the process used to sample the population might not be fair. Wow! You have really outdone yourself. If the "limit frequentist" approach is this useless, how come you stick to it, if not just for argumentation purposes?
It's useful in theoretical proofs involving probabilities, such as the derivation of the conclusion that Bell's inequality should apply to the "limit frequentist" expectation values in any local realist universe. And for experimental data, as long as the sample size is large we can use empirical frequencies to estimate a range for the limit frequentist probabilities with any desired degree of confidence, even though we can never be 100% confident that the true limit frequentist probability lies in that range (but that's just science for you: you can never be 100% sure of any claim based on empirical evidence, even though you can be very, very confident).
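A minimal sketch of the "estimate a range" idea, using the textbook normal-approximation (Wald) interval; the counts below are placeholders, and the 1.96 multiplier corresponds to roughly 95% confidence:

```python
import math

def wald_interval(k, n, z=1.96):
    """Normal-approximation confidence interval for an unknown
    'limit frequentist' probability p, from k successes in n trials.
    z = 1.96 gives roughly 95% confidence."""
    p_hat = k / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (max(0.0, p_hat - half), min(1.0, p_hat + half))

# Illustrative data: 480 successes in 1000 trials.
lo, hi = wald_interval(480, 1000)
```

Widening z (or shrinking it) trades interval width against confidence level, which is the precise sense in which "any desired degree of confidence" is available without ever reaching 100%.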