# Bell's theorem in nLab

I'm trying to follow this mathematical explanation of Bell's theorem.
The problem I find is with the assumption of a probability density for the hidden variable. That implies (and my question is: am I wrong? why?) that you can expect the same distribution of such a variable for every repetition of the experiment (measurement). Now that seems inconsistent (imho) with the very nature of the variable, which is supposed to represent an unknown law of physics: an expression whose values are clearly a function of, at least, time and space. I mean, I could just as well imagine that distribution as a Dirac delta on a predetermined result, but a function of the angle of measurement and different for each measurement (since each one happens at a different point in space-time...). I fail to see how a single, unit-integral density for the hidden variable is realistic.
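For reference, the local hidden-variable model in Bell's argument writes the correlation of two spin measurements along directions $a$ and $b$ as an average over a hidden variable $\lambda$ with a single fixed, normalized density $\rho$, which is exactly the assumption being questioned here:

```latex
E(a,b) \;=\; \int A(a,\lambda)\, B(b,\lambda)\, \rho(\lambda)\, d\lambda ,
\qquad
\int \rho(\lambda)\, d\lambda = 1 ,
\qquad
A(a,\lambda),\, B(b,\lambda) \in \{-1,+1\} .
```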

chiro
Hey giulio_hep.

If the distribution is a function of some parameter and that parameter is constant, then the distribution will remain constant too.

In statistics, what happens is that we look at probabilities of something that remains constant under variation - like a mean, a median, a proportion, or something else with a numerical structure of some sort.

The distribution describes the probability structure: the chance that the quantity takes a specific value, or falls in an interval of values, depending on the measure. You can have discrete distributions with finite or infinite support (the Poisson distribution is an example with infinite support), continuous distributions (over some subset of the real numbers), or even stochastic processes (as in stochastic calculus).

We assume the probabilities don't change - that they are static - usually as they would be if you had complete or near-complete information, and that is why the distribution stays the same.

We have to make assumptions about the constraints on the probabilities, since there are many ways of defining them, but the assumption that the underlying distribution stays the same is made because it corresponds to a situation of complete information, which does not change.

We can measure it through data, make assumptions, or mix both forms of information to get at that underlying constant distribution that relates to some parameter. The distribution represents all possible values the parameter can take.
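As a small sketch of that idea - estimating a fixed underlying distribution from repeated measurements - here is a simulation; the three-outcome distribution is an arbitrary choice for illustration:

```python
import random

random.seed(1)

# A fixed "true" distribution over three outcomes (arbitrary illustrative numbers).
true_dist = {"a": 0.5, "b": 0.3, "c": 0.2}
outcomes = list(true_dist)
weights = [true_dist[o] for o in outcomes]

# Repeated measurements drawn from the same unchanging distribution.
samples = random.choices(outcomes, weights=weights, k=50_000)

# The empirical frequencies recover the underlying constant distribution.
empirical = {o: samples.count(o) / len(samples) for o in outcomes}
print(empirical)
```

If the distribution drifted between repetitions, as giulio_hep suggests, the empirical frequencies would estimate only a time-averaged mixture rather than any single underlying density.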

How much information density you need to accurately gauge the probabilities in the first place is another question entirely, and it involves understanding the nature of information and its association with what you are measuring. Correlated data points don't increase information density like independent ones: as things become more collinear and correlated, information density grows more slowly and can even stop growing if you aren't getting anything new.

What statisticians do is look for models with reasonable assumptions, combine that information with actual sample data, and use both to try to make estimates and inferences below some uncertainty threshold in the context of the information used. Scientists and engineers do the same thing, but in ways more suitable to their profession and professional goals.

giulio_hep
My question is strictly related to Bell's theorem: I'm discussing the mathematical proof of the theorem, and in particular the way local realism is formalized.

> Correlated data points don't increase information density like independent ones.

My assertion is that this disproves the theorem, because the constraint that the density integrates to 1 can be relaxed. In other words, I don't see how local hidden variables would be ruled out if their underlying distribution doesn't stay the same over time... From a mathematical standpoint, it looks to me like Bell's inequality doesn't account for time dependence of the hidden variables.

chiro
What distribution and parameters are involved? Are you assuming that the distribution you use reflects probabilities across all information or are you assuming something different?

Basically, in statistics it's assumed that the chosen distribution represents complete information. In combination with convergence theorems (like the strong law of large numbers), we assume that, as data become available, the estimates of different statistics, expectations, and probabilities will converge to what are considered the invariant parameters - which can be functions of expectations, or even the probabilities themselves.
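A minimal sketch of that convergence, using an arbitrary Bernoulli parameter as the invariant quantity:

```python
import random

random.seed(0)

# True parameter of a fixed Bernoulli distribution (arbitrary illustrative value).
p = 0.3

# As i.i.d. data accumulate, the sample mean converges to the invariant
# parameter (strong law of large numbers).
n = 100_000
samples = [1 if random.random() < p else 0 for _ in range(n)]
sample_mean = sum(samples) / n
print(sample_mean)
```

The convergence guarantee is exactly what depends on the distribution being the same across repetitions - the point under dispute in this thread.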

I'm not a physicist or scientist but I took enough statistics to know the statistical stuff.

Ehm... one should read the link from nLab and try to follow the proof of the theorem. I'll make one last attempt to explain my objection from a purely statistical point of view, but I can't guarantee that some relevant details won't be missing.

Basically, they assume that the sum of the probabilities of finding the spin (just think of it as either 1 or -1) along 3 axes is 1, because if you did *all* the corresponding measurements, that is what you would find. Then they compute the correlation of each *one* of those single values with another, associated (entangled) spin. From this (skipping many details here for brevity - read the linked nLab page) a contradiction between the two theories follows...

In very simple words, I think there is a mathematical error: you can correlate events from a real sampling, but you can't extend the data with other fictitious probabilities (the ones along a direction that they can't actually measure, because the first measurement destroys the entanglement). It looks to me like a sort of overfitting of the model that leads to a wrong conclusion. Thanks anyway, but now I suspect this topic is more appropriate for the physics forum.

chiro
Basically, what I gather from reading the proof is that they partition the information in the system between the observables and the "hidden variables". If information exists in the hidden variables, it will show up in the correlation component: when information is not present in local observation, you will have a weak correlation with respect to your hidden variable, since a component of the information in the system exists independently within that variable.

I actually think you might be better off using a Bayesian approach, or an information-theoretic approach with either Shannon entropy or Fisher information, and looking at it from the point of view of information.

By the chain rule of entropy you have H(X and Y) = H(Y) + H(X|Y) = H(X) + H(Y|X), where H(X|Y) represents the conditional entropy of X given Y. Note that in a state of perfect correlation H(X|Y) = 0, so H(X and Y) = H(X) = H(Y), while in a complete state of independence you have H(X|Y) = H(X) and H(Y|X) = H(Y), so H(X and Y) = H(X) + H(Y).

Thus you can set up the bounds

H(X) ≤ H(X and Y) ≤ H(X) + H(Y) and
H(Y) ≤ H(X and Y) ≤ H(X) + H(Y), which together give
max(H(X), H(Y)) ≤ H(X and Y) ≤ H(X) + H(Y)

Correlation pushes the joint entropy toward the lower end of that range, while independence pushes it toward the upper end.
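A minimal sketch checking these bounds on two toy joint distributions, one perfectly correlated and one independent (the numbers are illustrative, not quantum predictions):

```python
from math import log2

def H(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a joint over (x, y) pairs."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

# Perfectly correlated pair (X always equals Y) and an independent pair
# with the same uniform marginals.
corr = {(0, 0): 0.5, (1, 1): 0.5}
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

for joint in (corr, indep):
    px, py = marginals(joint)
    # max(H(X), H(Y)) <= H(X and Y) <= H(X) + H(Y)
    assert max(H(px), H(py)) <= H(joint) + 1e-12
    assert H(joint) <= H(px) + H(py) + 1e-12

# Joint entropy sits at the lower end when correlated, the upper end when independent.
print(H(corr), H(indep))
```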

You can extend this to any number of variables by using first principles - including the three used in the example.

I use entropies in place of probabilities because they are general: provided you only have a finite number of states, they are defined regardless of the distribution, and they give a universal way of describing information density regardless of alphabet or distribution.

I would recommend setting up a three-variable entropy bound by looking at the related quantities H(X and Y and Z), H(X), H(Y), H(Z), H(X|Y,Z), H(Y|X,Z) and H(Z|X,Y).

You don't have to assume any distribution and you should get bounds that either confirm or deny the premise of Bell's inequality. It could be a good exercise.
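A sketch of the three-variable computation, using a toy distribution over three ±1 spins in which the third is deterministically the product of the first two (purely an illustrative choice, not the quantum-mechanical prediction):

```python
from math import log2
from itertools import product

def H(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a joint over tuples, keeping only the axes listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Toy joint over three spins (x, y, z) with z = x * y, uniform over the rest.
joint = {}
for x, y in product((-1, 1), repeat=2):
    joint[(x, y, x * y)] = 0.25

Hxyz = H(joint)                       # H(X and Y and Z)
Hx = H(marginal(joint, (0,)))         # H(X)
Hyz = H(marginal(joint, (1, 2)))      # H(Y and Z)
# Conditional entropy via the chain rule: H(X|Y,Z) = H(X and Y and Z) - H(Y and Z).
Hx_given_yz = Hxyz - Hyz
# Here X is fully determined by Y and Z, so H(X|Y,Z) comes out to 0.
print(Hxyz, Hx, Hx_given_yz)
```

The same helpers work for any finite joint distribution, so the distribution above can be swapped for whatever model of the three spin axes one wants to test.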

giulio_hep
Let me cross-reference a nice answer from another forum that states the same things I wrote here in other words: Bell's realism is fake because it is not contextual, i.e. the hidden variable is not allowed to have a state.
