San K said:
Question: why do hidden variables need to imply a linear variation?
The short answer is that, in the presence of the hypotheses of counterfactual definiteness and locality, the linearity of the laws of probability leads to the linearity of the Bell inequality.
But let me spell out the logic in greater detail. The example I'll discuss comes from http://quantumtantra.com/bell2.html. We start with the experimental prediction of quantum mechanics that when you send two entangled photons into polarizers that are oriented at the same angle, the photons do identical things: they either both go through or they both don't. If you believe in local hidden variables, then you can conclude from this that right when the two photons were created, when they were presumably in the same place (that's the locality assumption), they decided in advance which polarizer orientations they should go through and which ones they shouldn't. So they basically have a list of "good angles" and "bad angles". If, for instance, one of the photons encounters a polarizer oriented at 15 degrees, it will check whether 15 degrees is good or bad, and if it's good then it will go through. If x is the angle at which a polarizer is oriented, let's say P(x)=1 if x is a good angle, and P(x)=0 if x is a bad angle.
Now Bell's theorem is concerned with the probability that the two photons behave differently if the polarizers are turned to different orientations. But since, as we said, the photons are just deciding to go through or not go through based on a previously agreed upon decision about which angles are good and bad, all we're talking about is the probability that P(θ1)≠P(θ2), where θ1 is the angle of the first polarizer and θ2 is the angle of the second polarizer. In the Herbert proof I linked to, the specific case we're talking about is the probability that P(-30)≠P(30), i.e. the probability that if you turn one polarizer to -30 degrees and the other one to 30 degrees, you get a mismatch.
Now under what conditions is the statement P(-30)≠P(30) true? Well, it can only be true if either P(-30)≠P(0) OR P(0)≠P(30) (because if both of these were false we would have P(-30)=P(0)=P(30)). The word "OR" is the crucial part, because one of the basic rules of probability is that the probability of A OR B is less than or equal to the probability of A plus the probability of B. So the probability that P(-30)≠P(30) is less than or equal to the probability that P(-30)≠P(0) plus the probability that P(0)≠P(30) - and Bingo, we've derived a Bell inequality! And note the crucial role counterfactual definiteness played in the proof: we are assuming it makes sense to talk about P(0), even though we only measured P(-30) and P(30). In other words, the assumption is that measurements we did not make still have well-defined answers as to what would have happened if we had made them.
Does that make sense? The form of Bell's inequality, C≤A+B, fundamentally comes from the fact that probabilities are (sub)additive.
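If it helps to see the argument concretely, here is a minimal Python sketch of it. It lists every possible "good angle / bad angle" assignment for the three angles -30, 0 and 30, draws random probability distributions over those assignments, and checks that the mismatch probability for (-30, 30) never exceeds the sum of the mismatch probabilities for (-30, 0) and (0, 30). All the names (TABLES, mismatch_prob, bell_holds, qm_mismatch) are made up for this illustration. The last function is my own addition and an assumption beyond what I wrote above: it uses the standard quantum prediction for this kind of entangled photon pair, a mismatch probability of sin² of the angle difference, just to show that it does not obey the same bound.

```python
import math
import random
from itertools import product

ANGLES = (-30, 0, 30)

# Every possible hidden-variable list for the three angles:
# each angle is either good (1) or bad (0), so there are 2**3 = 8 lists.
TABLES = [dict(zip(ANGLES, bits)) for bits in product((0, 1), repeat=3)]


def random_distribution(rng):
    """Random probabilities (summing to 1) over the 8 hidden-variable lists."""
    raw = [rng.random() for _ in TABLES]
    total = sum(raw)
    return [w / total for w in raw]


def mismatch_prob(weights, a, b):
    """Probability that P(a) != P(b) under the given distribution."""
    return sum(w for table, w in zip(TABLES, weights) if table[a] != table[b])


def bell_holds(weights, tol=1e-12):
    """Check Prob[P(-30)!=P(30)] <= Prob[P(-30)!=P(0)] + Prob[P(0)!=P(30)]."""
    lhs = mismatch_prob(weights, -30, 30)
    rhs = mismatch_prob(weights, -30, 0) + mismatch_prob(weights, 0, 30)
    return lhs <= rhs + tol  # small tolerance for floating-point rounding


def qm_mismatch(a, b):
    """Assumed quantum prediction: mismatch probability sin^2(a - b), angles in degrees."""
    return math.sin(math.radians(a - b)) ** 2


rng = random.Random(0)
all_ok = all(bell_holds(random_distribution(rng)) for _ in range(10_000))
print("inequality held for every sampled hidden-variable model:", all_ok)

qm_lhs = qm_mismatch(-30, 30)
qm_rhs = qm_mismatch(-30, 0) + qm_mismatch(0, 30)
print("quantum prediction respects the inequality:", qm_lhs <= qm_rhs)
```

The random sampling is only a sanity check, not a proof. The real reason the inequality holds is that it is true for each of the 8 lists individually, so it survives any weighted average of them.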