I'm not sure why you think there's any assumption about the product of two different measurement axes equaling one. There isn't. And it's strange for you to call yourself a realist when you made an argument about how measurements that never actually happen aren't well defined.
Maybe it will help if I break down Bell's steps to reach equation 14(b) in more detail than his paper does.
First, we assume there is some hidden variable ##\lambda## that determines the measurement outcomes for both Alice and Bob, no matter what direction ##v## each of them measures along. By experiment, we know each measurement result must always be +1 or -1:
$$\forall v: A_\lambda(v) \in \{-1, +1\}$$ $$\forall v: B_\lambda(v) \in \{-1, +1\}$$
Furthermore, by experiment, we know that when Alice and Bob measure along the same direction, their outcomes are always opposite:
$$\forall v: A_\lambda(v) = -B_\lambda(v)$$
If Alice measures along ##a## and Bob measures along ##b##, and they then multiply their results together, they get the parity measurement result ##A_\lambda(a) \cdot B_\lambda(b)##, which will also be either -1 or +1. We assume the observed expectation value ##P## of this parity (its average over many runs, so a correlation rather than a probability) is determined by some hidden, but consistent across experiments, probability distribution ##p## of ##\lambda##:
$$\forall a, b: P(a, b) = \sum_\lambda p(\lambda) \cdot A_\lambda(a) \cdot B_\lambda(b)$$
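If it helps to see these assumptions as something concrete, here's a small Python sketch. The specific model (##\lambda## a uniformly distributed angle, with the outcome given by the sign of ##\cos(\lambda - v)##) is just something I made up for illustration; the argument only depends on the ±1 outcomes, the anti-correlation, and the fixed distribution ##p(\lambda)##:

```python
import numpy as np

# Toy local hidden-variable model, purely to make the assumptions concrete.
# lambda is an angle drawn from a fixed distribution p (uniform here), and
# each outcome is deterministically +1 or -1 given lambda and the direction.
lambdas = np.linspace(0, 2 * np.pi, 10_000, endpoint=False)
p = np.full(len(lambdas), 1 / len(lambdas))  # uniform p(lambda)

def A(lam, v):
    # Alice's outcome: the sign of cos(lambda - v), always +1 or -1.
    return np.where(np.cos(lam - v) >= 0, 1, -1)

def B(lam, v):
    # Bob's outcome is always the opposite of Alice's for the same direction.
    return -A(lam, v)

def P(a, b):
    # Expected parity: sum over lambda of p(lambda) * A_lambda(a) * B_lambda(b).
    return np.sum(p * A(lambdas, a) * B(lambdas, b))

print(P(0.0, 0.0))        # -1.0: same direction, perfectly anti-correlated
print(P(0.0, np.pi / 2))  # ~0.0: perpendicular directions, uncorrelated
```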
**NO MORE ASSUMPTIONS ARE INTRODUCED BEYOND THIS POINT. JUST THE MATH OF SUMS.**
Using the fact that ##A## is opposite to ##B##, we can rewrite the above equation in terms of just ##A##:
$$\forall a, b: P(a, b) = -\sum_\lambda p(\lambda) \cdot A_\lambda(a) \cdot A_\lambda(b)$$
For compactness, I'm going to shorten ##A_\lambda(x)## to just ##x_\lambda## for the various symbols ##x##. The compact version of the above equation is:
$$\forall a, b: P(a, b) = -\sum_\lambda p(\lambda) a_\lambda b_\lambda$$
Now consider what happens when we compute the difference between the predicted correlations for two pairs of measurement settings:
$$P(a, b) - P(a, c)$$
We expand the definition inline:
$$\forall a, b, c: P(a, b) - P(a, c) = \left(-\sum_\lambda p(\lambda) a_\lambda b_\lambda\right) - \left(-\sum_\lambda p(\lambda) a_\lambda c_\lambda\right)$$
Because the two sums are over the same set, and addition is associative and commutative, we can merge the sums:
$$\forall a, b, c: P(a, b) - P(a, c) = -\sum_\lambda \big(p(\lambda) a_\lambda b_\lambda - p(\lambda) a_\lambda c_\lambda\big)$$
We factor out ##p(\lambda) a_\lambda## and flip the subtraction to cancel out the leading negation:
$$\forall a, b, c: P(a, b) - P(a, c) = \sum_\lambda p(\lambda) a_\lambda \left(c_\lambda - b_\lambda\right)$$
Now, because ##b_\lambda## is either -1 or +1, we can multiply by ##b_\lambda^2=1## without changing the computed result:
$$\forall a, b, c: P(a, b) - P(a, c) = \sum_\lambda p(\lambda) a_\lambda b_\lambda^2 \left(c_\lambda - b_\lambda\right)$$
We keep one ##b_\lambda## outside, and distribute the other one over the subtraction:
$$\forall a, b, c: P(a, b) - P(a, c) = \sum_\lambda p(\lambda) a_\lambda b_\lambda \left(b_\lambda c_\lambda - b_\lambda b_\lambda\right)$$
Again, we know that ##b_\lambda b_\lambda = 1##, so we can simplify:
$$\forall a, b, c: P(a, b) - P(a, c) = \sum_\lambda p(\lambda) a_\lambda b_\lambda \left(b_\lambda c_\lambda - 1\right)$$
This last equation is the one you were saying we couldn't reach without assuming that ##A_\lambda(x) \cdot A_\lambda(y) = 1## for ##x \neq y##. But notice that I never made that assumption. I only ever assumed that ##A_\lambda(x)^2 = 1##.
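Since every step from the definition to that last equation is pure algebra over sums, the identity holds for *any* assignment of ±1 values and *any* distribution ##p##, however contrived. Here's a quick numerical sanity check (random assignments, not any particular physical model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Completely arbitrary model: random +/-1 outcome assignments for each of
# n hidden-variable values, and a random normalized distribution p(lambda).
n = 1000
a = rng.choice([-1, 1], size=n)  # a_lambda
b = rng.choice([-1, 1], size=n)  # b_lambda
c = rng.choice([-1, 1], size=n)  # c_lambda
p = rng.random(n)
p /= p.sum()

lhs = -np.sum(p * a * b) + np.sum(p * a * c)  # P(a,b) - P(a,c), from the definition
rhs = np.sum(p * a * b * (b * c - 1))         # the final rearranged form

print(np.isclose(lhs, rhs))  # True on every run, because the steps are identities
```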
It's true that, in practice, you would measure this difference experimentally by doing many runs for each pair of settings and estimating each correlation separately. But that doesn't change the fact that the math should still give the right answer: if the system really were described by a probability distribution over a hidden variable, we could estimate the difference by sampling each correlation on its own and then subtracting.
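In sketch form (reusing the same made-up toy model from above), that sampling procedure looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def A(lam, v):
    # Same toy model as before: deterministic +/-1 outcome given lambda and v.
    return np.where(np.cos(lam - v) >= 0, 1, -1)

def estimate_P(a, b, runs=100_000):
    # One batch of experiments: draw lambda fresh from p (uniform here) each
    # run, record the parity, and average -- just like a real experiment.
    lam = rng.uniform(0, 2 * np.pi, size=runs)
    return np.mean(A(lam, a) * -A(lam, b))  # B_lambda(b) = -A_lambda(b)

a, b, c = 0.0, 1.0, 2.0
# Estimate each correlation from a *separate* batch of runs, then subtract.
# The result converges to the exact P(a,b) - P(a,c) as the number of runs grows.
print(estimate_P(a, b) - estimate_P(a, c))
```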