wle said:
The derivations I've seen use Bayes' theorem here, which implies $$P(A, B| a, b, \lambda) = P(A | a, b, \lambda) P(B | A, a, b, \lambda) \,.$$
I think we're talking about slightly different things. The factoring that you are talking about is always true, for any ##\lambda##. Reichenbach's Principle claims that there always must be some choice of ##\lambda## such that the factoring looks like this:
$$P(A, B| a, b, \lambda) = P(A | a, b, \lambda) P(B | a, b, \lambda)$$
(no ##A## in the second term).
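The distinction can be checked numerically. Here is a minimal Python sketch, using a made-up joint distribution over two binary outcomes: the chain-rule factoring ##P(A,B) = P(A)\,P(B|A)## holds identically, by definition, while the independence-style factoring ##P(A,B) = P(A)\,P(B)## is a substantive claim that this distribution violates:

```python
# Made-up joint distribution over two binary outcomes A, B.
# P[(a, b)] = probability that A = a and B = b.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def marginal_A(a):
    return sum(p for (x, y), p in P.items() if x == a)

def marginal_B(b):
    return sum(p for (x, y), p in P.items() if y == b)

def cond_B_given_A(b, a):
    # Definition of conditional probability: P(B|A) = P(A,B) / P(A).
    return P[(a, b)] / marginal_A(a)

for a in (0, 1):
    for b in (0, 1):
        # Chain rule: holds identically, for ANY joint distribution.
        assert abs(P[(a, b)] - marginal_A(a) * cond_B_given_A(b, a)) < 1e-12
        # Independence: a real constraint, and false here
        # (e.g. P(0,0) = 0.4 but P(A=0) P(B=0) = 0.25).
        print(a, b, P[(a, b)], marginal_A(a) * marginal_B(b))
```

The chain-rule identity survives no matter what numbers go in the table; only the second, stronger factoring says anything about the world.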
The distinction is illustrated by my example with twins playing basketball. I have some way to randomly pick a pair of identical twins out of the population. Let ##P(A)## be the probability that the first twin plays basketball. Let ##P(B)## be the probability that the second twin plays basketball. Let ##P(A,B)## be the probability that they both play basketball. Most likely, twins are alike in their basketball playing abilities, or at least more alike than any two random people. So ##P(A,B) \neq P(A) P(B)##.
Without understanding anything at all about basketball playing ability, you can, using pure logic, write:
##P(A, B) = P(A) P(B | A)##
That's basically true by definition of conditional probability. So that kind of factoring isn't actually telling us anything about the root causes of basketball ability. On the other hand, let's suppose that we identify a bunch of factors that might come into play. Let ##\lambda_1## be genetics, let ##\lambda_2## be the schools they attend, let ##\lambda_3## be some characterization of their homelife (do they have siblings, do they have two parents, are the parents rich, etc.). Then some collection of such parameters would be a causal explanation of the correlation if:
##P(A, B | \overrightarrow{\lambda}) = P(A | \overrightarrow{\lambda}) P(B | \overrightarrow{\lambda})##
Reichenbach's principle, as formalized as factorability of probabilities, says that if you knew enough about the causes of basketball-playing ability, then it should no longer be necessary to know whether twin ##A## plays basketball to accurately predict whether twin ##B## plays basketball.
(Actually, I realize this example doesn't quite work, because the mere fact that one twin plays basketball might influence the other twin. We can account for this by saying that at some point, before the twins ever play basketball for the first time, we separate the twins and separately try them out on different sports, and decide independently which sport they like the best.)
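A toy simulation makes the point concrete. The model below is entirely made up: a single shared factor ##\lambda## (think "athletic genetics") is fixed before the twins are separated, and given ##\lambda## each twin independently ends up playing basketball or not. Marginally the twins are correlated, but conditioned on ##\lambda## the correlation disappears, which is exactly the Reichenbach-style factoring:

```python
import random

random.seed(0)

def sample_pair():
    # Hypothetical shared cause: lam is set once per pair of twins.
    lam = random.random() < 0.5
    p = 0.8 if lam else 0.2          # made-up numbers
    A = random.random() < p          # first twin plays basketball?
    B = random.random() < p          # second twin plays basketball?
    return lam, A, B

N = 200_000
pairs = [sample_pair() for _ in range(N)]

p_A = sum(A for _, A, _ in pairs) / N
p_B = sum(B for _, _, B in pairs) / N
p_AB = sum(A and B for _, A, B in pairs) / N

# Marginally correlated: P(A,B) != P(A) P(B)
# (analytically 0.34 vs 0.25 for these numbers).
print(p_AB, p_A * p_B)

# Conditioned on lam, the correlation vanishes:
# P(A,B | lam) = P(A | lam) P(B | lam) within each lam-group.
for val in (True, False):
    grp = [(A, B) for lam, A, B in pairs if lam == val]
    n = len(grp)
    pa = sum(A for A, _ in grp) / n
    pb = sum(B for _, B in grp) / n
    pab = sum(A and B for A, B in grp) / n
    print(val, pab, pa * pb)
```

Knowing ##\lambda## screens off one twin's outcome from the other: once you condition on the shared cause, learning ##A## tells you nothing further about ##B##.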
It's only AFTER you have factored the probability distribution, by coming up with a complete set of causal factors, that you can apply locality considerations. If you haven't already factored it, then imposing locality is a mistake. We can show this with the basketball players.
We take a pair of twins to distant locations and measure their basketball-playing ability. Then, using pure logic, we write:
##P(A,B) = P(A) P(B | A)##
Suppose we then reason: the basketball tests are far apart, so one twin playing basketball cannot influence the other twin's ability, and so we assume that
##P(B | A) = P(B)##
But that assumption is FALSE. Even though the measurements of ##A## and ##B## are far apart, that doesn't mean they are uncorrelated, and therefore it doesn't mean that ##P(B | A) = P(B)##.
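The same kind of toy numbers show the failure directly. In the made-up shared-cause model below, nothing at one site influences the other: each outcome depends only on a common factor ##\lambda## fixed before separation. Yet ##P(B|A) \neq P(B)##, so equating them because the sites are distant is a mistake:

```python
import random

random.seed(1)

# Made-up shared-cause model: lam is fixed before the twins are
# separated; each site's outcome depends only on lam, never on the
# other site's outcome.
N = 200_000
count_A = count_B = count_A_and_B = 0
for _ in range(N):
    lam = random.random() < 0.5
    p = 0.8 if lam else 0.2
    A = random.random() < p      # outcome at the first, distant site
    B = random.random() < p      # outcome at the second site
    count_A += A
    count_B += B
    count_A_and_B += A and B

p_B = count_B / N
p_B_given_A = count_A_and_B / count_A

# No influence passes between the sites, yet P(B|A) != P(B)
# (analytically 0.68 vs 0.5 for these numbers).
print(p_B_given_A, p_B)
```

Distance rules out influence, not correlation; the correlation here is carried entirely by the common cause, and only conditioning on that cause makes ##P(B|A)## collapse to ##P(B)##.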