# Joint probability from conditional probability?

by Demystifier
Tags: conditional, joint, probability
 Sci Advisor P: 4,630 Hi, I am a quantum physicist who needs practical help from mathematicians. The physical problem I have can be reduced to the following mathematical one: Assume we have two correlated variables a and b, and that we know the conditional probabilities P(a|b) and P(b|a) for all possible values of a and b. What I want to know are the joint probabilities P(a,b); a priori, they are not given. My questions are the following: What is the best I can conclude about P(a,b) from knowledge of P(a|b) and P(b|a)? Are there special cases (other than the trivial case in which a and b are independent) in which P(a,b) can be determined uniquely? Any further suggestions? Thank you in advance!
 Sci Advisor HW Helper P: 2,481 Denoting random variables with capital letters and omitting the Prob{.} part: A|B = AB/B B|A = AB/A where AB is the joint probability. Since you know A|B and B|A, you have 2 equations in 3 unknowns (AB, A, B); you need a 3rd equation; for example: A = f(B). Or, without the shorthand notation, Prob{A} = f(Prob{B}). See also: http://en.wikipedia.org/wiki/Copula_(statistics)
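The underdetermination can be made concrete with a minimal numeric sketch (the values of P(A|B), P(B|A), and P(A) below are hypothetical, chosen only for illustration): fixing P(A) picks out one member of a one-parameter family of joints, all consistent with both given conditionals.

```python
# One-parameter family of solutions for single events A, B:
# two equations (the conditionals) in three unknowns (AB, A, B).
x = 0.6  # hypothetical value of P(A|B)
y = 0.3  # hypothetical value of P(B|A)

for p_a in (0.2, 0.4):        # P(A) is a free parameter
    p_ab = y * p_a            # P(A,B) = P(B|A) * P(A)
    p_b = p_ab / x            # P(B)   = P(A,B) / P(A|B)
    # Both conditionals are reproduced exactly, whatever P(A) we chose:
    assert abs(p_ab / p_b - x) < 1e-12   # P(A|B) check
    assert abs(p_ab / p_a - y) < 1e-12   # P(B|A) check
```

Both choices of P(A) yield valid probabilities here, so the conditionals alone cannot pin down the joint; a third relation such as Prob{A} = f(Prob{B}) is needed.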
 P: 71 Well, since $$p(a,b) = p(b|a) p(a) = p(a|b) p(b)$$ where $$p(a) = \int p(a,b)db$$ $$p(b) = \int p(a,b)da$$ we have that $$\frac{p(a)}{p(b)} = \frac{p(a|b)}{p(b|a)}$$ Since $p(a)/p(b)$ is obviously the product of an $a$-dependent factor $p(a)$ and a $b$-dependent factor $1/p(b)$, the ratio $p(a|b)/p(b|a)$ must admit the same factorization. Once it is factored, it is trivial to identify $p(a)$ and $1/p(b)$ from that expression. If $p(a|b)/p(b|a)$ is not separable into an $a$-dependent and a $b$-dependent factor, then there is an inconsistency between $p(a|b)$ and $p(b|a)$, and they cannot be conditional distributions of the same joint distribution. Good luck, -Emanuel
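For discrete variables, this separation recipe can be carried out mechanically: every column of the elementwise ratio p(a|b)/p(b|a) is proportional to p(a). A self-contained sketch (the 3x3 joint is randomly generated, purely for illustration; it is strictly positive, so no zero-division issues arise):

```python
import numpy as np

# Build a random strictly positive joint, derive its conditionals,
# then recover the joint from the conditionals alone.
rng = np.random.default_rng(0)
joint = rng.random((3, 3))      # rows index a, columns index b
joint /= joint.sum()

p_a = joint.sum(axis=1)              # marginal p(a)
p_b = joint.sum(axis=0)              # marginal p(b)
P_a_given_b = joint / p_b            # column j holds p(a | b_j)
P_b_given_a = joint / p_a[:, None]   # row i holds p(b | a_i)

# Elementwise, p(a|b)/p(b|a) = p(a)/p(b), so any single column
# of the ratio is proportional to p(a); normalize it.
ratio = P_a_given_b / P_b_given_a
p_a_rec = ratio[:, 0] / ratio[:, 0].sum()

# Reassemble the joint: p(a,b) = p(b|a) p(a)
joint_rec = P_b_given_a * p_a_rec[:, None]
```

Here `joint_rec` matches `joint` up to floating-point error, confirming the recovery.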
 P: 270 Joint probability from conditional probability? I don't think that will always work, winterfors. For one thing, what happens if one of the denominators is 0? Consider the example where a is a Gaussian variable with mean zero and unit variance, and b is exactly equal to a. The marginal density of b is, naturally, also a unit Gaussian, and the joint density is degenerate (it is like a scalar Gaussian concentrated on the diagonal of the (a,b)-plane). The conditional density is then $P(a|b) = 1_b(a)$, and vice versa. The ratio of the conditional densities is therefore 1 when a = b, and undefined otherwise. That is enough to see that a and b are actually the same variable, and so have the same marginal, but it gives us no idea what that marginal is. I.e., the whole thing would work out exactly the same if a were given a different marginal distribution.
P: 71
 Quote by quadraphonics I don't think that will always work
You're absolutely right.

There are situations where $p(a)/p(b)$ is undefined because both $p(a|b)$ and $p(b|a)$ are zero. In that case, there is no way to deduce a joint distribution without additional information. The expression $p(a|b)/p(b|a)$ may also simply be too complicated to separate easily into an $a$-dependent and a $b$-dependent factor.

An even more common problem is that $p(a|b)$ and $p(b|a)$ may be derived from different sources, and it may in such cases be incorrect to view them as conditionals of the same joint distribution $p(a,b)$.

-Emanuel
 P: 270 Isn't it also a problem if just one of the conditional distributions assigns zero probability to some region where the corresponding marginal has nonzero probability? I.e., you would be trying to infer something about the marginal from a condition that eliminates all information about it. And, after all, division by zero is undefined... But I think the approach should be fine, in principle, if you add the restriction that all of the distributions in question are nonzero on the support of the pertinent random variables (which is probably the case most people are interested in). It may still be impractical to actually work out the expressions for the distributions (and there may be no closed-form expression, as they will typically require normalization), but it should all be well defined... For exponential-family distributions, my intuition is that this should always work out nicely, due to the exponents playing nicely with the ratio (i.e., it turns into a linear separation of functions of a and b, instead of a ratio separation). Also note that exponential-family distributions tend to fulfill the nonzero requirement up front, except for a few boundary cases (which should be degenerate anyway, if my intuition holds...)
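The exponential-family intuition can be checked in a worked special case (a standard bivariate Gaussian with correlation $\rho$, $|\rho| < 1$; this example is mine, not from the posts above). The conditionals are

$$p(a|b) \propto \exp\left(-\frac{(a-\rho b)^2}{2(1-\rho^2)}\right), \qquad p(b|a) \propto \exp\left(-\frac{(b-\rho a)^2}{2(1-\rho^2)}\right)$$

so, the shared normalization constants cancelling,

$$\log\frac{p(a|b)}{p(b|a)} = \frac{(b-\rho a)^2 - (a-\rho b)^2}{2(1-\rho^2)} = \frac{(1-\rho^2)(b^2 - a^2)}{2(1-\rho^2)} = -\frac{a^2}{2} - \left(-\frac{b^2}{2}\right)$$

The ratio separates additively in the log, exactly as predicted, and exponentiating identifies $p(a) \propto e^{-a^2/2}$ and $p(b) \propto e^{-b^2/2}$: the standard normal marginals, as expected.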
 HW Helper P: 1,377 "An even more common problem is that $$p(a \mid b)$$ and $$p(b \mid a)$$ may be derived from different sources, and it may in such cases be incorrect to view them as conditionals of the same joint distribution." I'm not sure what you mean by this.
P: 71
 Quote by quadraphonics Isn't it also a problem if just one of the conditional distributions assigns zero probability to some region where the corresponding marginal has nonzero probability?
If just one of $$p(a|b)$$ and $$p(b|a)$$ is zero, one can simply invert both sides of

$$\frac{p(a)}{p(b)} = \frac{p(a|b)}{p(b|a)}$$

and get a well-defined equation.
P: 71
 Quote by statdad "An even more common problem is that $$p(a \mid b)$$ and $$p(b \mid a)$$ may be derived from different sources, and it may in such cases be incorrect to view them as conditionals of the same joint distribution." I'm not sure what you mean by this.
Uhm, it gets a bit technical in terms of what information sources have been used to construct each of the two conditional probability distributions. The short answer is that if they come from completely different sources, one has to assume a marginal distribution for each of them separately, and these cannot be derived from the other conditional distribution.

One could proceed like this: let's call our two conditional distributions $$q(b | a)$$ and $$r(a | b)$$.

We can construct two separate joint distributions by using two non-informative priors $$q(a)$$ and $$r(b)$$ :

$$q(a,b) = q(b | a)q(a)$$
$$r(a,b) = r(a | b)r(b)$$

These can then be combined into a third, joint probability distribution

$$p(a,b) = K \frac{q(a,b) r(a,b)}{\mu(a)\mu(b)}$$ ,

where $\mu(a)$ and $\mu(b)$ are homogeneous probability densities and $$K$$ is a normalization constant

$$\frac{1}{K} = \int \int \frac{q(a,b) r(a,b)}{\mu(a)\mu(b)} da db$$
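On a discrete grid, this conjunction of the two joint distributions can be sketched as follows (the 4x4 distributions q and r are randomly generated placeholders standing in for the two independently sourced joints, and the homogeneous densities $\mu$ are taken uniform, which is their natural discrete analogue):

```python
import numpy as np

rng = np.random.default_rng(1)
q = rng.random((4, 4)); q /= q.sum()   # q(a,b) built from one source
r = rng.random((4, 4)); r /= r.sum()   # r(a,b) built from the other source

mu_a = np.full(4, 0.25)  # homogeneous density over a (uniform here)
mu_b = np.full(4, 0.25)  # homogeneous density over b (uniform here)

# p(a,b) = K * q(a,b) r(a,b) / (mu(a) mu(b)),
# with K fixed by normalization over the whole grid.
unnorm = q * r / np.outer(mu_a, mu_b)
p = unnorm / unnorm.sum()
```

The division by `unnorm.sum()` plays the role of the constant K; the result `p` is a proper joint distribution (nonnegative, summing to 1).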

 I'm not sure this makes things any clearer for you; if you're interested in this kind of combination of probability distributions, you can have a look at the book "Inverse Problem Theory and Methods for Model Parameter Estimation" by Albert Tarantola, around pages 13 and 32.
 HW Helper P: 1,377 Thank you winterfors - I understand your calculations (although I've never seen $$\mu$$ used to represent a density rather than a distribution measure - merely a matter of notation), and I am actually aware of their basis. I am guilty of one of two things: * either taking away an incomplete understanding of the o.p.'s question, or * noting that you were referring to a more general situation than the one in the current discussion.