Joint probability from conditional probability?

  1. Demystifier

    Demystifier 5,368
    Science Advisor

    Hi,
    I am a quantum physicist who needs some practical help from mathematicians. :smile:

    The physical problem that I have can be reduced to the following mathematical problem:
    Assume that we have two correlated variables a and b. Assume that we know all conditional probabilities
    P(a|b), P(b|a)
    for all possible values of the variables a and b.
    What I want to know are the joint probabilities P(a,b); a priori, they are not given. I want to ask the following:
    What is the best I can conclude about P(a,b) from knowledge of P(a|b) and P(b|a)?
    Are there special cases (besides the trivial case in which a and b are independent) in which P(a,b) can be determined uniquely?
    Any further suggestions?

    Thank you in advance! :smile:
     
  3. EnumaElish

    EnumaElish 2,483
    Science Advisor
    Homework Helper

    Denoting random variables with capital letters and omitting the Prob{.} part:

    A|B = AB/B
    B|A = AB/A

    where AB is the joint probability. Since you know A|B and B|A, you have two equations in three unknowns (AB, A, B), so you need a third equation; for example A = f(B), or, without the shorthand notation, Prob{A} = f(Prob{B}). (See the numeric sketch below.)

    See also: http://en.wikipedia.org/wiki/Copula_(statistics)
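    To make the counting concrete: once a third relation is assumed, say a marginal for B, the joint follows immediately. A minimal numeric sketch (the numbers and matrix conventions here are made up purely for illustration):

    [code]
    import numpy as np

    # Hypothetical binary example.  Convention (assumed): column j of
    # P_a_given_b is the distribution P(A | B = b_j), so columns sum to 1.
    P_a_given_b = np.array([[0.9, 0.2],
                            [0.1, 0.8]])

    # The "third equation": an assumed marginal distribution for B.
    p_b = np.array([0.3, 0.7])

    # Joint: P(A = a_i, B = b_j) = P(a_i | b_j) * P(b_j)
    joint = P_a_given_b * p_b           # broadcasts over columns
    print(joint.sum())                  # 1.0, as it must be

    # The other conditional, P(B | A), is then implied by the joint:
    P_b_given_a = joint / joint.sum(axis=1, keepdims=True)
    print(P_b_given_a)
    [/code]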
     
  4. Well, since

    [tex]p(a,b) = p(b|a) p(a) = p(a|b) p(b)[/tex]
    where
    [tex]p(a) = \int p(a,b)db[/tex]
    [tex]p(b) = \int p(a,b)da[/tex]

    we have that

    [tex]\frac{p(a)}{p(b)} = \frac{p(a|b)}{p(b|a)}[/tex]

    [itex]p(a) / p(b)[/itex] is, by construction, the product of one purely [itex]a[/itex]-dependent factor, [itex]p(a)[/itex], and one purely [itex]b[/itex]-dependent factor, [itex]1/p(b)[/itex], so it must be possible to rewrite [itex]p(a|b) / p(b|a)[/itex] as such a product too. Once that is done, [itex]p(a)[/itex] and [itex]1/p(b)[/itex] can be read off from the two factors. (The factorization is only unique up to a constant, but requiring each density to integrate to one fixes it.)

    If [itex]p(a|b) / p(b|a)[/itex] is not separable into one [itex]a[/itex]-dependent and one [itex]b[/itex]-dependent factor, then there is an inconsistency between [itex]p(a|b)[/itex] and [itex]p(b|a)[/itex], and they cannot be conditional probability distributions from the same joint distribution.
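    In the discrete case, the whole procedure takes only a few lines of code. Here is a minimal sketch (the matrix conventions are mine, and it assumes both conditionals are strictly positive):

    [code]
    import numpy as np

    # Assumed conventions:
    #   P_a_given_b[i, j] = p(a_i | b_j)   -> each column sums to 1
    #   P_b_given_a[i, j] = p(b_j | a_i)   -> each row sums to 1

    def joint_from_conditionals(P_a_given_b, P_b_given_a):
        # r[i, j] = p(a_i | b_j) / p(b_j | a_i) = p(a_i) / p(b_j),
        # so r must factor as an outer product (rank one).
        r = P_a_given_b / P_b_given_a
        u, v = r[:, 0], r[0, :] / r[0, 0]
        if not np.allclose(r, np.outer(u, v)):
            raise ValueError("conditionals inconsistent: ratio not separable")
        p_a = u / u.sum()               # any column of r is proportional to p(a)
        p_b = (1 / v) / (1 / v).sum()   # reciprocal of any row gives p(b)
        return P_a_given_b * p_b        # p(a_i, b_j) = p(a_i | b_j) p(b_j)

    # Quick self-test on a made-up joint:
    joint = np.array([[0.10, 0.30],
                      [0.40, 0.20]])
    P_a_given_b = joint / joint.sum(axis=0)                 # p(a|b)
    P_b_given_a = joint / joint.sum(axis=1, keepdims=True)  # p(b|a)
    print(np.allclose(joint_from_conditionals(P_a_given_b, P_b_given_a), joint))
    [/code]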

    Good luck,

    -Emanuel
     
  5. I don't think that will always work, winterfors. For one thing, what happens if one of the denominators is 0?

    Consider the example where a is a Gaussian variable with mean zero and unit variance, and b is exactly equal to a. The marginal density of b is, naturally, also a unit Gaussian, and the joint density is degenerate (it's like a scalar Gaussian concentrated on the diagonal of the (a,b)-plane). The conditional density is then a point mass, [itex]p(a|b) = \delta(a-b)[/itex], and vice versa. The ratio of the conditional densities is therefore 1 when a=b, and undefined otherwise. This is enough for us to see that a and b are actually the same variable, and so of course have the same marginal, but it doesn't give us any idea what that marginal is. I.e., the whole thing would work out exactly the same if a were given a different marginal distribution.
     
  6. You're absolutely right.

    There are situations where [itex]p(a)/p(b)[/itex] is undefined because both [itex]p(a|b)[/itex] and [itex]p(b|a)[/itex] are zero. In that case, there is no way of deducing a joint distribution without additional information. The expression for [itex]p(a|b)/p(b|a)[/itex] may also simply be too complicated to separate easily into an [itex]a[/itex]-dependent and a [itex]b[/itex]-dependent factor.

    An even more common problem is that [itex]p(a|b)[/itex] and [itex]p(b|a)[/itex] may be derived from different sources, and it may in such cases be incorrect to view them as conditionals of the same joint distribution [itex]p(a,b)[/itex].

    -Emanuel
     
    Isn't it also a problem if just one of the conditional distributions assigns zero probability to some region where the corresponding marginal has nonzero probability? I.e., you're trying to infer something about the marginal from a condition that eliminates all information about it. And, after all, division by zero is undefined...

    But I think the approach should be fine, in principle, if you add the restriction that all of the distributions in question are nonzero on the support of the pertinent random variables (which is probably the case most people are interested in). It may still be impractical to actually work out the expressions for the distributions (and there may be no closed-form expression, as they will typically require normalization), but it should all be well-defined... For exponential-family distributions, my intuition is that this should always work out nicely, because the exponents play nicely with the ratio (it turns into an additive separation of functions of a and b, instead of a multiplicative one). Also note that exponential-family distributions tend to fulfill the nonzero requirement up front, except for a few boundary cases (which should be degenerate anyway, if my intuition holds...)
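    As a quick check of that intuition, take the simplest exponential-family example: a and b jointly Gaussian with zero means, unit variances, and correlation [itex]\rho[/itex], so that [itex]p(a|b)[/itex] is Gaussian with mean [itex]\rho b[/itex] and variance [itex]1-\rho^2[/itex], and symmetrically for [itex]p(b|a)[/itex]. Then

    [tex]\log\frac{p(a|b)}{p(b|a)} = \frac{(b-\rho a)^2 - (a-\rho b)^2}{2(1-\rho^2)} = \frac{b^2 - a^2}{2},[/tex]

    which is exactly [itex]\log p(a) - \log p(b)[/itex] for standard normal marginals (the normalization constants cancel because both conditionals have the same variance), so the marginals can be read off directly.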
     
  8. statdad

    statdad 1,478
    Homework Helper

    "An even more common problem is that [tex] p(a \mid b) [/tex] and [tex] p(b \mid a) [/tex] may be derived from different sources, and it may in such cases be incorrect to view them as conditionals of the same joint distribution."

    I'm not sure what you mean by this.
     
If just one of [itex]p(a)[/itex] and [itex]p(b)[/itex] is zero, one can just invert both sides of

    [tex]\frac{p(a)}{p(b)} = \frac{p(a|b)}{p(b|a)}[/tex]

    and get a well-defined equation.
     
Uhm, it gets a bit technical in terms of what information sources have been used to construct each of the two conditional probability distributions. The short answer is that if they come from completely different sources, one has to assume a marginal distribution for each of them separately, and these cannot be derived from the other conditional distribution.

    One could do like this: Let's call our two conditional distributions [tex] q(b | a) [/tex] and [tex] r(a | b) [/tex].

    We can construct two separate joint distributions by using two non-informative priors [tex] q(a) [/tex] and [tex] r(b) [/tex] :

    [tex] q(a,b) = q(b | a)q(a) [/tex]
    [tex] r(a,b) = r(a | b)r(b) [/tex]

    These can then be combined into a third, joint probability distribution

    [tex] p(a,b) = K \frac{q(a,b) r(a,b)}{\mu(a)\mu(b)} [/tex] ,

    where [itex]\mu(a)[/itex] and [itex]\mu(b)[/itex] are homogeneous probability densities and [itex]K[/itex] is a normalization constant

    [tex] \frac{1}{K} = \int \int \frac{q(a,b) r(a,b)}{\mu(a)\mu(b)} da db [/tex]
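
    For what it's worth, on a discrete grid this combination takes only a few lines. A minimal sketch, under conventions I am assuming here, taking the homogeneous densities [itex]\mu[/itex] to be uniform on the grid so that they only affect the overall constant:

    [code]
    import numpy as np

    # Assumed conventions:
    #   q_ba[i, j] = q(b_j | a_i)  (rows sum to 1),    q_a[i] = q(a_i)
    #   r_ab[i, j] = r(a_i | b_j)  (columns sum to 1), r_b[j] = r(b_j)

    def combine(q_ba, q_a, r_ab, r_b):
        q_joint = q_a[:, None] * q_ba   # q(a, b) = q(b | a) q(a)
        r_joint = r_ab * r_b[None, :]   # r(a, b) = r(a | b) r(b)
        p = q_joint * r_joint           # conjunction; uniform mu drops out
        return p / p.sum()              # the sum plays the role of 1/K
    [/code]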

    I'm not sure this makes things any clearer for you; if you're interested in this kind of combination of probability distributions, you can have a look at the book "Inverse Problem Theory and Methods for Model Parameter Estimation" by Albert Tarantola, around pages 13 and 32.
    It's available for download at

    http://www.ipgp.jussieu.fr/~tarantola/Files/Professional/Books/index.html

    Cheers,

    -Emanuel
     
For a particular a or b, sure, but we need the expression to hold for all a and b in the support of the marginals in order for the approach to work, don't we? Perhaps it would still work to do the factorization in a piecewise manner and then stitch the results back together? This would work as long as there is no region where both conditionals are zero but the marginals are not... although, on reflection, that can happen wherever the joint itself vanishes on a region, even when both conditionals come from the same joint distribution. However, I'm not sure the stitching is possible anyway, since you wouldn't know how to normalize each of the pieces. But perhaps it can all be worked out... I will think on it a bit more...

    Regardless of the prospects for a piecewise solution, a sufficient condition is that at least one of the conditionals never equals zero in regions where the marginal is positive. Then, you put that conditional in the denominator and proceed. If both conditionals have (disjoint) zero regions, then neither choice of denominator works for the entire support of the marginals.
     
  12. statdad

    statdad 1,478
    Homework Helper

    Thank you winterfors - I understand your calculations (although I've never seen [itex]\mu[/itex] used to represent a density rather than a distribution measure - merely a point of notation), and I am actually aware of their basis. I am guilty of one of two things:
    * Either taking away an incomplete understanding of the OP's question, or
    * Not noticing that you were referring to a more general situation than the one under discussion
     