
How to prove Bayes' rule for probability measures?

  1. Aug 16, 2011 #1
    Consider a probability space [itex](\Theta, \Sigma_\Theta, P_\Theta)[/itex], where [itex]P_\Theta[/itex] is a probability measure on the sigma-algebra [itex]\Sigma_\Theta[/itex].

    Each element [itex]x \in \Theta[/itex] maps onto another probability measure [itex]P_{\Omega | x}[/itex], on a sigma-algebra [itex]\Sigma_\Omega[/itex] on another space [itex]\Omega[/itex].

    In this situation, one should (as far as I can see) be able to write a measure-theoretic generalization of Bayes' rule

    [tex]{P_{\Theta |y}}(A) = \int\limits_{x \in A} {\frac{{d{P_{\Omega |x}}}}{{d{P_\Omega }}}(y)d{P_\Theta }} [/tex]

    for any [itex]A \in \Sigma_\Theta[/itex], given an observation [itex]y \in \Omega[/itex], where

    [tex]{P_\Omega } = \int\limits_{x \in \Theta } {{P_{\Omega |x}}d{P_\Theta }} [/tex]

    and [itex]{d{P_{\Omega |x}}/d{P_\Omega }}[/itex] is the Radon–Nikodym derivative of [itex]P_{\Omega |x}[/itex] with respect to [itex]P_{\Omega}[/itex].

    The problem is that I cannot see how to prove it (I'm sure the proof is fairly simple). Does anyone want to help?
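
    As a sanity check of the proposed formula, here is a minimal sketch on a finite space, where the Radon–Nikodym derivative reduces to a ratio of point masses. The sets, the numbers, and the function names are illustrative assumptions, not part of the question.

```python
from fractions import Fraction as F

# Hypothetical finite example: Theta = {0, 1}, Omega = {"a", "b", "c"}.
prior = {0: F(1, 4), 1: F(3, 4)}                 # P_Theta
likelihood = {                                   # P_{Omega|x}, one measure per x
    0: {"a": F(1, 2), "b": F(1, 2), "c": F(0)},
    1: {"a": F(1, 6), "b": F(1, 3), "c": F(1, 2)},
}

def marginal(y):
    """P_Omega({y}) = sum over x of P_{Omega|x}({y}) * P_Theta({x})."""
    return sum(likelihood[x][y] * prior[x] for x in prior)

def rn_derivative(x, y):
    """dP_{Omega|x}/dP_Omega at y; assumes marginal(y) > 0, as in this example."""
    return likelihood[x][y] / marginal(y)

def posterior(A, y):
    """P_{Theta|y}(A) = sum over x in A of (dP_{Omega|x}/dP_Omega)(y) * P_Theta({x})."""
    return sum(rn_derivative(x, y) * prior[x] for x in A)

for y in ("a", "b", "c"):
    print(y, posterior({0}, y), posterior({1}, y))
```

    In this toy case the posterior sums to one for every observation, and averaging it over [itex]P_\Omega[/itex] gives back the prior, which is the property a measure-theoretic proof would have to establish in general.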
  3. Aug 16, 2011 #2
  4. Aug 16, 2011 #3
    As far as I can see, it's not quite what I'm looking for. What I'm trying to do above is to reformulate Bayes' rule for probability densities, usually expressed

    [tex]p(x|y) = \frac{{p(y|x)}}{{p(y)}}p(x)[/tex]

    which follows trivially from the definition of a joint probability density [itex]p(x,y) = p(y|x) p(x)[/itex]. But for probability measures, it gets slightly trickier...
  5. Aug 17, 2011 #4

    The rule for probability densities follows from Bayes' Rule and the Law of Total Probabilities:

    [itex] f_X(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{\int_{-\infty}^{\infty} f_Y(y|X=\xi )\,f_X(\xi )\,d\xi }\! [/itex].

    Do you want to reformulate this?

    EDIT: Any reformulation will need to take into account that the application of Bayes' Rule must be expressed as a posterior probability which is defined in terms of a probability space and the Law of Total Probabilities.
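
    The density formula above can be checked numerically. The sketch below assumes a hypothetical model, [itex]X \sim N(0,1)[/itex] and [itex]Y|X=x \sim N(x,1)[/itex], for which the posterior is known in closed form to be [itex]N(y/2, 1/2)[/itex]; the denominator integral is approximated with the trapezoidal rule on a truncated range.

```python
import math

def normal_pdf(z, mean, var):
    """Density of N(mean, var) at z."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def f_X(x):
    """Prior density (assumed): X ~ N(0, 1)."""
    return normal_pdf(x, 0.0, 1.0)

def f_Y_given_X(y, x):
    """Conditional density (assumed): Y | X = x ~ N(x, 1)."""
    return normal_pdf(y, x, 1.0)

def f_Y(y, lo=-12.0, hi=12.0, n=4800):
    """Law of total probability: trapezoidal approximation of the integral over xi."""
    h = (hi - lo) / n
    total = 0.5 * (f_Y_given_X(y, lo) * f_X(lo) + f_Y_given_X(y, hi) * f_X(hi))
    for i in range(1, n):
        xi = lo + i * h
        total += f_Y_given_X(y, xi) * f_X(xi)
    return total * h

def posterior_pdf(x, y):
    """f_X(x | Y = y) computed from the formula in the post."""
    return f_Y_given_X(y, x) * f_X(x) / f_Y(y)

# Compare against the closed-form posterior N(y/2, 1/2) at one point.
print(posterior_pdf(0.3, 1.0), normal_pdf(0.3, 0.5, 0.5))
```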
    Last edited: Aug 17, 2011
  6. Aug 18, 2011 #5
    What is this mapping? How exactly is [itex]P_{\Omega | x}[/itex] defined?
  7. Aug 18, 2011 #6
    I only meant that for every [itex]x \in \Theta[/itex] there is one probability measure [itex]P_{\Omega | x}[/itex] on [itex]\Sigma_\Omega[/itex] over the space [itex]\Omega[/itex]. The probability measures [itex]P_{\Omega | x}[/itex] are thus conditional on [itex]x[/itex].

    This allows us to define a joint probability measure on the (Cartesian) product space [itex](\Theta \times \Omega ,{\Sigma _\Theta } \times {\Sigma _\Omega })[/itex]:
    [tex]{P_{\Theta \times \Omega }}(C) \equiv \int\limits_{x\in A} {{P_{\Omega |x}}({B_x})d{P_\Theta }} [/tex] for any [itex]C \in {\Sigma _\Theta } \times {\Sigma _\Omega }[/itex], where [itex]A \in {\Sigma _\Theta }[/itex] and [itex]{B_x} \in {\Sigma _\Omega }[/itex] are defined as [itex]A = \{ x:(x,y) \in C \text{ for some } y\} [/itex] and [itex]{B_x} = \{ y:(x,y) \in C\} [/itex].
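
    On a finite space this definition can be spelled out directly. The following is a minimal sketch with made-up numbers; `prior`, `kernel`, `section` and `joint` are hypothetical names, and the integral becomes a sum over the sections [itex]B_x[/itex].

```python
from fractions import Fraction as F

prior = {0: F(1, 4), 1: F(3, 4)}          # hypothetical P_Theta on Theta = {0, 1}
kernel = {                                # hypothetical P_{Omega|x} on Omega = {"a", "b"}
    0: {"a": F(1, 2), "b": F(1, 2)},
    1: {"a": F(1, 3), "b": F(2, 3)},
}

def section(C, x):
    """B_x = {y : (x, y) in C}; empty when x is outside the projection A."""
    return {y for (x2, y) in C if x2 == x}

def joint(C):
    """P_{Theta x Omega}(C) = sum over x of P_{Omega|x}(B_x) * P_Theta({x})."""
    return sum(
        sum(kernel[x][y] for y in section(C, x)) * prior[x]
        for x in prior
    )

# C need not be a rectangle A x B.
print(joint({(0, "a"), (1, "b")}))
```

    Summing over all of [itex]\Theta[/itex] rather than only over [itex]A[/itex] gives the same value, because [itex]B_x[/itex] is empty for every [itex]x \notin A[/itex].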

    If one could prove that [tex]{P_{\Omega |x}} = \frac{{d{P_{\Theta \times \Omega }}}}{{d{P_\Theta }}}(x)[/tex] where [itex]d{P_{\Theta \times \Omega }}/d{P_\Theta }[/itex] is the Radon–Nikodym derivative of [itex]{P_{\Theta \times \Omega }}[/itex] with respect to [itex]{P_\Theta }[/itex], then the theorem above (that I want to prove), i.e.
    [tex]{P_{\Theta |y}}(A) = \int\limits_{x\in A} {\frac{{d{P_{\Omega |x}}}}{{d{P_\Omega }}}(y)d{P_\Theta }} [/tex] would follow by symmetry, but I'm not sure how to do that...
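
    One standard way to obtain the formula is sketched below. It rests on assumptions not stated in the thread: [itex]x \mapsto P_{\Omega|x}(B)[/itex] is measurable for each [itex]B \in \Sigma_\Omega[/itex], [itex]P_{\Omega|x}[/itex] is absolutely continuous with respect to [itex]P_\Omega[/itex] for [itex]P_\Theta[/itex]-almost every [itex]x[/itex], and the density [itex]f(x,y)[/itex] introduced here is jointly measurable. It is a sketch under those assumptions, not a complete proof.

```latex
% Notation (introduced here): f(x,y) = \frac{dP_{\Omega|x}}{dP_\Omega}(y).
% For A \in \Sigma_\Theta and B \in \Sigma_\Omega:
\begin{align*}
P_{\Theta\times\Omega}(A\times B)
  &= \int_A P_{\Omega|x}(B)\,dP_\Theta(x) \\
  &= \int_A \int_B f(x,y)\,dP_\Omega(y)\,dP_\Theta(x) \\
  &= \int_B \Bigl( \int_A f(x,y)\,dP_\Theta(x) \Bigr)\,dP_\Omega(y)
  && \text{(Tonelli, since } f \ge 0\text{)}.
\end{align*}
% A posterior P_{\Theta|y} is characterised by
%   P_{\Theta\times\Omega}(A\times B) = \int_B P_{\Theta|y}(A)\,dP_\Omega(y)
% for all B \in \Sigma_\Omega, so comparing the two expressions gives,
% for P_\Omega-almost every y,
\[
P_{\Theta|y}(A) = \int_A \frac{dP_{\Omega|x}}{dP_\Omega}(y)\,dP_\Theta(x).
\]
```

    Without the absolute continuity assumption the derivative in the formula need not exist, for example when each [itex]P_{\Omega|x}[/itex] is a point mass at [itex]x[/itex] and [itex]P_\Omega[/itex] is Lebesgue measure on [itex][0,1][/itex].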
    Last edited: Aug 18, 2011
  8. Aug 18, 2011 #7
    Perhaps I've misunderstood you, but... [itex]f[/itex] would be a Radon-Nikodym derivative if
    [tex]P_{\Theta\times\Omega}(A) = \int_A f \, dP_\Theta[/tex]
    for all [itex]A \in \Sigma_\Theta\times\Sigma_\Omega[/itex] (product sigma-field). However this doesn't make any sense, because you have different measurable spaces on the left and right hand sides. Actually, the right hand side does not mean anything.
  9. Aug 18, 2011 #8
    In the equation you refer to, the integration is not over a subset [itex]A \in \Sigma_\Theta\times\Sigma_\Omega[/itex] but over a subset of [itex]\Theta[/itex].
  10. Aug 18, 2011 #9
    I was just quoting your definition of [itex]P_{\Theta\times\Omega}[/itex]. Later in your question you refer to [itex]\frac{dP_{\Theta\times\Omega}}{dP_\Theta}[/itex] as a Radon-Nikodym derivative, but this, as I tried to point out in my previous reply, doesn't make sense.
  11. Aug 18, 2011 #10
    Yeah, I meant that [itex]{dP_{\Theta\times\Omega}}/{dP_\Theta}[/itex] would not be a function [itex]\Theta\rightarrow[0,\infty)[/itex], but a function [itex]\Theta\rightarrow\Gamma [/itex], where [itex]\Gamma[/itex] is a set of probability measures on [itex]\Sigma_\Omega[/itex]. It might be a bit of a stretch to call it a Radon-Nikodym derivative...
  12. Aug 18, 2011 #11
    Yes, reformulating this using probability measures instead of probability densities would allow me to prove what I want.
  13. Aug 19, 2011 #12
    Hmm.. you denote by [itex]P_{\Omega | x}[/itex] just some arbitrarily chosen probability measure on [itex](\Omega, \Sigma_\Omega)[/itex], i.e. you have a family of measures parametrized by [itex]x \in \Theta[/itex]. Then you define the joint measure [itex]P_{\Theta\times\Omega}[/itex] on the product space by integrating this family against [itex]P_\Theta[/itex].
    To answer the question, whether
    [tex]{P_{\Omega |x}} = \frac{{d{P_{\Theta \times \Omega }}}}{{d{P_\Theta }}}(x)[/tex], where the RHS is not a Radon-Nikodym derivative, you need to define the RHS, i.e. it cannot be just some function [itex]\Theta \to \Gamma[/itex]. Also you can't define it using the same equation you used for the definition of [itex]P_{\Theta\times\Omega}[/itex]. A theorem would show that two separately defined things are equal, but I see only one defined object.

    I sort of understand what you are trying to do, but not quite. Can you give a simple example, with concrete values and sets, of what the theorem would look like? E.g. in the discrete case?