How to prove Bayes' rule for probability measures?

  • Thread starter winterfors
  • Start date
  • Tags
    Probability
In summary: the thread asks how to prove a measure-theoretic generalization of Bayes' rule, in which the posterior measure [itex]P_{\Theta|y}[/itex] is expressed through the Radon–Nikodym derivative [itex]dP_{\Omega|x}/dP_\Omega[/itex]. The discussion turns on how to define the joint measure [itex]P_{\Theta\times\Omega}[/itex] on the product space [itex](\Theta\times\Omega, \Sigma_\Theta\times\Sigma_\Omega)[/itex], and on whether the proposed measure-valued derivative [itex]dP_{\Theta\times\Omega}/dP_\Theta[/itex] is actually well defined.
  • #1
winterfors
Consider a probability space [itex](\Theta, \Sigma_\Theta, P_\Theta)[/itex], where [itex]P_\Theta[/itex] is a probability measure on the sigma-algebra [itex]\Sigma_\Theta[/itex].

Each element [itex]x \in \Theta[/itex] maps onto another probability measure [itex]P_{\Omega | x}[/itex], on a sigma-algebra [itex]\Sigma_\Omega[/itex] on another space [itex]\Omega[/itex].

In this situation, one should (as far as I can see) be able to write a measure-theoretic generalization of Bayes' rule

[tex]{P_{\Theta |y}}(A) = \int\limits_{x \in A} {\frac{{d{P_{\Omega |x}}}}{{d{P_\Omega }}}(y)d{P_\Theta }} [/tex]

for any [itex]A \in \Sigma_\Theta [/itex], given an observation [itex]y \in \Omega[/itex], where

[tex]{P_\Omega } = \int\limits_{x \in \Theta } {{P_{\Omega |x}}d{P_\Theta }} [/tex]

and [itex]{d{P_{\Omega |x}}/d{P_\Omega }}[/itex] is the Radon–Nikodym derivative of [itex]P_{\Omega |x}[/itex] with respect to [itex]P_{\Omega}[/itex].
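(As a partial sanity check, assuming each [itex]P_{\Omega|x}[/itex] is absolutely continuous with respect to [itex]P_\Omega[/itex] and [itex](x,y) \mapsto \frac{dP_{\Omega|x}}{dP_\Omega}(y)[/itex] is jointly measurable so that Tonelli's theorem applies, the proposed posterior at least has total mass one: for any [itex]B \in \Sigma_\Omega[/itex],

[tex]\int\limits_{y \in B} {\int\limits_{x \in \Theta } {\frac{{d{P_{\Omega |x}}}}{{d{P_\Omega }}}(y)\,d{P_\Theta }} \,d{P_\Omega }}  = \int\limits_{x \in \Theta } {{P_{\Omega |x}}(B)\,d{P_\Theta }}  = {P_\Omega }(B)[/tex]

so [itex]\int_\Theta {\frac{dP_{\Omega|x}}{dP_\Omega}(y)\,dP_\Theta} = 1[/itex] for [itex]P_\Omega[/itex]-almost every [itex]y[/itex], i.e. [itex]P_{\Theta|y}(\Theta) = 1[/itex].)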


The problem is that I cannot see how to prove it (I'm sure the proof is fairly simple). Would anyone like to help?
 
  • #3
SW VandeCarr said:
This proof may or may not be what you are looking for. It's based on a filtration: [itex] F_1, F_2, ...,F_{N-1}, F_N [/itex] as an increasing sequence of sigma algebras.

http://01law.wordpress.com/2011/04/09/bayes-rule-and-forward-measure/

As far as I can see, it's not quite what I'm looking for. What I'm trying to do above is to reformulate Bayes' rule for probability densities, usually expressed

[tex]p(x|y) = \frac{{p(y|x)}}{{p(y)}}p(x)[/tex]

which follows trivially from the definition of a joint probability density [itex]p(x,y) = p(y|x) p(x)[/itex]. But for probability measures, it gets slightly more tricky...
 
  • #4
winterfors said:
As far as I can see, it's not quite what I'm looking for. What I'm trying to do above is to reformulate Bayes' rule for probability densities, usually expressed

[tex]p(x|y) = \frac{{p(y|x)}}{{p(y)}}p(x)[/tex]

which follows trivially from the definition of a joint probability density [itex]p(x,y) = p(y|x) p(x)[/itex]. But for probability measures, it gets slightly more tricky...
The rule for probability densities follows from Bayes' Rule and the Law of Total Probability:

[itex] f_X(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{\int_{-\infty}^{\infty} f_Y(y|X=\xi )\,f_X(\xi )\,d\xi }\! [/itex].

Do you want to reformulate this?

EDIT: Any reformulation will need to take into account that the application of Bayes' Rule must be expressed as a posterior probability, which is defined in terms of a probability space and the Law of Total Probability.
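A quick numeric illustration of that chain of equalities (a minimal sketch, assuming a standard-normal prior and a unit-variance Gaussian likelihood, for which the posterior is known to be [itex]N(y/2, 1/2)[/itex]):

[code]
import numpy as np

# Numeric check of the density form p(x|y) = p(y|x) p(x) / p(y) on a grid,
# with an assumed N(0,1) prior and y|x ~ N(x,1) likelihood (illustrative).
xs = np.linspace(-10.0, 10.0, 20001)
y = 1.3

prior = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)       # p(x)
lik = np.exp(-(y - xs)**2 / 2) / np.sqrt(2 * np.pi)   # p(y|x)
evidence = np.trapz(lik * prior, xs)                  # p(y), law of total probability
post = lik * prior / evidence                         # p(x|y)

# For this conjugate pair the posterior is N(y/2, 1/2): check mass and mean.
assert abs(np.trapz(post, xs) - 1.0) < 1e-6
assert abs(np.trapz(xs * post, xs) - y / 2) < 1e-6
[/code]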
 
Last edited:
  • #5
winterfors said:
Each element [itex]x \in \Theta[/itex] maps onto another probability measure [itex]P_{\Omega | x}[/itex], on a sigma-algebra [itex]\Sigma_\Omega[/itex] on another space [itex]\Omega[/itex].

What is this mapping? How exactly is [itex]P_{\Omega | x}[/itex] defined?
 
  • #6
vladb said:
What is this mapping? How exactly is [itex]P_{\Omega | x}[/itex] defined?

I only meant that for every [itex]x \in \Theta[/itex] there is one probability measure [itex]P_{\Omega | x}[/itex] on [itex]\Sigma_\Omega[/itex] over the space [itex]\Omega[/itex]. The probability measures [itex]P_{\Omega | x}[/itex] are thus conditional on [itex]x[/itex].

This allows us to define a joint probability measure on the (Cartesian) product space [itex](\Theta \times \Omega ,{\Sigma _\Theta } \times {\Sigma _\Omega })[/itex]:

[tex]{P_{\Theta \times \Omega }}(C) \equiv \int\limits_{x \in A} {{P_{\Omega |x}}({B_x})\,d{P_\Theta }} [/tex]

for any [itex]C \in {\Sigma _\Theta } \times {\Sigma _\Omega }[/itex], where [itex]A \in {\Sigma _\Theta }[/itex] is the projection [itex]A = \{ x:(x,y) \in C{\text{ for some }}y\} [/itex] and [itex]{B_x} \in {\Sigma _\Omega }[/itex] is the section [itex]{B_x} = \{ y:(x,y) \in C\} [/itex].

If one could prove that

[tex]{P_{\Omega |x}} = \frac{{d{P_{\Theta \times \Omega }}}}{{d{P_\Theta }}}(x)[/tex]

where [itex]d{P_{\Theta \times \Omega }}/d{P_\Theta }[/itex] is the Radon–Nikodym derivative of [itex]{P_{\Theta \times \Omega }}[/itex] with respect to [itex]{P_\Theta }[/itex], then the theorem above (that I want to prove), i.e.

[tex]{P_{\Theta |y}}(A) = \int\limits_{x \in A} {\frac{{d{P_{\Omega |x}}}}{{d{P_\Omega }}}(y)\,d{P_\Theta }} [/tex]

would follow by symmetry, but I'm not sure how to do that...
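(As a consistency check, this joint measure does return the two marginals: taking [itex]C = A \times \Omega[/itex] gives [itex]B_x = \Omega[/itex] for every [itex]x \in A[/itex], so

[tex]{P_{\Theta \times \Omega }}(A \times \Omega ) = \int\limits_{x \in A} {{P_{\Omega |x}}(\Omega )\,d{P_\Theta }}  = {P_\Theta }(A)[/tex]

and taking [itex]C = \Theta \times B[/itex] gives

[tex]{P_{\Theta \times \Omega }}(\Theta \times B) = \int\limits_{x \in \Theta } {{P_{\Omega |x}}(B)\,d{P_\Theta }}  = {P_\Omega }(B)[/tex]

which is exactly the mixture defining [itex]P_\Omega[/itex] above.)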
 
Last edited:
  • #7
winterfors said:
This allows us to define a joint probability measure on the (Cartesian) product space [itex](\Theta \times \Omega ,{\Sigma _\Theta } \times {\Sigma _\Omega })[/itex]
[tex]{P_{\Theta \times \Omega }}(C) \equiv \int\limits_{x\in A} {{P_{\Omega |x}}({B_x})d{P_\Theta }} [/tex]

Perhaps I've misunderstood you, but... [itex]f[/itex] would be a Radon-Nikodym derivative if
[tex]
P_{\Theta\times\Omega}(A) = \int_A f \, dP_\Theta
[/tex] for all [itex]A \in \Sigma_\Theta\times\Sigma_\Omega[/itex] (product sigma-field). However, this doesn't make any sense, because you have different measurable spaces on the left and right hand sides. Actually, the right hand side does not mean anything.
 
  • #8
vladb said:
Perhaps I've misunderstood you, but... [itex]f[/itex] would be a Radon-Nikodym derivative if
[tex]
P_{\Theta\times\Omega}(A) = \int_A f \, dP_\Theta
[/tex] for all [itex]A \in \Sigma_\Theta\times\Sigma_\Omega[/itex] (product sigma-field). However, this doesn't make any sense, because you have different measurable spaces on the left and right hand sides. Actually, the right hand side does not mean anything.

In the equation you refer to, the integration is not over a subset [itex]A \in \Sigma_\Theta\times\Sigma_\Omega[/itex] but over a subset of [itex]\Theta[/itex].
 
  • #9
winterfors said:
In the equation you refer to, the integration is not over a subset [itex]A \in \Sigma_\Theta\times\Sigma_\Omega[/itex] but over a subset of [itex]\Theta[/itex].

I was just quoting your definition of [itex]P_{\Theta\times\Omega}[/itex]. Later in your question you refer to [itex]\frac{dP_{\Theta\times\Omega}}{dP_\Theta}[/itex] as a Radon-Nikodym derivative, but this, as I tried to point out in my previous reply, doesn't make sense.
 
  • #10
vladb said:
I was just quoting your definition of [itex]P_{\Theta\times\Omega}[/itex]. Later in your question you refer to [itex]\frac{dP_{\Theta\times\Omega}}{dP_\Theta}[/itex] as a Radon-Nikodym derivative, but this, as I tried to point out in my previous reply, doesn't make sense.

Yeah, I meant that [itex]{dP_{\Theta\times\Omega}}/{dP_\Theta}[/itex] would not be a function [itex]\Theta\rightarrow[0,\infty)[/itex], but a function [itex]\Theta\rightarrow\Gamma [/itex], where [itex]\Gamma[/itex] is a set of probability measures on [itex]\Sigma_\Omega[/itex]. It might be a bit of a stretch to call it a Radon-Nikodym derivative...
 
  • #11
SW VandeCarr said:
The rule for probability densities follows from Bayes' Rule and the Law of Total Probability:

[itex] f_X(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{\int_{-\infty}^{\infty} f_Y(y|X=\xi )\,f_X(\xi )\,d\xi }\! [/itex].

Do you want to reformulate this?

EDIT: Any reformulation will need to take into account that the application of Bayes' Rule must be expressed as a posterior probability, which is defined in terms of a probability space and the Law of Total Probability.

Yes, reformulating this using probability measures instead of probability densities would allow me to prove what I want.
 
  • #12
Hmm... you denote by [itex]P_{\Omega | x}[/itex] just some arbitrarily chosen probability measure on [itex](\Omega, \Sigma_\Omega)[/itex], i.e. you have a family of measures parametrized by [itex]x \in \Theta[/itex]. Then you define
a joint probability measure on the (Cartesian) product space [itex](\Theta \times \Omega ,{\Sigma _\Theta } \times {\Sigma _\Omega })[/itex]
[tex]{P_{\Theta \times \Omega }}(C) \equiv \int\limits_{x\in A} {{P_{\Omega |x}}({B_x})d{P_\Theta }} [/tex]
To answer the question of whether
[tex]{P_{\Omega |x}} = \frac{{d{P_{\Theta \times \Omega }}}}{{d{P_\Theta }}}(x)[/tex]
holds, where the RHS is not a Radon-Nikodym derivative, you first need to define the RHS; it cannot be just some function [itex]\Theta \to \Gamma[/itex]. Also, you can't define it using the same equation you used for the definition of [itex]P_{\Theta\times\Omega}[/itex]. A theorem would show that two separately defined things are equal, but here I see only one defined object.

I sort of understand what you are trying to do, but not quite. Can you give a simple example, with concrete values/sets, of what the theorem would look like, e.g. in the discrete case?
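For illustration, here is a minimal discrete sketch (all values made up): with [itex]\Theta = \{0,1\}[/itex] and [itex]\Omega = \{0,1\}[/itex], the Radon–Nikodym derivative [itex]d{P_{\Omega |x}}/d{P_\Omega }(y)[/itex] reduces to the ratio of point masses [itex]{P_{\Omega |x}}(\{ y\} )/{P_\Omega }(\{ y\} )[/itex], and the proposed formula reproduces the elementary Bayes posterior:

[code]
# Discrete sanity check of the proposed measure-theoretic Bayes rule.
# Theta = {0, 1} (parameter space), Omega = {0, 1} (observation space).
# All numbers below are made up for illustration.

P_Theta = {0: 0.3, 1: 0.7}            # prior measure P_Theta
P_Omega_given = {                     # conditional measures P_{Omega|x}
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.2, 1: 0.8},
}

# Mixture marginal: P_Omega({y}) = sum_x P_{Omega|x}({y}) * P_Theta({x})
P_Omega = {y: sum(P_Omega_given[x][y] * P_Theta[x] for x in P_Theta)
           for y in (0, 1)}

def posterior(A, y):
    """P_{Theta|y}(A) via the proposed formula; the Radon-Nikodym
    derivative dP_{Omega|x}/dP_Omega at y is a ratio of point masses."""
    return sum(P_Omega_given[x][y] / P_Omega[y] * P_Theta[x] for x in A)

y = 1
for x in (0, 1):                      # agrees with elementary Bayes' rule
    classic = P_Omega_given[x][y] * P_Theta[x] / P_Omega[y]
    assert abs(posterior({x}, y) - classic) < 1e-12
assert abs(posterior({0, 1}, y) - 1.0) < 1e-12   # posterior has total mass 1
[/code]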
 

1. What is Bayes' rule for probability measures?

Bayes' rule for probability measures is a theorem in probability theory that describes how to update the probability of a hypothesis based on new evidence. It states that the posterior probability of a hypothesis is equal to the prior probability of the hypothesis multiplied by the likelihood of the evidence given the hypothesis, divided by the probability of the evidence.
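In symbols, for a hypothesis [itex]H[/itex] and evidence [itex]E[/itex] with [itex]P(E) > 0[/itex]:

[tex]P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}[/tex]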

2. Why is Bayes' rule important in probability?

Bayes' rule is important because it allows us to update our beliefs or probabilities about a hypothesis as we gather new evidence. It is a fundamental concept in Bayesian statistics and is used in a wide range of applications such as machine learning, data analysis, and decision making.

3. How can Bayes' rule be proved for probability measures?

Bayes' rule can be proved using basic concepts of probability theory, such as the definition of conditional probability and the product rule. For probability measures, the analogous statement can be derived with measure-theoretic tools such as the Radon–Nikodym theorem, by relating the conditional measures to a joint measure on the product space.
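For events, the elementary proof is one line from the definition of conditional probability [itex]P(A|B) = P(A \cap B)/P(B)[/itex]:

[tex]P(A|B)\,P(B) = P(A \cap B) = P(B|A)\,P(A)\quad \Rightarrow \quad P(A|B) = \frac{{P(B|A)\,P(A)}}{{P(B)}}[/tex]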

4. Can Bayes' rule be applied to any type of probability distribution?

Yes, Bayes' rule can be applied to any type of probability distribution as long as the prior probability and likelihood can be defined for the given distribution. It is commonly used with both discrete and continuous distributions.

5. Are there any limitations to using Bayes' rule?

One limitation of Bayes' rule is that it assumes that the prior probability and likelihood are known and accurately represent the true probabilities. In some cases, it may be difficult to determine these values, leading to inaccurate results. In the measure-theoretic setting there is a further technical requirement: the relevant Radon–Nikodym derivatives must exist, i.e. each conditional measure must be absolutely continuous with respect to the marginal measure.
