# Measure theoretic definition of conditional expectation

1. Apr 8, 2012

### logarithmic

I've been looking at the measure theoretic definition of a conditional expectation and it doesn't make too much sense to me.

Consider the definition given here: https://en.wikipedia.org/wiki/Conditional_expectation#Formal_definition

It says for a probability space $(\Omega,\mathcal{A},P)$, and sigma fields $\mathcal{B}\subset\mathcal{A}$, the random variable $Y=E(X|\mathcal{B})$ is the conditional expectation, if it satisfies

$\int_{B}YdP = \int_{B} X dP$ for all $B\in\mathcal{B}$ (*).

But clearly setting $Y=X$ satisfies (*). And it goes on to say that conditional expectations are almost surely unique. So this means that $E(X|\mathcal{B})=Y=X$ almost surely?

If we consider the following example $\Omega=\{1,2,3\}$, $\mathcal{A}$ is the power set of $\Omega$, $\mathcal{B}=\{\{1,2\},\{3\},\varnothing,\Omega\}$, $X(\omega)=\omega$ and $P(1) = .25, P(2)=.65, P(3)=.1$, then if you write out (*) for all the elements of $\mathcal{B}$, you'll get $E(X|\mathcal{B})=X$. But clearly this isn't correct: given {3}, the conditional expectation should be 3, and given {1,2} the conditional expectation should be $1\frac{.25}{.9} + 2\frac{.65}{.9}$.
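
The two values quoted above can be checked with a short Python sketch. It only reproduces the intuitive "average over each block" calculation for this example; the names `P`, `X`, and `block_average` are illustrative choices, not part of any library.

```python
from fractions import Fraction

# Probability measure and random variable from the example above.
P = {1: Fraction(25, 100), 2: Fraction(65, 100), 3: Fraction(10, 100)}
X = {1: 1, 2: 2, 3: 3}

def block_average(block):
    """P-weighted average of X over one block of outcomes."""
    mass = sum(P[w] for w in block)
    return sum(X[w] * P[w] for w in block) / mass

print(block_average({3}))     # 3
print(block_average({1, 2}))  # 31/18, i.e. 1*(.25/.9) + 2*(.65/.9)
```

Exact fractions are used so that the comparison with the hand calculation is not blurred by floating-point rounding.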

It's usually said that sigma fields model information. I also don't see what sort of information $\mathcal{B}=\{\{1,2\},\{3\},\varnothing,\Omega\}$ gives.

Can someone explain where my understanding is wrong, and how this relates to the more intuitive definition of conditional expectations for random variables:
$E(X|Y=y)=\int_{\mathbb{R}}xf_{X|Y}(x|y)dx$.

Last edited: Apr 8, 2012
2. Apr 8, 2012

### chiro

Hey logarithmic.

The easiest way to think about conditional expectation in the intuitive way (not rigorous measure theoretic way although it's equivalent) is that you want to find the mean for the variable X given some conditional information present in Y.

If you want to get an actual number, then Y needs to be 'realized': in other words, you will need specific information about what realization Y actually takes to get a number. If you don't have this, then you will get something in terms of the random variable Y itself which will give you a function and not a number that has been evaluated.

One way to think of this visually is to assume that f(x,y) is the joint PDF of X and Y. If you take one 'slice' of this PDF at a specific Y = y (the realization of Y is y) and renormalize it, you get an ordinary univariate distribution for X given Y = y. Taking the expectation of this is then, intuitively, just the same as taking the expectation of any univariate distribution.
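
A discrete analogue of this slicing picture can be sketched in Python. The joint pmf below is hypothetical (the numbers are made up for illustration), and `conditional_mean_of_x` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the values are invented and sum to 1.
joint = {
    (0, 0): Fraction(1, 10), (1, 0): Fraction(3, 10),
    (0, 1): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def conditional_mean_of_x(y):
    """Mean of X along the slice Y = y, after renormalising the slice."""
    slice_ = {x: p for (x, yy), p in joint.items() if yy == y}
    total = sum(slice_.values())  # this is P(Y = y)
    return sum(x * p for x, p in slice_.items()) / total

print(conditional_mean_of_x(0))  # 3/4
print(conditional_mean_of_x(1))  # 2/3
```

Each call gives a number only because a specific realization y is supplied; left as a function of y, the result is a function rather than a number.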

Now X and Y may not be just single random variables and they may represent something more complex, but the idea is the same.

3. Apr 8, 2012

### logarithmic

Yes that makes perfect sense to me. I already fully understand that definition of the conditional expectation.

It's the measure theoretic definition I don't understand.

4. Apr 8, 2012

### chiro

For the measure theoretic definition, from what you've said it seems that you are integrating over the region of B for X with respect to the probability measure P.

The measure you are integrating with respect to will depend on the nature of P which if it is a Borel measure, should satisfy all the probability axioms for that measure.

Again think of what the slice is in the context. Think about what the slice is with respect to X (for B) in the context of probability space in terms of the visual description above.

5. Apr 8, 2012

### logarithmic

When you integrate over the region of B for X wrt the probability measure P, what you get is not the conditional expectation.

The conditional expectation is the random variable $Y:=E(X|\mathcal{B})$, such that if you did the above to Y, you would get the result above.

$\int_{B}YdP = \int_{B} X dP$ for all $B\in\mathcal{B}$

Which leads to the statement $E(X|\mathcal{B}) = X$ a.s. for all sigma fields $\mathcal{B}$, which makes no sense.

How would you calculate the conditional expectation in the example in the original post? And how is conditioning on a sigma field related to conditioning on a random variable? The latter does make sense to me, the former doesn't.

Last edited: Apr 8, 2012
6. Apr 8, 2012

### chiro

I just took a look at this:

http://en.wikipedia.org/wiki/Conditional_expectation#Formal_definition

If we use this definition then the definition of the slice(s) comes from realizing that we only integrate over the right region (note the Beta in the wiki definition).

Maybe something is ambiguous in the definition you have been given.

7. Apr 8, 2012

### logarithmic

The definition I'm using is the wiki definition, and I'm still not understanding what you're saying. What's the right region, and why should that expression for the integration be true? And how does that not imply that E(X|B) = X almost surely?

8. Apr 8, 2012

### chiro

In the above definition we integrate over every element B of β. It's very subtle, but if you read the wiki definition, the region of integration B ranges over every element of β (in other words, we are finding E[X|β], and the region B is an element of β).

I think (but I'm not sure) that the probability space must include all events including those from not only X but also from β as well which means that you should be treating everything as if it's one giant distribution. In other words, the thing you are integrating with respect to (the dP(w)) is the entire probability space that has X and β as its subsets.

Does this make sense?

9. Apr 8, 2012

### logarithmic

It says:
$\int_{B}E(X|\mathcal{B})dP = \int_{B} X dP$ for each $B \in \mathcal{B}$. (*)

So the region of integration isn't the union of all the $B \in \mathcal{B}$, otherwise it would say to integrate over $\mathcal{B}$ instead of $B$. Nor does integrating over $\mathcal{B}$ make sense, as the region of integration for Lebesgue integrals is always a subset of the sample space $\Omega$, and never a sigma field like $\mathcal{B}$.

In my example where $\mathcal{B}=\{\{1,2\},\{3\},\varnothing,\Omega\}$, (*) is saying, the following 4 equalities hold:
$\int_{\{1,2\}}E(X|\mathcal{B})dP = \int_{\{1,2\}} X dP$
$\int_{\{3\}}E(X|\mathcal{B})dP = \int_{\{3\}}X dP$
$\int_{\varnothing}E(X|\mathcal{B})dP = \int_{\varnothing} X dP$
$\int_{\Omega}E(X|\mathcal{B})dP = \int_{\Omega} X dP$.
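
These four equalities can be checked mechanically for any candidate function. The sketch below is only an illustration for this finite example; `integral`, `satisfies_star`, and `Y_const` are names introduced here, and on a finite space the integral is just a weighted sum.

```python
from fractions import Fraction

P = {1: Fraction(1, 4), 2: Fraction(13, 20), 3: Fraction(1, 10)}
X = {1: 1, 2: 2, 3: 3}
# The four events of the sigma field B from the example.
sigma_B = [frozenset(), frozenset({1, 2}), frozenset({3}), frozenset({1, 2, 3})]

def integral(f, event):
    """Integral of f over `event` w.r.t. P (a finite weighted sum here)."""
    return sum(f[w] * P[w] for w in event)

def satisfies_star(Y):
    """True if Y and X have equal integrals over every event in sigma_B."""
    return all(integral(Y, B) == integral(X, B) for B in sigma_B)

# Candidate that is constant on {1, 2}; 31/18 = (1*.25 + 2*.65) / .9.
Y_const = {1: Fraction(31, 18), 2: Fraction(31, 18), 3: 3}

print(satisfies_star(X))        # True
print(satisfies_star(Y_const))  # True
print(integral(X, {1, 2}))      # 31/20, i.e. 1*P({1}) + 2*P({2})
```

Both candidates print `True`, so on this example the four equalities by themselves are met by more than one function.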

And no, your last paragraph doesn't really make sense. I'm already integrating wrt the entire sample space $\Omega$.

In my example on the original post where $X(\omega) = \omega$, we have $X(\omega) = 1I_{\{1\}}(\omega)+2I_{\{2\}}(\omega)+3I_{\{3\}}(\omega)$, where I is an indicator function.

Then the integral
$\int_{\{1,2\}}X(\omega)dP(\omega) = \int X(\omega)I_{\{1,2\}}(\omega)dP(\omega)=\int \left(1I_{\{1\}}(\omega)+2I_{\{2\}}(\omega)\right)dP(\omega)$,
i.e. the last expression integrates over the whole sample space, and since the integrand is a simple function, the result is
$1P(\{1\})+2P(\{2\})$.

And somehow, this should be equal to
$\int_{\{1,2\}}E(X|\mathcal{B})(\omega)dP(\omega)$.

Last edited: Apr 8, 2012
10. Apr 8, 2012

### chiro

I see what you're saying and I agree with you but I think what we are integrating over is with respect to the entire probability space (the P(w) in dP(w)) and the β represents the constraint with respect to the entire space of P(w).

The thing that I think that β is referring to is the actual subset of the space that represents the events corresponding to X|B. In other words, we get the actual subset of these events that are a subset of P(w) and integrate over these events only.

In the visual sense we might have say a Y|X scenario in which we get a slice but in this measure theoretic viewpoint, we consider Y and X in the context of the probability measure space P(w).

11. Apr 8, 2012

### logarithmic

I don't think any of this makes sense in the context of Lebesgue integration. Can you give a concrete example?

The dP(w) term doesn't say anything about what we're integrating over, it's just notation to remind us of the measure that is used to evaluate the integral, i.e. in Lebesgue integration it is defined that
$\int I_{S}(\omega)dP(\omega)=P(S)$.

Last edited: Apr 8, 2012
12. Apr 8, 2012

### chiro

I'm not sure why this doesn't make sense.

If the measure was the standard infinitesimal measure used in Riemann integration for a one-dimensional integral, then the measure would be dx (or whatever dummy variable x is).

In this context the measure refers to any general measure, but the space is not over the real line but instead over a set corresponding to the exhaustable probability space P with the corresponding measure that satisfies the Kolmogorov probability axioms. In other words, instead of the measure referring to the set R and the measure being infinitesimal, the set being referred to is the probability space P, which is a set like R, with a measure satisfying the probability axioms.

If I were going to use an example of a Lebesgue measure instead of an infinitesimal one I would create a very simple space satisfying the Kolmogorov axioms and use the characteristic function for defining the integration.

Did you have a specific set P in mind?

13. Apr 8, 2012

### logarithmic

Yes, I set out an example in the original post.

And I worked through it, and I always get $E(X|\mathcal{B}) = X$.

"but the space is not over the real line but instead over a set corresponding to the exhaustable probability space P"

P is not a probability space, it's a probability measure, i.e. it's not a set, but a function $P:\mathcal{A}\to[0,1]$. The integration is over the set $B\subset\Omega$, the triplet $(\Omega, \mathcal{A}, P)$ is the probability space.

I have no problem with computing the integral like
$\int_{\{1,2\}} X dP$,
which the definition requires (see post 9 for my working out).

But I see no reason why this must equal
$\int_{\{1,2\}} E(X|\mathcal{B}) dP$.

And saying that these 2 integrals are equal, together with a.s. uniqueness, implies that $E(X|\mathcal{B}) = X$ a.s. This doesn't make sense because the random variable analogue, i.e. $E(X|Y) = X$, is wrong for general random variables Y.

Last edited: Apr 8, 2012
14. Apr 8, 2012

### chiro

Yeah sorry P is a measure in this context but it is defined relative to the probability space itself.

What I think is happening is that B is generated that corresponds to the actual events corresponding to X|B and not just related to either X or B independently. So instead of the {1,2} for unconditional X you would get a different set for the conditional problem that corresponds to the appropriate events.

So with regards to your examples, I have a feeling that you are going to get completely different things that you integrate over for each different example (in other words it won't be {1,2} for all the cases but a set that reflects the events described by the conditional representation).

15. Apr 8, 2012

### logarithmic

Your last 2 paragraphs don't make sense to me. The problem is that your language is very loose and not explicit or mathematical.

What do you mean that B is generated? What's the actual event corresponding to X|B? What do you mean by getting different sets of conditional problems? etc.

{1,2} is just 1 element of $\mathcal{B}$, that I used as an example. But (*) needs to hold for all 4 elements of $\mathcal{B}$ too.

16. Apr 8, 2012

### chiro

Well according to the wiki definition our B is an element of β.

To me this implies that a valid B is an element of the set β, but it is not fixed in the way that you are describing. I am interpreting the theorem to say that, for it to hold, an element must exist in the set β that satisfies this identity.

It wouldn't make sense if it were the way you wrote it and I'm sure you agree. As long as a valid B exists, then the formula should hold.

You may have missed this crucial statement (from the wiki site)

You have to find the B to get the actual answer, this theorem just says that if the right B exists then the conditional expectation exists for all general X and β

17. Apr 8, 2012

### logarithmic

Yes, $B\in\mathcal{B}$, but $\mathcal{B}$ is a sigma field, which means that $\mathcal{B}$ is a family of subsets of $\Omega$. So $B$ being an element of $\mathcal{B}$ means that $B$ is a subset of $\Omega$.

The definition isn't that there exists an element $B\in\mathcal{B}$ such that (*) holds, it says (*) is satisfied for all $B\in\mathcal{B}$. Here (*) is the integral equality in the original post, which must hold by definition.

I'm not sure how the nonconstructive part is relevant. Sure, it doesn't give an explicit formula for $E(X|\mathcal{B})$, but we can find $E(X|\mathcal{B})$ easily. Just set $E(X|\mathcal{B})=X$, then in (*) we have LHS = RHS.

18. Apr 8, 2012

### chiro

It's getting late here so I'll reply sometime tomorrow

19. Apr 24, 2012

### Hawkeye18

Hello Logarithmic,
you are missing the fact that the conditional expectation is $\mathcal B$-measurable.

20. Apr 25, 2012

### Stephen Tashi

Does setting $Y = X$ produce a function that is measurable on $\mathcal{B}$?

I'll take Hawkeye18's hint. Suppose we can find a Borel set $a$ in the reals such that the inverse image of $a$ under $X$ is not in $\mathcal{B}$. This would not contradict the fact that $X$ is a measurable function on the sigma algebra $\mathcal{A}$. It would contradict the fact that $X$ is a measurable function on $\mathcal{B}$.

I suppose it amounts to examining the validity of the claim: If $X$ is a random variable that is measurable on the probability space $(\Omega, \mathcal{A},P)$ and $\mathcal{B}$ is a subalgebra of $\mathcal{A}$ then $X$ is measurable on the probability space $(\Omega, \mathcal{B},P)$. Sounds nice, but is it true?
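
For the three-point example from the original post, the inverse-image test can be run directly. This is a minimal sketch under the assumption of a finite sample space; `preimage` and `is_measurable` are illustrative names, and the second test function is a made-up example that is constant on {1,2}.

```python
X = {1: 1, 2: 2, 3: 3}
sigma_B = {frozenset(), frozenset({1, 2}), frozenset({3}), frozenset({1, 2, 3})}

def preimage(f, values):
    """Outcomes that f maps into the given set of values."""
    return frozenset(w for w, v in f.items() if v in values)

def is_measurable(f, sigma_field):
    """On a finite space it suffices to test preimages of single values,
    because a sigma field is closed under finite unions."""
    return all(preimage(f, {v}) in sigma_field for v in set(f.values()))

print(preimage(X, {1}) in sigma_B)                  # False: {1} is not in B
print(is_measurable(X, sigma_B))                    # False
print(is_measurable({1: 5, 2: 5, 3: 7}, sigma_B))   # True: constant on {1, 2}
```

So in this example the claim fails: taking $a=\{1\}$, the inverse image under $X$ is $\{1\}$, which belongs to $\mathcal{A}$ but not to $\mathcal{B}$.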