Measure theoretic definition of conditional expectation
#1
Apr812, 03:14 AM

P: 108

I've been looking at the measure theoretic definition of a conditional expectation and it doesn't make too much sense to me.
Consider the definition given here: https://en.wikipedia.org/wiki/Condit...mal_definition It says that for a probability space [itex](\Omega,\mathcal{A},P)[/itex] and a sigma field [itex]\mathcal{B}\subset\mathcal{A}[/itex], the random variable [itex]Y=E(X|\mathcal{B})[/itex] is the conditional expectation if it satisfies [itex]\int_{B}Y\,dP = \int_{B} X\,dP[/itex] for all [itex]B\in\mathcal{B}[/itex] (*). But clearly setting [itex]Y=X[/itex] satisfies (*). And it goes on to say that conditional expectations are almost surely unique. So does this mean that [itex]E(X|\mathcal{B})=Y=X[/itex] almost surely?
Consider the following example: [itex]\Omega=\{1,2,3\}[/itex], [itex]\mathcal{A}[/itex] is the power set of [itex]\Omega[/itex], [itex]\mathcal{B}=\{\{1,2\},\{3\},\varnothing,\Omega\}[/itex], [itex]X(\omega)=\omega[/itex] and [itex]P(\{1\}) = .25, P(\{2\})=.65, P(\{3\})=.1[/itex]. If you write out (*) for all the elements of [itex]\mathcal{B}[/itex], you'll get [itex]E(X|\mathcal{B})=X[/itex]. But clearly this isn't correct: given {3}, the conditional expectation should be 3, and given {1,2} the conditional expectation should be [itex]1\cdot\frac{.25}{.9} + 2\cdot\frac{.65}{.9}[/itex].
It's usually said that sigma fields model information. I also don't see what sort of information [itex]\mathcal{B}=\{\{1,2\},\{3\},\varnothing,\Omega\}[/itex] gives. Can someone explain where my understanding is wrong, and how this relates to the more intuitive definition of conditional expectations for random variables: [itex]E(X|Y)=\int_{\mathbb{R}}xf_{X|Y}(x,y)dx[/itex].
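For the finite example in the post above, the values the poster expects can be reproduced by averaging X over each atom of [itex]\mathcal{B}[/itex], weighted by P. The following is only a sketch under that assumption: the names `P`, `X`, `atoms` and `cond_exp` are illustrative, and the atom-averaging recipe is the usual one for a sigma field generated by a finite partition, not something stated in the thread.

```python
# Sketch: E(X|B) on Omega = {1, 2, 3}, where B is generated by the partition {{1, 2}, {3}}.
P = {1: 0.25, 2: 0.65, 3: 0.10}   # point masses from the post
X = {w: float(w) for w in P}       # X(omega) = omega
atoms = [{1, 2}, {3}]              # minimal non-empty sets of B (assumed partition)

def cond_exp(X, P, atoms):
    """Return the function that is constant on each atom and equal to
    the P-weighted average of X over that atom."""
    Y = {}
    for atom in atoms:
        mass = sum(P[w] for w in atom)
        avg = sum(X[w] * P[w] for w in atom) / mass
        for w in atom:
            Y[w] = avg
    return Y

Y = cond_exp(X, P, atoms)
print(Y)
```

The value on {1,2} is [itex]1\cdot\frac{.25}{.9} + 2\cdot\frac{.65}{.9}[/itex] and the value on {3} is 3, up to floating-point rounding.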


#2
Apr812, 05:27 AM

P: 4,572

The easiest way to think about conditional expectation intuitively (not in the rigorous measure theoretic way, although the two are equivalent) is that you want to find the mean of the variable X given some conditional information present in Y. If you want an actual number, then Y needs to be 'realized': you need to know which value Y actually takes. If you don't have this, you get an expression in terms of the random variable Y itself, which is a function and not a number that has been evaluated.
One way to picture this is to take the joint PDF f(x,y) of X and Y and 'slice' it at a specific Y = y (the realization of Y is y). That slice, once normalized, is an ordinary univariate distribution for X given Y = y, and taking its expectation works just like taking the expectation of any univariate distribution.
X and Y need not be single random variables and may represent something more complex, but the idea is the same.


#3
Apr812, 05:40 AM

P: 108

It's the measure theoretic definition I don't understand. 


#4
Apr812, 06:00 AM

P: 4,572

For the measure theoretic definition, from what you've said it seems that you are integrating X over the region B with respect to the probability measure P.
The measure you are integrating with respect to will depend on the nature of P which, if it is a Borel measure, should satisfy all the probability axioms for that measure. Again, think about what the slice is with respect to X (for B) in the context of the probability space, in terms of the visual description above.


#5
Apr812, 06:11 AM

P: 108

The conditional expectation is the random variable [itex]Y:=E(X|\mathcal{B})[/itex] such that if you did the above to Y, you would get the result above: [itex]\int_{B}Y\,dP = \int_{B} X\,dP[/itex] for all [itex]B\in\mathcal{B}[/itex]. This leads to the statement [itex]E(X|\mathcal{B}) = X[/itex] a.s. for all sigma fields [itex]\mathcal{B}[/itex], which makes no sense.
How would you calculate the conditional expectation in the example in the original post? And how is conditioning on a sigma field related to conditioning on a random variable? The former does make sense to me, the latter doesn't.


#6
Apr812, 06:25 AM

P: 4,572

http://en.wikipedia.org/wiki/Conditi...mal_definition If we use this definition then the definition of the slice(s) comes from realizing that we only integrate over the right region (note the Beta in the wiki definition). Maybe something is ambiguous in the definition you have been given. 


#7
Apr812, 06:31 AM

P: 108




#8
Apr812, 06:47 AM

P: 4,572

I think (but I'm not sure) that the probability space must include all events including those from not only X but also from β as well which means that you should be treating everything as if it's one giant distribution. In other words, the thing you are integrating with respect to (the dP(w)) is the entire probability space that has X and θ as its subsets. Does this make sense? 


#9
Apr812, 07:03 AM

P: 108

[itex]\int_{B}E(X|\mathcal{B})dP = \int_{B} X dP[/itex] for each [itex]B \in \mathcal{B}[/itex]. (*)
So the region of integration isn't the union of all the [itex]B \in \mathcal{B}[/itex], otherwise it would say to integrate over [itex]\mathcal{B}[/itex] instead of [itex]B[/itex]. Nor does integrating over [itex]\mathcal{B}[/itex] make sense, as the region of integration for Lebesgue integrals is always a subset of the sample space [itex]\Omega[/itex], and never a sigma field like [itex]\mathcal{B}[/itex].
In my example where [itex]\mathcal{B}=\{\{1,2\},\{3\},\varnothing,\Omega\}[/itex], (*) is saying that the following 4 equalities hold:
[itex]\int_{\{1,2\}}E(X|\mathcal{B})dP = \int_{\{1,2\}} X dP[/itex]
[itex]\int_{\{3\}}E(X|\mathcal{B})dP = \int_{\{3\}}X dP[/itex]
[itex]\int_{\varnothing}E(X|\mathcal{B})dP = \int_{\varnothing} X dP[/itex]
[itex]\int_{\Omega}E(X|\mathcal{B})dP = \int_{\Omega} X dP[/itex].
And no, your last paragraph doesn't really make sense. I'm already integrating wrt the entire sample space [itex]\Omega[/itex]. In my example in the original post where [itex]X(\omega) = \omega[/itex], we have [itex]X(\omega) = 1I_{\{1\}}(\omega)+2I_{\{2\}}(\omega)+3I_{\{3\}}(\omega)[/itex], where I is an indicator function. Then the integral [itex]\int_{\{1,2\}}X(\omega)dP(\omega) = \int X(\omega)I_{\{1,2\}}(\omega)dP(\omega)=\int \left(1I_{\{1\}}(\omega)+2I_{\{2\}}(\omega)\right)dP(\omega)[/itex], i.e. the last integral is over the whole sample space, and since the integrand is a simple function, the result is [itex]1P(\{1\})+2P(\{2\})[/itex]. And somehow, this should be equal to [itex]\int_{\{1,2\}}E(X|\mathcal{B})(\omega)dP(\omega)[/itex].
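The four equalities can be checked numerically on this finite space. The sketch below assumes the example from the original post and tests two candidates: X itself, and the function that averages X over {1,2} and equals 3 on {3}. The helper names `integral` and `satisfies_star` are illustrative only.

```python
import math

P = {1: 0.25, 2: 0.65, 3: 0.10}              # point masses from the original post
sigma_B = [set(), {1, 2}, {3}, {1, 2, 3}]    # the four elements of B

def integral(f, B):
    """Integral of f over B with respect to P on a finite space:
    the sum of f(w) * P({w}) over w in B."""
    return sum(f[w] * P[w] for w in B)

def satisfies_star(Y, X):
    """True if the integrals of Y and X agree on every B in the sigma field."""
    return all(
        math.isclose(integral(Y, B), integral(X, B), abs_tol=1e-12)
        for B in sigma_B
    )

X = {1: 1.0, 2: 2.0, 3: 3.0}
avg = (1 * 0.25 + 2 * 0.65) / 0.9             # P-weighted average of X on {1, 2}
Y_avg = {1: avg, 2: avg, 3: 3.0}

print(satisfies_star(X, X), satisfies_star(Y_avg, X))  # True True
```

Both candidates pass, so the integral condition (*) on its own does not single out one function. In the standard definition, the conditional expectation is additionally required to be [itex]\mathcal{B}[/itex]-measurable, and the almost sure uniqueness is stated among functions with that property.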


#10
Apr812, 07:09 AM

P: 4,572

The thing that I think β is referring to is the actual subset of the space that represents the events corresponding to X|B. In other words, we get the actual subset of these events that are a subset of P(w) and integrate over these events only. In the visual sense we might have, say, a Y|X scenario in which we get a slice, but in this measure theoretic viewpoint we consider Y and X in the context of the probability measure space P(w).


#11
Apr812, 07:18 AM

P: 108

The dP(w) term doesn't say anything about what we're integrating over, it's just notation to remind us of the measure that is used to evaluate the integral, i.e. in Lebesgue integration it is defined that [itex]\int I_{S}(\omega)dP(\omega)=P(S)[/itex]. 


#12
Apr812, 07:27 AM

P: 4,572

If the measure were the standard infinitesimal measure used in Riemann integration for a one-dimensional integral, then the measure would be dx (or whatever the dummy variable is). In this context the measure refers to any general measure, but the space is not over the real line but instead over a set corresponding to the exhaustible probability space P with the corresponding measure that satisfies the Kolmogorov probability axioms. In other words, instead of the measure referring to the set R and the measure being infinitesimal, the set being referred to is the probability space P, which is a set like R, with a measure satisfying the probability axioms. If I were going to use an example of a Lebesgue measure instead of an infinitesimal one, I would create a very simple space satisfying the Kolmogorov axioms and use the characteristic function to define the integration. Did you have a specific set P in mind?


#13
Apr812, 07:30 AM

P: 108

And I worked through it, and I always get [itex]E(X|\mathcal{B}) = X[/itex].
"but the space is not over the real line but instead over a set corresponding to the exhaustible probability space P"
P is not a probability space, it's a probability measure, i.e. it's not a set, but a function [itex]P:\mathcal{A}\to[0,1][/itex]. The integration is over the set [itex]B\subset\Omega[/itex]; the triplet [itex](\Omega, \mathcal{A}, P)[/itex] is the probability space.
I have no problem with computing an integral like [itex]\int_{\{1,2\}} X dP[/itex], which the definition requires (see post 9 for my working out). But I see no reason why this must equal [itex]\int_{\{1,2\}} E(X|\mathcal{B}) dP[/itex]. And saying that these 2 integrals are equal, together with a.s. uniqueness, implies that [itex]E(X|\mathcal{B}) = X[/itex] a.s. This doesn't make sense because the random variable analogue, i.e. [itex]E(X|Y) = X[/itex], is wrong for random variables Y.


#14
Apr812, 07:53 AM

P: 4,572

What I think is happening is that B is generated so that it corresponds to the actual events corresponding to X|B, and is not just related to either X or B independently. So instead of the {1,2} for unconditional X, you would get a different set for the conditional problem that corresponds to the appropriate events. So with regard to your examples, I have a feeling that you are going to get completely different things to integrate over for each different example (in other words, it won't be {1,2} for all the cases but a set that reflects the events described by the conditional representation).


#15
Apr812, 08:05 AM

P: 108

What do you mean that B is generated? What's the actual event corresponding to X|B? What do you mean by getting different sets of conditional problems? etc. {1,2} is just 1 element of [itex]\mathcal{B}[/itex] that I used as an example. But (*) needs to hold for all 4 elements of [itex]\mathcal{B}[/itex] too.


#16
Apr812, 08:17 AM

P: 4,572

To me this implies that a valid B is an element of the set β, but it is not constant in the way that you are describing. I am interpreting the theorem to say that, for it to hold, an element must exist in the set β that satisfies this identity. It wouldn't make sense if it were the way you wrote it, and I'm sure you agree. As long as a valid B exists, the formula should hold. You may have missed this crucial statement (from the wiki site)


#17
Apr812, 08:37 AM

P: 108

The definition isn't that there exists an element [itex]B\in\mathcal{B}[/itex] such that (*) holds; it says (*) is satisfied for all [itex]B\in\mathcal{B}[/itex]. Here (*) is the integral equality in the original post, which must hold by definition. I'm not sure how the nonconstructive part is relevant. Sure, it doesn't give an explicit formula for [itex]E(X|\mathcal{B})[/itex], but we can find [itex]E(X|\mathcal{B})[/itex] easily. Just set [itex]E(X|\mathcal{B})=X[/itex], then in (*) we have LHS = RHS.
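The step "just set [itex]E(X|\mathcal{B})=X[/itex]" can be tested against the other requirement in the standard definition, namely that [itex]E(X|\mathcal{B})[/itex] be [itex]\mathcal{B}[/itex]-measurable. A minimal sketch, assuming the three-point example from the original post; `is_measurable` is a hypothetical helper, and it relies on the fact that on a finite sample space a function is measurable exactly when each of its level sets belongs to the sigma field.

```python
omega = {1, 2, 3}
sigma_B = {frozenset(), frozenset({1, 2}), frozenset({3}), frozenset({1, 2, 3})}

def is_measurable(f, sigma_field, omega):
    """On a finite space, f is measurable with respect to sigma_field
    iff every level set {w : f(w) = v} is an element of sigma_field."""
    for v in set(f.values()):
        level_set = frozenset(w for w in omega if f[w] == v)
        if level_set not in sigma_field:
            return False
    return True

X = {1: 1.0, 2: 2.0, 3: 3.0}                  # X(omega) = omega
avg = (1 * 0.25 + 2 * 0.65) / 0.9             # P-weighted average of X on {1, 2}
Y = {1: avg, 2: avg, 3: 3.0}                  # constant on each atom of B

print(is_measurable(X, sigma_B, omega))  # False: {X = 1} = {1} is not in B
print(is_measurable(Y, sigma_B, omega))  # True: level sets are {1, 2} and {3}
```

Under that reading, X satisfies (*) but is ruled out as a version of [itex]E(X|\mathcal{B})[/itex] in this example because it is not [itex]\mathcal{B}[/itex]-measurable, while the atom-averaged function is.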


#18
Apr812, 08:41 AM

P: 4,572

It's getting late here so I'll reply sometime tomorrow


