Why does conditional probability used in mean square error equal zero?

EdMel
Hi guys,

I am having trouble showing that \mathbb{E}\left[(Y-\mathbb{E}[Y|X])^{2}\right]=0.

I understand the proof of why E[Y|X] minimizes the mean square error, but I cannot understand why it is then equal to zero.

I tried multiplying out the square to get \mathbb{E}\left[Y^{2}\right]-2\mathbb{E}\left[Y\mathbb{E}[Y|X]\right]+\mathbb{E}\left[\mathbb{E}[Y|X]\mathbb{E}[Y|X]\right]
but have not been able to justify \mathbb{E}\left[Y\mathbb{E}[Y|X]\right]=\mathbb{E}\left[Y^{2}\right] or \mathbb{E}\left[\mathbb{E}[Y|X]\mathbb{E}[Y|X]\right]=\mathbb{E}\left[Y^{2}\right].

Thanks in advance.
 
Can you tell us your definition of the conditional expectation and what properties you are allowed to use?
 
It would also be helpful to improve the notation. If X and Y are random variables and f(X,Y) is a function of them, then the notation E f(X,Y) is ambiguous. It is not clear whether the expectation is being computed with respect to the distribution of X, the distribution of Y, or perhaps the joint distribution of (X,Y).

You can use a subscript to denote which distribution is used to compute the expectation. For example, if Y is not a function of X, then E_X ( E_Y (3Y + 1) ) is the expectation, with respect to the distribution of X, of the constant value E_Y (3Y + 1). Hence E_X ( E_Y (3Y + 1) ) = E_Y (3Y + 1).
 
It's hard to prove because it is not true unless (Y-E(Y|X))==0. Are you sure that (Y-E(Y|X)) is supposed to be squared?
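
A small exact check may make this concrete. The sketch below is only an illustration: the two joint distributions (`noisy`, `exact`) and the helper names (`cond_mean`, `expect`, `mse`) are made up for this example and do not come from the original problem. It computes \mathbb{E}\left[(Y-\mathbb{E}[Y|X])^{2}\right] for a finite joint pmf and shows that the result is nonzero when Y still varies after X is known, and zero when Y is a function of X.

```python
from fractions import Fraction as F

def cond_mean(pmf):
    """Return {x: E[Y | X = x]} for a finite joint pmf given as {(x, y): p}."""
    px, sxy = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0) + p          # marginal P(X = x)
        sxy[x] = sxy.get(x, 0) + p * y    # sum over y of y * P(X = x, Y = y)
    return {x: sxy[x] / px[x] for x in px}

def expect(pmf, g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def mse(pmf):
    """E[(Y - E[Y|X])^2], with E[Y|X] evaluated as m[X]."""
    m = cond_mean(pmf)
    return expect(pmf, lambda x, y: (y - m[x]) ** 2)

# Hypothetical case 1: given X = 0, Y is 0 or 2 with equal chance; given X = 1, Y = 3.
noisy = {(0, 0): F(1, 4), (0, 2): F(1, 4), (1, 3): F(1, 2)}
# Hypothetical case 2: Y = 2X + 1, so Y is a function of X.
exact = {(0, 1): F(1, 2), (1, 3): F(1, 2)}

m = cond_mean(noisy)
print(mse(noisy))   # 1/2
print(mse(exact))   # 0
# The cross term E[Y E[Y|X]] matches E[E[Y|X]^2], not E[Y^2].
print(expect(noisy, lambda x, y: y * m[x]) == expect(noisy, lambda x, y: m[x] ** 2))  # True
```

In the first case \mathbb{E}\left[Y^{2}\right] = 11/2 while \mathbb{E}\left[Y\mathbb{E}[Y|X]\right] = 5, so the identity the original post was trying to justify fails for this distribution.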
 
FactChecker said:
It's hard to prove because it is not true unless (Y-E(Y|X))==0.

And before (Y - E(Y|X)) is equal or not equal to zero, it would have to mean something. How do we interpret Y - E(Y|X)? Is it a random variable? To realize it, do we realize a value Y = y0 from the distribution of Y and then take the expected value of the constant y0 with respect to the distribution of X?
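
One common way to pin this down, offered here as a sketch and not as the thread's agreed definition: assume \mathbb{E}[Y|X] is the random variable g(X), where g(x) = \mathbb{E}[Y|X=x] (the symbol g is introduced only for this illustration), and assume \mathbb{E}\left[Y^{2}\right] is finite. Conditioning on X and pulling out the factor g(X) then gives:

```latex
\begin{align*}
\mathbb{E}\left[Y\,g(X)\right]
  &= \mathbb{E}\left[\mathbb{E}\left[Y\,g(X)\mid X\right]\right]
   = \mathbb{E}\left[g(X)\,\mathbb{E}[Y\mid X]\right]
   = \mathbb{E}\left[g(X)^{2}\right], \\
\mathbb{E}\left[(Y-g(X))^{2}\right]
  &= \mathbb{E}\left[Y^{2}\right]-2\,\mathbb{E}\left[Y\,g(X)\right]+\mathbb{E}\left[g(X)^{2}\right]
   = \mathbb{E}\left[Y^{2}\right]-\mathbb{E}\left[g(X)^{2}\right]
   = \mathbb{E}\left[\operatorname{Var}(Y\mid X)\right].
\end{align*}
```

Under this reading, Y - E(Y|X) is a random variable: realize the pair (X, Y) = (x0, y0) and take y0 - g(x0). The mean square error is the average conditional variance of Y given X, which is zero only when Y = g(X) with probability 1.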
 