Why does conditional probability used in mean square error equal zero?

EdMel · Mar 27, 2014

Hi guys,

I am having trouble showing that [itex]\mathbb{E}\left[(Y-\mathbb{E}[Y|X])^{2}\right]=0[/itex].

I understand the proof of why E[Y|X] minimizes the mean square error, but I cannot understand why it is then equal to zero.

I tried multiplying out the square to get [itex]\mathbb{E}\left[Y^{2}\right]-2\mathbb{E}\left[Y\mathbb{E}[Y|X]\right]+\mathbb{E}\left[\mathbb{E}[Y|X]\mathbb{E}[Y|X]\right][/itex]
but have not been able to justify [itex]\mathbb{E}\left[Y\mathbb{E}[Y|X]\right]=\mathbb{E}\left[Y^{2}\right][/itex] or [itex]\mathbb{E}\left[\mathbb{E}[Y|X]\mathbb{E}[Y|X]\right]=\mathbb{E}\left[Y^{2}\right][/itex].

Thanks in advance.

micromass · Mar 27, 2014

Can you tell us your definition of the conditional expectation and what properties you are allowed to use?

Stephen Tashi · Mar 27, 2014

It would also be helpful to improve the notation. If [itex]X[/itex] and [itex]Y[/itex] are random variables and [itex]f(X,Y)[/itex] is a function of them, then the notation [itex]E f(X,Y)[/itex] is ambiguous. It is not clear whether the expectation is being computed with respect to the distribution of [itex]X[/itex] or the distribution of [itex]Y[/itex], or perhaps with respect to the joint distribution for [itex](X,Y)[/itex].

You can use a subscript to denote which distribution is used to compute the expectation. For example, if [itex]Y[/itex] is not a function of [itex]X[/itex] then [itex]E_X ( E_Y ( 3Y + 1) )[/itex] is the expectation, with respect to the distribution of [itex]X[/itex], of the constant value [itex]E_Y(3Y + 1)[/itex]. Hence [itex]E_X (E_Y (3Y+1)) = E_Y (3Y+ 1)[/itex].

FactChecker · Mar 28, 2014

It's hard to prove because it is not true unless (Y-E(Y|X))==0. Are you sure that (Y-E(Y|X)) is supposed to be squared?
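
A sketch of why the claim fails in general, assuming [itex]Y[/itex] has a finite second moment and writing [itex]g(X)=\mathbb{E}[Y|X][/itex] (the name [itex]g[/itex] is introduced here only for illustration). Conditioning on [itex]X[/itex] and using the tower property, the cross term in the expanded square reduces to [itex]\mathbb{E}[g(X)^{2}][/itex], not to [itex]\mathbb{E}[Y^{2}][/itex]:

[tex]
\begin{aligned}
\mathbb{E}\left[Y\,g(X)\right] &= \mathbb{E}\left[\mathbb{E}\left[Y\,g(X)\,|\,X\right]\right] = \mathbb{E}\left[g(X)\,\mathbb{E}[Y|X]\right] = \mathbb{E}\left[g(X)^{2}\right], \\
\mathbb{E}\left[(Y-g(X))^{2}\right] &= \mathbb{E}\left[Y^{2}\right] - 2\,\mathbb{E}\left[g(X)^{2}\right] + \mathbb{E}\left[g(X)^{2}\right] = \mathbb{E}\left[Y^{2}\right] - \mathbb{E}\left[g(X)^{2}\right] = \mathbb{E}\left[\operatorname{Var}(Y|X)\right].
\end{aligned}
[/tex]

So the minimum mean square error equals the expected conditional variance, which is zero only when [itex]Y=\mathbb{E}[Y|X][/itex] almost surely, that is, when [itex]Y[/itex] is a function of [itex]X[/itex].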

Stephen Tashi · Mar 28, 2014

FactChecker said:

It's hard to prove because it is not true unless (Y-E(Y|X))==0.

And before (Y - E(Y|X)) is equal or not equal to zero, it would have to mean something. How do we interpret Y - E(Y|X)? Is it a random variable? To realize it, do we realize a value Y = y0 from the distribution of Y and then take the expected value of the constant y0 with respect to the distribution of X?
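
One way to make the question concrete is a small simulation under the usual reading, in which E(Y|X) is itself a random variable (a function of X) and each realization comes from a joint draw of (X, Y). The toy model below is purely an assumption for illustration and is not from the thread: X is standard normal and Y = X + noise, with the noise independent of X and mean zero, so that E(Y|X) = X. The function name `simulate_mse` and its parameters are hypothetical.

```python
import random


def simulate_mse(noise_sd, n=100_000, seed=0):
    """Estimate E[(Y - E[Y|X])^2] for the assumed toy model Y = X + noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Draw (X, Y) jointly: X first, then Y given X.
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, noise_sd)
        # In this model the noise is independent of X with mean zero,
        # so the conditional expectation E[Y|X] is simply X.
        cond_mean = x
        total += (y - cond_mean) ** 2
    return total / n


mse_noisy = simulate_mse(noise_sd=0.5)  # Y is not determined by X
mse_exact = simulate_mse(noise_sd=0.0)  # Y equals X, a function of X
print(mse_noisy, mse_exact)
```

In this sketch the first estimate should land near the noise variance (0.25 for a noise standard deviation of 0.5), while the second is exactly zero because Y is then a function of X. That matches the point that the expectation vanishes only when Y - E(Y|X) is itself zero.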
