Conditional Expectation of a random variable

kblue · Dec 2, 2012

My professor made a rather concise statement in class, which amounts to this: E(Y|X=x_i) = constant, E(Y|X) = variable. Could anyone help me understand how the expectation is calculated in the second case? I understand that for different values of x_i, we'll have different values for the expectation. This is where my thoughts get muddled:

[itex]E(Y|X) = \sum_i y_i P(Y=y_i|X) = \sum_i y_i \frac{P(X|Y=y_i) P(Y=y_i)}{P(X)}[/itex].

Could anyone explain the above computation, and how that is a variable? Also, it is my understanding that summing the probability P(Y=y_i|X) over all values of Y won't be 1. Is this true?

Stephen Tashi · Dec 3, 2012

kblue said:

Could anyone explain the above computation, and how that is a variable?

One not-quite-correct explanation is to confuse "random variables" with ordinary variables.

It would go like this:

If you had an ordinary function such as g(X) = 3X, then it would be fair to say that the expression g(3) represents a constant and the expression g(X) represents a function of X, which I suppose you would call a "variable".

[itex]E(Y|X)[/itex] is some function of [itex]X[/itex].
When you give [itex]X[/itex] a specific value this is denoted by [itex]E(Y|X=x)[/itex] and that notation represents a constant.

The expression [itex]E(Y|X)[/itex] is not a two-variable function. The "[itex]Y[/itex]" in that notation just tells you that you must do a summation over all possible values of [itex]Y[/itex]. Since you do that summation, the answer is not a function of the variable [itex]Y[/itex].
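To make the crude picture concrete, here is a minimal sketch in Python. The joint probability table and the helper names marginal_x and cond_expectation are made up for illustration; they are assumptions, not anything from the original question. It computes E(Y|X=x) for each possible x and shows that each result is a single number, while the rule x -> E(Y|X=x) is a function of x only.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X=x, Y=y); the numbers are illustrative only.
joint = {
    (0, 1): Fraction(1, 8),
    (0, 2): Fraction(3, 8),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

def marginal_x(x):
    """P(X=x): sum the joint pmf over all values of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_expectation(x):
    """E(Y | X=x) = sum over y of y * P(Y=y | X=x)."""
    px = marginal_x(x)
    return sum(y * p / px for (xi, y), p in joint.items() if xi == x)

# For each fixed x the result is one number (a constant).
# Viewed as a rule x -> E(Y | X=x), it is a function of x, with no y left in it.
for x in (0, 1):
    print(x, cond_expectation(x))
```

With this table, cond_expectation(0) is 7/4 and cond_expectation(1) is 3/2. The summation over y happens inside the function, which is why Y does not appear as an argument.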

If we need a more precise explanation, we must heed the saying (that was the theme of a thread on the forum recently) "Random variables are not random and they are not variables".

It would be fair to say that [itex]E(Y|X)[/itex] depends on the random variable [itex]Y[/itex] because this says it depends on the entire distribution of [itex]Y[/itex]. "Random variables" are not ordinary variables because the definition of a "random variable" carries with it all the baggage about a distribution function that is not present in the definition of ordinary variables. So [itex]E(Y|X)[/itex] isn't a function of an ordinary variable named "[itex]Y[/itex]".

Random variables technically do not take on specific values. It is their realizations that have specific values. When we say something like "Suppose the random variable X = 5", what we should say is "Suppose we have a realization of the random variable X and the value of that realization is 5". The statement "X=x" means that a realization of the random variable X has the value x.

I, myself, would have a hard time defining the notation [itex]E(Y|X)[/itex] using those precise notions, and I tend to think about it in the crude way that I first explained it! The [itex]E(Y[/itex] tells you to sum a certain function of possible realizations of the random variable [itex]Y[/itex] over all possible values that a realization may take. The [itex]X[/itex] tells you that when you do that sum, you assume that one particular value of the random variable [itex]X[/itex] has been realized, and we abuse notation by denoting that value with the letter [itex]X[/itex] also. That particular value is a "variable" in the ordinary sense of the word variable.

"Variables" and "constants" are not adequately explained in ordinary mathematics courses. For example, in the earlier discussion the literal "x" is used to represent a "constant". We are asked to pretend it is a specific numerical value, yet at the same time it could be any specific numerical value. By contrast, in the function [itex]g(X) = 3X[/itex] we might be asked to pretend the literal "[itex]X[/itex]" is a "variable", but it seems to be on the same footing as the literal "x" insofar as it can take on any specific value. In ordinary math classes, you have to make your way through discussions that distinguish between variables and constants without having formal training in how to do that. (And most people with mathematical aptitude are able to.)

If you've taken logic courses or done structured computer programming, you know that symbols have a certain "scope". Within a certain context (such as an argument to a function) they can take unspecified values and within another context (such as a "read-only" global variable referenced inside a function but initialized outside of it) they hold only one specific value. That's the sort of formalism needed to deal with the distinction between variables and constants in a rigorous manner.

Also, it is my understanding that summing the probability P(Y=y_i|X) over all values of Y won't be 1. Is this true?

No, I don't think that's true if by "summing" you mean that every term in the sum uses the same unspecified realization of the random variable X.
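A short derivation of why the sum comes out to 1, as a sketch assuming [itex]Y[/itex] is discrete with possible values [itex]y_i[/itex] and that the realized value [itex]x[/itex] has [itex]P(X=x) > 0[/itex]:

[itex]\sum_i P(Y=y_i|X=x) = \sum_i \frac{P(Y=y_i, X=x)}{P(X=x)} = \frac{P(X=x)}{P(X=x)} = 1[/itex]

The middle step uses the fact that summing the joint probability [itex]P(Y=y_i, X=x)[/itex] over all values of [itex]Y[/itex] gives the marginal probability [itex]P(X=x)[/itex].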

To understand the computation you asked about, think about Bayes' rule.
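Spelled out for one fixed realization [itex]x[/itex] (again a sketch, assuming discrete [itex]X[/itex] and [itex]Y[/itex] and [itex]P(X=x) > 0[/itex]), Bayes' rule says

[itex]P(Y=y_i|X=x) = \frac{P(X=x|Y=y_i) P(Y=y_i)}{P(X=x)}[/itex]

Substituting this into the definition of the conditional expectation gives the second form in your computation:

[itex]E(Y|X=x) = \sum_i y_i P(Y=y_i|X=x) = \sum_i y_i \frac{P(X=x|Y=y_i) P(Y=y_i)}{P(X=x)}[/itex]

The denominator can itself be written as [itex]P(X=x) = \sum_j P(X=x|Y=y_j) P(Y=y_j)[/itex]. Every term on the right depends on [itex]x[/itex] but not on any leftover [itex]y[/itex], so the result is a constant for a given [itex]x[/itex] and a function of [itex]x[/itex] when [itex]x[/itex] is left unspecified.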
