Conditional variance is the average of the squared distance from the mean of the conditional distribution $f_{X\mid Y}$. When y is not specified, Var(X|Y) is a random variable that depends on Y.
$$ Var(X|Y)=E[X^{2}|Y]-(E[X|Y])^{2} $$
Taking expectations:
$$ E[Var(X|Y)]=E[E[X^{2}|Y]-(E[X|Y])^{2}]=E[E[X^{2}|Y]]-E[(E[X|Y])^{2}]=E[X^{2}]-E[(E[X|Y])^{2}] $$
and
$$ Var(E[X|Y])=E[(E[X|Y])^{2}]-(E[E[X|Y]])^{2} $$
Adding these gives:
$$ E[Var(X|Y)]+Var(E[X|Y])=E[X^{2}]-(E[X])^{2}=Var(X) $$
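The identity can be checked numerically on a small discrete example. The particular distributions below (Y picks one of two groups, X uniform on that group's values) are illustrative assumptions, not from the text:

```python
# Numeric check of Var(X) = E[Var(X|Y)] + Var(E[X|Y])
# on a hypothetical two-group discrete example.

groups = {0: [0.0, 2.0], 1: [10.0, 12.0]}  # X | Y=y is uniform on these values
p_y = {0: 0.5, 1: 0.5}                      # distribution of Y

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Left-hand side: Var(X) from the marginal distribution of X.
all_x, weights = [], []
for y, xs in groups.items():
    for x in xs:
        all_x.append(x)
        weights.append(p_y[y] / len(xs))
e_x = sum(w * x for w, x in zip(weights, all_x))
var_x = sum(w * (x - e_x) ** 2 for w, x in zip(weights, all_x))

# Right-hand side: E[Var(X|Y)] + Var(E[X|Y]).
e_cond_var = sum(p_y[y] * var(xs) for y, xs in groups.items())
cond_means = {y: mean(xs) for y, xs in groups.items()}
e_cond_mean = sum(p_y[y] * m for y, m in cond_means.items())
var_cond_mean = sum(p_y[y] * (m - e_cond_mean) ** 2
                    for y, m in cond_means.items())

print(var_x)                       # 26.0
print(e_cond_var + var_cond_mean)  # 1.0 + 25.0 = 26.0
```

Here the within-group part contributes only 1, while the spread of the two conditional means contributes 25; both are needed to recover Var(X) = 26.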
Thinking from a more practical point of view: when we divide the real line into small intervals, run the experiment many times, count how often X falls within each interval, and then divide by both the total number of trials and the length of the interval, we obtain $f_X$, from which we can compute E[X] and Var(X).
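The histogram procedure just described can be sketched directly. The choice of distribution for X (a standard normal), the bin width, and the number of trials are all illustrative assumptions:

```python
# Sketch of the histogram estimate of f_X: divide the line into small
# bins, count how often X lands in each, and normalize by
# (number of trials * bin width). Then compute E[X] and Var(X) from
# the estimated density.
import random

random.seed(0)
n_trials, bin_width = 100_000, 0.1
samples = [random.gauss(0.0, 1.0) for _ in range(n_trials)]

counts = {}
for x in samples:
    b = int(x // bin_width)  # index of the bin containing x
    counts[b] = counts.get(b, 0) + 1

# Density estimate at each bin; the values sum (times bin_width) to 1.
f_hat = {b: c / (n_trials * bin_width) for b, c in counts.items()}

# E[X] and Var(X) from the estimated density, using bin centers.
centers = {b: (b + 0.5) * bin_width for b in f_hat}
e_x = sum(f_hat[b] * centers[b] * bin_width for b in f_hat)
var_x = sum(f_hat[b] * (centers[b] - e_x) ** 2 * bin_width for b in f_hat)
print(round(e_x, 2), round(var_x, 2))  # close to 0 and 1 for N(0, 1)
```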
How do we know that there is another random variable Y that we can condition on, find $f_{X\mid Y}$, and compute E[X|Y] and Var(X|Y) this way? It seems like we would have to repeat this for many values of y, which looks like more work than working directly with X.
Why isn’t the variability in X just the weighted average of the variances of the different conditional distributions $f_{X\mid Y}$ or $p_{X\mid Y}$?
Why do we need to include the variability of the mean, Var(E[X∣Y])?
Isn’t the variability of the mean already somehow accounted for in E[Var(X∣Y)], since each conditional variance is computed around its own mean?
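A simulation makes the answer to this question concrete: each conditional variance is measured around its own group mean, so the spread between those means never enters E[Var(X|Y)]. The specific setup (two groups with unit within-group noise and means far apart) is an illustrative assumption:

```python
# Monte Carlo illustration: the weighted average of conditional
# variances ("within") misses the spread of the conditional means
# ("between"); only their sum recovers Var(X).
import random

random.seed(1)
means = {0: -3.0, 1: 3.0}  # E[X|Y=y]; deliberately far apart
n = 200_000
data = {0: [], 1: []}
for _ in range(n):
    y = random.randint(0, 1)                      # Y uniform on {0, 1}
    data[y].append(random.gauss(means[y], 1.0))   # X | Y=y ~ N(mu_y, 1)

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

all_x = data[0] + data[1]
w = {y: len(data[y]) / n for y in data}           # empirical P(Y=y)
within = sum(w[y] * var(data[y]) for y in data)   # ~ E[Var(X|Y)] ~ 1
group_means = {y: mean(data[y]) for y in data}
grand = sum(w[y] * group_means[y] for y in data)
between = sum(w[y] * (group_means[y] - grand) ** 2
              for y in data)                      # ~ Var(E[X|Y]) ~ 9

print(round(var(all_x), 2))        # ~ 10: far larger than "within" alone
print(round(within + between, 2))  # matches Var(X) exactly in-sample
```

Here the conditional variances are each about 1, yet Var(X) is about 10; the missing 9 is exactly the variance of the conditional means, which is why Var(E[X|Y]) must appear in the decomposition.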