# X=0 almost surely => E(X)=0

Axioms of expectation:
1. X>0 => E(X)>0
2. E(1)=1
3. E(aX+bY) = aE(X) + bE(Y)
4. If X_n is a nondecreasing sequence of random variables and lim(X_n) = X, then E(X) = lim E(X_n) [monotone convergence theorem]

Definition: P(A)=E[I(A)]

Using the above, prove that if X=0 almost surely [i.e. P(X=0)=1 ], then E(X)=0.

Proof:
X=0 almost surely <=> |X|=0 almost surely

[note: I is the indicator/dummy variable
I(A)=1 if event A occurs
I(A)=0 otherwise]

|X| = |X| I(|X|=0) + |X| I(|X|>0)

=> E(|X|) = E(0) + E[|X| I(|X|>0)]
=E(0*1) + E[|X| I(|X|>0)]
=0E(1) + E[|X| I(|X|>0)] (axiom 3)
=0 + E[|X| I(|X|>0)] (axiom 2)
=E[|X| I(|X|>0)]
=E[lim |X| * I(0<|X|<N)] (lim here means the limit as N->∞)
=lim E[|X| * I(0<|X|<N)] (axiom 4)
=lim E[N * I(0<|X|<N)]
=lim N * E[I(0<|X|<N)]
=lim N * P(0<|X|<N) (by definition)
=lim (0) since P(X=0)=1 => P(0<|X|<N)=0
=0
=>E(X)=0
=======================================

Now, there are two steps in the proof above that I don't understand.
1. We have |X| I(|X|=0), taking the expected value it becomes E(0), how come?? I don't understand why E[|X| I(|X|=0)] = E(0).
2. lim E[|X| * I(0<|X|<N)] = lim E[N * I(0<|X|<N)], why??

Thanks for explaining!


Hurkyl
What is I(Z<0) + I(Z=0) + I(Z>0)?

(By the way, what do you mean by I?)


I is the indicator/dummy variable
I(A)=1 if event A occurs
I(A)=0 otherwise
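For concreteness, the indicator and the definition P(A) = E[I(A)] can be sketched numerically. (The Uniform[0,3] example, the event X < 1, and the sample size below are my own choices, purely for illustration.)

```python
import random

def indicator(event):
    # I(A) = 1 if event A occurs, I(A) = 0 otherwise
    return 1 if event else 0

random.seed(0)
samples = [random.uniform(0, 3) for _ in range(100_000)]

# P(A) = E[I(A)]: the sample mean of the indicator estimates P(A).
# For X ~ Uniform[0, 3], P(X < 1) = 1/3, so p_est should be close to 0.333.
p_est = sum(indicator(x < 1) for x in samples) / len(samples)
```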

First, your axioms aren't really well-defined. You have to specify the space of random variables to which it applies, which could be non-negative r.v.s, bounded r.v.s or integrable r.v.s and, if the latter, you should really specify what integrable means. Anyway...

1 is easy,
$$\vert X\vert I(\vert X\vert=0) = 0\ \Rightarrow E\left(\vert X\vert I(\vert X\vert=0)\right)=E(0).$$

2 isn't true. You should have an inequality,
$$\vert X\vert I(0<\vert X\vert\le N) \le N I(0<\vert X\vert\le N)\ \Rightarrow E\left(\vert X\vert I(0<\vert X\vert\le N)\right)\le E\left(N I(0<\vert X\vert\le N)\right).$$

1. But I don't understand why |X| I(|X|=0) = 0.
Here we are only given that "X=0 almost surely", instead of "X=0". Somehow I was told that "X=0 almost surely" is NOT the same thing as saying that "X=0", so in this case we cannot say that "X=0", and now I don't see why it would always be the case that |X| * I(|X|=0) =0. Would you mind explaining a little more on this?

By the way, I actually don't quite understand what the real difference is between "X=0 almost surely" and "X=0". If P(X=0)=1, then we can say with certainty that X must equal 0. It is impossible (i.e. with zero probability) for X to not be 0. So this seems to be exactly the same as saying that X=0??

2. Yes, I think there may be a typo in the source and it should be an inequality.
$$\vert X\vert I(0<\vert X\vert\le N) \le N I(0<\vert X\vert\le N)\ \Rightarrow E\left(\vert X\vert I(0<\vert X\vert\le N)\right)\le E\left(N I(0<\vert X\vert\le N)\right).$$
I agree with this. But now, if we take the limit as N->∞ of both sides of the expected values, would the inequality still hold? What justifies this?

Hurkyl
You're dealing with inequalities and absolute values -- you should be dividing things into cases!

1. But I don't understand why |X| I(|X|=0) = 0.
Split it into cases. Analyze each case individually.

As for not understanding "almost surely", surely you've been introduced to some sets of measure zero? Can you find a subset of [0,1] whose Lebesgue measure is zero? Now, can you find a function that is zero everywhere except on that set?

It also might help to have at your disposal probability measures of much different qualitative behavior to consider. For example, consider the following probability measure on R:

$$P(S) = \begin{cases} 1 & 0 \in S \\ 0 & 0 \notin S \end{cases}$$
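This point mass at 0 can be written as a small Python set function. (Representing events as Python sets is my own illustration, not part of the original post.)

```python
def P(S):
    # point mass at 0: all of the probability sits on the single outcome 0
    return 1 if 0 in S else 0

# Any event containing 0 is almost sure; any event avoiding 0 is null.
```

Under this measure, the identity variable X(ω) = ω satisfies P(X = 0) = 1 even though X is not the zero function on R.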


Unfortunately, I am a second-year undergrad stat student and I don't have a background in Lebesgue integration or measure theory. Everything in my multivariable calculus course is in terms of the Riemann integral, and we haven't discussed the idea of measure zero.

Is it possible to explain the difference between "X=0 a.s." and "X=0" in a simpler way? I know my understanding won't be perfect, but for now I just want to understand the difference in the simplest sense.

Thank you!

1. Now I see why it's always 0.
Claim: |X| I(|X|=0) is always 0

If X=0, then |X|=0, so |X| I(|X|=0) = 0
If X≠0, then I(|X|=0) =0, so |X| I(|X|=0) = 0
Therefore, |X| I(|X|=0) is always 0.
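The two-case argument can be spot-checked numerically (a quick sketch; `indicator` is my own helper function):

```python
def indicator(event):
    # I(A) = 1 if A occurs, 0 otherwise
    return 1 if event else 0

for x in [0.0, 0.5, -3.0, 57.0, 1e-9]:
    # case x = 0:  |x| = 0, so the product is 0
    # case x != 0: the indicator is 0, so the product is 0
    assert abs(x) * indicator(abs(x) == 0) == 0
```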

But now I have a question. In the assumptions, we are given that X=0 almost surely [i.e. P(X=0)=1]. We can say that X=0 with certainty. It is impossible for X to not be 0. Then, why are we even considering the case X≠0? There should only be one possible case, X=0, right?

2. |X| * I(0<|X|<N) < N * I(0<|X|<N)
=> E[|X| * I(0<|X|<N)] < E[N * I(0<|X|<N)]
=> lim E[|X| * I(0<|X|<N)] < lim E[N * I(0<|X|<N)] ???????????
(lim=limit as N->infinity)

I was flipping through my calculus textbooks, but I couldn't find a theorem that applies.

Any help is appreciated!

To be precise, you can't even take the limit as N tends to infinity unless you can first show that it converges. However, it is equal to 0 for each N, so that problem is easily solved.

Um...do you mean the left side lim E[|X| * I(0<|X|<N)] equals 0 for all N, or the right side lim E[N * I(0<|X|<N)] equals 0 for all N?

In general, is it true that f(N) ≤ g(N) for all N ALWAYS implies
lim f(N) ≤ lim g(N) (limits as N->∞)?
And can we safely take the limit of both sides while still preserving the same inequality sign??

Thanks!

First I'll give a (hopefully simple) intuitive explanation addressing your X=0 a.s. versus X=0. A related discussion is the meaning of P(A)=0. Some people use the word "impossible" in the sense that "P(A)=0 means A is impossible." Kolmogorov calls that usage incorrect. He says if A is impossible, then P(A)=0, but not conversely. Instead, P(A)=0 means A is "practically impossible."

Example: Select a random real number from the interval [0,3] according to a uniform distribution. The probability of selecting the number 2 is 0, meaning it is practically impossible, but not impossible. It has zero probability of occurring.

Now use the same example, letting X=0 if the random real number is not 2, but X=57 if the random real number is 2. Then X=0 a.s., but X is not identically 0.
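That example can be simulated directly. (This is my own sketch; since drawing exactly 2 from Uniform[0,3] is a probability-zero event, in practice every sample gives X = 0 and the sample mean estimates E(X) = 0.)

```python
import random

def X(omega):
    # X = 57 on the single point omega == 2, and 0 everywhere else
    return 57 if omega == 2 else 0

random.seed(1)
samples = [X(random.uniform(0, 3)) for _ in range(100_000)]

# X = 0 almost surely, so the sample mean estimates E(X) = 0
mean = sum(samples) / len(samples)
```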

Second: if f(N) < g(N) for all N, then lim f(N) <= lim g(N). Note how the strict inequality might become equality in the limit. If f(N) <= g(N) for all N, then lim f(N) <= lim g(N).

Thank you! This is very helpful!!

About the second point (that f(N) < g(N) for all N implies lim f(N) <= lim g(N)): this doesn't seem too obvious to me. Is there a name for this theorem? The squeeze/sandwich theorem looks like it may be appropriate, but it is not quite saying the same thing...

Once again, thanks for your help!

If s(N)>=0 for all N, and if lim s(N)=L, then it is not hard to prove L>=0. (Suppose not, i.e. L<0; take epsilon>0 small enough that L+epsilon<0; then eventually s(N)<L+epsilon<0, a contradiction.)

Now let s(N)=g(N)-f(N). It follows that if g(N)>=f(N) for all N and if lim g(N) and lim f(N) both exist, then lim g(N) >= lim f(N).
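A quick numerical illustration of both statements (the functions f and g below are my own example): f(N) < g(N) strictly for every N, yet the two limits coincide, so only ≤ survives in the limit.

```python
def f(N):
    return 1 - 1 / N   # increases toward 1

def g(N):
    return 1 + 1 / N   # decreases toward 1

# strict inequality holds at every N ...
assert all(f(N) < g(N) for N in range(1, 10_000))

# ... but both sequences converge to 1, so lim f = lim g
# (the strict < weakens to <= in the limit)
```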

Thanks! This really helps!

But it seems like an important condition is that both lim g(N) and lim f(N) must exist.
So in our context, we have to show that both lim E[|X| * I(0<|X|<N)] and lim E[N * I(0<|X|<N)] exist before we can say that
E[|X| * I(0<|X|<N)] ≤ E[N * I(0<|X|<N)]
=> lim E[|X| * I(0<|X|<N)] ≤ lim E[N * I(0<|X|<N)]
How can we show the existence of lim g(N) and lim f(N) in our case?