# Mistake using Borel-Cantelli Lemma

Suppose that $X_i\sim N(0,1)$ are iid, then by the Strong Law of Large Numbers the sample mean $\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i$ converges almost surely to the mean, which in this case is 0.

Recall that by definition, $\bar{X}_n \to 0$ almost surely means that $P(\lim_{n\to\infty}|\bar{X_n}-0|=0)=1$. Since for the limit to converge, the limsup (and liminf) must also converge to the same value, we have $P(\limsup_{n\to\infty}|\bar{X_n}-0|=0)=1$.

But $P(\limsup_{n\to\infty}|\bar{X_n}|=0)=P(|\bar{X_n}|=0\text{ i.o.})$, where i.o. means infinitely often (this is the definition of i.o.). Now, $\sum_n P(|\bar{X_n}|=0)=0<\infty$, since $\bar{X}_n$ is a continuous random variable and the probability it takes any particular value is 0. But by the Borel-Cantelli Lemma this would imply that $P(|\bar{X_n}|=0\text{ i.o.})=0$, not 1, as the Strong Law of Large Numbers says.

I've been trying all day to find what's wrong with my argument. We can replace the normal distribution with any continuous distribution with finite mean. Can anyone help me?

The error is in this line:

But $P(\limsup_{n\to\infty}|X_n|=0)=P(|X_n|=0 \text{ i.o.})$
It is entirely possible for $\limsup_{n\rightarrow \infty} |X_n|=0$ without having $|X_n|=0$ even once, let alone infinitely often (example: suppose that $X_n=\frac{1}{n}$).
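The counterexample above can be checked numerically. This is only a minimal sketch for the deterministic sequence $x_n = 1/n$; the helper name `tail_sup` and the cutoff `N` are illustrative choices, not part of the original argument.

```python
# Deterministic sequence x_n = 1/n for n = 1..N (N is an arbitrary cutoff).
N = 10_000
x = [1.0 / n for n in range(1, N + 1)]

def tail_sup(values, start):
    """Largest |values[k]| over the 0-based indices k >= start, within the cutoff."""
    return max(abs(v) for v in values[start:])

# Tail suprema shrink toward 0, which is what limsup |x_n| = 0 means.
tail_sups = [tail_sup(x, start) for start in (0, 99, 9_999)]

# The event "x_n = 0" never happens, so it cannot happen infinitely often.
zero_hits = sum(1 for v in x if v == 0.0)

print(tail_sups, zero_hits)
```

The tail suprema describe the values of the sequence, while `zero_hits` counts occurrences of an event; the two can disagree completely.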

I believe that equality you quoted is correct.

It's true by definition, it's the definition of i.o.:
http://planetmath.org/encyclopedia/Io.html [Broken]

Alternatively, we could have just used the version of the Borel-Cantelli Lemma given at Wikipedia (http://en.wikipedia.org/wiki/Borel–Cantelli_lemma) on $P(\limsup_{n\to\infty}|\bar{X_n}|=0)$, without any reference to i.o.

Also, all the $X_n$'s after the first one should be $\bar{X}_n$, I'll edit that in.

No, it isn't. |X_n| = 0 i.o. means that for infinitely many n, |X_n|=0. This is very different from saying that limsup |X_n|=0.

You're apparently being confused by the fact that the planetmath definition uses the word limsup. But in the definition of i.o., the limsup operator is being applied to a sequence of events, not a sequence of values of random variables. This becomes clear if we write out the formal definition of the actual events being described. Assuming the r.v.s X_n are defined on the underlying probability space Ω, we have:

$$"\limsup_{n\rightarrow\infty} |\overline{X_n}|=0" = \{\omega\in \Omega:\limsup_{n\rightarrow\infty} |\overline{X_n}(\omega)|=0\}$$

But

$$"|\overline{X_n}|=0 \text{ i.o.}"=\limsup \{\omega\in \Omega: |\overline{X_n}(\omega)|=0\}$$

Do you see why these describe completely different sets in the probability space?
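To make the difference between the two sets concrete, here is a small simulation sketch. It assumes standard normal draws from Python's `random.gauss`, an arbitrary seed, and a finite cutoff `N`, so it can only suggest the limiting behaviour rather than prove it.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible; the value is arbitrary

N = 20_000          # number of draws (illustrative cutoff)
running_sum = 0.0
sample_means = []   # sample_means[n-1] is the sample mean of the first n draws
for n in range(1, N + 1):
    running_sum += random.gauss(0.0, 1.0)
    sample_means.append(running_sum / n)

# Event-level limsup: how many n have the sample mean exactly equal to 0?
exact_zero_count = sum(1 for m in sample_means if m == 0.0)

# Value-level limsup: the largest |sample mean| over the late indices n > N/2.
late_tail_sup = max(abs(m) for m in sample_means[N // 2:])

print(exact_zero_count, late_tail_sup)
```

The late sample means are small in absolute value, which matches $\limsup|\overline{X_n}|=0$, while the event $|\overline{X_n}|=0$ is never observed, which matches $P(|\overline{X_n}|=0 \text{ i.o.})=0$.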

So the event $"\limsup_{n\rightarrow\infty} |\overline{X_n}|=0"$
is really the event $"Y=0"$, where $Y:=\limsup_{n\rightarrow\infty}|\bar{X}_n|$ is a random variable? And this is what is meant in the definition of almost sure convergence?

And $"Y=0"$ is not equal to $"\limsup_{n\rightarrow\infty}\{|\bar{X}_n |=0\}":="|\bar{X}_n |=0 \text{ i.o.}"$, so that the Borel-Cantelli Lemma can't be applied?

So the event $"\limsup_{n\rightarrow\infty} |\overline{X_n}|=0"$
is really the event $"Y=0"$, where $Y:=\limsup_{n\rightarrow\infty}|\bar{X}_n|$ is a random variable? And this is what is meant in the definition of almost sure convergence?
Yes, that's right.

And $"Y=0"$ is not equal to $"\limsup_{n\rightarrow\infty}\{|\bar{X}_n |=0\}":="|\bar{X}_n |=0 \text{ i.o.}"$, so that the Borel-Cantelli Lemma can't be applied?
You got it!

Stephen Tashi
Suppose that $X_i\sim N(0,1)$ are iid, then by the Strong Law of Large Numbers the sample mean $\bar{X}_n=\sum_{i=1}^n X_i$ converges almost surely to the mean, which in this case is 0.
$$\bar{X}_n = \frac{\sum_{i=1}^n X_i}{n}$$
Recall that by definition, a.s. convergence means that $P(\lim_{n\to\infty}|X_n-0|=0)=1$.
You need the bar:
$$| \bar{X}_n - 0 |$$

That's OK, provided you are using the notation $\lim_{n\to\infty}| \bar{X}_n - 0| = 0$ to denote a set of outcomes in a probability space and "0" to denote a constant random variable. It would be clearer to write $P(\{\omega \in \Omega : \lim_{n\to\infty} |\bar{X}_n(\omega) - 0| = 0 \}) = 1$, where $\Omega$ is the set of outcomes in the probability space.

An interesting technicality here is "What probability space?". It can't be the probability space for a single normal random variable. One way to define the probability space is to say that, for each $n$, it must be the product probability space that defines the vector of outcomes of $n$ normal random variables. Another way is to say that the outcomes of the probability space are the real numbers, but the measure on the space involves the convolution of $n$ normal distributions. Either way, the probability space for the event $| \bar{X}_n - 0 |$ is a different probability space for each $n$, since a probability space depends on the set of outcomes, the sigma field and the probability measure.

(It's also interesting that the definition hypothesizes that the event $\{ \omega \in \Omega: \bar{X}_n(\omega) = 0 \}$ is a measurable set for each $n$, which could be proven in your example, but I wonder about the general case.)

The Borel-Cantelli lemma deals with a sequence of events whose members are all from the same probability space. In the strong law of large numbers, each event in the sequence of events $\{ \omega \in \Omega: \bar{X_n}(\omega) = 0 \}$ is in a different probability space.

To get to a paradox, I think you must try to "embed" all these events in the same space.

Stephen Tashi
On example 2 on page 2 of this link (http://math.arizona.edu/~jgemmer/bishopprobability3.pdf) it uses Borel-Cantelli for random variables in the same way I did, even though they've defined i.o. for sets only.
That example says the $X_n$ are independent identically distributed random variables. So each $X_i$ is defined in the same probability space and the measure in that space is the same measure.

In your example, the $\bar{X_i}$ are each defined in different probability spaces. In the statement of the theorems in that link, the $A_n$ can be different events but they are in the same probability space.

On example 2 on page 2 of this link (http://math.arizona.edu/~jgemmer/bishopprobability3.pdf) it uses Borel-Cantelli for random variables in the same way I did, even though they've defined i.o. for sets only.
I really don't see how you came to that conclusion. The only place Borel-Cantelli is applied in that example is to the sequence of events $X_n > \alpha \log (n)$. It isn't being applied to the values of $\frac{X_n}{\log n}$.

What about the following argument. Let $X_i$ be standard Cauchy. It's a well known fact about the Cauchy distribution that $\bar{X}_n$ has the same distribution as $X_1$, i.e. the sample mean of standard Cauchy is standard Cauchy, for any n.

Let $c \geq 0$ be arbitrary.

Now $P(\bar{X}_n > c) = P(X_1 > c)$, which is some nonzero constant. So $\sum_n P(\bar{X}_n > c) = \infty$, and by Borel-Cantelli, $P(\limsup\bar{X}_n > c)=1$. Since c is arbitrary, $\limsup\bar{X}_n=\infty$ (this was obvious because the Cauchy distribution has infinite mean).

But we can replace ">" with "<" and everything above (appears) to still work and we conclude that $P(\limsup\bar{X}_n < c)=1$, hence $\limsup\bar{X}_n=-\infty$. Where have I made a mistake?

Source: http://www-stat.wharton.upenn.edu/~steele/Courses/530/HomeWorkSolutions/STAT 530 HW3 Solutions.pdf (I think there's a typo in the 2nd equation in this link. The n on the denominator should be n^2).

Stephen Tashi
Apparently, you don't want to reply to any objections about different probability spaces, so let's put it this way. The $\bar{X}_i$ are not independent random variables. (Is $\frac{ X_1 + X_2 + X_3}{3}$ going to be independent of $\frac{X_1 + X_2}{2}$?) And the $\bar{X_i}$ are not identically distributed. (Edit: OK, they are identically distributed in your example! I see now why you picked the Cauchy. But that doesn't overcome the objection that they are different non-independent random variables.)

Is there a version of the Borel - Cantelli lemma that applies to events $A_i$ where the probability of each event is computed by using a different random variable?

Borel-Cantelli doesn't require independence.

https://en.wikipedia.org/wiki/Borel–Cantelli_lemma The line above "Example".

And I don't think what you're saying is true, as it would seem to imply that Borel-Cantelli can't be used on sample means, when it's used to prove the Strong Law of Large Numbers and the strong convergence of the sample mean in inference. Why can't everything be on the same probability space?

Stephen Tashi
That line refers to the events, not to what random variable is involved, but you are correct that my citing the non-independence of the random variables in your example is irrelevant. The point is that your example uses events defined by different random variables.

Let $X$ and $Y$ be random variables. Suppose they have the same probability density function. Are they "the same" random variable? Not necessarily. They would be the same if they both refer to the same experimental outcome, in which case they would always have the same realized value. But then a realization of their sample mean $\frac{X(\omega) + Y(\omega)}{2}$ would always equal $\frac{2 X(\omega)}{2} = X(\omega)$. In your example, $\bar{X}_2$ and $\bar{X}_3$ have the same distribution, but they are not the same random variable.

I think unravelling the paradox involves parsing the hypothesis of the lemma, which begins "Let $(E_n)$ be a sequence of events in a probability space....". If we define $E_i$ to be the event $\{\omega: \bar{X}_i > c \}$ we haven't defined an event that involves the outcome of $\bar{X}_{i+1}$, so in order to get all the events $E_n$ to be in the same probability space, you must specify what values of $\bar{X}_{i+1}$ are outcomes in $E_i$. You can say that $E_i$ is the event $\{\omega: \bar{X}_i(\omega) > c$ and for $k \ne i, \bar{X}_k$ can take any value $\}$. However, you must then examine your assertion that $P(E_i)$ can be computed by a simple integration of the Cauchy distribution, because if we consider only finitely many of the $\bar{X_n}$ this is not an integration involving independent random variables where the integrand is a product of factors, each being a function of only one of the variables. Furthermore, the probability space for the example would involve outcomes that must be infinite sequences of real numbers, since an outcome must specify the value of each $\bar{X}_n$.

I don't think there is a paradox, as the argument using ">" is valid. It shows that in the link, and the conclusion that the limsup is infinite is definitely correct. Something is wrong with it when I change to "<".

Your argument doesn't make sense. Why can't we have a probability space that contains all the $\bar{X}_n$'s, and have them be dependent or independent or whatever we want? Obviously such a space exists, as it is very simple to simulate either of the two cases on a computer.

Your argument seems to imply that we can't make any statement relating to convergence of sample means, like stating the central limit theorem.

Stephen, in all the problems we have been considering so far, the $X_i$ are specified to be independent identically distributed random variables. The independence assumption already tells us that they are defined on the same probability space, because otherwise it wouldn't make sense to talk about whether they are independent or not. The reason nobody has been replying to your statements about the r.v.s being defined on the same probability space is because they are addressing issues that simply are not present in this problem.

What about the following argument. Let $X_i$ be standard Cauchy. It's a well known fact about the Cauchy distribution that $\bar{X}_n$ has the same distribution as $X_1$, i.e. the sample mean of standard Cauchy is standard Cauchy, for any n.

Let $c \geq 0$ be arbitrary.

Now $P(\bar{X}_n > c) = P(X_1 > c)$, which is some nonzero constant. So $\sum_n P(\bar{X}_n > c) = \infty$, and by Borel-Cantelli, $P(\limsup\bar{X}_n > c)=1$. Since c is arbitrary, $\limsup\bar{X}_n=\infty$ (this was obvious because the Cauchy distribution has infinite mean).
This is not true. Steve's right here, you do need independence here. Remember that there are two forms of the Borel-Cantelli lemma. The first form is the one that says $\sum_{n=1}^{\infty} P(E_n) < \infty \Rightarrow P(E_n \text{ i.o.}) = 0$, which does not require independence. The second form, which is the one being invoked here, is the one that states $\sum_{n=1}^{\infty} P(E_n)=\infty \Rightarrow P(E_n \text{ i.o.}) = 1$, and this form of the lemma does require independence. To see this, let $U$ be a uniform 0-1 r.v. and let $E_n$ be the event that $U\leq \frac{1}{n}$. Then $\sum_{n=1}^{\infty}P(E_n) = \sum_{n=1}^{\infty}\frac{1}{n} = \infty$ but $P(\limsup_{n\rightarrow \infty} E_n) = P(U=0) = 0$.
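The uniform counterexample can be sketched in a few lines. The function name `events_that_occur`, the cutoff `N_MAX`, and the seed are illustrative assumptions; the code only inspects finitely many $E_n$.

```python
import random

def events_that_occur(u, n_max):
    """Count the indices n in 1..n_max for which E_n = {U <= 1/n} occurs when U = u."""
    return sum(1 for n in range(1, n_max + 1) if u <= 1.0 / n)

N_MAX = 1_000

# The probabilities P(E_n) = 1/n have a divergent sum (harmonic series).
partial_sum = sum(1.0 / n for n in range(1, N_MAX + 1))

# Yet for any fixed u > 0 only the indices n <= 1/u qualify, a finite number.
random.seed(1)  # arbitrary seed
draws = [random.random() for _ in range(5)]
counts = [events_that_occur(u, N_MAX) for u in draws]

print(partial_sum, counts)
```

For example, with $u = 0.3$ only $E_1$, $E_2$ and $E_3$ occur, however large the cutoff is. The events are nested and therefore strongly dependent, which is why the divergent sum does not force infinitely many occurrences.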

By the way, there is a mistake in that student's paper. The mean of the Cauchy distribution is undefined, not $\infty$.

But we can replace ">" with "<" and everything above (appears) to still work and we conclude that $P(\limsup\bar{X}_n < c)=1$, hence $\limsup\bar{X}_n=-\infty$. Where have I made a mistake?
Well, for one thing, you're still mixing up the limsup of a sequence of events and the limsup of a sequence of random variables. In the paper, the student concludes (correctly) that $P(\frac{|X_n|}{n} \geq c \text{ i.o.}) = 1$, and then states that since c was arbitrary, $P(\limsup_{n \rightarrow \infty} \frac{|X_n|}{n} = \infty)=1$, which is true. There is, however, a missing step which the student did not write explicitly, but which is important. Namely, the inference that $P(\frac{|X_n|}{n} \geq c \text{ i.o.}) = 1 \Rightarrow P(\limsup_{n\rightarrow \infty} \frac{|X_n|}{n} \geq c) = 1$. You seem to think that $\frac{|X_n|}{n} \geq c \text{ i.o.}$ and $\limsup_{n\rightarrow \infty} \frac{|X_n|}{n} \geq c$ are the same event. They aren't. The event $\frac{|X_n|}{n} \geq c \text{ i.o.}$ is a subset of the event $\limsup_{n\rightarrow \infty} \frac{|X_n|}{n} \geq c$, so if the first one happens almost surely, then so does the second. But if the $\frac{|X_n|}{n}$ happen to take on the values $c-\frac{1}{n}$, then the second event occurs, but the first one does not. That point seems to be the source of all your confusion.

Stephen Tashi
Stephen, in all the problems we have been considering so far, the $X_i$ are specified to be independent identically distributed random variables. The independence assumption already tells us that they are defined on the same probability space, because otherwise it wouldn't make sense to talk about whether they are independent or not.
I agree the $X_i$ are independent. If you want them "in" the same probability space then an outcome in that space must specify the values of each of the $X_i$.

But the $\bar{X}_i$ are not independent. The examples by logarithmic that purport to contradict the Borel-Cantelli lemma involve the $\bar{X}_i$.

Your objections to the examples by logarithmic may be valid but, in addition, a basic flaw in the examples is that the events $E_n$ (in the Wikipedia article) or $A_n$ (in the link) are supposed to be in the same probability space by the hypothesis of the lemma. In logarithmic's examples, they are not defined as being in the same space. (If they are in the same space, what is that space? What exactly is an outcome in this space? If the space is $(\Omega, F, \mu)$, what is an element of $\Omega$? It can't be specified as a single real number because that doesn't define the values of each of the $\bar{X}_i$.)

For example, the probability space of a single random variable $X_1$ isn't a probability space for the $\bar{X}_2$ since an outcome such as $X_1(\omega) = 2.5$ doesn't specify a value for $X_2$, upon which $\bar{X}_2$ depends.

In general, if a sequence of i.i.d. random variables $X_1,X_2,\ldots$ is specified, then the underlying probability space is $\mathbb{R}^{\infty}$, with the $\sigma$-algebra $\mathcal{F}$ being the $\sigma$-algebra generated by all sets of the form $\prod_{n=1}^{N}A_n \times \prod_{n=N+1}^{\infty} \mathbb{R}$ (where the $A_n$ are Borel subsets of $\mathbb{R}$), and the probability measure being the unique probability measure such that $\mu \left( \prod_{n=1}^{N}A_n \times \prod_{n=N+1}^{\infty}\mathbb{R} \right) = \prod_{n=1}^{N}P(A_n)$, where $P(A_n)$ is computed with respect to the specified distribution of the $X_n$. The existence and uniqueness of such a measure is guaranteed by the Kolmogorov extension theorem. This is what you do implicitly every time you have a sequence of i.i.d. random variables. Note that the mean of the first N $X_n$ is indeed a measurable function with respect to this $\sigma$-algebra.
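A rough sketch of this construction may help: one outcome $\omega$ is a whole sequence, $X_n$ reads off its $n$-th coordinate, and every sample mean is a function of that same $\omega$. The names `make_outcome`, `X` and `X_bar` are hypothetical, the normal distribution is just an example, and the finite list is only a truncation of the infinite sequence.

```python
import random

def make_outcome(seed, length):
    """One outcome omega: a (truncated) sequence of coordinates omega_1, omega_2, ..."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def X(omega, n):
    """Coordinate map X_n(omega) = omega_n (n is 1-based)."""
    return omega[n - 1]

def X_bar(omega, n):
    """Sample mean of the first n coordinates of the same outcome omega."""
    return sum(omega[:n]) / n

omega = make_outcome(seed=42, length=10)

# Both sample means are functions of the single outcome omega,
# so events such as {X_bar_2 > c} and {X_bar_3 > c} live in one space.
print(X_bar(omega, 2), X_bar(omega, 3))
```

An event like $\{\overline{X}_3 < 7.3\}$ constrains only the first three coordinates of $\omega$ and leaves all later coordinates free.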

But the $\bar{X}_i$ are not independent. The examples by logarithmic that purport to contradict the Borel-Cantelli lemma involve the $\bar{X}_i$.
Yes, that's correct. The paper he's quoting, however, talks about the $X_i$ themselves, and I want him to understand why what the paper is doing is (essentially, there are some minor mistakes, but they are fixable) valid, and what he's doing is not.

Thanks. How would we show that $P(\liminf \bar{X}_n < c) = 1$? I think this is $P(\bar{X}_n < c \text{ e.v.}) = 1 - P(\bar{X}_n > c \text{ i.o.}) = 1 - 1 = 0$, not 1.

Stephen Tashi
In general, if a sequence of i.i.d. random variables $X_1,X_2,\ldots$ is specified, then the underlying probability space is $\mathbb{R}^{\infty}$,...
I agree. And my point is that in the similar infinite dimensional probability space needed for the $\bar{X}_i$ , describing a set by a requirement like $\bar{X}_3 < 7.3$ fails to completely specify an event in that probability space unless some conventions are stated about what values the other $\bar{X}_i$ may take.

I agree. And my point is that in the similar infinite dimensional probability space needed for the $\bar{X}_i$ , describing a set by a requirement like $\bar{X}_3 < 7.3$ fails to completely specify an event in that probability space unless some conventions are stated about what values the other $\bar{X}_i$ may take.
By definition, $\overline{X}_3 = \frac{1}{3}(X_1 + X_2 + X_3)$. The set $\{ \omega : \frac{1}{3}(X_1(\omega) + X_2(\omega) + X_3(\omega)) < 7.3 \}$ is certainly a well-defined measurable set, seeing as how it is the preimage of a measurable set under the measurable function $\omega \mapsto \frac{1}{3}(X_1(\omega) + X_2(\omega) + X_3(\omega))$. So I'm having some difficulty understanding what you think the problem is here.

Thanks. How would we show that $P(\liminf \bar{X}_n < c) = 1$?
I think you mean $X_n$, not $\overline{X}_n$. Since the $\overline{X}_n$ are not independent, the relevant case of the Borel-Cantelli theorem doesn't apply, and I'm not sure whether it's true that $P(\liminf \overline{X}_n < c) = 1$. For the $X_n$ though, we note that by symmetry, $\sum_{n=1}^{\infty} P(X_n < c) = \sum_{n=1}^{\infty} P(X_n > -c) = \infty$, so by independence and Borel-Cantelli, we have $P(X_n < c \text{ i.o.})=1$. Of course if for infinitely many $n$, $X_n < c$, then we have $\liminf X_n \leq c$, and since $c$ is arbitrary, $\liminf X_n < c$ as well. So since the first event happens almost surely, so too does the second.
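As a rough illustration for the independent $X_n$, the sketch below draws standard Cauchy variates by inverse-CDF sampling and checks how often each event occurs. The threshold `c = 1.0`, the seed and the sample size are arbitrary assumptions, and a finite run can only be consistent with "infinitely often", not establish it.

```python
import math
import random

def standard_cauchy(rng):
    """Draw a standard Cauchy variate by inverse-CDF sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)   # arbitrary fixed seed
N = 100_000              # illustrative number of independent draws
c = 1.0                  # illustrative threshold

draws = [standard_cauchy(rng) for _ in range(N)]

# Both events keep recurring along the independent sequence X_1, X_2, ...
frac_above = sum(1 for x in draws if x > c) / N   # P(X_1 > 1) = 1/4
frac_below = sum(1 for x in draws if x < c) / N   # P(X_1 < 1) = 3/4

print(frac_above, frac_below)
```

Because both $X_n > c$ and $X_n < c$ recur, there is no contradiction: the first says something about $\limsup X_n$ and the second about $\liminf X_n$.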

I think this is $P(\bar{X}_n < c \text{ e.v.}) = 1 - P(\bar{X}_n > c \text{ i.o.}) = 1 - 1 = 0$, not 1.
What do you mean by "e.v." in this context?

Stephen Tashi
The set $\{ \omega : \frac{1}{3}(X_1(\omega) + X_2(\omega) + X_3(\omega)) < 7.3 \}$ is certainly a well-defined measurable set,...
It's only a well defined set in $\mathbb{R}^\infty$ if we have the understanding that it defines an event where the other $\bar{X}_i$ can take on any values. I'm just trying to get confirmation that we have this understanding.
For sets $S_n$, $\{S_n \text{ e.v.}\}:= \bigcup_{n\in \mathbb{N}}\bigcap_{m\geq n}S_m$.
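For comparison with "i.o.", both set operations can be computed for a toy sequence. The sketch below assumes a hypothetical sequence of sets with period 2, so that a short window of each tail already gives the exact union and intersection; for a general sequence a finite window would only be an approximation.

```python
def S(n):
    """Hypothetical periodic sequence of sets: {1, 2} for odd n, {2, 3} for even n."""
    return {1, 2} if n % 2 == 1 else {2, 3}

def tail(n, window):
    """The sets S_n, ..., S_{n+window-1}, standing in for the infinite tail."""
    return [S(m) for m in range(n, n + window)]

def limsup_sets(n_max, window):
    """'S_n i.o.': intersection over n of the union of S_m for m >= n."""
    unions = [set().union(*tail(n, window)) for n in range(1, n_max + 1)]
    return set.intersection(*unions)

def liminf_sets(n_max, window):
    """'S_n e.v.': union over n of the intersection of S_m for m >= n."""
    intersections = [set.intersection(*tail(n, window)) for n in range(1, n_max + 1)]
    return set().union(*intersections)

io_set = limsup_sets(n_max=10, window=4)
ev_set = liminf_sets(n_max=10, window=4)
print(io_set, ev_set)
```

Here the element 2 lies in every set, so it belongs to the "e.v." set, while 1 and 3 each occur infinitely often without occurring eventually always.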