# Convergence of random variables

1. Nov 24, 2009

### kingwinner

I was reading some proofs about the convergence of random variables, and here are the little bits that I couldn't figure out...

1) Let Xn be a sequence of random variables, and let Xnk be a subsequence of it. If Xn converges in probability to X, then Xnk also converges in probability to X. WHY?

2) I was looking at a theorem: if E(Y)<∞, then Y<∞ almost surely. Now I am puzzled by the notation. What does it MEAN to say that Y=∞ or Y<∞?
For example, if Y is a Poisson random variable, then the possible values are 0,1,2,..., (there is no upper bound). Is it true to say that Y=∞ in this case?

3) If Xn^4 converges to 0 almost surely, then is it true to say that Xn also converges to 0 almost surely? Why or why not?

4) The moment generating function (mgf) determines the distribution uniquely, so we can use the mgf to find the distributions of random variables. If the mgf already does the job, what is the point of introducing the "characteristic function"?

Any help is much appreciated! :)

2. Nov 24, 2009

### grief

I can answer the first one. By definition, Xn converges to X in probability if for all epsilon > 0,
Pr(|Xn-X|>epsilon) converges to 0. Suppose Xn converges to X in probability. Let Xnk be a subsequence. Then for any epsilon>0, Pr(|Xnk-X|>epsilon) is a subsequence of Pr(|Xn-X|>epsilon) (these are sequences of numbers). Since we know that a subsequence of a convergent sequence of numbers converges to the limit of the original sequence, it follows that Pr(|Xnk-X|>epsilon) converges to 0. So Xnk converges in probability to X.
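To make the argument above concrete, here is a minimal Python sketch. It assumes a specific example that is not from the thread: Xn = X + Z/sqrt(n) with Z standard normal, so Pr(|Xn-X|>epsilon) can be computed exactly, and the subsequence nk = k^2. The function name `tail_prob` is made up for illustration.

```python
import math

def tail_prob(n: int, eps: float) -> float:
    """Exact Pr(|X_n - X| > eps) for the assumed example X_n = X + Z / sqrt(n), Z ~ N(0, 1)."""
    # |Z| / sqrt(n) > eps  <=>  |Z| > eps * sqrt(n), and Pr(|Z| > a) = erfc(a / sqrt(2)).
    return math.erfc(eps * math.sqrt(n) / math.sqrt(2.0))

eps = 0.1
# The full sequence of numbers p_n = Pr(|X_n - X| > eps), n = 1, ..., 10000.
full = [tail_prob(n, eps) for n in range(1, 10001)]
# The subsequence n_k = k^2 (an arbitrary illustrative choice), k = 1, ..., 100.
sub_indices = [k * k for k in range(1, 101)]
sub = [tail_prob(n_k, eps) for n_k in sub_indices]

# Each sub[k-1] is literally the entry full[n_k - 1], so it inherits the limit 0.
for k in (1, 10, 100):
    print(k, sub_indices[k - 1], sub[k - 1])
```

The point is only that the probabilities along the subsequence are picked out of the same sequence of numbers, so they have to go to 0 as well.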

3. Nov 24, 2009

### bpet

1) This would be a generalization of convergence of subsequences of real numbers.

2) An example of this would be first exit times - consider a process that has a finite probability of never exiting (e.g. fly in a jar), so the first exit time can be infinite.

3) Not sure

4) No - mgf is not unique (e.g. lognormal distribution) and doesn't necessarily exist (e.g. Pareto). The c.f. is useful because it always exists on the real axis (if the r.v. is a.s. finite) and acts like a Fourier transform.

Hope this helps

4. Nov 24, 2009

### kingwinner

Thank you for the replies.

2) I don't get it. The theorem is talking about this: "if E(Y)<∞, then Y<∞ almost surely", but I don't even know what Y<∞ means...:(
For a Poisson random variable Y, the possible values are 0,1,2,..., and there is NO upper bound, so Y=∞ is possible? (same for exponential random variable, there is no upper bound.)
For a binomial random variable X, the possible values are 0,1,2,...,n, there is an upper bound, so X<∞?
I am really confused. Can someone please explain more on this? What does it mean to say that Y<∞? (or Y=∞?)

4) So you mean the characteristic function c(t) always exists for ALL real numbers t, is that right?
Also, for example, if we are asked to prove that the sum of 2 independent normal r.v.'s is again normal, then I think the proof using mgf is perfectly fine, but I see my textbook using the characteristic function for this. Is it absolutely necessary to use the characteristic function in a proof like this?

5. Nov 25, 2009

"Also, for example, if we are asked to prove that the sum of 2 independent normal r.v.'s is again normal, then I think the proof using mgf is perfectly fine, but I see my textbook using characteristic function for this, is it absolutely necessary to use characteristic function in a proof like this?"

No, it isn't necessary.

Every probability distribution has a characteristic function, and that function is unique - it determines the distribution.

In order for a distribution to have a moment generating function, every moment has to exist - that is, you must have

$$\int x^n \,dF(x) < \infty$$

for all n. This isn't always true - consider

$$f(x) = \frac 1 {\pi (1+x^2)}$$

which doesn't even have a mean.

If a distribution's moments identify the distribution exactly (say they satisfy Carleman's conditions) then the moment generating function is unique and identifies the distribution.

I'm guessing (and it's only a guess, since I don't know which probability text you're using) that the author(s) use the characteristic function approach to show the sum of two independent normals is normal because it is a relatively easy example to use to demonstrate the general procedure.
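For what it's worth, that general procedure can be sketched as follows. This assumes $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent (symbols introduced here only for illustration) and writes $\psi$ for the characteristic function:

$$\psi_{X+Y}(t) = \psi_X(t)\,\psi_Y(t) = e^{i\mu_1 t - \sigma_1^2 t^2/2}\, e^{i\mu_2 t - \sigma_2^2 t^2/2} = e^{i(\mu_1+\mu_2)t - (\sigma_1^2+\sigma_2^2)t^2/2}$$

The right-hand side is the characteristic function of $N(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2)$, and since the characteristic function determines the distribution, the sum is normal. The mgf version of the argument is the same with $it$ replaced by $t$.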

6. Nov 25, 2009

### kingwinner

4) So while the moment generating function does not always exist in a neighborhood of 0, the "characteristic function" ALWAYS exists for ALL real numbers t, is this right? (so that it is more general?)

2) Can you also explain the meaning of "Y<∞", please?
Is this about the difference between binomial random variables (which have an upper bound on the possible values) and Poisson (or exponential) random variables (which have no upper bound on the possible values)?
So that for binomial random variables Y, we can say that Y<∞, while for Poisson (or exponential) random variables X, we cannot say that X<∞?

Your help is much appreciated! :)

7. Nov 25, 2009

### Hurkyl

It is often more convenient to do calculus using the extended real numbers rather than the real numbers. The extended real numbers contain two extra points, called $+\infty$ and $-\infty$.

Every infinite sum of nonnegative extended real numbers is convergent. For example:
$$1 + 1 + 1 + \cdots = +\infty$$
A similar statement is true for definite integrals.
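
One way to tie this back to the theorem, as a sketch assuming $Y \ge 0$ (an assumption not stated in the thread): "$Y<\infty$ almost surely" means $P(Y=+\infty)=0$. For any finite $M>0$,

$$E(Y) \ \ge\ M \cdot P(Y \ge M) \ \ge\ M \cdot P(Y = +\infty)$$

so $P(Y=+\infty) \le E(Y)/M$ for every $M$, and letting $M$ grow forces $P(Y=+\infty)=0$ whenever $E(Y)<\infty$. Note that a Poisson random variable can take arbitrarily large values, but every value it takes is a finite number, so $Y<\infty$ holds even though there is no upper bound.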

8. Nov 25, 2009

\begin{align*} \phi_X(t) & = \int_{\mathcal{R}} e^{tx} \, dF(x) \\ & = \int_{\mathcal{R}} \sum_{n=0}^\infty \frac{(tx)^n}{n!} \, dF(x) \end{align*}
If the distribution does not have moments of all orders, eventually an integral involving $x^n$ will diverge, and so the mgf does not exist.
$$|\psi_X(t)| = \left|\int_{\mathcal{R}} e^{itx} \, dF(x)\right| \le \int_{\mathcal{R}} |e^{itx}| \, dF(x) = \int_{\mathcal{R}} dF(x) = 1$$
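
As a rough numerical illustration of the bound above, here is a Python sketch using the Cauchy density $1/(\pi(1+x^2))$ from earlier in the thread. The helper names `sample_cauchy` and `empirical_cf`, the seed and the sample size are arbitrary choices for this example; the comparison value $e^{-|t|}$ is the known characteristic function of the standard Cauchy distribution.

```python
import cmath
import math
import random

def sample_cauchy(rng: random.Random) -> float:
    """Draw one standard Cauchy sample by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def empirical_cf(samples, t: float) -> complex:
    """Sample average of exp(i*t*x), an estimate of the characteristic function at t."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_cauchy(rng) for _ in range(20000)]

for t in (0.5, 1.0, 2.0):
    est = empirical_cf(samples, t)
    # Every term exp(i*t*x) has modulus 1, so the average has modulus at most 1,
    # even though this distribution has no mean and no mgf.
    print(t, abs(est), math.exp(-abs(t)))
```

The estimate stays bounded no matter how heavy the tails are, which is the content of the inequality; the sample average of $e^{tx}$, by contrast, has no finite limit for this distribution.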