Challenge 8: Sub-Gaussian Variables

In summary, a sub-gaussian random variable has only a small, Gaussian-like probability of being far from its mean, and the standard normal distribution has this property. Two approaches to proving this are expanding the integrand in a power series and using properties of alternating series, or bounding the tail integral and analyzing the behavior of the resulting function.
  • #1
Office_Shredder
A random variable X is called sub-gaussian if
[tex] P( |X| > t) \leq 2e^{-K t^2} [/tex]
for all ##t \geq 0## and some constant ##K > 0##. Equivalently, a random variable with probability density p(x) is sub-gaussian if
[tex] \int_{-t}^{t} p(x) dx \geq 1 - 2 e^{-Kt^2}. [/tex]

The challenge: Prove that the standard normal distribution (with mean 0 and variance 1) is sub-gaussian.
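As a quick numerical sanity check (not a proof), one can test the defining inequality with ##K = 1/2## using the closed-form tail ##P(|X| > t) = \operatorname{erfc}(t/\sqrt{2})## of the standard normal. The grid of ##t## values below is an arbitrary illustrative choice:

```python
import math

def normal_tail(t):
    """P(|X| > t) for a standard normal X, via the complementary error function."""
    return math.erfc(t / math.sqrt(2))

K = 0.5  # candidate sub-gaussian constant
# Check the defining inequality P(|X| > t) <= 2 exp(-K t^2) on a grid of t values.
ok = all(normal_tail(t) <= 2 * math.exp(-K * t**2)
         for t in [0.1 * i for i in range(0, 101)])
print(ok)
```

This only samples finitely many points, so it is evidence rather than a proof, but it suggests ##K = 1/2## is a workable constant.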
 
  • #2
I haven't found a solution yet, but here's a method that might be useful.

The standard normal distribution has the following probability density function:
$$p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
Therefore we wish to prove that there exists some ##K## such that
$$\int_{-\infty}^{\infty} r_t(x) p(x) dx \geq 1 - 2e^{-Kt^2}$$
where
$$r_t(x) = \begin{cases}
1 & \text{ if }|x| < t \\
0 & \text{ if }|x| > t\\
\end{cases}$$
Note that in general, the integral of a function is equal to its Fourier transform evaluated at zero:
$$\int_{-\infty}^{\infty}f(x) dx = \left. \int_{-\infty}^{\infty} f(x) e^{-2\pi i u x} dx \right|_{u = 0} = \hat{f}(0)$$
where ##\hat{f}## is the Fourier transform of ##f##:
$$\hat{f}(u) = \int_{-\infty}^{\infty}f(x) e^{-2\pi i u x} dx$$
We may therefore express our desired integral (let's call it ##I##) in terms of Fourier transforms:
$$I = \int_{-\infty}^{\infty} r_t(x) p(x) dx = \widehat{r_t p}(0)$$
where ##\widehat{r_t p}## is the Fourier transform of the product ##r_t p##. It's easy to show that the Fourier transform of a product is equal to the convolution of the Fourier transforms of the factors:
$$\widehat{r_t p}(u) = \int_{-\infty}^{\infty} \widehat{r_t}(u-v)\widehat{p}(v) dv$$
Evaluating at ##u = 0##, we get
$$I = \widehat{r_t p}(0) = \int_{-\infty}^{\infty} \widehat{r_t}(-v) \widehat{p}(v) dv$$
Let us now compute ##\widehat{r_t}## and ##\widehat{p}##. Starting with the definition of the Fourier transform, we obtain
$$\begin{align}
\widehat{r_t}(v) &= \int_{-\infty}^{\infty} r_t(x) e^{-2\pi i v x} dx \\
&= \int_{-t}^{t} e^{-2\pi i v x} dx \\
&= \int_{-t}^{t} \cos(2\pi v x) dx \\
&= 2 \int_{0}^{t} \cos(2\pi v x) dx \\
&= \left. 2 \frac{\sin(2\pi v x)}{2\pi v} \right|_{x = 0}^{x = t} \\
&= 2 \frac{\sin(2 \pi v t)}{2\pi v} \\
&= \frac{\sin(2\pi v t)}{\pi v}\\
\end{align}$$
Note that the result is an even function of ##v##, i.e., ##\widehat{r_t}(-v) = \widehat{r_t}(v)##, so we can simplify our expression for ##I## to
$$I = \int_{-\infty}^{\infty} \widehat{r_t}(v) \widehat{p}(v) dv$$
Next we calculate ##\hat{p}##, again starting from the definition. We have
$$\begin{align}
\hat{p}(v) &= \int_{-\infty}^{\infty} p(x) e^{-2 \pi i v x} dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}e^{-2\pi i v x} dx\\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2} (x^2 + 4\pi i v x)\right)dx \\
&= \exp\left(-\frac{1}{2} 4 \pi^2 v^2\right) \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2} (x + 2\pi i v)^2\right) dx \\
&= \exp\left(-\frac{1}{2} 4 \pi^2 v^2\right) \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} \\
&= \exp(-2\pi^2 v^2)
\end{align}$$
Plugging the expressions for ##\widehat{r_t}## and ##\hat{p}## into our integral, we obtain
$$I = \int_{-\infty}^{\infty}\widehat{r_t}(v) \widehat{p}(v) dv =
\int_{-\infty}^{\infty} \frac{\sin(2\pi v t)}{\pi v} \exp(-2\pi^2 v^2) dv$$
Now if I can find the right lower bound for the integrand as a function of ##v## and ##t##, the result might follow. Or I could be barking up the wrong tree.

Intuitively, as ##t## increases, ##\sin(2\pi v t)/(\pi v)## becomes a taller and skinnier function centered at ##v = 0##, although with tails that decay on the order of ##1/v##, so the oscillations are important! Most of the value of ##I## is hopefully determined by the part spanned by the "main peak" of ##\sin(2\pi v t)/(\pi v)##, which is the part for which ##|v| < 1/(2t)##.

I think it's true that
$$I > \int_{-1/t}^{1/t} \frac{\sin(2\pi v t)}{\pi v} \exp(-2\pi^2 v^2) dv$$
because outside the interval ##[-1/t, 1/t]## the integral can be grouped into an alternating series of lobes whose first term is positive, so the tail contribution should be positive.
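Whether or not this leads anywhere, the Fourier-side integral itself can be probed numerically. Here is a Python sketch (the quadrature settings and function names are my own) checking that it reproduces the known value ##\int_{-t}^t p(x)\,dx = \operatorname{erf}(t/\sqrt{2})##:

```python
import math

def integrand(v, t):
    """Fourier transform of the indicator of [-t, t] times that of the normal density."""
    rt_hat = 2 * t if v == 0 else math.sin(2 * math.pi * v * t) / (math.pi * v)
    return rt_hat * math.exp(-2 * math.pi**2 * v**2)

def I(t, vmax=6.0, n=200001):
    """Midpoint rule on [-vmax, vmax]; exp(-2 pi^2 v^2) makes the tails negligible."""
    h = 2 * vmax / n
    return sum(integrand(-vmax + (k + 0.5) * h, t) * h for k in range(n))

t = 1.5
print(I(t), math.erf(t / math.sqrt(2)))  # the two values should agree closely
```

The Gaussian factor decays so fast that truncating at ##|v| \leq 6## costs essentially nothing, which is why a plain midpoint rule suffices despite the oscillatory ##\sin## factor.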
 
Last edited:
  • #3
Office_Shredder said:
A random variable X is called sub-gaussian if
[tex] P( |X| > t) \leq 2e^{-K t^2} [/tex]
for some constant K. Equivalently, the probability density p(x) is sub-gaussian if
[tex] \int_{-t}^{t} p(x) dx \geq 1 - 2 e^{-Kt^2}. [/tex]

The challenge: Prove that the standard normal distribution (with mean 0 and variance 1) is sub-gaussian.
Let's try this.

We wish to prove that ##\displaystyle 1\leq\frac{\sqrt{2}}{\sqrt{\pi}}\int\limits_{[0,t]}e^{-\frac{x^2}{2}}\mathrm{d}x+2e^{-kt^2}=f(t)##.

Attempt 1:
Expanding by power series, we get $$\frac{\sqrt{2}}{\sqrt{\pi}}\int\limits_{[0,t]}e^{-\frac{x^2}{2}}\mathrm{d}x=\frac{\sqrt{2}}{\sqrt{\pi}}\sum_{0\leq n}\frac{(-1)^n t^{2n+1}}{2^n(2n+1)n!}.$$

For ##2e^{-kt^2}##, we expand and get ##\displaystyle 2e^{-kt^2}=2\sum_{m\geq 0}\frac{(-1)^mk^mt^{2m}}{m!}##.

Let ##k=\frac{1}{2}## for ease of notation. After a myriad of simplifications and a couple series manipulations which I am not going to LaTeX out, we just need to prove $$1\leq 2\sum_{n\geq 0}(-1)^n\frac{t^{2n}(t+\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!}.$$

By properties of alternating series, I'm pretty sure that one can show the RHS is always greater than 1.

Attempt 2:
Letting ##k=1/2## again for convenience, note that $$\frac{d}{dt}\left(\frac{\sqrt{2}}{\sqrt{\pi}}\int\limits_{[0,t]}e^{-\frac{x^2}{2}}\mathrm{d}x+2e^{-\frac{1}{2}t^2}\right)=\sqrt{\frac{2}{\pi}}e^{-\frac{1}{2}t^2}-2te^{-\frac{1}{2}t^2}=e^{-\frac{1}{2}t^2}\left(\sqrt{\frac{2}{\pi}}-2t\right).$$ Then we note that ##\max(f)=f\left(\sqrt{\frac{1}{2\pi}}\right)>1##. Since ##f## is [strike]increasing[/strike] (clearly decreasing with an asymptote at 1) for all ##t>\sqrt{\frac{1}{2\pi}}## and ##f(t)>1## for all ##0\leq t\leq \sqrt{\frac{1}{2\pi}}##, the result follows.

Not sure if either of these is right. They were pretty ugly.

Edit: Ignore the second part. I clearly did something horribly wrong with the derivatives. >_>
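For what it's worth, the series in Attempt 1 can be checked numerically. Here is a Python sketch; the closed form ##\operatorname{erf}(t/\sqrt{2}) + 2e^{-t^2/2}## used for comparison is my own reduction of the right-hand side, not something asserted in the post:

```python
import math

def series_rhs(t, nmax=80):
    """Partial sum of 2 * sum_n (-1)^n t^(2n) (t + sqrt(2 pi)(2n+1)) / (sqrt(2 pi) 2^n (2n+1) n!)."""
    s2pi = math.sqrt(2 * math.pi)
    total = 0.0
    for n in range(nmax):
        total += ((-1)**n * t**(2 * n) * (t + s2pi * (2 * n + 1))
                  / (s2pi * 2**n * (2 * n + 1) * math.factorial(n)))
    return 2 * total

def closed_form(t):
    """The same quantity written as erf(t/sqrt(2)) + 2 exp(-t^2/2)."""
    return math.erf(t / math.sqrt(2)) + 2 * math.exp(-t**2 / 2)

for t in (0.5, 1.0, 2.0, 4.0):
    assert abs(series_rhs(t) - closed_form(t)) < 1e-9
    assert closed_form(t) >= 1  # the inequality to be proved, at sampled points
print("series matches closed form; RHS >= 1 at sampled points")
```

This confirms the reduction is consistent at sampled points, though of course it does not settle the inequality for all ##t##.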
 
Last edited:
  • #4
I guess we just consider t≥0?
For t=0, the tail form below is trivial (the left side is 1/2, the right side is 1), so consider t>0 from now on.

We want to show
$$\int_{-t}^{t} p(x) dx \geq 1 - 2 e^{-Kt^2}$$
As the normal distribution is symmetric, this can be written as
$$1-2\int_{t}^{\infty} p(x) dx \geq 1 - 2 e^{-Kt^2}$$
or
$$\int_{t}^{\infty} p(x) dx \leq e^{-Kt^2}$$

As a plot, with the example K=1/2, this means the red line has to be above the blue line in the WolframAlpha plot
(if the URL does not work, the query is: "plot int(1/sqrt(2pi)*exp(-y^2/2), y from x to infinity), e^(-x^2/2)").

Let's find an upper bound for the integral. As the integrand is monotonically decreasing,
$$\int_{t}^{\infty} p(x) dx \\
\leq \sum_{i=0}^\infty p(t+i) \\
= \frac{1}{\sqrt{2\pi}} \sum_{i=0}^\infty \exp\left(\frac{-(t+i)^2}{2}\right) \\
< \frac{1}{\sqrt{2\pi}} \sum_{i=0}^\infty \exp\left(\frac{-t^2-2ti}{2}\right) \\
= \frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \sum_{i=0}^\infty \left(e^{-t}\right)^i
$$
As t>0, exp(-t)<1 so we can evaluate the sum.
$$ = \frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \frac{1}{1-e^{-t}}$$

Let's fix ##K=\frac{1}{2}##. We have to show that
$$\frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \frac{1}{1-e^{-t}} \leq e^{-t^2/2}$$
This is satisfied if
$$\sqrt{2 \pi} (1-e^{-t}) \geq 1$$
where equality holds at approximately t ≈ 0.51; since the left side is increasing in t, the inequality is satisfied in particular for all t≥1.

It is left to show the inequality for 0<t<1. In this interval, the tail integral satisfies ##\int_t^\infty p(x) dx \leq 0.5## (it is at most half the total probability), but ##e^{-t^2/2} > e^{-1/2} \approx 0.607 > 0.5##.

As a result, with K=1/2 the inequality is satisfied everywhere and the standard normal distribution is sub-gaussian.
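The chain of bounds above is easy to spot-check numerically. A Python sketch (the function names and the grid of ##t## values are my own choices) verifying both the geometric-series bound and the final sub-gaussian bound with K = 1/2:

```python
import math

def tail(t):
    """P(X > t) for a standard normal X."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def geometric_bound(t):
    """(1/sqrt(2 pi)) exp(-t^2/2) / (1 - exp(-t)), from the Riemann-sum argument."""
    return math.exp(-t**2 / 2) / (math.sqrt(2 * math.pi) * (1 - math.exp(-t)))

for t in [0.1 * i for i in range(1, 101)]:
    assert tail(t) <= geometric_bound(t)   # the Riemann-sum / geometric-series bound
    assert tail(t) <= math.exp(-t**2 / 2)  # the sub-gaussian bound with K = 1/2
print("bounds verified on the grid")
```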
 
  • #5
Mandelbroth said:
Let's try this.

We wish to prove that ##\displaystyle 1\leq\frac{\sqrt{2}}{\sqrt{\pi}}\int\limits_{[0,t]}e^{-\frac{x^2}{2}}\mathrm{d}x+2e^{-kt^2}=f(t)##.

Attempt 1:
Expanding by power series, we get $$\frac{\sqrt{2}}{\sqrt{\pi}}\int\limits_{[0,t]}e^{-\frac{x^2}{2}}\mathrm{d}x=\frac{\sqrt{2}}{\sqrt{\pi}}\sum_{0\leq n}\frac{(-1)^n t^{2n+1}}{2^n(2n+1)n!}.$$

For ##2e^{-kt^2}##, we expand and get ##\displaystyle 2e^{-kt^2}=2\sum_{m\geq 0}\frac{(-1)^mk^mt^{2m}}{m!}##.

Let ##k=\frac{1}{2}## for ease of notation. After a myriad of simplifications and a couple series manipulations which I am not going to LaTeX out, we just need to prove $$1\leq 2\sum_{n\geq 0}(-1)^n\frac{t^{2n}(t+\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!}.$$

By properties of alternating series, I'm pretty sure that one can show the RHS is always greater than 1.

That seems like it would be hard - if t is sufficiently large, the first several terms are increasing in magnitude (they carry higher powers of t), so the usual alternating series estimates don't apply directly.

mfb, that's a really nice solution, I like the idea of bounding the integral with a Riemann sum. k=1/2 is actually the best bound that I know of, but your proof is the first one that makes me think that it is genuinely tight as opposed to just convenient (since the sum should be a really good estimate if t is large).

Any alternate proofs out there people can come up with?
 
  • #6
Office_Shredder said:
k=1/2 is actually the best bound that I know of, but your proof is the first one that makes me think that it is genuinely tight as opposed to just convenient (since the sum should be a really good estimate if t is large).
Proof:

$$\int_{t}^{\infty} p(x) dx \\
{\bf \geq} \sum_{i=1}^\infty p(t+i) \\
= \frac{1}{\sqrt{2\pi}} \sum_{i=1}^\infty \exp\left(\frac{-(t+i)^2}{2}\right) \\
> \frac{1}{\sqrt{2\pi}} \exp(-\frac{(t+1)^2}{2})$$
where the last expression is simply the first term of the sum.

Let's look at the inequality:
$$\frac{1}{\sqrt{2\pi}} \exp(-\frac{(t+1)^2}{2}) < e^{-K t^2}$$
This can be solved for K:
$$c+(t+1)^2 > 2Kt^2$$
where ##c=\log(2 \pi)## is constant.
$$K< \frac{c+(t+1)^2}{2t^2} = \frac{1}{2} + \frac{1}{t} + \frac{1+c}{2t^2}$$
K cannot be larger than 1/2, or the inequality gets violated at some point.
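This tightness can be made concrete numerically. Below is a Python sketch (the choice K = 0.6 and the scan range are mine): for any K > 1/2, the lower bound on the tail eventually exceeds ##e^{-Kt^2}##, and the code finds the first integer t where that happens for K = 0.6.

```python
import math

def tail_lower_bound(t):
    """(1/sqrt(2 pi)) exp(-(t+1)^2/2), a lower bound on P(X > t) from the Riemann sum."""
    return math.exp(-(t + 1)**2 / 2) / math.sqrt(2 * math.pi)

K = 0.6  # any K > 1/2 must fail eventually
violation = next(t for t in range(1, 100)
                 if tail_lower_bound(t) > math.exp(-K * t**2))
print(violation)  # first integer t where the claimed bound is provably violated
```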
 
Last edited:
  • #7
mfb said:
Proof:

$$\int_{t}^{\infty} p(x) dx \\
{\bf \geq} \sum_{i=1}^\infty p(t+i) \\
= \frac{1}{\sqrt{2\pi}} \sum_{i=1}^\infty \exp\left(\frac{-(t+i)^2}{2}\right) \\
> \frac{1}{\sqrt{2\pi}} \exp(-\frac{(t+1)^2}{2})$$
Where the last expression is simply the first term of the sum.

Let's look at the inequality:
$$\frac{1}{\sqrt{2\pi}} \exp(-\frac{(t+1)^2}{2}) < e^{-K t^2}$$
This can be solved for K:
$$c+(t+1)^2 > 2Kt^2$$
Where ##c=\log(2 \pi)## is constant.
$$K< \frac{c+(t+1)^2}{2t^2} = \frac{1}{2} + \frac{1}{t} + \frac{1+c}{2t^2}$$
K cannot be larger than 1/2, or the inequality gets violated at some point.
That's...awesome. In the series method I used, ##k=\frac{1}{2}## was actually what I considered an aesthetic choice. I chose it because it made the computations simple and lots of things canceled to give something I knew would converge. That's pretty fantastic, when you think about how all of it connects. :approve:
 
  • #8
Mandelbroth said:
we just need to prove $$1\leq 2\sum_{n\geq 0}(-1)^n\frac{t^{2n}(t+\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!}.$$
One part gives a nice sum, the other part gives the error function:

$$ \sum_{n\geq 0}(-1)^n\frac{t^{2n}(t+\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!} \\
= \sum_{n\geq 0}(-1)^n\frac{t^{2n}(\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!} + \sum_{n\geq 0}(-1)^n\frac{t^{2n}t}{\sqrt{2\pi}2^n(2n+1)n!} \\
= \sum_{n\geq 0}(-1)^n\frac{t^{2n}}{2^nn!} + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \sum_{n\geq 0}\frac{(-t^2/2)^n}{n!} + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \exp\left(-\frac{t^2}{2}\right) + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!}$$
Let's focus on the second part:
$$\frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \int dt \frac{d}{dt} \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \frac{1}{\sqrt{2\pi}} \int dt \sum_{n\geq 0}(-1)^n\frac{t^{2n}}{2^n n!} \\
= \frac{1}{\sqrt{2\pi}} \int dt \exp\left(\frac{-t^2}{2}\right)$$
Now we have to get the integration constant right: For t=0, the expression is 0.
$$ = \frac{1}{\sqrt{2\pi}} \int_0^t dt' \exp\left(\frac{-t'^2}{2}\right)$$
So now we are comparing a gaussian with the error function again. I think this went in a circle...

We have to prove that$$1 \leq 2 \exp\left(-\frac{t^2}{2}\right) + \frac{2}{\sqrt{2\pi}} \int_0^t dt' \exp\left(\frac{-t'^2}{2}\right)$$
Which is equivalent to (using the symmetry of the integral)
$$1 - 2 \exp\left(-\frac{t^2}{2}\right) \leq \frac{1}{\sqrt{2\pi}} \int_{-t}^t dt' \exp\left(\frac{-t'^2}{2}\right)$$
And that is exactly the initial problem statement. Okay, bad direction. Well now we have that in TeX.
 
Last edited:
  • #9
mfb, you literally just reversed the steps that he took to arrive at that :p
 
  • #10
mfb said:
One part gives a nice sum, the other part gives the error function:

$$ \sum_{n\geq 0}(-1)^n\frac{t^{2n}(t+\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!} \\
= \sum_{n\geq 0}(-1)^n\frac{t^{2n}(\sqrt{2\pi}(2n+1))}{\sqrt{2\pi}2^n(2n+1)n!} + \sum_{n\geq 0}(-1)^n\frac{t^{2n}t}{\sqrt{2\pi}2^n(2n+1)n!} \\
= \sum_{n\geq 0}(-1)^n\frac{t^{2n}}{2^nn!} + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \sum_{n\geq 0}\frac{(-t^2/2)^n}{n!} + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \exp\left(-\frac{t^2}{2}\right) + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!}$$
Let's focus on the second part:
$$\frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \int dt \frac{d}{dt} \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0}(-1)^n\frac{t^{2n+1}}{2^n(2n+1)n!} \\
= \frac{1}{\sqrt{2\pi}} \int dt \sum_{n\geq 0}(-1)^n\frac{t^{2n}}{2^n n!} \\
= \frac{1}{\sqrt{2\pi}} \int dt \exp\left(\frac{-t^2}{2}\right)$$
Now we have to get the integration constant right: For t=0, the expression is 0.
$$ = \frac{1}{\sqrt{2\pi}} \int_0^t dt' \exp\left(\frac{-t'^2}{2}\right)$$
So now we are comparing a gaussian with the error function again. I think this went in a circle...

We have to prove that$$1 \leq 2 \exp\left(-\frac{t^2}{2}\right) + \frac{2}{\sqrt{2\pi}} \int_0^t dt' \exp\left(\frac{-t'^2}{2}\right)$$
Which is equivalent to (using the symmetry of the integral)
$$1 - 2 \exp\left(-\frac{t^2}{2}\right) \leq \frac{1}{\sqrt{2\pi}} \int_{-t}^t dt' \exp\left(\frac{-t'^2}{2}\right)$$
And that is exactly the initial problem statement. Okay, bad direction. Well now we have that in TeX.
I'm literally crying with laughter right now. That was hilarious. :rofl::rofl::rofl:
 

1. What are sub-Gaussian variables?

Sub-Gaussian variables are random variables whose tail probabilities decay at least as fast as those of a Gaussian, i.e. ##P(|X| > t) \leq 2e^{-Kt^2}## for some constant ##K > 0##. This means they are less likely to take on extreme values than heavier-tailed random variables.

2. What is the significance of sub-Gaussian variables in statistics?

Sub-Gaussian variables have important applications in statistical analysis, particularly in the study of random processes and their properties. They are used to model a wide range of phenomena, from stock market fluctuations to the behavior of particles in a gas.

3. How are sub-Gaussian variables different from sub-exponential variables?

Sub-Gaussian variables have tails that decay at least as fast as a Gaussian's, ##e^{-Kt^2}##, while sub-exponential variables need only satisfy the weaker exponential tail bound ##P(|X| > t) \leq 2e^{-Kt}##. Every sub-Gaussian variable is sub-exponential, but not conversely: sub-exponential variables can take on extreme values more often.

4. Can sub-Gaussian variables be used to model real-world data?

Yes, sub-Gaussian variables can be used to model real-world data. In fact, many natural phenomena can be described using sub-Gaussian variables, including the behavior of financial markets, the distribution of particle sizes in a gas, and the fluctuations of radio signals in communication systems.

5. How can sub-Gaussian variables be identified and analyzed in a dataset?

Sub-Gaussian variables can be identified and analyzed using statistical methods such as moment analysis, tail bounds, and concentration inequalities. These methods allow researchers to estimate the parameters of a sub-Gaussian distribution and make predictions about the behavior of the data.
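As an illustration of the tail-bound idea (a hedged Python sketch, not a rigorous test), one can simulate standard normal samples and compare the empirical tail frequency with the sub-Gaussian bound ##2e^{-t^2/2}## at a few thresholds:

```python
import math
import random

random.seed(0)
n = 200_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

# Empirical tail probability P(|X| > t) vs. the sub-gaussian bound 2 exp(-t^2/2)
for t in (0.5, 1.0, 2.0, 3.0):
    empirical = sum(abs(x) > t for x in samples) / n
    print(t, empirical, 2 * math.exp(-t**2 / 2))
```

The empirical frequencies sit well below the bound, as expected for Gaussian data; a heavy-tailed sample would eventually break the comparison at large t.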
