Math to compute width of gaussian PDF(x) given sigma & N values of random variable x

1. Jun 17, 2011

tim8691

Hi Experts,

I'm working in industry and have an application requiring some expert knowledge on statistics/probability. I have a probability density function (PDF) for a Gaussian random variable. I know the standard deviation of the PDF. I also know the total number of experiments conducted, where one experiment is one value of the random variable, x.

For example, the standard deviation in my application is 1 ps RMS (i.e. 1 ps = 1E-12 seconds). The number of measured values for my random variable is 600E+9 (i.e. 600E+9 individual values of x; I don't have the individual values, but I know 600E+9 of them were measured).

From this information, I need to predict the largest peak-to-peak deviation (i.e. the width of the PDF, from the end of one tail to the end of the other tail) that may be observed, as a function of a given confidence level or confidence interval. I'm not sure what the right terminology is here; I believe this level/interval is needed to define the goal, so correct me if not.

Can anyone help me understand the equations involved? I know Gaussian PDF for random variable x is

PDF(x) = (1/(sigma*sqrt(2*pi)))*e^(-x*x/(2*sigma^2))

Not sure how to quantify the largest peak-to-peak deviation expected based on the number of acquired samples N and sigma. Thanks in advance. -Tim

2. Jun 17, 2011

Stephen Tashi

I can only help with the terminology. For N independent samples the smallest value $X_{(1)}$ is called the first "order statistic". The largest value $X_{(N)}$ would be the Nth order statistic. From some web searching, the value $X_{(N)} - X_{(1)}$ is often called the "sample range".

Do you care about the two endpoints that bound the values in the sample or only about the distance between these endpoints?

For N samples $X_i$ from a normal distribution $N(\mu,\sigma)$ and for a given probability p (such as p = 0.95), determine the interval $(\mu - R, \mu + R)$ so that there is a probability p that the interval $(X_{(1)}, X_{(N)})$ is contained in $(\mu - R, \mu + R)$.

I suppose the interval $(\mu - R, \mu + R)$ can be called a "prediction interval". It seems the man-in-the-street calls any interval associated with the idea of probability a "confidence interval", but confidence intervals are a more complicated concept than prediction intervals. The term "prediction interval" is usually applied to an interval that tries to contain a single value of a random variable rather than two values of it.

3. Jun 17, 2011

tim8691

Hi Stephen,

Thanks for your comments. Using your terminology, I want to compute the Nth order statistic multiplied by 2. That is, if we conduct N experiments of random variable x having normal distribution (mean=0), what's the largest value of x we'll observe (assuming some confidence level, or probability, of 95%, for example)? This "largest value" would be the "Nth order statistic". Since the distribution (e.g. PDF(x)) is symmetric, the width, or separation, between the Nth order statistic on the right tail and the Nth order statistic on the left tail equals two times the Nth order statistic. This "two times the Nth order statistic" is what I call the peak-to-peak value for my random variable x.

Basically, if we view the PDF(x) as a histogram, I need to compute the width of this histogram. As the number of experiments grows (e.g. N increases), we'd expect the width to increase. So, for a given N, how to compute the width?
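The claim that the observed width grows with N can be checked with a small simulation. This is only a sketch: the helper name `mean_sample_range`, the trial count, and the seed are illustrative assumptions, and the sample sizes are tiny compared with 600E+9.

```python
import random

def mean_sample_range(n, trials=200, sigma=1.0, seed=0):
    """Average of (max - min) over `trials` batches of n Gaussian samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += max(xs) - min(xs)  # peak-to-peak width of this batch
    return total / trials

# The average width increases as the number of samples per batch increases.
for n in (10, 100, 1000):
    print(n, round(mean_sample_range(n), 2))
```

The growth is slow (roughly logarithmic in N), which is why the width stays at a modest multiple of sigma even for very large N.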

Any feedback much appreciated -- thanks so much.

Last edited: Jun 17, 2011

4. Jun 17, 2011

Stephen Tashi

Strictly speaking, that doesn't make any sense. We'd have to define what "the Nth order statistic on the left tail" means. You seem to be thinking of the values farthest to the left as some kind of Nth order statistic, but these are the smaller values, not the larger ones. You also seem to be thinking that the Nth order statistic must always be greater than the mean. In your problem, you have so many samples, it probably will be. But there is nothing in the definition of the Nth order statistic that says it must be.

See if you like this formulation:

For a random variable X with a given normal distribution N(mu,sigma), find an interval (mu-R,mu+R) such that, when a given number n of independent samples of X are taken, there is a 95% chance that all of them will lie in (mu-R,mu+R).

I think that amounts to what I said in the earlier post.

5. Jun 17, 2011

tim8691

Hi Stephen,

I think you're right. Let's set mu to 0 here (e.g. the mean of the distribution is 0).

For a random variable X with a given normal distribution N(mu=0,sigma), find an interval (-R,R) such that, when a given number n of independent samples of X are taken, there is a 95% chance that all of them will lie in (-R,R).

Any idea how to compute R?

Thanks again.

6. Jun 17, 2011

Stephen Tashi

I have an idea, but it may be too simplistic.

For a given value m, you can find the probability that a single outcome from a normal distribution N(0,sigma) falls within (-m, m). That's a standard type of statistical problem. Let's call the value of that probability p(m). If you want the probability that N independent samples all fall within (-m,m), the answer would be the Nth power of p(m).
So, to get a 0.95 probability, we are trying to solve for m, in the equation:
$$(p(m))^N = 0.95$$
For large N, I think it would be best to use logarithms, so we are trying to solve:
$$N \log {p(m)} = \log{ 0.95 }$$
$$\log p(m) = \frac{ \log{0.95}}{N}$$

I don't think you can solve this symbolically, you probably have to do it by a numerical method such as "bisection". I also don't think that you can use the typical printed tables of the normal distribution in statistics texts because they only have a few digits of precision.
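A minimal sketch of the bisection idea, assuming a modest N so that p(m) is not too close to 1. The function names `p_inside` and `solve_half_width` and the bracket of 40 sigma are illustrative assumptions; `math.erf` is the standard-library error function, with P(-m < X < m) = erf(m/(sigma*sqrt(2))).

```python
import math

def p_inside(m, sigma=1.0):
    """P(-m < X < m) for X ~ N(0, sigma)."""
    return math.erf(m / (sigma * math.sqrt(2.0)))

def solve_half_width(n, prob=0.95, sigma=1.0, iters=200):
    """Bisection on log(p(m)) = log(prob)/n; p(m) increases with m."""
    target = math.log(prob) / n
    lo, hi = 0.0, 40.0 * sigma  # assumed bracket: p is 0 at lo and rounds to 1 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p = p_inside(mid, sigma)
        if p <= 0.0 or math.log(p) < target:
            lo = mid  # interval too narrow, move the lower bound up
        else:
            hi = mid  # interval wide enough, move the upper bound down
    return 0.5 * (lo + hi)

m = solve_half_width(1000)
print(m, p_inside(m) ** 1000)  # second value should be close to 0.95
```

This direct form works for N in the thousands. For N as large as 600E+9, p(m) differs from 1 by only about 1E-13, so most of the significant digits of a double are spent on the leading 1 and the result becomes unreliable.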

7. Jun 17, 2011

tim8691

Thanks Stephen,

$$(p(m))^N = 0.95$$

This equation is on the right track, thanks! But I find this has me working with numbers very close to 1. This is a problem because my values of N are very large, and there are only about 16 significant digits in a computer's representation of a double-precision number.

Is there a way, keeping the same definition of the problem, to rearrange that equation so we're solving for a number very close to zero instead? That way we'd have not only the 16 digits but an exponent as well (i.e. times 10^y), which greatly improves accuracy over working with a number close to 1.

Let me try in words...

For a random variable X with a given normal distribution N(mu=0,sigma), find an interval (-R,R) such that there is a FIVE % chance that when n independent samples of X are taken that AT LEAST ONE of them will lie outside of (-R,R).

Maybe that's not the right way of wording it to be equivalent, but hoping at least you can see what I'm trying to do.
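The precision concern can be illustrated with a short experiment. This is a sketch only: the value of `delta` is an arbitrary illustrative tail probability, not a number derived in the thread.

```python
import math

delta = 8.5e-14          # illustrative small tail probability (an assumption)
p = 1.0 - delta          # stored as the nearest double just below 1

recovered = 1.0 - p      # what is left of delta after the round trip
rel_err = abs(recovered - delta) / delta
print(recovered, rel_err)  # relative error is far above machine epsilon

# Working from the small number directly keeps its full relative precision.
log_p_naive = math.log(p)        # inherits the rounding already baked into p
log_p_safe = math.log1p(-delta)  # log(1 - delta) computed from delta itself
print(log_p_naive, log_p_safe)
```

Doubles near 1 are spaced about 1.1E-16 apart, so a deviation of order 1E-13 keeps only three or four significant digits once it is folded into a number close to 1. Keeping the tail probability as its own variable avoids that loss.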

8. Jun 17, 2011

Stephen Tashi

The problem of doing the numerical calculations is what I had in mind when I said my answer might be too simplistic. There is the problem of finding p(m) and then the problem of finding m once you do that.

The way that you formulated the "complementary" problem is correct. I don't know that it is easier to solve. The expression for the probability of at least one value falling outside is going to be a sum of terms involving large factorials. Stirling's formula might be used to approximate them. I'm not sure.

It's similar to the problem encountered in writing
$$p(m) = 1 - \delta$$
$$(1 - \delta)^N = 0.95$$
and then beginning to expand $(1-\delta)^N$ by the binomial theorem using the lower powers of $\delta$.
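A sketch of where that expansion leads, assuming it is truncated after the first-order term:

$$(1 - \delta)^N = 1 - N\delta + \binom{N}{2}\delta^2 - \cdots$$
$$1 - N\delta \approx 0.95 \quad \Rightarrow \quad \delta \approx \frac{0.05}{N}$$

The neglected second-order term is about $(N\delta)^2/2 \approx 0.00125$, so this estimate of $\delta$ is off by roughly 2.5%. Summing all orders for small $\delta$ gives $(1-\delta)^N \approx e^{-N\delta}$, and hence
$$\delta \approx \frac{-\ln 0.95}{N} \approx \frac{0.0513}{N}$$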

I think the problem can be solved but I'll have to think more about it. Also we can start another thread that may attract the attention of numerical analysts who don't read the threads about probability. Right now, I must go do some yard work.

9. Jun 17, 2011

tim8691

Thanks again Stephen,

If we use the Normal distribution with mean = 0 and sigma = 1, then the distribution N(x) simplifies to

N(x) = (1/sqrt(2*pi))*e^(-x*x/2)

If we can insert this into the appropriate "complementary" equation, then re-arrange terms and/or do a change of variables to express this equation using the complementary error function (erfc), I can likely handle the numerical techniques for large values of n.

erfc(z) = (2/sqrt(pi))*[integral from z to infinity of e^(-t*t) dt]

But I'm thinking the initial "complementary" equation may need to avoid any expression of the form (1 minus a number very close to 1) from the start.

Last edited: Jun 17, 2011

10. Jun 18, 2011

Stephen Tashi

In terms of erfc(x):

Let $$\phi(x) = \frac{1}{\sqrt{ 2 \pi}} e^{-\frac{x^2}{2} }$$

$$p(m) = \int_{-m}^m \phi(x) dx$$

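$$1 - p(m) = 1 - \int_{-m}^m \phi(x) dx = 2 \int_{m}^{\infty} \phi(x) dx = \mathrm{erfc}\left(\frac{m}{\sqrt{2}}\right)$$

(The substitution $t = x/\sqrt{2}$ turns $\int_{m}^{\infty} \phi(x) dx$ into $\frac{1}{2}\ \mathrm{erfc}(m/\sqrt{2})$.)

$$\mathrm{erfc}\left(\frac{m}{\sqrt{2}}\right) = 1 - p(m)$$

$$\frac{1}{2}\ \mathrm{erfc}\left(\frac{m}{\sqrt{2}}\right) = \frac{1 - p(m)}{2}$$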

-------

$$\ln {p(m)} = \frac{ \ln{0.95}}{600\ E 9}$$
$$\ln{p(m)} = \frac{-0.051293294}{6}\ E-11 = -0.85488824\ E -13$$
$$p(m) = e^{ (-0.85488824\ E -13 )}$$

The approximation $e^x \approx 1 + x$ might be good enough.
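For the numbers in this thread, the whole calculation can be sketched with the standard library alone. The function names `two_sided_tail` and `solve_m` and the bracket of 38 sigma are illustrative assumptions; the code relies on the identity P(|X| > m) = erfc(m/sqrt(2)) for a unit normal and never forms a number close to 1.

```python
import math

def two_sided_tail(m):
    """P(|X| > m) for X ~ N(0, 1), computed without forming 1 - p."""
    return math.erfc(m / math.sqrt(2.0))

def solve_m(n, prob=0.95, lo=0.0, hi=38.0, iters=200):
    """Find m with two_sided_tail(m) = 1 - prob**(1/n), by bisection."""
    # 1 - exp(ln(prob)/n) evaluated as -expm1(...), which stays accurate
    # when the result is tiny.
    target = -math.expm1(math.log(prob) / n)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if two_sided_tail(mid) > target:
            lo = mid  # tail still too heavy, the interval must be wider
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 600e9        # number of measured values, from the original question
sigma_ps = 1.0   # 1 ps RMS, from the original question
m = solve_m(n)
print("target tail:", -math.expm1(math.log(0.95) / n))
print("m (in sigmas):", m)
print("peak-to-peak (ps):", 2.0 * m * sigma_ps)
```

Under these assumptions m comes out between 7.4 and 7.5 sigma, which corresponds to a peak-to-peak width of roughly 15 ps at the 95% level.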

11. Jun 18, 2011

tim8691

Wow, that's great Stephen. At first I was skeptical because of working with 1-(very small number), but in the end, with the Taylor expansion e^x ≈ 1+x, the 1 - e^x simplifies to -x. Brilliant!