Calculating Width of Gaussian PDF(x) Given Sigma & N

In summary: the Gaussian PDF is PDF(x) = (1/(sigma*sqrt(2*pi)))*e^(-x*x/(2*sigma^2)). To predict the width of the histogram for a given N, the thread works out how to find an interval (mu-R, mu+R) such that there is a 95% chance that all N independent samples of X lie in (mu-R, mu+R); the width 2R of that interval is the largest peak-to-peak deviation (the spread from the end of one tail to the end of the other) expected at that confidence level.
  • #1
tim8691
Hi Experts,

I'm working in industry and have an application requiring some expert knowledge of statistics/probability. I have a probability density function (PDF) for a Gaussian random variable. I know the standard deviation of the PDF. I also know the total number of experiments conducted, where one experiment yields one value of the random variable x.

For example, the standard deviation in my application is 1 ps RMS (1 ps = 1E-12 seconds). The number of measured values for my random variable is 600E+9 (i.e. 600E+9 individual values of x; I don't have the individual values, but I know 600E+9 of them were measured).

From this information, I need to predict the largest peak-to-peak deviation (i.e. the width of the PDF, from the end of one tail to the end of the other tail) that may be observed, as a function of a given confidence level (or confidence interval; not sure which is the right terminology here, but I believe some such level/interval is needed to define the goal, correct me if not).

Can anyone help me understand the equations involved? I know the Gaussian PDF for a random variable x is

PDF(x) = (1/(sigma*sqrt(2*pi)))*e^(-x*x/(2*sigma^2))

Not sure how to quantify the largest peak-to-peak deviation expected based on the number of acquired samples N and sigma. Thanks in advance. -Tim
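As a minimal sketch (an editorial addition, not part of the original post), here is the corrected density evaluated numerically, assuming mean 0 and the 1 ps RMS figure quoted above. It only illustrates that the tails never actually reach zero, which is why the "width" has to be defined through a probability rather than read off the curve:

[code]
import math

def gaussian_pdf(x, sigma, mu=0.0):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

sigma = 1e-12  # 1 ps RMS, as in the application described above
for x in (0.0, 1e-12, 3e-12, 7e-12):
    print(f"x = {x:.0e} s   PDF(x) = {gaussian_pdf(x, sigma):.3e}")
[/code]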
 
  • #2


I can only help with the terminology. For N independent samples the smallest value [itex] X_{(1)} [/itex] is called the first "order statistic". The largest value [itex] X_{(N)} [/itex] would be the Nth order statistic. From some web searching, the value [itex] X_{(N)} - X_{(1)} [/itex] is often called the "sample range".

Do you care about the two endpoints that bound the values in the sample or only about the distance between these endpoints?

Perhaps your question is this:

For N samples [itex] X_i [/itex] from a normal distribution [itex] N(\mu,\sigma) [/itex] and for a given probability p (such as p = 0.95), determine the interval [itex] (\mu - R, \mu + R) [/itex] so that there is a probability p that the interval [itex] (X_{(1)}, X_{(N)}) [/itex] is contained in [itex] (\mu - R, \mu + R) [/itex].

I suppose the interval [itex] (\mu - R, \mu + R) [/itex] can be called a "prediction interval". It seems the man-in-the-street calls any interval associated with the idea of probability a "confidence interval", but confidence intervals are a more complicated concept than prediction intervals. The term "prediction interval" is usually applied to an interval that tries to contain a single value of a random variable rather than two values of it.
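A small simulation may make the terms concrete (an editorial addition, with arbitrary illustrative values of mu, sigma, and N):

[code]
import random

# Illustration of the terminology: for N samples from N(mu, sigma),
# X_(1) is the smallest sample, X_(N) the largest, and X_(N) - X_(1) the sample range.
random.seed(0)
mu, sigma, N = 0.0, 1.0, 1000
samples = sorted(random.gauss(mu, sigma) for _ in range(N))
x_first, x_last = samples[0], samples[-1]   # first and Nth order statistics
print("X_(1) =", x_first)
print("X_(N) =", x_last)
print("sample range =", x_last - x_first)
[/code]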
 
  • #3


Hi Stephen,

Thanks for your comments. Using your terminology, I want to compute the Nth order statistic multiplied by 2. That is, if we conduct N experiments of random variable x having normal distribution (mean=0), what's the largest value of x we'll observe (assuming some confidence level, or probability, of 95%, for example)? This "largest value" would be the "Nth order statistic". Since the distribution (e.g. PDF(x)) is symmetric, the width, or separation, between the Nth order statistic on the right tail and the Nth order statistic on the left tail equals two times the Nth order statistic. This "two times the Nth order statistic" is what I call the peak-to-peak value for my random variable x.

Basically, if we view the PDF(x) as a histogram, I need to compute the width of this histogram. As the number of experiments grows (e.g. N increases), we'd expect the width to increase. So, for a given N, how to compute the width?

Any feedback much appreciated -- thanks so much.
 
  • #4


tim8691 said:
Since the distribution (e.g. PDF(x)) is symmetric, the width, or separation, between the Nth order statistic on the right tail and the Nth order statistic on the left tail equals two times the Nth order statistic.

Strictly speaking, that doesn't make any sense. We'd have to define what "the Nth order statistic on the left tail" means. You seem to be thinking of the values farthest to the left as some kind of Nth order statistic, but these are the smaller values, not the larger ones. You also seem to be thinking that the Nth order statistic must always be greater than the mean. In your problem, you have so many samples that it probably will be. But there is nothing in the definition of the Nth order statistic that says it must be.

See if you like this formulation:

For a random variable X with a given normal distribution N(mu,sigma), find an interval (mu-R, mu+R) such that there is a 95% chance that, when a given number n of independent samples of X are taken, all of them will lie in (mu-R, mu+R).


I think that amounts to what I said in the earlier post.
 
  • #5


Hi Stephen,

I think you're right. Let's set mu to 0 here (e.g. the mean of the distribution is 0).

For a random variable X with a given normal distribution N(mu=0, sigma), find an interval (-R, R) such that there is a 95% chance that, when a given number n of independent samples of X are taken, all of them will lie in (-R, R).

Any idea how to compute R?

Thanks again.
 
  • #6


I have an idea, but it may be too simplistic.

For a given value m, you can find the probability that a single outcome from a normal distribution N(0,sigma) falls within (-m, m). That's a standard type of statistical problem. Let's call the value of that probability p(m). If you want the probability that N independent samples all fall within (-m, m), the answer is the Nth power of p(m).
So, to get a 0.95 probability, we are trying to solve for m in the equation:
[tex] (p(m))^N = 0.95 [/tex]
For large N, I think it would be best to use logarithms, so we are trying to solve:
[tex] N \log {p(m)} = \log{ 0.95 }[/tex]
[tex] \log p(m) = \frac{ \log{0.95}}{N} [/tex]

I don't think you can solve this symbolically; you probably have to do it by a numerical method such as "bisection". I also don't think you can use the typical printed tables of the normal distribution in statistics texts, because they only have a few digits of precision.
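A sketch of that bisection in Python (an editorial addition, not from the thread; it assumes mean 0, works in units of sigma, and uses the standard-library erfc to evaluate p(m) with full precision):

[code]
import math

def log_p(m):
    # ln of p(m) = P(-m < X < m) for a standard normal X.
    # p(m) = 1 - erfc(m / sqrt(2)); log1p keeps precision when erfc(...) is tiny.
    return math.log1p(-math.erfc(m / math.sqrt(2.0)))

def solve_m(N, conf=0.95, lo=0.0, hi=20.0, iters=100):
    # Bisection on N * ln p(m) = ln(conf); the left side increases with m.
    target = math.log(conf)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if N * log_p(mid) < target:
            lo = mid   # interval still too narrow
        else:
            hi = mid   # interval already wide enough
    return 0.5 * (lo + hi)

N = 600e9
m = solve_m(N)   # half-width R in units of sigma
print(f"R = {m:.2f} sigma, peak-to-peak = {2 * m:.2f} sigma")
[/code]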
 
  • #7


Thanks Stephen,

(p(m))^N = 0.95

This equation is on the right track, thanks! But I find it has me working with numbers very close to 1. This is a problem because my values of N are very large, and there are only 16 digits in a computer's representation of a double-precision number.

Is there a way, keeping the same definition of the problem above, to rearrange that equation so we're solving for a number very close to zero instead? That way we have not only the 16 digits but an exponent as well (i.e. times 10^y), which greatly improves accuracy over working with a number close to 1.

Let me try in words...

For a random variable X with a given normal distribution N(mu=0, sigma), find an interval (-R, R) such that there is a FIVE % chance that, when n independent samples of X are taken, AT LEAST ONE of them will lie outside of (-R, R).

Maybe that's not the right way of wording it to be equivalent, but hoping at least you can see what I'm trying to do.
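The precision worry can be checked directly (an editorial sketch, not from the thread): the complementary quantity 1 - p(m) is around 1e-13 here, so computing it by subtracting two numbers near 1 leaves only a few trustworthy digits, whereas computing the tiny quantity itself keeps essentially full precision.

[code]
import math

N = 600e9

# Direct route: p(m) = 0.95**(1/N) sits so close to 1.0 that the spacing of
# double-precision numbers near 1.0 (about 1e-16) leaves only a few
# significant digits in the difference 1 - p(m).
p_direct = 0.95 ** (1.0 / N)
print(1.0 - p_direct)             # about 8.5e-14, but only ~3 digits are reliable

# Complementary route: compute the tiny quantity directly.
# 1 - p(m) = 1 - exp(ln(0.95)/N) = -expm1(ln(0.95)/N), good to ~16 digits.
q = -math.expm1(math.log(0.95) / N)
print(q)                          # about 8.5489e-14
[/code]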
 
  • #8


The problem of doing the numerical calculations is what I had in mind when I said my answer might be too simplistic. There is the problem of finding p(m) and then the problem of finding m once you do that.

The way that you formulated the "complementary" problem is correct. I don't know that it is easier to solve. The expression for the probability of at least one value falling outside is going to be a sum of terms involving large factorials. Stirling's formula might be used to approximate them. I'm not sure.

It's similar to the problem encountered in writing
[tex] p(m) = 1 - \delta [/tex]
[tex] (1 - \delta)^N = 0.95 [/tex]
and then beginning to expand [itex] (1-\delta)^N [/itex] by the binomial theorem using the lower powers of [itex] \delta [/itex].
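To spell out that expansion (an editorial addition): keeping only the first-order term in [itex] \delta [/itex] gives

[tex] (1-\delta)^N = 1 - N\delta + \binom{N}{2}\delta^2 - \dots \approx 1 - N\delta [/tex]

[tex] 1 - N\delta \approx 0.95 \quad \Rightarrow \quad \delta \approx \frac{0.05}{N} [/tex]

With N = 600E9 this gives [itex] \delta \approx 8.3 \times 10^{-14} [/itex], within a few percent of the logarithm-based value found later in the thread (about 8.55E-14); the small difference comes from ln(0.95) ≈ -0.0513 rather than -0.05.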

I think the problem can be solved but I'll have to think more about it. Also we can start another thread that may attract the attention of numerical analysts who don't read the threads about probability. Right now, I must go do some yard work.
 
  • #9


Thanks again Stephen,

If we use the Normal distribution with mean = 0 and sigma = 1, then the distribution N(x) simplifies to

N(x) = (1/sqrt(2*pi))*e^(-x*x/2)

If we can insert this into the appropriate "complementary" equation, then re-arrange terms and/or do a change of variables to express this equation using the complementary error function (erfc), I can likely handle the numerical techniques for large values of n.

erfc(z) = (2/sqrt(pi))*[integral from z to infinity of e^(-t*t) dt]

But I'm thinking that, from the start, the "complementary" equation may need to avoid any expression of the form (1 minus a number very close to 1).
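If SciPy is available, its scipy.special.erfcinv inverts erfc directly, so no hand-rolled root finding is needed. A sketch (an editorial addition) using the relation erfc(m/sqrt(2)) = 1 - p(m) worked out in the next post, with mean 0 and sigma = 1:

[code]
import math
from scipy.special import erfcinv   # inverse of the complementary error function

N = 600e9
q = -math.expm1(math.log(0.95) / N)   # 1 - p(m), the total probability allowed outside (-m, m)
m = math.sqrt(2.0) * erfcinv(q)       # from erfc(m / sqrt(2)) = 1 - p(m)
print(m)                              # half-width R in units of sigma
[/code]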
 
  • #10


In terms of erfc(x):

Let [tex] \phi(x) = \frac{1}{\sqrt{ 2 \pi}} e^{-\frac{x^2}{2} } [/tex]

[tex] p(m) = \int_{-m}^m \phi(x) dx [/tex]

[tex] 1 - p(m) = 1 - \int_{-m}^m \phi(x) dx = 2 \int_{m}^{\infty} \phi(x) dx = \mathrm{erfc}\left( \frac{m}{\sqrt{2}} \right) [/tex]

(the change of variables [itex] t = x/\sqrt{2} [/itex] turns the tail integral of [itex] \phi [/itex] into the erfc integral), so

[tex] \mathrm{erfc}\left( \frac{m}{\sqrt{2}} \right) = 1 - p(m) [/tex]

-------

[tex] \ln {p(m)} = \frac{ \ln{0.95}}{600 \times 10^{9}} [/tex]
[tex] \ln{p(m)} = \frac{-0.051293294}{6 \times 10^{11}} \approx -8.5489 \times 10^{-14} [/tex]
[tex] p(m) = e^{-8.5489 \times 10^{-14}} [/tex]

The approximation [itex] e^x \approx 1 + x [/itex] might be good enough.
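Putting the whole chain into code (an editorial sketch that assumes SciPy's erfcinv is available; the bisection sketched earlier works just as well): with N = 600E9 and sigma = 1 ps this comes out at roughly R ≈ 7.5 sigma, i.e. a peak-to-peak width of about 15 sigma, or about 15 ps.

[code]
import math
from scipy.special import erfcinv   # inverse complementary error function

sigma = 1e-12    # 1 ps RMS
N     = 600e9    # number of measured samples
conf  = 0.95     # probability that all N samples fall inside (-R, R)

log_p = math.log(conf) / N            # ln p(m), about -8.55e-14
q     = -math.expm1(log_p)            # 1 - p(m), about 8.55e-14
m     = math.sqrt(2.0) * erfcinv(q)   # half-width in units of sigma

print(f"1 - p(m)     = {q:.4e}")
print(f"R            = {m:.2f} sigma = {m * sigma * 1e12:.2f} ps")
print(f"peak-to-peak = {2 * m:.2f} sigma = {2 * m * sigma * 1e12:.2f} ps")
[/code]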
 
  • #11


Wow, that's great Stephen. At first I was skeptical because of working with 1 - (very small number), but in the end, with the Taylor expansion e^x ≈ 1 + x, the quantity 1 - e^x simplifies to -x. Brilliant!
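In symbols (an editorial restatement of that step):

[tex] 1 - p(m) = 1 - e^{\ln(0.95)/N} \approx -\frac{\ln(0.95)}{N} [/tex]

which for N = 600E9 is about 8.55E-14, and which feeds straight into erfc(m/sqrt(2)) = 1 - p(m) without ever storing a number close to 1.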
 

1. What is a Gaussian PDF and why is it important in scientific research?

A Gaussian PDF, also known as a Gaussian distribution or normal distribution, is a probability distribution that is commonly used to model a wide range of natural phenomena in various scientific fields. It is important because it allows us to describe and analyze data that follows a bell-shaped curve, making it a useful tool in statistical analysis and hypothesis testing.

2. How is the width of a Gaussian PDF calculated?

A Gaussian PDF has no finite endpoints, so its width is characterized by the standard deviation, or sigma (σ), of the distribution. Common width measures include the distance between the two inflection points of the curve, located at μ ± σ (a width of 2σ), and the full width at half maximum, 2√(2 ln 2) σ ≈ 2.35σ. The curve never actually intersects the x-axis; its tails only approach it.

3. Can the width of a Gaussian PDF be negative?

No, the width of a Gaussian PDF cannot be negative. It is always a positive value, as it represents a distance or a scale.

4. How does changing the value of sigma affect the width of a Gaussian PDF?

Increasing the value of sigma will result in a wider Gaussian PDF, while decreasing the value of sigma will result in a narrower Gaussian PDF. This is because the width of the curve is directly proportional to sigma.

5. Can the width of a Gaussian PDF be used to compare distributions with different means?

Yes, the width of a Gaussian PDF can be used to compare distributions with different means. This is because the width is a standardized measure that is independent of the mean and allows for a fair comparison between different distributions.
