Need statistics help working with normal distribution

In summary: Tim wants to find Qc, as a function of N and C, such that the range (maximum minus minimum) of N iid normally distributed values of x lies within 2Qc with probability C. A reply gives a closed-form Q for the related problem in which all N values lie in [-Q, Q] with probability C. Tim then asks whether Qc, taken from the distribution of ranges measured over many repeated experiments, should differ from that Q.
  • #1
tim8691
Hello experts,

Thanks to discussions with Stephen Tashi for getting me this far.

See the problem statement in the attached PDF page 1. I need help solving for Qc in equation form, as a function of the other variables (N and C), preferably using erfc so I can program an accurate algorithm for very large values of N (>1E+16).

Pages 2 and 3 attempt to solve this problem, but only take me so far. I'm not sure if this is the right approach, or if there's just another step or two needed beyond what's presented there.

I'd appreciate any help figuring this out.

Tim
 
  • #2
Attached pdf?
 
  • #3
Hmm, not sure why it didn't take. I'll try again.
 

Attachments

  • gen_problem.pdf
    97.8 KB · Views: 275
  • #4
To rephrase, the problem would be to determine the Q such that
[tex]P[\max(X_1,...,X_N)-\min(X_1,...,X_N) \le 2Q]=C[/tex]
where the [itex]X_i[/itex] are iid N(0,1). The LHS can be written as
[tex]N\int_{-\infty}^{\infty}\left[F(x+2Q)-F(x)\right]^{N-1}f(x)\,dx[/tex]
where [itex]F(x)[/itex] and [itex]f(x)[/itex] are the Normal CDF and PDF respectively, though the integral may not have a closed form.

If instead you solve for the Q such that
[tex]P[-Q\le \min(X_1,...,X_N) \le \max(X_1,...,X_N) \le Q] = C[/tex]
then the left hand side is
[tex]P[-Q\le X_1 \le Q]^N = (F(Q)-F(-Q))^N = (2F(Q)-1)^N[/tex]
The solution is
[tex]Q = F^{-1}((1+C^{1/N})/2)[/tex]
The normal quantile function is implemented in many computer languages, e.g. with C=0.95 and N=1e6 the Excel formula "=NORMSINV((1+0.95^(1/1e6))/2)" returns 5.446768 which agrees with R and MATLAB.

Edit: if you must use erf, use [itex]F(x)=(\operatorname{erf}(x/\sqrt{2})+1)/2[/itex] so
[tex]Q=\sqrt{2}\,\operatorname{erf}^{-1}(C^{1/N})[/tex]
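
For N as large as 1e16, C^{1/N} rounds to exactly 1 in double precision (with C = 0.95 the difference from 1 is about 5e-18), so evaluating the formulas above directly loses all accuracy. One possible workaround, sketched here assuming only the Python standard library, is to work with the tail 1 - C^{1/N} = -expm1(ln(C)/N) and invert erfc, since 2F(Q) - 1 = C^{1/N} is equivalent to erfc(Q/sqrt(2)) = 1 - C^{1/N}. The helper names `erfc_inv` and `q_symmetric` are hypothetical, and the bisection is a simple stand-in for a library inverse-erfc routine.

```python
import math


def erfc_inv(p, lo=0.0, hi=26.0, iters=200):
    """Solve erfc(x) = p for x in [lo, hi] by bisection.

    erfc is strictly decreasing, so bisection is safe. The upper
    bracket hi=26 is an assumed limit where erfc is still a normal
    double (roughly 1e-295).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if math.erfc(hi) >= p:
        raise ValueError("p is too small for the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > p:
            lo = mid  # erfc too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q_symmetric(C, N):
    """Q such that all N iid N(0,1) values lie in [-Q, Q] with probability C."""
    tail = -math.expm1(math.log(C) / N)  # 1 - C**(1/N) without cancellation
    return math.sqrt(2.0) * erfc_inv(tail)


if __name__ == "__main__":
    print(q_symmetric(0.95, 1e6))   # compare with the Excel value quoted above
    print(q_symmetric(0.95, 1e16))  # regime where C**(1/N) rounds to 1.0
```

With C = 0.95 and N = 1e6 this should reproduce the 5.446768 quoted from Excel, R and MATLAB, while still returning a finite answer at N = 1e16.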
 
  • #5
Thanks so much bpet, I believe your solution above solves the equation:

[tex](p(Q))^N = C[/tex]

as I've defined on page 2 of the attached document. This Q is the probability of running one experiment of population N and computing the range (e.g. 2Q) of the normally distributed random variable x.

But after that, how do we then account for running that experiment many times and solving for Qc? That is, "If we repeat the above experiment an infinite (or, very large) number of times, and create a histogram from all the values of 2Q measured, how far (e.g. 2Qc) into this new histogram contains C percent of the population?"

I think Qc should differ from Q, is that right?
 

FAQ: Need statistics help working with normal distribution

1. What is a normal distribution?

A normal distribution is a probability distribution that is commonly used in statistics to describe the distribution of a set of data. It is also known as a Gaussian distribution and is characterized by its bell-shaped curve.

2. How is a normal distribution calculated?

A normal distribution is defined by its probability density function, [itex]f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}[/itex], which depends only on the mean [itex]\mu[/itex] and standard deviation [itex]\sigma[/itex]. The probability of a range of values is the area under this curve over that range, given by the cumulative distribution function.

3. What is the purpose of using a normal distribution in statistics?

The use of normal distribution in statistics allows us to make predictions and draw conclusions about a population based on a sample of data. It also helps in understanding the likelihood of certain events occurring and allows for the use of various statistical tests.

4. How do you interpret a normal distribution curve?

The normal distribution curve can be interpreted in terms of its mean, standard deviation, and shape. The mean represents the center of the curve, while the standard deviation indicates the spread of the data around the mean. The shape of the curve can tell us the proportion of data falling within certain intervals.

5. What are some real-world examples of a normal distribution?

Many real-world quantities are approximately normally distributed, such as adult heights, IQ scores, and blood pressure measurements. The normal distribution is also commonly used as a model for stock returns in financial markets and for product variability in quality control.
