Need statistics help working with normal distribution

  1. Jun 30, 2011 #1
    Hello experts,

    Thanks to discussions with Stephen Tashi for getting me this far.

    See the problem statement in the attached PDF page 1. I need help solving for Qc in equation form, as a function of the other variables (N and C), preferably using erfc so I can program an accurate algorithm for very large values of N (>1E+16).

    Pages 2 and 3 attempt to solve this problem, but only take me so far. I'm not sure if this is the right approach, or if there's just another step or two needed from what's presented there.

    I look forward to any help in figuring this out.

    Best regards, Tim
  3. Jun 30, 2011 #2


    Attached pdf???????????????
  4. Jun 30, 2011 #3
    Hmm, not sure why it didn't take. I'll try again.

    Attached Files:

  5. Jun 30, 2011 #4
    To rephrase, the problem would be to determine the Q such that
    [tex]P[\max(X_1,...,X_N)-\min(X_1,...,X_N) \le 2Q]=C[/tex]
    where the [itex]X_i[/itex] are iid N(0,1). The LHS can be written as
    [tex]\int_{-\infty}^{\infty}N(F(x+2Q)-F(x))^{N-1}f(x) dx[/tex]
    where [itex]F(x)[/itex] and [itex]f(x)[/itex] are the Normal CDF and PDF respectively, though the integral may not have a closed form.
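
    Even without a closed form, the integral can be evaluated numerically. The following is a minimal sketch, assuming a plain trapezoid rule on [-10, 10] and Python's standard-library NormalDist; the function name range_cdf and the grid settings are illustrative choices, not part of the original problem. It is only meant for modest N: for very large N the term F(x+2Q)-F(x) sits too close to 1 for this direct form to stay accurate in double precision.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, N(0, 1)

def range_cdf(r, n, lo=-10.0, hi=10.0, steps=4000):
    """Approximate P[max - min <= r] for n iid N(0,1) draws.

    Integrates n * (F(x + r) - F(x))**(n - 1) * f(x) over x with the
    trapezoid rule. The limits and step count are assumptions chosen
    for illustration and are adequate only for modest n.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        inner = _STD.cdf(x + r) - _STD.cdf(x)
        total += weight * n * inner ** (n - 1) * _STD.pdf(x)
    return total * h

# Sanity check for n = 2: X1 - X2 is N(0, 2), so P[|X1 - X2| <= r] = erf(r / 2).
print(range_cdf(2.0, 2), math.erf(1.0))
```

    Solving for Q at a given C would then be a one-dimensional root find on range_cdf(2*Q, N) - C, for example by bisection.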

    If instead you solve for the Q such that
    [tex]P[-Q\le \min(X_1,...,X_N) \le \max(X_1,...,X_N) \le Q] = C[/tex]
    then the left hand side is
    [tex]P[-Q\le X_1 \le Q]^N = (F(Q)-F(-Q))^N = (2F(Q)-1)^N[/tex]
    The solution is
    [tex]Q = F^{-1}((1+C^{1/N})/2)[/tex]
    The normal quantile function is implemented in many computer languages. For example, with C=0.95 and N=1e6, the Excel formula "=NORMSINV((1+0.95^(1/1e6))/2)" returns 5.446768, which agrees with R and MATLAB.
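
    For N above roughly 1e16, C^(1/N) rounds to exactly 1 in double precision, so evaluating the formula literally would ask for the quantile at probability 1. One way around this, sketched below under the assumption that Python 3.8+ and its standard-library NormalDist are acceptable, is to work with the upper tail 1 - F(Q) = (1 - C^(1/N))/2 and compute 1 - C^(1/N) as -expm1(log(C)/N). The function name half_width is a hypothetical label for this sketch.

```python
import math
from statistics import NormalDist

def half_width(c, n):
    """Q solving (2*F(Q) - 1)**n = c, with F the standard normal CDF."""
    # 1 - c**(1/n), computed without catastrophic cancellation.
    two_sided_tail = -math.expm1(math.log(c) / n)
    # Upper-tail probability 1 - F(Q); by symmetry Q = -F^{-1}(tail).
    return -NormalDist().inv_cdf(two_sided_tail / 2.0)

print(half_width(0.95, 1e6))   # compare with the Excel value quoted above
print(half_width(0.95, 1e17))  # still finite where the literal formula breaks down
```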

    Edit: if you must use erf, use [itex]F(x)=(erf(x/\sqrt{2})+1)/2[/itex] so
    Last edited: Jun 30, 2011
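
    Since the original request was for an erfc-based form: from the erf identity in the edit above, 2F(Q)-1 = erf(Q/sqrt(2)), so the condition (2F(Q)-1)^N = C is equivalent to erfc(Q/sqrt(2)) = 1 - C^(1/N). The sketch below solves that by bisection using only math.erfc. The function name, the bracket [0, 40] and the iteration count are assumptions made for illustration; a library inverse of erfc would serve the same purpose where one is available.

```python
import math

def half_width_erfc(c, n, hi=40.0, iterations=200):
    """Solve erfc(Q / sqrt(2)) = 1 - c**(1/n) for Q by bisection."""
    target = -math.expm1(math.log(c) / n)  # 1 - c**(1/n), stable for huge n
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid / math.sqrt(2.0)) > target:
            lo = mid  # tail probability still too large, so Q must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(half_width_erfc(0.95, 1e6))
print(half_width_erfc(0.95, 1e17))
```

    Working with erfc rather than erf keeps the small tail probability as the quantity being matched, which is what preserves accuracy when N is very large.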
  6. Jul 1, 2011 #5
    Thanks so much bpet, I believe your solution above solves the equation:

    [tex](p(Q))^N = C[/tex]

    as I've defined on page 2 of the attached document. This Q corresponds to running one experiment of population N and computing the range (i.e. 2Q) of the normally distributed random variable x.

    But after that, how do we then account for running that experiment many times and solving for Qc? That is, "If we repeat the above experiment an infinite (or, very large) number of times, and create a histogram from all the values of 2Q measured, how far (e.g. 2Qc) into this new histogram contains C percent of the population?"

    I think Qc should differ from Q, is that right?
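
    The repeated-experiment histogram described in this post can be simulated directly for small N, which gives a way to check any candidate formula for Qc. This is a rough sketch only: the function name, trial count and seed are arbitrary assumptions, and brute-force simulation is not feasible for N anywhere near 1E+16.

```python
import random

def simulated_range_quantile(n, c, trials=20000, seed=0):
    """Empirical C-quantile of (max - min) over repeated experiments.

    Each trial draws n iid N(0,1) values and records their range; the
    returned value estimates the 2*Qc such that a fraction c of the
    recorded ranges fall at or below it.
    """
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    ranges.sort()
    index = min(trials - 1, int(c * trials))
    return ranges[index]

# For n = 2 the exact answer is known: P[range <= 2] = erf(1), about 0.8427.
print(simulated_range_quantile(2, 0.8427))
```

    The quantity estimated here is the C-quantile of the range, i.e. the Q in the first equation of post #4. Since every sample lying in [-Q, Q] implies a range of at most 2Q, that quantile cannot exceed the Q from the second equation at the same C.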