Need statistics help working with normal distribution

tim8691 · Jun 30, 2011

Hello experts,

Thanks to discussions with Stephen Tashi for getting me this far.

See the problem statement on page 1 of the attached PDF. I need help solving for Qc in equation form, as a function of the other variables (N and C), preferably using erfc so I can program an algorithm that stays accurate for very large values of N (>1E+16).

Pages 2 and 3 attempt to solve this problem, but only take me so far. I'm not sure if this is the right approach, or if there's just another step or two needed beyond what's presented there.

I'd appreciate it if anyone can help me figure this out.

Tim

mathman · Jun 30, 2011

Attached pdf?

tim8691 · Jun 30, 2011

Hmm, not sure why it didn't take. I'll try again.

bpet · Jun 30, 2011

To rephrase, the problem would be to determine the Q such that
[tex]P[\max(X_1,...,X_N)-\min(X_1,...,X_N) \le 2Q]=C[/tex]
where the [itex]X_i[/itex] are iid N(0,1). The LHS can be written as
[tex]N\int_{-\infty}^{\infty}\left(F(x+2Q)-F(x)\right)^{N-1}f(x)\,dx[/tex]
where [itex]F(x)[/itex] and [itex]f(x)[/itex] are the Normal CDF and PDF respectively, though the integral may not have a closed form.
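Since the integral may not have a closed form, one option is to evaluate it numerically. The following is a minimal sketch, not a definitive implementation: the function name `range_cdf`, the integration limits, and the step count are all assumptions chosen for illustration, and the plain trapezoid rule with a subtraction of two CDF values is only reasonable for moderate N (it is not meant for N near 1e16).

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, N(0, 1)

def range_cdf(r, N, lo=-10.0, hi=10.0, steps=4000):
    """Approximate P[max - min <= r] for N iid N(0,1) samples.

    Evaluates N * integral of (F(x + r) - F(x))**(N - 1) * f(x) dx
    with the trapezoid rule on [lo, hi] (illustrative defaults).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        inner = _STD.cdf(x + r) - _STD.cdf(x)
        total += weight * inner ** (N - 1) * _STD.pdf(x)
    return N * h * total
```

With r = 2Q this is the left-hand side above. A quick sanity check is N = 2, where the range is |X_1 - X_2| and the probability reduces to erf(r/2).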

If instead you solve for the Q such that
[tex]P[-Q\le \min(X_1,...,X_N) \le \max(X_1,...,X_N) \le Q] = C[/tex]
then the left hand side is
[tex]P[-Q\le X_1 \le Q]^N = (F(Q)-F(-Q))^N = (2F(Q)-1)^N[/tex]
The solution is
[tex]Q = F^{-1}((1+C^{1/N})/2)[/tex]
The normal quantile function is implemented in many computer languages. For example, with C=0.95 and N=1e6, the Excel formula "=NORMSINV((1+0.95^(1/1e6))/2)" returns 5.446768, which agrees with R and MATLAB.

Edit: if you must use erf, use [itex]F(x)=(\operatorname{erf}(x/\sqrt{2})+1)/2[/itex] so
[tex]Q=\sqrt{2}\,\operatorname{erf}^{-1}(C^{1/N})[/tex]
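For N around 1e16, C^{1/N} is indistinguishable from 1 in double precision, so the erf form cannot be evaluated directly. A possible workaround, in the spirit of the erfc request in the original question, is to work with the tail 1 - C^{1/N} = erfc(Q/sqrt(2)) instead. The sketch below assumes Python's standard library only; the name `half_width` is hypothetical, and it uses the symmetry F(-Q) = (1 - C^{1/N})/2 rather than an explicit inverse erfc.

```python
import math
from statistics import NormalDist

def half_width(C, N):
    """Q such that (2*F(Q) - 1)**N == C, with F the standard normal CDF."""
    # tail = 1 - C**(1/N), computed via expm1 to avoid cancellation
    # when C**(1/N) is extremely close to 1.
    tail = -math.expm1(math.log(C) / N)
    # 2*F(Q) - 1 = 1 - tail  is equivalent to  F(-Q) = tail / 2.
    return -NormalDist().inv_cdf(tail / 2.0)
```

Calling `half_width(0.95, 1e6)` should come out close to the 5.446768 quoted above, and the same call with N = 1e16 still returns a finite value because the tiny tail probability is never subtracted from 1.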

tim8691 · Jul 1, 2011

Thanks so much, bpet. I believe your solution above solves the equation:

(p(Q))^N = C

as I've defined on page 2 of the attached document. This Q corresponds to running one experiment on a population of N and computing the range (i.e. 2Q) of the normally distributed random variable x.

But after that, how do we then account for running that experiment many times and solving for Qc? That is, "If we repeat the above experiment an infinite (or, very large) number of times, and create a histogram from all the values of 2Q measured, how far (e.g. 2Qc) into this new histogram contains C percent of the population?"

I think Qc should differ from Q, is that right?
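One way to make the repeated-experiment question concrete is a small Monte Carlo simulation: draw N standard normal values, record half the range, repeat many times, and read off the C-quantile of the recorded values. This is only a sketch for small N; the function name `simulate_qc`, the trial count, and the seed are assumptions for illustration, and it does not scale to the very large N in the original question.

```python
import random

def simulate_qc(N, C, trials=20000, seed=0):
    """Estimate Qc: the C-quantile of (max - min) / 2 over repeated experiments."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    half_ranges = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
        half_ranges.append((max(xs) - min(xs)) / 2.0)
    half_ranges.sort()
    # Index of the empirical C-quantile, clamped to the last element.
    idx = min(trials - 1, int(C * trials))
    return half_ranges[idx]
```

Because the event -Q <= min <= max <= Q implies a range of at most 2Q, the Qc estimated this way should not exceed the Q from bpet's formula for the same N and C, up to sampling noise.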
