Binomial distribution/confidence question for dummies

  • Context: Undergrad 
  • Thread starter: CRGreathouse

Discussion Overview

The discussion revolves around constructing a confidence interval for an unknown probability p based on the outcomes of independent events, specifically using the binomial distribution. Participants explore methods for both small and large sample sizes, discussing numerical combinatorics and statistical reasoning.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant describes the process of calculating probabilities for at most and at least k events using combinatorial methods, expressing a preference to avoid the Central Limit Theorem for larger n.
  • Another participant suggests that for small n, numerical combinatorics is straightforward, but for larger n, a standard statistical method is preferable.
  • There is mention of the Neyman–Pearson lemma, which says that the likelihood-ratio test is the most powerful test of a given size.
  • Participants discuss the maximum likelihood estimator being k/n, with examples illustrating its application.
  • Concerns are raised about the choice of alpha values for confidence intervals, particularly the justification for splitting the error equally between the left and right tails when k > 0.
  • A revised numerical function for calculating confidence intervals is shared, with a note on the difficulty of calculating normal percentiles in the programming environment used.

Areas of Agreement / Disagreement

Participants express varying opinions on the best methods for constructing confidence intervals, particularly regarding the use of numerical methods versus approximations. There is no consensus on the optimal approach or the justification for specific choices in statistical parameters.

Contextual Notes

Participants note limitations in their statistical knowledge and the challenges of applying certain statistical methods, particularly in relation to large sample sizes and the use of normal approximations.

CRGreathouse
Science Advisor
Homework Helper
Binomial distribution/confidence question for dummies

(The 'dummy' would be me.)

I have an event that occurs with unknown probability p. Out of n independent trials, the event occurs in k of them. How do I construct a (95%) confidence interval for p?

For small n it's easy to figure this out with numerical combinatorics:

Pr(at most k events) = [tex]\sum_{i=0}^k{n\choose i}p^i(1-p)^{n-i}[/tex]
Pr(at least k events) = [tex]\sum_{i=k}^n{n\choose i}p^i(1-p)^{n-i}[/tex]

and then find the roots (in p) of Pr(at most k events) - 0.05 and Pr(at least k events) - 0.05. (Maybe I should use 0.025 instead?)


But for large n (even not all that large!), this is inconvenient. Surely there is some standard statistical method for this? Staying as close to the exact roots as possible would be best -- I'd prefer to rely on the Central Limit Theorem as little as possible.
 
Actually, forget it. Computers are fast, and wasting a few billion cycles isn't going to kill me.

Pari/GP code:
Code:
probrange(n,k,confidence=.05)={
	\\ Lower endpoint: the p with Pr(at least k events) = confidence;
	\\ for k == 0 that sum is identically 1, so the interval starts at 0.
	[if(k==0, 0, solve(p=0,1, sum(i=k,n, binomial(n,i)*p^i*(1-p)^(n-i)) - confidence)),
	\\ Upper endpoint: the p with Pr(at most k events) = confidence.
	if(k==n, 1, solve(p=0,1, sum(i=0,k, binomial(n,i)*p^i*(1-p)^(n-i)) - confidence))]
};
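(A quick sanity check: for k = 0 the upper endpoint has a closed form, since solve is just inverting (1-p)^n = confidence.)

Code:
? probrange(10, 0)
\\ upper endpoint solves (1-p)^10 = 0.05, i.e. p = 1 - 0.05^(1/10) ~ 0.259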
 


CRGreathouse said:
I have an event that occurs with unknown probability p. Out of n independent trials, the event occurs in k of them. How do I construct a (95%) confidence interval for p? [...] Surely there is some standard statistical method for this? Staying as close to the exact roots as possible would be best -- I'd prefer to rely on the Central Limit Theorem as little as possible.


If you want the proper statistical reasoning for using the combinatorial approach, look up the Neyman–Pearson lemma. It tells you that, among all tests of a given size, the likelihood-ratio test has the greatest power. I'm afraid that without the Central Limit Theorem or a Poisson approximation you can't simplify the computation.
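For reference, the lemma says (roughly): for testing a simple [itex]H_0: p = p_0[/itex] against a simple [itex]H_1: p = p_1[/itex], the most powerful test of size [itex]\alpha[/itex] rejects exactly when the likelihood ratio exceeds a threshold,

[tex]\Lambda(x) = \frac{L(p_1 \mid x)}{L(p_0 \mid x)} > c,[/tex]

with [itex]c[/itex] chosen so that [itex]\Pr(\Lambda > c \mid H_0) = \alpha[/itex].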

Oh, and your values should be 0.025 (so the two tails add up to 0.05 when you do a two-sided test).

Good luck.
 


Focus said:
If you want the proper statistical reasoning for using the combinatorial approach, look up the Neyman–Pearson lemma. It tells you that, among all tests of a given size, the likelihood-ratio test has the greatest power. I'm afraid that without the Central Limit Theorem or a Poisson approximation you can't simplify the computation.

The maximum likelihood estimator is k/n -- if I have 10 trials and win 3 of them, the most likely probability is 30%. That's not hard to figure out.
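(Spelled out: setting the derivative of the log-likelihood to zero,

[tex]\frac{d}{dp}\left[k\log p + (n-k)\log(1-p)\right] = \frac{k}{p} - \frac{n-k}{1-p} = 0,[/tex]

gives [itex]\hat p = k/n[/itex].)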

I'm not actually getting particularly good results using the normal approximation, so I think I'll have to compute numerically.

I will look up that lemma, though; that might help.

Focus said:
Oh, and your values should be 0.025 (so the two tails add up to 0.05 when you do a two-sided test).

You know, this is where my lack of stats knowledge hurts. It's clear that I should use 0.05, not 0.025, when k = 0. But for k > 0, where I have a choice, I'm not sure what the justification is for making the left and right errors equal. What do you think?
 


CRGreathouse said:
The maximum likelihood estimator is k/n -- if I have 10 trials and win 3 of them, the most likely probability is 30%. That's not hard to figure out.

I'm not actually getting particularly good results using the normal approximation, so I think I'll have to compute numerically.

I will look up that lemma, though; that might help.

The lemma tells you that the likelihood function is, well, "best" to use.


CRGreathouse said:
You know, this is where my lack of stats knowledge hurts. It's clear that I should use 0.05, not 0.025, when k = 0. But for k > 0, where I have a choice, I'm not sure what the justification is for making the left and right errors equal. What do you think?

Two-sided confidence intervals conventionally put alpha/2 in each tail. It is analogous to a two-sided hypothesis test.
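In symbols, the equal-tailed exact interval [itex][L, U][/itex] for [itex]0 < k < n[/itex] (this is the Clopper–Pearson construction, and it is what the numerical root-finding in this thread computes) is defined by

[tex]\sum_{i=k}^{n}{n\choose i}L^i(1-L)^{n-i} = \frac{\alpha}{2}, \qquad \sum_{i=0}^{k}{n\choose i}U^i(1-U)^{n-i} = \frac{\alpha}{2}.[/tex]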
 
Here's the revised numerical function I'm using:

Code:
probrange(n,k,conf=.05)={
	\\ Edge cases: one-sided intervals get the full error budget on one tail.
	if (k==0, return([0, solve(p=0,1, (1-p)^n - conf)]));
	if (k==n, return([solve(p=0,1, p^n - conf), 1]));

	conf = conf/2;	\\ two-sided: split the error equally between the tails
	\\ Lower endpoint solves Pr(X >= k) = conf; upper solves Pr(X <= k) = conf.
	\\ Each tail is expressed via whichever sum has fewer terms, for speed.
	if(k+k < n,
		[solve(p=0,1, 1 - conf - sum(i=0,k-1, binomial(n,i)*p^i*(1-p)^(n-i))),
		solve(p=0,1, sum(i=0,k, binomial(n,i)*p^i*(1-p)^(n-i)) - conf)],
		[solve(p=0,1, sum(i=k,n, binomial(n,i)*p^i*(1-p)^(n-i)) - conf),
		solve(p=0,1, 1 - conf - sum(i=k+1,n, binomial(n,i)*p^i*(1-p)^(n-i)))]
	)
};
addhelp(probrange, "probrange(n,k,conf=.05): Gives a confidence interval for the probability of an event which happens k times in n trials.");

I'd like to test this against some normal approximations, but I'm having a bit of trouble in Pari -- I don't know how to calculate a normal percentile ([itex]z_{1-\alpha/2}[/itex]). All Pari has built in is the complementary error function. Is there a better way than numerically solving for the inverse?
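One workable approach -- a sketch only, and zquantile and probrange_normal are names made up here -- is to invert erfc numerically, using [itex]\Phi(z) = 1 - \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})[/itex]:

Code:
\\ Normal quantile: the z with Phi(z) = q, found by inverting erfc numerically.
\\ The bracket [-10,10] is far wider than any quantile of practical interest.
zquantile(q) = solve(z=-10, 10, 1 - erfc(z/sqrt(2))/2 - q);

\\ Wald (normal-approximation) interval, clipped to [0,1], for comparison with
\\ the exact probrange above; it degenerates to a point when k = 0 or k = n.
probrange_normal(n,k,conf=.05)={
	my(phat = k/n, z = zquantile(1 - conf/2));
	my(se = sqrt(phat*(1-phat)/n));
	[max(0, phat - z*se), min(1, phat + z*se)]
};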
 
