Estimation, negative binomial variable

skwey · Dec 26, 2011

Hey, there's this thing I can't wrap my head around.

Let's say we have a negative binomial variable x, with parameters p and r. That is, x is the number of failures we get before the rth success, while observing independent Bernoulli trials with success rate p.

It can be shown that (r-1)/(x+r-1) is an unbiased estimator for p (for r ≥ 2). So let's say that before you start the experiment you decide you want 5 successes; then the number of failures x is the random variable. Let's say you get SFFSSFSFFS; then r = 5, x = 5, and the estimate for p is 4/9 ≈ 0.444.
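As a small sketch of the arithmetic in this example (the function name `estimates_from_sequence` is only an illustrative assumption, not anything from the thread), the two candidate estimates can be computed directly from the observed string of outcomes:

```python
from fractions import Fraction

def estimates_from_sequence(seq):
    """Return (r, x, unbiased, mle) for a string of 'S'/'F' outcomes.

    Assumes the experiment stopped at the r-th success, so the
    sequence must end in 'S'.
    """
    if not seq or seq[-1] != "S":
        raise ValueError("a negative binomial run must end with a success")
    r = seq.count("S")          # number of successes (fixed in advance)
    x = seq.count("F")          # number of failures (the random variable)
    if r < 2:
        raise ValueError("(r-1)/(x+r-1) needs r >= 2")
    unbiased = Fraction(r - 1, x + r - 1)   # (r-1)/(x+r-1)
    mle = Fraction(r, x + r)                # r/(x+r)
    return r, x, unbiased, mle

r, x, unbiased, mle = estimates_from_sequence("SFFSSFSFFS")
print(r, x, unbiased, mle)   # 5 5 4/9 1/2
```

Exact fractions are used here only so that 4/9 shows up as 4/9 rather than a rounded decimal.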

Here comes the tricky part. One intuitive way to estimate the success rate is to use r/(x+r), which I think is more logical (it is actually the maximum-likelihood estimator, but it is biased). I mean, if you have 10 trials and 5 successes, 5/10 wouldn't be that bad, would it? However, since this is not a binomial experiment but a negative binomial one, this estimator is not correct on average: in the long run it will tend to overestimate p.

But why does this tend to overestimate p, and why does subtracting 1 from r give the correct answer? I know that calculating the expected value of (r-1)/(x+r-1) shows that it is unbiased, but I am looking for a more intuitive answer. Which properties of the negative binomial model make the estimator correct, on average in the long run, once you subtract 1 from r?
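For reference, the expected-value calculation mentioned above can be sketched as follows. It assumes the convention P(X = x) = C(x+r-1, x) p^r (1-p)^x for the number of failures x, and r ≥ 2 so that the estimator is defined:

```latex
\begin{align*}
E\!\left[\frac{r-1}{X+r-1}\right]
  &= \sum_{x=0}^{\infty} \frac{r-1}{x+r-1}\binom{x+r-1}{x}\, p^{r}(1-p)^{x} \\
  &= \sum_{x=0}^{\infty} \binom{x+r-2}{x}\, p^{r}(1-p)^{x} \\
  &= p \sum_{x=0}^{\infty} \binom{x+(r-1)-1}{x}\, p^{r-1}(1-p)^{x} \\
  &= p .
\end{align*}
```

The second line uses (r-1)/(x+r-1) · C(x+r-1, x) = C(x+r-2, x), and the last sum equals 1 because it is the total probability of a negative binomial variable with parameters r-1 and p.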

Stephen Tashi · Dec 26, 2011

I don't know how to justify that exact result intuitively, and I haven't double-checked your formulas, but one can form an intuition from the general idea that a sampling plan affects an estimator. If you agree to toss a coin 2 times, then you estimate p from 4 possible sequences of success S and failure F. This includes the possibility F,F. If you instead agree to toss a coin until the first head (success) comes up, you eliminate the sequence F,F and you add the possibility of longer sequences. However, you never allow sequences that are all failures, such as F,F,F or F,F,F,F. So, in a manner of speaking, it is your sampling plan that biases the estimate based on the ratio (number of successes)/(total number of trials).

As a simpler example, suppose you agree to toss a coin until the total number of heads is equal to the total number of tails. That sampling plan certainly affects an estimator based on the ratio of heads to total tosses.
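To make the "toss until the first head" plan concrete, here is a hedged sketch (the function name and the truncation length are assumptions chosen for illustration). Under that plan the ratio successes/trials is 1/(x+1), where x is the number of failures before the first success, and its expected value can be summed directly:

```python
import math

def expected_ratio_first_success(p, max_failures=10_000):
    """E[1/(X+1)] when tossing until the first success (r = 1).

    X is the number of failures before the first success, so
    P(X = x) = p * (1-p)**x and successes/trials = 1/(x+1).
    The infinite sum is truncated after max_failures terms.
    """
    total = 0.0
    prob = p                      # P(X = 0)
    for x in range(max_failures):
        total += prob / (x + 1)
        prob *= (1.0 - p)         # P(X = x+1) from P(X = x)
    return total

p = 0.5
truncated = expected_ratio_first_success(p)
# Closed form from the series sum_{k>=1} q**k / k = -ln(1-q), with q = 1-p.
closed_form = -p * math.log(p) / (1.0 - p)
print(truncated, closed_form)   # both close to ln(2), about 0.693, for p = 0.5
```

For a fair coin the average of successes/trials under this plan is about 0.693 rather than 0.5, which illustrates how the stopping rule alone pushes the ratio upward.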

skwey · Dec 26, 2011

I see what you are saying, Stephen Tashi. The sampling process obviously alters the way we estimate; it is just interesting that the result is so close to the binomial formula, except that you subtract 1 from r.

Another thing that makes you wonder even more is the expected value of x.

E(X) = r*(1-p)/p, and rearranging this gives p = r/(E(X)+r). This says that if we do a negative binomial experiment many times, with the same r, and take the average of the x values, then r/(x(average)+r) will converge to p.

But if you want to use (r-1)/(x+r-1) to get p when you have many experiments, you have to use the average of the whole expression (r-1)/(x+r-1), not (r-1)/(x(average)+r-1), since it is the expected value of (r-1)/(x+r-1) that equals p.

This is also very difficult to understand.
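The distinction between "average of the ratio" and "ratio at the average" can be checked with a small simulation. This is only a sketch: the values r = 5, p = 0.3, the seed, and the number of repetitions are arbitrary assumptions, and `simulate_failures` is a hypothetical helper.

```python
import random

def simulate_failures(r, p, rng):
    """Run Bernoulli(p) trials until the r-th success; return the failure count."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(0)            # fixed seed so the run is repeatable
r, p, n_experiments = 5, 0.3, 20_000
xs = [simulate_failures(r, p, rng) for _ in range(n_experiments)]
x_bar = sum(xs) / n_experiments

# r/(mean(x)+r): should be close to p
ratio_of_averages = r / (x_bar + r)
# mean of (r-1)/(x+r-1): should be close to p
average_of_unbiased = sum((r - 1) / (x + r - 1) for x in xs) / n_experiments
# (r-1)/(mean(x)+r-1): not a valid shortcut, ends up below p
unbiased_at_average = (r - 1) / (x_bar + r - 1)
# mean of r/(x+r): ends up above p
average_of_mle = sum(r / (x + r) for x in xs) / n_experiments

print(ratio_of_averages, average_of_unbiased, unbiased_at_average, average_of_mle)
```

With these settings the first two numbers should land near 0.3, the third noticeably below it, and the fourth noticeably above it.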

bpet · Dec 26, 2011

That looks about right. What it's saying is that E[(r-1)/(X+r-1)] = p = r/(E[X]+r) for the NB distribution. Jensen's inequality might help to show whether r/(X+r) overestimates p.
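A sketch of how the Jensen's inequality argument could go, assuming 0 < p < 1 so that X is not constant:

```latex
g(x) = \frac{r}{x+r}, \qquad g''(x) = \frac{2r}{(x+r)^{3}} > 0 \quad (x \ge 0),
\qquad\text{so}\qquad
E\!\left[\frac{r}{X+r}\right] = E[g(X)] > g(E[X]) = \frac{r}{E[X]+r} = p .
```

Because g is strictly convex, averaging g over the spread of X gives more than g evaluated at the average of X, which is the overestimate.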

skwey · Dec 27, 2011

Thanks for your contribution. I don't know Jensen's inequality, but you don't need it to show that r/(X+r) overestimates p. It is easy to show that E[(r-1)/(x+r-1)] is p, and also that r/(x+r) > (r-1)/(x+r-1) for all x > 0 (the two are equal at x = 0). So it is logical that r/(x+r) overestimates p: when you calculate its expected value, every term f(x)*p(x) in the sum with f(x) = r/(x+r) is at least as big as the corresponding term with f(x) = (r-1)/(x+r-1), and strictly bigger for x > 0, hence its expected value must be bigger than p.
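The term-by-term comparison can also be checked numerically. The sketch below (the function name and truncation length are assumptions) sums f(x)*p(x) for both choices of f, building the negative binomial probabilities with a recurrence instead of large factorials:

```python
def exact_expectations(r, p, max_failures=20_000):
    """Return (E[(r-1)/(X+r-1)], E[r/(X+r)]) for X ~ NB(r, p), r >= 2.

    Uses P(X=0) = p**r and the recurrence
    P(X=x+1) = P(X=x) * (x+r)/(x+1) * (1-p); the sum is truncated.
    """
    prob = p ** r                     # P(X = 0)
    e_unbiased = e_mle = 0.0
    for x in range(max_failures):
        e_unbiased += (r - 1) / (x + r - 1) * prob
        e_mle += r / (x + r) * prob
        prob *= (x + r) / (x + 1) * (1.0 - p)
    return e_unbiased, e_mle

for p in (0.1, 0.3, 0.5, 0.9):
    e_unbiased, e_mle = exact_expectations(5, p)
    print(p, e_unbiased, e_mle)
```

For each p the first expectation should reproduce p up to rounding error, while the second should come out larger.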

But this is very technical and not really an intuitive explanation; it explains the result by just calculating it. I thought of one intuitive answer, which I am not sure of at all, but maybe you guys can accept or reject it: is it fair to say that the negative binomial process is on the side of the successes and not the failures? I mean, if r is 5 we stop at SSSSS, but we continue if we get FFFFF, so in a way the process is always fighting for S; it will never stop until it has enough successes. And since the process is always "fighting" for successes, the end result will tend to have a higher share of successes than the population really has, so we have to correct for this. That is why we have to subtract one from r, that is, we subtract one from the successes and use (r-1)/(r-1+x) and not r/(r+x). Is this a fair explanation, or is it not OK?
