Estimation, negative binomial variable

AI Thread Summary
The discussion centers on the estimation of the success probability p in a negative binomial distribution, specifically contrasting two estimators: (r-1)/(x+r-1) and r/(x+r). The first estimator is unbiased, while the second tends to overestimate p due to the nature of the negative binomial process, which focuses on achieving a set number of successes rather than a fixed number of trials. The participants explore the intuition behind why subtracting one from r yields a more accurate estimate, suggesting that the negative binomial process inherently favors successes, thus necessitating a correction. They also touch on the expected value of x and how averaging affects the estimators. Overall, the conversation highlights the complexities of estimating probabilities in non-binomial experiments.
skwey
Hey, there's this thing I can't wrap my head around.

Let's say we have a negative binomial variable x, with parameters p and r. That is, x is the number of failures we get before the rth success, while looking at random Bernoulli variables with success rate p.

It can be shown that (r-1)/(x+r-1) is an unbiased estimator for p. So let's say that before you start the experiment you decide you want r = 5 successes; then the number of failures x is the random variable. If you get SFFSSFSFFS, then x = 5 and the estimate is p = (5-1)/(5+5-1) = 4/9 ≈ 0.4444.

Here comes the tricky part. One intuitive way to estimate the success rate is to use r/(x+r), which I think is more logical (it is also actually the maximum-likelihood estimator, but biased). I mean, if you have 10 trials and 5 successes, 5/10 wouldn't be that bad, would it? However this is not correct: on average in the long run, since this is not a binomial experiment but a negative binomial one, it will tend to overestimate p.

But why does this tend to overestimate p, and why does subtracting 1 from r give the correct answer? I know it can be shown, by calculating the expected value of (r-1)/(x+r-1), that it is unbiased, but I am looking for a more intuitive answer. Which properties of the negative binomial model give you the correct estimator, on average in the long run, when you subtract 1 from r?
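A quick numerical check of the claim (a minimal simulation sketch, not from the thread; the values r = 5, p = 0.4 and the repetition count are arbitrary choices):

```python
import random

def negbin_failures(r, p, rng):
    """Simulate one negative binomial draw: failures before the r-th success."""
    failures, successes = 0, 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

r, p, n_reps = 5, 0.4, 200_000
rng = random.Random(0)
xs = [negbin_failures(r, p, rng) for _ in range(n_reps)]

# Average each estimator over many repetitions of the experiment
unbiased = sum((r - 1) / (x + r - 1) for x in xs) / n_reps
mle = sum(r / (x + r) for x in xs) / n_reps
print(f"true p = {p}")
print(f"mean of (r-1)/(x+r-1) = {unbiased:.4f}")  # close to 0.4
print(f"mean of r/(x+r)       = {mle:.4f}")       # noticeably above 0.4
```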
 
I don't know how to justify that exact result intuitively, and I haven't double-checked your formulas, but one can form an intuition from the general idea that a sampling plan affects an estimator. If you agree to toss a coin 2 times, then you estimate p from 4 possible sequences of success S and failure F. This includes the possibility F,F. If you agree to toss a coin until the first head comes up, you eliminate the possibility of the sequence F,F and you add the possibility of longer sequences. However, you never allow sequences that are all failures, such as F,F,F or F,F,F,F. So, in a manner of speaking, it is your sampling plan that biases the estimate based on the ratio (number of successes)/(total number of trials).

As a simpler example, suppose you agree to toss a coin until the total number of heads is equal to the total number of tails. That sampling plan certainly affects an estimator based on the ratio of heads to total tosses.
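To make the "toss until the first head" plan concrete, here is a hedged sketch of that case (which is just the negative binomial with r = 1): under this stopping rule, heads/total = 1/N, and for a fair coin its average converges to ln 2 ≈ 0.693 rather than 0.5.

```python
import random

def tosses_until_first_head(p, rng):
    """Toss a p-coin until the first head; return the total number of tosses."""
    n = 1
    while rng.random() >= p:  # tails: keep tossing
        n += 1
    return n

p, n_reps = 0.5, 200_000
rng = random.Random(1)

# Under this plan every sequence has exactly one head, so heads/total = 1/n
est = sum(1 / tosses_until_first_head(p, rng) for _ in range(n_reps)) / n_reps
print(f"true p = {p}, mean of heads/total = {est:.4f}")  # about 0.693 (= ln 2)
```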
 
I see what you are saying, Stephen Tashi; the sampling process obviously alters the way we estimate. It is just interesting that it is so close to the binomial formula, except that you subtract 1 from r.


Another thing that makes you wonder more is what happens if you look at the expected value of x.

E(X) = r*(1-p)/p; rearranging this gives p = r/(E(X)+r). This says that if we do a negative binomial experiment many times, with the same r, and take the average of the x values, then r/(x_average + r) will converge to p.

But if you want to use (r-1)/(x+r-1) to get p when you have many experiments, you have to average the whole expression (r-1)/(x+r-1), not compute (r-1)/(x_average + r-1), since it is the expected value of (r-1)/(X+r-1) that equals p.

This is also very difficult to understand.
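Both claims can be checked side by side in a quick simulation sketch (mine, with the same arbitrary parameter values as before): plugging the average x into r/(x+r) converges to p, and so does averaging the unbiased estimator run by run, while averaging r/(x+r) run by run does not.

```python
import random

def negbin_failures(r, p, rng):
    """Failures before the r-th success in Bernoulli(p) trials."""
    failures, successes = 0, 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

r, p, n_reps = 5, 0.4, 200_000
rng = random.Random(2)
xs = [negbin_failures(r, p, rng) for _ in range(n_reps)]

x_avg = sum(xs) / n_reps
plug_in_average = r / (x_avg + r)                                # r/(x_average + r)
avg_of_unbiased = sum((r - 1) / (x + r - 1) for x in xs) / n_reps
avg_of_mle = sum(r / (x + r) for x in xs) / n_reps               # averaging r/(x+r) per run

print(f"r/(x_avg + r)         = {plug_in_average:.4f}")  # ~0.4
print(f"mean of (r-1)/(x+r-1) = {avg_of_unbiased:.4f}")  # ~0.4
print(f"mean of r/(x+r)       = {avg_of_mle:.4f}")       # > 0.4
```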
 
skwey said:
I see what you are saying, Stephen Tashi; the sampling process obviously alters the way we estimate. It is just interesting that it is so close to the binomial formula, except that you subtract 1 from r.

Another thing that makes you wonder more is what happens if you look at the expected value of x.

E(X) = r*(1-p)/p; rearranging this gives p = r/(E(X)+r). This says that if we do a negative binomial experiment many times, with the same r, and take the average of the x values, then r/(x_average + r) will converge to p.

But if you want to use (r-1)/(x+r-1) to get p when you have many experiments, you have to average the whole expression (r-1)/(x+r-1), not compute (r-1)/(x_average + r-1), since it is the expected value of (r-1)/(X+r-1) that equals p.

This is also very difficult to understand.

That looks about right; what it's saying is that E[(r-1)/(X+r-1)] = p = r/(E[X]+r) for the NB distribution. Jensen's inequality might help to show that r/(X+r) overestimates p.
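Spelling out the Jensen suggestion (a sketch under the standard parametrization, not something worked out in the thread): for fixed r the function f(x) = r/(x+r) is convex in x, since

$$\frac{d^2}{dx^2}\left(\frac{r}{x+r}\right) = \frac{2r}{(x+r)^3} > 0,$$

so Jensen's inequality gives

$$E\!\left[\frac{r}{X+r}\right] \;\ge\; \frac{r}{E[X]+r} \;=\; \frac{r}{r(1-p)/p + r} \;=\; p,$$

with strict inequality whenever X is not constant. So r/(X+r) does overestimate p.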
 
bpet said:
That looks about right; what it's saying is that E[(r-1)/(X+r-1)] = p = r/(E[X]+r) for the NB distribution. Jensen's inequality might help to show that r/(X+r) overestimates p.


Thanks for your contribution. I don't know about Jensen's inequality, but you don't need it to show that r/(X+r) overestimates p: it is easy to show that E[(r-1)/(X+r-1)] = p, and also that r/(x+r) > (r-1)/(x+r-1) for all x > 0 (with equality at x = 0). So when you calculate the expected value of r/(X+r), every term f(x)*p(x) in the first-moment sum is at least as big as the corresponding term for f(x) = (r-1)/(x+r-1), and strictly bigger for x > 0; hence its expected value must be bigger than p.
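For reference, that unbiasedness computation runs as follows (assuming the pmf ##P(X=x)=\binom{x+r-1}{x}p^r(1-p)^x## for x = 0, 1, 2, ... and r ≥ 2):

$$E\left[\frac{r-1}{X+r-1}\right]=\sum_{x=0}^{\infty}\frac{r-1}{x+r-1}\binom{x+r-1}{x}\,p^r(1-p)^x,$$

and the identity ##\frac{r-1}{x+r-1}\binom{x+r-1}{x}=\binom{x+r-2}{x}## reduces this to

$$p\sum_{x=0}^{\infty}\binom{x+r-2}{x}\,p^{\,r-1}(1-p)^x = p,$$

since the remaining sum is the total probability of a negative binomial with parameters r-1 and p, i.e. it equals 1.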

But this is very technical and not really an intuitive explanation; it explains it by just calculating it. I thought of one intuitive answer, which I am not sure of at all, but maybe you guys can accept or reject it: is it fair to say that the negative binomial process is on a team with the successes and not the failures? I mean, if r is 5 we accept SSSSS, but we will continue if we get FFFFF, so in a way the process is always fighting for S; it will never stop until it has enough successes. And since the process is always "fighting" for successes, the end result will tend to have more successes than the population really has, so we have to correct for this. That is why we have to subtract one from r; that is, we subtract one from the successes, and use (r-1)/(r-1+x) and not r/(r+x). Is this a fair explanation, or is it not okay?
 