# Discrete Random Variables - Geometric Distribution

Hi Guys,

Long time reader first time poster...

This simple question has stumped me all day and I think I've finally cracked it! I'm hoping someone can confirm that, or tell me how wrong I am - either is fine :)

One in 1000 cows has a rare genetic disease. The disease is not contagious, so cases are independent.

Let X be the number of cows a farmer purchases up to and including the 1st cow with the disease. Find r given:

P(X≤r)=0.05
P(X≤r)=0.90

This is what I've done:

p = 1/1000 = 0.001 (? was unsure if this was in fact my p value)

P(X>r)=(1-p)^r

P(X≤r)=0.05 (given)

P(X≤r) + P(X>r) = 1 for geometric distribution

Therefore:

0.05 + (1-p)^r=1

0.05 + (1-0.001)^r=1

0.999^r=0.95

ln(0.999^r)=ln(0.95)

r = ln(0.95)/ln(0.999) ≈ 51.27

r≈51

And same again for P(X≤r)=0.90

Can someone tell me if I'm heading in the right direction - or is there a better way?

Thanks

chiro
Hey mrmt and welcome to the forums.

You are right that the distribution is geometric, but I think you are not using the right definition of the cumulative distribution function (CDF) for the geometric distribution.

According to wikipedia, the CDF of the geometric distribution is given by:

P(X <= x) = 1 - (1-p)^x

In the above examples you're given x = r and for this r you're also given the cumulative probability, so you have to figure out r. This means for the 0.05 case we do:

P(X <= r) = 1 - (1-0.001)^r = 0.05 which implies
0.95 = (1-0.001)^r = 0.999^r
log_0.999(0.95) = r = ln(0.95)/ln(0.999) ~ 52 (rounded up)

For the other one we get

P(X <= r) = 1 - (0.999)^r = 0.90
log_0.999(0.1) = r = ln(0.1)/ln(0.999) ~ 2302 (rounded up)

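Remember to always round up for these kinds of problems: r must be a whole number of cows, and rounding up gives the smallest r whose cumulative probability reaches the target.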
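As a quick check of the arithmetic above, here is a minimal Python sketch. The function names `smallest_r` and `geometric_cdf` are just illustrative choices, and it assumes X counts purchases up to and including the first diseased cow, so that P(X <= r) = 1 - (1-p)^r.

```python
import math

def geometric_cdf(p, r):
    # P(X <= r) for a geometric variable counting trials up to the first success
    return 1 - (1 - p) ** r

def smallest_r(p, target):
    # Solve 1 - (1 - p)**r = target for real r, then round up to a whole cow
    exact = math.log(1 - target) / math.log(1 - p)
    return exact, math.ceil(exact)

p = 0.001
for target in (0.05, 0.90):
    exact, r = smallest_r(p, target)
    print(f"target={target}: exact r={exact:.2f}, rounded up r={r}")
    # The CDF at r should reach the target, while the CDF at r - 1 should not
    print(f"  P(X<={r})={geometric_cdf(p, r):.5f}, "
          f"P(X<={r - 1})={geometric_cdf(p, r - 1):.5f}")
```

Printing the CDF at both r and r - 1 shows why rounding up is the safe choice: the value one below still falls short of the target probability.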

You got the right answer but I feel you got it for the wrong reasons. Remember that the CDF is well defined for this problem and we need to use that definition: there is no need to split the probabilities like you have done and I think it's the wrong way to think about it. Remember that you are using the definition of the CDF directly and don't try and improvise.

If you need to understand my comments, then I will do my best to answer your questions.

Thank you chiro,

Your response helped me greatly and solidified my (lack of) understanding of a cdf.

However, in the interest of discussion/further learning only, I do have to take you up on one point:

> You got the right answer but I feel you got it for the wrong reasons. Remember that the CDF is well defined for this problem and we need to use that definition: there is no need to split the probabilities like you have done and I think it's the wrong way to think about it. Remember that you are using the definition of the CDF directly and don't try and improvise.

The definition of a cdf is in fact a splitting of the probabilities. From wikipedia (which I'm now going to use more often because of your advice) a cdf "describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x"

i.e. splitting the probabilities, as you put it, into two categories - what is ≤ x and what is > x.

I think you'll find that my working, although some of it was unnecessary given a solid understanding of a cdf and where to find it, is still in fact the cdf for the geometric distribution, slightly rearranged:

P(X≤r) + P(X>r) = 1

P(X≤r) = 1 - P(X>r)

= 1 - (1-p)^r
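One way to sanity-check that the rearranged form really is the geometric cdf is to add up the individual probabilities P(X = k) = (1-p)^(k-1) p and compare against the closed form. This is only a rough numerical sketch; the function names are made up for illustration, and it assumes X starts counting at 1.

```python
def geometric_pmf(p, k):
    # P(X = k): k - 1 healthy cows followed by one diseased cow
    return (1 - p) ** (k - 1) * p

def cdf_by_summation(p, r):
    # P(X <= r) built directly from the individual probabilities
    return sum(geometric_pmf(p, k) for k in range(1, r + 1))

def cdf_by_complement(p, r):
    # P(X <= r) = 1 - P(X > r), where P(X > r) = (1 - p)**r
    return 1 - (1 - p) ** r

p = 0.001
for r in (1, 52, 2302):
    print(r, cdf_by_summation(p, r), cdf_by_complement(p, r))
```

The two columns should agree up to floating-point rounding, which is consistent with the complement route and the direct cdf being the same thing.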

I guess my slightly ego driven point is that there is no "right" way to view problems, even in mathematics...

chiro
You are right about splitting up the probabilities using complementarity; the only reason I made that comment was that you might get into a situation where it is misused. This was an interpretation on my part and it may have standing or it may not. The important point was to treat the CDF in this context as a general thing for all events, which means that you don't need to split things up. The real crux of my response was that I didn't understand why you split it up, because it was unnecessary for solving the problem. That doesn't necessarily mean you don't understand what you are doing, but from my point of view it was unnecessary, and as a consequence I wondered whether you might be misunderstanding either the question or probability. Also remember correlation doesn't imply causation.

Don't stress about my comments on this issue, though: if you understand the answer and agree with it, then whatever you did should be enough, as long as you can put it into context with my suggestion. As you said, mathematics can have multiple ways of getting to the answer, all of them just as correct as one another, and that is more true than most people (sometimes even mathematicians) realize.