# Hypergeometric Probability Testing: Simple Question

1. Jan 17, 2012

### neil.thompson

Hi everyone.

So I'm afraid I don't really know much about statistics, but I am trying to learn by working through a book, and taking some examples (I have mathematics experience, but from a biological perspective).

Just now, I am looking at the hypergeometric probability distribution. I have access to MATLAB so I have been playing around with examples in that. As I understand it, the hypergeometric probability distribution gives the probability of getting a given number of successes when a sample is drawn without replacement from a larger set in which the total number of successes is known. That seems simple enough.
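That description can be sketched directly from binomial coefficients. The snippet below is in Python rather than MATLAB (an assumption for illustration), and `hyge_pmf` is a hypothetical helper, not a toolbox function; its arguments follow the same order as `hygecdf(x, M, K, N)`: population size M, successes in the population K, sample size N.

```python
from math import comb

def hyge_pmf(x, M, K, N):
    """P(exactly x successes) when drawing N items without replacement
    from M items, K of which are successes. Hypothetical helper."""
    # Outside the feasible range the probability is zero.
    if x < max(0, N - (M - K)) or x > min(K, N):
        return 0.0
    return comb(K, x) * comb(M - K, N - x) / comb(M, N)

# Summing over every possible number of successes covers all outcomes,
# so the total should be 1 up to floating-point rounding.
M, K, N = 61, 30, 34
total = sum(hyge_pmf(x, M, K, N) for x in range(0, N + 1))
print(total)
```

The sum runs over all values of one count (here, the number of successes in the sample), which is the sense in which the distribution sums to 1.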

However, I also expect that, as with everything else, the total probability sums to 1. So I am trying examples in MATLAB (http://www.mathworks.co.uk/help/toolbox/stats/hygecdf.html - there is an example on how to use it there too) and am obviously doing something wrong. Imagine a total set of 61 balls: 30 of them are red and 31 of them are black. I take a sample of 34, without replacement, and find that 14 are red and 20 are black.

I think this is OK - there seems to be no requirement on equal divisions between the colours or anything like that. So I run, for the cumulative probability:

Red=hygecdf(14,60,30,34);
Black=hygecdf(20,61,30,34);

I get Red = 0.1260 and Black = 0.9520. I think that these two calculations should be equivalent, since they both have the same sample size (34), and that they should sum to 1, but obviously they do not. I am doing something very basic wrong!

Sorry for all the words!

thank you,

Neil.

2. Jan 17, 2012

### SW VandeCarr

I don't quite follow your set up, but I can give you an example of the hypergeometric distribution in terms of a 2 x 2 table with fixed marginal totals:

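| | Column 1 | Column 2 | Row total |
|---|---|---|---|
| Row 1 | a | b | a+b |
| Row 2 | c | d | c+d |
| Column total | a+c | b+d | a+b+c+d |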

where a+b, c+d, a+c and b+d are all fixed, so the sum of all entries is also fixed. This is a two-way contingency table in which the cell counts a, b, c, d follow a hypergeometric distribution once the marginal totals are fixed. Can you put your problem into this form? If you calculate probabilities based on an individual row or column total, such as P(a) = a/(a+b), then it is P(a) and P(b) that must sum to one, because both share the marginal total a+b. You need to check that you are using the appropriate denominators.
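As a rough illustration, the ball example from the first post can be put into this form with rows "drawn / not drawn" and columns "red / black" (that labelling is an assumption, as is the use of Python instead of MATLAB). With the margins fixed, the single cell a determines the other three:

```python
from math import comb

# Fixed marginal totals for the ball example in the first post.
total_red, total_black = 30, 31               # column totals: a+c and b+d
drawn = 34                                    # row total: a+b
not_drawn = total_red + total_black - drawn   # row total: c+d

def table_from_a(a):
    """Fill in the 2 x 2 table from cell a alone, given the fixed margins."""
    b = drawn - a        # black balls drawn
    c = total_red - a    # red balls left behind
    d = not_drawn - c    # black balls left behind
    return a, b, c, d

a, b, c, d = table_from_a(14)

# Hypergeometric probability of observing exactly this table.
p_table = comb(a + c, a) * comb(b + d, b) / comb(a + b + c + d, a + b)
print((a, b, c, d), round(p_table, 4))
```

Because only one cell is free, "14 red drawn" and "20 black drawn" describe the same table, not two separate outcomes.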

http://data.princeton.edu/wws509/notes/c5s1.html

Last edited: Jan 17, 2012

3. Jan 17, 2012

### MrAnchovy

As typed, Red=hygecdf(14,60,30,34) is the probability that you will draw no more than 14 red balls when drawing 34 balls from a bag containing 60 balls, 30 of which are red.

As typed, Black=hygecdf(20,61,30,34) is the probability that you will draw no more than 20 black balls when drawing 34 balls from a bag containing 61 balls, 30 of which are black.

Neither of these MATLAB statements describes your result; the appropriate statements are:

Red = hygecdf(14,61,30,34);
Black = hygecdf(20,61,31,34);

... from the numerical results you give I can see that these were in fact the statements you used (that killed a bit of time!)

The reason they do not sum to 1 is that these partial results are not mutually exclusive - of course we know that, because they both happened in the same trial!
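A small sketch of that overlap, in Python rather than MATLAB (an assumption; `hyge_pmf` and `hyge_cdf` are hypothetical helpers that mirror the argument order of `hygecdf`). "At most 14 red" and "at most 20 black" both include the outcome with exactly 14 red, so their sum should exceed 1 by the probability of that one outcome:

```python
from math import comb

def hyge_pmf(x, M, K, N):
    """P(exactly x successes) in N draws without replacement from M items,
    K of which are successes. Hypothetical helper."""
    if x < max(0, N - (M - K)) or x > min(K, N):
        return 0.0
    return comb(K, x) * comb(M - K, N - x) / comb(M, N)

def hyge_cdf(x, M, K, N):
    """P(at most x successes), same argument order as MATLAB's hygecdf."""
    return sum(hyge_pmf(i, M, K, N) for i in range(0, x + 1))

red = hyge_cdf(14, 61, 30, 34)     # at most 14 red
black = hyge_cdf(20, 61, 31, 34)   # at most 20 black, i.e. at least 14 red
shared = hyge_pmf(14, 61, 30, 34)  # exactly 14 red: counted in both events

print(round(red, 4), round(black, 4), round(red + black - 1, 4), round(shared, 4))
```

The excess over 1 matches the probability of the shared outcome, which is why the two cumulative values cannot be complements.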

If you want two probabilities that sum to 1, you want Black to be 1 - Red, in other words Black must be the complement of Red. As Red is the probability that no more than 14 of the balls will be red, Black needs to be the probability that more than 14 of the balls will be red - which would mean that no more than 19 can be black.

Lo and behold, hygecdf(19,61,31,34) = 0.8740 which is 1 - 0.1260.
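The same check can be reproduced without the Statistics Toolbox. This is a minimal sketch in Python (an assumption; `hyge_cdf` is a hypothetical stand-in for `hygecdf` with the same argument order):

```python
from math import comb

def hyge_cdf(x, M, K, N):
    """P(at most x successes) in N draws without replacement from M items,
    K of which are successes. Hypothetical stand-in for hygecdf."""
    lo = max(0, N - (M - K))   # fewest successes possible in the sample
    hi = min(x, K, N)          # cap at the largest feasible count
    return sum(comb(K, i) * comb(M - K, N - i) for i in range(lo, hi + 1)) / comb(M, N)

at_most_14_red = hyge_cdf(14, 61, 30, 34)
at_most_19_black = hyge_cdf(19, 61, 31, 34)   # same event as "more than 14 red"

print(round(at_most_14_red, 4), round(at_most_19_black, 4))
print(round(at_most_14_red + at_most_19_black, 10))
```

Here the two events split the outcomes cleanly at 14 red versus 15 or more, so the pair sums to 1.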