Hypergeometric Probability Testing: Simple Question

SUMMARY

The discussion centers on the hypergeometric probability distribution and its application in MATLAB. The user, Neil, uses the MATLAB function hygecdf to calculate cumulative probabilities for drawing red and black balls from a set. He is confused when the two probabilities do not sum to one. Another user explains that the two events being measured ("no more than 14 red" and "no more than 20 black") are not mutually exclusive, so their probabilities need not sum to one. For two probabilities that do sum to one, the second must be the complement of the first, that is, Black = 1 - Red.

PREREQUISITES
  • Understanding of hypergeometric probability distribution
  • Familiarity with MATLAB programming
  • Basic knowledge of cumulative distribution functions
  • Experience with probability concepts and calculations
NEXT STEPS
  • Explore MATLAB's hygepdf function for probability mass function calculations
  • Learn about contingency tables and their relation to hypergeometric distributions
  • Study the concept of complementary probabilities in statistical analysis
  • Review examples of hypergeometric distribution applications in biological research
USEFUL FOR

Statisticians, data analysts, researchers in biological sciences, and anyone interested in applying hypergeometric probability distributions in MATLAB.

neil.thompson
Hi everyone.



So I'm afraid I don't really know much about statistics, but I am trying to learn by working through a book, and taking some examples (I have mathematics experience, but from a biological perspective).

Just now, I am looking at the hypergeometric probability distribution. I have access to MATLAB so I have been playing around with examples in that. As I understand it, the hypergeometric probability distribution gives the probability of a number of positive results, given selection of a sample from a greater set (where the total successes are known). That seems simple enough.

However, I also expect that, as with everything else, the total probability sums to 1. So I am trying examples in MATLAB (http://www.mathworks.co.uk/help/toolbox/stats/hygecdf.html - there is an example on how to use it there too) and am obviously doing something wrong. Imagine a total set of 61 balls. 30 of them are red and 31 of them are black. I take a sample of 34, without replacement, finding that 14 are red, and 20 are black.

I think this is OK - there seems to be no requirement on equal divisions between the colours or anything like that. So I run, for the cumulative probability:

Red=hygecdf(14,60,30,34);
Black=hygecdf(20,61,30,34);

I get Red = 0.1260 and Black = 0.9520. I thought these two calculations should be equivalent, since they both have the same sample size (34), and that they should sum to 1, but obviously they do not. I am doing something very basic wrong!


Sorry for all the words!

thank you,

Neil.
 
neil.thompson said:
Hi everyone.
So I'm afraid I don't really know much about statistics, but I am trying to learn by working through a book, and taking some examples (I have mathematics experience, but from a biological perspective).

Just now, I am looking at the hypergeometric probability distribution. I have access to MATLAB so I have been playing around with examples in that. As I understand it, the hypergeometric probability distribution gives the probability of a number of positive results, given selection of a sample from a greater set (where the total successes are known).

I get Red = 0.1260 and Black = 0.9520. I thought these two calculations should be equivalent, since they both have the same sample size (34), and that they should sum to 1, but obviously they do not. I am doing something very basic wrong!

I don't quite follow your set up, but I can give you an example of the hypergeometric distribution in terms of a 2 x 2 table with fixed marginal totals:

a b

c d

where a+b, c+d, a+c and b+d are all fixed. Obviously the sum of all entries is also fixed. This is a two-way contingency table where the variables in cells a, b, c, d follow a hypergeometric distribution when subject to the marginal constraints. Can you put your problem into this form? If you calculate probabilities based on individual column or row totals, such as P(a) = a/(a+b), it is the probabilities P(a) and P(b) that must sum to one, based on the marginal total a+b. You need to check whether you are using the appropriate denominators.

http://data.princeton.edu/wws509/notes/c5s1.html
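
For the ball example in this thread, the 2 x 2 layout might look like the sketch below. The variable names and the choice of rows (drawn / not drawn) and columns (red / black) are my own assumptions, and hygepdf needs the Statistics Toolbox. It only illustrates that fixing the margins leaves a single free cell, and that the hypergeometric probabilities over that cell's possible values sum to 1.

```matlab
% Counts from the thread: 61 balls (30 red, 31 black), 34 drawn, 14 of them red.
M = 61;   % total balls
K = 30;   % red balls in the bag
N = 34;   % balls drawn
a = 14;   % red balls among those drawn

% 2 x 2 table with fixed margins.
% Row 1: drawn (red, black). Row 2: not drawn (red, black).
tbl = [a,     N - a;
       K - a, (M - K) - (N - a)];

rowTotals = sum(tbl, 2);   % drawn and not drawn: 34 and 27
colTotals = sum(tbl, 1);   % red and black: 30 and 31

% With the margins fixed, cell a determines the whole table. It can only
% range from max(0, N - (M - K)) to min(K, N), here 3 to 30.
aRange = max(0, N - (M - K)):min(K, N);
total  = sum(hygepdf(aRange, M, K, N));   % should be 1 up to rounding
```

Here the quantity that sums to 1 is the probability mass over all possible values of one cell, not a pair of cumulative probabilities taken from two different cells.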
 
neil.thompson said:
Red = hygecdf(14,60,30,34)

This is the probability that you will draw no more than 14 red balls when drawing 34 balls from a bag containing 60 balls, 30 of which are red.

neil.thompson said:
Black = hygecdf(20,61,30,34)

This is the probability that you will draw no more than 20 black balls when drawing 34 balls from a bag containing 61 balls, 30 of which are black.

Neither of these Matlab statements describes your result. The appropriate statements are:

Red = hygecdf(14,61,30,34);
Black = hygecdf(20,61,31,34);

... from the numerical results you give I can see that these were in fact the statements you used (that killed a bit of time!)

The reason they do not sum to 1 is that these partial results are not mutually exclusive - of course we know that, because they both happened in the same trial!
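
To make the overlap concrete: every draw has red + black = 34, so "no more than 20 black" is the same event as "at least 14 red". The two events therefore share exactly one outcome, 14 red, and Red + Black should equal 1 + P(exactly 14 red). With the figures quoted in the thread, the excess is 0.1260 + 0.9520 - 1 = 0.0780. A minimal sketch of that check, assuming the Statistics Toolbox is available (the variable names are mine):

```matlab
M = 61;        % balls in the bag
N = 34;        % balls drawn
Kred   = 30;   % red balls in the bag
Kblack = 31;   % black balls in the bag

Red   = hygecdf(14, M, Kred,   N);   % P(red <= 14)
Black = hygecdf(20, M, Kblack, N);   % P(black <= 20), i.e. P(red >= 14)

overlap = hygepdf(14, M, Kred, N);   % P(red == 14), counted in both Red and Black
excess  = Red + Black - 1;           % should match overlap up to rounding
```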

If you want two probabilities that sum to 1, you want Black to be 1 - Red, in other words Black must be the complement of Red. As Red is the probability that no more than 14 of the balls will be red, Black needs to be the probability that more than 14 of the balls will be red - which would mean that no more than 19 can be black.

Lo and behold, hygecdf(19,61,31,34) = 0.8740 which is 1 - 0.1260.
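
If you want to check the complement without reworking the black-ball count by hand, one option is the 'upper' flag of hygecdf, which returns the upper tail P(X > x) directly. This is only a sketch: it assumes a Statistics Toolbox release recent enough to support that flag, and the variable names are mine.

```matlab
M = 61;   % balls in the bag
N = 34;   % balls drawn

Red      = hygecdf(14, M, 30, N);            % P(red <= 14)
RedUpper = hygecdf(14, M, 30, N, 'upper');   % P(red > 14)
Black19  = hygecdf(19, M, 31, N);            % P(black <= 19), the same event as red > 14

gapComplement = abs(Red + RedUpper - 1);     % expected to be ~0
gapSameEvent  = abs(RedUpper - Black19);     % expected to be ~0
```

Both gaps should be zero up to floating-point rounding, since RedUpper and Black19 describe the same event counted from the two colours.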
 
