A Sample size required in hypergeometric test

  1. Oct 25, 2016 #1



    I have a hypergeometric distribution with:

    N=total population of red and green balls, I know this
    K=total number of red balls, I don't know this
    n=sample size (number of investigated balls), I can choose this
    k=number of investigated balls that are red, I don't know this

    Red balls are a problem and I want to make sure, with a certainty c, that in the total population the fraction K/N of red balls is lower than a certain value, called p_max.

    How big must n be? And for a given n, what is the maximum value of k for which I can still approve the total population N? (I mean, if I cannot guarantee with certainty c that K/N is below p_max, I have to scrap the entire population N.)

    For example, N=1000, c=0.95, and p_max=0.1.

    I hope this is graduate level, at least it beats me :-)
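    One way to make the question concrete is as a one-sided test: treat "K/N is at least p_max" as the hypothesis to be rejected, and approve the population only if the observed k would be unlikely (probability at most 1 - c) even in the least bad of the bad cases. The sketch below assumes that reading of "certainty c"; the names `K_bad` (smallest red count that violates the limit, here 100 out of 1000), `acceptance_number` and `min_sample_size` are made up for illustration and do not come from the thread.

```python
from math import comb


def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for n draws without replacement from N balls, K of them red."""
    if k < 0:
        return 0.0
    upper = min(k, n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favourable / comb(N, n)


def acceptance_number(N, K_bad, n, alpha):
    """Largest k with P(X <= k | K = K_bad) <= alpha, or None if even k = 0 fails.

    P(X <= k) shrinks as K grows, so K = K_bad is the worst case among all
    populations with K >= K_bad.
    """
    best = None
    for k in range(n + 1):
        if hypergeom_cdf(k, N, K_bad, n) <= alpha:
            best = k
        else:
            break
    return best


def min_sample_size(N, K_bad, alpha):
    """Smallest n for which an all-green sample is already enough to approve."""
    for n in range(1, N + 1):
        if hypergeom_cdf(0, N, K_bad, n) <= alpha:
            return n
    return None


if __name__ == "__main__":
    N, K_bad, alpha = 1000, 100, 0.05  # example values from the post, c = 0.95
    print("smallest usable n:", min_sample_size(N, K_bad, alpha))
    for n in (50, 100, 200):
        print("n =", n, "approve if k <=", acceptance_number(N, K_bad, n, alpha))
```

    A larger n allows a larger acceptance number, so the choice of n is a trade-off between inspection effort and the risk of scrapping a population that is actually fine.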
  2. jcsd
  3. Oct 25, 2016 #2



    Staff: Mentor

    Depends on K.

    As an example, in a box with 28 red balls and 72 green balls, it is hard to figure out (with some confidence) if the fraction of red balls is larger than 30%: you'll need to investigate most balls. If the box has 0 red balls and 100 green balls, you can stop after investigating just about 10 balls.
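    The two boxes can be compared numerically. The sketch below assumes a fixed-sample one-sided test at the 95% level against "more than 30% red" (so at least 31 red balls out of 100) and asks how many draws are needed before a box with a given true red count would pass with 90% probability. The 90% target and all function names are illustrative assumptions, not part of the post.

```python
from math import comb


def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for n draws without replacement from N balls, K of them red."""
    if k < 0:
        return 0.0
    upper = min(k, n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favourable / comb(N, n)


def acceptance_number(N, K_bad, n, alpha):
    """Largest k with P(X <= k | K = K_bad) <= alpha, or None if there is none."""
    best = None
    for k in range(n + 1):
        if hypergeom_cdf(k, N, K_bad, n) <= alpha:
            best = k
        else:
            break
    return best


def pass_probability(N, K_true, K_bad, n, alpha):
    """Chance that a box with K_true red balls passes the test with n draws."""
    acc = acceptance_number(N, K_bad, n, alpha)
    return 0.0 if acc is None else hypergeom_cdf(acc, N, K_true, n)


def draws_needed(N, K_true, K_bad, alpha, power):
    """Smallest n at which the box passes with probability >= power."""
    for n in range(1, N + 1):
        if pass_probability(N, K_true, K_bad, n, alpha) >= power:
            return n
    return None


if __name__ == "__main__":
    for K_true in (0, 28):
        print(K_true, "red balls:", draws_needed(100, K_true, 31, 0.05, 0.9), "draws")
```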

    You can re-evaluate probabilities (for p_max and for the observed fraction) after each drawn ball, but then requiring some given confidence level does not work easily as you do p-hacking.
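    To see why re-checking after every ball is a problem, here is a small simulation sketch. It assumes a box that is just barely bad (31 red out of 100, threshold "more than 30%"), a nominal 5% error rate, and compares a single look after 20 draws with peeking after every draw up to 60. All parameter values and the function name `simulate` are illustrative choices, not from the thread.

```python
import random
from math import comb


def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for n draws without replacement from N balls, K of them red."""
    if k < 0:
        return 0.0
    upper = min(k, n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favourable / comb(N, n)


def simulate(N=100, K_true=31, K_bad=31, alpha=0.05,
             max_draws=60, fixed_n=20, trials=2000, seed=0):
    """Return (wrong-approval rate with peeking, rate with one fixed look)."""
    rng = random.Random(seed)
    # One-sided p-value for every (draws so far, reds so far) pair.
    pval = {(n, k): hypergeom_cdf(k, N, K_bad, n)
            for n in range(1, max_draws + 1) for k in range(n + 1)}
    box = [1] * K_true + [0] * (N - K_true)  # 1 = red, 0 = green
    peeking_hits = fixed_hits = 0
    for _ in range(trials):
        rng.shuffle(box)
        reds = 0
        crossed = False
        for n in range(1, max_draws + 1):
            reds += box[n - 1]
            significant = pval[(n, reds)] <= alpha
            crossed = crossed or significant
            if n == fixed_n and significant:
                fixed_hits += 1
        peeking_hits += crossed
    return peeking_hits / trials, fixed_hits / trials


if __name__ == "__main__":
    peeking, fixed = simulate()
    print("wrong approvals with peeking:   ", peeking)
    print("wrong approvals with fixed look:", fixed)
```

    The fixed look should stay near or below the nominal 5%, while stopping at the first "significant" look can only do worse, since every extra look is another chance to cross the threshold by luck.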
  4. Oct 25, 2016 #3



    Many thanks!
    1) What does "p-hacking" mean?
    2) Assume I know K. Could your example with 28/72 and 0/100 balls be expressed so that n is given by a mathematical expression in terms of N, c, p_max, and the maximum k, or similar? Or is there a reference to such an expression?
    3) If N=1000 balls in total and I draw 100 balls, of which 8 are red, what can I say about K/N? I mean in terms of K/N=0.08 +/- x, with some certainty c, for example 0.95. This is almost like my original question, but with a two-sided interval. Maybe this can be solved without re-evaluation after each drawn ball.
  5. Oct 26, 2016 #4



    Staff: Mentor

    1) About 50 million Google hits... It means looking for something that falls below/above some arbitrary threshold until you find it.
    2) I'm not sure if there is a closed form, but you can certainly analyze every case separately.
    3) Construct confidence intervals via the hypergeometric distribution.
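    For the N=1000, n=100, k=8 case, one possible construction is to invert two one-sided hypergeometric tests: keep every K for which the observed k is not in either extreme tail. This is only a sketch of that idea (an equal-tailed interval, analogous to Clopper-Pearson for the binomial); the function name and the equal split of 1 - c between the tails are assumptions.

```python
from math import comb


def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for n draws without replacement from N balls, K of them red."""
    if k < 0:
        return 0.0
    upper = min(k, n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favourable / comb(N, n)


def confidence_interval_for_K(N, n, k, conf=0.95):
    """Equal-tailed interval for K obtained by inverting two one-sided tests."""
    tail = (1.0 - conf) / 2.0
    # K must be at least k, and there must be enough green balls for the rest.
    feasible = range(k, N - (n - k) + 1)
    kept = [
        K for K in feasible
        if hypergeom_cdf(k, N, K, n) > tail              # k is not too small for K
        and 1.0 - hypergeom_cdf(k - 1, N, K, n) > tail   # k is not too large for K
    ]
    return min(kept), max(kept)


if __name__ == "__main__":
    lo, hi = confidence_interval_for_K(1000, 100, 8)
    print("K between", lo, "and", hi)
    print("K/N between", lo / 1000, "and", hi / 1000)
```

    The interval is not symmetric around 0.08, so it does not come out in the form 0.08 +/- x.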
  6. Oct 26, 2016 #5

    Stephen Tashi

    Science Advisor

    "Certainty" and "probability" are two diametrically opposed concepts.

    To have a mathematical problem, you must define what you mean by "with certainty c".

    If "c" is supposed to represent a probability, then what event is it that must have probability "c"? What is the sample space in which such an event is defined?

    For example if "c" is "the probability that K/N < p_max" then what is the sample space for the event "K/N < p_max"? Presumably it has to be a space where there are various possible values of K/N. How do you assign a probability distribution to the events in this space? (This leads to Bayesian statistical methods.)
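    As a rough illustration of the Bayesian route, the sketch below assumes a uniform prior over K = 0, 1, ..., N, which is purely an assumption made here for the example (choosing the prior is exactly the hard part). With that prior, the posterior weight of each K is proportional to the hypergeometric likelihood of the observed k, and the probability that K/N < p_max is a ratio of sums. The function name and `K_bad` (smallest K with K/N >= p_max) are hypothetical.

```python
from math import comb


def posterior_prob_below(N, n, k, K_bad):
    """P(K < K_bad | k red in n draws), assuming a uniform prior on K = 0..N."""
    # Likelihood of the data for each K, up to the constant factor 1 / comb(N, n).
    # comb returns 0 for impossible combinations, e.g. K < k.
    weights = [comb(K, k) * comb(N - K, n - k) for K in range(N + 1)]
    return sum(weights[:K_bad]) / sum(weights)


if __name__ == "__main__":
    # N = 1000, p_max = 0.1 (so K_bad = 100), sample of 100 with 8 red.
    print(posterior_prob_below(1000, 100, 8, 100))
    # Same population, but an all-green sample of 100.
    print(posterior_prob_below(1000, 100, 0, 100))
```

    A different prior would give a different number, which is the point of the question above about how the probability distribution is assigned.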

    Does "c" represent statistical "confidence"? In the scenario for "confidence" there is some population parameter P and a sample statistic used to estimate it. You are not asking about how to estimate the population parameter K/N; you are asking about estimating (in the sense of "yes" or "no") whether it satisfies a certain inequality. So we'll have to think carefully about how "confidence" relates to your question.

    As far as computations go, the problem might not be at the graduate level. But as far as conceptual understanding goes, it exceeds the level that typical undergraduates attain. To get a mathematical answer, you have to struggle with the question of "What am I really asking?".