
Sampling accuracy estimation

  1. Oct 28, 2014 #1
    Good afternoon!

    Suppose I have a box with N marbles of different colors and I want to know the ratio of the number of green ones to N (call it X). N is so large that there's no way to take them all out of the box and count them. Instead, I take a sample of M randomly picked marbles and compute the fraction of green ones within this sample, which gives another value, Y.

    It is obvious that as M approaches N, Y approaches X. The question is: to how many significant figures can I trust the value of Y?

    I know that to get a proper answer I have to calculate the probability of each significant figure being correct and then set a trust threshold (let's say that if it is 90% or more, I consider the figure correct). And I do understand the process of calculation. My question is the following:

    Is there an already well-established solution to this problem that I can just quote in my essay, so that I avoid adding unnecessary information? I'm pretty sure one exists, since I can't be the only one facing this problem. I just need the name :)

    Deep Thanks!
  3. Oct 29, 2014 #2

    Simon Bridge

    Science Advisor
    Homework Helper

    Say you draw n<<N marbles, and g of them are green, then you want to use this result as an estimator for the probability of drawing a green marble out of the population? i.e. P(g)=g/n ... how good an estimator is this?
    Well, if n=1, that's a pretty bad estimator ...

    Working out population statistics from sample statistics is well studied and there are lots of ways to approach it - covered in standard statistics text books. For you, I think: look up "Bayesian Analysis" - it helps to think of the sample as n independent trials with possible outcomes g and not-g.
  4. Oct 29, 2014 #3

    Stephen Tashi

    Science Advisor

    The name "confidence interval" is often associated with the scenario you describe, but it does not give you the ironclad guarantees that you want - even though many people misinterpret it as doing so. (For example "90% confidence" isn't synonymous with "90% probability".) http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

    If you really want to make the claim that "There is a 90% probability that the actual proportion of the population is in the following interval:..." then, as Simon Bridge said, you must take a Bayesian approach.
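As a rough illustration of the Bayesian approach mentioned above, here is a minimal Python sketch. It rests on assumptions that are not in the thread: a uniform Beta(1, 1) prior on the green proportion, a sample small enough relative to N that the draws can be treated as independent trials, and a Monte Carlo approximation of the posterior quantiles using only the standard library. The function name `credible_interval` and the example counts (300 green out of 1000) are made up for illustration.

```python
import random

def credible_interval(g, n, probability=0.90, num_draws=20000, seed=0):
    """Approximate equal-tailed credible interval for the green proportion.

    Assumes a uniform Beta(1, 1) prior, so the posterior after seeing
    g green marbles in n draws is Beta(g + 1, n - g + 1).
    """
    if n <= 0 or not 0 <= g <= n:
        raise ValueError("need 0 <= g <= n and n > 0")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = sorted(rng.betavariate(g + 1, n - g + 1) for _ in range(num_draws))
    tail = (1.0 - probability) / 2.0
    # Pick the empirical quantiles at `tail` and `1 - tail`.
    low_idx = max(0, int(tail * num_draws))
    high_idx = min(num_draws - 1, int((1.0 - tail) * num_draws) - 1)
    return draws[low_idx], draws[high_idx]

# Hypothetical sample: 300 green marbles among 1000 drawn.
low, high = credible_interval(g=300, n=1000)
print(f"90% credible interval for the green proportion: ({low:.3f}, {high:.3f})")
```

Under these assumptions the interval can be read as "the proportion lies in this range with 90% posterior probability", which is the kind of statement a confidence interval does not license. A different prior would give a different interval.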
  5. Oct 29, 2014 #4


    Science Advisor

    Your case can be modeled by a binomial distribution. The standard deviation of the sample proportion is √(p(1-p)/n), where p is estimated by the sample proportion of green marbles and n is the sample size.
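The formula above can be turned into an approximate confidence interval. The following Python sketch assumes the normal approximation to the binomial (often called the Wald interval), which is only reasonable when n is fairly large and p is not too close to 0 or 1. The function name `wald_interval` and the example counts are hypothetical, chosen just for illustration.

```python
import math
from statistics import NormalDist

def wald_interval(g, n, confidence=0.90):
    """Sample proportion, its standard error, and a normal-approximation interval."""
    if n <= 0 or not 0 <= g <= n:
        raise ValueError("need 0 <= g <= n and n > 0")
    p_hat = g / n                                    # sample proportion (Y in the question)
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)        # sqrt(p(1-p)/n) with p estimated by p_hat
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    low = max(0.0, p_hat - z * se)
    high = min(1.0, p_hat + z * se)
    return p_hat, se, (low, high)

# Hypothetical sample: 300 green marbles among 1000 drawn.
p_hat, se, (low, high) = wald_interval(g=300, n=1000)
print(f"estimate {p_hat:.3f}, standard error {se:.4f}, 90% interval ({low:.3f}, {high:.3f})")
```

With these made-up numbers the half-width of the interval is about 0.024, so only the first decimal place of the estimate is really pinned down. Because the standard error shrinks like 1/√n, gaining one more reliable decimal place takes roughly 100 times as many marbles.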
  6. Oct 29, 2014 #5
    Thanks to everyone!

    The solution came from an A-Level book in the form of confidence intervals. Special thanks to Stephen for telling me what to look for in the book's index. Unfortunately, overloading the essay with mathematical material turned out to be unavoidable :(.