What is the Best Way to Determine Sampling Accuracy in Large Populations?

Virous · Oct 28, 2014

Good afternoon!

Suppose I have a box with N marbles of different colors and I want to know the ratio of the number of green ones to N (call it X). The number of marbles (N) is so huge that there's absolutely no way to get them all out of the box and count. What I do instead is take a sample of M randomly picked marbles and compute the ratio of green ones within this sample. This gives another value, Y.

It is obvious that as M approaches N, Y approaches X. The question is: up to what significant figure can I trust the value of Y?

I know that to get a proper answer I have to calculate the probability of each significant figure being correct and then establish a trust boundary (let's say, if it is 90% or more, I consider it to be correct). And I do understand the process of calculation. My question is the following:

Is there an already well-established solution for this problem that I can just quote in my essay and avoid adding unnecessary information? I'm pretty sure that it exists, since I can't be the only one facing this problem. I just need the name :)

Deep Thanks!

Simon Bridge · Oct 29, 2014

Virous said:

Suppose I have a box with N marbles of different colors and I want to know the ratio of the number of green ones to N (call it X). The number of marbles (N) is so huge that there's absolutely no way to get them all out of the box and count. What I do instead is take a sample of M randomly picked marbles and compute the ratio of green ones within this sample. This gives another value, Y.

Say you draw n<<N marbles and g of them are green. You then want to use this result as an estimator for the probability of drawing a green marble out of the population, i.e. P(g)=g/n. How good an estimator is this?
Well, if n=1, that's a pretty bad estimator ...

Working out population statistics from sample statistics is well studied and there are lots of ways to approach it - covered in standard statistics textbooks. For you, I think: look up "Bayesian Analysis" - it helps to think of the sample as n independent trials with possible outcomes g and not-g.

Stephen Tashi · Oct 29, 2014

Virous said:

Is there an already well-established solution for this problem that I can just quote in my essay and avoid adding unnecessary information? I'm pretty sure that it exists, since I can't be the only one facing this problem. I just need the name :)

The name "confidence interval" is often associated with the scenario you describe, but it does not give you the ironclad guarantees that you want - even though many people misinterpret it as doing so. (For example "90% confidence" isn't synonymous with "90% probability".) http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
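As one illustration of what such an interval looks like in practice, here is a minimal sketch of the Wilson score interval, one of the standard binomial proportion intervals. The function name `wilson_interval` and the example numbers are hypothetical, chosen only for illustration; nothing here comes from the original question.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(g, n, confidence=0.90):
    """Wilson score interval for a proportion: g green marbles in a sample of n.

    Hypothetical helper for illustration only.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = g / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Assumed example: 300 green marbles in a sample of 1000.
    print(wilson_interval(300, 1000))
```

The "90%" here describes how often the procedure captures the true proportion over many repeated samples, not the probability that the true proportion lies in this particular interval.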

If you really want to make the claim that "There is a 90% probability that the actual proportion of the population is in the following interval:..." then, as Simon Bridge said, you must take a Bayesian approach.
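A minimal sketch of what the Bayesian route could look like, assuming a uniform Beta(1, 1) prior on the proportion (the prior is an assumption of this sketch, and a different prior gives a different interval). With that prior, observing g green marbles out of n gives a Beta(g + 1, n - g + 1) posterior, and a 90% credible interval can be read off its quantiles. The quantiles are approximated below by sampling with the standard library; the function name `beta_credible_interval` is hypothetical.

```python
import random


def beta_credible_interval(g, n, prob=0.90, draws=100_000, seed=0):
    """Approximate equal-tailed credible interval for the green proportion.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(g + 1, n - g + 1). Quantiles are estimated by Monte Carlo sampling.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    samples = sorted(rng.betavariate(g + 1, n - g + 1) for _ in range(draws))
    tail = (1 - prob) / 2
    lo = samples[int(tail * draws)]
    hi = samples[int((1 - tail) * draws) - 1]
    return lo, hi


if __name__ == "__main__":
    # Assumed example: 300 green marbles in a sample of 1000.
    print(beta_credible_interval(300, 1000))
```

Under the stated prior, the returned interval can be described as containing the true proportion with 90% posterior probability, which is the kind of statement the original question asks for.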

mathman · Oct 29, 2014

Your case can be modeled by a binomial distribution. The standard deviation of the sample proportion is √(p(1-p)/n), where p is estimated by the sample average (the fraction of green marbles in the sample) and n is the sample size.
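A minimal sketch of how this standard deviation turns into an interval, assuming the normal approximation to the binomial (reasonable when n is large and p is not too close to 0 or 1). The function name `normal_approx_interval` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_interval(g, n, confidence=0.90):
    """Normal-approximation confidence interval for a sample proportion.

    Uses the standard deviation sqrt(p(1-p)/n) with p estimated by g/n.
    """
    p_hat = g / n
    std_dev = sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p_hat - z * std_dev, p_hat + z * std_dev


if __name__ == "__main__":
    # Assumed example: 300 green marbles in a sample of 1000.
    lo, hi = normal_approx_interval(300, 1000)
    print(f"{lo:.3f} to {hi:.3f}")
```

For the assumed sample of 300 green out of 1000, the interval is roughly 0.276 to 0.324, so only the first significant figure (0.3) can be trusted at this level. Since the width shrinks like 1/√n, each additional trustworthy figure needs about 100 times as many marbles.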

Virous · Oct 29, 2014

Thanks to everyone!

The solution came from an A-Level book in the form of confidence intervals. Special thanks to Stephen for telling me what to look for in the book's index. Unfortunately, overloading the essay with mathematical stuff turned out to be inevitable :(.
