What is the Best Way to Determine Sampling Accuracy in Large Populations?

Virous · Oct 28, 2014

Good afternoon!

Suppose I have a box with N marbles of different color and I want to know the ratio of the number of green ones to N (number X). The number of marbles (N) is so huge that there`s absolutely no way to get them all out of the box and count. What I do instead is I take a sample of M randomly picked marbles and count the ratio of green ones within this sample. I obtain another value - Y.

It is obvious that as M approaches N, Y approaches X. The question is: up to what significant figure can I trust the value of Y?

I know that to get a proper answer I have to calculate probabilities of each significant figure being correct and then establish a trust boundary (let`s say, if it is 90% or more - I consider it to be correct). And I do understand the process of calculation. My question is the following:

Is there an already well established solution for this problem that I can just quote in my essay and avoid adding unnecessary information? I`m pretty sure that it exists, since I can`t be the only one facing this problem. I just need the name :)

Deep Thanks!

Simon Bridge · Oct 29, 2014

Virous said:

Suppose I have a box with N marbles of different color and I want to know the ratio of the number of green ones to N (number X). The number of marbles (N) is so huge that there`s absolutely no way to get them all out of the box and count. What I do instead is I take a sample of M randomly picked marbles and count the ratio of green ones within this sample. I obtain another value - Y.

Say you draw n<<N marbles, and g of them are green, then you want to use this result as an estimator for the probability of drawing a green marble out of the population? i.e. P(g)=g/n ... how good an estimator is this?
Well, if n=1, that's a pretty bad estimator ...

Working out population statistics from sample statistics is well studied and there are lots of ways to approach it - covered in standard statistics textbooks. For you, I think: look up "Bayesian Analysis" - it helps to think of the sample as n independent trials with possible outcomes g and not-g.

Stephen Tashi · Oct 29, 2014

Virous said:

Is there an already well established solution for this problem that I can just quote in my essay and avoid adding unnecessary information? I`m pretty sure that it exists, since I can`t be the only one facing this problem. I just need the name :)

The name "confidence interval" is often associated with the scenario you describe, but it does not give you the ironclad guarantees that you want - even though many people misinterpret it as doing so. (For example "90% confidence" isn't synonymous with "90% probability".) http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

If you really want to make the claim that "There is a 90% probablity that the actual proportion of the population is in the following interval:..." then, as Simon Bridge said, you must take a Bayesian approach.

mathman · Oct 29, 2014

Your case can be modeled by a binomial distribution. The standard deviation is √(p(1-p)/n), where p is estimated by the sample average and n is the sample size.

Virous · Oct 29, 2014

Thanks to everyone!

The solution came from an A-Level book in the form of confidence intervals. Special thanks to Stephen for telling me what to look for in the book`s index. Unfortunately, overloading the essay with mathematical stuff appeared to be inevitable :(.

What is the Best Way to Determine Sampling Accuracy in Large Populations?

Thread 'Onto set mapping is the surjective set mapping, and into injective?'

Thread 'Roulette wheel physics and probability'

Thread 'Detail of Diagonalization Lemma'

Similar threads

Hot Threads

B A Little Probability Puzzle

I Need help solving this Existence Algorithm for truth

A Does this computation satisfy LTL formulas?

A Prove that points which are indistinguishable from 0 exist (using logic)

A Mathematical Connection between Cosmic Expansion and Exponential Growth

Recent Insights

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Insights Fermat's Last Theorem

Insights Why Vector Spaces Explain The World: A Historical Perspective