Optimizing Response Rates: Statistical Analysis for Small Population Sizes

Diffy · Apr 8, 2013

Sorry I need help in a hurry. This is for work and I haven't done this in a long time.

I have a population of ~ 5,338,000

And I know 0.74% respond to something.

I want to know if a population of only 5,000 will respond better or worse than my 5 million.

I am worried that it is too small a population to test because my response rate is so small. How can I prove or disprove this using statistics?

Thanks,

mfb · Apr 8, 2013

What do you mean by "will respond better or worse"? A higher rate?
Without any data about the smaller population, there is no way to tell how it will react. You can assume the same response rate and calculate the distribution of replies, of course, but that only restates the assumption that the response rate is probably the same.

Diffy · Apr 8, 2013

The basic issue is that we have a large population with a very low response rate, and we want to test a very small population to see whether its rate is higher or lower.

I don't think the results of the small-population test will be significant, because we are comparing it to a very large population with a very low rate.

Hopefully that makes sense.

jbriggs444 · Apr 8, 2013

Guessing here that we're talking about the distribution of the number of successes or failures in a set of 5000 independent events.

We have a large population which establishes a nominal success rate of 0.74 percent. That's the control group.

The experimental population is 5000 events. The question is what increased success rate would be required to have, for instance, 95% certainty that the increased success rate would not come from random chance alone. [Or conversely, what increased failure rate would be required to have 95% certainty that the reduced success rate would not come from random chance alone].

That sounds like a pretty standard exercise in confidence intervals. And this is a binomial distribution. So you look at the cumulative binomial distribution and find the 95th percentile. 95 percent of the time random chance would not produce a result that far out of whack. If your result is that high, you can have some confidence that it is a genuine result rather than a random fluke. [Or find the 5th percentile if you are looking for the opposite effect]

There are binomial calculators on the web. For samples this large, the ones that I found approximate the binomial distribution with a normal distribution.
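As a rough illustration of the percentile lookup described above, here is a minimal Python sketch that computes the binomial percentiles exactly rather than through the normal approximation. It assumes the 0.74 percent baseline and a sample of 5000 from the thread; the function name `binom_quantile` is made up for this example.

```python
def binom_quantile(n, p, q):
    """Return the smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    pmf = (1.0 - p) ** n          # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < q and k < n:
        # Recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k

n = 5000          # size of the test group (from the thread)
p = 0.0074        # assumed baseline response rate, 0.74 percent

upper = binom_quantile(n, p, 0.95)   # 95th percentile of the response count
lower = binom_quantile(n, p, 0.05)   # 5th percentile of the response count
print(f"expected responses: {n * p:.1f}")
print(f"95th percentile: {upper}, 5th percentile: {lower}")
```

A count above the 95th percentile would be hard to attribute to chance alone under the baseline rate, and a count below the 5th percentile would point toward a lower rate.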

Diffy · Apr 8, 2013

Right, I understand how to compare them.

What I don't understand is how to set up a test. I know that in 100,000 tries I get 70 successes.

I wouldn't test by trying just 10 times. I would need a population large enough to give a significant result. How can I find out how many I need?

In my original example I don't even want to compare, because I don't think 5,000 is significant. I want to know how confident I can be that that population size is enough.

Diffy · Apr 8, 2013

Really struggling with this. Is anyone around?

jbriggs444 · Apr 8, 2013

Diffy said:

What I don't understand is how to set up a test. I know that in 100,000 tries I get 70 successes.

In your initial post you said 0.74 percent. Now it sounds like the true figure is a factor of ten lower -- 0.070 percent.

So in a population of 5000 you would expect around 3.5 successes (3.7 if the rate is 0.074 percent).
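The expected count is just the sample size times the rate. A small sketch, with a hypothetical helper `expected_and_sd`, shows the figures for the rates mentioned in this thread, along with the binomial standard deviation to give a feel for how noisy such a small count is:

```python
from math import sqrt

def expected_and_sd(n, rate):
    """Mean and standard deviation of a Binomial(n, rate) count."""
    mean = n * rate
    sd = sqrt(n * rate * (1.0 - rate))
    return mean, sd

n = 5000
# 0.74% is the rate from the first post, 0.070% is 70 in 100,000,
# and 0.074% is the first figure divided by ten.
for rate in (0.0074, 0.00070, 0.00074):
    mean, sd = expected_and_sd(n, rate)
    print(f"rate {rate:.3%}: expect {mean:.1f} responses, sd {sd:.1f}")
```

With only three or four expected responses, the standard deviation is roughly half the mean, so ordinary chance variation is large relative to the count itself.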

Diffy said:

In my original example I don't even want to compare, because I don't think 5,000 is significant. I want to know how confident I can be that that population size is enough.

Note that I'm not a practicing statistician and it's been a lot of years since I studied this stuff.

How big a sample you need depends on how small an effect you are trying to measure.

If you want to distinguish between 0.070 percent and 0.080 percent then you'll need a larger sample than if you want to distinguish between 0.070 percent and 50 percent.

A confidence interval calculator reports that in order to sample from a population of five million individuals and get a result that is accurate to 0.01 percent (able to distinguish between 0.070 percent and 0.080 percent) then you need a sample size in excess of four million.

If you relax that to 0.1 percent then you need 800,000.
If you relax that to 1 percent then you need 9,500.
If you relax that to 10 percent then you need 96.

This fits with the naive principle that in order to increase accuracy by a factor of x you have to increase sample size by a factor of x².
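The figures above are consistent with the standard sample-size formula for a proportion with a finite population correction. The sketch below is a guess at what such a calculator does, not a description of the specific one used: it assumes the worst-case proportion of 50 percent, a population of exactly 5,000,000, and a two-sided 95 percent level. The function name is illustrative.

```python
from statistics import NormalDist

def sample_size_finite(margin, population, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion to within +/- margin,
    with a finite population correction (sampling without replacement)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.96 for 95%
    n0 = z * z * p * (1.0 - p) / margin ** 2           # infinite-population size
    return n0 / (1.0 + (n0 - 1.0) / population)        # shrink for finite N

N = 5_000_000   # assumed population size
for margin in (0.0001, 0.001, 0.01, 0.1):
    n = sample_size_finite(margin, N)
    print(f"margin {margin:.2%}: sample of about {n:,.0f}")
```

Because the margin enters as 1/margin², halving the margin roughly quadruples the uncorrected sample size; the correction only matters once that size becomes comparable to the population.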

The confidence interval calculator I used is based on the notion of polling individuals from a finite population without replacement. Worst case you sample the whole population and get a perfectly accurate result. In the case at hand it might be more appropriate to think in terms of sampling from an infinite population. That increases the required sample sizes significantly.

0.01 percent needs a sample size of 96 million
0.1 percent needs a sample size of 960 thousand
1 percent needs a sample size of 9600
10 percent needs a sample size of 96.

This is all at the 95 percent confidence level. For 99 percent confidence you need bigger sample sizes.
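For the infinite-population case, the usual normal-approximation formula n = z² p (1 - p) / margin² reproduces these numbers, and changing the confidence level only changes the normal quantile z. This is a sketch, assuming the worst-case proportion of 50 percent unless `p` is supplied; `sample_size` is a made-up name.

```python
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion to within +/- margin,
    using the normal approximation and an effectively infinite population."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # two-sided quantile
    return z * z * p * (1.0 - p) / margin ** 2

for confidence in (0.95, 0.99):
    for margin in (0.0001, 0.001, 0.01, 0.1):
        n = sample_size(margin, confidence)
        print(f"{confidence:.0%} level, margin {margin:.2%}: about {n:,.0f}")
```

Passing a smaller `p`, for instance `sample_size(0.0001, p=0.0007)`, gives a considerably smaller figure, since p (1 - p) is then far below its worst-case value of 0.25. Whether the normal approximation is adequate for rates that low is a separate question.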

Diffy · Apr 8, 2013

Thanks, that helped.

Do you happen to know the formulas behind the calculations?
