# Population stats question

1. Apr 8, 2013

### Diffy

Sorry I need help in a hurry. This is for work and I haven't done this in a long time.

I have a population of ~ 5,338,000

And I know 0.74% respond to something.

I want to know if a population of only 5,000 will respond better or worse than my 5 million.

I am worried that it is too small a population to test because my response rate is so small. How can I prove or disprove this using statistics?

Thanks,

2. Apr 8, 2013

### Staff: Mentor

What do you mean by "will respond better or worse"? A higher rate?
Without any data about the smaller population, there is no way to tell how it will react. You can assume the same response rate and calculate the distribution of replies, of course, but that won't give an interesting deviation from "the response rate is probably the same" (=the assumption).

3. Apr 8, 2013

### Diffy

The basic issue is that we have a large population with a very low response rate. And we want to test a very small population to see if the rate will be higher, or will be worse.

I don't think the results of the small-population test will be significant, because we are comparing it to a very large population with a very low rate.

Hopefully that makes sense.

4. Apr 8, 2013

### jbriggs444

Guessing here that we're talking about the distribution of the number of successes or failures in a set of 5000 independent events.

We have a large population which establishes a nominal success rate of 0.74 percent. That's the control group.

The experimental population is 5000 events. The question is what increased success rate would be required to have, for instance, 95% certainty that the increased success rate would not come from random chance alone. [Or conversely, what increased failure rate would be required to have 95% certainty that the reduced success rate would not come from random chance alone].

That sounds like a pretty standard exercise in confidence intervals. And this is a binomial distribution. So you look at the cumulative binomial distribution and find the 95th percentile. 95 percent of the time random chance would not produce a result that far out of whack. If your result is that high, you can have some confidence that it is a genuine result rather than a random fluke. [Or find the 5th percentile if you are looking for the opposite effect]

There are binomial calculators on the web. For samples this large, the ones that I found approximate the binomial distribution with a normal distribution.
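
As a rough illustration of the percentile lookup described above, here is a Python sketch that computes the exact binomial percentiles for 5,000 trials at the 0.74 percent baseline and compares the upper one with the normal approximation. The function names are made up for this example, and it assumes the 5,000 responses are independent and share the baseline rate.

```python
import math
from statistics import NormalDist

def binomial_quantile(n, p, q):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    pmf = (1.0 - p) ** n          # P(X = 0); fine here, may underflow if n*p is huge
    cdf = pmf
    k = 0
    while cdf < q and k < n:
        # Recurrence: P(X = k+1) = P(X = k) * (n - k)/(k + 1) * p/(1 - p)
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k

def normal_quantile(n, p, q):
    """Normal approximation to the same quantile (no continuity correction)."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return NormalDist(mean, sd).inv_cdf(q)

n, p = 5000, 0.0074               # test size and baseline rate from the thread
lower = binomial_quantile(n, p, 0.05)
upper = binomial_quantile(n, p, 0.95)
print("expected successes:", n * p)
print("exact 5th / 95th percentile:", lower, upper)
print("normal approx. 95th percentile:", round(normal_quantile(n, p, 0.95), 1))
```

With an expected count of 37, the normal approximation puts the 95th percentile near 47 successes, which is a response rate of about 0.94 percent. A test result at or above the upper value would be unlikely under the baseline rate alone.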

Last edited: Apr 8, 2013

5. Apr 8, 2013

### Diffy

Right I understand how to compare them.

What I don't understand is how to set up a test. Say I know that in 100,000 tries I get 70 successes.

I wouldn't test by trying just 10 times. I would need a sufficiently large population to test against. How can I find out how many I need?

In my original example I don't even want to compare, because I don't think 5,000 is significant. I want to know how confident I can be that that population size is enough.

6. Apr 8, 2013

### Diffy

Really struggling with this. Is anyone around?

7. Apr 8, 2013

### jbriggs444

In your initial post you said 0.74 percent. Now it sounds like the true figure is a factor of ten lower -- 0.070 percent.

So in a population of 5000 you would expect around 3.5 successes (5000 × 0.0007).

Note that I'm not a practicing statistician and it's been a lot of years since I studied this stuff.

How big a sample you need depends on how small an effect you are trying to measure.

If you want to distinguish between 0.070 percent and 0.080 percent then you'll need a larger sample than if you want to distinguish between 0.070 percent and 50 percent.
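
To make the dependence on effect size concrete, here is a sketch using a different technique from the confidence interval calculator discussed in this thread: the standard normal-approximation sample size for a one-sided, one-sample test of a proportion. The function name, the 5 percent significance level, the 80 percent power, and the "doubled rate" scenario are all assumptions chosen for illustration, and the baseline is treated as known exactly.

```python
import math
from statistics import NormalDist

def n_to_detect(p0, p1, alpha=0.05, power=0.80):
    """Approximate sample size to tell rate p1 apart from a known baseline p0.

    alpha and power are illustrative defaults, not values from the thread.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p0 * (1.0 - p0))
                 + z_beta * math.sqrt(p1 * (1.0 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)

baseline = 0.0007                               # 70 successes in 100,000 tries
small_lift = n_to_detect(baseline, 0.0008)      # 0.070 percent vs 0.080 percent
doubled = n_to_detect(baseline, 0.0014)         # hypothetical doubling of the rate
print(small_lift, doubled)
```

Under these assumptions the small difference needs a sample in the hundreds of thousands, while a doubled rate needs on the order of ten thousand, which suggests that 5,000 trials can only reveal quite large changes.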

A confidence interval calculator reports that in order to sample from a population of five million individuals and get a result that is accurate to 0.01 percent (able to distinguish between 0.070 percent and 0.080 percent) then you need a sample size in excess of four million.

If you relax that to 0.1 percent then you need 800,000
If you relax that to 1 percent then you need 9500
If you relax that to 10 percent then you need 96.
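
The thread does not say how the calculator works internally. The figures above are, however, close to what the usual margin-of-error formula with a finite population correction gives, so here is a sketch under that assumption. The function name and the defaults (z = 1.96 for 95 percent confidence, worst-case p = 0.5) are assumptions for this example.

```python
import math

def finite_population_n(margin, population, z=1.96, p=0.5):
    """Sample size for a given margin of error, sampling without replacement.

    z = 1.96 and p = 0.5 are assumed defaults; the thread does not say
    what the calculator actually used.
    """
    n0 = z * z * p * (1.0 - p) / margin ** 2      # size for an infinite population
    # Finite population correction: n = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / population))

population = 5_000_000
for margin in (0.0001, 0.001, 0.01, 0.1):         # 0.01, 0.1, 1 and 10 percent
    n = finite_population_n(margin, population)
    print(f"{margin:.2%} margin -> n = {n:,}")
```

Passing the observed rate as `p` instead of 0.5 gives much smaller numbers, since p(1 - p) is far below 0.25 at rates this low.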

This fits with the naive principle that in order to increase accuracy by a factor of x you have to increase sample size by a factor of x².

The confidence interval calculator I used is based on the notion of polling individuals from a finite population without replacement. Worst case you sample the whole population and get a perfectly accurate result. In the case at hand it might be more appropriate to think in terms of sampling from an infinite population. That increases the required sample sizes significantly.

0.01 percent needs a sample size of 96 million
0.1 percent needs a sample size of 960 thousand
1 percent needs a sample size of 9600
10 percent needs a sample size of 96.
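
These infinite-population figures are consistent with the usual margin-of-error formula for a proportion, assuming a worst-case p = 0.5 and z = 1.96 for 95 percent confidence (both are assumptions here, not something the calculator states):

$$
n_0 = \frac{z^2 \, p\,(1-p)}{e^2} = \frac{1.96^2 \times 0.25}{e^2} \approx \frac{0.9604}{e^2}
$$

For a margin of error e of 0.1, 0.01, 0.001 and 0.0001 this gives roughly 96, 9,604, 960,400 and 96,040,000. Because e is squared in the denominator, tightening the margin by a factor of x multiplies the required sample by x².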

This is all at the 95 percent confidence level. For 99 percent confidence you need bigger sample sizes.
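
As a sketch of how the confidence level enters, assuming the standard normal-approximation formula n = z² p(1 - p) / e² with worst-case p = 0.5 (the function names are made up for this example):

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def infinite_population_n(margin, confidence, p=0.5):
    """n = z^2 * p * (1 - p) / margin^2, with assumed worst-case p = 0.5."""
    z = z_for_confidence(confidence)
    return math.ceil(z * z * p * (1.0 - p) / margin ** 2)

for confidence in (0.95, 0.99):
    print(confidence,
          round(z_for_confidence(confidence), 3),
          infinite_population_n(0.01, confidence))   # 1 percent margin of error
```

Going from 95 to 99 percent raises z from about 1.96 to about 2.58, so the required sample grows by a factor of roughly (2.58 / 1.96)², or about 1.7.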

Last edited: Apr 8, 2013

8. Apr 8, 2013

### Diffy

Thanks, that helped.

Do you happen to know the formulas behind the calculations?