Mar4-12, 10:12 AM
Maybe I should have been clearer. Even knowing little to nothing about statistics, the problem seemed fairly straightforward to me except for this "expected deviation".

There is a large population of size N where N>1,000, possibly N>10,000. Errors occur in individuals with some probability p, so each individual trial is Bernoulli. For large N we can then assume that p also represents the rate of occurrence of errors in the population. If I consider the set of all subsets of the population of size n<N, then the number of errors in such a subset is approximately binomially distributed (strictly it is hypergeometric, since the sample is drawn without replacement, but the binomial is a good approximation when n is much smaller than N). I set a confidence level and reject the entire population if the rate of errors exceeds some threshold. As SW points out above, this threshold has an effect on sample size, but in my case the threshold for rejection is probably on the order of 5-10%, so I don't have to worry about the extreme case. All of this seems straightforward and, even knowing little about statistics, I can make approximations to the necessary sample size that reasonably agree with results from tables or software.
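To make the approximation concrete, here is a minimal sketch of one way to get a sample size from the binomial model. It assumes the decision rule "accept the population if the sample contains at most `allowed_errors` errors" and looks for the smallest n at which a population sitting exactly at the rejection threshold would be accepted with probability no greater than 1 - confidence. The function names and the `n_max` search cap are my own, not taken from any table or software.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def min_sample_size(threshold, confidence, allowed_errors=0, n_max=5000):
    """Smallest n such that, if the true error rate equals `threshold`,
    the chance of seeing at most `allowed_errors` errors is <= 1 - confidence.
    Returns None if no such n is found up to n_max (a hypothetical cap)."""
    risk = 1 - confidence
    for n in range(allowed_errors + 1, n_max + 1):
        if binom_cdf(allowed_errors, n, threshold) <= risk:
            return n
    return None

print(min_sample_size(0.05, 0.95))  # 59
print(min_sample_size(0.10, 0.95))  # 29
```

With zero allowed errors this reduces to solving (1 - threshold)^n <= 1 - confidence, i.e. n >= ln(1 - confidence) / ln(1 - threshold), which is the sort of back-of-the-envelope estimate I had in mind.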

Now there is this "expected deviation", which is apparently part of the determination of sample size. It seems reasonable, although I haven't done any calculations, that if I repeatedly sampled this same population, I could improve my estimate of the true error rate in the population and thus reduce the sample size as I drew more samples. But in this case, it appears that an estimate of the true error rate (the expected deviation) is used in the calculation of the sample size n for the first and only sample to be drawn. This seems sloppy to me without a very sound basis for predicting the true rate, such as having drawn previous samples. Which tables are being used that include an expected rate, and what are the conditions for their use?
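For what it's worth, here is a sketch of how an expected rate could enter a single-sample calculation. This is only my guess at the mechanism, not a description of how any particular table is built: I assume the expected rate fixes the number of errors the sample is allowed to contain, k = ceil(expected_rate * n), and that n is then the smallest size at which a population at the rejection threshold would still pass with probability no greater than 1 - confidence. All names and the `n_max` cap are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def sample_size_with_expected_rate(threshold, expected_rate, confidence, n_max=5000):
    """Return (n, k): the smallest n, with allowed errors k = ceil(expected_rate * n),
    such that P(X <= k) <= 1 - confidence when the true rate equals `threshold`.
    The rule for k is an assumption. Returns None if nothing is found up to n_max."""
    risk = 1 - confidence
    for n in range(1, n_max + 1):
        k = math.ceil(expected_rate * n)  # errors tolerated in the sample
        if binom_cdf(k, n, threshold) <= risk:
            return n, k
    return None

# Same threshold and confidence, different assumed expected rates.
for rate in (0.0, 0.01, 0.02):
    print(rate, sample_size_with_expected_rate(0.05, rate, 0.95))
```

Under this reading, an expected rate of zero gives back the zero-error sample size, and the required n grows as the expected rate approaches the threshold. That is exactly why the choice of expected rate bothers me when there is no prior sample to base it on.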