Question about sampling size. 
#1
Mar2-12, 07:00 PM

P: 206

My experience with statistics is very limited, although I understand probability theory quite well. My friend, an accountant, asked me a question. He does attribute sampling. I understand most of it intuitively, and when I got curious and started reading, the math was what I expected it to look like.
In determining the necessary sample size, they set a confidence level based on the risk they are willing to take and a "tolerable deviation", which is the maximum acceptable percentage of the population with the attribute. The weird thing is that, in addition, they use an "expected deviation", which appears to be a guess at the result. They use this guess, combined with the other two factors, to determine the sample size. According to him, they just set the expected deviation according to how much work they want to do. If the expected deviation is low, then the sample size is smaller and they do less work.
So my question is: is there a theoretical basis for using this estimated deviation in determining the necessary sample size? If so, is there a reference that would explain this to me? I doubt that setting it on the basis of how motivated one feels today is valid.



#2
Mar3-12, 10:44 AM

Sci Advisor
P: 3,313

Since no authority on the subject has stepped forward, it would be interesting to discuss this.
Researching topics about auditing on the web isn't very exciting, but after a few minutes of it, I conclude that "tolerable deviation" is like an "acceptable defect level". A manufacturer may know that even when a production line is operating properly, it may produce a certain fraction of defective items. The goal of his sampling would be to test whether the current state of the production line produces no more than this fraction. "Attribute" sampling seems to amount to sampling a Bernoulli random variable that indicates whether an item is defective or not defective. Let [itex] f [/itex] be the fraction of defective items in a sample of size [itex] n [/itex]. Let [itex] p [/itex] be the fraction of defective items in the population. I speculate that the accounting math is focused on computing the probability that [itex] f [/itex] lies within a certain interval of [itex] p [/itex]. I don't know whether the scenario your friend uses assumes that [itex] p [/itex] is known.
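To make the speculation above concrete, here is a minimal Python sketch of the probability that the sample fraction f lands within a distance d of the population fraction p, treating the number of defects in the sample as Binomial(n, p). The function name `prob_within` and the values n, p = 0.05 and d = 0.02 are illustrative assumptions, not anything taken from the auditing tables.

```python
from math import comb

def prob_within(n, p, d):
    """P(|f - p| <= d), where f = X / n and X ~ Binomial(n, p)."""
    eps = 1e-12  # guards against floating-point noise at the interval edges
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= d + eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

# Illustrative values only: how the probability changes with sample size.
for n in (50, 100, 400):
    print(n, round(prob_within(n, 0.05, 0.02), 4))
```

Because f can only take the values k/n, this probability moves in jumps as n changes, which is one reason tabulated sample sizes do not follow a smooth formula.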


#3
Mar3-12, 02:28 PM

P: 206

Yes Stephen, this is exactly the problem. "Tolerable deviation" is just the acceptable percentage of defects in the population. If the percentage of defects exceeds the tolerable amount, then the population is rejected. My problem is how one calculates the necessary sample size. The number of defects in each sample is binomially distributed, so we have to assume some bound on the actual percentage of defects in the population in order to approximate the binomial distribution with a normal distribution. Yet they use tables that list the "expected deviation" from zero to some large percentage in increments of 0.25%, and use this number to set the sample size. To me, that seems entirely equivalent to presupposing the actual defect rate that you're seeking, which doesn't make any sense. I know little about statistics, but I do know that a lot of people with very little math knowledge use it regularly.



#4
Mar3-12, 06:21 PM

P: 2,504

You can find sample size calculators online. For the math, go to section 4 of the following reference. For rare events in large samples, the Poisson distribution is probably the best choice. The expectation would be based on the acceptable rate of defects, and you're interested in the probability of observing 0 defects given this rate. http://www.cqeweb.com/chapters/chapter6.pdf
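As a rough sketch of the zero-defect calculation described above: if the expected number of defects in a sample of size n is taken to be n times the acceptable rate (an assumption about how the Poisson mean is set), then the probability of seeing no defects is exp(-n * rate), and the exact binomial counterpart is (1 - rate)^n. The function names and the 5% figures below are illustrative, not from the linked reference.

```python
import math

def zero_defect_n_poisson(rate, alpha):
    """Smallest n with exp(-n * rate) <= alpha (Poisson approximation)."""
    return math.ceil(-math.log(alpha) / rate)

def zero_defect_n_binomial(rate, alpha):
    """Smallest n with (1 - rate)**n <= alpha (exact binomial)."""
    return math.ceil(math.log(alpha) / math.log(1.0 - rate))

# Illustrative: a 5% defect rate and a 5% chance of wrongly seeing zero defects.
print(zero_defect_n_poisson(0.05, 0.05))   # 60
print(zero_defect_n_binomial(0.05, 0.05))  # 59
```

If a sample of that size shows no defects, a population with a defect rate at or above the stated rate would have produced that outcome with probability at most alpha.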


#5
Mar3-12, 08:49 PM

Sci Advisor
P: 3,313

What I find interesting about the current topic is that the online material I saw does not clearly classify what they are doing as "hypothesis testing" or as "estimation". In the usual sort of statistics, one establishes a sample size in order to obtain a certain "confidence interval" for estimating a parameter of a population, such as the fraction of defects. In "hypothesis testing" a decision is being made, and the size of the sample affects the thresholds set for a given "type I error", which is the probability of "incorrectly rejecting the null hypothesis when it is true".
The actual computations in these two situations are often the same arithmetic, but it is easier to understand the use of the "tolerable deviation" if we consider their process to be hypothesis testing. In that case, the "null hypothesis" would be that the fraction of defects in the population is the tolerable deviation. The usual specifics for a hypothesis test would be to establish a certain "p-value" such as 0.05. For a sample of a given size n, there is a certain threshold fraction of defects [itex] f_0 [/itex] such that the probability of observing a fraction of [itex] f_0 [/itex] or more defects in the sample is equal to the p-value (assuming the null hypothesis is true). In the usual scenario, we aren't solving for the sample size.
I don't understand exactly what the "givens" are in the problem that your friend works. Besides "tolerable deviation", what else is given? I don't know how to interpret "expected deviation". Perhaps one could pose a problem like "Given that the tolerable fraction of defects is 0.03, what statistical test could I use that would give me a 0.95 probability of correctly judging that a population with a fraction of 0.07 defective items is one that exceeds the fraction of tolerable defects?".
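The closing question can be explored numerically. The sketch below searches for the smallest sample size n and rejection threshold c such that a one-sided binomial test rejects a population at the tolerable rate 0.03 with probability at most alpha, while rejecting a population at 0.07 with probability at least 0.95. The value alpha = 0.05 is an assumption (the question as posed does not fix it), and the function names are hypothetical.

```python
def binom_pmf_list(n, p):
    """Binomial(n, p) probabilities for k = 0..n, built by recurrence."""
    pmf = [(1.0 - p) ** n]
    ratio = p / (1.0 - p)
    for k in range(n):
        pmf.append(pmf[-1] * (n - k) / (k + 1) * ratio)
    return pmf

def tail_prob(pmf, c):
    """P(X > c) for the distribution described by pmf."""
    return sum(pmf[c + 1:])

def design_test(p0, p1, alpha, power, max_n=5000):
    """Smallest n, with its threshold c, where rejecting when X > c has
    type I error <= alpha at p0 and rejection probability >= power at p1."""
    for n in range(1, max_n + 1):
        null_pmf = binom_pmf_list(n, p0)
        # Smallest c that keeps the false-rejection rate within alpha.
        c = next(c for c in range(n + 1) if tail_prob(null_pmf, c) <= alpha)
        if tail_prob(binom_pmf_list(n, p1), c) >= power:
            return n, c
    return None

n, c = design_test(p0=0.03, p1=0.07, alpha=0.05, power=0.95)
print("sample size:", n, "reject if defects exceed:", c)
```

In this framing the second rate (0.07 here) is an explicit input alongside the tolerable rate, so the sample size depends on two rates plus two error probabilities.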


#6
Mar4-12, 03:24 AM

P: 2,504

Please explain. You need to establish the sample size for rejecting the null hypothesis for some value of alpha. If you set a one-sided alpha at 0.05, the sample size N will be substantially smaller than for a two-sided alpha of 0.001. If you're concerned about defects, you want a small value for alpha. Finding no defects in a large sample is obviously more convincing than finding no defects in a small sample. However, as I said, to establish zero defects as a certainty, you must have a full census of the population.


#7
Mar4-12, 10:12 AM

P: 206

Maybe I should have been more clear. Even knowing little to nothing about statistics, the problem seemed fairly straightforward to me except for this "expected deviation".
There is a large population of size N where N>1,000, possibly N>10,000. Errors occur in individuals with some probability p, so each individual trial is Bernoulli. Then we can assume that p, for large N, also represents the rate of occurrence of errors in the population. If I consider the set of all subsets of the population of size n<N, then the number of errors in these subsets is binomially distributed. I set a confidence level and reject the entire population if the rate of errors exceeds some threshold. As SW points out above, this threshold has an effect on sample size, but in my case the threshold for rejection is probably on the order of 5-10%, so I don't have to worry about the extreme case. All of this seems straightforward and, even knowing little about statistics, I can make approximations to the necessary sample size that reasonably agree with results from tables or software.
Now there is this "expected deviation", which is apparently part of the determination of sample size. It seems reasonable, although I haven't done any calculations, that if I repeatedly sampled this same population, then I could improve my estimates of the true error rate in the population and thus reduce the sample size as I drew more samples. But in this case, it appears that an estimate of the true error rate (the expected deviation) is used in the calculation of the sample size n for the first and only sample to be drawn. This seems sloppy to me without a very sound basis for predicting the true rate, such as having drawn previous samples. What are these tables that include the expected rate? What are the conditions for their use?
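One possible reading of such tables, offered as an assumption rather than as the documented method behind them: the expected deviation rate fixes how many deviations the sample is allowed to contain, and n is the smallest size at which a population sitting exactly at the tolerable rate would show that few deviations with probability no greater than 1 minus the confidence level. A minimal Python sketch of that reading, with hypothetical function names:

```python
import math

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p), via the pmf recurrence."""
    term = (1.0 - p) ** n
    total = term
    for k in range(c):
        term *= (n - k) / (k + 1) * p / (1.0 - p)
        total += term
    return total

def attribute_sample_size(confidence, tolerable, expected, max_n=5000):
    """Smallest n, and allowed deviation count, under the reading above."""
    risk = 1.0 - confidence
    for n in range(1, max_n + 1):
        # Deviations allowed if the sample behaves as expected
        # (rounding first guards against floating-point noise in ceil).
        allowed = math.ceil(round(expected * n, 9))
        if binom_cdf(allowed, n, tolerable) <= risk:
            return n, allowed
    return None

# Illustrative inputs: 95% confidence, 5% tolerable deviation.
for expected in (0.0, 0.01, 0.02):
    print(expected, attribute_sample_size(0.95, 0.05, expected))
```

Under this reading, a lower expected rate does give a smaller sample, but it also lowers the number of deviations the sample may contain before the population is rejected, so an optimistic guess buys less work at the price of a stricter pass condition.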


#8
Mar4-12, 10:13 AM

Sci Advisor
P: 3,313

All I'm saying is that it is common in hypothesis testing to be given the p-value and the sample size and be asked to determine the acceptance region. In estimation, it is common to be given the confidence level and the size of the confidence interval and be asked to find the sample size. I agree that you can pose a variety of problems in either scenario that involve being given two out of three of the variables and being asked to find the third.


#9
Mar4-12, 07:25 PM

P: 4,575

To add to this discussion, I want to remind the OP and the readers that in certain cases, optimization can be used to find the minimum sample size for a system given its constraints.
This is used, for example, in surveys, especially when you have stratification, and I imagine that it's used quite frequently elsewhere when the connections between different parameters are known. Also, what you are mentioning sounds a lot like the kind of thing they do in Six Sigma work and Total Quality Management. I'll be doing some TQM work next semester, so I can't really offer anything useful in that regard at the moment, but if you are interested, OP, these subjects might give you better insight into the lifecycle of detecting and managing defects of some sort.


#10
Mar4-12, 10:58 PM

P: 34

The variance of the attribute is a function of p*(1-p), so taking a purposely large value of p is fine, since taking p=.5 to obtain the largest variance is too conservative (labor intensive).
Better idea: construct a loss function that balances the cost of labor against the potential loss of sales or profits upon misestimation. Another approach: a two-stage sample size estimator (Stein). The first-stage sample estimates p, and this value is incorporated into the second-stage sample size estimate.
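To illustrate the p*(1-p) point, here is a small sketch using the standard normal-approximation sample size for estimating a proportion, n = z^2 p(1-p) / E^2. The margin of error E = 0.03, the 95% confidence level, and the function name are illustrative assumptions, not values from this thread.

```python
import math
from statistics import NormalDist

def estimation_sample_size(planning_p, margin, confidence=0.95):
    """Normal-approximation sample size for estimating a proportion
    to within +/- margin, using planning_p as the assumed rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * planning_p * (1.0 - planning_p) / margin ** 2)

# The planning value of p drives the sample size through p * (1 - p).
for planning_p in (0.05, 0.10, 0.25, 0.50):
    print(planning_p, estimation_sample_size(planning_p, margin=0.03))
```

Since p*(1-p) peaks at p = 0.5, any planning value at or above the true rate (and no greater than 0.5) errs on the safe side while requiring fewer items than the worst case.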


#11
Mar5-12, 01:29 AM

Sci Advisor
P: 3,313



