# A stopping rule for a quality control problem

1. Sep 2, 2016

### estebanox

Problem:

Suppose I have a production process that yields output in batches of n items. For each batch, I can test individual items and classify each as good or bad. Let q_i ∈ {1,0} be the quality of tested item i.

If more than half of the items are ‘bad’, the batch should be discarded. In other words: the batch should be discarded if the average quality of items is q(n)<0.5.

Suppose testing is sequential (i.e. you learn the quality of items one at a time), and each test is a random draw from a Bernoulli distribution with unknown mean p.

I would like to know when to stop testing items in a batch, in order to decide whether to discard it without testing all items, and yet be statistically confident that the outcome (discard vs not discard) is the same as the one that would have been reached if all items had been tested.

What I currently have:

If I knew p, I could use a normal approximation to estimate the confidence interval of the binomial proportion q(k) for any k. With this, for any arbitrary width of the interval, I could calculate the number of tests that are required to achieve a given confidence level (let me call such number of tests k*).

Since I don’t know p, one option would be to solve for k* assuming the worst case scenario (i.e. p=0.5). This was proposed (and discussed) in a related question here. Another similar alternative was proposed here.
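For reference, that worst-case calculation can be sketched in a few lines (a sketch under the normal approximation with p = 0.5; the function name and interface are my own):

```python
import math
from statistics import NormalDist

def worst_case_tests(half_width, confidence):
    """Number of tests k* so that a normal-approximation confidence
    interval for the proportion has the given half-width, assuming
    the worst case p = 0.5 (which maximizes p*(1-p))."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    # Solve z * sqrt(p*(1-p)/k) <= half_width for k, with p*(1-p) <= 1/4
    return math.ceil(z**2 * 0.25 / half_width**2)
```

For a 95% interval of half-width 0.05 this gives k* = 385, the familiar sample-size rule of thumb for a ±5% margin.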

This approach, however, does not take into account that n is finite, so once many items have been tested, it becomes very unlikely that one more test will swing the outcome...

2. Sep 2, 2016

### micromass

Staff Emeritus
This is interesting, because you'll be doing two estimations at the same time. Indeed, while testing a batch you are getting information on $p$ and on $q(n)$ at the same time. So if you test $100$ items and they all turn out to be positive, then it's no longer a good idea to assume the worst case scenario $p=0.5$. Indeed, you will want to update your confidence on $p$.

This can be done adequately with Bayesian statistics. In Bayesian statistics, you have your parameter $p$ which is essentially unknown to you. What's different is that now you can give a probability distribution to $p$, which corresponds to your beliefs on what $p$ is.

Of course, in the beginning you have no idea what $p$ is. So you could use what is called a "vague prior" on $p$, or you could be more pessimistic and assume the worst-case value of $p$; the latter option is more conservative.

Anyway, what happens then is you test a number of items. The tests will turn out to be positive or negative. The results of these tests allow you to update your beliefs on $p$. Using this updated belief, you can then generate a predictive distribution of $q(n)$ and see whether it is good enough.

So let me put this in an example. We have $1000$ items. We test $100$ items.
In the beginning, I have no information on my probability $p$. So I give it a vague prior. I have a lot of choice here, but I choose a uniform distribution on $[0,1]$. This means that I allow my probability $p$ to be anything with equal chance. Also important for the computations: the uniform distribution on $[0,1]$ is a beta distribution, namely beta(1,1).

Now assume that I know my probability $p$ (I don't, but let's assume). We test $100$ items and $20$ are inadequate. What's the probability of this? Well, it's not too farfetched to model this with a binomial distribution. The probability of $20$ inadequate samples is $\binom{100}{20}p^{20}(1-p)^{80}$.

Now I can combine my uniform prior and my binomial distribution using Bayes' theorem. This will give a posterior distribution on $p$. Basically, my posterior distribution will be proportional to $p^{20}(1-p)^{80}$. This is a beta distribution again (this is no coincidence) with parameters beta(21,81). You should graph it to see what happens.
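Numerically, the likelihood and the conjugate update in this example take only a couple of lines (a minimal sketch; the function names are mine):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k inadequate items in n tests, given rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def beta_update(a, b, bad, good):
    """Conjugate update: beta(a, b) prior + binomial data -> beta posterior."""
    return a + bad, b + good

a, b = beta_update(1, 1, 20, 80)  # uniform prior + 20 bad / 80 good -> beta(21, 81)
posterior_mean = a / (a + b)      # about 0.206, far below the worst case 0.5
```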

Now we can compute the likelihood of what happens with the other $900$ samples. We already have $20$ inadequate samples, so the batch must be discarded when at least $481$ of the remaining $900$ are also inadequate (so that more than half of the $1000$ items are bad). If we knew $p$, then the probability of this happening would be $\sum_{k=481}^{900} \binom{900}{k} p^k (1-p)^{900-k}$. We don't know $p$, but we know a distribution of $p$. So we can use integration to find a probability, namely
$$\int_0^1 \sum_{k=481}^{900} \binom{900}{k} p^k (1-p)^{900-k} \frac{1}{B(21,81)}p^{20}(1-p)^{80} dp = \sum_{k=481}^{900} \binom{900}{k}\frac{1}{B(21,81)} \int_0^1 p^{20+k}(1-p)^{980-k}dp$$
The integral above is B(21+k, 981-k). So I end up with
$$\sum_{k=481}^{900} \binom{900}{k} \frac{B(21+k,981-k)}{B(21,81)}$$
I could use the CLT to compute this probability, or use software. If the probability is $<0.05$ (or another value of your choosing), I can say my batch is adequate; if it is larger than $0.95$, the batch is inadequate. Otherwise I need to go on.
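This sum can be evaluated directly in log space via `math.lgamma` (a sketch; I take the discard threshold strictly, i.e. at least 481 further inadequate items so that more than half of the 1000 are bad, and the function names are mine):

```python
from math import lgamma, log, exp, comb

def betaln(a, b):
    """log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def predictive_tail(a, b, m, lo):
    """P(at least `lo` failures among m future items) when the
    failure rate has a beta(a, b) posterior (beta-binomial tail)."""
    total = 0.0
    for k in range(lo, m + 1):
        log_term = log(comb(m, k)) + betaln(a + k, b + m - k) - betaln(a, b)
        total += exp(log_term)
    return total

# After 20 bad out of 100 with a beta(1,1) prior: posterior beta(21, 81).
# Probability that the batch still ends up with more than half bad:
p_discard = predictive_tail(21, 81, 900, 481)  # vanishingly small here
```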

How to go on? Well, I test $100$ more items. Let's say $50$ are inadequate. My prior this time is beta(21,81), and I combine it with the result of this test, which has likelihood $\binom{100}{50} p^{50}(1-p)^{50}$. I get a posterior distribution on $p$ of beta(21+50, 81+50) = beta(71, 131). I can then again predict what happens with the remaining $800$ samples.

3. Sep 2, 2016

### Staff: Mentor

You use the observed rate to construct the confidence interval.

If p=0.5, no test can guarantee to tell you whether p<0.5. You cannot even decide clearly whether the fraction of good items is below 0.5 without testing the full batch, and any test that typically stops after a small subsample has to give a nearly random result.

You could keep testing until the confidence interval (90%, 95%, or whatever) lies fully below 0.5, for example. Apply this confidence interval to the untested items only, and add the tested items with their known quality to it. If you can make keep/discard choices for each item individually, doing so would clearly improve quality here.
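A rough sketch of this stopping rule (my own interface; normal-approximation interval on the estimated rate, decision threshold at n/2):

```python
import math
from statistics import NormalDist

def decide(n, tested, bad, confidence=0.95):
    """Stop when the projected number of bad items (observed bad, plus a
    confidence interval applied to the untested remainder) lies entirely
    on one side of n/2; otherwise keep testing."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    p_hat = bad / tested
    half = z * math.sqrt(p_hat * (1 - p_hat) / tested)  # normal approx.
    remaining = n - tested
    lo = bad + remaining * max(p_hat - half, 0.0)  # fewest plausible bad
    hi = bad + remaining * min(p_hat + half, 1.0)  # most plausible bad
    if hi < n / 2:
        return "keep"
    if lo > n / 2:
        return "discard"
    return "continue"
```

For example, 20 bad out of 200 tested from a batch of 1000 already settles the decision, while 50 bad out of 100 does not.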

I don't know the application, but if money is involved a Bayesian approach is probably better. Your process won't give all possible p values with a uniform distribution; this is information (gained over time) that can be used in the process.

I'm also wondering how fixed that limit of 0.5 is. What would you prefer?
- a batch where it is known that a fraction of 0.51 is good.
- a batch where there is a 1% probability that a fraction of 0.49 is good, and 99% probability that all are good.

4. Sep 2, 2016

### Staff: Mentor

Here is a good starting point for understanding Bayesian decision making

http://www.stat.ucla.edu/~yuille/courses/Stat161-261-Spring13/LectureNote2.pdf

5. Sep 3, 2016

### Heinera

To OP: To find the optimal stopping rule for this problem, you would also have to specify the cost of each test and the costs of making a wrong decision (i.e., discarding an OK batch, or letting a faulty batch slip through).

6. Sep 6, 2016

### estebanox

@micromass, thanks a lot for the suggestion of using a Bayesian approach, and for spelling it out. I'll try to implement it computationally.

7. Sep 6, 2016

### estebanox

Do you mean "if testing is very costly"? In my application it is costly (not in cash, but that's beside the point). In other words, I am willing to trade statistical confidence for a smaller number of tests.

Regarding your last question, the acceptance criterion in my application is not necessarily 0.5, but it is sharp: I do prefer knowing that a batch is above the threshold, even if only marginally.

I suppose both of these things point to the Bayesian approach. I'll look into it (thanks @Dale for the reference).

8. Sep 6, 2016

### Staff: Mentor

No, I mean the difference between a pure research environment and running a business. For a scientific study, you might be interested in finding the fraction of batches where more than half of the samples are good. If you want to sell something, it is probably not the overall goal to determine that number as precisely as possible.
That will need many tests then, if the sample is close to the threshold.

9. Sep 11, 2016

### chiro

Hey estebanox.

The only thing I'd add to the comments above is to consider the correlation aspect involved, and note that a binomial model (or its Bayesian counterpart with a beta prior) is probably not going to be a good assumption to use on a production line.

Typically if some process is making a bad batch then it impacts things in a correlated way - not an independent one.

If you are testing for a bad batch you should evaluate the physical process and see how that is likely to impact on the result of a positive or negative test. You will probably find that with more dependencies, this correlation component will be significant. Usually you assume independence if everything physically is disconnected from everything else but follows the same sort of layout or procedure.

As an example: two assembly lines with exactly the same design and components should be independent of each other, but within a single line you would expect a lot of correlation between whether items are faulty or not.
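This point can be illustrated with a toy simulation (my own construction, using an illustrative beta(3, 7) batch-level defect rate): when the defect rate is shared within a batch rather than fixed per item, the batch-to-batch spread of defect counts far exceeds what an independent binomial model predicts.

```python
import random

random.seed(1)
n_items, n_batches = 100, 2000

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Independent model: every item is bad with the same fixed rate 0.3.
indep = [sum(random.random() < 0.3 for _ in range(n_items))
         for _ in range(n_batches)]

# Correlated model: each batch shares a common rate drawn from beta(3, 7)
# (same mean 0.3, but shared within the batch -> overdispersion).
corr = []
for _ in range(n_batches):
    p = random.betavariate(3, 7)
    corr.append(sum(random.random() < p for _ in range(n_items)))

# The binomial predicts variance n*p*(1-p) = 21 per batch; the correlated
# model's per-batch defect count varies several times more than that.
```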