Stats Q Help: Find Probability of Contaminated Cherry Pies

ChrisJ · Oct 20, 2018

Hi, my main sticking point with the following is which bit of statistics/probability theory is needed to answer the question. We've only been taught basic Bayes theory, the standard pdfs and a little on hypothesis testing. I have spent way too much time just trying to figure out where to start, any pointers appreciated.

"A bakery has suspicions that their recent production run of cherry pies has resulted in half of all the pies becoming contaminated. The bakery is trying to work out how this issue will change the number of customers complaining. How many cherry pies out of their recent production run does the bakery need to test, to determine the probability that any given cherry pie is contaminated to better than 5%".

Stephen Tashi · Oct 20, 2018

Solving the problem is a mind reading exercise - what does the author of the problem expect us to assume? To get a hint about that, we need to know where you encountered this problem. What course? What are some examples of other problems from the same problem set?

ChrisJ said:

We've only been taught basic Bayes theory, the standard pdfs and a little on hypothesis testing.

Have you studied confidence intervals?

the probability that any given cherry pie is contaminated to better than 5%".

That language might be a mangled attempt to ask a question about "confidence".

As it stands, the problem seems to ask how many pies must be tested to be certain that the probability of a randomly selected pie being contaminated is estimated to within ##\pm .05 ##.

Having 95% "confidence" about the estimate of a probability is different from being certain it is estimated to within ##\pm .05##. There is also the question of whether "5%" means we are to estimate the unknown probability ##p## within ##\pm .05 ## or whether we must estimate it within ##\pm .05 p## of its true value.

An amusing approach is to assume we can give the answer in terms of the number of pies in the particular production run. If there are ##N## pies in the production run and we make the Bayesian assumption that ##N/2## are contaminated, then how many pies ##M## must be tested to be certain that the number of contaminated pies ##C## in the sample satisfies ##| C/M - 1/2| \le .05 ## ?

The worst case for sampling is that we'd be unlucky enough to test all the contaminated pies or all the uncontaminated pies before testing pies in the opposite condition. Let ##M = N/2 + K##. As a function of ##N##, what value must ##K## have to satisfy ##| K/((N/2) + K) - 1/2| \le .05## ?
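For what it's worth, here is a minimal brute-force sketch of that worst-case count, taking the tolerance to be .05 as in the previous paragraph and assuming ##N## is even with exactly ##N/2## contaminated pies. The function name `worst_case_tests` is made up for illustration. It uses exact fractions so the boundary case is not spoiled by rounding.

```python
from fractions import Fraction

def worst_case_tests(n_pies, tol=Fraction(1, 20)):
    """Smallest M such that EVERY possible sample of M pies has |C/M - 1/2| <= tol.

    Assumption (hypothetical setup): n_pies is even and exactly n_pies/2
    pies are contaminated.
    """
    half = n_pies // 2
    for m in range(1, n_pies + 1):
        # Fewest and most contaminated pies a sample of size m can contain.
        c_min = max(0, m - half)
        c_max = min(m, half)
        extremes_ok = all(
            abs(Fraction(c, m) - Fraction(1, 2)) <= tol for c in (c_min, c_max)
        )
        if extremes_ok:
            return m
    return None

print(worst_case_tests(100))  # 91
print(worst_case_tests(200))  # 182
```

This agrees with solving the inequality by hand: ##K/((N/2)+K) \ge .45## gives ##K \ge 9N/22##, so ##M \ge 10N/11##, i.e. about 91% of the run would have to be tested under this "certainty" reading.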

If the problem intends to make a point about the distinction between "confidence" about the estimate of a parameter versus "certainty" about the range of an estimate, that amusing interpretation might be what the author intends. However, given the average course material, I think such an interpretation is unlikely.

ChrisJ · Oct 20, 2018

Thanks for the reply. The course is essentially a "Stats and Data Analysis for Scientists" course, which is more about what we can read/interpret from data, with an emphasis on using programming languages to help solve the problems, make plots etc. We're only three weeks in and have not done anything on confidence intervals (though we have touched on the P-value and NHST). Other questions include one where we essentially just had to use the standard Bayes rule ## P(A | B) = \frac{P(B | A) P(A)}{P(B | A) P(A) + P(B | A^c) P(A^c)} ##, where we had to make an assumption for the "prior" ##P(A)##, and another question where we were given some data and had to computationally find the Pearson r-value and p-value.
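To make the Bayes-rule type of question concrete, here is a minimal sketch of that calculation, with the law of total probability in the denominator. The function name `posterior` and the numbers in the example call are purely illustrative assumptions, not values from the course problem.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A) and P(B | A^c) via Bayes' rule."""
    numerator = p_b_given_a * prior
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|A^c) P(A^c).
    evidence = numerator + p_b_given_not_a * (1.0 - prior)
    return numerator / evidence

# Illustrative numbers only: prior 0.5, P(B|A) = 0.9, P(B|A^c) = 0.1.
print(posterior(0.5, 0.9, 0.1))
```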

Stephen Tashi · Oct 20, 2018

The major difficulty in interpreting the problem is the following language:

determine the probability that any given cherry pie is contaminated to better than 5%"

"Determining" the probability could mean estimating its value. Such an estimate has some error ##e_r## which is a random variable since it depends on the results of a random sample. For each given error bound ##\delta## there is a probability ##p_\delta## that ##|e_r| \le \delta##. From that viewpoint it isn't clear what "5%" refers to. Does it refer to ##\delta## or something about ##p_\delta##? If we assume the required value of ##p_\delta## is, say, 0.95 and the required value of ##\delta## is 0.05 then we have a well defined problem.
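Under that reading (##p_\delta = 0.95##, ##\delta = 0.05##), one common textbook route is the normal approximation to the sample proportion, which gives ##n \ge z^2 p(1-p)/\delta^2##. The sketch below assumes that approximation, assumes pies are sampled independently (ignoring the finite size of the production run), and plugs in ##p = 1/2## from the bakery's suspicion. The function name `sample_size` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size(delta=0.05, coverage=0.95, p=0.5):
    """Approximate n so that |estimate - p| <= delta with probability `coverage`.

    Assumptions: normal approximation to the binomial proportion and
    independent sampling; p is the assumed true contamination probability.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return ceil(z * z * p * (1 - p) / delta ** 2)

print(sample_size())             # 385
print(sample_size(delta=0.025))  # 1537
```

The second call corresponds to the relative reading ##\pm .05 p## with ##p = 1/2##, which shrinks the tolerance to .025 and roughly quadruples the sample size.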

ChrisJ said:

(though we have touched on the P-value, and NHST).

If we interpret "better than 5%" as a requirement to design a test of the null hypothesis that p(randomly selected pie is contaminated) = 1/2 with a significance level of ##\alpha = 0.05## then we have a well defined problem. However, that interprets "determine the probability" as merely testing the hypothesis that it is 1/2 rather than estimating whether it has a different value. Have you done problems where you must determine the required sample size to use in a hypothesis test? If it's this kind of problem, it won't involve Bayes' theorem. You will assume the probability that a randomly selected pie is contaminated is (definitely) 1/2 and do all computations based on that assumption.

StoneTemplePython · Oct 21, 2018

My guess is yes, the OP is trying to determine a confidence interval, or perhaps, in the Bayes realm, a credible interval.

But as ST says... it's quite an expansive question and feels a lot like mind reading at times...

ChrisJ · Oct 21, 2018

Stephen Tashi said:

The major difficulty in interpreting the problem is the following language: "Determining" the probability could mean estimating its value.

My bad, the question says "know", not "determine"; the rest of the text is as is.

Stephen Tashi said:

If we interpret "better than 5%" as a requirement to design a test of the null hypothesis that p(randomly selected pie is contaminated) = 1/2 with a significance level of ##\alpha = 0.05## then we have a well defined problem. However, that interprets "determine the probability" as merely testing the hypothesis that it is 1/2 rather than estimating whether it has a different value. Have you done problems where you must determine the required sample size to use in a hypothesis test? If it's this kind of problem, it won't involve Bayes' theorem. You will assume the probability that a randomly selected pie is contaminated is (definitely) 1/2 and do all computations based on that assumption.

This sounds more likely to be it; we have done NHST, P-values etc. and nothing on confidence intervals. We haven't done anything where we had to determine the required sample size, but I am wondering whether we just have to make an assumption.

Based on testing the null hypothesis that 1/2 the cherry pies are contaminated, and just running some brute-force methods, so far I found that if there were 100 pies, they'd need to test 60 to get ##\alpha \leq 0.05##, and if there were 200 pies, they'd need to test 114 (~57%). Does that sound even remotely reasonable? Sounds way too high to me.

Stephen Tashi · Oct 21, 2018

ChrisJ said:

just running some brute force methods so far I found that if there were 100 pies, they'd need to test 60 to get ##\alpha \leq 0.05##,

Are you using a "one-tailed" test or a "two-tailed" test? The language

how this issue will change the number of customers complaining.

suggests the company is interested in whether the probability of contamination is different from 1/2, either by being less or greater.

What is the statistic for your test and what is the "rejection region" for your test?
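As one possible answer to compare against, here is a minimal sketch of an exact two-tailed test. It assumes the statistic is ##C##, the number of contaminated pies among ##n## tested, that ##C## is binomial with ##p = 1/2## under the null hypothesis (independent sampling, ignoring the finite production run), and that the rejection region is symmetric: ##C \le k## or ##C \ge n - k##. The function name `rejection_region` is made up for illustration, and this is not necessarily the brute-force method used above.

```python
from math import comb

def rejection_region(n, alpha=0.05):
    """Largest symmetric region {C <= k} or {C >= n - k} with size <= alpha.

    Assumption: under H0, C ~ Binomial(n, 1/2). Returns (k, size), or None
    if even the most extreme outcomes are too likely to reach level alpha.
    """
    best = None
    lower_tail = 0.0
    for k in range(n // 2):  # k < n - k, so the two tails never overlap
        lower_tail += comb(n, k) / 2 ** n
        size = 2 * lower_tail  # both tails are equal when p = 1/2
        if size > alpha:
            break
        best = (k, size)
    return best

print(rejection_region(5))   # None: 5 pies cannot reach alpha = 0.05 two-tailed
print(rejection_region(20))  # k = 5, i.e. reject if C <= 5 or C >= 15
```

With only 5 pies the two most extreme outcomes already have combined probability 2/32 = 0.0625, so no two-tailed test at the 5% level exists; 6 pies is the smallest ##n## for which one does.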
