Bayesian Probability Distributions

In summary, the conversation discusses Bayesian probability problems about testing Olympic athletes for a performance-enhancing drug (PED). The first question asks for the probability that an athlete who tests positive actually uses the drug, which is calculated using Bayes' rule. The second question introduces a second drug test and asks for the probability that an athlete who tests positive on both tests actually uses the drug. The third question starts from a study that found only 3% of Olympic athletes use the PED and asks for the fully specified distribution to use in two years, when athletes are tested again.
  • #1
Rifscape
Hi,

I was having some trouble with some Bayesian probability problems and was wondering if I could get any help. I think I was able to get the first two but am confused by the last. If someone could please check my work to make sure I am correct and help me with the last question, that would be great.

Here is the problem setup:

Let’s say that we work for the International Olympic Committee (IOC) as part of their Fight Against Doping (https://www.olympic.org/fight-against-doping). We have a drug test for a banned performance-enhancing drug (PED) that is 99.3% accurate at identifying an athlete that has the PED in their system. However, it is only 73% accurate at identifying the absence of PED in the athlete’s system. From a scientific study we also have a strong reason to believe that only 3% of Olympic athletes use this particular PED.

Here is the first question.

1. An athlete tests positive for the PED. Given that the test had a positive result, what is the probability the tested individual uses the PED?

My answer:

P = Positive N = Negative

##p(P|PED) = 0.993##

##p(N|No PED) = 0.73##

##p(PED) = 0.03##

##p(P) = p(P|PED)*p(PED) + p(P|No PED)*p(No PED) = 0.993 * 0.03 + (1 - 0.73) * 0.97 = 0.292##

##p(PED|P) = 0.993 \times 0.03 / 0.292 = 0.102## - answer
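
The arithmetic for question 1 can be checked with a few lines of Python. This is only a sketch: the function name `posterior_given_positive` and its parameter names are made up for illustration, with "sensitivity" standing for ##p(P|PED)## and "specificity" for ##p(N|No PED)##.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(user | positive test) from Bayes' rule.

    prior       -- P(PED), assumed 0.03 from the problem statement
    sensitivity -- P(P | PED)
    specificity -- P(N | No PED), so 1 - specificity = P(P | No PED)
    """
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


p_q1 = posterior_given_positive(prior=0.03, sensitivity=0.993, specificity=0.73)
print(round(p_q1, 3))
```

The result agrees with the 0.102 obtained by hand above.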

Here is the second question.

2. As an employee of the IOC, we don’t want to needlessly ban an athlete from the chance to compete in the Olympics. As a result, we decide to institute a protocol that if an athlete tests positive for the use of the PED we will administer a second test. The second test is less accurate at identifying an athlete that has the PED in their system, at only 81%, but is more accurate at identifying the absence of PED in the athlete’s system, with a probability of 90%. If the athlete tests positive for the PED in both the first and second test, what is the probability that the accused individual uses the banned PED? (You may assume the outcome of the second drug test is conditionally independent of the outcome of the first drug test).

My answer:

##P_2## = Positive 2nd Test

##N_2## = Negative 2nd Test

##p(P_2|PED)## = 0.81

##p(N_2|No PED)## = 0.90

##p(PED|P_2) = p(P_2|PED)*p(PED)/p(P_2)##

##p(P_2) = p(P_2|PED)*p(PED) + p(P_2|No PED)*p(No PED) = 0.81*0.03 + (1-0.90)*(1-0.03) = 0.124##

##p(PED|P_2) = p(P_2|PED) * p(PED)/p(P_2) = 0.81*0.03/0.124 = 0.196##

Thus the probability of using PED given that you tested positive for both tests is:

##p(Positive for both) = p(P|PED)*p(P_2|PED) = 0.102*0.196 = 0.0199 \approx 0.02## - answer

Here is question 3 - this is the one I am not sure on.

3. Our information that only 3% of Olympic athletes use the PED came from a study of 300 athletes. This year we tested 500 athletes and confirmed that 11 of them used the banned substance. In both cases, only a sample of all athletes to compete in the Olympics were tested for the PED. As a result there is some uncertainty, so we decide we would like to express what we have learned as a probability distribution. In two years, when we begin testing athletes again, what is the fully specified distribution that we will use for the percentage of Olympic athletes that use the PED?

I don't understand what is meant by the probability distribution here. Would it be some form of Bayesian posterior distribution with 0.03 as the prior? I'm not sure what the likelihood or marginal would be.

I know this is the Bayes rule formula:

##p(\theta|x) = p(x|\theta)*p(\theta)/p(x)##

But how would I change this to suit the question?

Apologies for the long post, please let me know if you need any more information or clarification.

Thank you for reading
 
  • #2
Ray Vickson
I don't see how you did question 2, and I don't think you answered the question that was actually asked. I would say that
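$$
\begin{array}{rcl}\Pr(P_1 \,\&\, P_2) &=& \Pr(P_1 \,\&\, P_2 \mid \text{PED}) \Pr(\text{PED}) + \Pr(P_1 \,\&\, P_2 \mid \text{No PED}) \Pr(\text{No PED}) \\
&=& 0.993 \times 0.81 \times 0.03 + 0.27 \times 0.10 \times 0.97 \\
&\approx& 0.0503
\end{array}
$$
The question asks for ##\Pr(\text{PED} \mid P_1 \,\&\, P_2)##, which is the first term of this sum divided by the whole sum.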
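
As a rough numerical check of the quantity the question asks for, the sketch below computes it two ways: directly from the joint probabilities, and by applying Bayes' rule once per test (valid only under the conditional independence the problem allows). The helper name `update` is hypothetical.

```python
def update(prior, sensitivity, specificity):
    """One Bayes update for a positive test result."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


prior = 0.03

# Direct route: P(PED | P1 & P2) from the two joint probabilities.
joint_user = 0.993 * 0.81 * prior            # P(P1 & P2 | PED) * P(PED)
joint_clean = 0.27 * 0.10 * (1.0 - prior)    # P(P1 & P2 | No PED) * P(No PED)
p_direct = joint_user / (joint_user + joint_clean)

# Sequential route: the posterior after test 1 becomes the prior for test 2.
after_first = update(prior, 0.993, 0.73)
p_sequential = update(after_first, 0.81, 0.90)

print(round(p_direct, 4), round(p_sequential, 4))
```

Both routes give roughly 0.48, so even after two positive tests the athlete is slightly more likely than not to be clean.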

The way I would approach question 3 would be to start with some prior density ##f_0(p)## of the parameter ##p = \Pr(PEP)##; that is a prior, before any measurements have been made. Then, given ##p## the number of doped athletes ##X## in our first sample is binomial with parameters ##300## and ##p##. We observed ##3\% \times 300 = 9## doped athletes, so
$$\Pr(X=9|p) = {300 \choose 9} p^9 (1-p)^{291}.$$
The posterior density of ##p## (after observing 9 doped athletes in 300) is
$$f(p) = \Pr(p | X = 9) = \frac{ \Pr(X=9 |p) f_0(p)}{\int_0^1 \Pr(X=9|p) f_0(p) \, dp}$$
Note that here I have abused notation by saying things like ##\Pr(p|X=9)## instead of the more accurate notation
$$\lim_{\Delta p \to 0} \frac{\Pr([p,p+\Delta p] | X=9)}{\Delta p}$$

If we make the reasonable (?) assumption that ##f_0(p) = 1## on ##0 < p < 1## then we have ##f(p) = C p^9 (1-p)^{291}## for a normalizing constant ##C## that gives ##\int_0^1 f(p) \, dp = 1##.

You could use the above ##f(p)## as your new "prior" before the second round of measurements, to give you a posterior density.
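
With the flat prior assumed above, ##f(p) = C p^9 (1-p)^{291}## is the density of a Beta(10, 292) distribution, and each further binomial observation just adds counts to the two parameters. A minimal sketch of that bookkeeping follows; the function name `beta_update` is made up for illustration, and the flat prior is the assumption made in this post, not something the problem statement specifies.

```python
from fractions import Fraction


def beta_update(a, b, successes, trials):
    """Posterior Beta parameters after a binomial observation."""
    return a + successes, b + (trials - successes)


# Flat prior f_0(p) = 1 on (0, 1) is Beta(1, 1) -- an assumption.
a, b = 1, 1

# First study: 9 users among 300 athletes.
a, b = beta_update(a, b, successes=9, trials=300)
first = (a, b)

# This year's testing: 11 users among 500 athletes.
a, b = beta_update(a, b, successes=11, trials=500)
second = (a, b)

mean = Fraction(a, a + b)          # posterior mean a / (a + b)
mode = Fraction(a - 1, a + b - 2)  # posterior mode, valid for a, b > 1

print(first, second, float(mean), float(mode))
```

Under these assumptions the distribution carried forward is Beta(21, 781); a different starting prior would shift the parameters accordingly.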

The only remaining issue is how to use the posterior density obtained in that way. Should we use the whole posterior distribution, doing calculations for each fixed ##p##, then integrating over ##p##? Should we simplify life by using (for example) the expected value of ##p##, taken with respect to the posterior? Should we use the most probable value of ##p## (the mode of the posterior density)? Should we use the median of the posterior density?

BTW: the three summaries mentioned above, computed for the posterior after testing the first 300 athletes (which serves as the prior for the second round), are:
$$\begin{array}{rcl}
\text{expected value of } \; p & = &5/151 \doteq 0.03311\\
\text{most probable value of } \; p &=& 3/100 = 0.03 \\
\text{median value of } \; p &= &0.03209
\end{array}
$$
Using the median is not easy: getting it requires the solution of a 301-degree polynomial equation, which definitely needs a computer algebra system; I used Maple.
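
Without a computer algebra system, the same three summaries can be approximated numerically. The sketch below does not solve the polynomial equation; instead it swaps in Simpson's rule for the Beta(10, 292) cumulative distribution and bisection for the median. All function names are hypothetical, and the code assumes both Beta parameters exceed 1.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density; assumes a, b > 1 so the density is 0 at the ends."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def beta_cdf(x, a, b, steps=2000):
    """Composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_median(a, b, tol=1e-9):
    """Bisection on the CDF for the point where it crosses 0.5."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


a, b = 10, 292                    # posterior after 9 users in 300, flat prior
mean = a / (a + b)                # 5/151
mode = (a - 1) / (a + b - 2)      # 3/100
median = beta_median(a, b)
print(mean, mode, median)
```

The numerical median should land close to the Maple value quoted above, between the mode and the mean.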
 
  • #3
Ray Vickson said:
I don't see how you did question 2, and I don't think you answered the question that was actually asked. [...]

I see, that makes sense. Apologies for the delayed response; I was working on the problem. Your solution was very helpful, and I was able to solve it thanks to you! The actual solution is not as complicated as the one you gave, but yours definitely put me on the right track. The model for the posterior was a beta distribution, with the initial study's values as the prior and the values given in question 3 as the likelihood. All that is needed after that is to plug the values into the beta distribution.
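
For readers following along, the conjugate update described here can be sketched as below. The thread does not state the exact prior parameters, so encoding the 300-athlete study as Beta(9, 291) is an assumption; starting from a flat prior instead would give Beta(10, 292) and shift both final parameters up by one.

$$
\begin{array}{rcl}
p &\sim& \text{Beta}(a, b), \qquad X \mid p \;\sim\; \text{Binomial}(n, p) \\
p \mid X = k &\sim& \text{Beta}(a + k, \; b + n - k) \\
a = 9, \; b = 291, \; n = 500, \; k = 11 &\Rightarrow& p \mid X = 11 \;\sim\; \text{Beta}(20, 780)
\end{array}
$$

Under that assumed prior the posterior mean is ##20/800 = 0.025##.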

Thank you for your help and your clear solution - it helped a ton!
 

What is a Bayesian probability distribution?

A Bayesian probability distribution is a probability distribution over unknown quantities that reflects prior knowledge or beliefs. Bayes' theorem combines this prior information with new data to give an updated (posterior) distribution.

How is a Bayesian probability distribution different from a traditional probability distribution?

A traditional (frequentist) approach treats parameters as fixed unknowns and bases estimates solely on observed data, while the Bayesian approach treats parameters as uncertain quantities and takes prior information or beliefs into account. This allows existing knowledge to be built into the estimate.

What are some common applications of Bayesian probability distributions?

Bayesian probability distributions are commonly used in fields such as machine learning, data science, and risk analysis. They can be applied to a wide range of problems, such as predicting stock market trends, diagnosing medical conditions, and modeling weather patterns.

How is a prior belief incorporated into a Bayesian probability distribution?

A prior belief is typically represented as a prior probability distribution, which is then combined with observed data through Bayes' theorem to give the posterior distribution. As more data accumulate, the likelihood dominates and the prior has less influence on the result.

What are the advantages of using Bayesian probability distributions?

One major advantage of Bayesian probability distributions is their ability to incorporate prior knowledge or beliefs, which can lead to more accurate and personalized estimates. They also allow for the updating of probabilities as new data becomes available, making them adaptable to changing situations.
