Physics Forums Insights

Probabilistic Factors Involved in Disease and Virus Testing

February 6, 2022 / 2 Comments / in Bio/Chem Articles, Mathematics Articles / by PeroK

Table of Contents

  • Introduction
  • Terminology
  • Analysis Based on Prevalence
  • Applying this Analysis
  • Analysis Based on Test Results
  • Formulas for False Positives and Negatives
  • Conclusion
  • Post-Script: Bayes Theorem

Introduction

This Insight looks at the various probabilistic factors and related terminology involved in disease and virus testing.

As we all know, tests are rarely 100% reliable.  The frequency of false positives and false negatives, however, depends not only on the tests themselves but also on the prevalence of the disease or virus within the population.  To see this, imagine the two extremes where a) no one has the virus, and b) everyone has the virus.  In the first case, all positives must be false.  And, in the second, all negatives must be false.

This motivates a proper analysis of the probabilities involved, to see more precisely what can be concluded from a test result given all the available data.

Note that this Insight provides a simple probabilistic analysis.  In many practical cases, some or all of the data is unknown, which leads to the more advanced techniques of hypothesis testing.

We assume throughout that we have a single test for a virus.

Terminology

The relevant terminology cannot be avoided:

Prevalence (##D##): the proportion of the population (or the subgroup being tested) who have the virus. There are two possible scenarios here.  First, random testing of the population or group, where the prevalence is some generic likelihood that someone in that group has the virus (and doesn’t suspect it).  Second, testing within a group who have come forward because of some suspicion that they may have the virus.

In general, the prevalence will be higher in the second case, so it’s important to distinguish between these two cases and use the best estimate in each case.

In this Insight, we will use ##D## to denote the prevalence within the relevant population.

Positive Predictive Value (PPV) (##x##): the probability of having the virus given a positive test.  Note that as explained in the introduction this is not a fixed value, but depends on the prevalence, which itself may depend on the particular group or individual being tested.

In this Insight, we will use ##x## to denote the PPV.

Negative Predictive Value (NPV) (##y##): the probability of not having the virus given a negative test.  As with PPV, this depends on the prevalence.

In this Insight, we will use ##y## to denote the NPV.

Sensitivity (##p##): the probability of a positive test given the subject has the virus.  This probability is fixed for a given test and doesn’t depend on the prevalence.

Specificity (##q##): the probability of a negative test given the subject does not have the virus.  This also is independent of the prevalence.

With that standard terminology out of the way, we can begin to analyze how these quantities are related.

Analysis Based on Prevalence

The group to be tested will have a (possibly unknown) proportion ##D## who have the virus, and a proportion ##1-D## who do not have the virus. In each case two test results are possible, based on the sensitivity and specificity, which results in four categories in the following proportions:

##Dp##: those who have the virus and tested positive (these are true positives)

##D(1-p)##: those who have the virus and tested negative (these are the false negatives)

##(1-D)q##: those who do not have the virus and tested negative (true negatives)

##(1-D)(1-q)##: those who do not have the virus and tested positive (false positives)

For simplicity, we introduce a further variable here, which is the proportion of positive tests ##T##:

$$T = Dp + (1-D)(1-q)$$

We can now express the PPV and NPV by reading off the data above (this is equivalent to using Bayes’ Theorem):

To calculate the PPV we take the proportion of positive tests (##T##) and the proportion who both tested positive and have the virus, which is ##Dp##.  The PPV (##x##) is the conditional probability of having the virus given a positive test, which is:

$$x = \frac{Dp}{T}$$

We may also read off the NPV, which is the conditional probability of not having the virus given a negative test:

$$y = \frac{(1-D)q}{1-T}$$

Note that $$1 - T = D(1-p) + (1-D)q$$
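As a minimal sketch, the formulas above can be collected into a short Python function. The name `predictive_values` and the argument names are our own illustrative choices, not taken from any library, and the values of ##p## and ##q## in the usage lines are assumed purely for illustration.

```python
def predictive_values(D, p, q):
    """Return (T, x, y): proportion of positive tests, PPV and NPV.

    D is the prevalence, p the sensitivity, q the specificity,
    all given as probabilities between 0 and 1.
    """
    true_pos = D * p
    false_neg = D * (1 - p)
    true_neg = (1 - D) * q
    false_pos = (1 - D) * (1 - q)

    T = true_pos + false_pos           # proportion of positive tests
    negatives = true_neg + false_neg   # equals 1 - T

    # PPV is undefined if nobody tests positive; NPV if nobody tests negative.
    x = true_pos / T if T > 0 else float("nan")
    y = true_neg / negatives if negatives > 0 else float("nan")
    return T, x, y


# The two extremes from the introduction, with assumed p = 0.9 and q = 0.95:
_, x_none, _ = predictive_values(0.0, 0.9, 0.95)  # no one has the virus
_, _, y_all = predictive_values(1.0, 0.9, 0.95)   # everyone has the virus
print(x_none, y_all)
```

Both values come out as zero: with no virus present every positive is false, and with everyone infected every negative is false.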

Applying this Analysis

To do something useful with the above analysis (perhaps in the context of a new test), we first need a group who we know have the virus and a group who we know do not have the virus.  By applying the test to each group we can estimate the sensitivity ##p## and specificity ##q## for that particular test.
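A sketch of that calibration step might look as follows. The function name and the trial counts are hypothetical, chosen only so that the result matches the values used in the example further on.

```python
def estimate_sensitivity_specificity(pos_in_infected, n_infected,
                                     neg_in_uninfected, n_uninfected):
    """Estimate (p, q) from test results on two groups of known status."""
    if n_infected <= 0 or n_uninfected <= 0:
        raise ValueError("both groups must contain at least one person")
    p = pos_in_infected / n_infected        # fraction of infected who test positive
    q = neg_in_uninfected / n_uninfected    # fraction of uninfected who test negative
    return p, q


# Hypothetical trial: 200 known infected (180 test positive),
# 400 known uninfected (380 test negative).
p, q = estimate_sensitivity_specificity(180, 200, 380, 400)
print(p, q)
```

These are estimates from finite samples, so they carry a sampling uncertainty that this sketch does not attempt to quantify.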

In addition, if we know (or can reasonably well estimate) the prevalence of the virus (##D##), then we can interpret the result of an individual test as a probability of that person having or not having the virus.  These are just the PPV and NPV as above.  For those who return a positive test, we have:

$$x = \frac{Dp}{T} = \frac{Dp}{Dp + (1-D)(1-q)}$$ is the probability they have the virus.  And, of course, ##1-x## is the probability they do not.

And, for those who return a negative test, we have:

$$y = \frac{(1-D)q}{1-T} = \frac{(1-D)q}{(1-D)q + D(1-p)}$$ is the probability they do not have the virus.  And, ##1-y## is the probability they do.

To take an example, suppose ##p = 0.9##, ##q = 0.95## and ##D = 0.1## is an estimated prevalence.  Then:

##x = \frac{Dp}{Dp + (1-D)(1-q)} = 0.667##

##y = \frac{(1-D)q}{(1-D)q + D(1-p)} = 0.988##

We can see that someone with a negative test almost certainly does not have the virus; whereas, someone who tested positive has only a probability of ##2/3## of actually having the virus.

We can now see the effect of changing the prevalence by taking ##D = 0.5##.  This might represent the scenario where a group of people with certain symptoms are being tested and are more likely to have the virus than those in a random sample of the population.  Then:

##x = 0.947##

##y = 0.905##

We see that in this case, the positive test has become more conclusive (nearly 95% likelihood), while the negative test result is now less conclusive (still about a 9.5% chance of having the virus). This illustrates the importance of prior suspicion of the virus, as the conclusion depends heavily on the estimated prevalence.
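To see the dependence on prevalence more broadly, the calculation can be repeated over a range of values of ##D##. The following is a minimal sketch; the function name `ppv_npv` is our own, and the prevalences ##0.01## and ##0.9## are extra values assumed for illustration.

```python
def ppv_npv(D, p, q):
    """PPV (x) and NPV (y) for prevalence D, sensitivity p, specificity q."""
    T = D * p + (1 - D) * (1 - q)                    # proportion of positive tests
    x = D * p / T                                    # P(virus | positive test)
    y = (1 - D) * q / ((1 - D) * q + D * (1 - p))    # P(no virus | negative test)
    return x, y


p, q = 0.9, 0.95
for D in (0.01, 0.1, 0.5, 0.9):
    x, y = ppv_npv(D, p, q)
    print(f"D = {D:4.2f}   PPV = {x:.3f}   NPV = {y:.3f}")
```

The rows for ##D = 0.1## and ##D = 0.5## reproduce the values quoted above; the other rows show the PPV falling sharply at low prevalence and the NPV falling at high prevalence.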

Analysis Based on Test Results

We may also analyze the relationship between these quantities based on the outcome of test results.  We can look at the proportions who tested positive (##T##) and negative (##1-T##), and subdivide these based on PPV (##x##) and NPV (##y##).  This again gives four categories:

##Tx##: Those who have a positive test and the virus (true positives)

##T(1-x)##: Those who have a positive test but do not have the virus (false positives)

##(1-T)y##: Those who have a negative test and do not have the virus (true negatives)

##(1-T)(1-y)##: Those who have a negative test but do have the virus (false negatives)

We can then express the prevalence, sensitivity, and specificity in terms of these:

$$D = Tx + (1-T)(1-y)$$

$$p = \frac{Tx}{D} = \frac{Tx}{Tx + (1-T)(1-y)}$$

$$q = \frac{(1-T)y}{1-D} = \frac{(1-T)y}{(1-T)y + T(1-x)}$$

These equations may, of course, be derived directly from the previous set by some algebra.  It’s nice, however, to see how easily they are extracted from a simple probabilistic analysis.
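One way to check the reciprocal formulas is a round trip: start from ##D##, ##p## and ##q##, compute ##T##, ##x## and ##y##, then recover the original values. The sketch below does this with function names of our own choosing. It uses the fact that ##1-D## is the sum of the true negatives ##(1-T)y## and the false positives ##T(1-x)##.

```python
def from_prevalence(D, p, q):
    """(D, p, q) -> (T, x, y), the prevalence-based analysis."""
    T = D * p + (1 - D) * (1 - q)
    x = D * p / T
    y = (1 - D) * q / (1 - T)
    return T, x, y


def from_test_results(T, x, y):
    """(T, x, y) -> (D, p, q), the test-result-based analysis."""
    D = T * x + (1 - T) * (1 - y)   # true positives + false negatives
    p = T * x / D                   # true positives / all who have the virus
    q = (1 - T) * y / (1 - D)       # true negatives / all who do not
    return D, p, q


T, x, y = from_prevalence(0.1, 0.9, 0.95)
D, p, q = from_test_results(T, x, y)
print(round(D, 6), round(p, 6), round(q, 6))
```

Up to floating-point rounding, the round trip returns the values it started from.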

In truth, I’m not sure how useful these reciprocal formulas may be, but there they are.

Formulas for False Positives and Negatives

By equating the proportions of true and false positives and negatives from each analysis above, we get four more formulas with no additional effort:

$$D(1-p) = (1-T)(1-y) \ \ \ [\text{false negatives}]$$

$$(1-D)(1-q) = T(1-x) \ \ \ [\text{false positives}]$$

$$Dp = Tx \ \ \ [\text{true positives}]$$

$$(1-D)q = (1-T)y \ \ \ [\text{true negatives}]$$
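These four identities can be checked numerically by computing each category both ways. The sketch below uses the values from the earlier example and, purely for illustration, scales the proportions to an assumed group of ##N = 1000## people tested.

```python
import math

D, p, q = 0.1, 0.9, 0.95   # values from the worked example
N = 1000                   # assumed number of people tested (illustrative)

# Categories from the prevalence-based analysis
by_prevalence = {
    "true positives": D * p,
    "false negatives": D * (1 - p),
    "true negatives": (1 - D) * q,
    "false positives": (1 - D) * (1 - q),
}

# The same categories from the test-result-based analysis
T = D * p + (1 - D) * (1 - q)
x = D * p / T
y = (1 - D) * q / (1 - T)
by_test_result = {
    "true positives": T * x,
    "false negatives": (1 - T) * (1 - y),
    "true negatives": (1 - T) * y,
    "false positives": T * (1 - x),
}

for name, value in by_prevalence.items():
    # Each category must agree between the two analyses
    assert math.isclose(value, by_test_result[name], rel_tol=1e-9)
    print(f"{name:16s} {N * value:6.1f} per {N} tested")

# The four categories cover the whole group
assert math.isclose(sum(by_prevalence.values()), 1.0)
```

For these values the counts come to 90 true positives, 10 false negatives, 855 true negatives and 45 false positives per 1000 tested.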

Conclusion

What we have derived here, with relative ease and no significant algebra or calculations, is a general set of formulas relating all the relevant quantities.  Given enough of the data (PPV, NPV, sensitivity, specificity, prevalence, or proportion of positive tests), the remaining quantities may be calculated simply and directly from these formulas.
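As one illustration of solving for a missing quantity, suppose the sensitivity, specificity and proportion of positive tests are known but the prevalence is not. Rearranging ##T = Dp + (1-D)(1-q)## gives ##D = \frac{T - (1-q)}{p + q - 1}##. This rearrangement is our own addition rather than one of the formulas derived above, and the function name in the sketch is hypothetical.

```python
def prevalence_from_positive_rate(T, p, q):
    """Solve T = D*p + (1 - D)*(1 - q) for the prevalence D."""
    denom = p + q - 1
    if denom == 0:
        # When p + q = 1, T equals 1 - q whatever the value of D
        raise ValueError("p + q = 1: the test carries no information about D")
    return (T - (1 - q)) / denom


D = prevalence_from_positive_rate(0.135, 0.9, 0.95)
print(round(D, 6))
```

With ##T = 0.135## this recovers the prevalence ##D = 0.1## of the earlier example. If ##T## is itself estimated from a finite sample, the result may fall outside the range 0 to 1 and should be treated with caution.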

Post-Script: Bayes Theorem

Bayes’ Theorem is implicitly the basis for reading off the conditional probabilities in the above analysis.  Bayes’ Theorem is:

$$P(B)P(A|B) = P(A)P(B|A) \ \ (1)$$

An easy proof is simply to note that both sides of equation ##(1)## equal ##P(A \cap B)##, which is the probability of having both ##A## and ##B##.

The more familiar form is, of course:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

To see how this relates to our terminology, note that in Bayes’ notation, the PPV (##x##) is:

$$x = P(\text{virus} \mid +\text{test}) = \frac{P(+\text{test} \mid \text{virus})P(\text{virus})}{P(+\text{test})}$$

Where ##P(+\text{test} \mid \text{virus}) = p##, the sensitivity; ##P(\text{virus}) = D##, the prevalence; and, ##P(+\text{test}) = T##, the proportion of positive tests.
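The NPV (##y##) follows in the same way, taking ##A## as “no virus” and ##B## as “negative test”:

$$y = P(\text{no virus} \mid -\text{test}) = \frac{P(-\text{test} \mid \text{no virus})P(\text{no virus})}{P(-\text{test})} = \frac{q(1-D)}{1-T}$$

Here ##P(-\text{test} \mid \text{no virus}) = q##, the specificity; ##P(\text{no virus}) = 1-D##; and ##P(-\text{test}) = 1-T##.  This agrees with the expression for ##y## read off from the four categories.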

It’s possible, therefore, to generate all the formulas above using the algebraic form of Bayes’ Theorem.  And, indeed, this is generally the way the subject is taught – even though there seems much less scope for going wrong using our “probability tree” approach.

PeroK

BSc in pure mathematics (1984).  Retired from a career in Information Technology in 2014.  I divide my time between studying physics when I’m home in London and mountaineering.

Favourite area of physics is Quantum Mechanics.

    2 replies
    1. Orodruin says:
      March 20, 2022 at 4:34 am

      Nice writeup!
      I think a couple of graphs would be useful though, like the PPV/NPV as a function of prevalence for a couple of values of p/q, to really drive the message home.

    2. Greg Bernhardt says:
      February 16, 2022 at 9:03 am

      Very interesting, who knew testing was so complex!
