# Testing a population for Coronavirus - Minimizing the number of tests

• Andrew Mason
TL;DR Summary
It has become evident that identifying persons in a population who have the virus before they have symptoms is critical to controlling an outbreak. There is a simple way to drastically reduce the number of tests required: test in groups of G where ##G = 1/\sqrt{p}## and p is the probability that a randomly chosen person will test positive.
Germany has been virus testing in groups of 10. In a high risk population where p = 1/100 this requires, on average, 196 tests to test 1000 people. In a moderate risk population where p = 1/1000, it requires about 110 tests to test 1000 people. But if they were to test in groups of 32 in such a population, one would need only 63 tests.

The tests are done as follows:

1) swabs containing biological material that would contain the virus if it is present in an individual are taken from each of G individuals
2) the biological material in these swabs, G in number, is extracted from the swabs and all mixed together to create a single uniform composite test sample
3) the composite test sample is tested for the virus
4) if it is positive, all G individuals are tested individually to determine who in the group is positive
5) if it is negative, all G individuals are eliminated as carriers
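The five steps above can be sketched in Python as follows. This is a minimal illustration, not something from the original post: the function name `pooled_test_round` is made up here, and it assumes a perfect test (no false positives or negatives, and no loss of sensitivity from mixing the swabs).

```python
import random


def pooled_test_round(infected, group_size):
    """Count the test analyses used by the two-stage group procedure.

    infected   -- list of booleans, one per person, in swab-collection order
    group_size -- G, the number of swabs mixed into each composite sample
    Returns (tests_used, indices_of_positive_individuals).
    """
    tests = 0
    found = []
    for start in range(0, len(infected), group_size):
        group = range(start, min(start + group_size, len(infected)))
        tests += 1  # step 3: one test on the composite sample
        if any(infected[i] for i in group):
            tests += len(group)  # step 4: retest every member individually
            found.extend(i for i in group if infected[i])
        # step 5: a negative composite clears the whole group, no extra tests
    return tests, found


# Illustrative run: 1000 people, each infected with probability 1/1000.
random.seed(0)
population = [random.random() < 0.001 for _ in range(1000)]
tests, found = pooled_test_round(population, group_size=32)
print("tests used:", tests, "positives found at:", found)
```

A single run like this fluctuates around the average derived below, since the number of positive groups is random.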

The number of tests is determined as follows:

Let T = number of tests; p = probability that an individual test will be positive; G=number of individuals in the group; N= number of individuals in the population you want to test; Pgroup = probability that a group of G will test positive

(1) T = N/G + (N/G) x Pgroup x G = N(1/G + Pgroup)

So, for example in a population of 1,000,000 people in groups of 10 the number of tests would be:

T = 1,000,000 (1/10 + Pgroup) = 100,000 + 1,000,000 x Pgroup

Now the probability that a group will test positive is 1 - the probability that all individuals in the group will be negative:

##P_{group} = 1- (1-p)^G##

So (1) becomes:

(2) ##T = N(1/G + 1-(1-p)^G)##
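Equation (2) is easy to evaluate directly. The short sketch below (the helper name `expected_tests` is an assumption for illustration) plugs in the group sizes and probabilities quoted at the start of the thread, so the headline figures can be checked without the small-p approximation.

```python
def expected_tests(n, g, p):
    """Average number of tests from equation (2): N(1/G + 1 - (1-p)^G)."""
    p_group = 1 - (1 - p) ** g  # probability that a group of g tests positive
    return n * (1 / g + p_group)


# Cases quoted in the thread: groups of 10 at p = 1/100 and p = 1/1000,
# and groups of 32 at p = 1/1000, each for 1000 people.
for g, p in [(10, 1 / 100), (10, 1 / 1000), (32, 1 / 1000)]:
    print(f"G={g:3d}  p={p:.4f}  T={expected_tests(1000, g, p):.1f}")
```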

Since p is very small the higher order terms in the expansion of ##(1-p)^G## can be ignored so that to a very close approximation:

##(1-p)^G = (1-p)(1-p)...(1-p) \approx 1 - Gp ##
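For reference, the full binomial expansion shows what is being dropped. This is the standard identity rather than something stated in the post, and the numerical check uses the example values G = 32 and p = 1/1000 that appear later in the post:

##(1-p)^G = 1 - Gp + \binom{G}{2}p^2 - \binom{G}{3}p^3 + \dots##

The first neglected term is ##\binom{G}{2}p^2 \approx (Gp)^2/2##, so the approximation is good when ##Gp \ll 1##. With G = 32 and p = 1/1000, ##Gp = 0.032## while ##\binom{32}{2}p^2 = 496 \times 10^{-6} \approx 0.0005##, so the dropped term is under 2% of the retained one.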

So (2) becomes:

(3) ##T \approx N(1/G + Gp)##

T is minimum where dT/dG = 0

##dT/dG = N((-1/G^2) + p)##

So T is minimal where:

(4) ##G = \frac{1}{\sqrt{p}}##

So, if, for example, p = 1/1000, the optimal G would be the closest integer to ##\sqrt{1000} \approx 31.6##, which is 32. Substituting into (3) with N = 1000 results in T = 1000/32 + 32 ≈ 63.

I have to thank my mathematician brother Dave for his help in working this out.
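As a cross-check on the ##1/\sqrt{p}## rule, one can search for the best integer group size using the exact expression (2) instead of the approximation (3). The sketch below is only an illustration; `best_group_size` and the search limit `max_g` are names and choices made up here.

```python
import math


def expected_tests(n, g, p):
    """Average number of tests from equation (2)."""
    return n * (1 / g + 1 - (1 - p) ** g)


def best_group_size(p, max_g=500):
    """Brute-force search for the integer G that minimises equation (2)."""
    return min(range(2, max_g + 1), key=lambda g: expected_tests(1, g, p))


for p in (1 / 100, 1 / 1000, 1 / 10000):
    rule = round(1 / math.sqrt(p))   # the closed-form rule (4)
    exact = best_group_size(p)       # optimum of the exact formula
    print(f"p={p:.4f}  rule G={rule}  exact G={exact}")
```

For p = 1/1000 the search agrees with the rule (G = 32). For p = 1/100 the exact optimum comes out at 11 rather than 10, but the expected number of tests differs by well under 0.1%, so the simple rule loses very little.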

AM

Cool.
Do we have to assume a random distribution throughout the population, with no clustering?

What if we do not know p?
Can we find it?

256bits said:
Cool.
Do we have to assume a random distribution throughout the population, with no clustering?

What if we do not know p?
Can we find it?
You raise a good point because the virus is highly contagious and will generate geographical clusters. The effect depends on how the collection of biological material occurs.

Our analysis makes two assumptions: first, that the swabs are obtained from individuals chosen at random from the population (not that the virus itself is randomly distributed); and second, that the overall rate, p, is very small, so that the higher order terms in the binomial expansion of ##(1-p)^G## are negligible.

p starts as an estimate and is adjusted as testing goes along. G would be adjusted as the value for p changes. Group testing actually enables you to estimate p more accurately. If you get more groups testing positive than your estimate for p predicts, you know that p is higher than estimated.
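The adjustment of p described above can be made concrete by inverting ##P_{group} = 1-(1-p)^G## from the first post: if a fraction f of the groups test positive, then p is roughly ##1-(1-f)^{1/G}##. The helper below is a hypothetical sketch (the name `estimate_p` is not from the thread) and assumes that each group is a random sample and that the test is perfect.

```python
def estimate_p(positive_groups, total_groups, group_size):
    """Estimate the individual rate p from the share of positive groups.

    Inverts P_group = 1 - (1 - p)**G, treating the observed fraction of
    positive groups as an estimate of P_group.
    """
    frac = positive_groups / total_groups
    return 1 - (1 - frac) ** (1 / group_size)


# Example: 100 groups of 10 were tested and 10 of them came back positive.
p_hat = estimate_p(10, 100, 10)
print(f"estimated p = {p_hat:.5f}")
```

If the estimate comes out higher than the value used to choose G, the group size can be reduced for the next round, and increased if it comes out lower.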

If there are clusters and the overall individual infection rate p is x, then the effective value of p in the rest of the population is less than x. So if swabs are gathered in geographical order rather than randomly the following would occur:
1) the number of groups testing positive will be lower than in my calculation and
2) positive groups with multiple individuals who test positive will occur more frequently than would occur if the virus was randomly distributed geographically.
So this would suggest that group size, G, should be based on a lower value of p (G would increase)
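A deliberately extreme toy case illustrates points 1) and 2). It is an assumption-laden sketch, not data: 1000 people are tested in collection order in groups of 10, with the same 10 infected people either all adjacent (one cluster) or spread as far apart as possible.

```python
def count_tests(infected, group_size):
    """Tests used when swabs are pooled in the order given (perfect test)."""
    tests = 0
    for start in range(0, len(infected), group_size):
        group = infected[start:start + group_size]
        tests += 1               # composite test
        if any(group):
            tests += len(group)  # individual retests for a positive group
    return tests


n, g = 1000, 10
clustered = [i < 10 for i in range(n)]      # all 10 cases in one group
spread = [i % 100 == 0 for i in range(n)]   # 10 cases, each in its own group

print(count_tests(clustered, g))  # 110: one positive group to retest
print(count_tests(spread, g))     # 200: ten positive groups to retest
```

The overall rate is 1/100 in both cases, but the clustered ordering produces far fewer positive groups, which is why a larger G becomes attractive when swabs are gathered geographically.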

As a practical matter, it would make more sense to test people geographically rather than randomly. So you start with a group size based on an estimate of p and adjust lower or higher depending on what actual p appears to be in a geographical area. And when you find a positive result, you would test all those in contact with that person individually.

In any event, it makes much more efficient use of limited testing resources to test in groups. Optimizing group size will be more challenging and will depend on local distribution. Starting with groups of 10 and adjusting group size from there is definitely a wise use of resources.

AM

I have seen schemes like this used in molecular screen for detecting mutations.
However, this assumes the test is able to detect virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that is a good assumption when it seems that some test results might be on the edge of detection.

BillTre said:
I have seen schemes like this used in molecular screen for detecting mutations.
However, this assumes the test is able to detect virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that is a good assumption when it seems that some test results might be on the edge of detection.
Yes. Whether mixing the contents of a large number of swabs makes the test less effective in detecting a virus that is present in very few copies may be an issue. But we are not suggesting using groups as large as 99 or 100. That would be optimal only if p = 1/10,000. It is higher than 1/1,000 in most countries. (see next post)

It seems to work for groups of 10 in Germany. If p = 1/1000 and you used even groups of 3, a thousand people could be tested using 337 test analyses; or 363 tests if p= 1/100.

AM

Just following up on BillTre's comment:

It appears that Israel has adopted a group testing model and was successful in doing 64 pooled samples at a time:
https://www.jpost.com/HEALTH-SCIENCE/Acceleration-in-multiple-coronavirus-tests-at-once-by-Israel-research-team-621533

But unless they are dealing with an infection rate of 1/4100 their group size of 64 is larger than the optimum. This was reported on March 19 at a time when the infection rate was much lower than it is now so I expect they have since reduced the group size.
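The 1/4100 figure follows from inverting equation (4) of the first post. As a quick check:

##G = \frac{1}{\sqrt{p}} \;\Rightarrow\; p = \frac{1}{G^2} = \frac{1}{64^2} = \frac{1}{4096} \approx \frac{1}{4100}##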

AM

Via Trevor Bedford's tweet, I came across an interesting variant of the pooled testing strategy by Tomer Hertz and colleagues. By putting each person in multiple pools, they can pool and still identify which individual in the pool is positive.

https://www.medrxiv.org/content/10.1101/2020.04.14.20064618v1
Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers
Noam Shental, Shlomia Levy, Shosh Skorniakov, Vered Wuvshet, Yonat Shemer-Avni, Angel Porgador, Tomer Hertz

Trevor also mentions a different idea by Sri Kosuri for pooled testing in which a "barcode" is added to each sample so that individual positive samples can still be identified after pooling.

Of course if you get a reasonably representative sample of a population, this would give you the same sort of information. The problem is that tests that detect the RNA are not reliable indicators of the current rate of infection or infectivity, and that is what we need, at the level of the individual, for effective contact tracing. The time taken to get results from pooled data might also make the data useless. Testing everyone who believes they might be at risk seems likely to be far less wasteful, but even this, to be useful, requires resources put into contact tracing. Germany got a head start on this and is in fact expanding the resources put into contact tracing. However, in countries that have already experienced a high level of infection it's questionable whether contact tracing will be effective; we need a low level of infection in the population for this to be manageable and useful.
This sort of testing will have to occur at the same time as countries are trying to identify the population level of immunity and there may be an issue of where to focus resources.

BillTre said:
I have seen schemes like this used in molecular screen for detecting mutations.
However, this assumes the test is able to detect virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that is a good assumption when it seems that some test results might be on the edge of detection.

Yes, I've seen or heard of it used too.
Depending on the numbers, your objection might be true and the proposed strategy could still be the best one in a situation where large numbers remain untested. It could be better to test larger numbers less perfectly.
However another question is where is the bottleneck? Some of the time in the UK it seemed it was not the labs' capacity but getting samples to them.

I guess that testing of sewage systems (which I have read is being done in some areas) is an extreme example of pooling samples.

Unfortunately, that strategy does not allow for either sub-dividing, or making new combinations of samples, or retesting single samples required to identify infected individuals.

BillTre said:
I guess that testing of sewage systems (which I have read is being done in some areas) is an extreme example of pooling samples.

Unfortunately, that strategy does not allow for either sub-dividing, or making new combinations of samples, or retesting single samples required to identify infected individuals.
Testing sewage might give you a geographic area in which to focus testing efforts, depending on how the sewers are designed.

If one wants to minimize the number of people who have to be tested individually, testing in large groups definitely works. One just has to determine: 1. the maximum number of tests on any individual, and 2. the limits of the group size due to dilution.

My thinking is that if saliva samples are collected from each individual (which has been approved by the FDA and now appears to be as good as or better than the intrusive nasal swab), one should be able to do 4 tests per individual using nucleic acid technology (RT-PCR), which is the only approved testing method in Canada. (I would appreciate any comment on that.)

I have worked out the numbers using a group size limit of 64 (which the Israelis have said works - post #6 above). The method is an initial group test followed, for each positive group sample, by two binary divisions of that sample and then a final round of individual tests. For example, if the group size is 64, the positive group sample would be divided into two new group samples of 32 individuals each and each tested. Then the positive half (assuming only one positive individual in the group) is divided into two groups of 16, which are each tested. The individuals in the positive group of 16 are then tested individually.

Here are the numbers:

| X - Estimated likelihood a single person is infected | Optimum number of saliva samples to aggregate (individual test limit = 4; group size limit = 64) | T - no. of test analyses needed to test 1,000,000 individuals |
|---|---|---|
| 1 in 10,000 | 64 | 17,619 |
| 1 in 3,333 | 64 | 21,569 |
| 1 in 2,000 | 64 | 25,469 |
| 1 in 1,000 | 64 | 35,008 |
| 1 in 500 | 48 or 47 | 51,370 |
| 1 in 250 | 34 | 76,248 |
| 1 in 100 | 31 or 32 | 86,808 |
| 1 in 40 | 18 | 228,393 |
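One way to read the scheme described above is as the formula ##T = (N/G)\,(1 + P_{group}\,(4 + G/4))##: one composite test per group, plus, for each positive group, two tests on the halves, two on the quarters, and G/4 individual tests. This formula is a reconstruction, not something the post states, and like the post it assumes a single positive individual per positive group. The function name `expected_tests_binary` is made up for the sketch.

```python
def expected_tests_binary(n, g, p):
    """Expected tests for: group test, two binary splits, then individuals.

    Reconstruction under the assumption of one positive per positive group.
    """
    p_group = 1 - (1 - p) ** g
    follow_up = 2 + 2 + g / 4  # halves, quarters, then g/4 individual tests
    return (n / g) * (1 + p_group * follow_up)


# Rows of the table that use the group size limit of 64.
for one_in in (10_000, 3_333, 2_000, 1_000):
    t = expected_tests_binary(1_000_000, 64, 1 / one_in)
    print(f"1 in {one_in:>6,}: about {t:,.0f} tests")
```

Under this reading the four rows with a group size of 64 are reproduced to within one test. The rows with smaller group sizes evidently rest on details the post does not spell out, so they are not attempted here.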

AM

BillTre said:
I have seen schemes like this used in a molecular screen for detecting mutations.
However, this assumes the test is able to detect the virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that it is a good assumption when it seems that some test results might be on the edge of detection.
I am sure a culture can be made. It is unlikely that just 1 out of 100 will be infected. Nonetheless, the groups can be smaller, the test more sensitive, or the samples cultured. I hope this scratchpad picture helps with how the large group testing would be performed. https://www.physicsforums.com/threa...9-related-problems-and-no-one-to-tell.988420/

Re: concerns with dilution and sensitivity of tests due to pooling of samples for group testing.

From this Oxford University study, it appears that saliva from COVID19 patients has a median viral load of 3.3 × 10^6 virus copies/mL, with a range from about 10^3 to 10^8 copies/mL:

Oxford University Article: 12 Feb 20 said:
Saliva specimens can be provided easily by asking patients to spit into a sterile bottle. Since no invasive procedures are required, the collection of saliva can greatly minimize the chance of exposing healthcare workers to 2019-nCoV. We have previously demonstrated that saliva has a high concordance rate of greater than 90% with nasopharyngeal specimens in the detection of respiratory viruses, including coronaviruses [5, 6]. In some patients, Coronavirus was detected only in saliva but not in nasopharyngeal aspirate [5]. Saliva has also been used in screening respiratory viruses among hospitalized patients without fever or respiratory symptoms [7]. SARS-CoV can be detected in saliva at high titers [8].

RESULTS
A total of 12 patients with laboratory-confirmed 2019-nCoV infection in Hong Kong were included. The median age was 62.5 years, ranging from 37 to 75 years. There were 5 female and 7 male patients. At the time of writing, all patients were still hospitalized. Saliva specimens were collected at a median of 2 days after hospitalization (range, 0–7 days) (Figure 1). The 2019-nCoV was detected in the initial saliva specimens of 11 patients (91.7%). For patient K, the first saliva specimen collected on the day of hospital admission tested negative. The median viral load of the first available saliva specimens was 3.3 × 10^6 copies/mL (range, 9.9 × 10^2 to 1.2 × 10^8 copies/mL).

The test kits that have been approved by the FDA use RT-PCR, which is very sensitive. For example, PerkinElmer's test kit will detect the virus at concentrations as low as 20 copies/mL.

https://perkinelmer-appliedgenomics...navirus-2019-ncov-nucleic-acid-detection-kit/
PerkinElmer - Coronavirus Nucleic Acid Detection Kit said:
PerkinElmer New Coronavirus Nucleic Acid Detection Kit authorized under FDA EUA

The PerkinElmer New Coronavirus Nucleic Acid Detection Kit is a real-time RT-PCR test intended for the qualitative detection of nucleic acid from SARS-CoV-2 in human oropharyngeal and nasopharyngeal swab samples. Testing is limited to laboratories certified under the Clinical Laboratory Improvement Amendments of 1988 (CLIA), 42 U.S.C. §263a, to perform high complexity tests, or by similarly qualified non-U.S. laboratories.

Specific: Detection of SARS-CoV-2 ORF1ab and N genes

Sensitive: Limit of detection of 20 copies/mL

I don't see a problem, then, in using saliva samples and fractioning each sample 6 times and conducting group testing with initial pools of up to 250 individual samples.

AM
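A rough dilution check on the figures quoted above can be sketched as follows. It rests on simplifying assumptions that are not in the thread: equal sample volumes, exactly one positive sample per pool, and the kit's quoted limit of detection (stated for swab samples) carrying over to pooled saliva. The function name `pooled_concentration` is hypothetical.

```python
LIMIT_OF_DETECTION = 20  # copies/mL, from the PerkinElmer kit description


def pooled_concentration(copies_per_ml, pool_size):
    """Concentration after mixing one positive sample with negatives.

    Assumes equal volumes, so the viral load is divided by the pool size.
    """
    return copies_per_ml / pool_size


# Viral loads reported in the quoted saliva study (copies/mL).
loads = {"lowest": 9.9e2, "median": 3.3e6, "highest": 1.2e8}

for label, load in loads.items():
    diluted = pooled_concentration(load, 250)
    detectable = diluted >= LIMIT_OF_DETECTION
    print(f"{label:8s} {diluted:12.2f} copies/mL  detectable: {detectable}")
```

On these assumptions a sample at the median load stays far above the detection limit in a pool of 250, while a sample at the very bottom of the reported range would be diluted to about 4 copies/mL, so the weakest positives are the ones at risk in very large pools.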

Singapore will do some pooling of samples for testing.

https://www.moh.gov.sg/news-highlights/details/controlling-the-outbreak-preparing-for-the-next-phase

"For those with a negative serological test, and for the workers in the other dormitories, we will apply the PCR tests either individually or in batches [2]."

"[2] Such pooled tests involve combining swabs of up to five individuals into one laboratory test, which does not affect the sensitivity of the tests. Where a pooled test is positive, the original five individuals could be re-tested individually to identify the infected person. This is an effective strategy where the infection prevalence rates are likely to be low. "

atyy said:
Singapore will do some pooling of samples for testing.

https://www.moh.gov.sg/news-highlights/details/controlling-the-outbreak-preparing-for-the-next-phase

"For those with a negative serological test, and for the workers in the other dormitories, we will apply the PCR tests either individually or in batches [2]."

"[2] Such pooled tests involve combining swabs of up to five individuals into one laboratory test, which does not affect the sensitivity of the tests. Where a pooled test is positive, the original five individuals could be re-tested individually to identify the infected person. This is an effective strategy where the infection prevalence rates are likely to be low. "
The article says that they are doing 8,000 PCR (RT-PCR) tests per day and gearing up to do 40,000 and that the rate of infection is low. Doing groups of five saves tests but is still not nearly as efficient as it could be. With 8,000 tests per day they can test around 39,000 individuals. (205,000 tests per million).
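The arithmetic behind these figures can be reproduced with formula (2) from the first post. The sketch below assumes an infection rate of p = 1/1000, which is not stated in this paragraph but is consistent with the 205,000 tests per million quoted; `people_per_day` is a made-up helper name.

```python
def people_per_day(tests_per_day, group_size, p):
    """Individuals covered per day by two-stage group testing.

    Uses tests per person = 1/G + 1 - (1 - p)**G from equation (2).
    """
    tests_per_person = 1 / group_size + 1 - (1 - p) ** group_size
    return tests_per_day / tests_per_person


p = 1 / 1000  # assumed infection rate
print(round(people_per_day(8_000, 5, p)))    # groups of five, current capacity
print(round(people_per_day(40_000, 5, p)))   # groups of five, planned capacity
```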

Using a six stage group test starting with groups of 200 (assuming an infection rate of 1/1000 or less) they could be testing 600,000 people per day. To do that, however, one would need to have a larger sample from each individual or more swabs from each person being tested. Saliva appears to work quite well.

AM

## What is the purpose of testing a population for Coronavirus?

The purpose of testing a population for Coronavirus is to identify individuals who are infected with the virus and to track the spread of the disease within the population. This information can help inform public health measures and interventions to control the spread of the virus.

## Why is it important to minimize the number of tests when testing a population for Coronavirus?

Minimizing the number of tests is important because it helps to conserve testing resources, such as testing kits and laboratory capacity. It also helps to reduce the burden on healthcare systems and allows for more efficient and targeted testing.

## How can the number of tests be minimized when testing a population for Coronavirus?

The number of tests can be minimized by implementing targeted testing strategies, such as testing individuals with symptoms or those who have been in close contact with someone who has tested positive for the virus. Prioritizing high-risk populations, such as healthcare workers and individuals in congregate living settings, can also help to minimize the number of tests needed.

## What are the potential drawbacks of minimizing the number of tests when testing a population for Coronavirus?

Minimizing the number of tests may result in some infected individuals not being identified, which could lead to further spread of the virus. It may also make it more difficult to accurately track the spread of the disease within the population.

## How can the accuracy of testing be maintained while minimizing the number of tests for Coronavirus?

To maintain accuracy, it is important to use reliable and validated testing methods and to follow proper testing protocols. Additionally, implementing a system for follow-up testing of individuals who initially test negative can help to identify any missed cases and prevent further spread of the virus.
