Testing a population for Coronavirus - Minimizing the number of tests

  • #1

Andrew Mason

TL;DR Summary
It has become evident that identifying persons in a population who have the virus before they have symptoms is critical to controlling an outbreak. There is a simple way to drastically reduce the number of tests required: test in groups of G where ##G = 1/\sqrt{p}## and p is the probability that a randomly chosen person will test positive.
Germany has been virus testing in groups of 10. In a high-risk population where p = 1/100, this requires, on average, 196 tests to test 1000 people. In a moderate-risk population where p = 1/1000, it requires about 110 tests to test 1000 people. But if they were to test in groups of 32 in such a population, they would need only 63 tests.

The tests are done as follows:

1) swabs containing biological material that would contain the virus if it is present in an individual are taken from each of G individuals
2) the biological material in these swabs, G in number, is extracted from the swabs and all mixed together to create a single uniform composite test sample
3) the composite test sample is tested for the virus
4) if it is positive, all G individuals are tested individually to determine who in the group is positive
5) if it is negative, all G individuals are eliminated as carriers
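
The five steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the test is treated as perfect (no false negatives from dilution, no false positives), people are infected independently with probability p, and the function name `pooled_test_count` is hypothetical.

```python
import random

def pooled_test_count(infected, group_size):
    """Count lab analyses for two-stage pooled testing.

    `infected` is a list of booleans, one per person, in sampling order.
    Assumes a perfect test: a pool is positive exactly when it contains
    at least one infected person.
    """
    tests = 0
    positives = []
    for start in range(0, len(infected), group_size):
        group = infected[start:start + group_size]
        tests += 1                      # step 3: one test on the composite sample
        if any(group):                  # step 4: positive pool, retest each member
            tests += len(group)
            positives.extend(start + i for i, sick in enumerate(group) if sick)
        # step 5: a negative pool clears the whole group at no extra cost
    return tests, positives

rng = random.Random(0)                  # fixed seed so the run is repeatable
people = [rng.random() < 0.01 for _ in range(100_000)]   # p = 1/100
tests, found = pooled_test_count(people, group_size=10)
print(tests / 100)                      # analyses per 1000 people
```

With p = 1/100 and groups of 10, the printed figure should land close to the 196 tests per 1000 people quoted above, with some run-to-run scatter if the seed is changed.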

The number of tests is determined as follows:

Let T = number of tests; p = probability that an individual test will be positive; G=number of individuals in the group; N= number of individuals in the population you want to test; Pgroup = probability that a group of G will test positive

(1) T = N/G + (N/G) x Pgroup x G = N(1/G + Pgroup)

So, for example in a population of 1,000,000 people in groups of 10 the number of tests would be:

T = 1,000,000 (1/10 + Pgroup) = 100,000 + 1,000,000 x Pgroup

Now the probability that a group will test positive is 1 - the probability that all individuals in the group will be negative:

##P_{group} = 1- (1-p)^G##

So (1) becomes:

(2) ##T = N(1/G + 1-(1-p)^G)##
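
Equation (2) is easy to evaluate directly. The sketch below (the function name `expected_tests` is just a label chosen here) computes it for the three cases mentioned in the summary.

```python
def expected_tests(n, p, g):
    """Equation (2): expected analyses to test n people in pools of g."""
    p_group = 1 - (1 - p) ** g          # probability that a pool tests positive
    return n * (1 / g + p_group)

for p, g in [(1 / 100, 10), (1 / 1000, 10), (1 / 1000, 32)]:
    print(f"p={p:.3f}  G={g:2d}  T={expected_tests(1000, p, g):.1f}")
```

Rounded to whole tests, this gives roughly 196, 110 and 63 analyses per 1000 people.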

Since p is very small the higher order terms in the expansion of ##(1-p)^G## can be ignored so that to a very close approximation:

##(1-p)^G = (1-p)(1-p)...(1-p) \approx 1 - Gp ##

So (2) becomes:

(3) ##T = N(1/G + Gp)##

T is minimum where dT/dG = 0

##dT/dG = N((-1/G^2) + p)##

So T is minimal where:

(4) ##G = \frac{1}{\sqrt{p}}##

So, if, for example, p = 1/1000, optimal G would be the closest integer to ##\sqrt{1000}## which is 32. Substituting into (3) results in T = 1000/32+32 = 63
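
As a check on the approximation, one can compare ##G = 1/\sqrt{p}## with a brute-force search over the exact expression (2). This is a sketch; the function names and the search limit `max_group` are arbitrary choices made here.

```python
import math

def expected_tests(n, p, g):
    """Exact expected analyses from equation (2)."""
    return n * (1 / g + 1 - (1 - p) ** g)

def approx_optimal_group(p):
    """Equation (4): nearest integer to 1/sqrt(p)."""
    return round(1 / math.sqrt(p))

def exact_optimal_group(p, max_group=1000):
    """Integer group size that minimizes equation (2), found by search."""
    return min(range(2, max_group + 1), key=lambda g: expected_tests(1, p, g))

for p in (1 / 1000, 1 / 100):
    print(p, approx_optimal_group(p), exact_optimal_group(p))
```

For p = 1/1000 both give 32. For larger p the dropped higher order terms start to matter, so the two can differ by a person or so, but the minimum is shallow and the number of tests barely changes.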


I have to thank my mathematician brother Dave for his help in working this out.

AM
 

Answers and Replies

  • #2
Cool.
Do we have to assume a random distribution throughout the population, with no clustering?

What if we do not know p?
Can we find it?
 
  • #3
You raise a good point because the virus is highly contagious and will generate geographical clusters. The effect depends on how the collection of biological material occurs.

In our analysis the assumptions are (a) that the swabs are obtained from individuals chosen at random from the population, not that the virus itself is randomly distributed, and (b) that the overall rate, p, is very small, so that the higher order terms in the binomial expansion of ##(1-p)^G## are relatively small.

p starts as an estimate and is adjusted as testing goes along. G would be adjusted as the value for p changes. Group testing actually enables you to estimate p more accurately. If you get more groups testing positive than your estimate for p predicts, you know that p is higher than estimated.
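
One way to update the estimate of p from pooled results is to invert ##P_{group} = 1-(1-p)^G##, using the observed fraction of positive groups in place of ##P_{group}##. This is a minimal sketch that assumes a perfect test and random sampling; the function name `estimate_p` is hypothetical.

```python
def estimate_p(positive_groups, total_groups, group_size):
    """Estimate the individual positive rate from pooled results.

    Solves  observed_fraction = 1 - (1 - p)**G  for p.
    """
    observed_fraction = positive_groups / total_groups
    return 1 - (1 - observed_fraction) ** (1 / group_size)

# Example: 96 positive pools out of 1000 pools of 10 people each.
print(estimate_p(96, 1000, 10))
```

In the example the estimate comes out close to 1/100. If the returned value is above the p used to choose G, the group size can be reduced accordingly, and increased if it is below.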

If there are clusters and the overall individual infection rate p is x, then the effective value of p in the rest of the population, outside the clusters, is less than x. So if swabs are gathered in geographical order rather than randomly, the following would occur:
1) the number of groups testing positive would be lower than in my calculation, and
2) positive groups containing multiple positive individuals would occur more frequently than if the virus were randomly distributed geographically.
This suggests that the group size, G, should be based on a lower value of p (G would increase).
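
A deliberately extreme toy case illustrates points 1) and 2). It assumes a perfect test and places the same 10 infected people either next to each other in sampling order or one per hundred people; the layout and the function name `pooled_tests` are illustrative assumptions, not data.

```python
def pooled_tests(infected, group_size):
    """Analyses needed: one per pool, plus one per member of each positive pool."""
    tests = 0
    for start in range(0, len(infected), group_size):
        group = infected[start:start + group_size]
        tests += 1 + (len(group) if any(group) else 0)
    return tests

n, g = 1000, 10
clustered = [i < 10 for i in range(n)]        # 10 cases, all neighbours
spread = [i % 100 == 0 for i in range(n)]     # 10 cases, one per 100 people
print(pooled_tests(clustered, g), pooled_tests(spread, g))   # prints: 110 200
```

With the same overall rate of 1/100, the clustered ordering produces a single positive pool and the spread ordering produces ten, which is why sampling in geographical order behaves like a lower effective p.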

As a practical matter, it would make more sense to test people geographically rather than randomly. So you start with a group size based on an estimate of p and adjust lower or higher depending on what actual p appears to be in a geographical area. And when you find a positive result, you would test all those in contact with that person individually.

In any event, it makes much more efficient use of limited testing resources to test in groups. Optimizing group size will be more challenging and will depend on local distribution. Starting with groups of 10 and adjusting group size from there is definitely a wise use of resources.

AM
 
  • #4
I have seen schemes like this used in molecular screen for detecting mutations.
However, this assumes the test is able to detect virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that is a good assumption when it seems that some test results might be on the edge of detection.
 
  • #5
Yes. Whether mixing the contents of a large number of swabs makes the test less effective in detecting a virus that is present in very few copies may be an issue. But we are not suggesting using groups as large as 99 or 100. That would be optimal only if p = 1/10,000. It is higher than 1/1,000 in most countries. (see next post)

It seems to work for groups of 10 in Germany. Even with groups of only 3, a thousand people could be tested using 337 test analyses if p = 1/1000, or 363 if p = 1/100.

AM
 
  • #6
Just following up on BillTre's comment:

It appears that Israel has adopted a group testing model and was successful in doing 64 pooled samples at a time:
https://www.jpost.com/HEALTH-SCIENCE/Acceleration-in-multiple-coronavirus-tests-at-once-by-Israel-research-team-621533

But unless they are dealing with an infection rate of about 1/4100 (that is, ##1/64^2##), their group size of 64 is larger than the optimum. This was reported on March 19, at a time when the infection rate was much lower than it is now, so I expect they have since reduced the group size.

AM
 
  • #7
Via Trevor Bedford's tweet, I came across an interesting variant of the pooled testing strategy by Tomer Hertz and colleagues. By putting each person in multiple pools, they can pool and still identify which individual in the pool is positive.

https://www.medrxiv.org/content/10.1101/2020.04.14.20064618v1
Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers
Noam Shental, Shlomia Levy, Shosh Skorniakov, Vered Wuvshet, Yonat Shemer-Avni, Angel Porgador, Tomer Hertz

Trevor also mentions a different idea by Sri Kosuri for pooled testing in which a "barcode" is added to each sample so that individual positive samples can still be identified after pooling.
 
  • #8
Of course, if you get a reasonably representative sample of a population, that would give you the same sort of information. The problem is that tests that detect the RNA are not reliable indicators of the current rate of infection or of infectivity, and that is what we need, at the level of the individual, for effective contact tracing. The time taken to get results from pooled samples might also make the data useless. Testing everyone who believes they might be at risk seems likely to be far less wasteful, but even this, to be useful, requires resources put into contact tracing. Germany got a head start on this and is in fact expanding the resources put into contact tracing. However, in countries that have already experienced a high level of infection, it's questionable whether contact tracing will be effective; we need a low level of infection in the population for it to be manageable and useful.
This sort of testing will have to occur at the same time as countries are trying to identify the population level of immunity, and there may be an issue of where to focus resources.
 
  • #9
I have seen schemes like this used in molecular screen for detecting mutations.
However, this assumes the test is able to detect virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that is a good assumption when it seems that some test results might be on the edge of detection.

Yes, I've seen or heard of it used too.
Depending on the numbers, your objection might be valid and the proposed strategy might still be the best one in a situation where large numbers remain untested. It could be better to test larger numbers less perfectly.
However, another question is: where is the bottleneck? Some of the time in the UK it seemed it was not the labs' capacity but getting samples to them.
 
  • #10
I guess that testing of sewage systems (which I have read is being done in some areas) is an extreme example of pooling samples.

Unfortunately, that strategy does not allow for either sub-dividing, or making new combinations of samples, or retesting single samples required to identify infected individuals.
 
  • #11
Testing sewage might give you a geographic area in which to focus testing efforts, depending on how the sewers are designed.

If one wants to optimize the number of people tested individually, testing in large groups definitely works. One just has to determine: 1. the maximum number of tests on any individual, and 2. the limit on group size due to dilution.

My thinking is that if saliva samples are collected from each individual (which has been approved by the FDA and now appears to be as good as or better than the intrusive nasal swab), one should be able to do 4 tests per individual using nucleic acid technology (RT-PCR), which is the only approved testing method in Canada. (I would appreciate any comment on that.)

I have worked out the numbers using a group size limit of 64 (which the Israelis have said works; see post #6 above). The method is an initial group test, followed, for each positive group sample, by two binary divisions and then a final round of individual tests. For example, if the group size is 64, the positive group sample would be divided into two new group samples of 32 individuals each, and each would be tested. Then the positive half (assuming only one positive individual in the group) is divided into two groups of 16, which are tested. The individuals in the positive group of 16 are then tested individually.

Here are the numbers:

X = estimated likelihood that a single person is infected
G = optimum number of saliva samples to aggregate (individual test limit = 4; group size limit = 64)
T = number of test analyses needed to test 1,000,000 individuals

X              G           T
1 in 10,000    64          17,619
1 in 3,333     64          21,569
1 in 2,000     64          25,469
1 in 1,000     64          35,008
1 in 500       48 or 47    51,370
1 in 250       34          76,248
1 in 100       31 or 32    86,808
1 in 40        18          228,393
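
The scheme described above (one pooled test, two halvings of a positive pool, then individual tests) can be written as a formula. The sketch below is an inferred model, not the author's own working: it assumes a perfect test and charges every positive pool the same follow-up cost as a pool with exactly one positive individual (2 analyses per halving plus one per person in the final sub-group). The function name and the `halvings` parameter are hypothetical.

```python
def multistage_tests(n, p, g, halvings=2):
    """Expected analyses for pooled testing with binary subdivision.

    Each positive pool costs 2 analyses per halving, then one analysis per
    person in the final sub-group of size g / 2**halvings.
    """
    pools = n / g
    p_pool = 1 - (1 - p) ** g           # probability that a pool tests positive
    final_size = g / 2 ** halvings
    followup = 2 * halvings + final_size
    return pools * (1 + p_pool * followup)

for odds in (10_000, 1_000):
    print(odds, round(multistage_tests(1_000_000, 1 / odds, 64)))
```

Under these assumptions the two printed values agree with the 1 in 10,000 and 1 in 1,000 rows of the table.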

AM
 
  • #12
I have seen schemes like this used in a molecular screen for detecting mutations.
However, this assumes the test is able to detect the virus when the virus of one infected person is diluted (due to mixing with the other samples) with the other 99 samples.

I'm not sure that it is a good assumption when it seems that some test results might be on the edge of detection.
I am sure a culture can be made. It is unlikely that just 1 out of 100 will be infected. Nonetheless, the groups can be smaller, or the test more sensitive, or the samples cultured. I hope this scratchpad picture helps with how the large group testing would be performed. https://www.physicsforums.com/threa...9-related-problems-and-no-one-to-tell.988420/
 

Attachment: 95101257_3141517632578849_3902301908122992640_o.jpg
  • #13
Re: concerns with dilution and sensitivity of tests due to pooling of samples for group testing.

From this Oxford University study, it appears that saliva from COVID-19 patients has a median viral load of 3.3 x 10^6 virus copies/mL, with a range from about 10^3 to 10^8 copies/mL:

Oxford University Article: 12 Feb 20 said:
Saliva specimens can be provided easily by asking patients to spit into a sterile bottle. Since no invasive procedures are required, the collection of saliva can greatly minimize the chance of exposing healthcare workers to 2019-nCoV. We have previously demonstrated that saliva has a high concordance rate of greater than 90% with nasopharyngeal specimens in the detection of respiratory viruses, including coronaviruses [5, 6]. In some patients, Coronavirus was detected only in saliva but not in nasopharyngeal aspirate [5]. Saliva has also been used in screening respiratory viruses among hospitalized patients without fever or respiratory symptoms [7]. SARS-CoV can be detected in saliva at high titers [8].

RESULTS
A total of 12 patients with laboratory-confirmed 2019-nCoV infection in Hong Kong were included. The median age was 62.5 years, ranging from 37 to 75 years. There were 5 female and 7 male patients. At the time of writing, all patients were still hospitalized. Saliva specimens were collected at a median of 2 days after hospitalization (range, 0–7 days) (Figure 1). The 2019-nCoV was detected in the initial saliva specimens of 11 patients (91.7%). For patient K, the first saliva specimen collected on the day of hospital admission tested negative. The median viral load of the first available saliva specimens was 3.3 × 10^6 copies/mL (range, 9.9 × 10^2 to 1.2 × 10^8 copies/mL).

The test kits that have been approved by the FDA use RT-PCR, which is very sensitive. For example, PerkinElmer's test kit has a stated limit of detection of 20 copies of the virus per mL.

https://perkinelmer-appliedgenomics...navirus-2019-ncov-nucleic-acid-detection-kit/
PerkinElmer - Coronavirus Nucleic Acid Detection Kit said:
PerkinElmer New Coronavirus Nucleic Acid Detection Kit authorized under FDA EUA

The PerkinElmer New Coronavirus Nucleic Acid Detection Kit is a real-time RT-PCR test intended for the qualitative detection of nucleic acid from SARS-CoV-2 in human oropharyngeal and nasopharyngeal swab samples. Testing is limited to laboratories certified under the Clinical Laboratory Improvement Amendments of 1988 (CLIA), 42 U.S.C. §263a, to perform high complexity tests, or by similarly qualified non-U.S. laboratories.

Specific: Detection of SARS-CoV-2 ORF1ab and N genes

Sensitive: Limit of detection of 20 copies/mL

I don't see a problem, then, in using saliva samples and fractioning each sample 6 times and conducting group testing with initial pools of up to 250 individual samples.
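
A rough dilution check can be made from the figures quoted above. The sketch assumes equal-volume mixing, that the other samples in the pool contribute no virus, and that the 20 copies/mL limit of detection applies unchanged to a pooled saliva sample; all three are simplifying assumptions, and the function names are hypothetical.

```python
import math

def diluted_load(copies_per_ml, pool_size):
    """Viral load of the pool if one positive sample is mixed with negatives."""
    return copies_per_ml / pool_size

def max_pool_size(copies_per_ml, limit_of_detection):
    """Largest pool in which the diluted load still meets the detection limit."""
    return math.floor(copies_per_ml / limit_of_detection)

LOD = 20                                  # copies/mL, from the kit description
for load in (3.3e6, 9.9e2):               # median and lowest loads in the study
    print(load, diluted_load(load, 250), max_pool_size(load, LOD))
```

On these assumptions a sample at the median load stays far above the detection limit in a pool of 250, while a sample at the bottom of the reported range would need a pool of fewer than 50 to stay above it.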


AM
 
  • #14
Singapore will do some pooling of samples for testing.

https://www.moh.gov.sg/news-highlights/details/controlling-the-outbreak-preparing-for-the-next-phase

"For those with a negative serological test, and for the workers in the other dormitories, we will apply the PCR tests either individually or in batches [2]."

"[2] Such pooled tests involve combining swabs of up to five individuals into one laboratory test, which does not affect the sensitivity of the tests. Where a pooled test is positive, the original five individuals could be re-tested individually to identify the infected person. This is an effective strategy where the infection prevalence rates are likely to be low. "
 
  • #15
The article says that they are doing 8,000 PCR (RT-PCR) tests per day and gearing up to do 40,000, and that the rate of infection is low. Testing in groups of five saves tests but is still not nearly as efficient as it could be. With 8,000 tests per day they can test around 39,000 individuals (about 205,000 tests per million, assuming an infection rate of 1/1000).
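
The capacity figure follows from dividing the daily number of analyses by the expected analyses per person for simple two-stage pooling. The sketch below assumes an infection rate of 1/1000 and a perfect test; the function name `people_per_day` is hypothetical.

```python
def people_per_day(tests_per_day, p, g):
    """People covered per day by two-stage pooling with pools of g."""
    tests_per_person = 1 / g + 1 - (1 - p) ** g
    return tests_per_day / tests_per_person

p = 1 / 1000                              # assumed infection rate
print(round(people_per_day(8000, p, 5)))  # pools of five
print(round(people_per_day(8000, p, 32))) # pools of 32, near 1/sqrt(p)
```

With pools of five this comes to roughly 39,000 people per day, and with pools of 32 roughly 127,000, before any multi-stage subdivision is considered.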

Using a six stage group test starting with groups of 200 (assuming an infection rate of 1/1000 or less) they could be testing 600,000 people per day. To do that, however, one would need to have a larger sample from each individual or more swabs from each person being tested. Saliva appears to work quite well.

AM
 
