Statistical analysis of COVID reinfection

Discussion Overview

The discussion revolves around the statistical analysis of COVID-19 reinfection rates compared to first-time infections. Participants explore various statistical models and methods to assess whether previously infected individuals have a different probability of reinfection. The conversation includes considerations of available data, modeling approaches, and statistical testing methods.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests using a Markov chain model with four states: uninfected, sick, dead, and recovered, noting the limitations of having only three data points on reinfection.
  • Another participant questions the feasibility of estimating transition probabilities over time, indicating that the available data may not be sufficient for robust conclusions.
  • Some participants propose alternative statistical methods, such as logistic regression and Chi-square tests, to analyze the data and test hypotheses regarding reinfection rates.
  • A participant expresses a desire to find similar solved problems to learn from, indicating a lack of practical experience with Markov processes.
  • There is a suggestion to refer to existing medical studies for more reliable data and insights on reinfection rates.
  • One participant emphasizes the importance of clearly defining the statistical tests to be performed, such as testing whether the rate of reinfection is reduced by vaccination.
  • Another participant expresses a willingness to explore the topic for personal interest, despite acknowledging their lack of qualifications in the field.

Areas of Agreement / Disagreement

Participants have not reached a consensus on the best approach to analyze the data or the validity of the proposed models. Multiple competing views on statistical methods and the interpretation of available data remain present.

Contextual Notes

Participants note limitations in the available data, including the small number of reinfection cases and the potential variability of transition probabilities over time. There is also uncertainty regarding the adequacy of statistical tests given the data constraints.

Who May Find This Useful

This discussion may be of interest to those involved in statistical analysis, epidemiology, and public health, particularly in the context of COVID-19 research and reinfection studies.

Vrbic
TL;DR
How to sophistically calculate the probability of Covid reinfection from available data?
Hi, I'm a physicist, so I have a basic knowledge of probability, hypothesis testing, etc. I would like to calculate more sophistically, from the data available in my country, whether people who have already been infected with Covid have a statistically significantly different probability of reinfection than people have of being infected for the first time.

Let's define reinfection as two infections (each confirmed by a test) with a period of at least 60 days between them.

The available data are:
The total number infected and the total number recovered, both day by day (i.e. I immediately know the status 60 days earlier), and three records of reinfections (data for January, February and March). I know that three records of reinfection are not much, but they are enough to get at least a rough guess.

My question is what procedure to use to find the answer to whether reinfection is more or less probable than ordinary infection.

Thank you all for your comments.
 
You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered. If you consider only COVID deaths then there would also be no transition from uninfected or recovered to dead.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.
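
As a rough illustration of the structure described above, here is a minimal Python sketch of such a four-state chain. The daily probabilities are made-up placeholders, not estimates from any data, and using one fixed transition matrix for every day is itself an assumption, since the real probabilities would likely change over time.

```python
import numpy as np

STATES = ["uninfected", "sick", "dead", "recovered"]

# Hypothetical daily transition probabilities (placeholders, not estimates).
p_infect = 0.002     # uninfected -> sick
p_recover = 0.07     # sick -> recovered
p_die = 0.001        # sick -> dead
p_reinfect = 0.0004  # recovered -> sick

# Row = current state, column = next state, in the order of STATES.
# Structural zeros: nothing enters "uninfected", nothing leaves "dead",
# and "uninfected" cannot jump directly to "recovered" or "dead".
P = np.array([
    [1 - p_infect, p_infect,              0.0,   0.0],
    [0.0,          1 - p_recover - p_die, p_die, p_recover],
    [0.0,          0.0,                   1.0,   0.0],
    [0.0,          p_reinfect,            0.0,   1 - p_reinfect],
])

def simulate(P, start, days):
    """Propagate a distribution over states forward one day at a time."""
    dist = np.asarray(start, dtype=float)
    history = [dist]
    for _ in range(days):
        dist = dist @ P
        history.append(dist)
    return np.array(history)

# Start with everyone uninfected and follow the population for 60 days.
history = simulate(P, [1.0, 0.0, 0.0, 0.0], days=60)
for name, share in zip(STATES, history[-1]):
    print(f"{name:>10}: {share:.4f}")
```

The quantity of interest in this thread is the comparison of `p_infect` with `p_reinfect`. Estimating them from data is the hard part, and this sketch does not attempt it.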
 
Dale said:
You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.

Thank you for your advice, I'll try it!
 
Vrbic said:
... sophistically
... sophistically
[Attached image: 1616502703587.png]
 
Dale said:
You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered. If you consider only COVID deaths then there would also be no transition from uninfected or recovered to dead.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.

Hi, again thank you for your response.
I read something about Markov chains and I probably won't manage it. Could you please send me some similar solved problem where I could learn them? Is it alright to change transition probabilities during the time? I mean, in January the probability of sickness (transition uninfected -> sick) or reinfection (recovered -> sick) will be different from that in February, etc.

The first time I was thinking about that I thought about some hypothesis testing (I'm familiar only with ANOVA). Isn't it a better approach to this problem?
 
Vrbic said:
Could you please send me some similar solved problem where I could learn them?
Not really. I am aware of Markov processes, but I have never actually done it myself so I have no practical experience or concrete suggestions to offer.

Maybe one of my colleagues here will have more insight.

Vrbic said:
Is it alright to change transition probabilities during the time?
Yes, but from what you described you probably do not have enough data to assess that.

Vrbic said:
The first time I was thinking about that I thought about some hypothesis testing (I'm familiar only with ANOVA). Isn't it a better approach to this problem?
You could do a logistic regression. I don’t know what would make it better, but it certainly could be done.
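
A minimal sketch of what that could look like, assuming grouped counts and a single binary predictor (previously infected or not). The counts are the ones quoted from the Lancet study further down this thread and are used purely as an example. The function name `fit_logistic_grouped` is hypothetical, and the fit is a plain Newton-Raphson iteration written with NumPy rather than a library routine.

```python
import numpy as np

def fit_logistic_grouped(X, successes, trials, n_iter=25):
    """Fit logit(p) = X @ beta to grouped binomial counts by Newton-Raphson."""
    X = np.asarray(X, dtype=float)
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)

    def information(beta):
        # Fisher information X^T W X with W = n * p * (1 - p).
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        return X.T @ (X * (trials * p * (1.0 - p))[:, None])

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (successes - trials * p)
        beta = beta + np.linalg.solve(information(beta), gradient)

    std_err = np.sqrt(np.diag(np.linalg.inv(information(beta))))
    return beta, std_err

# Columns: intercept, indicator "tested positive in the first surge".
X = [[1, 0],   # tested negative in the first surge
     [1, 1]]   # tested positive in the first surge
successes = [16819, 72]    # positive tests in the second surge
trials = [514271, 11068]   # people followed up in each group

beta, std_err = fit_logistic_grouped(X, successes, trials)
odds_ratio = np.exp(beta[1])
ci = np.exp(beta[1] + np.array([-1.96, 1.96]) * std_err[1])
print(f"odds ratio = {odds_ratio:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

With only one binary predictor the fitted odds ratio is just the crude odds ratio of the 2x2 table, so the regression adds nothing here. It becomes useful once further predictors such as age group or month are included as extra columns of `X`.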
 
Dale said:
Not really. I am aware of Markov processes, but I have never actually done it myself so I have no practical experience or concrete suggestions to offer.

Maybe one of my colleagues here will have more insight.

Yes, but from what you described you probably do not have enough data to assess that.

You could do a logistic regression. I don’t know what would make it better, but it certainly could be done.
Thank you again, I will look at it.
 
Why waste time doing this yourself when you are not qualified? Do you want your doctor trying to do physics? Just search some real medical studies, like this:

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00575-4/fulltext
During the first surge (ie, before June, 2020), 533 381 people were tested, of whom 11 727 (2·20%) were PCR positive, and 525 339 were eligible for follow-up in the second surge, of whom 11 068 (2·11%) had tested positive during the first surge. Among eligible PCR-positive individuals from the first surge of the epidemic, 72 (0·65% [95% CI 0·51–0·82]) tested positive again during the second surge compared with 16 819 (3·27% [3·22–3·32]) of 514 271 who tested negative during the first surge (adjusted RR 0·195 [95% CI 0·155–0·246]). Protection against repeat infection was 80·5% (95% CI 75·4–84·5). The alternative cohort analysis gave similar estimates (adjusted RR 0·212 [0·179–0·251], estimated protection 78·8% [74·9–82·1]). In the alternative cohort analysis, among those aged 65 years and older, observed protection against repeat infection was 47·1% (95% CI 24·7–62·8). We found no difference in estimated protection against repeat infection by sex (male 78·4% [72·1–83·2] vs female 79·1% [73·9–83·3]) or evidence of waning protection over time (3–6 months of follow-up 79·3% [74·4–83·3] vs ≥7 months of follow-up 77·7% [70·9–82·9]).
Interpretation
Our findings could inform decisions on which groups should be vaccinated and advocate for vaccination of previously infected individuals because natural protection, especially among older people, cannot be relied on.
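
As a quick check of the arithmetic in the quoted abstract, the crude (unadjusted) risk ratio can be recomputed from the counts given there. This is only a sketch: the function name `risk_ratio` is hypothetical, the interval is a standard Wald interval on the log scale, and no covariate adjustment is made, so the result should be close to, but not identical with, the adjusted figures in the paper.

```python
import math

def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Crude risk ratio of group A vs group B with a Wald CI on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Group A: PCR positive in the first surge, group B: PCR negative.
rr, lower, upper = risk_ratio(72, 11068, 16819, 514271)
print(f"risk, previously positive: {72 / 11068:.2%}")
print(f"risk, previously negative: {16819 / 514271:.2%}")
print(f"crude RR = {rr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"crude protection = {1 - rr:.1%}")
```

The crude risk ratio comes out at roughly 0.2, in line with the adjusted RR of 0.195 reported in the abstract.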
 
It may help if you describe what you want to statistically test.
Do you want to show, with some certainty, that the rate of reinfection is reduced by the vaccine? If so, consider using a Chi-square test of homogeneity. See https://stats.stackexchange.com/questions/226789/how-to-compare-ratios-in-r.
Do you want to show, with some certainty, that the rate of reinfection is 0 after the vaccine? If so, consider using a Chi-square goodness of fit test.
There are several options. See this for some.
Keep in mind that many statistical tests are typically used so that the null hypothesis is retained unless there is very strong statistical evidence otherwise. I am not sure that there is enough data to reach those levels of significance.
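
To make the test of homogeneity concrete, here is a minimal sketch for a 2x2 table, applied to the question from the original post (previously infected versus never infected). The counts are invented for illustration only, and the function name `chi_square_2x2` is hypothetical. It uses no continuity correction and relies on the fact that, for one degree of freedom, the upper tail of the chi-square distribution equals `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both groups shared the same infection probability.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows are groups, columns are (infected, not infected).
table = [
    [30, 9970],     # previously infected: 30 reinfections among 10,000
    [500, 99500],   # never infected: 500 first infections among 100,000
]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The chi-square approximation becomes unreliable when expected counts are small (a common rule of thumb is below 5), which may well be the case with only three reinfection records. Fisher's exact test is the usual alternative in that situation.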
 
  • #10
BWV said:
Why waste time doing this yourself when you are not qualified? Do you want your doctor trying to do physics? Just search some real medical studies, like this:

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00575-4/fulltext
Thank you very much, I will read it.

I understand your point and agree with it. I wouldn't want my doctor to publish any physics results, but if physics is his hobby, why not let him have fun at a basic level? That was my query: an easy way to do it (not precise, perhaps following a solved problem from a book) and have fun with it.
 
  • #11
FactChecker said:
It may help if you describe what you want to statistically test.
Do you want to show, with some certainty, that the rate of reinfection is reduced by the vaccine? If so, consider using a Chi-square test of homogeneity. See https://stats.stackexchange.com/questions/226789/how-to-compare-ratios-in-r.
Do you want to show, with some certainty, that the rate of reinfection is 0 after the vaccine? If so, consider using a Chi-square goodness of fit test.
There are several options. See this for some.
Keep in mind that many statistical tests are typically used so that the null hypothesis is retained unless there is very strong statistical evidence otherwise. I am not sure that there is enough data to reach those levels of significance.
Thank you! It seems very helpful.

I would like to test a hypothesis: The first infection and reinfection have the same probability.
I think this should be the first step. Then, if I refute this claim and there is enough data, I would like to try to find the difference in probability between the first infection and reinfection, but just testing this hypothesis is enough for now.
 
  • #12
Vrbic said:
Thank you! It seems very helpful.

I would like to test a hypothesis: The first infection and reinfection have the same probability.
I think this should be the first step. Then, if I refute this claim and there is enough data, I would like to try to find the difference in probability between the first infection and reinfection, but just testing this hypothesis is enough for now.
One of the problems (there are many) with statistically analyzing this is that reinfection may be due to new variants that may not be in your past data at all. I'm afraid that there is no way to address that issue except by knowing the exact nature of the variation, how quickly it is spreading, where it is in the world, and what the consequences of the change are. There are many variants now and many more on the way.
 
  • #13
FactChecker said:
One of the problems (there are many) with statistically analyzing this is that reinfection may be due to new variants that may not be in your past data at all.
I definitely agree. I may be able to make quite a good guess by using data from an earlier era when only one mutation was here. And I understand that this means my result is lost in history, because that will no longer be the case.

I'm afraid that there is no way to address that issue except by knowing the exact nature of the variation, how quickly it is spreading, where it is in the world, and what the consequences of the change are. There are many variants now and many more on the way.
I understand, but as I mentioned earlier, I want to get familiar with some new statistical method, not to find the correct result, so for me, the journey is the destination.
 
  • #14
Vrbic said:
I definitely agree. I may be able to make quite a good guess by using data from an earlier era when only one mutation was here. And I understand that this means my result is lost in history, because that will no longer be the case.
Not only "lost in history", but only analyzing history rather than predicting anything useful.
I understand, but as I mentioned earlier, I want to get familiar with some new statistical method, not to find the correct result, so for me, the journey is the destination.
Fair enough. One thing that you should know about statistics is that it is very treacherous ground. There are many possible ways that results can be biased and conclusions can be wrong. There are problems with correlated variables, self-selecting samples, time-varying distributions, reversing cause-and-effect, etc., etc., etc. When you get experience in spotting the problems, you will see them every day in studies reported in the news.
 
