Statistical analysis of COVID reinfection

Vrbic · Mar 23, 2021

Hi, I'm a physicist, so I have a basic knowledge of probability, hypothesis testing, etc. Using the data available in my country, I would like to calculate more sophistically whether people who have already had Covid have a statistically significantly different probability of being infected again than never-infected people have of being infected for the first time.

Let's define reinfection as two infections (each confirmed by a test) at least 60 days apart.

The available data are:
The total number of infected and the total number of recovered, both day by day (i.e. I immediately know the status 60 days earlier), and three records of the number of reinfected (from January, February and March). I know that three records of reinfection are not much, but they should be enough to get at least a rough guess.

My question is what procedure to use to determine whether reinfection is more or less probable than a first infection.

Thank you all for your comments.

Dale · Mar 23, 2021

You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered. If you consider only COVID deaths then there would also be no transition from uninfected or recovered to dead.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.
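
For illustration, the four-state chain described above could be sketched as follows. This is a minimal sketch only: the daily transition probabilities are hypothetical placeholders, not estimates from any data set, and the names `STATES`, `P`, `step` and `simulate` are made up for this example. The zeros in the matrix encode the forbidden transitions listed above.

```python
STATES = ["uninfected", "sick", "dead", "recovered"]

# Hypothetical daily transition probabilities (illustrative only).
# Row = current state, column = next state, in the order of STATES.
P = [
    [0.999, 0.001, 0.0, 0.0],      # uninfected: stays, or gets sick
    [0.0, 0.920, 0.001, 0.079],    # sick: stays sick, dies, or recovers
    [0.0, 0.0, 1.0, 0.0],          # dead: absorbing state
    [0.0, 0.0002, 0.0, 0.9998],    # recovered: stays, or is reinfected
]


def step(dist, matrix):
    """Advance the state distribution by one day (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def simulate(dist, matrix, days):
    """Apply the same transition matrix for a fixed number of days."""
    for _ in range(days):
        dist = step(dist, matrix)
    return dist


# Start with the whole population uninfected and look 60 days ahead.
start = [1.0, 0.0, 0.0, 0.0]
after_60 = simulate(start, P, 60)
for name, share in zip(STATES, after_60):
    print(f"{name:>10}: {share:.4f}")
```

In this picture the question of the thread becomes a comparison of two entries of the matrix: the uninfected -> sick probability and the recovered -> sick probability.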

Vrbic · Mar 23, 2021

Dale said:

You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.

Thank you for your advice, I will try it!

phinds · Mar 23, 2021

Vrbic said:

... sophistically

Vrbic · Mar 23, 2021

Dale said:

You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered. If you consider only COVID deaths then there would also be no transition from uninfected or recovered to dead.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.

Hi, again thank you for your response.
I read something about Markov chains, but I probably won't manage it on my own. Could you please point me to a similar solved problem that I could learn from? Is it all right for the transition probabilities to change over time? I mean, in January the probability of getting sick (transition uninfected -> sick) or of reinfection (recovered -> sick) will be different than in February, etc.

When I first thought about this, I had some kind of hypothesis testing in mind (I'm only familiar with ANOVA). Wouldn't that be a better approach to this problem?

Dale · Mar 23, 2021

Vrbic said:

Could you please point me to a similar solved problem that I could learn from?

Not really. I am aware of Markov processes, but I have never actually done it myself so I have no practical experience or concrete suggestions to offer.

Maybe one of my colleagues here will have more insight.

Vrbic said:

Is it all right for the transition probabilities to change over time?

Yes, but from what you described you probably do not have enough data to assess that.

Vrbic said:

When I first thought about this, I had some kind of hypothesis testing in mind (I'm only familiar with ANOVA). Wouldn't that be a better approach to this problem?

You could do a logistic regression. I don’t know what would make it better, but it certainly could be done.
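
As a rough sketch of what such a logistic regression could look like with a single yes/no predictor ("previously infected"), the model logit(p) = b0 + b1*x can be fitted by Newton-Raphson on grouped counts using only the standard library. The counts below are hypothetical, and `fit_logistic_binary` is a made-up helper name, not a library function. For one binary predictor the fitted slope b1 equals the log odds ratio of the 2x2 table, which gives a simple check on the result.

```python
import math


def fit_logistic_binary(events0, n0, events1, n1, iterations=25):
    """Fit logit(p) = b0 + b1*x for one binary predictor x on grouped data.

    Group x=0 has events0 infections among n0 people, group x=1 has
    events1 among n1. Returns (b0, b1, standard error of b1).
    """
    groups = [(0.0, events0, n0), (1.0, events1, n1)]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += information^-1 * gradient (2x2 inverse by hand)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the information matrix of the last iteration
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical counts: 200 infections among 10,000 never-infected people,
# 8 infections among 1,000 previously infected people.
b0, b1, se_b1 = fit_logistic_binary(200, 10_000, 8, 1_000)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald test of b1 = 0
print(f"odds ratio = {math.exp(b1):.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

With more predictors (month, age group, and so on) the closed-form shortcut no longer applies and a statistics package would be the practical choice.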

Vrbic · Mar 23, 2021

Dale said:

Not really. I am aware of Markov processes, but I have never actually done it myself so I have no practical experience or concrete suggestions to offer.

Maybe one of my colleagues here will have more insight.

Yes, but from what you described you probably do not have enough data to assess that.

You could do a logistic regression. I don’t know what would make it better, but it certainly could be done.

Thank you again, I will look at it.

BWV · Mar 23, 2021

Why waste time doing this yourself when you are not qualified? Do you want your doctor trying to do physics? Just search some real medical studies, like this:

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00575-4/fulltext

During the first surge (ie, before June, 2020), 533 381 people were tested, of whom 11 727 (2·20%) were PCR positive, and 525 339 were eligible for follow-up in the second surge, of whom 11 068 (2·11%) had tested positive during the first surge. Among eligible PCR-positive individuals from the first surge of the epidemic, 72 (0·65% [95% CI 0·51–0·82]) tested positive again during the second surge compared with 16 819 (3·27% [3·22–3·32]) of 514 271 who tested negative during the first surge (adjusted RR 0·195 [95% CI 0·155–0·246]). Protection against repeat infection was 80·5% (95% CI 75·4–84·5). The alternative cohort analysis gave similar estimates (adjusted RR 0·212 [0·179–0·251], estimated protection 78·8% [74·9–82·1]). In the alternative cohort analysis, among those aged 65 years and older, observed protection against repeat infection was 47·1% (95% CI 24·7–62·8). We found no difference in estimated protection against repeat infection by sex (male 78·4% [72·1–83·2] vs female 79·1% [73·9–83·3]) or evidence of waning protection over time (3–6 months of follow-up 79·3% [74·4–83·3] vs ≥7 months of follow-up 77·7% [70·9–82·9]).
Interpretation
Our findings could inform decisions on which groups should be vaccinated and advocate for vaccination of previously infected individuals because natural protection, especially among older people, cannot be relied on.
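
As a quick check on the numbers quoted above, the crude (unadjusted) risk ratio can be recomputed from the four counts in the abstract. This is only a sketch: the confidence interval uses the usual log-scale normal approximation, and `risk_ratio` is a made-up helper name. The study itself reports an adjusted ratio, so the crude value is not expected to match it exactly.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Crude risk ratio with an approximate 95% interval on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts quoted in the abstract: 72 of 11,068 previously positive people
# tested positive again, versus 16,819 of 514,271 previously negative people.
rr, lower, upper = risk_ratio(72, 11_068, 16_819, 514_271)
print(f"crude RR = {rr:.3f} (approx. 95% CI {lower:.3f} to {upper:.3f})")
print(f"crude protection = {100 * (1 - rr):.1f}%")
```

The crude ratio should come out near 0.2, close to the adjusted RR of 0.195 reported in the abstract.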

FactChecker · Mar 23, 2021

It may help if you describe what you want to statistically test.
Do you want to show, with some certainty, that the rate of reinfection is reduced by the vaccine? If so, consider using a Chi-square test of homogeneity. See https://stats.stackexchange.com/questions/226789/how-to-compare-ratios-in-r.
Do you want to show, with some certainty, that the rate of reinfection is 0 after the vaccine? If so, consider using a Chi-square goodness of fit test.
There are several options. See this for some.
Keep in mind that many statistical tests are typically used so that the null hypothesis is retained unless there is very strong statistical evidence otherwise. I am not sure that there is enough data to reach those levels of significance.
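
A minimal sketch of the chi-square test of homogeneity for two groups is given below, using only the standard library. The counts are hypothetical and `chi_square_2x2` is a made-up helper name. For a 2x2 table the statistic has one degree of freedom, so the p-value can be obtained from the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    No continuity correction is applied. Returns (statistic, p_value)
    for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (infected, not infected) in each group:
previously_infected = (8, 992)      # 8 reinfections among 1,000 recovered
never_infected = (200, 9_800)       # 200 first infections among 10,000

chi2, p_value = chi_square_2x2(*previously_infected, *never_infected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

With these made-up counts the statistic is about 7.06, which corresponds to a p-value below 0.01. The approximation is usually considered reasonable only when all expected cell counts are at least about 5.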

Vrbic · Mar 23, 2021

BWV said:

Why waste time doing this yourself when you are not qualified? Do you want your doctor trying to do physics? Just search some real medical studies, like this:

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00575-4/fulltext

Thank you very much, I will read it.

I understand your point and agree with it. I wouldn't want my doctor to publish physics results either, but if physics is his hobby, why not let him have fun at a basic level? That was the point of my question: an easy way to do it (not precise, perhaps following a solved problem from a book) and to have fun with it.

Vrbic · Mar 23, 2021

FactChecker said:

It may help if you describe what you want to statistically test.
Do you want to show, with some certainty, that the rate of reinfection is reduced by the vaccine? If so, consider using a Chi-square test of homogeneity. See https://stats.stackexchange.com/questions/226789/how-to-compare-ratios-in-r.
Do you want to show, with some certainty, that the rate of reinfection is 0 after the vaccine? If so, consider using a Chi-square goodness of fit test.
There are several options. See this for some.
Keep in mind that many statistical tests are typically used so that the null hypothesis is retained unless there is very strong statistical evidence otherwise. I am not sure that there is enough data to reach those levels of significance.

Thank you! It seems very helpful.

I would like to test the hypothesis: the first infection and reinfection have the same probability.
I think this should be the first step. Then, if I reject this hypothesis and there is enough data, I would like to try to estimate the difference in probability between first infection and reinfection, but for now just testing this hypothesis is enough.
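
One simple way to test this hypothesis is an exact binomial test, sketched below. All counts are hypothetical, and the helper names are made up for this example. The sketch assumes that the people who recovered at least 60 days earlier form the group at risk of reinfection, that the first-infection rate `p0` over the same period is known exactly, and that infections are independent; these are strong simplifications.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of the binomial probability P(X = k) for X ~ Binomial(n, p), 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return min(1.0, sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k + 1)))


# Hypothetical counts for one observation period:
never_infected = 100_000     # people with no earlier infection
first_infections = 2_000     # first infections among them in the period
recovered_at_risk = 1_000    # people who recovered at least 60 days earlier
reinfections = 8             # reinfections among them in the same period

# Under H0 a recovered person is infected with the same probability p0.
p0 = first_infections / never_infected
expected = recovered_at_risk * p0

p_lower = binom_cdf(reinfections, recovered_at_risk, p0)            # P(X <= observed)
p_upper = 1.0 - binom_cdf(reinfections - 1, recovered_at_risk, p0)  # P(X >= observed)
p_two_sided = min(1.0, 2 * min(p_lower, p_upper))

print(f"expected under H0: {expected:.1f}, observed: {reinfections}")
print(f"two-sided p-value: {p_two_sided:.4f}")
```

A small p-value would indicate that the observed number of reinfections is unlikely if recovered people were infected at the same rate as never-infected people.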

FactChecker · Mar 23, 2021

Vrbic said:

Thank you! It seems very helpful.

I would like to test the hypothesis: the first infection and reinfection have the same probability.
I think this should be the first step. Then, if I reject this hypothesis and there is enough data, I would like to try to estimate the difference in probability between first infection and reinfection, but for now just testing this hypothesis is enough.

One of the problems (there are many) with statistically analyzing this is that reinfection may be due to new variants that may not be in your past data at all. I'm afraid that there is no way to address that issue except by knowing the exact nature of the variation, how quickly it is spreading, where it is in the world, and what the consequences of the change are. There are many variants now and many more on the way.

Vrbic · Mar 23, 2021

FactChecker said:

One of the problems (there are many) with statistically analyzing this is that reinfection may be due to new variants that may not be in your past data at all.

I definitely agree. I could make quite a good guess by using data from an earlier period when only one variant was circulating. And I understand that this means my result is lost in history, because it will no longer be the case.

I'm afraid that there is no way to address that issue except by knowing the exact nature of the variation, how quickly it is spreading, where it is in the world, and what the consequences of the change are. There are many variants now and many more on the way.

I understand, but as I mentioned earlier, I want to get familiar with some new statistical method, not to find the correct result, so for me the journey is the destination.

FactChecker · Mar 23, 2021

Vrbic said:

I definitely agree. I could make quite a good guess by using data from an earlier period when only one variant was circulating. And I understand that this means my result is lost in history, because it will no longer be the case.

Not only "lost in history", but only analyzing history rather than predicting anything useful.

I understand, but as I mentioned earlier, I want to get familiar with some new statistical method, not to find the correct result, so for me the journey is the destination.

Fair enough. One thing that you should know about statistics is that it is very treacherous ground. There are many possible ways that results can be biased and conclusions can be wrong. There are problems with correlated variables, self-selecting samples, time-varying distributions, reversing cause-and-effect, etc., etc., etc. When you get experience in spotting the problems, you will see them every day in studies reported in the news.
