Difference between relative risk and odds ratio

jaumzaum · Sep 1, 2022

Hello!

I was studying the odds ratio and its relation to relative risk. From what I understood, the statistic that really matters to us, and that has a nice interpretation in context, is the relative risk (I was also wondering whether the odds ratio has any interpretation). But relative risk is sometimes difficult or expensive to calculate, because we need the prevalence of the disease, and for rare diseases this means a big sample.
So what many studies do is to calculate the odds ratio and interpret it as being nearly the relative risk. And that's true for rare diseases.
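
A short sketch of why the rare-disease approximation works. The symbols ##p_1## and ##p_0## (risk of disease in the exposed and unexposed groups) are introduced here for illustration and are not from the original post:

$$\mathrm{RR} = \frac{p_1}{p_0}, \qquad \mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \mathrm{RR}\cdot\frac{1-p_0}{1-p_1}$$

When both risks are small, the factor ##(1-p_0)/(1-p_1)## is close to 1, so the OR is close to the RR. As the risks grow, the factor moves away from 1 and the two measures separate.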

I have two things that I didn't understand, considering this example from Wikipedia:
This is a village of 1000 people, and we want to study whether a radiation leak increased the incidence of a disease.

RR = (20/400)/(6/600)=5
OR = (20/380)/(6/594)=5.2
I understand how these values were calculated.
Then they say that, for many studies, we rarely have information about the whole population. They take a sample of 50 cases, 25 that were exposed to the radiation leak and 25 that weren't:

And they calculate the OR = (20/10)/(6/16)=5.3
They say the OR of the sample is a good estimate of the OR of the population, and then, using the rare disease assumption, that it is also a good estimation of the relative risk of the population.
But when we try to calculate the relative risk for the sample, we have some problems:
RR = (20/30)/(6/22)=2.4
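
The four calculations above can be reproduced with a minimal Python sketch. The function names and the tuple layout (exposed diseased, exposed healthy, unexposed diseased, unexposed healthy) are illustrative choices, not something from the Wikipedia article:

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed diseased/healthy; c, d = unexposed diseased/healthy."""
    return (a / b) / (c / d)


def relative_risk(a, b, c, d):
    """RR for the same table: risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))


# Counts read off the formulas in the post.
population = (20, 380, 6, 594)  # whole village
sample = (20, 10, 6, 16)        # all 26 cases plus 26 sampled healthy people

for name, table in (("population", population), ("sample", sample)):
    print(name, "OR =", round(odds_ratio(*table), 2), "RR =", round(relative_risk(*table), 2))
```

The sample keeps every diseased person but only a small fraction of the healthy ones, which is why its OR stays near the population value while its RR does not.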

My question is, why is the OR of the sample a good estimate of the OR of the population, but not the RR?
And also, if there is any interpretation for the OR?

Thanks !

gleem · Sep 1, 2022

This article may be helpful. https://www.ncbi.nlm.nih.gov/pmc/ar...ive risk (also known,event in the other group.

jaumzaum · Sep 1, 2022

Thanks @gleem, I've read the article, and it provides very good examples, but it still doesn't answer my primary question: why is the OR of a sample a good estimate of the OR of the population? And also, does the OR have any real meaning/interpretation, or is it only a mathematical tool to estimate relative risk?

FactChecker · Sep 1, 2022

Now I find out that I have been using the term "odds" wrong all this time!

gleem · Sep 1, 2022

jaumzaum said:

why is the OR of a sample a good estimate of the OR of the population?

It would seem to me that it becomes a better predictor of the population as the sample size increases like any other statistic. How well it predicts the actual value depends on its variance.

jaumzaum said:

Then they say that, for many studies, we rarely have information about the whole population. They take a sample of 50 cases, 25 that were exposed to the radiation leak and 25 that weren't:

Your data and table are not consistent: Healthy + Disease should = 25 for both exposed and unexposed.

I redid your table

	Disease	Healthy
exposed	15	10
unexposed	6	19

OR =(15/10) / (6/19) = 4.75
RR = (15/25) / (6/25) = 2.5

jaumzaum said:

And also, if OR has any real meaning/interpretation, or if it is only a mathematical tool to estimate Relative Risk?

If, in my table above, I normalize the healthy unexposed persons to the healthy exposed, i.e. 10, then for every 10 healthy unexposed persons we have 3.158 sick unexposed persons, which gives 4.75 exposed sick persons for every unexposed sick person.
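
The normalization described above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are made up for illustration:

```python
# Counts from the redone table.
exposed_sick, exposed_healthy = 15, 10
unexposed_sick, unexposed_healthy = 6, 19

# Rescale the unexposed row so that it has as many healthy people as the exposed row.
scale = exposed_healthy / unexposed_healthy
scaled_unexposed_sick = unexposed_sick * scale  # sick unexposed per 10 healthy unexposed

# With the healthy counts equalized, the ratio of sick counts is the odds ratio.
ratio = exposed_sick / scaled_unexposed_sick

print(round(scaled_unexposed_sick, 3), round(ratio, 2))
```

So one way to read the OR is as the ratio of sick people in the two groups after the groups have been rescaled to contain the same number of healthy people.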

PeroK · Sep 1, 2022

The two measures are very different. The maximum value for the RR numerator is ##1##, whereas there is no maximum for the OR numerator. Meanwhile, assuming a low natural disease rate, the denominator may be similar in both cases.

The two, therefore, have different ranges.
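
A small Python sketch of the difference in ranges, assuming only the standard definition odds = p/(1 - p); the helper name `odds` is illustrative:

```python
def odds(p):
    """Convert a risk (probability) p, with 0 <= p < 1, to odds p / (1 - p)."""
    return p / (1 - p)


# Risk never exceeds 1, while odds grow without bound as the risk approaches 1.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print("risk =", p, "odds =", round(odds(p), 3))
```

For small risks the two columns are nearly equal, which is the regime where OR and RR agree.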

You have to understand how to interpret the two numbers. The actual risk you derive should be similar in both cases, but getting it from each number requires a different calculation.

jaumzaum · Sep 1, 2022

Thanks @gleem and @PeroK

gleem said:

It would seem to me that it becomes a better predictor of the population as the sample size increases like any other statistic. How well it predicts the actual value depends on its variance.
Your data and table are not consistent: Healthy + Disease should = 25 for both exposed and unexposed.

I redid your table

Disease Healthy
exposed 15 10
unexposed 6 19

OR =(15/10) / (6/19) = 4.75
RR = (15/25) / (6/25) = 2.5
If, in my table above, I normalize the healthy unexposed persons to the healthy exposed, i.e. 10, then for every 10 healthy unexposed persons we have 3.158 sick unexposed persons, which gives 4.75 exposed sick persons for every unexposed sick person.

Sorry for the mistake. When I said "They take a sample of 50 cases, 25 that were exposed to the radiation leak and 25 that weren't", the correct statement was "They take a sample of 52 people, 26 who have the disease and 26 who don't".

You said

gleem said:

How well it predicts the actual value depends on its variance.

I understand that, but it seems that the odds ratio of a sample always has a smaller variance than the relative risk for any given sample size, and thus provides a better estimate. I wish to understand why this is true.

Wikipedia also says that the OR is a much easier and less expensive way to estimate the RR, because we can extrapolate the OR of the sample, but we cannot extrapolate the RR of the sample if it is not big enough. Indeed, in the example given, the sample OR was 5.3 and the sample RR was 2.4, while the OR and RR of the population were 5.2 and 5 respectively.

Office_Shredder · Sep 1, 2022

As a mathematician this confused me for a bit, but then I realized this is amazing.

I'm going to pick some new numbers. Suppose a town of about twenty thousand people has 130 cases of testicular cancer. You think that's a bit high, so you go to investigate.

It turns out 50 of them refused to forward a chain mail that threatened to turn big men into little girls if it wasn't forwarded.

There's no way this is a real effect, is it? But think of the publicity if you can manage to squeak out a .05 p value. So you decide to dig further.

The first thing you need to know is what fraction of the men in this town refused to forward the email. Let's say the real numbers are: there are 8500 men. 500 of them refused to forward the chain mail, and 10% of them got cancer. 8000 of them either didn't get the chain mail, or wisely forwarded it to ten recipients. 1% of them got cancer.

The relative risk is therefore 10.
The odds ratio is (50/450)/(80/7920)=11
But how can we get these numbers? What if we don't even know how many men are in the town? You could go house to house and ask everyone in town, but that's a lot of work, and you're only doing it if you can squeeze some grant money out of this sucker. So you decide to post up at the local McDonald's and just ask every man that comes through about it. Hopefully that's a representative sample to submit to the NIH.

At the end of the day, 18 men are willing to respond to your survey. 17 say they did not get the email, or they forwarded it. 1 says they refused.

We call this the healthy sample.

Note the true ratio is 450/7920 people, or 1 out of 17.6, so we got an almost perfect sample. About one out of every 450 healthy men in the town came through McDonald's and answered our survey. Of course, we don't actually know this.

Let's compute some odds. 50 people got cancer per 1 respondent: 50/1

80 people got cancer out of 17 respondents: 80/17

(50/1)/(80/17) = (50/450)/(80/7650)=10.6

I just multiplied the denominators by 450 to prove the point. This is almost exactly the same odds ratio as before. In fact, if we had a perfect sample of people who did not have cancer, we would get exactly the same odds ratio as before. If you multiply both denominators by .01, for example, the number is unchanged. In some sense both fractions have the same units, something like people who got cancer per 1/450th of a person who didn't get cancer, so dividing one by the other is a reasonable ratio.

Now let's compute relative risk. Well, (50/(50+1))/(80/(80+17)) = 1.19

Fundamentally, what went wrong is we're adding two incompatible numbers. 50 is all the people who refused to send the mail that got cancer. 1 is a very small fraction of the people who didn't get cancer. So 50+1 is not a meaningful number to divide 50 by. It has no meaningful units. If you wanted to get the right answer, you could do something like know your sample is 1/450th of the healthy population, and try to compute the true risks. This requires you to know that 450 number though, which often you don't. All we know is in our sample of all the sick people and a couple of healthy people, almost everyone has cancer, which doesn't help.
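
The argument above can be sketched numerically in Python. The sketch assumes we keep all 130 cancer cases and a perfectly proportional fraction `f` of the healthy men; `f`, the function names and the variable names are illustrative and not from the post:

```python
def odds_ratio(a, b, c, d):
    """a, b = refusers sick/healthy; c, d = everyone else sick/healthy."""
    return (a / b) / (c / d)


def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))


# Town-wide counts from the post.
cases_refused, cases_other = 50, 80
healthy_refused, healthy_other = 450, 7920

# Keep every case, but only a fraction f of the healthy men (assumed perfectly proportional).
results = []
for f in (1.0, 0.1, 1 / 450):
    b, d = healthy_refused * f, healthy_other * f
    row = (f, odds_ratio(cases_refused, b, cases_other, d),
           relative_risk(cases_refused, b, cases_other, d))
    results.append(row)
    print("f =", round(f, 4), "OR =", round(row[1], 2), "RR =", round(row[2], 2))

# The actual McDonald's survey: 1 healthy refuser, 17 healthy others.
survey_or = odds_ratio(50, 1, 80, 17)
survey_rr = relative_risk(50, 1, 80, 17)

# If the sampling fraction were known, scaling the healthy counts back up
# would give a sensible risk estimate again.
f_known = 1 / 450
rr_recovered = relative_risk(50, 1 / f_known, 80, 17 / f_known)
print(round(survey_or, 2), round(survey_rr, 2), round(rr_recovered, 2))
```

The factor `f` cancels in the odds ratio, so the OR is the same in every row, while the naive RR collapses toward 1 as `f` shrinks because the cases dominate both groups.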
