Using an odds ratio when data is sparse

  • Thread starter snowfox2004
In summary, the conversation discusses the use of logistic regression and odds ratios to determine the impact of different exposures on an outcome. The issue of sparsity in the data set is raised and whether this will affect the validity of the results. ANOVA and experimental design are suggested as potential solutions for analyzing the data and accounting for possible confounding variables and interactions between exposures. However, the applicability and effectiveness of these methods may depend on the assumptions made about the data and the potential for interactions between exposures. Additional research and analysis may be necessary to accurately determine the impact of individual exposures on the outcome.
  • #1
snowfox2004
Suppose I have around 20 exposures that potentially affect an outcome, and I want to see which exposures have the biggest impact on the outcome. I plan to calculate each exposure's odds ratio by exponentiating the coefficients obtained from logistic regression. I have the following input set and output set, where 1 means the exposure or outcome is present and 0 means it is not present:

[Image: input set (exposures 1 to 20) and output set (outcome), coded 1/0; image not available]


For example, the first row represents a sample where exposure 1 was absent, exposure 2 was present, ..., exposure 20 was present, and the outcome was present. I fit a logistic regression model to these data and exponentiate the coefficients to get odds ratios. The potential problem is that I am going to be working with a VERY sparse data set with many samples: in many samples, all but one or maybe two of the exposures will be present. My question is whether this sparsity is something to be concerned about, and whether it makes my method of comparing exposures using odds ratios a bad idea.
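To make the method described above concrete, here is a minimal pure-Python sketch that fits a logistic regression to rows of 0/1 exposures and exponentiates the exposure coefficients to get odds ratios. It assumes no statistics library is available, so the Newton-Raphson fit and the names `fit_logistic`, `solve` and `odds_ratios` are hypothetical; in practice you would use a statistics package, and this only shows what the exponentiated coefficients are.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("information matrix is (near) singular; check for sparse or collinear exposures")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logistic regression (hypothetical helper).
    X: rows of 0/1 exposures, y: 0/1 outcomes.
    Returns [intercept, b_1, ..., b_k] on the log-odds scale."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X^T W X
        for xi, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def odds_ratios(X, y):
    # exponentiate the exposure coefficients (skip the intercept)
    return [math.exp(b) for b in fit_logistic(X, y)[1:]]

# One exposure: 10 of 30 exposed and 5 of 45 unexposed samples have the outcome
X = [[1]] * 30 + [[0]] * 45
y = [1] * 10 + [0] * 20 + [1] * 5 + [0] * 40
print(odds_ratios(X, y))  # about [4.0], the cross-product ratio (10*40)/(20*5)
```

With a single exposure the fitted odds ratio reproduces the 2x2 cross-product ratio, which is a handy sanity check. With sparse exposures the information matrix can become nearly singular, or the coefficients can keep growing from one iteration to the next, which is one concrete way the sparsity concern shows up.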

Page 6 of this paper http://www.epidemiology.ch/history/PDF%20bg/Greenland%20S%201987%20interpretation%20and%20choice%20of%20effect%20measures.pdf seems to imply that sparsity won't matter too much, but I want to see what the statisticians here say. Any links to papers would be appreciated.
 
  • #2
Your question is the basic question addressed by the statistical subjects of Analysis of Variance (ANOVA) and design of experiments. ANOVA tries to tell which factors are the main drivers of the outcome. Design of experiments tries to tell you what combinations of factors need to be included in a set of experiments to obtain valid statistical results. There are statistical software packages that can help you do the analysis.

One problem that your post does not mention is that the effects of exposures might depend on how they are combined with other exposures. If you are really sure that the effects are independent, the problem is much simpler.
 
  • #3
I thought that fitting the ENTIRE input set, with all the exposures, in a single logistic regression model would automatically adjust the odds ratios to account for possible confounding variables, based on page 319 of this paper: http://www.iarc.fr/en/publications/pdfs-online/epi/cancerepi/CancerEpi-14.pdf

Thanks for the input on ANOVA. Does ANOVA work well even with sparse data?
 
  • #4
I have to admit that I can't make it through the unfamiliar (to me) terminology of your first reference and I couldn't open the link of your second reference. So I am not sure how they address the issue of interacting factors.

If you think about how many possible combinations there are of 20 binary exposures (2^20 = 1,048,576), I'm sure you will agree that there are far too many to estimate each one separately. Nothing can solve the problem unless the number of interacting effects is assumed to be very limited. A regression model with only main effects assumes that each exposure adds a fixed amount (for logistic regression, a fixed amount to the log-odds), regardless of which other exposures are present. If you accept that, I think your method should work. You can also judiciously introduce additional variables for the combinations of exposures that you suspect might affect each other.

ANOVA is a general study of the effects of multiple factors, including low-order interactions. Experimental design helps you design experiments that are efficient; its main concern is designing experiments from which you can draw valid conclusions with sparse data. If you already have your data, it may be too late for that.
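The counting argument and the suggestion to add variables for suspected combinations can be sketched in a few lines of Python. Coding an interaction between two binary exposures as their product (1 only when both are present) is the standard approach; the helper name `add_interactions` and the pairs chosen in the example are hypothetical.

```python
from itertools import combinations

N_EXPOSURES = 20
n_patterns = 2 ** N_EXPOSURES                          # distinct present/absent patterns
all_pairs = list(combinations(range(N_EXPOSURES), 2))  # every two-way combination
n_pairwise = len(all_pairs)                            # candidate interaction terms

def add_interactions(row, pairs):
    """Append a product column x_i * x_j for each (i, j) in pairs (0-based indices).
    The product is 1 only when both exposures are present."""
    return list(row) + [row[i] * row[j] for i, j in pairs]

print(n_patterns, n_pairwise)                         # 1048576 190
print(add_interactions([1, 0, 1], [(0, 2), (0, 1)]))  # [1, 0, 1, 1, 0]
```

Adding all 190 pairwise terms would multiply the number of parameters roughly tenfold, which is why only interactions you have a specific reason to suspect should be added.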
 
  • #5


I would suggest considering an alternative approach to analyzing your data. While odds ratios may seem like a straightforward way to compare exposures, the sparsity of your data may lead to unreliable results. Maximum-likelihood odds ratios estimated from sparse data tend to be exaggerated (inflated above 1 or deflated below 1, that is, biased away from 1), and when some exposure-outcome combination has no observations at all, the estimate for that exposure can be infinite or zero.
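Before trusting any odds ratio from sparse data, it helps to tabulate each exposure against the outcome. In a logistic model with an intercept, a zero in any of the four cells for a binary exposure means the maximum-likelihood coefficient for that exposure does not exist as a finite number (the fitted odds ratio runs off to 0 or infinity). The helpers below are hypothetical and assume the same 0/1 row layout as the original post, with exposures indexed from 0; the `min_count` threshold is an arbitrary choice, and raising it also flags cells that are merely small.

```python
def two_by_two(X, y, j):
    """Cell counts for exposure j (0-based column index):
    a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without."""
    a = b = c = d = 0
    for row, outcome in zip(X, y):
        if row[j] == 1:
            if outcome == 1:
                a += 1
            else:
                b += 1
        elif outcome == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d

def flag_sparse_exposures(X, y, min_count=1):
    """Return {j: (a, b, c, d)} for every exposure with a cell count below min_count."""
    flagged = {}
    for j in range(len(X[0])):
        cells = two_by_two(X, y, j)
        if min(cells) < min_count:
            flagged[j] = cells
    return flagged

X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
print(flag_sparse_exposures(X, y))  # {0: (2, 0, 0, 2)}
```

An empty result does not prove the data are adequate, since combinations of several exposures can still separate the outcome, but any flagged exposure deserves attention before its odds ratio is reported.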

Instead, I would consider using a different measure such as risk ratios or risk differences, which many find easier to interpret, although they are also estimated imprecisely from sparse data. Additionally, you may want to consider a Bayesian analysis, in which prior distributions on the coefficients shrink extreme estimates toward plausible values and so handle sparse data more effectively.
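One simple form of the Bayesian idea is a maximum a posteriori (MAP) estimate with a Normal(0, prior_sd^2) prior on an exposure's log odds ratio, which is equivalent to ridge-penalized logistic regression. The sketch below is a hypothetical helper for a single exposure with a flat prior on the intercept; it is not a full Bayesian analysis (it gives no posterior interval), and `prior_sd=1.0` is an arbitrary illustrative choice. It shows the key property: when a cell is zero and the ordinary maximum-likelihood estimate is infinite, the prior still yields a finite, shrunken estimate.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def map_log_odds_ratio(a, b, c, d, prior_sd=1.0, max_iter=100, tol=1e-12):
    """MAP log odds ratio for one binary exposure (hypothetical helper).
    a, b: exposed with/without outcome; c, d: unexposed with/without outcome.
    Flat prior on the intercept, Normal(0, prior_sd**2) prior on the log odds ratio."""
    if c == 0 or d == 0 or a + b == 0:
        raise ValueError("this sketch needs both unexposed cells and at least one exposed sample")
    prec = 1.0 / prior_sd ** 2
    n1, n0 = a + b, c + d
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p1 = sigmoid(b0 + b1)  # outcome probability among exposed
        p0 = sigmoid(b0)       # outcome probability among unexposed
        # gradient of the log-posterior
        g0 = (a - n1 * p1) + (c - n0 * p0)
        g1 = (a - n1 * p1) - prec * b1
        # negative Hessian of the log-posterior (2x2, symmetric)
        w1, w0 = n1 * p1 * (1 - p1), n0 * p0 * (1 - p0)
        h00, h01, h11 = w1 + w0, w1, w1 + prec
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det  # Newton step = H^-1 g
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b1

# All exposed samples have the outcome (b = 0), so the ML log odds ratio is +infinity
print(map_log_odds_ratio(5, 0, 10, 40))            # finite, shrunk toward 0
print(math.exp(map_log_odds_ratio(5, 0, 10, 40)))  # corresponding odds ratio
```

Tightening the prior (smaller `prior_sd`) pulls the estimate further toward an odds ratio of 1; with a very wide prior and non-sparse data, the result approaches the ordinary log odds ratio log(ad/bc).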

Furthermore, I would carefully examine whether your sample size, and in particular the number of samples with the outcome, is sufficient for the number of exposures you are trying to analyze. If your sample size is too small, it may be difficult to draw reliable conclusions about the impact of each exposure on the outcome.
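A quick way to check whether the sample is large enough is to count outcome events per candidate predictor. A commonly cited rule of thumb for logistic regression asks for roughly 10 events of the rarer outcome class per predictor; it is a rough guideline, not a guarantee, and the helper names and counts below are hypothetical.

```python
def events_per_variable(y, n_predictors):
    """Outcome events per candidate predictor. The rarer outcome class is what
    limits the information available to a logistic model."""
    n_events = min(sum(y), len(y) - sum(y))
    return n_events / n_predictors

def min_events_needed(n_predictors, target_epv=10):
    # 10 events per variable is a commonly cited rule of thumb, not a guarantee
    return n_predictors * target_epv

y = [1] * 30 + [0] * 970            # hypothetical: 30 outcomes among 1,000 samples
print(events_per_variable(y, 20))   # 1.5
print(min_events_needed(20))        # 200
print(min_events_needed(20 + 190))  # 2100 if all pairwise interactions are added
```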

In conclusion, while sparsity may not completely invalidate your method of comparing exposures using odds ratios, it is something to be cautious of. I would recommend exploring alternative statistical methods and carefully considering the sample size before drawing any conclusions from your data. Lastly, I would suggest consulting with a statistician for further guidance and to ensure that your analysis is appropriate for your specific research question and data.
 

1. What is an odds ratio?

An odds ratio is a statistical measure used to compare the odds of an event occurring in one group to the odds of it occurring in another group. It is commonly used in medical and social science research to measure the association between an exposure and a binary outcome.

2. How is an odds ratio calculated?

The odds ratio is calculated by dividing the odds of an event occurring in one group by the odds of it occurring in another group. It can be calculated using a 2x2 contingency table or by using statistical software.
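As a worked example of the 2x2 calculation, with cells labeled a, b (exposed with and without the outcome) and c, d (unexposed with and without the outcome), the odds ratio is (a/b)/(c/d) = ad/(bc). The function name and the counts are illustrative only.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
                   outcome   no outcome
      exposed         a          b
      unexposed       c          d
    OR = (a / b) / (c / d) = (a * d) / (b * c)."""
    if b * c == 0:
        raise ZeroDivisionError("b or c is zero: the odds ratio is infinite or undefined")
    return (a * d) / (b * c)

print(odds_ratio(10, 20, 5, 40))  # 4.0
```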

3. When is an odds ratio used?

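An odds ratio is used when the outcome is binary (present or absent). It is the standard effect measure in case-control studies: because subjects are sampled by outcome, risks cannot be estimated directly, but the odds ratio can, since the odds ratio for exposure (cases versus controls) equals the odds ratio for the outcome (exposed versus unexposed). It is also what exponentiated logistic regression coefficients estimate.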
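Why the odds ratio suits case-control studies can be checked numerically: sampling only a fraction of the non-cases leaves the odds ratio unchanged, while the risk ratio computed from the same sample is distorted. The cohort counts and the 5% sampling fraction below are hypothetical, and exact fractions are used so the equality is not blurred by rounding.

```python
from fractions import Fraction

def outcome_odds_ratio(a, b, c, d):
    # odds of the outcome among exposed / odds of the outcome among unexposed
    return Fraction(a, b) / Fraction(c, d)

def exposure_odds_ratio(a, b, c, d):
    # odds of exposure among cases / odds of exposure among non-cases
    return Fraction(a, c) / Fraction(b, d)

def risk_ratio(a, b, c, d):
    # risk of the outcome among exposed / risk among unexposed
    return Fraction(a, a + b) / Fraction(c, c + d)

# Hypothetical full cohort: a, b = exposed with/without outcome; c, d = unexposed
cohort = (30, 1000, 10, 2000)
# Case-control sample: keep every case, but only 5% of the non-cases (b and d)
sample = (30, 50, 10, 100)

print(outcome_odds_ratio(*cohort), exposure_odds_ratio(*sample))  # 6 6
print(risk_ratio(*cohort), risk_ratio(*sample))                    # 603/103 33/8
```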

4. What is considered a "sparse" data set for using an odds ratio?

A data set is considered sparse when some combinations of exposure and outcome (cells of the table) contain few or no observations. This can lead to unstable or unreliable odds ratios, because there may not be enough data to accurately estimate the odds of the event occurring in each group.

5. Are there any limitations to using an odds ratio with sparse data?

Yes, there are limitations to using an odds ratio with sparse data. As mentioned before, the results may be unstable or unreliable due to the small number of observations. Additionally, the confidence intervals for the odds ratio may be wide, making it difficult to determine the true effect size. It is important to interpret the results with caution and consider using other statistical measures if possible.
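To see how sparse cells widen the confidence interval, the sketch below computes a Wald interval on the log-odds-ratio scale, with standard error sqrt(1/a + 1/b + 1/c + 1/d), and adds 0.5 to every cell when one is zero (the Haldane-Anscombe correction). Both example tables have an odds ratio of 4; the counts and the function name are hypothetical, and Wald intervals themselves behave poorly with very small counts, so treat this as an illustration rather than a recommended method.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96, correction=0.5):
    """Odds ratio with a Wald confidence interval on the log scale (z=1.96 for ~95%).
    If any cell is zero, `correction` is added to every cell (Haldane-Anscombe)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or - z * se), math.exp(log_or), math.exp(log_or + z * se)

# Same odds ratio (4), very different amounts of data
print(odds_ratio_ci(100, 200, 50, 400))  # narrow interval around 4
print(odds_ratio_ci(4, 8, 2, 16))        # much wider interval, wide enough to include 1
print(odds_ratio_ci(5, 0, 10, 40))       # zero cell: corrected, finite but very wide
```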
