Using an odds ratio when data is sparse

  • Context: Graduate 
  • Thread starter: snowfox2004
  • Tags: Data, Ratio

Discussion Overview

The discussion revolves around the use of odds ratios derived from logistic regression in the context of sparse data when analyzing multiple exposures that may affect an outcome. Participants explore the implications of data sparsity on the validity of comparing exposures using odds ratios, as well as the relevance of statistical methods like ANOVA and experimental design.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant expresses concern about the impact of sparsity in the dataset on the reliability of odds ratios calculated from logistic regression.
  • Another participant suggests that the issues raised relate to ANOVA and experimental design, noting that these methods aim to identify main drivers of outcomes and the necessary combinations of factors for valid results.
  • A different participant questions whether fitting the entire input set in logistic regression would adequately adjust for confounding variables, referencing a specific paper for support.
  • One participant highlights the complexity of interactions among exposures, suggesting that if effects are assumed to be independent, the analysis may be simpler.
  • Concerns are raised about the very large number of possible exposure combinations and how this complicates the analysis, with a suggestion that a regression model without interaction terms assumes additive effects of exposures.
  • There is mention of the potential for introducing additional variables to account for suspected interactions among exposures.
  • ANOVA is discussed as a method that can handle multiple factors and low-order interactions, but its effectiveness with sparse data is questioned.

Areas of Agreement / Disagreement

Participants express differing views on the implications of sparse data for odds ratio calculations and the effectiveness of ANOVA in such contexts. There is no consensus on whether the proposed methods will yield valid results given the sparsity of the data.

Contextual Notes

Participants note limitations related to the assumptions about interactions among exposures and the potential for confounding variables, which remain unresolved in the discussion.

snowfox2004
Suppose I have around 20 exposures that potentially affect an outcome, and I want to see which exposures have the biggest impact on it. My plan is to calculate each exposure's odds ratio by exponentiating the coefficients obtained from logistic regression. I have the following input set and output set, where 1 means it (exposure or outcome) is present and 0 means it is not present:

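[Image 4GTNy.png: table of 0/1 values, one row per sample, with one column per exposure and one column for the outcome]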


So, for example, the first row represents a sample where exposure 1 wasn't present, exposure 2 was present, ..., exposure 20 was present, and the outcome was present. I fit a logistic regression model to this data and exponentiate the coefficients to get odds ratios. The potential problem is that I am going to be working with a VERY sparse data set with many samples. In many samples, only one or maybe two exposures are going to be present. My question is whether this sparsity is something to be concerned about, and whether it makes my method of comparing exposures using odds ratios a bad idea.
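The procedure described above (fit a logistic regression, then exponentiate the coefficients) can be sketched roughly as follows. This is only an illustration in plain Python, not the poster's actual code: the two-exposure data set, the helper names `solve` and `fit_logistic`, and the use of Newton-Raphson are all assumptions made for the example. The counts are hypothetical and were chosen so that the true odds ratios are 2 and 6.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood fit by Newton-Raphson.
    Each row of X must start with 1.0 for the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += xi[j] * (yi - mu)
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical counts: (exposure 1, exposure 2, samples, samples with outcome)
cells = [(0, 0, 10, 2), (1, 0, 9, 3), (0, 1, 10, 6), (1, 1, 8, 6)]
X, y = [], []
for e1, e2, n, k in cells:
    for i in range(n):
        X.append([1.0, float(e1), float(e2)])
        y.append(1 if i < k else 0)

beta = fit_logistic(X, y)
# Skip the intercept; exponentiate each exposure coefficient.
odds_ratios = [math.exp(b) for b in beta[1:]]
print([round(v, 3) for v in odds_ratios])
```

With real, sparse data one would normally use a statistics package rather than a hand-written fitter, and the Newton steps here can fail to converge if an exposure perfectly predicts the outcome.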

Page 6 of this paper http://www.epidemiology.ch/history/PDF%20bg/Greenland%20S%201987%20interpretation%20and%20choice%20of%20effect%20measures.pdf seems to imply that sparsity won't matter too much but I want to see what the statisticians here say. Any links to papers would be appreciated.
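One simple way to see whether sparsity is likely to bite is to tabulate each exposure against the outcome and look for empty cells. A 2x2 table with a zero cell gives an odds ratio of 0 or an undefined one, and in a logistic fit the corresponding coefficient has no finite maximum-likelihood estimate. The sketch below is an assumption-laden illustration: the data and the helper names `two_by_two` and `has_empty_cell` are hypothetical.

```python
def two_by_two(exposure, outcome):
    """Return (a, b, c, d): exposed with outcome, exposed without,
    unexposed with outcome, unexposed without."""
    a = b = c = d = 0
    for e, o in zip(exposure, outcome):
        if e and o:
            a += 1
        elif e:
            b += 1
        elif o:
            c += 1
        else:
            d += 1
    return a, b, c, d

def has_empty_cell(exposure, outcome):
    """True if any cell of the exposure-by-outcome table is zero."""
    return 0 in two_by_two(exposure, outcome)

# Hypothetical sparse data: (exposure indicators, outcome)
rows = [
    ([0, 1, 0], 1),
    ([0, 0, 0], 0),
    ([1, 0, 0], 1),
    ([0, 1, 0], 0),
    ([0, 0, 1], 1),
    ([1, 0, 0], 0),
    ([0, 0, 0], 1),
    ([0, 0, 0], 0),
]
outcomes = [o for _, o in rows]
columns = [[x[j] for x, _ in rows] for j in range(3)]

# Indices of exposures whose table has at least one empty cell.
flagged = [j for j, col in enumerate(columns) if has_empty_cell(col, outcomes)]
print(flagged)
```

Here the third exposure appears in only one sample, so its table has an empty cell and its odds ratio cannot be estimated reliably from this data.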
 
Your question is the basic question addressed by the statistical subjects of Analysis of Variance (ANOVA) and design of experiments. ANOVA tries to tell which factors are the main drivers of the outcome. Design of experiments tries to tell you what combinations of factors need to be included in a set of experiments to obtain valid statistical results. There are statistical software packages that can help you do the analysis.

One problem that your post does not mention is that the effects of exposures might depend on how they are combined with other exposures. If you are really sure that the effects are independent, the problem is much simpler.
 
I thought that fitting the ENTIRE input set, involving all the exposures, to a logistic regression model would automatically adjust the odds ratios to account for possible confounding variables, according to page 319 of this paper: http://www.iarc.fr/en/publications/pdfs-online/epi/cancerepi/CancerEpi-14.pdf

Thanks for the input on ANOVA. Does ANOVA work well even with sparse data?
 
I have to admit that I can't make it through the unfamiliar (to me) terminology of your first reference and I couldn't open the link of your second reference. So I am not sure how they address the issue of interacting factors.

If you think about how many possible combinations there are of 20 possible exposure types (2^20, which is over a million), I'm sure you will agree that the number is far too large to cover with data. Nothing can solve the problem unless the number of interacting effects is assumed to be very limited. A regression model with only main effects will assume that each exposure adds a certain amount (on the log-odds scale, in the case of logistic regression), regardless of what other exposures are present. If you accept that, I think your method should work. You can also judiciously introduce additional variables for the combinations which you suspect might affect each other. ANOVA is a general study of the effects of multiple factors, including low-order interactions. Experimental design helps you to design experiments that are efficient. Its main concern is to design experiments where you can draw valid conclusions from sparse data. If you already have your data, it may be too late for that.
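To put rough numbers on this, the short sketch below counts the possible exposure patterns and low-order interactions for 20 binary exposures, and shows one way to add a product column for a pair suspected of interacting. The helper `add_interaction` and the sample row are hypothetical names and values used only for illustration.

```python
import math

n_exposures = 20

# Every exposure is present or absent, so there are 2^20 patterns.
n_patterns = 2 ** n_exposures
# Number of possible two-way and three-way interaction terms.
n_pairs = math.comb(n_exposures, 2)
n_triples = math.comb(n_exposures, 3)
print(n_patterns, n_pairs, n_triples)

def add_interaction(row, i, j):
    """Return a copy of row with an extra column that is 1 only
    when exposures i and j are both present."""
    return row + [row[i] * row[j]]

sample = [0, 1, 1, 0]                     # hypothetical 4-exposure sample
extended = add_interaction(sample, 1, 2)  # exposures 1 and 2 both present
print(extended)
```

Even the pairwise terms alone add far more columns than a sparse data set can usually support, which is why only the interactions that are actually suspected would be added.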
 
