Suppose I have around 20 exposures that may affect an outcome, and I want to see which exposures have the biggest impact on it. My plan is to calculate each exposure's odds ratio by exponentiating the coefficients obtained from a logistic regression. I have the following input set and output set, where 1 means the exposure or outcome is present and 0 means it is not present:

So, for example, the first row represents a sample where exposure 1 wasn't present, exposure 2 was present, ..., exposure 20 was present, and the outcome was present. I fit a logistic regression model to these data and exponentiate the coefficients to get odds ratios. The potential problem is that I am going to be working with a VERY sparse data set with many samples: in many samples, almost all exposures except one or maybe two are going to be present. My question is whether this sparsity is something to be concerned about, and whether it makes my method of comparing exposures using odds ratios a bad idea.
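The procedure described above (fit a logistic regression on the 0/1 exposure matrix, then exponentiate the coefficients) can be sketched as follows. This is a minimal from-scratch version using only NumPy, fitted by Newton-Raphson (iteratively reweighted least squares) rather than by any particular statistics package. The function names, the simulated data, and its settings (20,000 samples, about 10% exposure prevalence, true log-odds ratios of 1 and -1 for the first two exposures) are all hypothetical. Each exp(β) is an odds ratio for that exposure adjusted for the other 19, and the Wald interval exp(β ± 1.96·SE) gives a rough sense of how precisely it is estimated.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).

    X: (n, p) array of 0/1 exposures, WITHOUT an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns (beta, se): [intercept, exposure 1..p] coefficients and Wald SEs.
    """
    X1 = np.column_stack([np.ones(X.shape[0]), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))      # fitted probabilities
        w = p * (1.0 - p)                           # IRLS weights
        score = X1.T @ (y - p)                      # gradient of the log-likelihood
        info = X1.T @ (X1 * w[:, None])             # Fisher information matrix
        step = np.linalg.solve(info, score)         # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix at the final estimate
    p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
    info = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def odds_ratios(beta, se, z=1.96):
    """Exponentiate the exposure coefficients (skipping the intercept) with Wald CIs."""
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)


# Hypothetical simulated data: 20 binary exposures, each present in ~10% of samples.
rng = np.random.default_rng(0)
n, k = 20000, 20
X = rng.binomial(1, 0.1, size=(n, k)).astype(float)
true_beta = np.zeros(k)
true_beta[0], true_beta[1] = 1.0, -1.0              # exposure 1 raises, exposure 2 lowers the odds
eta = -1.5 + X @ true_beta                          # linear predictor with intercept -1.5
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta_hat, se_hat = fit_logistic_irls(X, y)
or_hat, lo, hi = odds_ratios(beta_hat, se_hat)
for j in range(3):
    print(f"exposure {j + 1}: OR = {or_hat[j]:.2f} (95% CI {lo[j]:.2f} to {hi[j]:.2f})")
```

A package such as statsmodels (`Logit`) performs the same maximum-likelihood fit. When some cells are thin, the standard errors, and therefore the width of these intervals, are usually the first place the sparsity shows up.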

Page 6 of this paper, http://www.epidemiology.ch/history/PDF%20bg/Greenland%20S%201987%20interpretation%20and%20choice%20of%20effect%20measures.pdf, seems to imply that sparsity won't matter much, but I want to see what the statisticians here say. Any links to papers would be appreciated.
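Whether sparsity matters for a logistic model depends less on how many zeros the exposure matrix contains than on how many samples fall into each exposure-by-outcome cell. If the outcome is never, or always, present among the samples that carry some exposure (or among those that lack it), the data are quasi-completely separated. In that case the maximum-likelihood coefficient for that exposure does not exist (it diverges toward ±∞), so exponentiating it gives no usable odds ratio. Small but nonzero cells give unstable estimates that tend to be biased away from the null. The sketch below tabulates the four 2×2 cells for each exposure and the crude cross-product odds ratio (a·d)/(b·c). For a model with that exposure as its only predictor, exp(β) equals this crude value exactly; the adjusted ORs from the full model can differ. The helper name `sparse_cell_report` is hypothetical, and the `min_count=5` threshold is an arbitrary illustrative choice, not a standard.

```python
import numpy as np


def sparse_cell_report(X, y, min_count=5):
    """Count the 2x2 cells of each binary exposure column of X against outcome y.

    Returns (cells, crude_or, flagged):
      cells    - (p, 4) array of counts [a, b, c, d] per exposure
      crude_or - crude odds ratio a*d / (b*c); inf or nan when b*c == 0
      flagged  - indices of exposures with any cell below min_count (illustrative threshold)
    """
    X = np.asarray(X)
    y = np.asarray(y)[:, None]                      # column vector so it broadcasts over exposures
    a = ((X == 1) & (y == 1)).sum(axis=0)           # exposed, outcome present
    b = ((X == 1) & (y == 0)).sum(axis=0)           # exposed, outcome absent
    c = ((X == 0) & (y == 1)).sum(axis=0)           # unexposed, outcome present
    d = ((X == 0) & (y == 0)).sum(axis=0)           # unexposed, outcome absent
    cells = np.column_stack([a, b, c, d])
    with np.errstate(divide="ignore", invalid="ignore"):
        crude_or = (a * d) / (b * c)                # zero cells give inf (x/0) or nan (0/0)
    flagged = np.flatnonzero(cells.min(axis=1) < min_count)
    return cells, crude_or, flagged


# Hypothetical toy data: 6 samples, 2 exposures.
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]])
y = np.array([1, 1, 0, 1, 0, 0])
cells, crude_or, flagged = sparse_cell_report(X, y, min_count=1)
print(cells)      # exposure 1 has no exposed sample without the outcome (b = 0)
print(crude_or)
print(flagged)
```

An exposure flagged with a zero cell will not have a finite maximum-likelihood coefficient; commonly suggested alternatives are penalized (Firth-type) likelihood or exact logistic regression. This marginal check also cannot detect separation that arises only from combinations of exposures.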

**Physics Forums - The Fusion of Science and Community**


# Using an odds ratio when data is sparse


Similar Threads - Using odds ratio | Date |
---|---|
B Why is the mode usually not as useful? | Jan 24, 2018 |
I Confusion over using integration to find probability | Jun 27, 2017 |
A Estimation of Hurst Exponent Using Rescaled Range | May 4, 2017 |
A Observing interactions with plots using est. coeff. | Apr 24, 2017 |
Odds of genetic trait inheritance using probability? | Apr 30, 2012 |
