Suppose I have around 20 exposures that potentially affect an outcome, and I want to see which exposures have the bigger impact on it. To do this, I want to calculate each exposure's odds ratio by exponentiating the coefficients obtained from a logistic regression.

I have an input set and an output set in which 1 means the exposure or outcome is present and 0 means it is not. For example, the first row represents a sample where exposure 1 wasn't present, exposure 2 was present, ..., exposure 20 was present, and the outcome was present. I fit a logistic regression model to this data and exponentiate the coefficients to get odds ratios.

The potential problem is that I am going to be working with a VERY sparse data set with many samples. In many samples, only one or maybe two exposures are going to be present. My question is whether this sparsity is something to be concerned about, and whether it makes comparing exposures by their odds ratios a bad idea.

Page 6 of this paper, http://www.epidemiology.ch/history/PDF%20bg/Greenland%20S%201987%20interpretation%20and%20choice%20of%20effect%20measures.pdf, seems to imply that sparsity won't matter too much, but I want to see what the statisticians here say. Any links to papers would be appreciated.
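To make the procedure concrete, here is a minimal sketch of it in plain Python on simulated data. Everything in it is an assumption made for illustration and not taken from the real data: the sample size, the 10% exposure prevalence, the intercept, the true effect (only exposure 1 has one, with an odds ratio of 3), and the helper names `fit_logistic` and `solve`. The fit is a hand-written Newton-Raphson routine so the example has no dependencies; a statistics package would normally be used instead.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, n_features, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    rows[i] lists the indices of the exposures present in sample i, which is
    a sparse encoding of a 0/1 design matrix. Coefficient 0 is the intercept;
    coefficient j + 1 belongs to exposure j.
    """
    p = n_features + 1
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for active, yi in zip(rows, y):
            cols = [0] + [j + 1 for j in active]  # intercept plus present exposures
            eta = sum(beta[c] for c in cols)
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for a in cols:
                grad[a] += yi - mu      # score: X^T (y - mu)
                for b in cols:
                    hess[a][b] += w     # information: X^T W X
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated sparse data (all values below are illustrative assumptions).
random.seed(0)
n_samples, n_exposures = 5000, 20
true_beta = [math.log(3.0)] + [0.0] * (n_exposures - 1)
intercept = -1.0

rows, y = [], []
for _ in range(n_samples):
    # Each exposure is present with probability 0.1, so about two per sample.
    active = [j for j in range(n_exposures) if random.random() < 0.1]
    eta = intercept + sum(true_beta[j] for j in active)
    prob = 1.0 / (1.0 + math.exp(-eta))
    rows.append(active)
    y.append(1 if random.random() < prob else 0)

beta_hat = fit_logistic(rows, y, n_exposures)
odds_ratios = [math.exp(b) for b in beta_hat[1:]]  # drop the intercept

for j, or_j in enumerate(odds_ratios, start=1):
    print(f"exposure {j}: OR = {or_j:.2f}")
```

Storing only the indices of the exposures that are present keeps the per-sample work small when the data are sparse. Note that this sketch has no safeguards: an exposure that never appears in the data would give a zero pivot in `solve`.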