I remember an example of logistic regression applied to medicine / epidemiology, which said (more or less) that the probability of a person having a myocardial infarction was related to variables such as age, cholesterol level, etc., and the equation included a 'threshold' for each of these variables.

Something like: $c_0 + c_{\text{age}}(\text{age} - 50) + c_{\text{chol}}(\text{chol} - 200) + \dots$

This was the x in the logistic formula $P = 1/(1 + e^{-x})$.

If the coefficients are all positive, it follows that when age > 50 and chol > 200, a positive contribution is given to x by these two variables, which makes $e^{-x}$ smaller, and P closer to 1.
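To make that concrete, here is a minimal sketch of the formula as I remember it. The coefficient values are made up purely for illustration (they are not from the example I saw), and the function names are my own.

```python
import math

def logistic(x):
    """Logistic formula P = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients, chosen only for illustration.
C0, C_AGE, C_CHOL = -1.0, 0.05, 0.01

def centered_x(age, chol):
    """x written with the 'thresholds' 50 (age) and 200 (chol)."""
    return C0 + C_AGE * (age - 50) + C_CHOL * (chol - 200)

# Exactly at the thresholds, only c_0 contributes to x.
p_at_thresholds = logistic(centered_x(50, 200))

# Above both thresholds (with positive coefficients), x grows and P moves toward 1.
p_above = logistic(centered_x(65, 260))

print(p_at_thresholds, p_above)
```

With positive coefficients, the second probability comes out larger than the first, which is the behaviour described above.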

Now my question is, how did they find the thresholds (50 and 200) for age and chol?

If I had data on age, cholesterol, etc, vs presence/absence of the disease, and ran a logistic regression, I think I would get something like this:

$x = a_0 + a_{\text{age}}\,\text{age} + a_{\text{chol}}\,\text{chol} + \dots$

I.e. I would only know that age and chol increase P, but not 'when' someone should start to worry about their age and cholesterol.
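For what it's worth, the two ways of writing x can be compared numerically. The sketch below assumes made-up coefficients and simply multiplies out the brackets of the 'threshold' form, so that the constants 50 and 200 are folded into the intercept; all names and values here are hypothetical.

```python
import math

# Hypothetical coefficients of the 'threshold' form, for illustration only.
c0, c_age, c_chol = -1.0, 0.05, 0.01

# Multiplying out c_age*(age - 50) and c_chol*(chol - 200) moves the
# constants into the intercept; the slopes are unchanged.
a0 = c0 - c_age * 50 - c_chol * 200
a_age, a_chol = c_age, c_chol

def x_threshold_form(age, chol):
    return c0 + c_age * (age - 50) + c_chol * (chol - 200)

def x_plain_form(age, chol):
    return a0 + a_age * age + a_chol * chol

# Both forms give the same x (up to rounding) for any age and chol.
for age, chol in [(40, 180), (50, 200), (65, 260)]:
    assert math.isclose(x_threshold_form(age, chol), x_plain_form(age, chol))
```

So for these numbers the plain form carries the same information as the threshold form, with the 50 and 200 hidden inside a_0, which is part of why I can't see how the thresholds themselves would come out of the fit.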

Am I completely off the mark here, or is there a technique to calculate these thresholds from the data?

Thanks!

L