Continuous output: logistic vs linear regression

Discussion Overview

The discussion revolves around the appropriateness of using logistic regression versus linear regression when analyzing continuous output data. Participants explore the conditions under which logistic regression might be considered, particularly in the context of dichotomizing continuous variables.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant suggests that linear regression is sufficient for analyzing a positive trend in continuous data, questioning the necessity of logistic regression unless the output is binary.
  • Another participant proposes that logistic regression could be relevant if the continuous variable exhibits a strongly bimodal distribution, although they express uncertainty about its added value in such cases.
  • A different viewpoint emphasizes that if the data is not bimodal, logistic regression may not yield meaningful insights, as it would only provide log odds based on an arbitrary cut.
  • It is noted that logistic regression might be beneficial if the data contains outliers or violates OLS assumptions, particularly when seeking a binary outcome.
  • One participant reiterates the initial question about the relevance of logistic regression, emphasizing that linear regression can adequately address the inquiry regarding trends in continuous data.

Areas of Agreement / Disagreement

Participants express differing opinions on the utility of logistic regression for continuous output data. There is no consensus on when or if logistic regression should be applied in these scenarios, indicating ongoing debate and uncertainty.

Contextual Notes

Participants mention various conditions such as bimodal distributions, outliers, and OLS assumptions that may influence the choice between regression methods, but these factors remain unresolved in the discussion.

FallenApple
Say I suspect from the scatter plot that there is a positive trend in the data, and that the output y is continuous.

A linear regression would give me a positive estimate of the slope: for a one-unit increase in x, I would get a so-and-so increase in y.

I can also split the y variable into high and low, dichotomizing it, and calculate the estimated increase in log odds for a one-unit increase in x.

Is there even a point in doing so?

It seems like the question can be answered using linear regression.

I don't see the point in using logistic regression unless the output is necessarily binary (gender, political affiliation, etc.).

Even if we are interested in an output of, say, having high income vs. low income, you can just treat income as a continuous spectrum and use OLS to get the answer.

Is there something I'm missing?
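To make the two analyses in the question concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the simulated data (true slope 0.5, Gaussian noise), the median cut used to dichotomize y, and the helper names `ols_fit` and `logistic_fit`. It fits the OLS slope in closed form and the logistic slope with a small hand-written Newton-Raphson loop.

```python
import math
import random


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_fit(x, z, iters=25):
    """Newton-Raphson for a one-predictor logistic regression.

    Returns (intercept, slope) on the log-odds scale.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += zi - p            # gradient of the log-likelihood
            g1 += (zi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data (assumed for illustration): y = 1 + 0.5 x + noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 2) for xi in x]

# Dichotomize y at its median (an arbitrary, hypothetical cut).
cut = sorted(y)[len(y) // 2]
z = [1 if yi > cut else 0 for yi in y]

_, ols_slope = ols_fit(x, y)
_, logit_slope = logistic_fit(x, z)
print("OLS slope (units of y per unit x):", round(ols_slope, 3))
print("Logistic slope (log odds of 'high' per unit x):", round(logit_slope, 3))
```

Both slopes come out positive on data like this, so they agree on the direction of the trend; they differ in what they measure (change in y versus change in the log odds of exceeding the cut).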
 
Yes, logistic regression was designed for cases with a binary result, like death or loan default. The only circumstance in which I can imagine it might be worth considering a logistic regression for a continuous result variable is where the variable's distribution is strongly bimodal, with nearly all values clustering around one or the other of two widely separated points, and very low probability densities in between. Even then I'm not sure what value it would add, but it might add something.
 
andrewkirk said:
Yes, logistic regression was designed for cases with a binary result, like death or loan default. The only circumstance in which I can imagine it might be worth considering a logistic regression for a continuous result variable is where the variable's distribution is strongly bimodal, with nearly all values clustering around one or the other of two widely separated points, and very low probability densities in between. Even then I'm not sure what value it would add, but it might add something.

That makes sense, because in that case we can meaningfully cut the data and categorize it into high and low values. If the data were not bimodal, we couldn't make a meaningful cut, and any logistic analysis that followed would only give the log odds of passing a cut that isn't even valid in the first place.
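One rough way to see the difference between a meaningful and an arbitrary cut is to count how many observations sit close to it. The sketch below is only an illustration under assumed distributions (a single Gaussian versus an even mixture of two well-separated Gaussians); the helper `share_near_cut`, the cut at 5 and the band of width 1 are all hypothetical choices.

```python
import random


def share_near_cut(values, cut, half_width):
    """Fraction of observations within `half_width` of the cut point."""
    near = sum(1 for v in values if abs(v - cut) <= half_width)
    return near / len(values)


rng = random.Random(0)
n = 10_000

# Unimodal: one Gaussian centred on the cut.
unimodal = [rng.gauss(5.0, 2.0) for _ in range(n)]

# Strongly bimodal: two well-separated clusters with almost nothing in between.
bimodal = [
    rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(10.0, 1.0)
    for _ in range(n)
]

cut, half_width = 5.0, 1.0
share_unimodal = share_near_cut(unimodal, cut, half_width)
share_bimodal = share_near_cut(bimodal, cut, half_width)
print("share near cut, unimodal:", share_unimodal)
print("share near cut, bimodal: ", share_bimodal)
```

In the unimodal sample a large share of points lands right next to the cut, so their "high" or "low" label is essentially an accident of where the line was drawn. In the bimodal sample almost no points are near it, which is what makes the split defensible.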
 
If your data has outliers, or if it violates any assumptions of the OLS regression, and provided you are looking for a binary answer, it could be a good idea to transform the response variable to a binary one and do a logistic regression instead. It would be interesting to compare both though!
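As a rough illustration of the outlier point, the sketch below contaminates simulated data with a few extreme y values and compares what happens to the OLS slope and to the dichotomized labels. The data, the fixed cut `CUT = 3.5`, and the helpers `ols_slope` and `with_outliers` are assumptions made up for this example.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def with_outliers(x, y, value, k=5):
    """Copy of y where the k observations with the smallest x are set to `value`."""
    lowest = sorted(range(len(x)), key=lambda i: x[i])[:k]
    out = list(y)
    for i in lowest:
        out[i] = value
    return out


rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

CUT = 3.5  # hypothetical threshold separating "high" from "low"

y_mild = with_outliers(x, y, 50.0)     # outliers of moderate size
y_wild = with_outliers(x, y, 5000.0)   # same points, far more extreme

labels_mild = [int(v > CUT) for v in y_mild]
labels_wild = [int(v > CUT) for v in y_wild]

print("OLS slope, clean data:    ", round(ols_slope(x, y), 3))
print("OLS slope, mild outliers: ", round(ols_slope(x, y_mild), 3))
print("OLS slope, wild outliers: ", round(ols_slope(x, y_wild), 3))
print("binary labels identical:  ", labels_mild == labels_wild)
```

The OLS slope keeps moving as the outliers get more extreme, while the binary labels only record which side of the cut each point falls on, so any logistic fit to them is the same in both contaminated cases. The outliers still count as "high" observations, so the logistic fit is not untouched by them, only insensitive to their magnitude.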
 
It depends. If you want to spot a trend, run a linear regression. The output will tell you whether there is a "reasonable" linear relationship (i.e., whether the confidence interval of the slope excludes 0); you also check that the correlation coefficient r is "high enough". As Andrew said, logistic regression is most often used in classification: you set up a cutoff point, e.g., you have a yes beyond your chosen cutoff point and a no otherwise.
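The check described here (does the slope's confidence interval exclude 0, and how large is r) can be sketched in a few lines. This is an illustration under stated assumptions: the data are simulated, `slope_summary` is a hypothetical helper, and the interval uses the normal critical value 1.96 as a large-sample stand-in for the exact t quantile.

```python
import math
import random


def slope_summary(x, y, z_crit=1.96):
    """Return (slope, (ci_low, ci_high), r) for a simple linear regression.

    The interval is approximate: it uses a normal critical value
    rather than the t distribution with n - 2 degrees of freedom.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    return slope, (slope - z_crit * se, slope + z_crit * se), r


rng = random.Random(2)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 2) for xi in x]

slope, (ci_low, ci_high), r = slope_summary(x, y)
print("slope:", round(slope, 3))
print("approx. 95% CI:", (round(ci_low, 3), round(ci_high, 3)))
print("r:", round(r, 3))
print("CI excludes 0:", ci_low > 0 or ci_high < 0)
```

What counts as a "high enough" r depends on the field and the question, so the sketch only reports it rather than applying a threshold.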
 
