Continuous output: logistic vs linear regression

In summary: Logistic regression is mostly used for classification: you choose a cutoff point and predict a yes for cases beyond it and a no otherwise. If the cutoff point is set too high, you'll miss real yes cases (false negatives); if it's set too low, you'll flag too many no cases as yes (false positives).
  • #1
FallenApple
Say I suspect from the scatter plot that there is a positive trend in the data, and say the output y is continuous.

A linear regression would give me a positive estimate of the slope: for a one-unit increase in x, I would get a certain increase in y.

I can also dichotomize the y variable, splitting the data between high and low, and calculate the estimated increase in log odds for a one-unit increase in x.

Is there even a point in doing so?

It seems like the question can be answered using linear regression.

I don't see the point in using logistic regression unless the output is necessarily binary (gender, political affiliation, etc.).

Even if we are interested in an outcome such as having high income vs. low income, we can just treat income as a continuous variable and use OLS to get the answer.

Is there something I'm missing?
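
A minimal sketch of the comparison described in this post, under stated assumptions: the data are simulated (the true slope of 0.5, the noise level, and the median split are illustrative choices, not from the thread), and `fit_ols` and `fit_logistic` are hypothetical helper names written with the Python standard library only. In practice a statistics package would do the fitting.

```python
import math
import random
import statistics


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_logistic(x, z, iterations=25):
    """Return (intercept, slope) on the log-odds scale via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += zi - p           # gradient of the log-likelihood
            g1 += (zi - p) * xi
            h00 += w               # entries of the 2x2 matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data (assumption): y rises by 0.5 per unit of x, plus noise.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(500)]
y = [2.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]

# Dichotomize y at its median (an arbitrary cut, chosen for illustration).
cut = statistics.median(y)
high = [1 if yi > cut else 0 for yi in y]

_, ols_slope = fit_ols(x, y)
_, logit_slope = fit_logistic(x, high)
print(f"OLS: y changes by {ols_slope:.3f} per unit of x")
print(f"Logistic: log odds of 'high' change by {logit_slope:.3f} per unit of x")
```

On data like these both slopes should come out positive, but they answer different questions: one is a change in y itself, the other a change in the log odds of exceeding the chosen cut.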
 
  • #2
Yes, logistic regression was designed for cases with a binary result, like death or loan default. The only circumstance in which I can imagine it might be worth considering a logistic regression for a continuous result variable is where the variable's distribution is strongly bimodal, with nearly all values clustering around one or the other of two widely separated points, and very low probability densities in between. Even then I'm not sure what value it would add, but it might add something.
 
  • #3
andrewkirk said:
Yes, logistic regression was designed for cases with a binary result, like death or loan default. The only circumstance in which I can imagine it might be worth considering a logistic regression for a continuous result variable is where the variable's distribution is strongly bimodal, with nearly all values clustering around one or the other of two widely separated points, and very low probability densities in between. Even then I'm not sure what value it would add, but it might add something.

That makes sense, because with bimodal data we can meaningfully cut the data and categorize it into high and low values. If the data were not bimodal, we couldn't make a meaningful cut, and any logistic analysis that followed would only give the log odds of passing an arbitrary cut, which isn't even valid in the first place.
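
To make the "log odds of passing that cut" concrete, here is a short worked derivation. It assumes (this is not stated in the thread) that the continuous data follow a simple linear model with normal errors; the symbols a, b, σ, and the cutoff c are introduced here purely for illustration.

```latex
y = a + b\,x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

P(y > c \mid x) = P\bigl(\varepsilon > c - a - b\,x\bigr)
                = \Phi\!\left(\frac{a - c}{\sigma} + \frac{b}{\sigma}\,x\right)
```

Under this assumption the dichotomized outcome follows a probit curve, which a logistic curve approximates closely. The slope b/σ has the same sign as the linear slope b wherever the cut c is placed; moving c only shifts the intercept. So the binary analysis recovers the direction of the trend, but the particular value of c carries no meaning of its own unless the data give a reason for it.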
 
  • #4
If your data have outliers, or violate any of the assumptions of OLS regression, and provided you are looking for a binary answer, it could be a good idea to transform the response variable into a binary one and do a logistic regression instead. It would be interesting to compare both, though!
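
A rough sketch of the outlier point, assuming simulated data and a median split (both are illustrative choices, and `ols_slope` and `dichotomize` are hypothetical helper names). It pushes a few already-high points far upward and compares what happens to the OLS slope and to the high/low labels.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def dichotomize(y):
    """Label each value 1 if it lies above the sample median, else 0."""
    cut = statistics.median(y)
    return [1 if yi > cut else 0 for yi in y]


# Simulated data (assumption): a clean linear trend with slope 0.5.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Contaminate: take the five "high" points with the smallest x and push
# their y values far upward, producing gross outliers.
labels = dichotomize(y)
high_idx = sorted((i for i in range(len(y)) if labels[i] == 1), key=lambda i: x[i])
y_outlier = list(y)
for i in high_idx[:5]:
    y_outlier[i] += 100.0

slope_clean = ols_slope(x, y)
slope_outlier = ols_slope(x, y_outlier)
labels_outlier = dichotomize(y_outlier)

print(f"OLS slope, clean data:    {slope_clean:.3f}")
print(f"OLS slope, with outliers: {slope_outlier:.3f}")
print("High/low labels unchanged:", labels == labels_outlier)
```

Because the shifted points were already in the upper half, the high/low labels do not change, so a logistic regression on those labels would give the same fit as before, while the OLS slope moves. The price is that dichotomizing discards how far each value lies from the cut.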
 
  • #5
FallenApple said:
Say I suspect from the scatter plot that there is a positive trend in the data, and say the output y is continuous.

A linear regression would give me a positive estimate of the slope: for a one-unit increase in x, I would get a certain increase in y.

I can also dichotomize the y variable, splitting the data between high and low, and calculate the estimated increase in log odds for a one-unit increase in x.

Is there even a point in doing so?

It seems like the question can be answered using linear regression.

I don't see the point in using logistic regression unless the output is necessarily binary (gender, political affiliation, etc.).

Even if we are interested in an outcome such as having high income vs. low income, we can just treat income as a continuous variable and use OLS to get the answer.

Is there something I'm missing?

It depends. If you want to spot a trend, run a linear regression. The output will tell you whether there is a "reasonable" linear relationship (that is, whether the confidence interval of the slope excludes 0); you should also check that the correlation coefficient r is "high enough". As Andrew said, logistic regression is most often used in classification: you set up a cutoff point, e.g., you have a yes beyond your chosen cutoff point and a no otherwise.
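
The two checks mentioned here can be sketched as follows. This is a minimal illustration, not a definitive implementation: the data are simulated, `slope_summary` is a hypothetical helper name, and the interval uses a normal quantile as a large-sample stand-in for the t quantile that a statistics package would report.

```python
import math
import random
import statistics


def slope_summary(x, y, level=0.95):
    """Return (slope, (ci_low, ci_high), r) for the regression of y on x."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    # Normal quantile (about 1.96 for 95%), a large-sample approximation.
    q = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    r = sxy / math.sqrt(sxx * syy)
    return slope, (slope - q * std_err, slope + q * std_err), r


# Simulated data (assumption): a positive trend with noise.
random.seed(2)
x = [random.uniform(0, 10) for _ in range(300)]
y = [2.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]

slope, (ci_low, ci_high), r = slope_summary(x, y)
print(f"slope = {slope:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f}), r = {r:.3f}")
print("Interval excludes 0:", ci_low > 0 or ci_high < 0)
```

What counts as a "high enough" r depends on the field and the purpose, so the sketch only reports it rather than applying a threshold.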
 
1. What is the difference between logistic and linear regression?

Logistic regression is a type of regression analysis used to predict binary outcomes, while linear regression is used to predict continuous numerical outcomes.

2. When should I use logistic regression instead of linear regression?

Logistic regression should be used when the dependent variable is binary (i.e. has only two possible outcomes), while linear regression suits a continuous outcome, provided its assumptions (such as a roughly linear relationship and well-behaved residuals) are reasonable.

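3. What are the advantages of using logistic regression over linear regression?

Logistic regression keeps predicted probabilities between 0 and 1 by modeling the log odds as a linear function of the independent variables, so the relationship between the predictors and the probability is S-shaped rather than a straight line. This makes it better suited for predicting categorical outcomes.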

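4. Can I use logistic regression for continuous data?

Not directly for a continuous outcome. Logistic regression predicts binary outcomes, so a continuous response would first have to be dichotomized at some cutoff, which discards information. Continuous predictors are fine.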

5. Which type of regression is better for classification tasks?

Logistic regression is typically preferred for classification tasks, as it provides a probability score for each class and allows for easily interpretable results.
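
As a small illustration of how a probability score turns into a classification, the sketch below applies several cutoffs to the output of a logistic model. The coefficients `b0` and `b1` and the nine data points are made up for illustration, and `predict_probability` and `confusion_counts` are hypothetical helper names.

```python
import math


def predict_probability(x, b0, b1):
    """Logistic model: probability of "yes" given x and coefficients b0, b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def confusion_counts(xs, truth, b0, b1, threshold):
    """Return (false_positives, false_negatives) at a probability threshold."""
    fp = fn = 0
    for x, actual in zip(xs, truth):
        predicted = 1 if predict_probability(x, b0, b1) >= threshold else 0
        if predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    return fp, fn


# Hypothetical coefficients and a small made-up data set, for illustration only.
b0, b1 = -3.0, 0.6
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
truth = [0, 0, 1, 0, 0, 1, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = confusion_counts(xs, truth, b0, b1, threshold)
    print(f"threshold {threshold}: false positives = {fp}, false negatives = {fn}")
```

Raising the threshold trades false positives for false negatives, and lowering it does the reverse, so the cutoff is a choice about which kind of error matters more.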
