Error in (Multi)linear Regression

Discussion Overview

The discussion revolves around the conditions necessary to justify the use of (multi)linear regression models for data analysis. Participants explore the assumptions regarding error distributions, particularly the role of normality, and the implications for model reliability and coefficient estimation.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant questions the necessity of normality in error distribution for justifying linear regression, noting that some sources only require errors to be i.i.d. with a mean of zero.
  • Another participant suggests that if errors are zero mean i.i.d. normal, the multi-linear model serves as an unbiased estimator, but acknowledges that different distributions might necessitate alternative models.
  • A participant inquires about scenarios where a non-linear model may be more appropriate than a linear model, specifically regarding the interpretation of linearity in coefficients.
  • One example provided involves modeling the height of a falling object, indicating that a quadratic model may be more suitable due to the nature of the data.

Areas of Agreement / Disagreement

Participants express differing views on the necessity of normality in error distributions for linear regression. While some suggest it is essential for unbiased estimation, others argue that linear models can still perform adequately under various conditions. The discussion remains unresolved regarding the strict requirements for justifying linear regression.

Contextual Notes

Limitations include the lack of consensus on the definition of "justify" in the context of linear regression and the potential variability in error distributions that may affect model choice.

WWGD
Hi,
I keep reading varying accounts of the conditions needed to "justify" the use of (multi)linear regression to model data.

Specifically, I have seen several authors require the errors to be normal and i.i.d., while others only require the errors to be i.i.d. with mean 0. Just where is the assumption of normality used to justify the use of linear models? I know of Gauss-Markov, but this seems too strong. I've heard that normality is used to find the distribution of the coefficients and to determine reliable confidence intervals for them. If so, can you suggest a source? If not, can you explain?
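
For reference, the standard textbook statements behind this question can be sketched as follows. This is a summary under stated assumptions, not something taken from a particular source in this thread: the design matrix ##X## is treated as fixed with full column rank and ##p## columns, ##n## is the number of observations, and ##s^2## denotes the usual residual variance estimate.

$$y = X\beta + \varepsilon, \qquad \hat\beta = (X^\top X)^{-1}X^\top y = \beta + (X^\top X)^{-1}X^\top \varepsilon$$

$$E[\varepsilon] = 0 \;\Rightarrow\; E[\hat\beta] = \beta \quad \text{(unbiasedness, no normality needed)}$$

$$\operatorname{Var}(\varepsilon) = \sigma^2 I \;\Rightarrow\; \operatorname{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1} \quad \text{(Gauss-Markov: } \hat\beta \text{ is the best linear unbiased estimator)}$$

$$\varepsilon \sim N(0, \sigma^2 I) \;\Rightarrow\; \hat\beta \sim N\!\left(\beta,\ \sigma^2 (X^\top X)^{-1}\right), \qquad \frac{\hat\beta_j - \beta_j}{s\sqrt{\left[(X^\top X)^{-1}\right]_{jj}}} \sim t_{n-p}$$

In this sketch normality enters only in the last line, where it gives the exact finite-sample distribution of the coefficients and hence exact ##t##-based confidence intervals. Without normality the same intervals are typically justified only approximately, for large ##n##, under suitable regularity conditions.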
 
I guess the question is what "justify" means. If the errors are zero-mean i.i.d. normal, then I think the multilinear model is an unbiased estimator of the actual values. If the errors are otherwise distributed, then a different model (i.e. different coefficients) might be a better choice.

But in practice, a multilinear model works pretty well in lots of other situations, and it's often hard to know what the errors actually look like.
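
One way to probe the "works pretty well in lots of other situations" remark is a small simulation. The sketch below is only illustrative: the design matrix, the true coefficients `beta_true`, the sample sizes, and the choice of a centered exponential as the non-normal error distribution are all assumptions made for the example, not anything from the thread. It fits ordinary least squares to many simulated datasets and compares the average estimate with the true coefficients under normal and under skewed zero-mean errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: intercept plus two regressors, held fixed across datasets.
n, reps = 200, 2000
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)])
beta_true = np.array([1.0, 2.0, -3.0])


def ols_estimates(errors):
    """Fit OLS to every column of simulated responses.

    `errors` has shape (n, reps): one column of noise per simulated dataset.
    Returns an array of shape (reps, 3) with one coefficient vector per dataset.
    """
    Y = (X @ beta_true)[:, None] + errors
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef.T


# Zero-mean, unit-variance errors: normal vs. a skewed (centered exponential) law.
normal_errors = rng.normal(0.0, 1.0, size=(n, reps))
skewed_errors = rng.exponential(1.0, size=(n, reps)) - 1.0

mean_normal = ols_estimates(normal_errors).mean(axis=0)
mean_skewed = ols_estimates(skewed_errors).mean(axis=0)

print("true coefficients:      ", beta_true)
print("mean OLS, normal errors:", mean_normal)
print("mean OLS, skewed errors:", mean_skewed)
```

In runs of this kind both averages should land close to `beta_true`, which is consistent with unbiasedness depending on the errors having mean zero rather than on their being normal. The simulation says nothing about whether confidence intervals are exact, which is where normality usually comes in.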
 
Thank you. Just curious, as an aside: do you know of situations that cannot be modeled (log)linearly, with linearity meaning linearity in the coefficients?
 
You just mean give a scenario where a nonlinear model is better? Sure, suppose you drop an object and write down the height at a series of times as it falls. The height measurement has some noise in it. Then the right model for the height is going to be something like ##h(t)=-\frac{g}{2}t^2+h_0## if it starts at a height of ##h_0##.
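
As a rough illustration of the falling-object example, the sketch below simulates noisy height measurements and fits them two ways. The values of `g` and `h0`, the noise level, and the sampling times are assumptions chosen for the example. Note that ##h(t)=-\frac{g}{2}t^2+h_0## is quadratic in ##t## but still linear in the coefficients ##h_0## and ##-g/2##, so it can be fit by ordinary linear least squares on the columns ##1## and ##t^2##.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed values for the illustration (not from the thread).
g, h0 = 9.81, 100.0
t = np.linspace(0.0, 4.0, 50)
h = -0.5 * g * t**2 + h0 + rng.normal(0.0, 0.5, t.size)  # noisy heights

# Quadratic model: regress h on the columns [1, t^2].
A = np.column_stack([np.ones_like(t), t**2])
quad_coef, *_ = np.linalg.lstsq(A, h, rcond=None)
h0_hat = quad_coef[0]
g_hat = -2.0 * quad_coef[1]  # the t^2 coefficient estimates -g/2

# Straight-line model in t, for comparison: regress h on [1, t].
B = np.column_stack([np.ones_like(t), t])
line_coef, *_ = np.linalg.lstsq(B, h, rcond=None)

# Residual sums of squares for both fits.
rss_quad = float(np.sum((h - A @ quad_coef) ** 2))
rss_line = float(np.sum((h - B @ line_coef) ** 2))

print(f"estimated h0 = {h0_hat:.2f}, estimated g = {g_hat:.2f}")
print(f"RSS quadratic = {rss_quad:.2f}, RSS straight line = {rss_line:.2f}")
```

The straight-line fit should leave a much larger residual sum of squares than the fit on ##t^2##, so the example shows a model that is nonlinear in ##t## doing better, while remaining linear in the coefficients.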

I feel like there's a good chance I did not understand the question.
 
