Error in (Multi)linear Regression

SUMMARY

The discussion centers on the conditions needed to justify using (multi)linear regression to model data. The key point of debate is whether the errors must be normally distributed or only independent and identically distributed (i.i.d.) with mean zero. The normality assumption is mainly used to derive the distribution of the estimated coefficients and to build reliable confidence intervals for them. Normality strengthens the justification for the model, but practitioners often find that (multi)linear models perform adequately even when these assumptions are not strictly met.
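For reference, here is a sketch of the standard textbook result behind that statement. It assumes a fixed, full-rank ##n\times p## design matrix ##X## (intercept column included) and ##p## coefficients. Under normal errors the coefficient estimates have an exact sampling distribution, which is what yields exact ##t##-based confidence intervals:

$$y = X\beta + \varepsilon,\qquad \varepsilon \sim N(0,\sigma^2 I_n)$$
$$\hat\beta = (X^\top X)^{-1}X^\top y \sim N\!\left(\beta,\ \sigma^2 (X^\top X)^{-1}\right)$$
$$s^2 = \frac{\lVert y - X\hat\beta\rVert^2}{n-p},\qquad \frac{(n-p)\,s^2}{\sigma^2}\sim\chi^2_{n-p}\ \text{, independent of } \hat\beta$$
$$\frac{\hat\beta_j-\beta_j}{s\sqrt{[(X^\top X)^{-1}]_{jj}}}\sim t_{n-p}\quad\Longrightarrow\quad \hat\beta_j \pm t_{n-p,\,1-\alpha/2}\; s\sqrt{[(X^\top X)^{-1}]_{jj}}$$

Without normality, zero-mean, uncorrelated, equal-variance errors still give ##E[\hat\beta]=\beta## and ##\operatorname{Var}(\hat\beta)=\sigma^2 (X^\top X)^{-1}##, and Gauss-Markov makes least squares the best linear unbiased estimator. The exact ##t## intervals above then typically hold only approximately, for large ##n##.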

PREREQUISITES
  • Understanding of (multi)linear regression principles
  • Familiarity with statistical concepts such as i.i.d. errors and the normal distribution
  • Knowledge of the Gauss-Markov theorem
  • Basic grasp of model evaluation metrics and confidence intervals
NEXT STEPS
  • Study the Gauss-Markov theorem and its implications for linear regression
  • Learn about the assumptions of linear regression, focusing on error distribution
  • Explore alternative modeling techniques for non-linear relationships
  • Investigate methods for assessing model fit and residual analysis
USEFUL FOR

Statisticians, data scientists, and researchers involved in regression analysis or modeling data who seek to understand the assumptions and limitations of (multi)linear regression.

WWGD
Hi,
I keep reading varying accounts of the conditions needed to "justify" the use of (multi)linear regression to model data.

Specifically, I have seen several authors require the errors to be normal and i.i.d., while others only require the errors to be i.i.d. with mean 0. Just where is the assumption of normality used to justify the use of linear models? I know of Gauss-Markov, but this seems too strong. I've heard that normality is used to find the distribution of the coefficients and to determine reliable confidence intervals for them. If so, can you suggest a source? If not, can you explain?
 
I guess the question is what "justify" means. If the errors are i.i.d. normal with zero mean, then I think the least-squares multilinear fit gives unbiased estimates of the actual values. If the errors are distributed otherwise, then a different model (i.e. different coefficients) might be a better choice.
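For what it's worth, unbiasedness of the least-squares coefficients does not itself depend on normality: with a fixed design and zero-mean errors, ##E[\hat\beta]=\beta##. A quick Monte Carlo sketch illustrates this with deliberately skewed (centred exponential) i.i.d. errors. The true coefficients, sample size, number of repetitions, and error distribution are arbitrary choices for illustration, and `ols` and `simulate_mean_estimates` are hypothetical helper names.

```python
import numpy as np

def ols(X, y):
    # Least-squares coefficients via NumPy's lstsq solver
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate_mean_estimates(n=50, reps=5000, seed=0):
    rng = np.random.default_rng(seed)
    beta_true = np.array([2.0, -1.5, 0.5])  # hypothetical true coefficients
    # Fixed design: an intercept column plus two regressors drawn once
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    estimates = np.empty((reps, beta_true.size))
    for r in range(reps):
        # i.i.d. zero-mean errors that are NOT normal: exponential(1) shifted by -1
        eps = rng.exponential(scale=1.0, size=n) - 1.0
        y = X @ beta_true + eps
        estimates[r] = ols(X, y)
    # Average the estimates over all repetitions
    return beta_true, estimates.mean(axis=0)

beta_true, mean_est = simulate_mean_estimates()
print("true coefficients:     ", beta_true)
print("mean of OLS estimates: ", mean_est.round(3))
```

The averaged estimates land close to the true coefficients even though the errors are strongly skewed. What normality would add is the exact sampling distribution needed for small-sample confidence intervals.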

But in practice, a multilinear model works pretty well in lots of other situations, and it's often hard to know what the errors actually look like.
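One rough way to get a feel for what the errors look like is to inspect the residuals of the fit. The sketch below computes the sample skewness and excess kurtosis of the OLS residuals, both of which should be near 0 for normal errors. It is a crude check, not a formal normality test. The simulated data, the exponential error distribution, and the helper name `residual_moments` are assumptions made for illustration.

```python
import numpy as np

def residual_moments(X, y):
    """Fit OLS and return the residuals with their sample skewness and excess kurtosis.

    For normal errors both statistics should be close to 0; large values hint
    at skewed or heavy-tailed errors. A rough diagnostic only.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    z = (resid - resid.mean()) / resid.std()  # standardized residuals
    skew = np.mean(z**3)
    excess_kurt = np.mean(z**4) - 3.0
    return resid, skew, excess_kurt

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y_normal = X @ np.array([1.0, 0.5]) + rng.normal(size=n)                  # normal errors
y_skewed = X @ np.array([1.0, 0.5]) + (rng.exponential(size=n) - 1.0)     # skewed errors

_, s_n, k_n = residual_moments(X, y_normal)
_, s_e, k_e = residual_moments(X, y_skewed)
print(f"normal errors:      skew={s_n:.2f}, excess kurtosis={k_n:.2f}")
print(f"exponential errors: skew={s_e:.2f}, excess kurtosis={k_e:.2f}")
```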
 
Thank you. Just curious, as an aside: do you know of situations that cannot be modeled (log)linearly, with linearity meaning linearity in the coefficients?
 
You just mean a scenario where a nonlinear model is better? Sure, suppose you drop an object and write down the time and the height at those times as it falls. The height measurement has some noise to it. Then the right model for the height is going to be something like ##h(t)=-\frac{g}{2}t^2+h_0## if it starts at a height of ##h_0##.
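Note that this model is nonlinear in ##t## but linear in the coefficients ##h_0## and ##g/2##, so it can be fit by ordinary least squares using ##1## and ##t^2## as the regressors. A minimal sketch, assuming ##g = 9.81## m/s², ##h_0 = 20## m, and Gaussian measurement noise with a 0.05 m standard deviation (all values chosen for illustration):

```python
import numpy as np

g_true, h0_true = 9.81, 20.0  # assumed values for illustration
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.8, 40)  # sample times (s), before the object lands
# Noisy height measurements (m)
h = -0.5 * g_true * t**2 + h0_true + rng.normal(scale=0.05, size=t.size)

# Design matrix with columns 1 and t^2, so h = h0 * 1 + c * t^2 where c = -g/2
X = np.column_stack([np.ones_like(t), t**2])
(h0_hat, c_hat), *_ = np.linalg.lstsq(X, h, rcond=None)
g_hat = -2.0 * c_hat  # recover g from the t^2 coefficient
print(f"h0 ~ {h0_hat:.3f} m, g ~ {g_hat:.3f} m/s^2")
```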

I feel like there's a good chance I did not understand the question.
 
