Assumptions behind the OLS regression model?

Discussion Overview

The discussion centers on the assumptions underlying the ordinary least squares (OLS) regression model, particularly focusing on the implications of these assumptions regarding normality and variance of the dependent variable and residuals. Participants explore the necessity of these assumptions for different purposes, such as parameter estimation, efficiency, and statistical testing.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant notes that the OLS model assumes residuals are independent and identically distributed as normal, which affects the model's ability to predict variability in the data.
  • Another participant argues that the distribution of the dependent variable and properties of the error term are not critical if the goal is merely to obtain estimates of model parameters.
  • Concerns are raised about the efficiency of the OLS method if the error term does not have constant variance, suggesting that this could undermine the model's effectiveness.
  • Statistical tests based on OLS results require attention to the normality of the dependent variable and the error term, indicating a need for careful consideration of these assumptions in certain contexts.

Areas of Agreement / Disagreement

Participants express differing views on the importance of the assumptions for various purposes, indicating that there is no consensus on their necessity across all contexts.

Contextual Notes

Some limitations are noted regarding the understanding of the implications of assumption violations, particularly in relation to the model's predictive capabilities and the efficiency of parameter estimates.

musicgold
Hi,

In many statistics textbooks I read the following text: “A model based on the ordinary linear regression equation models Y, the dependent variable, as a normal random variable whose mean is a linear function of the predictors, b0 + b1*X1 + ..., and whose variance is constant. Generalized linear models extend the linear model in two ways. First, the assumption of linearity in the parameters is relaxed by introducing the link function. Second, error distributions other than the normal can be modeled.”

My Stat teacher never bothered to explain these things to us. He started the regression lesson with the equation Y = b0 + b1 * X1, and an example based on the Weight and Height relation. He never talked about these assumptions about normality and the variance.

As a result, for quite some time I treated this equation as an identity, similar to Assets = Liabilities + Equity. I have never understood what difference those underlying assumptions make.

Can anybody please explain to me why these assumptions are required for this model, and what happens to the results of the model if these assumptions are violated?

Thanks,

MG.
 
Hello,

The model requires your residuals (errors) to be independent and identically distributed as N(0, σ²).

The reason for this assumption is that the model splits the variability in the data into two parts: the part explained by the variables plus the part not explained (the residuals).

Given the predictors, the dependent variable is normally distributed because the residuals are.

If the assumptions are violated, then we cannot know if the model is able to predict part of the variability in the data.

You can check assumptions by using normal quantile plots (check normality) and residual plots (check constant variance).
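
As a rough numerical stand-in for those two plots, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the simulated data, the helper names (`fit_simple_ols`, `qq_correlation`, `spread_ratio`), and the choice to summarize each plot by a single number. The Q-Q correlation plays the role of the normal quantile plot (values near 1 suggest normal residuals), and the spread ratio plays the role of the residual plot (values near 1 suggest constant variance).

```python
import random
from statistics import NormalDist, mean, pstdev

def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    num = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    den = (sum((ai - a_bar) ** 2 for ai in a)
           * sum((bi - b_bar) ** 2 for bi in b)) ** 0.5
    return num / den

def qq_correlation(residuals):
    """Correlation of sorted residuals with standard normal quantiles."""
    n = len(residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return pearson(sorted(residuals), theoretical)

def spread_ratio(x, residuals):
    """Residual sd for the upper half of x divided by that of the lower half."""
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return pstdev(high) / pstdev(low)

# Simulated data (hypothetical): same line, two different error structures.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
datasets = {
    "constant variance": [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x],
    "growing variance": [1.0 + 2.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x],
}

results = {}
for label, y in datasets.items():
    b0, b1 = fit_simple_ols(x, y)
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    results[label] = (qq_correlation(res), spread_ratio(x, res))
    print(label, "qq corr:", round(results[label][0], 3),
          "spread ratio:", round(results[label][1], 2))
```

In practice you would still look at the plots themselves; the two numbers only condense what the eye would pick up, and the cut into "lower half" and "upper half" of x is an arbitrary choice.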
 
Cylovenom,

Thanks.
 
It all depends on why you are estimating your model. If all you need is to obtain estimates of model parameters, then you don't need to worry about the distribution of Y or the properties of the error term.

If you need your estimation to be efficient (minimum variance), then you need to worry about whether the error term has constant variance -- if not, the simple regression (OLS) method will falter.

If you need to compute statistical tests based on your results, then you need to pay attention to whether Y is distributed normally as well as the properties of the error term.
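
The first two points can be illustrated with a small simulation. This is a sketch under stated assumptions, not something taken from the thread: the true coefficients, the uniform (non-normal) errors whose standard deviation grows with x, and the weighted least squares comparison with known weights are all choices made for the example. It suggests that the OLS slope still averages out to the true value, but that it varies more from sample to sample than an estimator that accounts for the non-constant variance.

```python
import random
from statistics import mean, pvariance

def ols_slope(x, y):
    """Ordinary least squares slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def wls_slope(x, y, w):
    """Weighted least squares slope with weights w_i = 1 / Var(e_i)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    return sxy / sxx

# Hypothetical setup: true line Y = 1 + 2*X, error sd proportional to X.
TRUE_B0, TRUE_B1 = 1.0, 2.0
rng = random.Random(42)
x = [1 + 9 * i / 49 for i in range(50)]   # 50 fixed points on [1, 10]
sd = [0.5 * xi for xi in x]               # non-constant error sd
w = [1 / s ** 2 for s in sd]              # weights assume the sd is known
half_width = 3 ** 0.5                     # uniform(-sqrt(3), sqrt(3)) has variance 1

ols_draws, wls_draws = [], []
for _ in range(2000):
    # Errors are uniform, so they are neither normal nor homoscedastic.
    y = [TRUE_B0 + TRUE_B1 * xi + s * rng.uniform(-half_width, half_width)
         for xi, s in zip(x, sd)]
    ols_draws.append(ols_slope(x, y))
    wls_draws.append(wls_slope(x, y, w))

print("mean slope  OLS:", round(mean(ols_draws), 3),
      " WLS:", round(mean(wls_draws), 3))
print("slope var   OLS:", round(pvariance(ols_draws), 4),
      " WLS:", round(pvariance(wls_draws), 4))
```

Both averages should land close to the true slope of 2, while the OLS slopes show the larger variance. The weights here rely on knowing the error variances exactly, which is rarely the case with real data.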
 
