Assumptions behind the OLS regression model?

SUMMARY

The discussion centers on the assumptions underlying the Ordinary Least Squares (OLS) regression model, specifically the requirement that the errors be independent and identically distributed as N(0, σ²). It notes that, for given values of the predictors, the dependent variable is normally distributed because the errors are. Violating these assumptions can make statistical tests unreliable and the estimates inefficient. The conversation emphasizes checking normality with normal quantile plots and constant variance with residual plots.

PREREQUISITES
  • Understanding of Ordinary Least Squares (OLS) regression
  • Familiarity with statistical concepts such as residuals and normal distribution
  • Knowledge of model parameter estimation techniques
  • Experience with diagnostic plots like normal quantile plots and residual plots
NEXT STEPS
  • Explore the implications of violating OLS assumptions in regression analysis
  • Learn how to perform normality tests for residuals in R or Python
  • Investigate generalized linear models and their advantages over OLS
  • Study methods for improving model efficiency when assumptions are not met
USEFUL FOR

Statisticians, data analysts, and researchers involved in regression modeling who seek to understand the foundational assumptions of OLS and their impact on model reliability and efficiency.

musicgold
Hi,

In many statistics textbooks I read text like the following: “A model based on the ordinary linear regression equation models Y, the dependent variable, as a normal random variable whose mean is a linear function of the predictors, b0 + b1*X1 + ..., and whose variance is constant. Generalized linear models extend the linear model in two ways. First, the assumption of linearity in the parameters is relaxed by introducing the link function. Second, error distributions other than the normal can be modeled.”

My Stat teacher never bothered to explain these things to us. He started the regression lesson with the equation Y = b0 + b1 * X1, and an example based on the Weight and Height relation. He never talked about these assumptions about normality and the variance.

As a result, for quite some time I treated this equation as an identity, similar to Assets = Liabilities + Equity. I have never understood what difference those underlying assumptions make.

Can anybody please explain to me why these assumptions are required for this model, and what happens to the results of the model if these assumptions are violated?

Thanks,

MG.
 
Hello,

The model requires your residuals or errors to be independent and identically distributed as N(0, σ²).

The reason for these assumptions is that the model splits the variability in the data into two parts: the part that is explained by the variables and the part that is not explained (the residuals).

The dependent variable is normally distributed because of the residuals: for given values of the predictors, Y is a fixed linear function of them plus a normal error.
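
Written out for a single predictor, in the b0, b1, X1 notation of the question, the statement above looks roughly like this. The index i, the error symbol ε, and σ² for the constant error variance are notation assumed for this sketch:

$$
Y_i = b_0 + b_1 X_{1i} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
\quad\Longrightarrow\quad
Y_i \mid X_{1i} \sim N\!\left(b_0 + b_1 X_{1i},\ \sigma^2\right)
$$

So the normality is a statement about Y at fixed values of the predictor, not about the pooled values of Y across all observations.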

If the assumptions are violated, then the usual standard errors and tests cannot be trusted, so we cannot know whether the model really explains part of the variability in the data.

You can check assumptions by using normal quantile plots (check normality) and residual plots (check constant variance).
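
A minimal sketch of both checks, using only the Python standard library and simulated data. The data, the helper names (`fit_line`, `corr`, `qq_points`), and the two numeric summaries are assumptions made for illustration; in practice you would plot the points and judge them by eye.

```python
import random
from statistics import NormalDist, fmean, pstdev

def fit_line(x, y):
    """Closed-form OLS for y = b0 + b1*x; returns (b0, b1)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    ubar, vbar = fmean(u), fmean(v)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def qq_points(residuals):
    """Points of a normal quantile plot: (theoretical quantiles, sorted residuals)."""
    n = len(residuals)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theo, sorted(residuals)

# Simulated data in which the assumptions hold (hypothetical values).
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 1.5) for xi in x]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Normality check: the quantile plot should be close to a straight line,
# so the correlation between its two coordinates should be close to 1.
theo, ordered = qq_points(resid)
qq_corr = corr(theo, ordered)

# Constant-variance check: in a plot of residuals against fitted values the
# spread should not change, so the low and high halves should have similar sd.
pairs = sorted(zip(fitted, resid))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]
spread_ratio = pstdev(high) / pstdev(low)

print(f"Q-Q correlation: {qq_corr:.4f}, spread ratio: {spread_ratio:.2f}")
```

Plotting `theo` against `ordered` gives the normal quantile plot, and plotting `fitted` against `resid` gives the residual plot. A curved quantile plot or a fan-shaped residual plot would point to a violated assumption.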
 
Cylovenom,

Thanks.
 
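It all depends on why you are estimating your model. If all you need is to obtain estimates of the model parameters, then you don't need to worry about the distribution of Y or about whether the error term is normal or has constant variance. (The errors do still need to have mean zero given the predictors for the estimates to be unbiased.)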
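
As a rough illustration of this point, the simulation below fits OLS many times to data whose errors are strongly skewed but have mean zero. The true coefficients, sample size, and the helper `ols_slope` are hypothetical choices for this sketch.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Closed-form OLS slope for y = b0 + b1*x."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(1)
true_b0, true_b1 = 1.0, 0.5          # hypothetical true parameters
x = [i / 10 for i in range(50)]      # fixed predictor values 0.0 .. 4.9

slopes = []
for _ in range(2000):
    # Exponential errors shifted to mean zero: skewed, clearly non-normal.
    y = [true_b0 + true_b1 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = fmean(slopes)
print(f"average estimated slope: {mean_slope:.3f} (true value {true_b1})")
```

The average of the estimated slopes stays close to the true slope even though neither the errors nor Y are normal.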

If you need your estimation to be efficient (minimum variance), then you need to worry about whether the error term has constant variance. If it does not, simple regression (OLS) still gives unbiased estimates, but they are no longer minimum variance and the usual standard errors are wrong.
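
The loss of efficiency can be seen in a small simulation. The sketch below compares OLS with weighted least squares (WLS), a technique not mentioned in the thread and used here only as a comparison. It assumes the error standard deviation grows in proportion to x and that the true weights are known, which is rarely the case in practice. All numbers and the helper `weighted_slope` are hypothetical.

```python
import random
from statistics import fmean, pvariance

def weighted_slope(x, y, w):
    """Weighted least-squares slope for y = b0 + b1*x; equal weights give OLS."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return num / den

rng = random.Random(2)
x = [1 + 0.2 * i for i in range(50)]       # predictor values 1.0 .. 10.8
sd = [0.3 * xi for xi in x]                # error sd grows with x (assumed)
w_ols = [1.0] * len(x)                     # equal weights: ordinary least squares
w_wls = [1.0 / s ** 2 for s in sd]         # inverse-variance weights (assumed known)

ols_slopes, wls_slopes = [], []
for _ in range(2000):
    y = [2.0 + 1.5 * xi + rng.gauss(0, si) for xi, si in zip(x, sd)]
    ols_slopes.append(weighted_slope(x, y, w_ols))
    wls_slopes.append(weighted_slope(x, y, w_wls))

print(f"OLS: mean {fmean(ols_slopes):.3f}, variance {pvariance(ols_slopes):.5f}")
print(f"WLS: mean {fmean(wls_slopes):.3f}, variance {pvariance(wls_slopes):.5f}")
```

Both estimators average out near the true slope of 1.5, but the OLS slopes scatter more widely from sample to sample, which is what "inefficient" means here.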

If you need to compute statistical tests based on your results, then you need to pay attention to whether Y is normally distributed given the predictors (equivalently, whether the error term is normal), as well as the other properties of the error term.
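
To sketch where normality enters, here is the usual t statistic for the slope in the one-predictor model. The hats, the sample size n, and the residual variance estimate s² are notation assumed for this sketch, not taken from the thread:

$$
\hat{b}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
s^2 = \frac{1}{n-2} \sum_i \left(Y_i - \hat{b}_0 - \hat{b}_1 X_i\right)^2,
\qquad
t = \frac{\hat{b}_1}{s \big/ \sqrt{\sum_i (X_i - \bar{X})^2}}
$$

When the errors are independent N(0, σ²) and the true slope is zero, t follows a t distribution with n − 2 degrees of freedom exactly. Without normality that holds only approximately in large samples, and with non-constant variance the denominator is not the right standard error at all.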
 
