Assumptions behind the OLS regression model?

musicgold · May 6, 2009

Hi,

In many statistics textbooks I read the following text: “A models based on ordinary linear regression equation models Y, the dependent variable, as a normal random variable, whose mean is linear function of the predictors, b0 + b1*X1 + ... , and whose variance is constant. While generalized linear models extend the linear model in two ways. First, assumption of linearity in the parameters is relaxed, by introducing the link function. Second, error distributions other than the normal can be modeled.”

My Stat teacher never bothered to explain these things to us. He started the regression lesson with the equation Y = b0 + b1 * X1, and an example based on the Weight and Height relation. He never talked about these assumptions about normality and the variance.

As a result for quite some time, I treated this equation was an identity, similar to Assets= Liability + Equity. I have never understood what difference those underlying assumptions make.

Can anybody please explain me why these assumptions are required for this model, and what happens to the result of this model if these assumptions are violated?

Thanks,

MG.

Pyrrhus · May 14, 2009

Hello,

The model is requires your residuals or errors to be independent and identically distributed as N(0,δ).

The reason for this assumptions is that for this model, a variability that fits is quantified by the model, the part that is explained by the variables + the part not explained (residuals).

The dependent variable is normally distributed because of the residuals.

If the assumptions are violated, then we cannot know if the model is able to predict part of the variability in the data.

You can check assumptions by using normal quantile plots (check normality) and residual plots (check constant variance).

musicgold · May 14, 2009

Cylovenom,

Thanks.

Enuma_Elish · May 19, 2009

It all depends on why you are estimating your model. If all you need is to obtain estimates of model parameters, then you don't need to worry about the distribution of Y or the properties of the error term.

If you need your estimation to be efficient (minimum variance), then you need to worry about whether the error term has constant variance -- if not, the simple regression (OLS) method will falter.

If you need to compute statistical tests based on your results, then you need to pay attention to whether Y is distributed normally as well as the properties of the error term.

Assumptions behind the OLS regression model?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Graduate Expected numbers of cards of a last color remaining

Undergrad How do E[X] and E[|X|] relate?

Undergrad The problem of points

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect