Assumptions behind the OLS regression model?

AI Thread Summary
The discussion centers on the assumptions behind the Ordinary Least Squares (OLS) regression model, particularly the requirement that residuals be independent and normally distributed with constant variance. These assumptions underpin the decomposition of variability into an explained part and an unexplained part, and they justify the usual statistical tests. Parameter estimates can still be computed when the assumptions are violated, but the estimates lose efficiency and the tests lose validity. Understanding these foundational concepts is crucial for effective application of regression analysis.
musicgold
Hi,

In many statistics textbooks I read the following text: “A model based on the ordinary linear regression equation models Y, the dependent variable, as a normal random variable whose mean is a linear function of the predictors, b0 + b1*X1 + ..., and whose variance is constant. Generalized linear models extend the linear model in two ways. First, the assumption of linearity in the parameters is relaxed by introducing the link function. Second, error distributions other than the normal can be modeled.”
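
To make the quoted point concrete, here is a minimal sketch (assuming Python with numpy and statsmodels, neither of which is mentioned in the thread): ordinary linear regression is the special case of a GLM with a normal error distribution and an identity link, so both fits return the same coefficient estimates.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)   # normal errors, constant variance
X = sm.add_constant(x)                      # adds the intercept column b0

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default

print(ols.params)  # [b0, b1] from least squares
print(glm.params)  # the same estimates from the GLM formulation
```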

My stats teacher never bothered to explain these things to us. He started the regression lesson with the equation Y = b0 + b1*X1 and an example based on the relation between weight and height. He never talked about these assumptions of normality and constant variance.

As a result, for quite some time I treated this equation as an identity, similar to Assets = Liabilities + Equity. I have never understood what difference those underlying assumptions make.

Can anybody please explain to me why these assumptions are required for this model, and what happens to the results of this model if these assumptions are violated?

Thanks,

MG.
 
Hello,

The model requires your residuals (errors) to be independent and identically distributed as N(0, σ²).

The reason for these assumptions is that the model decomposes the variability in the data into two parts: the part explained by the predictors and the part left unexplained (the residuals).

The dependent variable is then normally distributed (conditional on the predictors) because the residuals are.

If the assumptions are violated, then we cannot know whether the model is genuinely explaining part of the variability in the data, and the usual tests on the fit are no longer reliable.

You can check these assumptions using normal quantile (Q-Q) plots to check normality and residual plots to check constant variance, as in the sketch below.
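
Here is a minimal sketch of those checks (assuming Python with numpy, matplotlib, and statsmodels; none of these tools are named in the thread): fit an OLS model to synthetic data, then draw a Q-Q plot of the residuals and a residuals-versus-fitted plot. If the Q-Q points hug the reference line and the residual scatter forms a constant band around zero, the assumptions look reasonable.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 200)   # true model: b0=2, b1=0.5, iid N(0,1) errors

X = sm.add_constant(x)           # adds the intercept column b0
fit = sm.OLS(y, X).fit()
print(fit.params)                # estimated [b0, b1]

resid = fit.resid
fitted = fit.fittedvalues

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sm.qqplot(resid, line="45", fit=True, ax=axes[0])   # points near the line => roughly normal
axes[0].set_title("Normal Q-Q plot of residuals")
axes[1].scatter(fitted, resid, s=10)
axes[1].axhline(0, color="gray")
axes[1].set_title("Residuals vs fitted (look for constant spread)")
plt.show()
```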
 
Cylovenom,

Thanks.
 
It all depends on why you are estimating your model. If all you need is to obtain point estimates of the model parameters, then you don't need to worry about the distribution of Y or the properties of the error term; the least-squares estimates can be computed regardless.

If you need your estimation to be efficient (minimum variance), then you need to worry about whether the error term has constant variance -- if not, ordinary least squares is no longer the minimum-variance estimator.

If you need to compute statistical tests based on your results, then you need to pay attention to whether Y is distributed normally as well as the properties of the error term.
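
As a rough illustration of the last two points (a sketch assuming Python with numpy and statsmodels, which this reply does not mention): with errors whose variance grows with X, OLS still recovers the parameters, but the naive standard errors that feed the t-tests diverge from heteroskedasticity-robust ones.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x)  # error variance grows with x

X = sm.add_constant(x)
naive = sm.OLS(y, X).fit()                  # standard errors assume constant variance
robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust covariance

print("estimates:        ", naive.params)  # still close to (2.0, 0.5)
print("naive std errors: ", naive.bse)
print("robust std errors:", robust.bse)    # noticeably different under heteroskedasticity
```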
 