# Assumptions behind the OLS regression model?

• musicgold
In summary, the conversation discusses the differences between ordinary linear regression models and generalized linear models. The assumptions of linearity in the parameters and normality of the dependent variable are relaxed in the latter, allowing for a wider range of error distributions. This is important for estimating model parameters, ensuring efficiency, and conducting statistical tests. Violating these assumptions can affect the accuracy and usefulness of the model.

#### musicgold

Hi,

In many statistics textbooks I read the following text: “A model based on the ordinary linear regression equation models Y, the dependent variable, as a normal random variable whose mean is a linear function of the predictors, b0 + b1*X1 + ..., and whose variance is constant. Generalized linear models extend the linear model in two ways. First, the assumption of linearity in the parameters is relaxed by introducing the link function. Second, error distributions other than the normal can be modeled.”

My Stat teacher never bothered to explain these things to us. He started the regression lesson with the equation Y = b0 + b1 * X1, and an example based on the Weight and Height relation. He never talked about these assumptions about normality and the variance.

As a result, for quite some time I treated this equation as an identity, similar to Assets = Liabilities + Equity. I have never understood what difference those underlying assumptions make.

Can anybody please explain to me why these assumptions are required for this model, and what happens to the results of this model if these assumptions are violated?

Thanks,

MG.

Hello,

The model requires your residuals (errors) to be independent and identically distributed as N(0, σ²).

The reason for these assumptions is that the model splits the variability in the data into two parts: the part that is explained by the variables, plus the part that is not explained (the residuals).

For given values of the predictors, the dependent variable is normally distributed because the residuals are.
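Written out for the one-predictor case from the question, the step from normal errors to a normal dependent variable looks roughly like this. The symbols $\varepsilon_i$ (the error for observation $i$) and $\sigma^2$ (the constant error variance, i.e. the second argument of the normal distribution mentioned above) are notation assumed here, not taken from the original posts:

$$
Y_i = b_0 + b_1 X_{1i} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
\quad\Longrightarrow\quad
Y_i \mid X_{1i} \sim N\!\left(b_0 + b_1 X_{1i},\ \sigma^2\right).
$$

So the equation Y = b0 + b1 * X1 is not an identity: it describes only the mean of Y for a given X1, and the error term carries everything else.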

If the assumptions are violated, then we can no longer trust the usual tests of whether the model explains a meaningful part of the variability in the data.

You can check assumptions by using normal quantile plots (check normality) and residual plots (check constant variance).
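Here is a rough sketch of the numbers that sit behind those two plots, using only the Python standard library. The height/weight data are simulated, and the helper names (`fit_line`, `qq_correlation`, `spread_ratio`) are made up for this illustration. A Q-Q correlation close to 1 suggests roughly normal residuals, and a spread ratio close to 1 suggests roughly constant variance; neither replaces actually looking at the plots.

```python
import random
from statistics import NormalDist, mean, pstdev

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def qq_correlation(residuals):
    """Correlation between sorted standardized residuals and normal quantiles."""
    n = len(residuals)
    m, s = mean(residuals), pstdev(residuals)
    observed = sorted((r - m) / s for r in residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    t_bar, o_bar = mean(theoretical), mean(observed)
    cov = sum((t - t_bar) * (o - o_bar) for t, o in zip(theoretical, observed))
    var_t = sum((t - t_bar) ** 2 for t in theoretical)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    return cov / (var_t * var_o) ** 0.5

def spread_ratio(x, residuals):
    """Residual standard deviation for large x divided by that for small x."""
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    return pstdev(ordered[half:]) / pstdev(ordered[:half])

# Made-up height (cm) and weight (kg) data with normal, constant-variance errors.
rng = random.Random(0)
height = [rng.uniform(150, 195) for _ in range(200)]
weight = [-80 + 0.9 * h + rng.gauss(0, 5) for h in height]

b0, b1 = fit_line(height, weight)
resid = [w - (b0 + b1 * h) for h, w in zip(height, weight)]

print("Q-Q correlation:", round(qq_correlation(resid), 3))
print("spread ratio (upper half / lower half):", round(spread_ratio(height, resid), 3))
```

Plotting the sorted residuals against the theoretical quantiles gives the normal quantile plot, and plotting `resid` against `height` gives the residual plot.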

Cylovenom,

Thanks.

It all depends on why you are estimating your model. If all you need is to obtain estimates of model parameters, then you don't need to worry about the distribution of Y or the properties of the error term.

If you need your estimation to be efficient (minimum variance), then you need to worry about whether the error term has constant variance -- if not, the simple regression (OLS) method still gives unbiased estimates, but they no longer have the minimum variance.

If you need to compute statistical tests based on your results, then you need to pay attention to whether Y is distributed normally as well as the properties of the error term.
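The first two points can be illustrated with a small simulation. This is only a sketch under assumptions of my own: the true coefficients, the uniform errors whose spread grows with X, and the weights 1/X² are all invented for the example, and the errors are still assumed to have mean zero for every X. Under those conditions the OLS slope should average out to the true slope even though the errors are neither normal nor constant in variance, while a weighted fit that accounts for the changing variance should scatter less around it.

```python
import random
from statistics import mean, pvariance

TRUE_B0, TRUE_B1 = 1.0, 2.0   # hypothetical "true" coefficients
X = list(range(1, 31))        # fixed predictor values 1..30

def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line; equal weights give plain OLS."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return num / den

def simulate(reps=2000, seed=1):
    """Repeatedly draw data with non-normal, heteroscedastic errors and refit."""
    rng = random.Random(seed)
    ols_weights = [1.0] * len(X)
    wls_weights = [1.0 / xi ** 2 for xi in X]   # inverse of the error variance, up to a constant
    ols, wls = [], []
    for _ in range(reps):
        # Error is uniform on (-x, x): mean zero, not normal, spread grows with x.
        y = [TRUE_B0 + TRUE_B1 * xi + xi * rng.uniform(-1, 1) for xi in X]
        ols.append(weighted_slope(X, y, ols_weights))
        wls.append(weighted_slope(X, y, wls_weights))
    return ols, wls

ols_slopes, wls_slopes = simulate()
print("mean OLS slope:", round(mean(ols_slopes), 3))
print("mean WLS slope:", round(mean(wls_slopes), 3))
print("variance of OLS slope:", round(pvariance(ols_slopes), 4))
print("variance of WLS slope:", round(pvariance(wls_slopes), 4))
```

Both averages should land near the true slope of 2, which is the "estimates only" case. The gap between the two variances is the efficiency loss from ignoring the non-constant variance.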

## 1. What are the key assumptions behind the OLS regression model?

The key assumptions behind the OLS regression model are:

• Linearity: The relationship between the dependent variable and the independent variables should be linear in the parameters.
• Independence: The observations should be independent of each other.
• Normality: The residuals should follow a normal distribution.
• Homoscedasticity: The variance of the residuals should be constant across all values of the independent variables.
• No multicollinearity: The independent variables should not be highly correlated with each other.

## 2. What happens if the assumptions are violated?

If the assumptions behind the OLS regression model are violated, the estimated coefficients may be biased or inefficient, and the statistical tests based on them may be unreliable. If the true relationship is not linear, the model is misspecified and the coefficient estimates are biased. Violations of the independence assumption can result in biased standard errors and inaccurate confidence intervals. Heteroscedasticity leaves the coefficient estimates unbiased but makes them inefficient and the usual standard errors incorrect, while non-normal residuals mainly affect tests and confidence intervals in small samples. Multicollinearity inflates the variance of the coefficient estimates, making them unstable and unreliable.

## 3. How can the assumptions be checked?

The assumptions behind the OLS regression model can be checked through various diagnostic tests and graphical methods. The most commonly used diagnostic tests include the Durbin-Watson test for autocorrelation, the Breusch-Pagan test for heteroscedasticity, and the Jarque-Bera test for normality of residuals. Graphical methods, such as scatter plots and residual plots, can also be used to visually inspect the assumptions.
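As a minimal sketch of what the three test statistics named above compute, here they are written out by hand for a one-predictor model, using only the Python standard library. The data are simulated and the function names are invented for this example; the Breusch-Pagan statistic shown is the studentized (Koenker) variant, n times the R² from regressing the squared residuals on the predictor. In practice a statistics package would also supply p-values; as rough reference points, a Durbin-Watson value near 2 indicates little first-order autocorrelation, and the 5% chi-square critical values are about 5.99 (2 degrees of freedom, Jarque-Bera) and 3.84 (1 degree of freedom, Breusch-Pagan with one predictor).

```python
import random
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(x, y):
    """Share of the variance of y explained by the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = mean(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def durbin_watson(e):
    """Sum of squared successive differences over sum of squares (order matters)."""
    return sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / sum(v * v for v in e)

def jarque_bera(e):
    """n/6 * (skewness^2 + (kurtosis - 3)^2 / 4) from the sample moments."""
    n, m = len(e), mean(e)
    m2 = sum((v - m) ** 2 for v in e) / n
    m3 = sum((v - m) ** 3 for v in e) / n
    m4 = sum((v - m) ** 4 for v in e) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

def breusch_pagan_lm(x, e):
    """Studentized (Koenker) variant: n * R^2 from regressing e^2 on x."""
    return len(e) * r_squared(x, [v * v for v in e])

def diagnostics(x, y):
    """Fit the line, then compute all three statistics on its residuals."""
    b0, b1 = fit_line(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return {
        "durbin_watson": durbin_watson(e),
        "jarque_bera": jarque_bera(e),
        "breusch_pagan_lm": breusch_pagan_lm(x, e),
    }

# Simulated data: one well-behaved response, one whose error spread grows with x.
rng = random.Random(42)
x = [rng.uniform(1, 10) for _ in range(200)]
y_ok = [3 + 2 * xi + rng.gauss(0, 1) for xi in x]
y_het = [3 + 2 * xi + rng.gauss(0, xi) for xi in x]

print("well-behaved:", diagnostics(x, y_ok))
print("heteroscedastic:", diagnostics(x, y_het))
```

Note that the Durbin-Watson statistic is only meaningful when the observations have a natural order, such as time.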

## 4. What can be done if the assumptions are violated?

If the assumptions behind the OLS regression model are violated, there are several steps that can be taken to address the issue. These include transforming the variables to meet the assumption (e.g. taking logarithms), using a different regression model that is more appropriate for the data, or removing outliers or other influential observations. In some cases, it may also be necessary to collect more data or revise the research question.
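To make the logarithm example concrete, here is a small sketch with simulated data, where the coefficients 0.5 and 0.3 and the noise level are assumptions chosen purely for illustration. The response grows exponentially with multiplicative noise, so a straight line fits it poorly on the original scale, but log(y) is linear in x with constant-variance normal errors, and a plain OLS fit on the log scale should recover the assumed coefficients reasonably well.

```python
import math
import random
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Simulated data: log(y) = 0.5 + 0.3 * x + normal noise (all values assumed).
rng = random.Random(3)
x = [i / 5 for i in range(1, 51)]
y = [math.exp(0.5 + 0.3 * xi + rng.gauss(0, 0.2)) for xi in x]

# Fit on the log scale, where the linear model with constant variance holds.
log_b0, log_b1 = fit_line(x, [math.log(yi) for yi in y])
print("intercept on log scale:", round(log_b0, 3))
print("slope on log scale:", round(log_b1, 3))
```

The coefficients then describe log(y), so the slope is read as an approximate proportional change in y per unit of x rather than an absolute change.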

## 5. Is it possible to use the OLS regression model if the assumptions are violated?

In some cases, it may be possible to still use the OLS regression model even if the assumptions are violated. However, the results and conclusions drawn from the model should be interpreted with caution, as the violations can affect the accuracy and reliability of the estimated coefficients and confidence intervals. It is important to carefully assess the impact of the violations on the results and consider alternative models if necessary.