Normality of errors and residuals in ordinary linear regression

  • #1
fog37
TL;DR Summary
Checking for normality of errors and residuals in ordinary linear regression
Hello,
In reviewing the classical linear regression assumptions, one of the assumptions is that the residuals have a normal distribution... I also read that this assumption is not very critical and the residuals don't really have to be Gaussian.
That said, the figure below shows ##Y## values and their residuals with a normal distribution of equal variance at each ##X## value:

[Figure: regression line with normal distributions of equal variance superimposed at each ##X## value]


To check for residual normality, should we check the distribution of the residuals at each ##X## value (not very practical)? Instead, we usually plot a histogram of ALL the residuals across the different ##X## values... But that is not quite what the assumption is about (normality of the residuals at each predictor value ##X##)...

Thank you...
 

  • #2
fog37 said:
one of the assumptions is that the residuals have a normal distribution...I also read that this assumption is not very critical
Critical for what? You should probably be careful about any probability or confidence intervals that come from a model where the random term is not normal.
fog37 said:
and the residuals don't really have to be Gaussian.
There are glaring and common examples that violate that assumption. If all the ##Y## must be positive, then a lot of the negative normal tail might be missing. If the random variance is a percentage of the ##Y## values, then a log transformation should be looked at.
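A minimal sketch of that log-transformation idea, on simulated data where the noise scale is proportional to ##Y## (numpy and statsmodels are assumed; this is an illustration, not anything from the original post):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data where the noise scale grows with Y (multiplicative error):
# Y = exp(b0 + b1*x) * exp(eps), with eps ~ N(0, sigma^2)
x = np.linspace(1, 10, 200)
y = np.exp(0.5 + 0.3 * x) * np.exp(rng.normal(0, 0.2, size=x.size))

# Regressing log(Y) on x turns the multiplicative error into an additive,
# roughly constant-variance normal error, as the linear model assumes.
X = sm.add_constant(x)
fit = sm.OLS(np.log(y), X).fit()
print(fit.params)        # estimates of b0 and b1 on the log scale
print(fit.resid.std())   # residual spread should be close to 0.2
```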
fog37 said:
That said, the figure below shows ##Y## values and their residuals with a normal distribution of equal variance at each ##X## value:


To check for residual normality, should we check the distribution of the residuals at each ##X## value (not very practical)? Instead, we usually plot a histogram of ALL the residuals across the different ##X## values... But that is not quite what the assumption is about (normality of the residuals at each predictor value ##X##)...
True. A lot depends on the subject matter expertise of the statistician. Does he have a valid reason to model the subject as a linear model with a random normal term?
 
  • #3
The assumption of a Gaussian error structure is not part of the basic regression assumptions. If that assumption is added, then things like the distributions of the least-squares estimates are exact rather than approximate, as they are without it.

When the Gaussian assumption is made, it is this: the error terms are i.i.d. normal with mean 0 and variance ##\sigma^2##. This links to your picture of the bell curves superimposed on the regression line as follows:
- in this case ##Y_1## through ##Y_n## are each normally distributed, with ##Y_i## having mean ##b_0 + b_1 x_i## and variance ##\sigma^2##
- the bell curves on the regression plot don't show the distribution of the errors; they show the normal distribution of the ##Y## values at each ##x##

This leads to your question: we don't need to check the error distribution separately at each value of ##x##, because those values don't influence the error distribution. The errors are, as I mentioned above, i.i.d. with mean 0 and constant variance, so the usual checks on the pooled residuals (a sketch is below) are what we use.
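A minimal sketch of that pooled check on simulated data (numpy, matplotlib, and scipy are assumed; none of this comes from the thread's attachment):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)

# Toy data satisfying the Gaussian assumption: Y_i = b0 + b1*x_i + eps_i,
# with eps_i i.i.d. N(0, sigma^2) regardless of the value of x_i.
x = rng.uniform(0, 10, 300)
y = 2.0 + 0.7 * x + rng.normal(0, 1.5, size=x.size)

# Ordinary least-squares fit and the pooled residuals.
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

# Because the errors are assumed i.i.d., one histogram / Q-Q plot of ALL
# residuals is the standard check; no per-x check is needed.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.hist(resid, bins=25)
ax1.set_title("Histogram of pooled residuals")
stats.probplot(resid, dist="norm", plot=ax2)
plt.show()
```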

You should also remember that no real data are truly normally distributed; normality is an ideal, and our checks are simply meant to see whether the distribution of the collected data is close enough to that ideal to justify normal-based calculations.
 
  • #4
I believe Cochran's theorem is what justifies the distribution of the associated ANOVA statistics.
 

FAQ

1. Why is the normality of errors important in linear regression?

The normality of errors is important in linear regression because it underpins many of the statistical tests used for hypothesis testing, such as assessing the significance of coefficients. When the error terms are normally distributed, the t-tests and F-tests used in evaluating the regression model are exactly valid. This assumption allows for more reliable and interpretable results in inferential statistics.

2. How can you check if the residuals are normally distributed?

There are several methods to check the normality of residuals in a regression model. Visual methods like Q-Q plots or histograms can provide a good overview. Analytically, tests such as the Shapiro-Wilk test, Anderson-Darling test, or Kolmogorov-Smirnov test can be used to formally assess the normality of residuals. These methods help in understanding whether the residuals deviate significantly from a normal distribution.
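As a rough illustration, here is how those formal tests can be run on a vector of residuals with scipy.stats (the residuals here are just simulated stand-ins for residuals from a fitted model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
resid = rng.normal(0, 1, size=200)   # stand-in for residuals from a fitted model

# Shapiro-Wilk: a small p-value suggests the residuals are not normal.
w_stat, p_value = stats.shapiro(resid)
print(f"Shapiro-Wilk: W={w_stat:.3f}, p={p_value:.3f}")

# Anderson-Darling: compare the statistic against the tabulated critical values.
ad = stats.anderson(resid, dist="norm")
print("Anderson-Darling:", ad.statistic, ad.critical_values)

# Kolmogorov-Smirnov against a normal fitted to the residuals.
ks_stat, ks_p = stats.kstest(resid, "norm", args=(resid.mean(), resid.std(ddof=1)))
print(f"Kolmogorov-Smirnov: D={ks_stat:.3f}, p={ks_p:.3f}")
```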

3. What are the implications of non-normal residuals in a regression analysis?

Non-normal residuals can indicate potential issues with the regression model such as the presence of outliers, incorrect model specification, or heteroscedasticity. This can lead to biased or inefficient estimates and can affect the reliability of confidence intervals and hypothesis tests. It is essential to address these issues to improve model accuracy and the validity of conclusions drawn from the analysis.

4. How can you address non-normality in residuals?

Addressing non-normality in residuals can be approached by either transforming the dependent variable (e.g., using a log transformation if the data is right-skewed) or by using different estimation techniques like robust regression or generalized linear models. Additionally, re-examining the model for omitted variable bias or incorrect functional forms can also help in resolving issues related to non-normal residuals.
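A sketch of the robust-regression route, comparing ordinary least squares with a Huber-weighted fit in statsmodels on simulated heavy-tailed data (purely illustrative, not a prescription):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Data with heavy-tailed (non-normal) errors.
x = np.linspace(0, 10, 200)
X = sm.add_constant(x)
y = 1.0 + 0.5 * x + rng.standard_t(df=2, size=x.size)

# Ordinary least squares vs. a robust fit (Huber's T), which down-weights
# the large residuals produced by the heavy tails.
ols_fit = sm.OLS(y, X).fit()
rlm_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("OLS coefficients:   ", ols_fit.params)
print("Robust coefficients:", rlm_fit.params)
```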

5. Does non-normality of residuals invalidate the regression model?

Non-normality of residuals does not necessarily invalidate a regression model, especially for prediction purposes. However, it may affect the interpretation and the inference based on the model. For large sample sizes, the Central Limit Theorem often ensures that the parameter estimates are approximately normally distributed even if the residuals are not. Nonetheless, it is important to investigate and possibly correct for non-normality when the primary focus is on inference or when sample sizes are small.
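A small simulation along these lines: even with strongly skewed (exponential) errors, the sampling distribution of the least-squares slope is close to normal. The setup below assumes numpy and scipy and uses made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 100)
slopes = []

# Repeatedly fit the same model with skewed, mean-zero errors and collect
# the slope estimates to see their sampling distribution.
for _ in range(2000):
    y = 1.0 + 0.5 * x + (rng.exponential(1.0, size=x.size) - 1.0)
    slope, intercept = np.polyfit(x, y, 1)
    slopes.append(slope)

# Even though each error term is far from normal, the slope estimates are
# approximately normally distributed (checked here with Shapiro-Wilk).
print(stats.shapiro(np.array(slopes)))
```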
