Normality of errors and residuals in ordinary linear regression

  • Context: Undergrad
  • Thread starter: fog37
  • Tags: Errors

Discussion Overview

The discussion centers on the assumption of normality of errors and residuals in ordinary linear regression. Participants explore the implications of this assumption, its criticality, and the practical aspects of checking for normality in residuals, including the use of visualizations and theoretical justifications.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants note that while the normal distribution of residuals is a common assumption in linear regression, it may not be critical for all applications, particularly concerning the validity of probability or confidence intervals.
  • Others argue that there are significant examples where the normality assumption is violated, such as when the dependent variable must be positive, suggesting that transformations like log transformation may be necessary.
  • One participant clarifies that the Gaussian error structure is not part of the basic regression assumptions and that if this assumption is made, it allows for exact distributions of least squares estimates rather than approximate ones.
  • Another participant emphasizes that the error terms are independent and identically distributed (i.i.d) normal with mean 0 and constant variance, which means that checking the error distribution for each value of the predictor variable is unnecessary.
  • A later reply introduces Cochran's theorem as a justification for the distribution of associated ANOVA statistics, suggesting a connection to the broader statistical framework.

Areas of Agreement / Disagreement

Participants express differing views on the criticality of the normality assumption in regression analysis, with some suggesting it is not essential while others highlight its importance for certain statistical inferences. The discussion remains unresolved regarding the necessity and implications of this assumption.

Contextual Notes

Participants mention various conditions under which the normality assumption may be violated, such as the nature of the dependent variable and the potential need for transformations. There is also an acknowledgment that true normality is an idealization, and practical checks are aimed at assessing the similarity of collected data to this ideal.

fog37
TL;DR
Checking for normality of errors and residuals in ordinary linear regression
Hello,
In reviewing the classical linear regression assumptions, one of the assumptions is that the residuals have a normal distribution... I also read that this assumption is not very critical and that the residuals don't really have to be Gaussian.
That said, the figure below shows ##Y## values and their residuals, with a normal distribution of equal variance at each ##X## value:

[Figure: normal distributions of equal variance centered on the regression line at each ##X## value]


To check for residual normality, should we check the distribution of the residuals at each ##X## value (not very practical)? Instead, we usually plot a histogram of ALL the residuals across the different ##X## values... But that is not what the assumption is about (normality of the residuals at each predictor value ##X##)...

Thank you...
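Not part of the original post, but here is a minimal Python sketch of the usual pooled-residual diagnostics being asked about: fit the line by least squares, then look at a histogram, a normal Q-Q plot, and a Shapiro-Wilk test of all the residuals together. The simulated data, variable names, and the choice of NumPy/SciPy/Matplotlib are illustrative assumptions, not something from the thread.

```python
# Minimal sketch of the pooled-residual normality check (simulated data).
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=200)   # true model with i.i.d. N(0, 1) errors

# Least-squares fit and residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Pooled diagnostics: one histogram and one Q-Q plot for ALL residuals
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(residuals, bins=20)
ax1.set_title("Histogram of residuals")
stats.probplot(residuals, dist="norm", plot=ax2)   # Q-Q plot against the normal
plt.show()

print(stats.shapiro(residuals))   # formal normality test on the pooled residuals
```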
 

fog37 said:
one of the assumptions is that the residuals have a normal distribution...I also read that this assumption is not very critical
Critical for what? You should probably be careful about any probability or confidence intervals that come from a model where the random term is not normal.
fog37 said:
and the residual don't really have to be Gaussian.
There are glaring and common examples that violate that assumption. If all the ##Y## must be positive, then a lot of the negative normal tail might be missing. If the random variance is a percentage of the ##Y## values, then a log transformation should be looked at.
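To make the log-transform suggestion concrete, here is a hedged Python sketch (the simulated data and all names are my own, not from the thread): when ##Y## is positive and its noise is a percentage of its level, fitting log(Y) against ##X## gives residuals with roughly constant variance and a roughly normal shape.

```python
# Hedged sketch of the log-transform idea: multiplicative (percentage) noise on a
# positive Y becomes additive, roughly constant-variance noise on the log scale.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=300)
# Y > 0 with multiplicative (percentage-style) noise
y = 5.0 * np.exp(0.3 * x) * rng.lognormal(mean=0.0, sigma=0.2, size=300)

# Fit on the raw scale: residual spread grows with the level of Y
b1, b0 = np.polyfit(x, y, 1)
resid_raw = y - (b0 + b1 * x)

# Fit on the log scale: log Y = log 5 + 0.3 x + N(0, 0.2^2), so OLS assumptions hold
c1, c0 = np.polyfit(x, np.log(y), 1)
resid_log = np.log(y) - (c0 + c1 * x)

print("raw-scale residual SD, low vs high x:",
      resid_raw[x < 5].std(ddof=1), resid_raw[x >= 5].std(ddof=1))
print("log-scale residual SD, low vs high x:",
      resid_log[x < 5].std(ddof=1), resid_log[x >= 5].std(ddof=1))
```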
fog37 said:
That said, the figure below show ##Y## values and their residuals with a normal distribution of equal variance at the ##X## value:


To check for residual normality, should we check the distribution of residuals at each ##X## (not very practical)? Instead, we usually plot a histogram of ALL the residuals at different X values...But that is not what the assumption is about (normality of residuals for each predictor ##X## value)...
True. A lot depends on the subject matter expertise of the statistician. Does he have a valid reason to model the subject as a linear model with a random normal term?
 
The assumption of a Gaussian error structure is not part of the basic regression assumptions. If that assumption is added, then things like the distributions of the least-squares estimates are exact rather than approximate, as they are without it.

When the Gaussian assumption is made it is this: the error terms are i.i.d. normal with mean 0 and variance ##\sigma^2##. This links to your picture of the bell curves superimposed on the regression line as follows:
- in this case ##Y_1## through ##Y_n## are each normally distributed, with ##Y_i## having mean ##\beta_0 + \beta_1 x_i## and variance ##\sigma^2##
- the bell curves on the regression plot don't show the distribution of the errors; they show the normal distribution of the ##Y## values at each ##x_i##

This leads to your question: we don't need to check the error distribution at each value of ##x##, because those values don't influence the error distribution. The errors are, as mentioned above, i.i.d. with mean 0 and constant variance, so the usual checks on the pooled residuals work.

You should also remember this: no real data are truly normally distributed. Normality is an ideal, and our checks are done simply to see whether our collected data's distribution is similar enough to that ideal to allow us to use normal-based calculations.
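As an illustration of the point about not checking each ##x## separately, here is a small simulation sketch (the Python/NumPy/SciPy code and settings are my own assumptions, not from the post): when the errors really are i.i.d. normal, residuals taken from different parts of the ##x## range look like draws from the same distribution, so pooling them for the usual checks is legitimate.

```python
# Illustration: under i.i.d. errors, residuals from different x ranges share one
# distribution, which is why a single pooled check is enough.
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 3.0, size=n)   # i.i.d. N(0, 3^2) errors

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Compare residuals from the lower and upper halves of the x range:
# under i.i.d. errors they are draws from the same distribution.
lower, upper = residuals[x < 5], residuals[x >= 5]
print(stats.ks_2samp(lower, upper))          # two-sample test: no x-dependence expected
print(lower.std(ddof=1), upper.std(ddof=1))  # similar spreads in both halves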
 
I believe Cochran's theorem is what justifies the distribution of the associated ANOVA statistics.
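For concreteness, a hedged Monte Carlo sketch of that claim (the Python code and simulation settings are my own assumptions): under normal errors and the null hypothesis of zero slope, Cochran's theorem gives ##SSE/\sigma^2## a chi-square distribution with ##n-2## degrees of freedom, which makes the ANOVA statistic ##F = MSR/MSE## follow an ##F(1, n-2)## distribution.

```python
# Monte Carlo check: under H0 (slope = 0) with normal errors, the ANOVA F statistic
# from simple linear regression should follow F(1, n - 2).
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)
n, sigma, reps = 30, 2.0, 5000
x = np.linspace(0, 10, n)
f_stats = np.empty(reps)

for i in range(reps):
    y = 4.0 + rng.normal(0, sigma, size=n)        # H0 true: no x effect
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ssr = np.sum((fitted - y.mean()) ** 2)        # regression sum of squares, 1 df
    sse = np.sum((y - fitted) ** 2)               # error sum of squares, n - 2 df
    f_stats[i] = ssr / (sse / (n - 2))            # ANOVA F statistic

# Simulated F statistics should match the F(1, n - 2) reference distribution.
print(stats.kstest(f_stats, stats.f(dfn=1, dfd=n - 2).cdf))
```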
 
