Why Divide by n-2 in Least-Squares Error Variance Calculation?

  • Context: Undergrad 
  • Thread starter: Cyrus
  • Tags: Error Variance

Discussion Overview

The discussion revolves around the rationale for dividing by n-2 in the calculation of least-squares error variance in the context of simple linear regression. Participants explore the implications of degrees of freedom in statistical modeling and the assumptions underlying the calculations.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • Some participants suggest that dividing by n-2 accounts for the degrees of freedom lost when fitting a line to data, as two parameters (slope and intercept) are estimated.
  • Others argue that the validity of this approach depends on the statistical independence of errors, noting that low frequency noise can further reduce the effective degrees of freedom.
  • A participant questions the origin of the n-2 degrees of freedom, prompting references to the textbook for clarification on the derivation.
  • Another participant explains that the total variability in Y can be attributed to both deterministic and random components, and emphasizes the importance of degrees of freedom in calculating sums of squares.

Areas of Agreement / Disagreement

Participants express varying levels of understanding regarding the concept of degrees of freedom in this context. While some agree on the necessity of dividing by n-2, the discussion reveals uncertainty about the implications of statistical independence and the effects of noise on the degrees of freedom.

Contextual Notes

Participants reference specific sections of the textbook for derivations, indicating that the discussion may hinge on interpretations of those materials. There are also mentions of assumptions regarding error independence and the impact of noise, which remain unresolved.

Who May Find This Useful

This discussion may be useful for students and practitioners in statistics, particularly those interested in regression analysis and the underlying assumptions of statistical models.

Cyrus
In the textbook it says this:

http://img6.imageshack.us/img6/1896/imgcxv.jpg

Where does this hocus pocus 'it turns out that dividing by n-2 rather than n appropriately compensates for this' come from?
 
Cyrus said:
In the textbook it says this:

http://img6.imageshack.us/img6/1896/imgcxv.jpg

Where does this hocus pocus 'it turns out that dividing by n-2 rather than n appropriately compensates for this' come from?

You divide by n-2 because you only have n-2 degrees of freedom. Are you by chance doing a least-squares fit of a line, where you need two points to determine a line?

Anyway, the result is only valid if your errors are statistically independent. If there is low-frequency noise then you have even fewer than n-2 effective degrees of freedom.
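(Context added here, not from the thread: the precise statement behind "appropriately compensates" is that, for the simple linear model with independent errors, [tex]E[SSE] = (n-2)\sigma^2[/tex], so [tex]SSE/(n-2)[/tex] is an unbiased estimator of [tex]\sigma^2[/tex]. A quick simulation makes this visible; the sketch below is a minimal Python/NumPy script of my own, fitting a line to noisy data many times and comparing the averages of SSE/n and SSE/(n-2) against the true variance.)

[code]
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, trials = 10, 4.0, 20_000
x = np.linspace(0.0, 1.0, n)

sse_over_n = np.empty(trials)
sse_over_nm2 = np.empty(trials)
for i in range(trials):
    # independent Gaussian errors around a known line
    y = 1.5 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1, b0 = np.polyfit(x, y, 1)            # least-squares slope, intercept
    sse = np.sum((y - (b0 + b1 * x)) ** 2)  # sum of squared residuals
    sse_over_n[i] = sse / n
    sse_over_nm2[i] = sse / (n - 2)

print(sse_over_n.mean())    # ~ (n-2)/n * sigma2 = 3.2: dividing by n is biased low
print(sse_over_nm2.mean())  # ~ sigma2 = 4.0: dividing by n-2 is unbiased
[/code]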
 
John Creighto said:
You divide by n-2 because you only have n-2 degrees of freedom. Are you by chance doing a least-squares fit of a line, where you need two points to determine a line?

Anyway, the result is only valid if your errors are statistically independent. If there is low-frequency noise then you have even fewer than n-2 effective degrees of freedom.

Why do I have n-2 degrees of freedom?
 
Cyrus said:
Why do I have n-2 degrees of freedom?

The book says that the formula you are questioning is derived in equation (7.21) of section 7.2; maybe post that part if you are confused.

You are trying to fit a line to the data. There are two points required to define a line, and you have n data points. The number of degrees of freedom is equal to the number of data points minus the number of points you need to define the curve, so it is n-2.
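(An illustration of this counting, not from the thread: with exactly n = 2 data points the least-squares line passes through both of them, so every residual is zero and n - 2 = 0 degrees of freedom are left over for estimating the error variance. A minimal Python/NumPy sketch:)

[code]
import numpy as np

# Two points determine the line exactly, so the fit interpolates them:
# every residual is zero and n - 2 = 0 degrees of freedom remain for
# estimating the error variance.
x = np.array([0.0, 1.0])
y = np.array([3.0, 5.0])                # any two distinct points will do
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)
print(residuals)                        # zero up to round-off: SSE = 0
[/code]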
 
In simple linear regression two things go on.
First, you are expressing the mean value of [tex]Y[/tex] as a linear function; this essentially says you are splitting [tex]Y[/tex] itself into two pieces, a deterministic piece (the linear term) and a random piece (the error term)

[tex] Y = \underbrace{\beta_0 + \beta_1 x}_{\text{Deterministic}} +\overbrace{\varepsilon}^{\text{Random}}[/tex]

When it comes to the ANOVA table, this also means that the total variability in [tex]Y[/tex] can be attributed to those same two sources: the deterministic portion and the random portion. It is customary to express this with the sums of squares first. The basic notation used is

[tex] \begin{align*}
SSE &= \sum (y-\widehat y)^2 \\
SST &= \sum (y-\overline y)^2 \\
SSR &= SST - SSE = \sum (\overline y - \widehat y)^2
\end{align*}[/tex]

here
SST is the numerator of the usual sample variance of [tex]Y[/tex] - think of it as measuring the variability around the sample mean
SSE is the sum of the squared residuals - think of it as measuring the variability around the regression line (which is itself just another way of modeling the mean value of [tex]Y[/tex])
SSR measures the difference between the sample mean and the values predicted by the regression line

Every time you measure variability with a sum of squares like these, you have to worry about the appropriate degrees of freedom. Mathematically, the degrees of freedom also add - just like the sums of squares do.

The ordinary sample variance has [tex]n - 1[/tex] degrees of freedom. One way to think of it: in order to calculate it, you must first have made [tex]1[/tex] estimate from the data, the sample mean. Thinking the same way, [tex]SSE[/tex] must have [tex]n - 2[/tex] degrees of freedom, since its calculation requires two estimates - the slope and the intercept.
This leaves [tex]1[/tex] degree of freedom for [tex]SSR[/tex], matching [tex](n-1) = (n-2) + 1[/tex].
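(A numerical check of this additivity, not part of the original post: a minimal Python/NumPy sketch that verifies SST = SSE + SSR on simulated data and computes the resulting variance estimate SSE/(n-2).)

[code]
import numpy as np

rng = np.random.default_rng(1)
n = 25
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # Y = beta0 + beta1*x + error

b1, b0 = np.polyfit(x, y, 1)                  # least-squares slope, intercept
y_hat = b0 + b1 * x
y_bar = y.mean()

sse = np.sum((y - y_hat) ** 2)        # around the regression line: n-2 df
sst = np.sum((y - y_bar) ** 2)        # around the sample mean:     n-1 df
ssr = np.sum((y_hat - y_bar) ** 2)    # explained by the line:      1 df

print(sse + ssr, sst)                 # equal up to round-off: SST = SSE + SSR
print(sse / (n - 2))                  # mean squared error, estimates sigma^2 = 1
[/code]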
 
