Why Standard Deviation is Calculated Differently: Explained

Discussion Overview

The discussion centers on the different formulas for calculating standard deviation, specifically the use of n versus n-1 in the denominator. Participants explore the implications of these choices in the context of statistical theory and normal distribution.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Conceptual clarification

Main Points Raised

  • One participant questions why standard deviation is calculated using the formula \(\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}}\) instead of alternatives like \(\sigma =\frac{\sum_{i=1}^{n}\left| \overline{x}-x_{i}\right| }{n}\) or \(\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}}\), suggesting a connection to the normal distribution.
  • Another participant explains that the root-mean-square is used because it aligns with the usual formula for distance, and while other distance measures can be used, they would complicate the formulas associated with the normal distribution.
  • A different viewpoint emphasizes that the formula with n-1 is for a sample from an infinite population, while using n is appropriate for a finite population, highlighting the concept of an "unbiased estimator" and the increased uncertainty when sampling.
  • One participant notes that using squares for distance measurement simplifies manipulation and relates the standard deviation to moments about the mean.
  • Further inquiries are made about the definitions of unbiased estimators and maximum likelihood estimators, as well as the implications of using n versus n-1 in the context of normal distribution and z-scores.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of using n versus n-1 in standard deviation calculations, and the discussion remains unresolved regarding the implications of these choices in statistical analysis.

Contextual Notes

Participants mention technical reasons for using n-1, including its role as an unbiased estimator, but do not fully resolve the implications of this choice or the definitions of related statistical concepts.

phoenixthoth
Why is the standard deviation [tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}}[/tex] and not [tex]\sigma =\frac{\sum_{i=1}^{n}\left| \overline{x}-x_{i}\right| }{n}[/tex] or at least [tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}}[/tex]?

I suppose that it has something to do with the normal distribution but I'm not sure in what way.

Thanks for the input.
 
First, the "root-mean-square" is used because it coincides with our usual formula for the distance between two points. "Standard deviation" is basically a distance measure.
That said, the sum of the absolute values of the coordinate differences CAN be used as a distance (as can the "max of the absolute values of the coordinate differences") and can be used as a measure of "standard deviation". Since it's not the usual one, all your formulas (including that for the normal distribution) would have to be changed, and that's a pain.
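
To make the comparison concrete, here is a minimal sketch that computes three spread measures for one small data set: the mean absolute deviation, the root-mean-square deviation (dividing by n), and the maximum absolute deviation. The function names and the data are made up for illustration and do not come from the thread.

```python
import math

def mean_abs_deviation(xs):
    """Average of |x - mean| (the absolute-value 'distance')."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def rms_deviation(xs):
    """Square root of the average of (x - mean)^2, dividing by n."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def max_abs_deviation(xs):
    """Largest |x - mean| (the 'max' distance)."""
    m = sum(xs) / len(xs)
    return max(abs(x - m) for x in xs)

# Illustrative data only; the mean is 5.0
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean_abs_deviation(data))  # 1.5
print(rms_deviation(data))       # 2.0
print(max_abs_deviation(data))   # 4.0
```

All three are legitimate measures of spread, but they give different numbers for the same data, which is why formulas built around one of them cannot simply be reused with another.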

Secondly, the formula [tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}}[/tex] is for the standard deviation of a SAMPLE from some infinite population.

[tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}}[/tex] is correct for the standard deviation of a finite population.

There are technical reasons for the "n-1" (it gives an "unbiased estimator") but I like to think of it as just making the "spread" a little larger to reflect the fact that, since we are using a sample, not the entire population, we have more uncertainty.
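
One way to see what "unbiased" means here is a small simulation. The sketch below repeatedly draws samples of size n from a normal population with known variance, computes the squared-deviation estimate with both n and n-1 in the denominator, and averages each over many trials. The sample size, sigma, trial count, seed, and function names are all arbitrary choices for illustration.

```python
import random

def sum_squared_deviations(xs):
    """Sum of (x - sample mean)^2 over the sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def average_variance_estimates(n, sigma, trials, seed=0):
    """Average the /n and /(n-1) variance estimates over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ss = sum_squared_deviations(sample)
        total_n += ss / n
        total_n_minus_1 += ss / (n - 1)
    return total_n / trials, total_n_minus_1 / trials

# True population variance is sigma^2 = 4.0
biased, unbiased = average_variance_estimates(n=5, sigma=2.0, trials=20000)
print("average with n:  ", biased)
print("average with n-1:", unbiased)
```

With these settings the n-1 average should land close to the true variance of 4.0, while the n average should land close to 4.0 × (n-1)/n = 3.2, that is, systematically too small. Note that the unbiasedness applies to the variance estimate; taking the square root to get a standard deviation still leaves a small bias.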
 
Also, using squares to measure distance, instead of absolute values, tends to be significantly easier to manipulate. It also places the standard deviation (ok, ok, I mean the variance) among a class of things called the moments about the mean, where you use ^k for any positive k, instead of simply ^2.
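
As a rough sketch of the "moments about the mean" idea, the function below computes the k-th central moment of a data set, dividing by n. With k = 2 it is the population variance. The function name and data are illustrative assumptions.

```python
def central_moment(xs, k):
    """k-th moment about the mean: average of (x - mean)^k, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

# Illustrative data only; the mean is 5.0
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(central_moment(data, 1))  # 0.0  (first central moment is always zero)
print(central_moment(data, 2))  # 4.0  (population variance)
print(central_moment(data, 3))  # 5.25 (positive: the data are skewed to the right)
```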
 
Thanks for your input. Can you go over what an unbiased estimator is and what maximum likelihood estimators are? I'd really appreciate it. Thanks.
 
And also the use of n vs n-1 with the normal distribution. If I wanted to assume the data were normally distributed and use z-scores to estimate the probability of data lying within a range, which do I use, n or n-1, and why? Thanks again.
 
