Why Standard Deviation is Calculated Differently: Explained

In summary, the root-mean-square form of the standard deviation is used because it coincides with our usual formula for the distance between two points: "standard deviation" is basically a distance measure. The sum of the absolute values of the coordinate differences CAN be used as a distance (as can the "max of absolute value of coordinate differences"), and either can be used as a measure of spread. Since neither is the usual one, all your formulas (including those that come from the normal distribution) would have to be changed, and that's a pain.
  • #1
phoenixthoth
Why is the standard deviation [tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}}[/tex] and not [tex]\sigma =\frac{\sum_{i=1}^{n}\left| \overline{x}-x_{i}\right| }{n}[/tex] or at least [tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}}[/tex]?

I suppose that it has something to do with the normal distribution but I'm not sure in what way.

Thanks for the input.
 
  • #2
First, the "root-mean-square" is used because it coincides with our usual formula for the distance between two points. "Standard deviation" is basically a distance measure.
That said, the sum of the absolute values of the coordinate differences CAN be used as a distance (as can the "max of absolute value of coordinate differences"), and either can be used as a measure of spread in place of the "standard deviation". Since neither is the usual one, all your formulas (including those that come from the normal distribution) would have to be changed, and that's a pain.
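
As a rough illustration of the distance idea (a sketch, not part of the original reply), the three measures mentioned above can be computed side by side. The data values and the function names `rms_spread`, `mean_abs_spread`, and `max_abs_spread` are made up for this example.

```python
import math

def deviations(data):
    """Differences between each data point and the mean."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

def rms_spread(data):
    """Euclidean distance from the data to its mean, divided by sqrt(n)."""
    d = deviations(data)
    return math.sqrt(sum(v * v for v in d)) / math.sqrt(len(d))

def mean_abs_spread(data):
    """Sum of absolute deviations, divided by n."""
    d = deviations(data)
    return sum(abs(v) for v in d) / len(d)

def max_abs_spread(data):
    """Largest absolute deviation from the mean."""
    return max(abs(v) for v in deviations(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5
print(rms_spread(data), mean_abs_spread(data), max_abs_spread(data))
```

All three are legitimate measures of spread; they simply come from different notions of distance, and only the first matches the divide-by-n standard deviation.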

Secondly, the formula [tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}}[/tex] is for the standard deviation of a SAMPLE from some infinite population.

[tex]\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}}[/tex] is correct for the standard deviation of a finite population.

There are technical reasons for the "n-1" (it makes the sample variance an "unbiased estimator" of the population variance) but I like to think of it as just making the "spread" a little larger to reflect the fact that, since we are using a sample, not the entire population, we have more uncertainty.
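
To make "unbiased" concrete, here is a small sketch (not from the original reply). It enumerates every ordered sample of size 2 drawn with replacement from a made-up population and averages the two variance estimates over all of them. Drawing with replacement is used here as a stand-in for sampling from an infinite population; the names `variance`, `ddof`, and `population` are assumptions for this example.

```python
from fractions import Fraction
from itertools import product

def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - ddof)

population = [1, 2, 3, 4]          # hypothetical population
sample_size = 2
true_variance = variance(population, ddof=0)

# Every ordered sample of size 2, drawn with replacement (16 in total).
samples = list(product(population, repeat=sample_size))

# Average each estimator over all possible samples.
mean_with_n = sum(variance(s, 0) for s in samples) / len(samples)
mean_with_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(true_variance, mean_with_n, mean_with_n_minus_1)
```

In this setup the n-1 version averages out to the population variance exactly, while the n version comes out too small by a factor of (n-1)/n. This is a statement about the variance; the square root of the n-1 variance is still slightly biased as an estimate of the standard deviation itself.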
 
  • #3
Also, using squares to measure distance, instead of absolute values, tends to be significantly easier to manipulate. It also places the standard deviation (ok, ok, I mean the variance) among a class of things called the moments about the mean, where you use ^k for any positive k, instead of simply ^2.
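
A minimal sketch of the "moments about the mean" idea, with a hypothetical helper `central_moment` and made-up data: the variance (dividing by n) is just the k = 2 case.

```python
def central_moment(data, k):
    """k-th moment about the mean: the average of (x - mean)**k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

data = [1, 2, 3, 4, 10]  # hypothetical data, mean is 4

first = central_moment(data, 1)   # always 0: deviations cancel out
second = central_moment(data, 2)  # the variance (dividing by n)
third = central_moment(data, 3)   # nonzero here because the data is lopsided
print(first, second, third)
```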
 
  • #4
Thanks for your input. Can you go over what an unbiased estimator is and what maximum likelihood estimators are? I'd really appreciate it. Thanks.
 
  • #5
And also the use of n vs n-1 in the normal distribution too. Like if I wanted to assume data was normally distributed and use z-scores and stuff to make estimates on the probability of data lying within a range why use n or n-1 (and which do I use)... Thanks again.
 

1. Why is standard deviation calculated differently?

The formula differs depending on what the data represent. Dividing by n gives the standard deviation of a complete, finite population, while dividing by n-1 is used when the data are a sample and the goal is to estimate the spread of the population the sample came from. Squared deviations are used rather than absolute deviations because the root-mean-square form matches the usual distance formula and is easier to manipulate.

2. How is standard deviation calculated?

For a complete population, standard deviation is calculated by finding the difference between each data point and the mean, squaring those differences, summing them, dividing by the number of data points n, and then taking the square root of that value. For a sample, the sum of squared differences is divided by n-1 instead of n. Either way, the result is a measure of the spread of the data about the mean.
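
The steps above can be sketched in Python as follows. The helper `std_dev` and the data are made up for illustration; `statistics.pstdev` and `statistics.stdev` are the standard-library functions for the divide-by-n and divide-by-(n-1) versions, shown here only as a cross-check.

```python
import math
import statistics

def std_dev(data, sample=False):
    """Standard deviation; divides by n-1 if sample is True, else by n."""
    n = len(data)
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_diffs) / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

population_sd = std_dev(data)            # divide by n
sample_sd = std_dev(data, sample=True)   # divide by n-1

# Cross-check against the standard library.
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

The sample version is always a little larger than the population version for the same data, since the same sum is divided by a smaller number.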

3. What is the difference between sample and population standard deviation?

The main difference between sample and population standard deviation is the data set used and the divisor in the formula. Population standard deviation is calculated using all of the data and divides by n. Sample standard deviation is calculated using a subset of the data and divides by n-1. The correction matters because dividing by n on a sample tends to underestimate the true variability of the population.

4. When should I use which standard deviation calculation method?

The choice depends on whether the data are the entire population or a sample from it, not on the shape of the distribution. If the data contain every member of a finite population, use the population formula (divide by n). If the data are a sample used to estimate the spread of a larger population, use the sample formula (divide by n-1).

5. Can standard deviation be negative?

No, standard deviation cannot be negative. It is the square root of an average of squared deviations, which makes it a root-mean-square distance of the data points from the mean. It is zero when all data points are equal and positive otherwise.
