Why Standard Deviation is Calculated Differently: Explained

phoenixthoth
Why is the standard deviation \sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}} and not \sigma =\frac{\sum_{i=1}^{n}\left| \overline{x}-x_{i}\right| }{n} or at least \sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}}?

I suppose that it has something to do with the normal distribution but I'm not sure in what way.

Thanks for the input.
 
First, the "root-mean-square" is used because it coincides with our usual formula for distance between two points. "standard deviation" is basically a distance measure.
That said, the sum of the absolute values of coordinate differences CAN be used as a distance (as can "max of absolute value of coordinate differences") and can be used as a measure of "standard deviation". Since it's not the usual one, all your formulas (including that form normal distribution would have to be changed and that's a pain.
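To make the analogy concrete, here is a quick Python sketch (the data values are just made up for illustration) that measures the spread of the same data in three ways, mirroring the Euclidean, taxicab, and max norms:

```python
# Compare three "distance-like" measures of spread on a toy data set.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]

# Root-mean-square deviation (Euclidean / L2-style spread, dividing by n)
rms = (sum(d ** 2 for d in deviations) / n) ** 0.5

# Mean absolute deviation (taxicab / L1-style spread)
mad = sum(abs(d) for d in deviations) / n

# Maximum absolute deviation (max / L-infinity-style spread)
max_dev = max(abs(d) for d in deviations)

print(f"mean = {mean}")
print(f"root-mean-square deviation = {rms:.4f}")
print(f"mean absolute deviation    = {mad:.4f}")
print(f"maximum absolute deviation = {max_dev:.4f}")
```

All three are legitimate summaries of how far the data sit from the mean; they just measure "far" with different norms, and the root-mean-square one is the choice that the rest of the standard machinery is built around.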

Secondly, the formula \sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n-1}} is for the standard deviation of a SAMPLE from some infinite population.

\sigma =\sqrt{\frac{\sum_{i=1}^{n}\left( \overline{x}-x_{i}\right) ^{2}}{n}} is correct for the standard deviation of a finite population.

There are technical reasons for the "n-1" (dividing by n-1 makes the sample variance an unbiased estimator of the population variance), but I like to think of it as just making the "spread" a little larger to reflect the fact that, since we are using a sample rather than the entire population, we have more uncertainty.
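A quick way to see what the n-1 does is a small simulation (a rough sketch; the population, sample size, and number of trials are arbitrary choices): draw many small samples from a population whose variance is known, and compare the average of the two variance formulas against the true variance.

```python
import random

# Simulate repeated sampling to compare dividing by n and by n-1.
random.seed(0)
true_var = 1.0          # variance of a standard normal population
sample_size = 5
trials = 200_000

sum_var_n = 0.0
sum_var_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(sample_size)]
    m = sum(sample) / sample_size
    ss = sum((x - m) ** 2 for x in sample)
    sum_var_n += ss / sample_size
    sum_var_n_minus_1 += ss / (sample_size - 1)

print(f"true variance:         {true_var}")
print(f"average of sum/n:      {sum_var_n / trials:.4f}")        # systematically too small
print(f"average of sum/(n-1):  {sum_var_n_minus_1 / trials:.4f}")  # close to 1.0
```

Dividing by n underestimates the population variance on average (by a factor of (n-1)/n), which is exactly what the n-1 corrects.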
 
Also, using squares to measure distance, instead of absolute values, tends to be significantly easier to manipulate algebraically. It also places the standard deviation (ok, ok, I mean the variance) in a class of quantities called the moments about the mean, where you raise the deviations to the k-th power for any positive integer k instead of just squaring them.
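For reference, a minimal sketch of those moments (the function name is just my own choice): the variance is the k = 2 case, and higher values of k give the moments used for skewness and kurtosis.

```python
def central_moment(data, k):
    """k-th moment about the mean: the average of (x - mean)**k."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(central_moment(data, 2))  # variance (k = 2)
print(central_moment(data, 3))  # enters the skewness
print(central_moment(data, 4))  # enters the kurtosis
```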
 
Thanks for your input. Can you go over what an unbiased estimator is and what maximum likelihood estimators are? I'd really appreciate it. Thanks.
 
Could you also go over the use of n vs. n-1 when working with the normal distribution? For example, if I wanted to assume the data were normally distributed and use z-scores to estimate the probability of the data lying within some range, why would I use n or n-1 (and which one do I use)? Thanks again.
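For reference, here is the kind of calculation I mean, sketched in Python with made-up numbers; the question is whether s below should be computed with n or n-1:

```python
import math

# The kind of z-score calculation I have in mind (made-up sample data).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)

s_n = math.sqrt(ss / n)          # divide by n
s_n1 = math.sqrt(ss / (n - 1))   # divide by n - 1

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

a, b = 3.0, 7.0
for label, s in (("n", s_n), ("n-1", s_n1)):
    prob = normal_cdf((b - mean) / s) - normal_cdf((a - mean) / s)
    print(f"using {label}: P({a} < X < {b}) is about {prob:.3f}")
```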
 