Why is standard deviation the preferred method for measuring dispersion?

In summary: there is a difference between standard deviation and mean absolute deviation in their use and properties as measures of dispersion. Standard deviation is commonly used because of its linearity properties and its ties to the assumption of normality, while mean absolute deviation may provide a more robust measure for non-normal data. The population SD is typically denoted by the Greek letter sigma, with value [itex]\sqrt{{1 \over n} \sum_{i=1}^{n} (x_i - \mu)^2}[/itex]; for a normal distribution it equals the mean absolute deviation multiplied by (not divided by) the square root of pi over two.
  • #1
striphe
I’m wondering why standard deviation is used as the main method of measuring dispersion, when I would consider that more ergonomic (user-friendly) measurements are possible.

An example of such would be the sum of |x − mean|, divided by n. I would think that the mean plus and minus this value would account for 50% of a normal distribution.
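For instance, here is a quick sketch of the measure I have in mind (Python; the observations are made up):

[code]
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up observations

mean = x.mean()
aad = np.abs(x - mean).mean()  # sum of |x - mean| divided by n

print(mean, aad)  # 5.0 1.5
[/code]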
 
  • #2
striphe said:
An example of such would be the sum of |x − mean|, divided by n. I would think that the mean plus and minus this value would account for 50% of a normal distribution.

What is the x in this formula?
 
  • #3
An observation value.
 
  • #4
striphe said:
An example of such would be the sum of |x − mean|, divided by n.

This would be the average absolute deviation, useful for distributions without a second moment, but it doesn't have the same linearity properties as the variance (the squared stdev). Other measures, such as the median absolute deviation, are useful for being more robust to outliers.
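A quick numerical sketch of the contrast (the data are made up, with one outlier tacked on):

[code]
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])  # 100.0 is an outlier

sd  = data.std()                                 # standard deviation
aad = np.abs(data - data.mean()).mean()          # average absolute deviation
mad = np.median(np.abs(data - np.median(data)))  # median absolute deviation

# The squaring in the SD amplifies the outlier the most;
# the median absolute deviation barely notices it.
print(sd, aad, mad)  # roughly 36.2, 26.9, 1.5
[/code]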
 
  • #5
What are the advantages of stdev over average absolute deviation?
 
  • #6
Much of the reason the standard deviation and variance are prevalent is due to one thing: for a long time the notion that data come from normally distributed populations ruled (some may argue it still rules). In probability/mathematical statistics, as soon as the form of the population distribution is assumed, certain statistics are "better" than others because they are motivated by the distribution itself. If you believe data are normally distributed, then:
* the biased version of the sample variance is the maximum likelihood estimate of the population variance
* the unbiased version is independent of the sample mean


The quantity

[tex]
\frac 1 n \sum_{i=1}^n |x_i - \overline x|
[/tex]

doesn't have a simple analog in the normal model, and as given it is not an unbiased estimate of the standard deviation or variance. It is worth noting, however, that the question of whether the sample standard deviation or the average absolute deviation was the more appropriate measure of dispersion was once seriously debated (the physicist Eddington and the statistician Fisher were involved). Fisher's argument, again based on the assumption of normality, was considered better at the time.

This write-up
http://www.leeds.ac.uk/educol/documents/00003759.htm

may give you a better feel for the discussion.
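A small simulation may also make Fisher's efficiency point concrete (a sketch, under the normality assumption; the rescaling by √(π/2) makes the average absolute deviation estimate σ as well):

[code]
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 100, 10_000, 1.0

sds, aads = [], []
for _ in range(reps):
    x = rng.normal(0.0, sigma, n)
    sds.append(x.std(ddof=1))                                      # sample SD
    aads.append(np.abs(x - x.mean()).mean() * np.sqrt(np.pi / 2))  # rescaled AAD

# Both center on sigma, but the SD fluctuates less from sample to sample,
# which is (roughly) Fisher's efficiency argument for normal data.
print(np.mean(sds), np.std(sds))
print(np.mean(aads), np.std(aads))
[/code]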
 
  • #7
The two main arguments for standard deviation:
(a) under perfectly normal conditions, the sample SD is closer to the population SD
(b) SD is easier to manipulate than MD

Argument (a) is cut down rather easily by the paper, but I am unsure why argument (b) exists. Taking the absolute value of a figure is no different from squaring the figure and then taking the square root. Using this logic, there is minimal difference between the two:

[tex]SD = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2}[/tex]
[tex]MD = \frac{1}{n}\sum \sqrt{(x - \bar{x})^2}[/tex]

To me there would be no difference in difficulty in working with these algebraically.

Am I right in suggesting that mean ± SD represents 50% of a normal distribution?
 
  • #8
striphe said:

Am I right in suggesting that mean ± SD represents 50% of a normal distribution?

You are not right.

By Chebyshev's theorem, the fraction of the data lying within k standard deviations of the mean is at least

[tex]1 - \frac{1}{k^2}[/tex]

Setting this equal to 0.5 gives [tex]k = \sqrt{2} \approx 1.414[/tex] SD of the mean.
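An empirical check of the bound (a sketch; the exponential distribution is just an arbitrary non-normal example):

[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, 100_000)  # deliberately non-normal data

k = np.sqrt(2)
within = np.abs(x - x.mean()) <= k * x.std()
print(within.mean())  # at least 0.5, as Chebyshev guarantees (here about 0.91)
[/code]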
 
  • #9
striphe said:
Am I right in suggesting that mean ± SD represents 50% of a normal distribution?

No, that would be 68.2% of a normal distribution.
 
  • #10
CRGreathouse said:
No, that would be 68.2% of a normal distribution.

Yes. That would be for the standard normal distribution. Chebyshev's theorem is more general and applies to any distribution with a finite variance. For a central probability of 0.682, Chebyshev's theorem gives approximately [tex]k = \sqrt{1/(1-0.682)} \approx 1.77[/tex] standard deviations.

http://www.philender.com/courses/intro/notes3/chebyshev.html

In other words, the normal assumption is not needed to refute the suggestion.
 
  • #11
Sorry, I meant MD.
 
  • #12
striphe said:
Sorry, I meant MD.

Since you specifically referred to the normal distribution in your most recent question, Chebyshev's theorem is not needed to answer any question about the percentage within one standard deviation of the mean. It wouldn't even be appropriate since, if you know or are assuming normality, you can leverage that to get the 68% value. (Whether normality is a reasonable assumption is an entirely different question.) However, even this is poorly worded.

* If you are discussing only the population, then you need to work with the parameters, and [tex] \mu \pm \sigma [/tex] contains roughly the central 68% of the distribution

* If you have a sample which you've deemed to be mound-shaped and symmetric in its distribution, then [tex] \overline x \pm sd [/tex] contains roughly 68% of the sample values

Now the population MD is [tex] \sigma \sqrt{\, \frac 2 {\pi}} [/tex] for a normal distribution, so (population again)

[tex]
\mu \pm MD \sqrt{\, \frac{\pi} 2}
[/tex]

will contain roughly the central 68% of the population. A similar comment could be made for the sample versions when the sample distribution is mound-shaped and symmetric

However, if the sample is skewed, there is not (that I know of) anything like Chebyshev's theorem for using [tex] \overline x [/tex] and MD. (The idea above won't work, since the simple relationship between SD and MD doesn't hold without the normality assumption.)
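A quick numerical check of the sample versions of both statements (a sketch; the mound-shaped sample is simulated):

[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 3.0, 100_000)  # simulated mound-shaped, symmetric sample

xbar, s = x.mean(), x.std(ddof=1)
md = np.abs(x - xbar).mean()

print(np.mean(np.abs(x - xbar) <= s))                        # roughly 0.68
print(np.mean(np.abs(x - xbar) <= md * np.sqrt(np.pi / 2)))  # also roughly 0.68
[/code]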
 
  • #13
statdad said:
* If you are discussing only the population, then you need to work with the parameters, and [tex] \mu \pm \sigma [/tex] contains roughly the central 68% of the distribution

* If you have a sample which you've deemed to be mound-shaped and symmetric in its distribution, then [tex] \overline x \pm sd [/tex] contains roughly 68% of the sample values

Now the population MD is [tex] \sigma \sqrt{\, \frac 2 {\pi}} [/tex] for a normal distribution, so (population again)

[tex]
\mu \pm MD \sqrt{\, \frac{\pi} 2}
[/tex]

will contain roughly the central 68% of the population. A similar comment could be made for the sample versions when the sample distribution is mound-shaped and symmetric

Did you mean:

[tex]AMD=\sigma \sqrt{\, \frac 2{\pi}}[/tex] ?

Then for 1 AMD: 0.682 × 0.798 = 0.544 of the population.
 
  • #14
Yes, it is true that
[tex]
MD = \sigma \sqrt{\, \frac{2}{\pi}}
[/tex]

This means that

[tex]
\sigma = MD \sqrt{\, \frac{\pi}2}
[/tex]

so that

[tex]
\mu \pm \sigma
[/tex]

is the same as

[tex]
\mu \pm MD \sqrt{\, \frac{\pi}{2}}
[/tex]

The percentage of area trapped between these limits does not get multiplied by the square root term.

In other words,

[tex]
MD \sqrt{\, \frac{\pi}{2}}
[/tex]

is simply another way of writing the population SD and can be used interchangeably with it.
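The underlying relation is easy to verify numerically, for instance by integrating E|X − μ| directly (a sketch using scipy's quadrature):

[code]
import numpy as np
from scipy import integrate, stats

sigma = 2.5  # an arbitrary population SD

# E|X - mu| for X ~ N(0, sigma^2), by numerical integration
md, _ = integrate.quad(lambda t: abs(t) * stats.norm.pdf(t, 0, sigma),
                       -np.inf, np.inf)

print(md, sigma * np.sqrt(2 / np.pi))  # the two values agree
[/code]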
 
  • #15
statdad said:
In other words,

[tex]
MD \sqrt{\, \frac{\pi}{2}}
[/tex]

is simply another way of writing the population SD and can be used interchangeably with it.

Yes, but the OP was asking (after correcting him/herself in post 11) if 50% of the population was within 1 AMD of the mean with a normal distribution. In fact it's 54.4%. No?

EDIT: Also, shouldn't we be talking about AMD, not MD? The latter is signed and is used in the calculation of covariance.

EDIT: BTW, I misread the OP's post 7 and thought he/she was talking about the AMD anyway. I jumped to Chebyshev's theorem because I've never related the AMD to the normal distribution. I didn't know of the relation until you pointed it out and I checked it for myself.
 
  • #16
SW VandeCarr said:
Yes, but the OP was asking (after correcting him/herself in post 11) if 50% of the population was within 1 AMD of the mean with a normal distribution. In fact it's 54.4%. No?

EDIT: Also, shouldn't we be talking about AMD, not MD? The latter is signed and is used in the calculation of covariance.

EDIT: BTW, I misread the OP's post 7 and thought he/she was talking about the AMD anyway. I jumped to Chebyshev's theorem because I've never related the AMD to the normal distribution. I didn't know of the relation until you pointed it out and I checked it for myself.

I guess I've been using MD to mean the absolute mean deviation - sorry.

The question in post 11: unless the AMD is modified to represent the standard deviation, I don't believe there is a simple way to provide an answer. That's why I wrote what I did.
 
  • #17
I considered that 50% of all values lie on each side of the mean, and that if one was to split a population in two, so that there exists one set of values below the mean and one set above the mean, you could calculate a mean for each of them, which I will refer to as the half means.

The observations between the half means represent 50% of the population. The average mean deviation is the distance between the half means divided by 2, and so I would have thought that if a population is not skewed, then 50% would be the correct value for how much of a population lies between mean ± MD.
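Here is roughly what I mean in code (a quick sketch on simulated symmetric data):

[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)

m = x.mean()
lower_half_mean = x[x < m].mean()   # mean of the values below the overall mean
upper_half_mean = x[x >= m].mean()  # mean of the values above the overall mean

amd = np.abs(x - m).mean()
# Half the distance between the half means matches the mean absolute deviation
print((upper_half_mean - lower_half_mean) / 2, amd)
[/code]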
 
  • #18
striphe said:
I considered that 50% of all values lie on each side of the mean, and that if one was to split a population in two, so that there exists one set of values below the mean and one set above the mean, you could calculate a mean for each of them, which I will refer to as the half means.

The observations between the half means represent 50% of the population. The average mean deviation is the distance between the half means divided by 2, and so I would have thought that if a population is not skewed, then 50% would be the correct value for how much of a population lies between mean ± MD.

If you are talking about a distribution where the concept of a standard deviation applies (it doesn't apply to all distributions) then, using Chebyshev's theorem, at least 50% of observations will lie within 1.414 SD of the mean. That's the total of both sides of the mean.

In a standard normal distribution, 68.2% of the area under the curve lies within 1 SD of the mean (both sides). If you want to use the AMD with the normal distribution, I calculate that 54.4% of the area under the curve will lie within 1 AMD of the mean. See statdad's and my previous posts. I've never considered using AMD in hypothesis testing.

In my own experience, we only used the normal-curve SD when a good symmetrical normal distribution of the data was at hand. Otherwise, some of us preferred using Chebyshev's theorem (CT) instead of normalization techniques. CT is more conservative for hypothesis testing in the tails of the distribution when the population is skewed. A two-sided test with 0.05 in each tail requires only 1.645 SD for the normal distribution, while CT requires 3.16 SD.

The two-sided test is based on both tails having 0.05 of the area under the curve, so the total area in the tails is 0.10 and the area between is 0.90; so with CT, [tex]1 - \frac{1}{k^2} = 0.90[/tex] gives [tex]k = \sqrt{10} \approx 3.16[/tex] SD.
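The required k for any central proportion follows directly from the bound (a small sketch):

[code]
import math

def chebyshev_k(central_prob):
    """k such that at least central_prob of any distribution lies within k SD of the mean."""
    return math.sqrt(1.0 / (1.0 - central_prob))

print(chebyshev_k(0.50))  # 1.414...
print(chebyshev_k(0.90))  # 3.162...
print(chebyshev_k(0.95))  # 4.472...
[/code]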

http://webusers.globale.net/josborne/Stats/ChebychevTheorem.PDF
 
  • #19
Finally, does there exist terminology for these half means that I was talking about?
 
  • #20
striphe said:
I considered that 50% of all values lie on each side of the mean, and that if one was to split a population in two, so that there exists one set of values below the mean and one set above the mean, you could calculate a mean for each of them, which I will refer to as the half means.

The observations between the half means represent 50% of the population. The average mean deviation is the distance between the half means divided by 2, and so I would have thought that if a population is not skewed, then 50% would be the correct value for how much of a population lies between mean ± MD.

If I understand you correctly, your situation defines the quartiles. As a very crude picture, imagine the data stretched along the number line:

Min ---- Q1 ---- Median ---- Q3 ---- Max


Q1 = first quartile (same as 25th percentile)
Q3 = third quartile (same as 75th percentile)

Then 25% of the data is between Min and Q1
25% of the data is between Q1 and median
25% of the data is between median and Q3
25% of the data is between Q3 and max
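In code, numpy's percentile function gives Q1 and Q3 directly (a quick sketch on made-up data):

[code]
import numpy as np

data = np.array([1, 3, 4, 6, 7, 8, 10, 12, 15])
q1, median, q3 = np.percentile(data, [25, 50, 75])
print(q1, median, q3)  # 25% of the data falls into each of the four intervals
[/code]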

Does this sound like Q1 and Q3 are the half-means you are thinking of?
 
  • #21
SW VandeCarr said:
If you are talking about a distribution where the concept of a standard deviation applies (it doesn't apply to all distributions) ...

In my own experience, we only used the normal-curve SD when a good symmetrical normal distribution of the data was at hand. Otherwise, some of us preferred using Chebyshev's theorem (CT) instead of normalization techniques. CT is more conservative for hypothesis testing in the tails of the distribution when the population is skewed. A two-sided test with 0.05 in each tail requires only 1.645 SD for the normal distribution, while CT requires 3.16 SD.

The two-sided test is based on both tails having 0.05 of the area under the curve, so the total area in the tails is 0.10 and the area between is 0.90; so with CT, [tex]1 - \frac{1}{k^2} = 0.90[/tex] gives [tex]k = \sqrt{10} \approx 3.16[/tex] SD.

http://webusers.globale.net/josborne/Stats/ChebychevTheorem.PDF

I'd be very cautious in the use of Chebyshev's theorem this way: even though you find the central 90% of the area with Chebyshev's rule, if the distribution is skewed there is no way at all to know that the two tails split the remaining 10% equally: it could be 2% and 8% or any other combination. If you assume it's equally split, you are essentially assuming symmetry.

In fact, since Chebyshev's theorem gives a lower bound for the central area, the proper statement is that the central region traps at least 90% of the area: if it's more, there is less than 10% of the area in the tails, and your 90% confidence level goes by the wayside.
 
  • #22
When I made my 50% assumption, I made the mistake of assuming that 50% of observations always lie on each side of the mean, and then passed that on to these half means.

Just as the median and mean are different, so are these half means different from the quartiles.
 
  • #23
statdad said:
In fact, since Chebyshev's theorem gives a lower bound for the central area, the proper statement is that the central region traps at least 90% of the area: if it's more, there is less than 10% of the area in the tails, and your 90% confidence level goes by the wayside.

That's my point. We were much more concerned with alpha error than beta error, so Chebyshev's theorem is more conservative. We know at least 90% of the area is between the tails. Moreover, we know the shape of the distribution and which tail is in the direction of the alternative hypothesis.
 
  • #24
SW VandeCarr said:
That's my point. We were much more concerned with alpha error than beta error, so Chebyshev's theorem is more conservative. We know at least 90% of the area is between the tails. Moreover, we know the shape of the distribution and which tail is in the direction of the alternative hypothesis.

Then I misunderstood, although I'm not sure I quite have it still. If you know which tail is of interest, why use a two-sided procedure? (Mostly rhetorical question here, not a burning issue: as long as you and your colleagues have sorted things out, there is no reason to use your time explaining to me.)
 
  • #25
statdad said:
Then I misunderstood, although I'm not sure I quite have it still. If you know which tail is of interest, why use a two-sided procedure? (Mostly rhetorical question here, not a burning issue: as long as you and your colleagues have sorted things out, there is no reason to use your time explaining to me.)

It's straightforward. If you have a skewed distribution in, say, a clinical trial population (not really a random sample of a population), you can normalize the distribution with some transformation. However, some of us thought to use Chebyshev's theorem instead of transforming data points (which can be problematic). It wasn't intended for publication, but to satisfy ourselves that we had solid statistical significance. AFAIK, Chebyshev's theorem doesn't provide for a one-sided evaluation. We mainly focused on getting (for the actual trial data) at least 95% between the tails, i.e. [tex]\sqrt{20} \approx 4.472[/tex] SD.
 
  • #26
Concerning the initial question of why variance (and its square root, the standard deviation) is the most commonly used measure of uncertainty:

The use of variance is linked to the use of the mean of a random variable as an estimator of the variable, since choosing the estimator that minimizes the expected squared distance to the true value is equivalent to choosing the mean value [itex]\mu[/itex] as the estimator:

[tex]\operatorname{argmin}_{\hat{x}}\, E\left[(X-\hat{x})^2\right] = \mu[/tex]

If you use some other measure of uncertainty, such as the expected absolute distance to the true value, the mean will no longer be the estimator that minimizes it: for absolute distance, the minimizer is the median.
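A brute-force numerical illustration of this (a sketch; the grid search and the exponential sample are just for demonstration):

[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, 10_000)  # a skewed sample: mean 1.0, median about 0.69

grid = np.linspace(0.0, 3.0, 301)
sq_loss  = np.array([np.mean((x - c) ** 2)  for c in grid])  # expected squared distance
abs_loss = np.array([np.mean(np.abs(x - c)) for c in grid])  # expected absolute distance

print(grid[np.argmin(sq_loss)],  x.mean())      # squared loss is minimized near the mean
print(grid[np.argmin(abs_loss)], np.median(x))  # absolute loss is minimized near the median
[/code]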
 

1. What is standard deviation and why is it important in measuring dispersion?

Standard deviation is a statistical measure of how spread out the data are around the mean. It is important in measuring dispersion because it summarizes the variability of a data set in the same units as the data.

2. How is standard deviation calculated?

Standard deviation is calculated by taking the square root of the variance, which is the average of the squared differences from the mean.
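For example (a quick Python sketch of the population formula):

[code]
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

variance = np.mean((data - data.mean()) ** 2)  # average squared difference from the mean
std_dev = np.sqrt(variance)                    # square root of the variance

print(std_dev)  # 2.0, the same as np.std(data)
[/code]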

3. What are the advantages of using standard deviation over other measures of dispersion?

Standard deviation is preferred over other measures of dispersion because it takes into account all of the data points in a data set, rather than just a few extreme values. It also has convenient mathematical properties: for example, variances of independent variables add, which makes it easy to manipulate algebraically, and for normally distributed data about 68% of values lie within one standard deviation of the mean.

4. Is standard deviation affected by outliers in the data?

Yes, standard deviation is affected by outliers in the data. Because the deviations are squared, it is in fact more sensitive to outliers than resistant measures such as the median absolute deviation or the interquartile range, although it is less dependent on extreme values than the range, which uses only the two most extreme observations.

5. In what situations is standard deviation not the preferred method for measuring dispersion?

Standard deviation may not be the preferred method for measuring dispersion when the data is significantly skewed or when the data contains extreme outliers. In these cases, other measures such as median absolute deviation or interquartile range may be more appropriate.
