Standard deviation versus absolute mean deviation

SUMMARY

The discussion centers on the advantages of using Mean Absolute Deviation (MAD) over Standard Deviation (SD) in statistical analysis. Participants highlight that MAD is less sensitive to outliers and provides a more robust measure of variability in non-normally distributed data. A unique variation of deviation, referred to as the Gini Mean Difference, is also introduced, which calculates the average of absolute differences between all pairs of numbers. This method offers an alternative perspective on measuring dispersion, particularly in datasets with potential outliers.

PREREQUISITES
  • Understanding of statistical concepts such as Mean Absolute Deviation (MAD) and Standard Deviation (SD).
  • Familiarity with the Gini Mean Difference and its calculation methods.
  • Knowledge of data distribution types, particularly normal and non-normal distributions.
  • Basic proficiency in mathematical operations involving averages and absolute values.
NEXT STEPS
  • Research the applications and limitations of Mean Absolute Deviation in various statistical contexts.
  • Explore the Gini Mean Difference and its relevance in analyzing skewed data distributions.
  • Learn about the Median Absolute Deviation (MAD) and its advantages over traditional measures of variability.
  • Investigate the impact of outliers on Standard Deviation versus Mean Absolute Deviation in real-world datasets.
USEFUL FOR

Statisticians, data analysts, and researchers looking to enhance their understanding of variability measures, particularly in datasets prone to outliers or non-normal distributions.

Twinbee
What are the advantages of using the absolute mean deviation over the standard deviation? Is it possible to show a simple example where the former is more (or less) appropriate?

Also, related to the mean deviation is my own variation. Does it have a name? Instead of calculating the absolute difference from the mean for each number, my technique finds the average of the absolute differences between every number and every other number.

So for example: given the numbers 3,7,7,19
Average is: 9
Absolute Mean deviation is: 5
My 'special' deviation is: 6

This is found thusly:

(|3-3| + |7-3| + |7-3| + |19-3| +
|3-7| + |7-7| + |7-7| + |19-7| +
|3-7| + |7-7| + |7-7| + |19-7| +
|3-19| + |7-19| + |7-19| + |19-19| ) / 16

= 6

As you can see, everything is compared against everything else. What do people here think? One could also remove the 3-3, 7-7, 7-7, and 19-19 bits, and then divide by 12 for a similar variation (results in 8 by the way).
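The three quantities above (the mean, the mean absolute deviation, and both pairwise variants) can be checked with a short Python sketch; the variable names are mine, not from the thread:

```python
# Numeric check of the example data using only the standard library.
from itertools import product

data = [3, 7, 7, 19]
n = len(data)

mean = sum(data) / n                          # average of the values
mad = sum(abs(x - mean) for x in data) / n    # absolute mean deviation

# 'Special' deviation: average |x - y| over all n*n ordered pairs,
# including each value paired with itself (those pairs contribute zero).
pair_sum = sum(abs(x - y) for x, y in product(data, data))
all_pairs = pair_sum / (n * n)

# Variant that drops the self-pairs and divides by n*(n-1) instead.
no_self_pairs = pair_sum / (n * (n - 1))

print(mean, mad, all_pairs, no_self_pairs)
```

For the example data this prints 9.0, 5.0, 6.0 and 8.0, matching the figures worked out by hand above.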

Could this method be usefully applied in stats?
 
Your variation is essentially the Gini Mean Difference, if I understand your explanation correctly.

The Mean Absolute Deviation (MAD), which is

\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|

was proposed as an estimate of variation, but in the case of normally distributed data it is neither unbiased nor particularly efficient, compared to the usual estimates.

Note that there are other, better (more robust) measures of variability. The median absolute deviation (another MAD) is

\mathrm{MAD} = \text{median}_i \left( | x_i - \text{median}_j (x_j)| \right)

(The same name is also given to the scaled estimate 1.4826 MAD; the factor 1.4826 \approx 1/\Phi^{-1}(3/4) makes this a consistent estimator of \sigma in the normal distribution case. Here also MAD refers to the median absolute deviation.)
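The median absolute deviation just described is easy to compute with the standard library; a minimal sketch, with a function name of my own choosing:

```python
from statistics import median

def median_abs_deviation(xs, scale=1.4826):
    """Median of the absolute deviations from the median.

    scale=1.4826 (about 1 / Phi^{-1}(0.75)) makes the estimate consistent
    for sigma under normality; scale=1 gives the raw MAD.
    """
    m = median(xs)
    return scale * median(abs(x - m) for x in xs)

print(median_abs_deviation([3, 7, 7, 19], scale=1))  # raw MAD of the thread's example
```

For the example data the median is 7, the absolute deviations are 0, 0, 4, 12, and their median (the raw MAD) is 2.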
 
Ah, the Gini Mean Difference looks like the one. I wonder what applications it should be used for over SD or MD.

was proposed as an estimate of variation, but in the case of normally distributed data it is neither unbiased nor particularly efficient, compared to the usual estimates.
Interesting that you say it's biased. Doesn't that depend on the distribution? In that sense, the standard deviation would appear biased for uniformly distributed (non-normal) data.

There's this page which sings the praises of the MD over the SD and says it should be used in most cases of 'real data', where even slight errors may creep in. One other claimed advantage is that squaring deviations gives outliers (long-tailed data) disproportionate weight, which the MD avoids. Of course, I'm not sure how much of this is true, but here's the page:

http://www.leeds.ac.uk/educol/documents/00003759.htm

Thanks for the reply.
 
An old thread, but a goodie. My first post did indeed describe the Gini mean difference, but I described two different versions which I'll call GiniA and GiniB:

GiniA(3,7,7,19) = 8
GiniB(3,7,7,19) = 6

For the above values, 8 is the correct answer according to the standard Wikipedia definition. However, I think a 'better' value may arguably be 6, as GiniB shows, since it includes the missing 4 differences (3-3, 7-7, 7-7 and 19-19) and divides the result by n*n instead of n*(n-1). This is demonstrated in my first post as well (the 16 terms divided by 16 instead of by 12).

Is there any reason to believe GiniB could be useful? It seems natural to divide by 16, since one could say that each value is an 'error of itself' (i.e. no error) as well as errors of each other.
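Worth noting: since the added self-pairs all contribute zero, GiniA and GiniB share the same numerator and differ only in the divisor, so GiniB = GiniA * (n - 1) / n (here 8 * 3/4 = 6). A sketch verifying this, with the hypothetical helper names gini_a and gini_b:

```python
from itertools import product

def gini_a(xs):
    # Standard Gini mean difference: self-pairs excluded, divide by n*(n-1).
    n = len(xs)
    return sum(abs(x - y) for x, y in product(xs, xs)) / (n * (n - 1))

def gini_b(xs):
    # Variant: self-pairs included (each contributes zero), divide by n*n.
    n = len(xs)
    return sum(abs(x - y) for x, y in product(xs, xs)) / (n * n)

data = [3, 7, 7, 19]
n = len(data)
# Same numerator, different divisor, hence the exact (n-1)/n relationship.
assert gini_b(data) == gini_a(data) * (n - 1) / n
```

So the two versions carry exactly the same information about the data; the choice between them is a matter of normalisation convention rather than a genuinely different measure.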
 
