
Statistics: Question about mild and extreme outliers.

  1. Feb 12, 2012 #1

    I am studying statistics and have come across the following definitions of mild and extreme outliers:

    Mild outlier: Between 1.5q and 3q away from the nearest quartile, where q denotes the interquartile range.

    Extreme outlier: More than 3q away from the nearest quartile, where q denotes the interquartile range.
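    A minimal Python sketch of these two definitions, for illustration only. The helper name `classify_outliers` is hypothetical. The quartile convention (`method="inclusive"` in the standard-library `statistics.quantiles`) is an assumption, since textbooks compute quartiles in slightly different ways. Treating a value exactly at 1.5q or 3q as belonging to the lower category is also an assumed tie-break.

```python
import statistics

def classify_outliers(data):
    """Return (value, label) pairs, label being 'normal', 'mild' or 'extreme'.

    Hypothetical helper: q is the interquartile range Q3 - Q1, and a value's
    distance is measured from the nearest quartile, as in the definitions above.
    """
    # Assumed quartile convention; other methods give slightly different Q1/Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    q = q3 - q1
    result = []
    for x in data:
        # Distance beyond the nearest quartile; 0 for values inside [Q1, Q3].
        dist = max(q1 - x, x - q3, 0)
        if dist > 3 * q:
            label = "extreme"
        elif dist > 1.5 * q:  # assumed tie-break: exactly 1.5q counts as normal
            label = "mild"
        else:
            label = "normal"
        result.append((x, label))
    return result

# Made-up sample for illustration.
data = [10, 12, 13, 14, 15, 16, 18, 30, 60]
for value, label in classify_outliers(data):
    print(value, label)
```

    With these made-up numbers Q1 = 13 and Q3 = 18, so q = 5. The value 30 lies 12 above Q3 (between 1.5q = 7.5 and 3q = 15, so mild) and 60 lies 42 above Q3 (beyond 3q, so extreme).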

    When I see a mathematical definition, I like to know the reason behind it.
    So my question is: were the numbers 1.5 and 3 chosen arbitrarily, or is there a real reason for these specific values?

    Thank you for your time.

  2. Feb 13, 2012 #2

    My old engineering text treats quantile plots (a quartile is a particular case of a quantile) as a descriptive tool: the entire section on quantiles opens with "Often the appearance of the sample will provide information about the distribution ...." (emphasis mine).
    Ronald E. Walpole and Raymond H. Myers, "Probability and Statistics for Engineers and Scientists", 5th edition.

    From what I understand, quantiles are mainly a diagnostic tool for getting a qualitative feel for the nature of the data. For example, the median read off a quantile plot can serve as an estimate of the mean, but it is not an efficient one; still, it might suggest that the data are very far from a bell curve (normal).
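    To illustrate that diagnostic use, a small Python sketch comparing the sample median with the sample mean. The data are made up, and reading a large gap between the two as a sign of skewness or outliers is a rough heuristic, not a formal test.

```python
import statistics

# Made-up sample with a long right tail.
data = [10, 12, 13, 14, 15, 16, 18, 30, 60]

mean = statistics.mean(data)
median = statistics.median(data)

# The median ignores how far the two large values are from the rest,
# while the mean is pulled toward them.
print(f"mean = {mean:.2f}, median = {median}")
```

    Here the mean (about 20.9) sits well above the median (15), which hints at a right-skewed, non-bell-shaped sample.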

    A quartile is a quantile at a multiple of 25% of a sample ranked by magnitude, e.g. q(0.25). Three quartiles split the ranked sample into four equal parts: the lower quartile (25th percentile), the median (50th percentile), and the upper quartile (75th percentile).
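    A short Python sketch of the three quartiles for a small ranked sample, using the standard library's `statistics.quantiles`. It also shows that different quartile conventions (`"exclusive"` is the library default, `"inclusive"` an alternative) give slightly different lower and upper quartiles for the same data, which matters when applying the 1.5q and 3q cutoffs.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
exclusive = statistics.quantiles(data, n=4)  # default method
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)  # → exclusive: [2.25, 4.5, 6.75]
print("inclusive:", inclusive)  # → inclusive: [2.75, 4.5, 6.25]
print("median:", statistics.median(data))
```

    The middle cut point agrees with the median under both conventions; only the outer quartiles, and hence the interquartile range, shift.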

    Since quartiles depend heavily on the shape of the data set, I *conjecture* that these cutoffs are rules of thumb based on the assumption that the quartiles roughly reflect standard deviations of a normal distribution, i.e. they are empirical values. However, the authors give no historical details, and I am only speculating from context.
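    One way to probe this conjecture is to check what the cutoffs would correspond to if the data really were normal. The Python sketch below uses the standard library's `statistics.NormalDist`; normality is the only assumption, and the result says nothing about how the numbers were historically chosen.

```python
from statistics import NormalDist

std_normal = NormalDist()              # mean 0, standard deviation 1
q3 = std_normal.inv_cdf(0.75)          # upper quartile, about 0.674 sigma
iqr = 2 * q3                           # Q3 - Q1 for a symmetric distribution

def fence_in_sigmas(k):
    """Distance of the cutoff Q3 + k*IQR from the mean, in standard deviations."""
    return q3 + k * iqr

def outside_fraction(k):
    """Probability that a normal observation lies beyond either cutoff."""
    return 2 * std_normal.cdf(-fence_in_sigmas(k))

for k in (1.5, 3.0):
    print(f"k = {k}: cutoff at {fence_in_sigmas(k):.3f} sigma, "
          f"fraction outside = {outside_fraction(k):.2e}")
```

    Under this assumption the 1.5q cutoff sits roughly 2.7 standard deviations from the mean, so about 0.7% of normal observations would be flagged, while the 3q cutoff sits roughly 4.7 standard deviations out, flagging only about 2 in a million. That fits reading the numbers as convenient rules of thumb tied to the normal distribution, but it is a calculation, not historical evidence.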

    You might want to compare quantiles with another method known as "binning", more commonly described as using "cells". In the references I have come across, that method has more rigorous mathematics behind it. Since I assume you are studying statistics in general, it might give you a way to compare and contrast it with the idea you are interested in.
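    For comparison, a minimal Python sketch of binning: the data are counted into equal-width cells, as in a histogram. Equal-width cells and the helper name `bin_counts` are assumptions for illustration; the references mentioned may use other rules for choosing cells.

```python
def bin_counts(data, num_cells):
    """Count how many values fall into each of num_cells equal-width cells."""
    lo, hi = min(data), max(data)
    if hi == lo:
        # All values identical: put everything in the first cell.
        return [len(data)] + [0] * (num_cells - 1)
    width = (hi - lo) / num_cells
    counts = [0] * num_cells
    for x in data:
        # Cell index of x; the maximum value is placed in the last cell.
        idx = min(int((x - lo) / width), num_cells - 1)
        counts[idx] += 1
    return counts

# Made-up sample for illustration.
data = [10, 12, 13, 14, 15, 16, 18, 30, 60]
print(bin_counts(data, 5))  # → [7, 0, 1, 0, 1]
```

    The isolated counts in the upper cells point to the same two unusual values as the quartile-based rule, but the picture depends on how many cells are chosen.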
    Good luck.