Statistics: Question about mild and extreme outliers.

  • Context: Undergrad 
  • Thread starter: Bipolarity
  • Tags: Statistics
SUMMARY

The discussion centers on the definitions of mild and extreme outliers in statistics: a mild outlier lies between 1.5q and 3q from the nearest quartile, and an extreme outlier lies more than 3q from it, where q is the interquartile range. The thread asks whether the numbers 1.5 and 3 are arbitrary. The reply conjectures that they are empirical values chosen on the assumption that quartiles relate to standard deviations in a normal distribution, while noting that the textbook gives no historical details. The conversation also touches on the role of quantiles as diagnostic tools and contrasts them with the binning method, which the reply describes as having more rigorous mathematical foundations.

PREREQUISITES
  • Understanding of interquartile range (IQR) in statistics
  • Familiarity with quartiles and their significance in data analysis
  • Basic knowledge of normal distribution and standard deviations
  • Awareness of statistical methods such as binning
NEXT STEPS
  • Research the mathematical foundations of interquartile range (IQR)
  • Study the properties and applications of quantiles in data analysis
  • Explore the binning method and its advantages over traditional outlier detection
  • Examine the historical context and empirical basis for the choice of outlier thresholds
USEFUL FOR

Statisticians, data analysts, and students studying statistics who seek to deepen their understanding of outlier detection methods and their theoretical underpinnings.

Bipolarity

I am studying statistics and have come across the following definitions of mild and extreme outliers.

Mild outlier: Between 1.5q and 3q away from the nearest quartile, where q denotes the interquartile range.

Extreme outlier: More than 3q away from the nearest quartile, where q denotes the interquartile range.
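The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_outliers` and the sample data are hypothetical, the quartiles use the "inclusive" interpolation rule from the standard library (textbooks differ on how quartiles are computed), and values exactly at 1.5q or 3q are assigned to the milder category.

```python
import statistics

def classify_outliers(data, mild=1.5, extreme=3.0):
    """Return one label per value: 'none', 'mild', or 'extreme'."""
    # Assumption: 'inclusive' quartiles; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    q = q3 - q1  # interquartile range, called q in the definitions above
    labels = []
    for x in data:
        # Distance beyond the nearest quartile (0 if x lies between Q1 and Q3).
        dist = max(q1 - x, x - q3, 0)
        if dist > extreme * q:
            labels.append("extreme")
        elif dist > mild * q:
            labels.append("mild")
        else:
            labels.append("none")
    return labels

# Hypothetical sample: Q1 = 3.5, Q3 = 8.5, so q = 5.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
print(list(zip(sample, classify_outliers(sample))))
```

With this sample the mild fence sits at 8.5 + 1.5·5 = 16 and the extreme fence at 8.5 + 3·5 = 23.5, so 20 is labeled mild and 40 extreme.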

So usually when I see mathematical definitions, I like to see a reason for them.
So I ask, are these numbers '1.5' and '3' chosen arbitrarily, or is there a real reason why these specific numbers were chosen?

I thank you for your time.

BiP
 


My old engineering text covers quantile plots, of which quartiles are a particular case. The entire section on quantiles is prefaced with "Often the appearance of the sample will provide information about the distribution ..." (emphasis mine).
Ronald E. Walpole, Raymond H. Myers, "Probability and Statistics for Engineers and Scientists", 5th edition.

From what I understand, quantiles are more of a diagnostic tool for gaining a qualitative feel for the nature of the data. For example, the "median" in a quantile plot estimates the "mean", but it isn't an efficient estimator. A median that sits far from the mean might, however, suggest that the data are far from bell-shaped (normal).
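As a small illustration of the median used as a diagnostic, the sketch below compares mean and median on two made-up samples (the data are hypothetical and chosen only so the arithmetic is easy to follow).

```python
import statistics

# Hypothetical samples: the second replaces the largest value with a far larger one.
symmetric = [2, 4, 5, 6, 8]
skewed = [2, 4, 5, 6, 48]

for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    # A large gap between mean and median hints at asymmetry or outliers.
    print(name, "mean =", mean, "median =", median, "gap =", mean - median)
```

The median stays at 5 in both cases, while the mean moves from 5 to 13; a gap like that is a hint, not a proof, that the sample is not bell-shaped.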

A quartile is a quantile taken at a multiple of 25% of the samples once they have been ranked in order of magnitude, e.g. q(0.25). The quartiles therefore split the ranked data into four natural groups, with the outer two cut points being the lower quartile q(0.25) and the upper quartile q(0.75), i.e. the 25th and 75th percentiles respectively.

Since a quartile is heavily influenced by the shape of the data set, I *conjecture* that what you are observing is a rule of thumb based on the assumption that the quartiles in some way reflect standard deviations in a normal sampling distribution; i.e., they are empirical values. However, the authors do not give historical details, and I am only speculating based on context.
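One way to probe this conjecture is to ask where the 1.5q and 3q fences would fall if the data really were normal. The sketch below does that with the standard library's `NormalDist`; the helper names `fence_in_sigmas` and `fraction_outside` are made up for this example. It shows what the thresholds correspond to under a normality assumption and says nothing about why they were historically chosen.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal: mean 0, sigma 1
q3 = z.inv_cdf(0.75)      # upper quartile, about 0.6745 sigma
iqr = 2 * q3              # by symmetry Q1 = -Q3, so q is about 1.349 sigma

def fence_in_sigmas(k):
    """Upper fence Q3 + k*q, expressed in units of sigma."""
    return q3 + k * iqr

def fraction_outside(k):
    """Probability that a normal draw lands beyond either fence."""
    return 2 * z.cdf(-fence_in_sigmas(k))

for k in (1.5, 3.0):
    print(f"k = {k}: fence at {fence_in_sigmas(k):.3f} sigma, "
          f"fraction outside = {fraction_outside(k):.2e}")
```

Under normality the 1.5q fence lands near 2.7 standard deviations from the mean, which leaves roughly 0.7% of draws outside, and the 3q fence lands near 4.7 standard deviations, which leaves only a few draws per million outside.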

You might want to compare the idea of quantiles against another method known as "binning" or, more commonly, using "cells". In the references I have come across, that method has more rigorous mathematics associated with it. That might give you a way to compare and contrast it with the idea you are interested in, since I assume you are studying statistics in general.
Good luck.
 
