Statistics: Question about mild and extreme outliers.

  • Thread starter Bipolarity
  • Tags
    Statistics
In summary, the thread asks about the usual definitions of mild and extreme outliers: a mild outlier lies between 1.5 and 3 interquartile ranges beyond the nearest quartile, and an extreme outlier lies more than 3 interquartile ranges beyond it. The question is whether the constants 1.5 and 3 are arbitrary. The reply does not give a derivation. It describes quantiles as a diagnostic tool for getting a qualitative feel for the data, conjectures that the constants are empirical values tied to standard deviations under a normal distribution, and suggests comparing quantiles with binning.
  • #1
Bipolarity
Statistics: Question about "mild" and "extreme" outliers.

I am studying statistics, and have noticed the definitions of the mild and extreme outliers.

Mild outlier: Between 1.5q and 3q away from the nearest quartile, where q denotes the interquartile range.

Extreme outlier: More than 3q away from the nearest quartile, where q denotes the interquartile range.
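The two definitions above can be sketched in code. This is only an illustration under stated assumptions: the function name `classify`, the sample values, and the choice of the "inclusive" quartile method are not from the thread, and textbooks differ on how the quartiles themselves are computed and on how a point lying exactly on a boundary is treated.

```python
import statistics

def classify(x, q1, q3):
    """Label x as 'none', 'mild', or 'extreme' using the 1.5q and 3q rules."""
    iqr = q3 - q1  # q in the definitions above
    # Distance beyond the nearest quartile; zero for points between the quartiles.
    if x < q1:
        dist = q1 - x
    elif x > q3:
        dist = x - q3
    else:
        dist = 0.0
    if dist > 3 * iqr:
        return "extreme"
    if dist > 1.5 * iqr:
        return "mild"
    return "none"

# Hypothetical sample, chosen so the quartiles come out as whole numbers.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
labels = {x: classify(x, q1, q3) for x in (5, 13, 14, 20)}
print(q1, q3, labels)
```

With these quartiles (3 and 7, so q = 4), a value must be more than 6 above the upper quartile to count as mild and more than 12 above it to count as extreme.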

So usually when I see mathematical definitions, I like to see a reason for them.
So I ask, are these numbers '1.5' and '3' chosen arbitrarily, or is there a real reason why these specific numbers were chosen?

I thank you for your time.

BiP
 
  • #2


In my old engineering text, a quartile is treated as a particular case of a quantile, and quantiles are introduced through quantile plots. The entire section on quantiles is prefixed with "Often the appearance of the sample will provide information about the distribution ..." (emphasis mine).
Ronald E. Walpole, Raymond H. Myers, "Probability and Statistics for Engineers and Scientists", 5th edition.

From what I understand, quantiles are more of a diagnostic tool for gaining a qualitative feel for the nature of the data. For example, the median shown in a quantile plot can be used to estimate the mean. It isn't an efficient estimator, but comparing it with the mean might suggest that the data are far from a bell curve (normal).

A quartile is a quantile taken from the sample values ranked in increasing order, with each quartile marking off a further 25% of the samples. The quartiles therefore split the ranked sample into four natural groups, and the outer two cut points are the lower quartile q(0.25) and the upper quartile q(0.75), that is, the 25th and 75th percentiles respectively.

Since a quartile is heavily influenced by the shape of the data set, I *conjecture* that what you are observing is an estimate based on the assumption that the quartiles in some way reflect standard deviations in a normal sampling distribution, i.e. that 1.5 and 3 are empirical values. However, the authors do not give historical details, and I am only speculating based on context.
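One way to probe the conjecture above is to ask where the 1.5q and 3q cutoffs fall, in units of the standard deviation, if the data really were normal. The sketch below uses Python's standard `statistics.NormalDist`; the variable names are illustrative, and the calculation shows only what the cutoffs correspond to under normality, not why they were originally chosen.

```python
from statistics import NormalDist

z = NormalDist()        # standard normal: mean 0, standard deviation 1
q3 = z.inv_cdf(0.75)    # upper quartile in units of sigma
iqr = 2 * q3            # by symmetry the lower quartile is -q3

fences = {}
tails = {}
for k in (1.5, 3.0):
    fences[k] = q3 + k * iqr              # upper cutoff in units of sigma
    tails[k] = 2 * (1 - z.cdf(fences[k])) # fraction of normal data beyond either cutoff
    print(f"k={k}: cutoff = {fences[k]:.3f} sigma, fraction outside = {tails[k]:.2e}")
```

For normal data the 1.5q cutoff sits at roughly 2.7 standard deviations from the mean, so a bit under 1% of points would be flagged as at least mild outliers, while the 3q cutoff sits at roughly 4.7 standard deviations, which normal data would almost never reach.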

You might want to compare the idea of quantiles against another method known as "binning" or, more commonly, using "cells". In the references I have come across, that method has more rigorous mathematics associated with it. Since I assume you are studying statistics in general, that might give you a way to compare and contrast it with the idea you are interested in.
Good luck.
 

What is an outlier in statistics?

An outlier is a data point that is significantly different from the other data points in a dataset. It is a value that is either much higher or much lower than the rest of the data, and it can affect the overall analysis and interpretation of the data.

What is a mild outlier?

Under the definition used in this thread, a mild outlier is a data point that lies between 1.5 and 3 interquartile ranges below the lower quartile or above the upper quartile. It is unusual relative to the bulk of the data, but not far enough out to be considered an extreme outlier.

What is an extreme outlier?

Under the definition used in this thread, an extreme outlier is a data point that lies more than 3 interquartile ranges below the lower quartile or above the upper quartile. Such a point can greatly affect the results of the analysis.

Why is it important to identify outliers in a dataset?

Identifying outliers is important because they can skew the results of statistical analysis and lead to inaccurate conclusions. Outliers can also indicate errors in the data collection process or unusual events that should be further investigated.

How can outliers be dealt with in statistical analysis?

There are several ways to deal with outliers in statistical analysis. They can be removed from the dataset, transformed, or winsorized (replaced with a less extreme value). The appropriate method will depend on the specific dataset and the goals of the analysis.
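As a rough illustration of replacing outliers with a less extreme value, the sketch below clips every point to the 1.5 interquartile range cutoffs. The function name `clip_to_fences`, the sample, and the "inclusive" quartile method are assumptions made for this example. Classic winsorizing usually clamps at chosen percentiles rather than at these cutoffs, so treat this as one simple variant.

```python
import statistics

def clip_to_fences(data, k=1.5):
    """Return a copy of data with values beyond k IQRs from the quartiles pulled in."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    # Values inside [lo, hi] are unchanged; values outside are moved to the boundary.
    return [min(max(x, lo), hi) for x in data]

# Hypothetical sample with one value far above the rest.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 40]
clipped = clip_to_fences(sample)
print(clipped)
```

Here the quartiles are 3 and 7, so the upper cutoff is 13 and only the value 40 is changed.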
