Way of distinguishing the outliers

  • Thread starter sfspitfire23
In summary, the conversation discusses the challenge of separating typical and non-typical values in a set of data. Different methods, such as assuming normality and removing outliers, are mentioned but it is emphasized that expert knowledge and context are crucial for accurate analysis. It is also noted that a good understanding of the data generation process and obtaining a random sample are important for determining typical values. Consultation with an expert is recommended for effective statistical analysis.
  • #1
sfspitfire23
Hi guys,

I'm hoping to tickle your brains. I have a bunch of data sets. Some are large (50+ values), some are quite small (fewer than 10). I would like a way to separate the elements that cluster closely around the mean of their respective set from the outliers of that set. The idea is to get a sense of which values are "typical" and which are "not typical" in each set. Put another way, which values are like the others in the set and which are not.

I've tried assuming normality and taking all the values that fall within 10% of the mean as the "typical" values. Problem is that the distributions of my small data sets are far from normal.
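For reference, the rule described above could be sketched roughly as follows. This is only an illustration: the function name `split_by_mean_band`, the `tolerance` parameter, and the sample values are made up here, not taken from the actual data.

```python
from statistics import mean

def split_by_mean_band(values, tolerance=0.10):
    """Split values into (typical, atypical) by relative distance from the mean.

    A value counts as "typical" if it lies within `tolerance` (10% by default)
    of the mean of the set. Hypothetical helper, for illustration only.
    """
    m = mean(values)
    band = abs(m) * tolerance  # half-width of the "typical" band
    typical = [v for v in values if abs(v - m) <= band]
    atypical = [v for v in values if abs(v - m) > band]
    return typical, atypical

# Made-up small data set with one extreme value.
data = [9.8, 10.1, 10.0, 9.9, 30.0]
typical, atypical = split_by_mean_band(data)
print("typical:", typical)
print("atypical:", atypical)
```

With these made-up values the single extreme point drags the mean to about 13.96, so none of the values falls inside the 10% band. That is one way a mean-based rule can misbehave on a small set.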

Anyone have any suggestions? Anyone know a method that might work well?

Thanks
 
  • #2
My suggestion is that you shouldn't get lobotomized by statistical procedures. If you have some expert knowledge about the data then bring it to bear, don't just thumb through a collection of statistical procedures.

Statistics must be based on applying probability. You can't apply it unless you know or assume some specific process for how the data is generated. For example, what are the sources of the "non-typical" data points? Are they clerical errors where digits are transposed? Are they something like genetic mutations, or contamination of blood samples by improper handling?

Once you have a procedure to remove outliers, what are you going to do with the data that remains? What quantities will be estimated from the data? What decisions will be made based on the analysis?
 
  • #3
Hey sfspitfire23 and welcome to the forums.

To build on the good advice of Stephen, it's important to put the data into context.

For example, if you are measuring heights and you get a data value of 500 cm, then in the context of your data that value will most likely be removed. But it may be that someone really is 500 cm tall, and then you can't just throw the value out, because it is not erroneous. If, however, you have decided to focus purely on the common case and that data point screws up everything else, then yes, you would remove it.

You have to do this kind of thing when you are evaluating the data: you can't just chuck data out because it's an outlier. You have to analyze it in the context of your experiment and get an understanding of the kinds of ranges your data will take on, and of those ranges, what makes sense in the context of what you are trying to do.

Also, in terms of "typical" and "non-typical", cutting the data points that lie extremely far from the rest may not be the best way. The best way to determine if something is "typical" is to analyze the process and see if the data obtained is a good "random sample", meaning that it represents the overall data well in the context of your experiment.

This is why you need to consult with an expert as Stephen Tashi pointed out above: you have to do this with all statistical analyses if you want accurate results and inferences.
 

What is the "Way of distinguishing the outliers"?

The "Way of distinguishing the outliers" refers to the process of identifying and separating data points that deviate significantly from the rest of the data. These data points, known as outliers, can skew the results of statistical analyses and should be carefully analyzed and, if necessary, removed from the data set.

Why is it important to distinguish outliers?

Distinguishing outliers is important because they can significantly affect the results of statistical analyses and lead to incorrect conclusions. Identifying them lets you decide, in the context of the data, whether to keep, correct, or remove them before the analysis.

What are some methods for distinguishing outliers?

Common methods for distinguishing outliers include graphical methods, such as box plots and scatter plots, as well as statistical methods like the Z-score and interquartile range. These methods can help identify data points that fall significantly outside of the expected range.
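As a rough illustration of the two statistical rules named above, here is a minimal sketch using only Python's standard library. The function names, the thresholds (3 standard deviations, 1.5 times the IQR), and the sample values are conventional or made-up choices assumed for the example, not something prescribed in the thread.

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Return values whose z-score (using the sample standard deviation)
    exceeds `threshold` in absolute value."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        return []  # all values identical: nothing can be flagged
    return [v for v in values if abs(v - m) / s > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up small data set with one extreme value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 9.7, 30.0]
print("z-score rule flags:", zscore_outliers(data))
print("IQR rule flags:", iqr_outliers(data))
```

On this made-up set of seven values the IQR rule flags 30.0 while the z-score rule flags nothing, because the extreme value inflates the standard deviation it is measured against. Neither rule replaces knowing how the data was generated.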

How do outliers affect data analysis?

Outliers can have a significant impact on data analysis by skewing the results and making it difficult to draw accurate conclusions. They can also shift measures of central tendency, such as the mean, so it is important to identify outliers and decide how to handle them before performing statistical analyses.
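To make the effect on central tendency concrete, the following small sketch compares how much the mean and the median move when one extreme value is added. The numbers are made up for illustration.

```python
from statistics import mean, median

# Made-up values clustered near 10, then the same set plus one extreme value.
clean = [9.7, 9.8, 9.9, 10.0, 10.1, 10.2]
with_outlier = clean + [30.0]

mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

print(f"mean moves by   {mean_shift:.3f}")
print(f"median moves by {median_shift:.3f}")
```

Here the mean moves by nearly 3 units while the median barely changes, which is why the median is often preferred as a summary when extreme values may be present.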

Can outliers be useful in data analysis?

In some cases, outliers can provide valuable insights and should not be removed from the data. For example, outliers may indicate unusual events or trends that could be important for understanding the data. However, it is important to carefully consider the impact of outliers and determine whether they should be included or removed from the analysis.
