Do outliers exist in categorical data and how can they be detected?

  • Context: Graduate
  • Thread starter: cynnetje
  • Tags: Chi square, Data
SUMMARY

Outliers can exist in categorical data, but mainly when several variables are considered together. The discussion emphasizes that boxplots and similar methods do not apply to categorical data. For a chi-square test of independence, the more relevant check is that each cell has an expected count of at least 5, so that the chi-square approximation is reliable. If this condition is not met, the Fisher Exact Test is a suitable alternative.

PREREQUISITES
  • Understanding of chi-square tests for independence
  • Familiarity with categorical data analysis techniques
  • Knowledge of the Fisher Exact Test
  • Basic statistical concepts related to expected counts
NEXT STEPS
  • Research the assumptions of chi-square tests, focusing on expected counts
  • Learn about the Fisher Exact Test and its applications in categorical data
  • Explore specialized techniques for categorical data analysis
  • Study examples of outlier detection in multi-variable categorical datasets
USEFUL FOR

Statisticians, data analysts, and researchers involved in categorical data analysis who need to understand outlier detection and the appropriate statistical tests for their data.

cynnetje
Hello!

I am working on a pre-analysis plan and have to specify what I am going to do with outliers. I have two categorical variables (5 levels and 2 levels) and I will be performing a chi-square test for independence.

I thought of using a boxplot to detect outliers, but now I am not sure if it is even possible to have outliers in categorical data. The range is so small that there cannot be much variation in the data. The only outlier I could think of is wrong data (data which falls outside the possible range due to mistakes). I have looked online and in my statistics books, but was unable to find a solution, so I really hope someone here can help me out.

To summarize, is it possible to have outliers in categorical data and if yes, how do I detect them?

Thank you so much for your time and have a nice day!
 
You can't do a box-and-whisker plot for categorical data. The idea of outliers doesn't make a lot of sense for that kind of data.

However, what you do need is to make sure that each cell in your chi-square test has an expected count of at least 5.
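As a rough illustration of that check, here is a minimal Python sketch that computes the expected counts under independence (row total times column total, divided by the grand total) and lists the cells that fall below 5. The 5x2 table and the function name `expected_counts` are made up for the example; they are not from the original question.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 5x2 table of observed counts (illustrative numbers only).
observed = [
    [12, 8],
    [15, 10],
    [9, 11],
    [3, 2],
    [20, 10],
]

expected = expected_counts(observed)

# Collect (row index, column index, expected count) for every cell below 5.
small_cells = [
    (i, j, round(e, 2))
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]
print(small_cells)
```

Note that the check is on the expected counts, not the observed ones. In this made-up table only the sparse fourth level produces small expected counts, which is typical when one category is rare.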
 
Thank you for your answer, very helpful! We will be checking the assumptions, thank you for mentioning it:)
 
You are welcome. Let us know if you have any follow up questions.
 
Just to add onto Dale's response: if you find cells with an expected count below 5, an alternative test you may use is the Fisher Exact Test. Also, if at least 80% of the cells have an expected count above 5 and every cell has an expected count above 1, then the chi-square distribution can still be a good approximation for the p-value. The "all cells above 5" rule and the Fisher Exact Test are both conservative.
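The decision rule described here can be sketched in a few lines of Python. This is only an illustration: the function name `approximation_check` and the two tables of expected counts are hypothetical, and the thresholds are read as "at least 5" and "at least 1", which is an assumption about how the rule of thumb is meant.

```python
def approximation_check(expected):
    """Suggest a test based on the expected counts of a contingency table."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)

    if all(e >= 5 for e in cells):
        return "chi-square (all expected counts >= 5)"
    # Relaxed rule: 80% of cells at 5 or more, and no cell below 1.
    if share_at_least_5 >= 0.8 and min(cells) >= 1:
        return "chi-square (relaxed rule holds)"
    return "consider Fisher's exact test"


# Hypothetical expected counts: 8 of 10 cells are >= 5 and the smallest is 2.05.
mostly_fine = [[11.8, 8.2], [14.75, 10.25], [11.8, 8.2], [2.95, 2.05], [17.7, 12.3]]

# Hypothetical expected counts with one cell below 1.
too_sparse = [[11.8, 8.2], [14.75, 10.25], [11.8, 8.2], [1.4, 0.6], [17.7, 12.3]]

print(approximation_check(mostly_fine))
print(approximation_check(too_sparse))
```

The first table passes the relaxed rule and the second does not, so an exact test would be the safer choice for the second.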
 
Hey cynnetje.

Have you ever done any sort of categorical analysis statistics at your university?

There are specialized techniques for categorical data, and these are taught either in an A-level statistics course [undergraduate] or in a specialized course on categorical analysis [in graduate school].
 
In general you can have some sort of outliers with categorical data, but only if you have multiple variables. As an example, take 10 binary variables where all but one of the test persons have "1" in 0 to 2 of the variables, while the remaining test person has "1" in all 10 variables. That person is clearly an outlier.
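To make the example concrete, here is a small Python sketch that counts the number of "1" values per person and flags anyone whose count is far from the median. The data, the function name `flag_unusual_rows`, and the cutoff `min_gap=5` are all assumptions chosen for illustration; this is one simple heuristic, not an established method for categorical outliers.

```python
from statistics import median


def flag_unusual_rows(rows, min_gap=5):
    """Return indices of rows whose count of 1s is at least min_gap from the median."""
    counts = [sum(row) for row in rows]
    typical = median(counts)
    return [i for i, c in enumerate(counts) if abs(c - typical) >= min_gap]


# Made-up data: nine people with 0 to 2 ones, and one person with all 10.
ones_per_person = [1, 0, 2, 1, 2, 0, 1, 1, 2, 10]
people = [[1] * k + [0] * (10 - k) for k in ones_per_person]

flagged = flag_unusual_rows(people)
print(flagged)
```

No single variable looks odd on its own here, since "1" is a valid value for each of them. The person only stands out once the variables are looked at together.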
 
