Do outliers exist in categorical data, and how can they be detected?

cynnetje
Hello!

I am working on a pre-analysis plan and have to specify what I am going to do with outliers. I have two categorical variables (5 levels and 2 levels) and I will be performing a chi-square test for independence.

I thought of using a boxplot to detect outliers, but now I am not sure whether it is even possible to have outliers in categorical data. The range of values is so small that there can't be much variation in the data. The only kind of outlier I could think of is wrong data (values that fall outside the possible range due to mistakes). I have looked online and in my statistics books but was unable to find a solution, so I really hope someone here can help me out.

To summarize: is it possible to have outliers in categorical data and, if so, how do I detect them?

Thank you so much for your time and have a nice day!
 
You can't do a box and whiskers plot for categorical data. The idea of outliers doesn't make a lot of sense for that kind of data.

However, what you do need to do is make sure that each cell in your chi-square test has an expected count of at least 5.
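To make the expected-count check concrete, here is a minimal sketch in plain Python. The 5 x 2 table is made-up illustration data, not the poster's, and `expected_counts` is a hypothetical helper name. It uses the standard formula for expected counts under independence: row total times column total, divided by the grand total.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 5 x 2 table of observed counts (made-up numbers).
observed = [
    [12, 8],
    [15, 10],
    [9, 11],
    [20, 14],
    [3, 2],
]

expected = expected_counts(observed)

# Collect (row, column, expected count) for every cell below 5.
small_cells = [
    (i, j, round(e, 2))
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]
print(small_cells)  # [(4, 0, 2.84), (4, 1, 2.16)]
```

In this made-up table only the sparse last row produces expected counts below 5, so that is the level you would look at more closely.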
 
Thank you for your answer, very helpful! We will be checking the assumptions, thank you for mentioning it :)
 
You are welcome. Let us know if you have any follow-up questions.
 
Just to add to Dale's response: if you find cells with an expected count below 5, an alternative test you can use is Fisher's exact test. Also, if at least 80% of the cells have an expected count of at least 5 and all cells have an expected count of at least 1, then the chi-square distribution can still be a good approximation for the p-value. Both the all-cells-above-5 rule and Fisher's exact test are conservative.
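As a rough illustration of the looser 80% rule, the sketch below checks a table of expected counts against both conditions. The function name `cochran_rule_ok`, its parameters, and the example numbers are all assumptions made for illustration.

```python
def cochran_rule_ok(expected, main=5.0, floor=1.0, share=0.8):
    """True if at least `share` of the cells have expected counts >= `main`
    and every cell has an expected count >= `floor`."""
    cells = [e for row in expected for e in row]
    enough_large = sum(e >= main for e in cells) / len(cells) >= share
    none_tiny = all(e >= floor for e in cells)
    return enough_large and none_tiny

# Hypothetical expected counts for a 5 x 2 table: 8 of 10 cells are >= 5,
# and the two small cells are still >= 1.
expected_ok = [
    [11.35, 8.65],
    [14.18, 10.82],
    [11.35, 8.65],
    [19.29, 14.71],
    [2.84, 2.16],
]

# Same layout, but one cell drops below 1, so the rule fails.
expected_bad = [
    [11.35, 8.65],
    [14.18, 10.82],
    [11.35, 8.65],
    [19.29, 14.71],
    [5.50, 0.50],
]

print(cochran_rule_ok(expected_ok))   # True
print(cochran_rule_ok(expected_bad))  # False
```

If the check fails, the options mentioned above still apply: switch to Fisher's exact test, or consider whether sparse levels can sensibly be merged.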
 
Hey cynnetje.

Have you ever taken any categorical data analysis at your university?

There are specialized techniques for categorical data, and these are taught in either an A-level statistics course [undergraduate] or a specialized course on categorical analysis [in graduate school].
 
In general, you can have outliers of a sort in categorical data, but only if you have multiple variables. As an example, take 10 binary variables where all but one of the test subjects have "1" in 0 to 2 of the variables, while the remaining subject has "1" in all 10 variables. That subject is clearly an outlier.
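The example above can be sketched in a few lines of Python. The flagging rule here is just one simple choice, picked for illustration and not something proposed in this thread: it counts the 1s per subject and flags counts far from the median, measured in units of the median absolute deviation (MAD). The helper names, the threshold `k=3`, and the data are all hypothetical.

```python
from statistics import median

def flag_unusual_rows(rows, k=3.0):
    """Return indices of rows whose number of 1s is more than k MADs from the median."""
    counts = [sum(row) for row in rows]
    center = median(counts)
    mad = median([abs(c - center) for c in counts])
    spread = mad if mad > 0 else 1  # arbitrary fallback when all deviations are tiny
    return [i for i, c in enumerate(counts) if abs(c - center) > k * spread]

def person(ones):
    """Hypothetical helper: 10 binary answers, the first `ones` of them set to 1."""
    return [1] * ones + [0] * (10 - ones)

# Seven subjects with 0 to 2 ones each, plus one subject with all 10 ones.
rows = [person(c) for c in (0, 1, 2, 1, 0, 2, 1, 10)]

print(flag_unusual_rows(rows))  # [7]
```

Note that no single variable is unusual for the flagged subject, since "1" is a valid value everywhere. Only the combination across the 10 variables stands out, which is why this kind of outlier needs multiple variables.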
 
