A Outliers categorical data?

  1. Dec 3, 2016 #1
    Hello!

    I am working on a pre-analysis plan and have to specify what I am going to do with outliers. I have two categorical variables (5 levels and 2 levels) and I will be performing a chi-square test for independence.

    I thought of using a boxplot to detect outliers, but now I am not sure whether it is even possible to have outliers in categorical data. The range is so small that there cannot be much variation in the data. The only outliers I can think of are wrong data (values that fall outside the possible range due to mistakes). I have looked online and in my statistics books but was unable to find a solution, so I really hope someone here can help me out.

    To summarize, is it possible to have outliers in categorical data and if yes, how do I detect them?

    Thank you so much for your time and have a nice day!
     
  3. Dec 3, 2016 #2

    Dale

    Staff: Mentor

    You can't do a box and whiskers plot for categorical data. The idea of outliers doesn't make a lot of sense for that kind of data.

    However, what you do need is to make sure that each cell in your chi-square test has an expected count of at least 5.
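
As a minimal sketch of that check: under independence, the expected count of a cell is its row total times its column total divided by the grand total. The observed counts below are made up for illustration (a hypothetical 5 x 2 table), and the function name `expected_counts` is an assumption, not something from this thread.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: 5 levels (rows) by 2 levels (columns).
observed = [
    [12, 8],
    [15, 5],
    [9, 11],
    [3, 2],
    [20, 15],
]

expected = expected_counts(observed)

# Collect (row, column, expected count) for every cell below the threshold of 5.
small_cells = [
    (i, j, e)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]

for i, j, e in small_cells:
    print(f"cell ({i}, {j}) has expected count {e:.2f}, below 5")
```

With these made-up numbers, only the sparse fourth row produces expected counts below 5, which is the kind of cell the rule is meant to catch.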
     
  4. Dec 3, 2016 #3
    Thank you for your answer, very helpful! We will be checking the assumptions, thank you for mentioning it:)
     
  5. Dec 3, 2016 #4

    Dale

    Staff: Mentor

    You are welcome. Let us know if you have any follow up questions.
     
  6. Dec 4, 2016 #5

    MarneMath

    Education Advisor

    Just to add to Dale's response: if you find cells with an expected count below 5, an alternative test you can use is Fisher's exact test. Also, if 80% of the cells have expected counts above 5 and all cells have expected counts above 1, then the chi-square distribution can still be a good approximation for the p-value. The all-cells-above-5 rule and Fisher's exact test are both conservative.
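
The relaxed rule can be sketched as a small check on a table of expected counts. This is only an illustration: the function name, its parameters, and the example tables are hypothetical, and treating "above" as "greater than or equal to" is an assumption about how the thresholds are meant.

```python
def chi_square_approximation_ok(expected, min_all=1.0, min_most=5.0, share=0.8):
    """Return True if every expected count is >= min_all and at least
    `share` of the cells have an expected count >= min_most."""
    cells = [e for row in expected for e in row]
    all_ok = all(e >= min_all for e in cells)
    most_ok = sum(e >= min_most for e in cells) / len(cells) >= share
    return all_ok and most_ok


# Hypothetical expected counts for a 5 x 2 table: 8 of 10 cells are >= 5
# and none is below 1, so the relaxed rule is satisfied.
table_a = [
    [11.8, 8.2],
    [11.8, 8.2],
    [11.8, 8.2],
    [2.95, 2.05],
    [20.65, 14.35],
]

# Hypothetical 2 x 2 table with one expected count below 1: the rule fails,
# and an exact test would be the safer choice.
table_b = [
    [0.5, 4.5],
    [9.5, 85.5],
]

print(chi_square_approximation_ok(table_a))
print(chi_square_approximation_ok(table_b))
```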
     
  7. Dec 5, 2016 #6

    chiro

    Science Advisor

    Hey cynnetje.

    Have you ever done any sort of categorical analysis statistics at your university?

    There are specialized techniques for categorical data, and these are taught either in an A-level statistics course [undergraduate] or in a specialized course on categorical analysis [in graduate school].
     
  8. Dec 5, 2016 #7

    mfb

    2016 Award

    Staff: Mentor

    In general you can have some sort of outlier in categorical data, but only if you have multiple variables. As an example, take 10 binary variables where every test person but one has "1" in 0 to 2 of the variables, while that one test person has "1" in all 10 variables. That person is clearly an outlier.
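
One way to make this example concrete is to count the "1" entries per person and flag anyone whose count sits far from the rest. The sketch below uses the median and the median absolute deviation with a cutoff of 3; that flagging rule, the function name, and the data are all assumptions chosen for illustration, not a method proposed in this thread.

```python
from statistics import median


def flag_unusual_rows(rows, threshold=3.0):
    """Flag rows whose count of 1s is far from the median count,
    measured in units of the median absolute deviation (MAD)."""
    counts = [sum(row) for row in rows]
    center = median(counts)
    mad = median([abs(c - center) for c in counts])
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero when counts barely vary
    return [i for i, c in enumerate(counts) if abs(c - center) / scale > threshold]


def person(number_of_ones, n_variables=10):
    """Build one hypothetical test person: a list of 10 binary answers."""
    return [1] * number_of_ones + [0] * (n_variables - number_of_ones)


# Nine hypothetical test persons with 0 to 2 ones, plus one with all 10 ones.
people = [person(k) for k in (0, 1, 2, 1, 0, 2, 1, 1, 2)] + [person(10)]

print(flag_unusual_rows(people))  # prints [9]
```

No single variable is out of range for the flagged person; only the combination across all 10 variables is unusual, which is why this kind of outlier needs multiple variables.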
     