Measures of Center/Spread in Categorical/Ordinal

  • Thread starter WWGD
In summary, the thread asks whether mean and standard deviation can be used as measures of center and spread for ordinal variables. Some papers use these measures, but it is not clear how reputable the journals are. The reply asks whether only binary variables are being considered or whether there are more than two possible values, and questions how median and mode could serve as measures of spread. It also raises the distinction between features and labels in supervised learning, suggests entropy for discussing spread, favors CDFs and the median, and notes that ordinal variables are sometimes reduced to the boolean case and then treated as cardinal.
  • #1
WWGD
Hi All,
I am kind of confused about whether mean and SD can be (or are) used as measures of center and spread, respectively, when dealing with ordinal variables.
On the one hand, this choice does not seem to support a reasonable interpretation; it seems that median and mode are more appropriate. OTOH, I have seen mean and SD used in some papers, although I am not sure how reputable the journals that published these papers are. EDIT: Are mode and median the standard measures of center and spread for ordinal categorical variables?
 
  • #2
Are we just talking ##\{0,1\}## or perhaps ##\{-1,1\}## or whatever binary encoding, or something more? It seemed odd when I first saw it, but I quite like ##\{-1,1\}## -- if the mix of positives and negatives is even, you get something with zero mean, which is nice.
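As a minimal sketch of the zero-mean remark above: if a boolean sample is coded as +1 / -1 and a fraction p of the values are positive, the sample mean is 2p - 1, which is zero for an even mix. The function name `mean_pm1` and the sample data are made up for illustration.

```python
def mean_pm1(labels):
    """Mean of a boolean sample coded as +1 (True) and -1 (False)."""
    coded = [1 if x else -1 for x in labels]
    return sum(coded) / len(coded)

balanced = [True, False, True, False]   # p = 0.5
skewed = [True, True, True, False]      # p = 0.75

print(mean_pm1(balanced))  # 0.0, i.e. 2 * 0.5 - 1
print(mean_pm1(skewed))    # 0.5, i.e. 2 * 0.75 - 1
```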

In any case, if it is just bools, then there's one sort of inference we can do.

If it's more than 2 possible values, how can median and mode give you measures of spread?

Other thoughts:

1.) Are we talking about variables / features, or something different like labels (i.e. the thing you want to estimate in supervised learning)? I am hoping we're talking about features / dimensions of the data, not labels.

2.) Btw, I haven't done all that much in this regard, but one nice thing to talk about is entropy. All you need is the probabilities / percentage mix for each of those ordinal values for a given feature, and you get a pretty nice way of discussing spread / dispersion.

3.) I generally like CDFs and the median sounds fine in this regard. I'm not so sure about it being the 'center' though.

4.) Sometimes reducing things to the boolean case is useful ('one hot encoding' is a term that certain people like to use). It's basically an indicator variable expansion. Once you have it in a suitable boolean setup, I think it's not uncommon to act as if the ranking is cardinal as opposed to ordinal... and again throw in a continuity relaxation and run regression or whatever on it.
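Points 2 and 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`shannon_entropy`, `normalized_entropy`, `one_hot`) and the example levels are hypothetical, not anything from the thread. Note that entropy uses only the category proportions, so it ignores the ordering of an ordinal variable.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical category distribution."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())

def normalized_entropy(values, n_levels):
    """Entropy divided by its maximum, log2(n_levels); lies in [0, 1]."""
    if n_levels < 2:
        return 0.0
    return shannon_entropy(values) / math.log2(n_levels)

def one_hot(values, levels):
    """Indicator-variable expansion: one 0/1 column per level."""
    return [[1 if v == level else 0 for level in levels] for v in values]

levels = ["low", "mid", "high"]          # assumed ordinal scale
sample = ["low", "high", "low", "high"]  # made-up data

print(shannon_entropy(sample))              # 1.0 bit: two equally likely values
print(normalized_entropy(sample, len(levels)))
print(one_hot(["low", "high"], levels))     # [[1, 0, 0], [0, 0, 1]]
```

A sample concentrated on a single level has entropy 0, and a sample spread evenly over all levels has normalized entropy 1.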
 

1. What is the difference between measures of center and measures of spread in categorical/ordinal data?

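In categorical/ordinal data, measures of center describe the central tendency of the data: the mode for any categorical variable, and also the median when the categories are ordered. These measures tell us where the data is concentrated. Measures of spread, on the other hand, describe the variability or dispersion of the data. For ordinal data these include the range and the interquartile range, reported in terms of categories; for unordered categories, a frequency-based measure such as entropy can be used instead. These measures tell us how spread out the data is.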

2. How do you calculate the mode in categorical/ordinal data?

The mode in categorical/ordinal data is the most frequently occurring category or value. To calculate the mode, you can either count the number of times each category appears and choose the one with the highest frequency, or you can use a bar chart to visually identify the category with the highest bar.
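The counting approach can be sketched as follows. The function name `modes` and the responses are made up for illustration; the function returns a list because two or more categories can tie for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every category that attains the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [category for category, count in counts.items() if count == top]

responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
print(modes(responses))          # ['agree']
print(modes(["yes", "no"]))      # ['yes', 'no'] (a tie)
```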

3. What is the difference between median and mean in categorical/ordinal data?

The median is the middle value when the data is arranged in order, so it requires ordered (ordinal) categories and is undefined for unordered ones. It is not affected by extreme values. The mean, on the other hand, is the average of numeric codes assigned to the categories; it depends on the spacing between those codes and can be heavily influenced by extreme values. Unless the categories can reasonably be treated as equally spaced, the mean may not be a meaningful measure of central tendency.
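A small sketch of this contrast, assuming a four-level scale and two arbitrary numeric codings (all names and data here are hypothetical): the median depends only on the order of the levels, while the mean changes with the codes. With an even number of observations this sketch returns the lower of the two middle values, since averaging two categories is not defined.

```python
def ordinal_median(values, levels):
    """Lower median of ordinal data; `levels` lists categories low to high."""
    rank = {level: i for i, level in enumerate(levels)}
    ordered = sorted(values, key=lambda v: rank[v])
    return ordered[(len(ordered) - 1) // 2]

def coded_mean(values, codes):
    """Mean of the numeric codes assigned to each category."""
    return sum(codes[v] for v in values) / len(values)

levels = ["poor", "fair", "good", "excellent"]
responses = ["good", "poor", "excellent", "fair", "good"]

equal_spacing = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}
stretched = {"poor": 1, "fair": 2, "good": 3, "excellent": 10}

print(ordinal_median(responses, levels))     # good
print(coded_mean(responses, equal_spacing))  # 2.6
print(coded_mean(responses, stretched))      # 3.8
```

Both codings preserve the order of the levels, so the median stays "good" while the mean moves from 2.6 to 3.8.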

4. How do you measure spread in categorical/ordinal data?

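In ordinal data, one measure of spread is the range, reported as the lowest and highest observed categories. Another is the interquartile range, reported as the categories at the 25th and 75th percentiles. Taking a numeric difference between categories presumes they are equally spaced, which ordinal data does not guarantee. Neither measure applies to unordered categories. If the categories are coded as ranks, these measures can be visualized using a box plot.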
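One way to read quartiles off ordinal data, sketched here under the assumption that the q-th quantile is taken as the lowest category whose cumulative proportion reaches q (other conventions exist). The function name, levels, and counts are made up, and the input is assumed to be non-empty.

```python
from collections import Counter

def ordinal_quantile(values, levels, q):
    """Lowest category whose cumulative proportion is at least q."""
    counts = Counter(values)
    n = len(values)
    cumulative = 0
    for level in levels:          # levels listed from lowest to highest
        cumulative += counts[level]
        if cumulative / n >= q:
            return level
    return levels[-1]

levels = ["poor", "fair", "good", "excellent"]
responses = ["poor"] * 2 + ["fair"] * 3 + ["good"] * 4 + ["excellent"]

q1 = ordinal_quantile(responses, levels, 0.25)
q3 = ordinal_quantile(responses, levels, 0.75)
print(q1, q3)  # fair good
```

The interquartile range is then stated as "fair to good" rather than as a number.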

5. Can you compare measures of center/spread between different groups in categorical/ordinal data?

Yes, measures of center/spread can be compared between different groups in categorical/ordinal data. For ordinal data coded as ranks, this can be done by creating a separate box plot for each group and comparing the measures of center and spread visually. It is important to keep in mind that these measures may not be as meaningful in categorical/ordinal data as they are in numerical data.
