Measures of Center/Spread in Categorical/Ordinal

  • Context: Graduate 
  • Thread starter: WWGD
SUMMARY

The discussion centers on whether mean and standard deviation are appropriate measures of center and spread for ordinal variables. The original poster doubts that they support a meaningful interpretation and asks whether median and mode are the standard choices instead. The reply questions how median and mode could measure spread, and suggests entropy and cumulative distribution functions (CDFs) for discussing dispersion, as well as one-hot encoding as a way to treat ordinal data as cardinal in certain analyses.

PREREQUISITES
  • Understanding of ordinal variables and their characteristics
  • Familiarity with statistical measures: mean, median, mode, and standard deviation
  • Knowledge of entropy and its application in statistics
  • Experience with one-hot encoding in data preprocessing
NEXT STEPS
  • Research the application of median and mode in ordinal data analysis
  • Learn about entropy and how it quantifies uncertainty in categorical data
  • Explore cumulative distribution functions (CDFs) for visualizing ordinal data
  • Investigate the implications of one-hot encoding on ordinal versus cardinal data analysis
USEFUL FOR

Statisticians, data analysts, and machine learning practitioners who work with ordinal data and seek to understand appropriate measures of center and spread.

WWGD
Hi All,
I am somewhat confused about whether mean and SD can be, or are, used as measures of center and spread, respectively, when dealing with ordinal variables.
On one hand, this choice does not seem to reasonably support the interpretation; it seems that median and mode would be more appropriate. OTOH, I have seen mean and SD used in some papers, although I am not sure how reputable the journals that published them are. EDIT: Are mode and median the standard measures of center and spread for ordinal categorical variables?
 
Are we just talking ##\{0,1\}## or perhaps ##\{-1,1\}## or whatever binary encoding, or something more? It seemed odd when I first saw it, but I quite like ##\{-1,1\}## -- if the mix of positives and negatives is even, you get something with zero mean, which is nice.
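
To make the zero-mean remark concrete, here is a quick sketch, where ##p## is my own notation for the fraction of observations coded ##+1##:

$$\bar{x} = p\cdot(+1) + (1-p)\cdot(-1) = 2p - 1,$$

which is zero exactly when ##p = 1/2##. Since every value squares to 1, the (population) variance is ##1 - (2p-1)^2 = 4p(1-p)##. For ##\{0,1\}## coding the corresponding values are ##p## and ##p(1-p)##.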

In any case, if it is just bools, then there's one sort of inference we can do.

If it's more than 2 possible values, how can median and mode give you measures of spread?

Other thoughts:

1.) Are we talking about variables / features, or something different like labels (i.e. the thing you want to estimate in supervised learning)? I am hoping we're talking about features / dimensions of the data, not labels.

2.) Btw, I haven't done all that much in this regard, but one nice thing to talk about is entropy. All you need is the probabilities / percentage mix of those ordinal values for a given feature, and you get a pretty nice way of discussing spread / dispersion.

3.) I generally like CDFs and the median sounds fine in this regard. I'm not so sure about it being the 'center' though.

4.) Sometimes reducing things to the boolean case is useful ('one hot encoding' is a term that certain people like to use). It's basically an indicator variable expansion. Once you have it in a suitable boolean setup, I think it's not uncommon to act as if the ranking is cardinal as opposed to ordinal... and again throw in a continuity relaxation and run regression or whatever on it.
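
Points 2.) to 4.) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the four-level scale `LEVELS`, the sample data, and all function names are made up for the example, and the median is taken to be the lowest level whose cumulative proportion reaches 1/2 (one common convention, not the only one).

```python
from collections import Counter
from math import log2

# Hypothetical ordinal scale, listed from lowest to highest rank.
LEVELS = ["low", "medium", "high", "very high"]

def proportions(sample, levels=LEVELS):
    """Share of observations at each level, in rank order."""
    counts = Counter(sample)
    return [counts[level] / len(sample) for level in levels]

def entropy_bits(probs):
    """Shannon entropy in bits: 0 if one level holds everything,
    log2(number of levels) if the mix is uniform."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def cdf(probs):
    """Cumulative proportions, following the rank order of the levels."""
    running, out = 0.0, []
    for p in probs:
        running += p
        out.append(running)
    return out

def median_level(sample, levels=LEVELS):
    """Lowest level whose cumulative count covers at least half the sample."""
    counts = Counter(sample)
    running = 0
    for level in levels:
        running += counts[level]
        if 2 * running >= len(sample):
            return level
    raise ValueError("sample is empty or contains no known level")

def one_hot(value, levels=LEVELS):
    """Indicator-variable expansion: one 0/1 entry per level."""
    return [1 if value == level else 0 for level in levels]

# Made-up sample of eight observations.
sample = ["low", "low", "medium", "medium", "medium",
          "high", "high", "very high"]

probs = proportions(sample)
print("proportions:", probs)
print("CDF:        ", cdf(probs))
print("median:     ", median_level(sample))
print("entropy:    ", entropy_bits(probs), "bits")
print("one-hot of 'high':", one_hot("high"))
```

Note that the entropy and the one-hot expansion never use the ordering of the levels, so they treat the feature as if it were nominal; only the CDF and the median depend on the ranking.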
 