What to do with an ordinal response variable?

Discussion Overview

The discussion revolves around the treatment of ordinal response variables in statistical modeling, particularly in the context of linear regression and generalized linear models (GLMs). Participants explore various methods for transforming ordinal data, such as rankings and letter grades, to fit these models effectively.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants suggest transforming ordinal response variables by coding them as values on [0,1] and fitting a GLM with a probit link, which is based on the cumulative distribution function (CDF) of the standard normal.
  • There is a proposal to use midpoint values for rankings, questioning whether this approach is valid for different types of ordinal data, such as letter grades.
  • One participant discusses the potential transformation of letter grades into numerical values, raising the question of how to appropriately split and code these grades for analysis.
  • Concerns are raised about the variance argument in the quasi family of GLMs, with a suggestion to experiment due to limited documentation.
  • Participants discuss the implications of dichotomizing ordinal data, particularly in the context of grades, questioning whether this simplification would result in a loss of statistical power.
  • There is a mention of searching for 'Ordinal regression' as a relevant term for the discussed methods.

Areas of Agreement / Disagreement

Participants express varying opinions on the best methods for transforming ordinal response variables, with no consensus reached on the optimal approach. The discussion remains unresolved regarding the trade-offs of dichotomizing data.

Contextual Notes

Limitations include the dependence on the context of the analysis and the potential loss of information when simplifying ordinal data. The discussion also highlights the need for clarity on the variance argument in the quasi family of GLMs.

FallenApple
So what if my response variable, y, is on a scale? For example, a ranking, something like 1-10. How would I transform the response to make linear regression work?
 
Do a glm where the response variable is uniformly distributed on [0,1] and the link function is the probit link, i.e. the inverse of the cdf of the standard normal.
Code response variable values 1,...,10 as 0.05, 0.15,...,0.95.
 
andrewkirk said:
Do a glm where the response variable is uniformly distributed on [0,1] and the link function is the probit link, i.e. the inverse of the cdf of the standard normal.
Code response variable values 1,...,10 as 0.05, 0.15,...,0.95.
Oh ok, got it. But why the midpoint? Is it because the rankings are evenly spaced, so 1 becomes 0.05?

But what if it's something else, like grades A, B, C, D, F? Would I split it into five? So 100/5 = 20, then take the midpoints, so A=10, B=30, C=50, D=70, F=90?

So Prob( Z < transformed(y)) = sum of regressors?

So in R it would be qnorm(new_y, 0, 1) ~ x1 + x2 + ... + xp?
 
I think the code would be something like

Code:
glm(y~x1+x2+ ... + xn, family=quasi(link = "probit", variance = "constant"))
But I am not completely sure about the variance argument. Unfortunately, the R documentation on the 'quasi' family is almost non-existent. Best to try it and see what happens.
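A minimal sketch of how that call could be tried out on simulated data. Everything about the data here is an assumption made for illustration (the sample size, the two predictors x1 and x2, the latent score and the decile cut points); only the glm call itself follows the suggestion above. "constant" is one of the variance options that quasi() accepts, and R's "probit" link is the inverse of the standard normal cdf.

```r
set.seed(1)

# Simulated data (hypothetical): a latent score drives a 1-10 ranking.
n.obs  <- 200
x1     <- rnorm(n.obs)
x2     <- rnorm(n.obs)
latent <- 0.8 * x1 - 0.5 * x2 + rnorm(n.obs)

# Cut the latent score into 10 ordered bins to get rankings 1..10.
breaks  <- quantile(latent, probs = seq(0, 1, by = 0.1))
ranking <- cut(latent, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Midpoint coding: 1 -> 0.05, 2 -> 0.15, ..., 10 -> 0.95.
y <- (2 * ranking - 1) / (2 * 10)

# Quasi-likelihood fit with a probit link and constant variance.
fit <- glm(y ~ x1 + x2,
           family = quasi(link = "probit", variance = "constant"))

coef(fit)            # coefficients on the probit (linear predictor) scale
range(fitted(fit))   # fitted means are pnorm() of the linear predictor
```

Because the quasi family has no full likelihood, AIC is not available for this fit; the coefficients and their standard errors are what one would look at.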

FallenApple said:
So Prob( Z < transformed(y))= sum of regressors?
We need to apply ##\Phi## to the sum of regressors.
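Written out, with the coefficient symbols ##\beta_0, \dots, \beta_p## introduced here only as notation for the regression coefficients, the model for the mean of the transformed response would be roughly:

```latex
\mathbb{E}[y \mid x] = \Phi\left(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\right)
\quad\Longleftrightarrow\quad
\Phi^{-1}\left(\mathbb{E}[y \mid x]\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
```

So it is the mean of the transformed y that equals ##\Phi## of the sum of regressors, and the inverse of ##\Phi## plays the role of the link.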

If you have an ordered factor variable fac and the vector of corresponding integer values is fac.int then the transformation would be

Code:
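fac.int <- as.integer(fac)                     # integer codes 1..n of the ordered factor
n <- length(levels(fac))                       # number of ordered levels
y.transformed <- (2 * fac.int - 1) / (2 * n)   # midpoints of n equal-width bins on [0,1]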
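As a small worked example for the letter-grade case, here is a sketch with a made-up vector of grades. The level ordering (F lowest, A highest) is an assumption; the question above listed them the other way round, which would simply reverse the codes.

```r
# Letter grades as an ordered factor, lowest to highest (ordering is an assumption).
grades <- c("B", "A", "F", "C", "D", "A", "C")
fac    <- factor(grades, levels = c("F", "D", "C", "B", "A"), ordered = TRUE)

fac.int <- as.integer(fac)        # F = 1, D = 2, C = 3, B = 4, A = 5
n       <- length(levels(fac))    # 5 levels
y.transformed <- (2 * fac.int - 1) / (2 * n)

# F -> 0.1, D -> 0.3, C -> 0.5, B -> 0.7, A -> 0.9
data.frame(grade = grades, code = fac.int, y = y.transformed)
```

These are the same midpoints as 10, 30, 50, 70, 90 out of 100, just expressed on [0,1].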

You could also try searching 'Ordinal regression', which is the term for what you are trying to do.
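For reference, one common way to fit an ordinal regression in R is polr() from the MASS package, which is not mentioned in the thread and is swapped in here only as an illustration. The sketch below uses simulated grades; the predictor x1, the cut points and the sample size are all assumptions.

```r
library(MASS)   # provides polr(); MASS ships with standard R distributions

set.seed(2)
n.obs  <- 300
x1     <- rnorm(n.obs)
latent <- 1.0 * x1 + rnorm(n.obs)

# Five ordered grades cut from the latent score (cut points are arbitrary).
grade <- cut(latent, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
             labels = c("F", "D", "C", "B", "A"), ordered_result = TRUE)

dat <- data.frame(grade = grade, x1 = x1)

# Ordered probit: estimates a slope plus thresholds between adjacent grades.
fit.ord <- polr(grade ~ x1, data = dat, method = "probit", Hess = TRUE)

coef(fit.ord)   # slope on the latent (probit) scale
fit.ord$zeta    # estimated thresholds F|D, D|C, C|B, B|A
```

Unlike the midpoint coding, this approach does not assume the levels are evenly spaced; the thresholds are estimated from the data.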
 
Oh ok. I'll look into the probit glm. What about dichotomizing it? Would that hurt?

For grades, I could code A-C as 1 for pass and D-F as 0 for fail. So whether I can do this depends on context?
 
FallenApple said:
So it depends on context if I can do this?
Yes, it depends on what you are trying to achieve.
 
andrewkirk said:
Yes, it depends on what you are trying to achieve.

Well, mostly it's just to make it simpler. But what is the tradeoff if I dichotomize, say, grades into pass/fail? Would I lose power? Presumably, if x has an effect on grades from one level to the next, then there would be a corresponding effect of x on going from fail to pass and vice versa.
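One way to get a feel for the power question is a small simulation that analyses the same simulated data twice: once as five ordered grades and once collapsed to pass/fail. The sketch below is only that, a sketch under assumptions: the effect size, sample size, number of replications, the equal-sized grade bins and the use of MASS::polr for the ordinal fit are all choices made here, not anything proposed in the thread.

```r
library(MASS)   # for polr()

set.seed(3)

# One simulated study: returns p-values for x from both analyses.
one.study <- function(n.obs = 100, beta = 0.3) {
  x      <- rnorm(n.obs)
  latent <- beta * x + rnorm(n.obs)
  # Five equally populated ordered grades (a simplifying assumption).
  grade  <- cut(latent, breaks = quantile(latent, probs = 0:5 / 5),
                include.lowest = TRUE,
                labels = c("F", "D", "C", "B", "A"), ordered_result = TRUE)
  pass   <- as.integer(grade >= "C")   # A-C = 1 (pass), D-F = 0 (fail)

  # Ordered probit on all five levels; Wald test from the t value.
  fit.ord <- polr(grade ~ x, method = "probit", Hess = TRUE)
  t.ord   <- coef(summary(fit.ord))["x", "t value"]

  # Binary probit on the dichotomized outcome.
  fit.bin <- glm(pass ~ x, family = binomial(link = "probit"))
  p.bin   <- coef(summary(fit.bin))["x", "Pr(>|z|)"]

  c(ordinal = 2 * pnorm(-abs(t.ord)), binary = p.bin)
}

p.values <- replicate(200, one.study())
power    <- rowMeans(p.values < 0.05)   # share of studies that detect the effect
power
```

Comparing the two rejection rates gives a rough idea of how much is lost by collapsing the levels. Dichotomizing discards the information about where within pass or fail each observation sits, so the binary analysis would typically be expected to detect the effect less often, but the size of the gap depends on the settings above.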
 