What to do with an ordinal response variable?

AI Thread Summary
When dealing with ordinal response variables like rankings or grades, transforming the response for linear regression can be achieved by using a generalized linear model (GLM) with a probit link function. For rankings from 1 to 10, values can be coded as 0.05 to 0.95, while grades can be split into categories and transformed accordingly. The transformation involves applying the cumulative distribution function (CDF) of the standard normal to the sum of regressors. Dichotomizing ordinal data, such as grades into pass/fail, may simplify analysis but risks losing information and statistical power. Ultimately, the choice of transformation or dichotomization should align with the research goals and context.
FallenApple
So what if my response variable, y, is on an ordinal scale? For example, a ranking from 1 to 10. How would I transform the response to make linear regression work?
 
Do a glm where the response variable is uniformly distributed on [0,1] and the link function is the cdf of the standard normal.
Code response variable values 1,...,10 as 0.05, 0.15,...,0.95.
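One way to read this coding: split [0,1] into 10 equal bins and take the midpoint of the k-th bin, which is (2k - 1)/(2n). A minimal sketch in R, where the names `ranking` and `y_coded` are purely illustrative:

```r
ranking <- 1:10                         # the original ordinal values
n <- 10                                 # number of categories
y_coded <- (2 * ranking - 1) / (2 * n)  # midpoint of the bin ((k-1)/n, k/n]
print(y_coded)                          # 0.05 0.15 0.25 ... 0.95
```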
 
andrewkirk said:
Do a glm where the response variable is uniformly distributed on [0,1] and the link function is the cdf of the standard normal.
Code response variable values 1,...,10 as 0.05, 0.15,...,0.95.
Oh OK, got it. But why the midpoint? Is it because the rankings are evenly spaced, so 1 becomes 0.05?

But what if it's something else, like grades (A, B, C, D, F)? Would I split the range into five? So 100/5 = 20, then take the midpoints, so A=10, B=30, C=50, D=70, F=90?

So Prob( Z < transformed(y))= sum of regressors?

So in R it would be qnorm(new_y, 0, 1) ~ x1 + x2 + ... + xp?
 
I think the code would be something like

Code:
glm(y~x1+x2+ ... + xn, family=quasi(link = "probit", variance = "constant"))
But I am not completely sure about the variance argument. Unfortunately, the R documentation on the 'quasi' family is almost non-existent. Best to try it and see what happens.
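For anyone who wants to try it, here is a minimal runnable sketch on simulated data. Everything apart from the `glm` call itself is an assumption made for illustration: the data, the coefficients 0.8 and -0.5, and the names `latent`, `ranking` and `dat` are hypothetical. With `variance = "constant"` the fit amounts to least squares on the mean ##\Phi(\text{sum of regressors})##.

```r
set.seed(1)
n_obs <- 200
x1 <- rnorm(n_obs)
x2 <- rnorm(n_obs)

# Hypothetical latent score, cut at its deciles to give rankings 1..10
latent <- 0.8 * x1 - 0.5 * x2 + rnorm(n_obs)
ranking <- cut(latent, breaks = quantile(latent, probs = (0:10) / 10),
               include.lowest = TRUE, labels = FALSE)

# Code rankings 1..10 as the bin midpoints 0.05, 0.15, ..., 0.95
y <- (2 * ranking - 1) / 20
dat <- data.frame(y, x1, x2)

# Probit link with constant variance, as suggested in the post
fit <- glm(y ~ x1 + x2, data = dat,
           family = quasi(link = "probit", variance = "constant"))
print(coef(fit))

# Fitted means are pnorm(linear predictor), so they stay inside (0, 1)
mu_hat <- fitted(fit)
print(range(mu_hat))
```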

FallenApple said:
So Prob( Z < transformed(y))= sum of regressors?
We need to apply ##\Phi## to the sum of regressors.
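To make the direction explicit: the model is ##E[y] = \Phi(\text{sum of regressors})##, or equivalently ##\Phi^{-1}(E[y]) = \text{sum of regressors}##. In R, ##\Phi## is `pnorm` and ##\Phi^{-1}## is `qnorm`, both with default mean 0 and sd 1. A small sketch with made-up values of the linear predictor (`eta` is just an illustrative name):

```r
eta <- c(-2, -0.5, 0, 0.5, 2)  # example values of the sum of regressors
mu <- pnorm(eta)               # Phi(eta): the modelled mean, always in (0, 1)
eta_back <- qnorm(mu)          # inverse of Phi, with mean = 0 and sd = 1
print(cbind(eta, mu, eta_back))
```

Note that the GLM applies the transformation to the mean of y, which is in general not the same as regressing `qnorm(y)` on the predictors with `lm`.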

If you have an ordered factor variable fac and the vector of corresponding integer values is fac.int then the transformation would be

Code:
n <- length(levels(fac))                      # number of ordered levels
y.transformed <- (2 * fac.int - 1) / (2 * n)  # midpoint of each level's bin in [0, 1]
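As a worked example for the grades question, here is a small sketch that builds `fac` and `fac.int` from a made-up vector of letter grades. The vector `grades` and the level order F < D < C < B < A are assumptions for illustration.

```r
grades <- c("B", "A", "F", "C", "D", "C")   # made-up sample of letter grades
fac <- factor(grades, levels = c("F", "D", "C", "B", "A"), ordered = TRUE)
fac.int <- as.integer(fac)                  # F = 1, D = 2, C = 3, B = 4, A = 5

n <- length(levels(fac))
y.transformed <- (2 * fac.int - 1) / (2 * n)
print(data.frame(grade = fac, fac.int, y.transformed))
```

With five levels the coded values are 0.1, 0.3, 0.5, 0.7 and 0.9, on the [0,1] scale rather than 10 to 90, and they follow the order of the factor levels.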

You could also try searching 'Ordinal regression', which is the term for what you are trying to do.
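As a pointer for that search, one commonly used implementation is `polr` in the MASS package, which fits a cumulative-link (proportional odds or ordered probit) model directly to the ordered factor instead of to coded midpoints. This is a different technique from the quasi-family GLM above. A minimal sketch on simulated data, assuming MASS is installed; the thresholds and the names `latent`, `grade` and `dat` are hypothetical:

```r
library(MASS)

set.seed(42)
n_obs <- 300
x1 <- rnorm(n_obs)
latent <- 1.0 * x1 + rnorm(n_obs)   # hypothetical latent score

# Cut the latent score into five ordered grades, F < D < C < B < A
grade <- cut(latent, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
             labels = c("F", "D", "C", "B", "A"), ordered_result = TRUE)
dat <- data.frame(grade, x1)

# Ordered probit: P(grade <= k) = pnorm(zeta_k - beta * x1)
fit <- polr(grade ~ x1, data = dat, method = "probit", Hess = TRUE)
print(coef(fit))   # slope for x1
print(fit$zeta)    # estimated thresholds between adjacent grades
```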
 
Oh OK, I'll look into the probit GLM. What about dichotomizing it? Would that hurt?

For grades, I could code A through C as 1 (pass) and D through F as 0 (fail). So it depends on context if I can do this?
 
FallenApple said:
So it depends on context if I can do this?
Yes, it depends on what you are trying to achieve.
 
andrewkirk said:
Yes, it depends on what you are trying to achieve.

Well, mostly it's just to make it simpler. But what is the tradeoff if I dichotomize, say, grades into pass/fail? Would I lose power? Presumably, if x has an effect on grades from level to level, then there would be a corresponding effect on going from fail to pass and vice versa.
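One way to get a feel for the tradeoff is a small simulation. The sketch below is only an illustration under assumed settings (effect size 0.4 on a latent score, 60 observations, made-up grade thresholds, pass defined as C or better), not a general result. It compares how often each analysis detects the effect of `x` at the 5% level.

```r
set.seed(123)
n_rep <- 500   # number of simulated data sets
n_obs <- 60    # observations per data set
cuts <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)   # hypothetical thresholds, F < D < C < B < A

one_rep <- function() {
  x <- rnorm(n_obs)
  latent <- 0.4 * x + rnorm(n_obs)                         # assumed true effect
  grade_int <- cut(latent, breaks = cuts, labels = FALSE)  # 1 = F, ..., 5 = A
  y_coded <- (2 * grade_int - 1) / 10                      # midpoints 0.1, ..., 0.9
  pass <- as.integer(grade_int >= 3)                       # C or better = pass

  # Analysis keeping all five levels (quasi family with probit link)
  fit_ord <- glm(y_coded ~ x,
                 family = quasi(link = "probit", variance = "constant"))
  # Analysis after dichotomizing to pass/fail
  fit_bin <- glm(pass ~ x, family = binomial(link = "probit"))

  # Column 4 of the coefficient table holds the p-value for x
  c(ordinal = summary(fit_ord)$coefficients["x", 4],
    binary  = summary(fit_bin)$coefficients["x", 4])
}

p_values <- replicate(n_rep, one_rep())   # 2 x n_rep matrix of p-values
power <- rowMeans(p_values < 0.05)        # share of data sets with p < 0.05
print(power)
```

If the rejection rate for the pass/fail analysis comes out lower, that gap is the power given up by collapsing five levels into two under these assumptions.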
 