What to do with an ordinal response variable?

FallenApple · Jun 23, 2017

So what if my response variable, y, is a scale, for example a ranking from 1 to 10? How would I transform the response to make linear regression work?

andrewkirk · Jun 23, 2017

Do a GLM where the response variable lies in [0,1] and the link is the probit, i.e. the mean is the CDF of the standard normal applied to the linear predictor.
Code the response values 1, ..., 10 as 0.05, 0.15, ..., 0.95.
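Written out, the suggestion amounts to the following. This is a sketch: ##n## for the number of levels and ##\beta_j##, ##x_j## for the coefficients and regressors are notation introduced here. Level ##k## is coded as the midpoint of the ##k##-th of ##n## equal-width bins on [0,1], and the mean of the coded response is tied to the regressors through ##\Phi##.

$$y_k = \frac{2k-1}{2n}, \quad k = 1, \dots, n \qquad (n = 10:\; y_1 = 0.05,\; y_2 = 0.15,\; \dots,\; y_{10} = 0.95)$$

$$E[y \mid x] = \Phi(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p) \quad\Longleftrightarrow\quad \Phi^{-1}(E[y \mid x]) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$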

FallenApple · Jun 23, 2017

andrewkirk said:

Do a GLM where the response variable lies in [0,1] and the link is the probit, i.e. the mean is the CDF of the standard normal applied to the linear predictor.
Code the response values 1, ..., 10 as 0.05, 0.15, ..., 0.95.

Oh OK, got it. But why the midpoint? Is it because the rankings are evenly spaced, so 1 becomes 0.05?

But what if it's something else, like grades? With A, B, C, D, F, would I split the range into five? So 100/5 = 20, then take the midpoints: A = 10, B = 30, C = 50, D = 70, F = 90?

So P(Z < transformed(y)) = sum of regressors?

So in R it would be qnorm(new_y, 0, 1) ~ x1 + x2 + ... + xp?
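For comparison, here is a minimal R sketch of the transform-then-regress idea in this question, run on simulated data. The variables x1 and x2, the coefficients and the 1-10 rating are all hypothetical. qnorm's arguments are (p, mean, sd), so the standard normal quantile is qnorm(p, 0, 1), or simply qnorm(p). This approach models qnorm of each coded response as linear in the regressors. That is not the same as the GLM in the answer, which applies ##\Phi## to the linear predictor to give the mean of the coded response.

```r
# Sketch of "transform the coded response, then run ordinary least squares".
# All data below are simulated and hypothetical.
set.seed(42)
n_obs <- 300
x1 <- rnorm(n_obs)
x2 <- rnorm(n_obs)

# Hypothetical 1-10 rating: deciles of a latent variable driven by x1 and x2
latent <- 0.8 * x1 - 0.5 * x2 + rnorm(n_obs)
rating <- cut(latent, breaks = quantile(latent, probs = seq(0, 1, by = 0.1)),
              include.lowest = TRUE, labels = FALSE)  # integer codes 1..10

new_y <- (2 * rating - 1) / 20        # 0.05, 0.15, ..., 0.95
z <- qnorm(new_y, mean = 0, sd = 1)   # standard normal quantiles (probit scale)

fit_lm <- lm(z ~ x1 + x2)             # OLS on the transformed response
coef(fit_lm)
```

Because the codes stay strictly inside (0, 1), qnorm() never returns -Inf or Inf here. The lowest code, 0.05, maps to about -1.645.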

andrewkirk · Jun 23, 2017

I think the code would be something like

Code:

glm(y ~ x1 + x2 + ... + xn,   # template: replace "..." with the remaining regressors
    family = quasi(link = "probit", variance = "constant"))

But I am not completely sure about the variance argument. Unfortunately, the R documentation on the 'quasi' family is almost non-existent. Best to try it and see what happens.
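A runnable version of that call on simulated data might look like the sketch below. The data frame, the regressor names and the effect sizes are hypothetical. With variance = "constant", the quasi family assumes only a single dispersion parameter. The estimating equations are then those of nonlinear least squares on the curve ##\Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2)##, and the standard errors are scaled by the estimated dispersion.

```r
# Sketch: quasi-likelihood GLM with a probit link and constant variance,
# fitted to a hypothetical 10-level rating coded to 0.05, ..., 0.95.
set.seed(1)
n_obs <- 400
dat <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs))
latent <- 0.7 * dat$x1 + 0.3 * dat$x2 + rnorm(n_obs)

# Hypothetical rating: deciles of the latent variable, then midpoint coding
rating <- cut(latent, breaks = quantile(latent, probs = seq(0, 1, by = 0.1)),
              include.lowest = TRUE, labels = FALSE)
dat$y <- (2 * rating - 1) / 20

# E[y] = pnorm(b0 + b1*x1 + b2*x2), Var(y) = dispersion (a constant)
fit_quasi <- glm(y ~ x1 + x2, data = dat,
                 family = quasi(link = "probit", variance = "constant"))

summary(fit_quasi)$coefficients   # estimates with dispersion-scaled SEs
summary(fit_quasi)$dispersion     # estimated constant variance
```

If the fit converges, the sign of each coefficient shows the direction of that regressor's effect on the expected coded score.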

FallenApple said:

So P(Z < transformed(y)) = sum of regressors?

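We need to apply ##\Phi## (the standard normal CDF) to the sum of regressors: ##E[y] = \Phi(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)##, or equivalently ##\Phi^{-1}(E[y]) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p##.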

If you have an ordered factor variable fac, and fac.int is the vector of its corresponding integer values, then the transformation would be:

Code:

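n <- length(levels(fac))                      # number of ordered levels
fac.int <- as.integer(fac)                    # integer codes 1, ..., n in level order
y.transformed <- (2 * fac.int - 1) / (2 * n)  # midpoint of the k-th of n equal bins on [0, 1]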
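Here is a usage sketch applying the transformation to a hypothetical ordered factor of letter grades. With the levels ordered F < D < C < B < A, the codes are 0.1, 0.3, 0.5, 0.7 and 0.9. These are the 10, 30, ..., 90 midpoints from the grades question divided by 100, but with A at the top of the scale instead of the bottom. With a probit link, reversing the order of the levels only flips the signs of the fitted coefficients.

```r
# Hypothetical grades, stored as an ordered factor from worst (F) to best (A)
grades <- c("F", "C", "A", "B", "D", "C")
fac <- factor(grades, levels = c("F", "D", "C", "B", "A"), ordered = TRUE)

n <- length(levels(fac))                      # 5 levels
fac.int <- as.integer(fac)                    # F = 1, D = 2, C = 3, B = 4, A = 5
y.transformed <- (2 * fac.int - 1) / (2 * n)  # midpoints on [0, 1]
y.transformed
# 0.1 0.5 0.9 0.7 0.3 0.5
```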

You could also try searching 'Ordinal regression', which is the term for what you are trying to do.
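"Ordinal regression" covers several models. One common choice is the cumulative-link (ordered probit) model. It keeps the probit idea but estimates cutpoints between adjacent levels instead of fixing [0,1] scores by hand. Below is a minimal sketch using MASS::polr() with method = "probit". This is a swapped-in technique, not the quasi-GLM above. The data are simulated, and the grade labels, cutpoints and coefficients are hypothetical.

```r
# Sketch: ordered probit regression with MASS::polr() on simulated data.
library(MASS)  # recommended package shipped with standard R installations

set.seed(7)
n_obs <- 500
dat <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs))
latent <- 1.0 * dat$x1 - 0.5 * dat$x2 + rnorm(n_obs)

# Hypothetical 5-level ordered response obtained by cutting the latent variable
dat$grade <- cut(latent, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
                 labels = c("F", "D", "C", "B", "A"), ordered_result = TRUE)

fit_polr <- polr(grade ~ x1 + x2, data = dat, method = "probit", Hess = TRUE)
coef(fit_polr)                           # slopes on the latent-variable scale
fit_polr$zeta                            # estimated cutpoints F|D, D|C, C|B, B|A
head(predict(fit_polr, type = "probs"))  # fitted probability of each grade
```

Unlike the midpoint coding, this model does not assume the levels are evenly spaced. The gaps between the estimated cutpoints in fit_polr$zeta play that role.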

FallenApple · Jun 23, 2017

andrewkirk said:

You could also try searching 'Ordinal regression', which is the term for what you are trying to do.

Oh OK. I'll look into the probit GLM. What about dichotomizing it? Would that hurt?

For grades, I could code A to C as 1 (pass) and D and F as 0 (fail). So it depends on context if I can do this?
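Here is a minimal sketch of that recoding in R, assuming the grades are stored as an ordered factor (the grade values are hypothetical). Comparison operators on an ordered factor follow the level order, so grades >= "C" picks out C, B and A.

```r
# Hypothetical grades as an ordered factor, worst (F) to best (A)
grades <- factor(c("A", "D", "C", "F", "B"),
                 levels = c("F", "D", "C", "B", "A"), ordered = TRUE)

pass <- as.integer(grades >= "C")  # 1 for C, B, A; 0 for D, F
pass
# 1 0 1 0 1
```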

andrewkirk · Jun 23, 2017

FallenApple said:

So it depends on context if I can do this?

Yes, it depends on what you are trying to achieve.

FallenApple · Jun 24, 2017

andrewkirk said:

Yes, it depends on what you are trying to achieve.

Well, mostly it's just to make it simpler. But what is the tradeoff if I dichotomize, say, grades into pass/fail? Would I lose power? Presumably, if x has an effect on grades from level to level, then there would be a corresponding effect on grades from fail to pass and vice versa.
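One way to see the tradeoff is a small simulation. This is a sketch with a made-up effect size, cutpoints and sample size. It generates a 5-level grade from a latent variable, fits an ordered probit to all five levels with MASS::polr(), fits a binary probit GLM to the pass/fail version, and compares the standard errors of the slope. Both fits use the probit link on the same latent scale, so the slopes estimate the same quantity. A larger standard error for the binary fit therefore means less power at the same sample size.

```r
# Sketch: precision of the x effect, 5 ordered levels vs. pass/fail.
library(MASS)

set.seed(2017)
n_obs <- 2000
x <- rnorm(n_obs)
latent <- 0.5 * x + rnorm(n_obs)   # hypothetical latent "ability"

grade <- cut(latent, breaks = c(-Inf, -1, -0.3, 0.3, 1, Inf),
             labels = c("F", "D", "C", "B", "A"), ordered_result = TRUE)
pass <- as.integer(grade >= "C")   # dichotomized outcome: C or better

fit_ord <- polr(grade ~ x, method = "probit", Hess = TRUE)  # all 5 levels
fit_bin <- glm(pass ~ x, family = binomial(link = "probit")) # pass/fail only

se_ord <- sqrt(diag(vcov(fit_ord)))["x"]
se_bin <- sqrt(diag(vcov(fit_bin)))["x"]
c(ordinal = unname(se_ord), binary = unname(se_bin))
```

In runs like this, the pass/fail standard error typically comes out larger. How much larger depends on where the split falls and how many levels are merged. The binary model also says nothing about movement among A, B and C or between D and F.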
