What to do with an ordinal response variable?

In summary: An ordinal response such as a 1-10 rating or letter grades can be modelled with a probit-link GLM on midpoint-coded values or, more generally, with ordinal regression. Dichotomizing the response (for example into pass/fail) is simpler, and whether that is acceptable depends on what the analysis is trying to achieve.
  • #1
FallenApple
So what if my response variable, y, is, say, a scale? For example, a ranking, something like 1-10. How would I transform the response to make linear regression work?
 
  • #2
Do a glm where the response variable is uniformly distributed on [0,1] and the link function is the cdf of the standard normal.
Code response variable values 1,...,10 as 0.05, 0.15,...,0.95.
 
  • #3
andrewkirk said:
Do a glm where the response variable is uniformly distributed on [0,1] and the link function is the cdf of the standard normal.
Code response variable values 1,...,10 as 0.05, 0.15,...,0.95.
Oh ok, got it. But why the midpoint? Is it because the rankings are evenly spaced, so that 1 becomes 0.05?

But what if it's something else, like grades (A, B, C, D, F)? Would I split the scale into five, so 100/5 = 20, and then take the midpoints, giving A=10, B=30, C=50, D=70, F=90?

So Prob( Z < transformed(y))= sum of regressors?

so in R it would be qnorm(new_y, 1,0)~x1+x2+...+xp ?
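
A note on the `qnorm` call in this question: R's signature is `qnorm(p, mean = 0, sd = 1)`, so `qnorm(new_y, 1, 0)` would ask for mean 1 and standard deviation 0. The standard normal quantile is simply `qnorm(new_y)`. The sketch below is illustrative only; `rating`, `new_y` and `grade_y` are made-up names, and treating F as the lowest grade and A as the highest is an assumption.

```r
# Illustrative only: midpoint-code a 1-10 rating and map it through the
# standard normal quantile function (the inverse of the CDF, Phi).
rating <- 1:10
new_y  <- (rating - 0.5) / 10          # 0.05, 0.15, ..., 0.95

z <- qnorm(new_y)                      # defaults: mean = 0, sd = 1
print(round(z, 3))

# pnorm() is the CDF, so it undoes qnorm() and recovers the midpoints.
stopifnot(isTRUE(all.equal(pnorm(z), new_y)))

# Five letter grades on the same 0-1 scale (assumed order: F lowest, A highest).
grades  <- c("F", "D", "C", "B", "A")
grade_y <- (seq_along(grades) - 0.5) / length(grades)
names(grade_y) <- grades
print(grade_y)                         # 0.1, 0.3, 0.5, 0.7, 0.9
```

Regressing `qnorm(new_y)` on the predictors with `lm()` is a transform-then-linear-model approach. It is not the same as a GLM with a probit link, which models the mean of the untransformed y through Φ.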
 
  • #4
I think the code would be something like

Code:
glm(y~x1+x2+ ... + xn, family=quasi(link = "probit", variance = "constant"))
But I am not completely sure about the variance argument. Unfortunately, the R documentation on the 'quasi' family is almost non-existent. Best to try it and see what happens.

FallenApple said:
So Prob( Z < transformed(y))= sum of regressors?
We need to apply ##\Phi## to the sum of regressors.

If you have an ordered factor variable fac and the vector of corresponding integer values is fac.int then the transformation would be

Code:
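# fac is an ordered factor; fac.int holds its integer codes, e.g. as.integer(fac)
n <- length(levels(fac))
# midpoint of the k-th of n equal-width bins on [0, 1]
y.transformed <- (2 * fac.int - 1) / (2 * n)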
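
To see how the midpoint coding and the quasi-family GLM fit together, here is a minimal sketch on simulated data. The data-generating step (`x1`, `latent`, the cut points, the five grades) is entirely hypothetical and exists only to make the example runnable; the `glm` call follows the one suggested in this post.

```r
set.seed(1)

# Simulated data (hypothetical): one predictor and a latent normal score.
n_obs  <- 300
x1     <- rnorm(n_obs)
latent <- 0.8 * x1 + rnorm(n_obs)

# Cut the latent score into five ordered grades, F (lowest) to A (highest).
fac <- cut(latent,
           breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
           labels = c("F", "D", "C", "B", "A"),
           ordered_result = TRUE)

# Midpoint coding from the post: level k of n maps to (2k - 1) / (2n).
fac.int       <- as.integer(fac)
n             <- length(levels(fac))
y.transformed <- (2 * fac.int - 1) / (2 * n)

# Quasi-likelihood fit: mean of y is pnorm(linear predictor), constant variance.
fit <- glm(y.transformed ~ x1,
           family = quasi(link = "probit", variance = "constant"))
print(coef(summary(fit)))

# Fitted means stay inside (0, 1) because of the probit link.
print(range(fitted(fit)))
```

The coefficient on `x1` is on the probit scale of the coded response, so it is not directly comparable to the 0.8 used to simulate the latent score.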

You could also try searching 'Ordinal regression', which is the term for what you are trying to do.
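
One common way to run an ordinal regression in R is `polr()` from the MASS package (a recommended package that is normally installed with R). The sketch below assumes a probit link and uses simulated, hypothetical data; the variable names and cut points are illustrative only.

```r
library(MASS)  # provides polr()

set.seed(2)

# Simulated data (hypothetical): a latent normal score cut into five grades.
n_obs  <- 300
x1     <- rnorm(n_obs)
latent <- 0.8 * x1 + rnorm(n_obs)
grade  <- cut(latent,
              breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
              labels = c("F", "D", "C", "B", "A"),
              ordered_result = TRUE)
dat <- data.frame(grade = grade, x1 = x1)

# Cumulative probit model: P(grade <= k) = pnorm(cutpoint_k - beta * x1).
fit <- polr(grade ~ x1, data = dat, method = "probit", Hess = TRUE)
print(summary(fit))

# Predicted probability of each grade at x1 = 0 and x1 = 1.
probs <- predict(fit, newdata = data.frame(x1 = c(0, 1)), type = "probs")
print(round(probs, 3))
```

Unlike the midpoint coding, this model estimates the cut points between categories from the data, so it does not assume the levels are evenly spaced.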
 
  • #5
andrewkirk said:
I think the code would be something like

Code:
glm(y~x1+x2+ ... + xn, family=quasi(link = "probit", variance = "constant"))
But I am not completely sure about the variance argument. Unfortunately, the R documentation on the 'quasi' family is almost non-existent. Best to try it and see what happens.

We need to apply ##\Phi## to the sum of regressors.

If you have an ordered factor variable fac and the vector of corresponding integer values is fac.int then the transformation would be

Code:
n<-length(levels(fac))
y.transformed<-  (2 * fac.int - 1) / (2 * n)

You could also try searching 'Ordinal regression', which is the term for what you are trying to do.
Oh ok. I'll look into the probit glm. What about dichotomizing it? Would that hurt?

For grades, I could code A-C as 1 for pass and D-F as 0 for fail. So it depends on context if I can do this?
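
The pass/fail coding described here can be sketched as follows, with an ordinary logistic regression on the resulting binary outcome. The simulated data, the cut points, and the names `grade`, `pass` and `x1` are all hypothetical.

```r
set.seed(3)

# Simulated data (hypothetical): a latent normal score cut into five grades.
n_obs  <- 300
x1     <- rnorm(n_obs)
latent <- 0.8 * x1 + rnorm(n_obs)
grade  <- cut(latent,
              breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
              labels = c("F", "D", "C", "B", "A"),
              ordered_result = TRUE)

# Dichotomize: C or better is a pass (1), D or F is a fail (0).
# Comparison operators work on ordered factors.
pass <- as.integer(grade >= "C")
print(table(grade, pass))

# Logistic regression on the binary outcome.
fit <- glm(pass ~ x1, family = binomial(link = "logit"))
print(coef(summary(fit)))
```

The fitted coefficient now describes only the odds of crossing the single D/C boundary; differences among A, B and C (or between D and F) no longer enter the model.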
 
  • #6
FallenApple said:
So it depends on context if I can do this?
Yes, it depends on what you are trying to achieve.
 
  • #7
andrewkirk said:
Yes, it depends on what you are trying to achieve.

Well, mostly it's just to make it simpler. But what is the tradeoff if I dichotomize, say, grades to pass/fail? Would I lose power? Presumably, if x has an effect on grades from level to level, then there would be a corresponding effect on moving from fail to pass and vice versa.
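
Dichotomizing discards the distinctions within the pass and fail groups, so some loss of power is generally expected; how much depends on the cut point and the effect size. One rough way to gauge it is a small simulation comparing how often each model detects the effect. Everything in the sketch below (sample size, effect size `beta`, cut points, the 1.96 threshold, 200 replications) is an assumption chosen for illustration, not a recommendation.

```r
library(MASS)  # provides polr()

set.seed(4)

# One simulated study: returns whether each model flags x1 at roughly the 5% level.
one_run <- function(n_obs = 150, beta = 0.25) {
  x1     <- rnorm(n_obs)
  latent <- beta * x1 + rnorm(n_obs)
  grade  <- cut(latent,
                breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
                labels = c("F", "D", "C", "B", "A"),
                ordered_result = TRUE)
  pass <- as.integer(grade >= "C")

  # Ordinal probit on all five levels vs. binary probit on pass/fail.
  ord <- polr(grade ~ x1, method = "probit", Hess = TRUE)
  bin <- glm(pass ~ x1, family = binomial(link = "probit"))

  c(ordinal = abs(coef(summary(ord))["x1", "t value"]) > 1.96,
    binary  = abs(coef(summary(bin))["x1", "z value"]) > 1.96)
}

# Share of simulated studies in which each model detects the effect.
results <- replicate(200, one_run())
power   <- rowMeans(results)
print(power)
```

Changing `beta`, `n_obs`, or the pass/fail boundary shows how the gap between the two rows varies with the setting.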
 

FAQ: What to do with an ordinal response variable?

What is an ordinal response variable?

An ordinal response variable is a type of variable that represents categories or levels that have a natural order or ranking. This can include variables such as education level, income bracket, or Likert scale responses.

How do I analyze ordinal response variables?

The usual choice is a cumulative link model. Ordinal logistic regression (the proportional odds model) and ordinal probit regression are special cases that differ only in the link function. These methods take the ordering of the categories into account and express results as the odds or probability of a response falling at or below a given category.

Can I treat an ordinal response variable as a continuous variable?

It is generally not recommended to treat an ordinal response variable as continuous, as this can lead to incorrect interpretation of results. Ordinal variables have distinct categories with a natural ordering, whereas continuous variables have a continuous range of values. Treating an ordinal variable as continuous can also lead to issues with assumptions of statistical tests.

How do I handle missing data in ordinal response variables?

Missing data in ordinal response variables can be handled using various methods, such as multiple imputation or maximum likelihood estimation. The appropriate method will depend on the amount and pattern of missing data, as well as the specific analysis being conducted. It is important to carefully consider the best approach for handling missing data to ensure accurate and unbiased results.

Are there any limitations to using ordinal response variables?

While ordinal response variables can be useful in representing ordered categories, they do have some limitations. The intervals between categories are not necessarily equal, which makes precise numerical comparisons difficult. Additionally, the number of categories may be small, which can affect the sensitivity of the analysis.
