Are there Issues with Separation of Values in Ordinal Logistic Regression?

In summary, separation can cause problems in ordinal 3-valued logistic regression just as it does in binary logistic regression. In the binary case, the need for the S-curve to drop to 0 (or rise to 1) abruptly at the separation point drives the coefficient estimates, such as the B0 term, to infinity. The problem tends to occur with small or miscoded datasets, but there are ways to overcome it, such as using a hidden logistic regression model or penalized maximum likelihood estimation. The thread also asks whether slightly altering data values to avoid separation would still preserve the properties of the data.
  • #1
WWGD
Hi all, just curious whether anyone knows of any issues with separation of points in ordinal 3-valued logistic regression. I think I have an idea of why there are issues with separation in binary logistic regression: the need for the S-curve to go to 0 quickly makes the B0 term go to infinity. Are there similar issues with 3-valued (or higher-valued) logistic regression?
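To see the binary case concretely, here is a minimal pure-Python sketch on a small made-up data set that is completely separated at x0 = 2.5 (the data, the value of x0, and the helper names `softplus` and `log_likelihood` are illustrative assumptions). Along the ray b0 = -x0 * b1, the log-likelihood keeps rising toward 0 as the slope b1 grows, so no finite maximizer exists. Note that the slope and the intercept (the B0 term) diverge together, with their ratio pinned by the separation point.

```python
import math


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_likelihood(b0: float, b1: float, xs, ys) -> float:
    """Log-likelihood of the binary model P(Y=1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log P(Y=1) = -softplus(-eta) and log P(Y=0) = -softplus(eta)
        total -= softplus(-eta) if y == 1 else softplus(eta)
    return total


# Hypothetical, completely separated data: every x below 2.5 is a failure,
# every x above 2.5 is a success.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
x0 = 2.5

slopes = [1, 2, 4, 8, 16]
# Walk along the ray b0 = -x0 * b1: the log-likelihood increases with b1.
lls = [log_likelihood(-x0 * b1, b1, xs, ys) for b1 in slopes]
for b1, ll in zip(slopes, lls):
    print(f"b1 = {b1:2d}, b0 = {-x0 * b1:6.1f}, log-likelihood = {ll:.6f}")
```

Every term in the sum improves as |b0 + b1 * x| grows on the correct side of x0, which is why an iterative fitter on such data keeps pushing both coefficients outward instead of converging.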
 
  • #3
I'm not entirely clear what you mean by "Separation of Points". Whenever I hear "separation" with regard to logistic regression, it refers to complete separation or quasi-complete separation, which tends to occur with small or miscoded datasets. The problem it causes (the MLE not existing) doesn't disappear in more general cases.

There are ways around that (sometimes), but I feel that we may be talking about two different things.
 
  • #4
MarneMath said:
I'm not entirely clear what you mean by "Separation of Points". [...]
Hi, thanks for replying. Separation happens when there is a value X0 of the independent variable (obviously this applies to numerical variables) such that for all X > X0 all trials (Bernoulli or multinomial) are failures, or all are successes, while the trials below X0 have the other outcome. E.g., if the dependent variable Y is "has cancer" and X is the number of cigarettes smoked per week, then X is separated if, for instance, for X > 10 all trials are successes and for X below 10 all are failures, i.e., everyone who smoked more than 10 cigarettes got cancer and no one who smoked fewer did.
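For a 3-valued (ordinal) outcome, one way to make this definition concrete is to look at each cumulative split "Y <= k" versus "Y > k" and ask whether some threshold on X puts the two groups on opposite sides, allowing ties at the threshold. The sketch below assumes a single numeric predictor; the function name `separated_splits` and the data (think of x as cigarettes per week and y as a 0/1/2 severity score) are hypothetical. Whether a separated split actually makes a given model's estimates diverge depends on the model (fitting that split as its own binary logistic regression has no finite MLE, whereas a common-slope proportional-odds model pools the splits), so treat the output as a diagnostic rather than a test for MLE existence.

```python
from typing import List, Sequence


def separated_splits(x: Sequence[float], y: Sequence[int]) -> List[int]:
    """Return each cut level k for which the cumulative split 'y <= k' versus
    'y > k' is separated, completely or quasi-completely, by a threshold on x."""
    levels = sorted(set(y))
    separated = []
    for k in levels[:-1]:  # the top level has no split above it
        low = [xi for xi, yi in zip(x, y) if yi <= k]
        high = [xi for xi, yi in zip(x, y) if yi > k]
        # The two groups' x-ranges meet at most at one point, in either order.
        if max(low) <= min(high) or max(high) <= min(low):
            separated.append(k)
    return separated


# Hypothetical data: the 0-versus-{1, 2} split overlaps in x,
# but the {0, 1}-versus-2 split is separated between x = 9 and x = 11.
x = [1, 3, 4, 6, 9, 11, 14, 20]
y = [0, 1, 0, 1, 1, 2, 2, 2]
print(separated_splits(x, y))
```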
 
  • #5
Ok, then I think we are talking about the same thing. Then yes, separation is a problem even for higher-valued responses. Most statistical packages are good at notifying you when this happens. One way around it is to use penalized maximum likelihood estimation. I'm personally a fan of using a hidden logistic regression model to overcome this when necessary.
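As a rough illustration of why penalizing the likelihood helps, here is a minimal sketch of a ridge (L2) penalized binary logistic regression with one predictor, fitted by Newton's method with step halving. This is an assumed stand-in chosen for brevity: it is neither Firth's bias-reduced penalty (the penalty most often recommended for separation) nor the hidden logistic model mentioned above, and the penalty weight `lam`, the function names, and the data are all hypothetical. On the same separated data as before, the penalized objective has a finite maximizer.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic function, computed stably for either sign of t."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def penalized_ll(b0, b1, xs, ys, lam):
    """Binary log-likelihood minus the (assumed) ridge penalty (lam / 2) * b1**2."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        sp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))  # log(1 + exp(eta))
        total += y * eta - sp
    return total - 0.5 * lam * b1 * b1


def fit_ridge_logistic(xs, ys, lam=1.0, tol=1e-10, max_iter=100):
    """Newton's method with step halving for P(Y=1 | x) = sigmoid(b0 + b1 * x);
    only the slope b1 is penalized (a hypothetical choice for this sketch)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient of the penalized log-likelihood
        h00 = h01 = 0.0        # entries of the negative Hessian
        h11 = lam
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        g1 -= lam * b1
        # Solve [[h00, h01], [h01, h11]] @ step = [g0, g1]; det > 0 when lam > 0.
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the objective does not decrease.
        current = penalized_ll(b0, b1, xs, ys, lam)
        t = 1.0
        while t > 1e-8 and penalized_ll(b0 + t * s0, b1 + t * s1, xs, ys, lam) < current:
            t /= 2
        b0, b1 = b0 + t * s0, b1 + t * s1
        if max(abs(t * s0), abs(t * s1)) < tol:
            break
    return b0, b1


# Hypothetical, completely separated data.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_ridge_logistic(xs, ys, lam=1.0)
print(b0, b1, -b0 / b1)  # finite estimates; the boundary -b0/b1 sits at 2.5
```

Because these data are symmetric about 2.5 and only the slope is penalized, the fitted boundary -b0/b1 lands at 2.5. With `lam` close to 0 the fitted slope grows large again, since the unpenalized estimate does not exist for these data.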
 
  • #6
Just a follow-up on this: would it be reasonable, in the sense of not affecting the "intrinsic" properties of a data set, to slightly alter (increase or decrease) some of the data values to overcome separation, i.e., so that the outcomes beyond a certain value are no longer all the same? Say my data set is smallish, its values lie in the range [0, 5], the cutoff point is 3, and I have several points with value 3. Then, to avoid this problem, I could replace 3 by 3.02 in some cases and by, say, 2.98 in others. I just want to be able to model the probability of success by doing this; I would think most of the properties of the data would be preserved?
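A small sketch of what the proposed perturbation does to a separation check may help frame the question. It uses a binary outcome for simplicity, and the data and the helper `is_separated` are hypothetical. Starting from data that are quasi-completely separated at the tied value 3, whether separation disappears depends entirely on which tied points are nudged up and which are nudged down, so any overlap that results is introduced by the perturbation rather than observed in the data.

```python
from typing import Sequence


def is_separated(x: Sequence[float], y: Sequence[int]) -> bool:
    """True if a threshold on x separates failures (0) from successes (1),
    completely or quasi-completely (ties allowed at the threshold)."""
    fails = [xi for xi, yi in zip(x, y) if yi == 0]
    succs = [xi for xi, yi in zip(x, y) if yi == 1]
    return max(fails) <= min(succs) or max(succs) <= min(fails)


# Hypothetical data on [0, 5] with cutoff 3; the tied points at 3 have both outcomes.
x = [0.5, 1.0, 2.0, 3.0, 3.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Perturbation A: the failure at 3 moves up, the successes at 3 move down.
x_a = [0.5, 1.0, 2.0, 3.02, 2.98, 2.98, 4.0, 5.0]
# Perturbation B: the opposite choice.
x_b = [0.5, 1.0, 2.0, 2.98, 3.02, 3.02, 4.0, 5.0]

print(is_separated(x, y), is_separated(x_a, y), is_separated(x_b, y))
```

Perturbation A creates overlap, while perturbation B turns quasi-complete separation into complete separation.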
 

1. What is ordinal logistic regression?

Ordinal logistic regression is a statistical method for modeling the relationship between a set of independent variables and an ordinal dependent variable. It is commonly used when the dependent variable has three or more ordered categories.

2. What is meant by the separation of values in ordinal logistic regression?

Separation in ordinal logistic regression occurs when some combination of the independent variables perfectly (complete separation) or almost perfectly (quasi-complete separation) orders the observations by category of the dependent variable. In that case the likelihood keeps increasing as some coefficients grow without bound, so their maximum likelihood estimates do not exist.

3. Why is separation of values an issue in ordinal logistic regression?

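Because the maximum likelihood estimates of the affected coefficients are infinite, the finite values that fitting software reports (typically very large coefficients with even larger standard errors) depend on when the iterations stopped and are not meaningful. Wald tests and confidence intervals based on them become unreliable, which makes it difficult to judge the significance of the independent variables and can lead to incorrect conclusions.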

4. How can separation of values be detected in ordinal logistic regression?

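Separation usually shows up during fitting: the iterations fail to converge or stop with a warning, some coefficients and their standard errors become extremely large, and fitted probabilities approach 0 or 1. It can also be checked directly, for example by cross-tabulating the response against a categorical predictor, or by plotting the response against a numeric predictor. The variance inflation factor (VIF) and the condition number diagnose multicollinearity, not separation.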
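The cross-tabulation check can be sketched for a categorical predictor as follows. The function name and the data are hypothetical. Under the usual dummy coding in a cumulative-logit model, a level whose responses all fall in the lowest category, or all in the highest, is a common warning sign, because the likelihood can keep increasing as that level's effect is pushed toward infinity.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def extreme_only_levels(groups: Sequence[Hashable], y: Sequence[int]) -> List[Hashable]:
    """Levels of a categorical predictor whose responses all sit in the lowest,
    or all in the highest, observed category of the ordinal outcome y."""
    lowest, highest = min(y), max(y)
    by_level: Dict[Hashable, List[int]] = defaultdict(list)
    for g, yi in zip(groups, y):
        by_level[g].append(yi)
    return [
        level
        for level, responses in by_level.items()
        if all(r == lowest for r in responses) or all(r == highest for r in responses)
    ]


# Hypothetical data: every response at level "c" is in the top category 2.
groups = ["a", "a", "a", "b", "b", "b", "c", "c"]
y = [0, 1, 2, 0, 1, 1, 2, 2]
print(extreme_only_levels(groups, y))
```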

5. How can separation of values be addressed in ordinal logistic regression?

There are several approaches to addressing separation in ordinal logistic regression, including penalized likelihood methods (such as Firth's correction), Bayesian methods with weakly informative priors, and, in some cases, collapsing sparse response categories. Removing highly correlated variables addresses multicollinearity rather than separation, and simply dropping the separating predictor discards what is often its strongest effect. It is important to carefully consider the specific characteristics of the data and the research question when choosing a method.
