Are there Issues with Separation of Values in Ordinal Logistic Regression

Discussion Overview

The discussion revolves around potential issues related to the separation of values in ordinal logistic regression, particularly in the context of 3-valued or higher-valued logistic regression. Participants explore the implications of separation, its causes, and possible solutions, focusing on both theoretical and practical aspects of the modeling process.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Exploratory

Main Points Raised

  • One participant questions whether issues of separation in binary logistic regression also apply to 3-valued logistic regression, suggesting a connection to the behavior of the S-curve.
  • Another participant clarifies that "Separation of Points" typically refers to complete or quasi separation, which can occur in small or miscoded datasets, and notes that this issue does not disappear in more general cases.
  • A participant explains that separation occurs when there is a threshold value of the independent variable beyond which all outcomes are either successes or failures, providing an example related to smoking and cancer.
  • One participant agrees that separation is indeed a problem for higher orders and mentions that statistical packages often alert users to this issue, suggesting penalized maximum likelihood estimation or hidden logistic models as potential solutions.
  • A follow-up question is raised about the appropriateness of slightly altering data values to avoid separation, with the participant expressing concern about preserving the intrinsic properties of the dataset while modeling probabilities.

Areas of Agreement / Disagreement

Participants generally agree that separation is a relevant issue in ordinal logistic regression, including higher orders. However, there is no consensus on the best approaches to address this problem, and differing opinions on the implications of altering data values are present.

Contextual Notes

Participants express uncertainty regarding the definitions and implications of separation, and there are unresolved questions about the effects of modifying data values to mitigate separation issues.

WWGD
Science Advisor
Homework Helper
Hi all, just curious whether someone knows of any issues with separation of points in ordinal 3-valued logistic regression. I think I have an idea of why there are issues with separation in binary logistic regression -- the need for the S-curve to go to 0 quickly makes the β0 term go to infinity. Are there similar issues with 3-valued (or higher-valued) logistic regression?
 
I'm not entirely clear what you mean by "Separation of Points". Whenever I hear "separation" with regard to logistic regression, it refers to complete or quasi-separation, which tends to occur with small or miscoded datasets. The underlying problem (the MLE not existing) doesn't disappear in more general cases.

There are ways around that (sometimes), but I feel that we may be talking about two different things.
 
MarneMath said:
I'm not entirely clear what you mean by "Separation of Points". [...] I feel that we may be talking about two different things.
Hi, thanks for replying. Separation happens when there is a value X0 of the independent variable (obviously this applies to cases with numerical variables) such that for all X > X0, all trials (Bernoulli or multinomial) are failures, or all are successes. E.g., if the dependent Y is "has cancer" and X is the number of cigarettes smoked per week, then X is separated if, say, for X > 10 all trials are successes, i.e., everyone who smoked more than 10 cigarettes per week got cancer.
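The non-existence of the MLE under this kind of separation is easy to see numerically. Below is a minimal hand-rolled sketch (the toy data and variable names are mine, not from the thread): on completely separated data, the Bernoulli log-likelihood keeps improving as the slope grows, so gradient ascent never settles on a finite estimate.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

# Completely separated data: every trial with x > 5 is a success.
x = np.arange(1, 11, dtype=float)
y = (x > 5).astype(float)

def fit(n_steps, lr=0.005):
    # Plain gradient ascent on the Bernoulli log-likelihood.
    b0, b1 = 0.0, 0.0
    for _ in range(n_steps):
        p = sigmoid(b0 + b1 * x)
        b0 += lr * np.sum(y - p)
        b1 += lr * np.sum((y - p) * x)
    return b0, b1

# The slope estimate never stabilizes: more iterations, bigger slope.
b0_short, b1_short = fit(500)
b0_long, b1_long = fit(5000)
```

The same mechanism plausibly carries over to ordinal models: when a threshold on the latent scale is separated, the corresponding coefficient diverges in the same way.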
 
Ok, then I think we are talking about the same thing. Then yes, separation is a problem even for higher orders. Most statistical packages are good at notifying you when this happens. One way around it is using a penalized maximum likelihood estimator. I'm personally a fan of using a hidden logistic model to overcome this when necessary.
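To illustrate the penalized route: Firth-type penalization lives in specialized packages (e.g., R's logistf), but even a plain L2 (ridge) penalty, used here via scikit-learn purely as a stand-in for illustration, bounds the log-likelihood away from its supremum at infinity and yields finite coefficients on separated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Completely separated toy data: every x > 5 is a success,
# so the unpenalized MLE does not exist (slope -> infinity).
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = (x.ravel() > 5).astype(int)

# The L2 penalty (strength controlled by C) keeps the optimum finite,
# while the fitted decision boundary still separates the classes.
clf = LogisticRegression(penalty="l2", C=1.0).fit(x, y)
```

The smaller you make C, the stronger the shrinkage toward zero; the choice C=1.0 here is scikit-learn's default, not a recommendation from the thread.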
 
Just a follow-up on this: would it be reasonable, in the sense of not affecting the "intrinsic" properties of a smallish data set with separation of values, say in the range [0, 5], to slightly alter (increase/decrease) some of the data values so as to overcome this issue, i.e., so that the outcomes beyond a certain value are not all of one kind? Say my cutoff point for this data set within the [0, 5] range is 3, and I have several points with value 3. Could I then change the data set to replace 3 by 3.02 in some cases, and by, say, 2.98 in others, in order to avoid this problem? I just want to be able to model the probability of success; I would think most of the properties of the data would be preserved by doing this?
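For what it's worth, the perturbation described could be sketched as follows (the data values and seed here are hypothetical, chosen only to illustrate the idea). One caveat: jittering breaks exact ties at the cutoff, but it only removes the separation if some perturbed points end up on the other side of the boundary with the opposite outcome; otherwise the MLE still fails to exist, which is why penalized estimation is often preferred.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data in [0, 5] with several observations tied at the
# cutoff value 3 that produces the separation.
x = np.array([0.7, 1.4, 2.2, 3.0, 3.0, 3.0, 3.0, 4.1, 4.8])

# Perturb only the tied values by a small symmetric amount (+/- 0.02),
# as suggested in the post, leaving all other observations untouched.
noise = rng.uniform(-0.02, 0.02, size=x.shape)
x_jittered = np.where(x == 3.0, x + noise, x)
```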
 
