What is logistic regression: Definition + 14 Threads
In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression (or logit regression) is estimating the parameters of a logistic model (the coefficients in the linear combination). Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value). The corresponding probability of the value labeled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"), hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.
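As a minimal sketch of the relationship described above, the following Python snippet converts between probability and log-odds and evaluates a linear combination of predictors. The function names (`logit`, `logistic`, `predict_probability`) and the coefficient values in the usage line are illustrative assumptions, not taken from any thread or library.

```python
import math

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logistic(t):
    """Inverse of logit: maps a log-odds value t to a probability in (0, 1)."""
    # Branch on the sign so that exp() is only ever called on a non-positive number.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_probability(intercept, coefs, x):
    """P(y = 1 | x) when the log-odds are intercept + sum(coefs[i] * x[i])."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return logistic(log_odds)

# Hypothetical coefficients and inputs, for illustration only.
p = predict_probability(-1.0, [0.8, 0.3], [1.0, 2.0])
```

Here `p` lies strictly between 0 and 1, and `logit(p)` recovers the linear combination up to rounding error.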
Binary variables are widely used in statistics to model the probability of a certain class or event taking place, such as the probability of a team winning, of a patient being healthy, etc. (see § Applications), and the logistic model has been the most commonly used model for binary regression since about 1970. Binary variables can be generalized to categorical variables when there are more than two possible values (e.g. whether an image is of a cat, dog, lion, etc.), and the binary logistic regression generalized to multinomial logistic regression. If the multiple categories are ordered, one can use the ordinal logistic regression (for example the proportional odds ordinal logistic model). See § Extensions for further extensions. The logistic regression model itself simply models probability of output in terms of input and does not perform statistical classification (it is not a classifier), though it can be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class, below the cutoff as the other; this is a common way to make a binary classifier.
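The cutoff rule mentioned above can be sketched in a few lines of Python. The function name `classify` and the default cutoff of 0.5 are assumptions made for illustration; the text does not prescribe a particular cutoff.

```python
def classify(probabilities, cutoff=0.5):
    """Turn predicted probabilities into 0/1 labels using a fixed cutoff.

    Probabilities strictly greater than the cutoff are labeled 1, all others 0.
    """
    return [1 if p > cutoff else 0 for p in probabilities]

# Hypothetical predicted probabilities from some fitted model.
labels = classify([0.1, 0.5, 0.9])
```

Changing the cutoff changes the classifier but not the underlying probability model, which is why the model itself is not a classifier.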
Analogous linear models for binary variables with a different sigmoid function instead of the logistic function (to convert the linear combination to a probability) can also be used, most notably the probit model; see § Alternatives. The defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate, with each independent variable having its own parameter; for a binary dependent variable this generalizes the odds ratio. More abstractly, the log-odds (logit) is the natural parameter of the Bernoulli distribution, so the logistic function, its inverse, is in this sense the "simplest" way to convert a real number to a probability. In particular, it maximizes entropy (minimizes added information), and in this sense makes the fewest assumptions about the data being modeled; see § Maximum entropy.
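To illustrate the constant multiplicative effect on the odds, the sketch below assumes a single predictor with hypothetical coefficients `b0` and `b1` (both invented for this example). Under the logistic model, raising the predictor by one unit multiplies the odds by `exp(b1)` regardless of the starting value.

```python
import math

def odds(intercept, slope, x):
    """Odds p / (1 - p) under a one-predictor logistic model."""
    return math.exp(intercept + slope * x)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.5, 0.7

# Ratio of the odds at x + 1 to the odds at x, for several starting values of x.
ratios = [odds(b0, b1, x + 1) / odds(b0, b1, x) for x in (0.0, 2.0, 5.0)]

# The odds ratio implied by the slope alone.
odds_ratio = math.exp(b1)
```

Every entry of `ratios` agrees with `odds_ratio` up to floating-point rounding, which is the sense in which each coefficient corresponds to a fixed odds ratio.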
The parameters of a logistic regression are most commonly estimated by maximum-likelihood estimation (MLE). This does not have a closed-form expression, unlike linear least squares; see § Model fitting. Logistic regression by MLE plays a similarly basic role for binary or categorical responses as linear regression by ordinary least squares (OLS) plays for scalar responses: it is a simple, well-analyzed baseline model; see § Comparison with linear regression for discussion. The logistic regression as a general statistical model was originally developed and popularized primarily by Joseph Berkson, beginning in Berkson (1944), where he coined "logit"; see § History.
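Because the maximum-likelihood estimate has no closed form, it is found iteratively. The following is a minimal sketch using plain gradient ascent on the log-likelihood for one predictor; statistical packages typically use Newton-type methods instead. The data, step size, iteration count, and function names are all assumptions made for illustration.

```python
import math

def sigmoid(t):
    """Logistic function, written to avoid overflow in exp()."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of intercept b0 and slope b1."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, step=0.1, iterations=5000):
    """Gradient ascent on the average log-likelihood, starting from zero."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += residual
            g1 += residual * x
        b0 += step * g0 / n
        b1 += step * g1 / n
    return b0, b1

# Made-up data with overlapping classes, so that a finite MLE exists.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit(xs, ys)
```

The classes are deliberately made to overlap: if the data were perfectly separated, the likelihood would keep increasing as the slope grows and the iteration would not settle on finite values.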
If some variables of a logistic regression model are non-significant, should they be considered for a risk index calculation?
Should the logistic model include only relevant variables?
Thanks for the attention.
Hi
I am trying to remember the name of the situation in logistic regression when all data points beyond a fixed one are all successes or all failures. So we have data points ##(a_{i1}, a_{i2}, \ldots, a_{in}, 0/1)##, with data points ##a_{ij}## ordered; last input a Boolean and a fixed value for j...
I have a simple dataset that consists of one predictor Sex and one response variable Survived. Let's say the estimated coefficients are ##\hat{\beta_0}## and ##\hat{\beta_1}## for the intercept and the coefficient of Sex predictor, respectively. Mathematically this means that:
\hat{p}(X) =...
I was trying to find an easy interpretation of the predicted probabilities of a logistic regression model, when one of my coworkers claimed that the logistic regression model is a likelihood.
Now, I know that maximum likelihood estimation is used to estimate the parameters, but I didn't think...
Hi all, I have logistically regressed 3 different numerical variables, v1, v2, v3, separately against the same variable w. All variables have the same type of S-curve (meaning, in this case, that probabilities increase as vi, i = 1, 2, 3, increases). Is there a way of somehow joining the three...
Hi all, just curious if someone knows of any issues of Separation of Points in Ordinal 3-valued Logistic Regression. I think I have an idea of why there are issues with separation in binary Logistic -- the need for the S-curve to go to 0 quickly makes the Bo term go to infinity. Are there...
Hello,
I remember an example of an application of logistic regression to medicine / epidemiology, which said (more or less) that the probability of a person having a myocardial infarction was related to some variables such as age, cholesterol level, etc., and the equation included the various...
So I've been following through an online course in machine learning offered by Stanford University. I have been recently reading up on logistic regression and stochastic gradient ascent. Here is a link to the original notes: http://cs229.stanford.edu/notes/cs229-notes1.pdf (pages 16-19).
Here...
I am doing an independent research project and I have written a logistic regression program in SAS. The percent concordance is 97%, but hardly any variables are significant. Can anyone help me understand why this would happen?
I am doing some research and running a SAS program using logistic regression. The concordance is 99%, but hardly any variables are significant. Can anyone help me understand what this means?
Hi,
I am studying logistic regression and gradient ascent and have seen it used with a cost function and without one. Could anyone tell me why you would use a cost function? It seems just as effective without one.
alpha = .05
h = data * weights
error = labels - sigmoid(h)...
To determine whether schools are increasing the number of students meeting or exceeding standards, I obtained a MEAP database reporting the number of students that scored in the met or exceed standards range during years 2005-2009, for grades 3-5. I then graphed the data for using numbers...
Hi
Is it possible to use both dummy variable regressors as well as categorical response variables in the same model? Consider the following model from cricket (you don't need to know the game to answer this question):
Y = mode of dismissal (0 = not out, 1 = bowled, 2 = caught, 3 =...
Hello all, I've performed a simple logistic regression and could use some help in interpretations/minor calculations.
Five different doses of insecticide were applied under standardized conditions to samples of an insect species. The data were:
Dose: ---2.6---3.8---5.1---7.7---10.2---
Dead...