How Do Generalized Linear Models Extend Beyond Standard Linear Regression?

  • #1
WWGD
Hi,
Just wanted to see if I understood the meaning of Generalized Linear Models:
In the case of standard ("non-generalized") linear models, a dependent variable y is a linear function of an independent variable x. In a Generalized Linear Model (GLM), is the dependent variable y linear in some function L of the variable x, i.e., y is linear in L(x)? I am thinking of Logistic Regression as an example of a GLM, where y depends linearly on the log-odds ##\log(p/(1-p))##. Do we have multilinear cases? Is this accurate, and if so, essentially complete?
Thanks.
EDIT: Also, is there some result telling us, given data pairs ##\{(x_i,y_i)\}## with ##y_i \sim x_i##, i.e., y depends on x, when we can find a function L so that y is linear in L(x)?
 
  • #2
WWGD said:
In a Generalized Linear Model (GLM), is the dependent variable y linear in some function L of the variable x, i.e., y is linear in L(x)?
Not quite. That would mean applying the nonlinear function L to every independent variable where there is more than one of them, which is not what happens. What actually happens is that a nonlinear function is applied to an affine combination (affine meaning linear but with a constant term, the intercept, included*) of the independent variables. That function is the inverse of what is called the link function, usually denoted ##g##.

The model is
$$
g\left(E[y_j]\right) = a_0 + \sum_{k=1}^m a_k x_{kj}
$$
and the task is to estimate ##a_0,...,a_m## and the free parameter(s) of the distribution of the independent error terms ##\epsilon_j##.

The other way a GLM differs from standard linear regression is that the error terms ##\epsilon_j## need not be normally distributed. The assumed distribution family of ##y_j## needs to be specified, e.g. normal, lognormal, Bernoulli, gamma. To ascertain which distribution in a given family it is, i.e. what the parameters of the distribution are, one parameter is fixed by the requirement that ##g\left(E[y_j]\right)=a_0 + \sum_{k=1}^m a_k x_{kj}##, and the other parameters are estimated in the regression process.

In logistic regression the link function ##g## is the logit function (the inverse of the logistic function) and ##y_j## is assumed to have a Bernoulli distribution. Since Bernoulli is a single-parameter distribution, there are no additional parameters to be estimated for that distribution.

* It can be treated as a pure linear combination by the addition of a dummy independent variable ##x_{0j}## whose value is always 1.
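
As a concrete (made-up) illustration of that structure, here is a short Python sketch with hypothetical coefficients and a logit link; the inverse link (the logistic function) maps the affine combination back to the mean:

import numpy as np

# Hypothetical coefficients a_0, a_1, a_2 and one observation's covariates.
a = np.array([0.5, -1.2, 0.8])   # a_0 (intercept), a_1, a_2
x = np.array([1.0, 0.3, 2.0])    # dummy x_0 = 1, then x_1, x_2

eta = a @ x                      # affine combination: a_0 + a_1*x_1 + a_2*x_2

# Logit link: g(mu) = log(mu/(1-mu)); its inverse is the logistic function.
mu = 1.0 / (1.0 + np.exp(-eta))  # E[y] = g^{-1}(eta), a value in (0, 1)

print(eta, mu)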
 
  • #3
Thanks, but it seems like ##g(Y_i)## is itself affine in the ##x_i##, isn't it?
 
  • #4
WWGD said:
it seems like ##g(Y_i)## is itself affine in the ##x_i##, isn't it?
No, because the error term ##\epsilon_j## prevents that. But it is the case that ##g\left(E[y_j]\right)## is an affine function of the ##x_{kj}##, because

$$g\left(E[y_j]\right) = a_0 + \sum_{k=1}^m a_k x_{kj} $$

EDIT: I corrected an error in this: I originally had ##E[y_j]## on the LHS of that equation and it should have been ##g\left(E[y_j]\right)##.
 
  • #5
OK, thanks again, just so I can organize it internally: then, up to the error term, we do have affineness, i.e. for ##g(Y_i) - \epsilon_i##? Also, don't we also have ##E(\epsilon_i) = 0##?
 
  • #6
Is it the affineness of ##E(y_i)## that makes up the linear part of the generalized linear model?
 
  • #7
On second thoughts, maybe the use of epsilons ##\epsilon_j## was not helpful. They're not wrong, but I'm coming to the view that presenting GLMs that way makes it less intuitive. So let me try again and see if I do a better job this time.

For a GLM we specify two things: a distribution family for the y observations and a link function ##g##.

The model is that each observation ##Y_j## is a random draw from a distribution from the specified family, whose mean ##\mu_j=E[Y_j]## satisfies the equation ##g(\mu_j)=a_0 + \sum_{k=1}^m a_k x_{kj}##. So we have:

$$\mu_j = E[Y_j] = g^{-1}\left(a_0 + \sum_{k=1}^m a_k x_{kj}\right)$$

The affine function of the ##x_j##s inside the big parenthesis on the RHS is called the Linear Predictor.

The estimation process, which typically uses Maximum Likelihood, involves estimating ##a_0## to ##a_m##, plus any free parameters of the distribution of the ##Y_j##. All parameters of the distribution are required to be constant across possible values of the ##x_j##s, except for the mean. The mean effectively specifies one parameter, so if the distribution family has only one parameter, such as Bernoulli or Poisson, there are no more parameters to be estimated. For the common case of two-parameter families like the normal or lognormal, there will be an additional parameter to estimate - the standard deviation for the normal distribution - and that will be estimated as part of the GLM process as well. For a three-parameter family (e.g. the generalised Pareto, although I haven't seen it used in a GLM) there will be two parameters to be estimated in addition to the ##a_j##s.
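
To make the estimation step concrete, here is a hedged sketch using Python's statsmodels package on simulated Poisson data (a one-parameter family, so only the ##a_k## need estimating); the coefficients, link, and sample size are all made up for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data: one covariate, Poisson responses. Poisson has a single
# parameter, so nothing beyond a_0 and a_1 needs to be estimated.
x = rng.uniform(0, 2, size=500)
X = sm.add_constant(x)              # prepend the dummy column of 1s
mu = np.exp(0.3 + 0.9 * x)          # true mean via the inverse of the log link
y = rng.poisson(mu)

# Maximum-likelihood fit; log is the canonical link for the Poisson family.
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)                # estimates of a_0 and a_1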

The above is all for the case where the response variable ##y## is scalar. It gets a bit more complex if ##y## is a vector.

With that re-framing of the background, the answers to your latest two questions are

WWGD said:
OK, thanks again, just so I can organize it internally: then, up to the error term, we do have affineness, i.e. for ##g(Y_i) - \epsilon_i##? Also, don't we also have ##E(\epsilon_i) = 0##?
It is ##g\left(E[Y_j]\right)##, not ##g\left(Y_j\right)## that is an affine function of the ##x_j##s. I wrongly stated that above as ##E[g(Y_j)]##.
WWGD said:
Is it the affineness of ##E(y_i)## that makes up the linear part of the generalized linear model?
The linear aspect of the model is the linear predictor.

A well-known example of GLM is logistic regression, in which the distribution family is Bernoulli and the link function ##g## is logit, the inverse of the logistic function. So the logistic function is applied to the linear predictor, which is the estimated log-odds, to obtain the probability ##\mu_j## given ##x_1,...,x_m##.

Another example is ordinary linear regression, in which the link function is the identity function ##g(x)=x## and the distribution family is normal.
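
A quick numerical check of that last point, again with statsmodels on made-up data: a GLM with the Gaussian family and its default identity link reproduces the ordinary least squares coefficients.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = sm.add_constant(x)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=200)

glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default
ols_fit = sm.OLS(y, X).fit()

print(glm_fit.params)  # agrees with ols_fit.params up to numerical tolerance
print(ols_fit.params)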
 
  • #8
The main difference is that GLM allows for error distributions that are not normal or that have autocorrelation. In finance, the Generalized Method of Moments (GMM) is most often used, which is an equivalent technique. It corrects the standard errors of the predictors by inserting a covariance matrix ##\hat{W}## into the minimization of the norm (OLS can be thought of this way, but with an identity matrix for ##W##):

$$\hat{\theta} = \underset{\theta \in \Theta}{\operatorname{arg\,min}} \left( \frac{1}{T} \sum_{t=1}^{T} g(Y_t, \theta) \right)^{\mathsf{T}} \hat{W} \left( \frac{1}{T} \sum_{t=1}^{T} g(Y_t, \theta) \right)$$


https://en.wikipedia.org/wiki/Generalized_method_of_moments
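
For concreteness, here is a minimal Python sketch (my own toy construction, not taken from the linked article) of that criterion with linear-regression moment conditions and ##W## set to the identity, so the minimizer matches OLS:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=300)

W = np.eye(X.shape[1])  # weighting matrix; identity recovers OLS

def gmm_objective(theta):
    # Sample moment conditions: gbar = (1/T) X'(y - X theta)
    gbar = X.T @ (y - X @ theta) / len(y)
    return gbar @ W @ gbar  # the quadratic form gbar' W gbar

theta_hat = minimize(gmm_objective, x0=np.zeros(2)).x
print(theta_hat)  # numerically close to the OLS solution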
 
  • #9
BWV said:
The main difference is that GLM allows for error distributions that are not normal or that have autocorrelation. In finance, the Generalized Method of Moments (GMM) is most often used, which is an equivalent technique. It corrects the standard errors of the predictors by inserting a covariance matrix ##\hat{W}## into the minimization of the norm (OLS can be thought of this way, but with an identity matrix for ##W##):

$$\hat{\theta} = \underset{\theta \in \Theta}{\operatorname{arg\,min}} \left( \frac{1}{T} \sum_{t=1}^{T} g(Y_t, \theta) \right)^{\mathsf{T}} \hat{W} \left( \frac{1}{T} \sum_{t=1}^{T} g(Y_t, \theta) \right)$$


https://en.wikipedia.org/wiki/Generalized_method_of_moments
Thanks. How does one determine the distribution of the errors in practice?
 
  • #10
In practice you don't really care, other than knowing that they diverge from the OLS assumptions due to heteroskedasticity and/or autocorrelation. The Newey–West estimator, which is available in most statistical packages such as R, is generally used:

https://en.wikipedia.org/wiki/Newey–West_estimator
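
For example, Python's statsmodels can apply the Newey–West (HAC) covariance at fit time; a sketch with simulated data and an arbitrarily chosen lag length:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=250)
X = sm.add_constant(x)
y = 0.5 + 1.2 * x + rng.normal(size=250)  # iid noise here, just a stand-in;
                                          # real use cases have autocorrelated errors

# cov_type='HAC' requests the Newey-West covariance estimator;
# maxlags controls how much autocorrelation it allows for.
fit = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': 4})
print(fit.bse)  # HAC-corrected standard errors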
 

What is a generalized linear model?

A generalized linear model (GLM) is a statistical model that extends linear regression to accommodate non-normal error distributions and non-linear relationships between the mean of the dependent variable and the independent variables. It is a flexible approach that can handle a wide range of data types and is commonly used in fields such as biology, economics, and the social sciences.

What are the assumptions of a generalized linear model?

The main assumptions of a generalized linear model are that the observations are independent, that the dependent variable follows a specified distribution family (e.g. Gaussian, binomial, Poisson), and that the mean of the dependent variable, after transformation by the link function, is a linear (affine) function of the independent variables. Unlike in ordinary linear regression, the dependent variable need not be continuous; it can be binary or a count, for example.

How is a generalized linear model different from a linear regression model?

A generalized linear model is an extension of the linear regression model in which the dependent variable may follow a non-normal distribution and the relationship between the mean of the dependent variable and the independent variables can be non-linear. In contrast, a linear regression model assumes that the dependent variable follows a normal distribution and that the relationship between the variables is linear.

What is the purpose of using a link function in a generalized linear model?

A link function is used in a generalized linear model to transform the mean of the dependent variable onto a scale on which it can be modeled as a linear function of the predictors. This allows non-linear relationships between the dependent and independent variables to be captured, making the GLM more flexible and applicable to a wider range of data types.
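
As a brief numerical illustration (made-up probabilities, logit link): means confined to (0, 1) are mapped onto the whole real line, where an affine model is unconstrained.

import numpy as np

mu = np.array([0.05, 0.5, 0.95])  # means constrained to (0, 1)
eta = np.log(mu / (1 - mu))       # logit link: unbounded linear-predictor scale
print(eta)                        # approximately [-2.944, 0.0, 2.944]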

What is overdispersion and how is it addressed in a generalized linear model?

Overdispersion occurs when the observed variability in the dependent variable is greater than what is expected based on the model. In a generalized linear model, overdispersion can be addressed by using a different error distribution or by including additional explanatory variables in the model to capture the extra variability. Another approach is to use a generalized linear mixed model, which allows for the inclusion of random effects to account for unobserved sources of variability.
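
As a hedged sketch of the first remedy (changing the error distribution), here is how one might compare a Poisson fit against a negative binomial fit in Python's statsmodels; the data are simulated to be overdispersed, and the negative binomial family's dispersion parameter is left at the library default:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=400)
X = sm.add_constant(x)
mu = np.exp(0.5 + 1.0 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))  # counts with variance > mean

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()

# A Pearson chi-square well above the residual degrees of freedom
# signals overdispersion under the Poisson assumption.
print(poisson_fit.pearson_chi2 / poisson_fit.df_resid)
print(nb_fit.params)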
