Generalized Linear Model Basics

  • #1

WWGD

Hi,
Just wanted to see if I understood the meaning of Generalized Linear Models:
In the case of standard ("non-generalized") linear models, a dependent variable y is a linear function of an independent variable x. In a Generalized Linear Model (GLM), a dependent variable y is linear in some function L of the variable x, i.e., y is linear in L(x)? I am thinking of Logistic Regression as an example of a GLM, where y depends linearly on the log-odds: Log(p/(1-p)). Do we have multilinear cases? Is this accurate, and if so, essentially complete?
Thanks.
EDIT: Also, is there some result telling us, given data pairs ##\{(x_i,y_i)\}## with ##y_i \sim x_i##, i.e., y depends on x, when we can find a function L so that y is linear in L(x)?
 
  • #2
WWGD said:
In a Generalized Linear Model (GLM), a dependent variable y is linear in some function L of the variable x, i.e., y is linear in L(x)? I
Not quite. That would mean applying the nonlinear function L to every independent variable where there is more than one of them, which is not what happens. What actually happens is that a nonlinear function is applied to an affine combination of the independent variables (affine means linear but with a constant term, the intercept, included*). That function is the inverse of what's called the link function, usually denoted by ##g##.

The model is
$$
g\left(E[y_j]\right) = a_0 + \sum_{k=1}^m a_k x_{kj}
$$
and the task is to estimate ##a_0,...,a_m## and the free parameter(s) of the distribution of the independent error terms ##\epsilon_j = y_j - E[y_j]##.

The other way that GLM differs from standard linear regression is that the error terms ##\epsilon_j## need not be normally distributed. The assumed distribution family of ##y_j## needs to be specified - eg normal, lognormal, Bernoulli, gamma. To ascertain which distribution in a given family it is - ie what the parameters of the distribution are - one parameter is fixed by the requirement that ##g\left(E[y_j]\right)=a_0 + \sum_{k=1}^m a_k x_{kj}##, and the other parameters are estimated in the regression process.

In logistic regression the link function ##g## is the logit function (the inverse of the logistic function) and ##y_j## is assumed to have a Bernoulli distribution. Since Bernoulli is a single-parameter distribution there are no additional parameters to be estimated for that distribution.

* It can be treated as a pure linear combination by the addition of a dummy independent variable ##x_{0j}## whose value is always 1.
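To make the structure concrete, here is a minimal simulation sketch of the model above, assuming a Bernoulli family with a logit link (the coefficients and sample size are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up coefficients: a_0 (intercept) plus a_1, a_2 for two covariates.
a = np.array([-1.0, 2.0, 0.5])
X = rng.normal(size=(1000, 2))

# Affine combination of the independent variables (the intercept is a_0).
eta = a[0] + X @ a[1:]

# Apply the inverse of the link function: for logit, the logistic function.
mu = 1.0 / (1.0 + np.exp(-eta))

# Each y_j is drawn from the specified family (Bernoulli) with mean mu_j;
# note there is no additive normal error term anywhere.
y = rng.binomial(1, mu)
```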
 
  • #3
Thanks, but it seems like ##g(y_i)## is itself affine in the ##x_i##, isn't it?
 
  • #4
WWGD said:
it seems like ##g(y_i)## is itself affine in the ##x_i##, isn't it?
No, because the error term ##\epsilon_j## prevents that. But it is the case that the expected value ##E[y_j]## is an affine function of the ##x_{kj}## because

$$g\left(E[y_j]\right) = a_0 + \sum_{k=1}^m a_k x_{kj} $$

EDIT: I corrected an error in this: I originally had ##E[y_j]## on the LHS of that equation and it should have been ##g\left(E[y_j]\right)##.
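A quick numeric check of that distinction, as a sketch with an assumed Poisson family and log link (coefficients made up): ##g(E[Y_j])## recovers the affine expression, while ##E[g(Y_j)]## does not, and ##g(Y_j)## itself need not even be defined:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: log link, one covariate, made-up coefficients a_0=0.2, a_1=0.7.
x = 1.5
eta = 0.2 + 0.7 * x               # the affine function of x
mu = np.exp(eta)                  # E[Y] = g^{-1}(eta) with g = log

y = rng.poisson(mu, size=1_000_000)

print(np.log(y.mean()))           # g(E[Y]) ~= eta = 1.25
print(np.log(y[y > 0]).mean())    # E[g(Y)] is a different number,
                                  # and g(0) is not even defined
```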
 
  • #5
Ok, thanks again, just so I can organize it internally: then, up to the error term, we do have affineness, i.e. for ##g(y_i)-\epsilon_i##? Also, don't we also have ##E(\epsilon_i)=0##?
 
  • #6
Is it the affineness of ##E(y_i)## that makes up the linear part of the generalized linear model?
 
  • #7
On second thought, maybe the use of the epsilons ##\epsilon_j## was not helpful. They're not wrong, but I'm coming to the view that presenting GLMs that way makes them less intuitive. So let me try again and see if I do a better job this time.

For a GLM we specify two things, a distribution family for the y observations, and a link function ##g##.

The model is that each observation ##Y_j## is a random draw from a distribution from the specified family, whose mean ##\mu_j=E[Y_j]## satisfies the equation ##g(\mu_j)=a_0 + \sum_{k=1}^m a_k x_{kj}##. So we have:

$$\mu_j = E[Y_j] = g^{-1}\left(a_0 + \sum_{k=1}^m a_k x_{kj}\right)$$

The affine function of the ##x_j##s inside the big parenthesis on the RHS is called the Linear Predictor.

The estimation process, which typically uses Maximum Likelihood, involves estimating ##a_0## to ##a_m##, plus any free parameters of the distribution of the ##Y_j##. All parameters of the distribution are required to be constant across possible values of the ##x_j##s, except for the mean. The mean effectively specifies one parameter, so if the distribution family has only one parameter, such as Bernoulli or Poisson, there are no more parameters to be estimated. For the common case of two-parameter families like the normal or lognormal, there will be an additional parameter to estimate - the standard deviation for the normal distribution - and that will be estimated as part of the GLM process as well. For a three-parameter family (eg Generalised Pareto, although I haven't seen one used in a GLM) there will be two parameters to be estimated in addition to the ##a_j##s.
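As a concrete sketch of that estimation step, here is a direct Maximum Likelihood fit of a one-parameter case (Poisson family, log link) with made-up data, so the only quantities to estimate are ##a_0## and ##a_1##:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data under assumed true values a_0 = 0.3, a_1 = 0.8.
x = rng.normal(size=500)
y = rng.poisson(np.exp(0.3 + 0.8 * x))

def neg_log_likelihood(a):
    # Poisson log-likelihood, dropping the constant log(y!) term:
    # sum_j [ y_j * eta_j - exp(eta_j) ], eta_j being the linear predictor.
    eta = a[0] + a[1] * x
    return -(y * eta - np.exp(eta)).sum()

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)   # estimates of (a_0, a_1), close to (0.3, 0.8)
```

For a two-parameter family such as the normal, the dispersion parameter would simply be appended to the vector being optimized.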

The above is all for the case where the response variable ##y## is scalar. It gets a bit more complex if ##y## is a vector.

With that re-framing of the background, the answers to your latest two questions are:

WWGD said:
Ok, thanks again, just so I can organize it internally: then, up to the error term, we do have affineness, i.e. for ##g(y_i)-\epsilon_i##? Also, don't we also have ##E(\epsilon_i)=0##?
It is ##g\left(E[Y_j]\right)##, not ##g\left(Y_j\right)## that is an affine function of the ##x_j##s. I wrongly stated that above as ##E[g(Y_j)]##.
WWGD said:
Is it the affineness of ##E(y_i)## that makes up the linear part of the generalized linear model?
The linear aspect of the model is the linear predictor.

A well-known example of GLM is logistic regression, in which the distribution family is Bernoulli and the link function ##g## is logit, the inverse of the logistic function. So the logistic function is applied to the linear predictor, which is the estimated log-odds, to obtain the probability ##\mu_j## given ##x_1,...,x_m##.

Another example is ordinary linear regression, in which the link function is the identity function ##g(x)=x## and the distribution family is normal.
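Both examples can be checked with a standard library; here is a minimal sketch using statsmodels, with simulated data and made-up coefficients:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
X = sm.add_constant(x)   # the dummy column of 1s that carries the intercept

# Logistic regression: Binomial family, logit link (the statsmodels default).
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))
y_binary = rng.binomial(1, p)
logit_fit = sm.GLM(y_binary, X, family=sm.families.Binomial()).fit()

# Ordinary linear regression as a GLM: Gaussian family, identity link.
y_normal = 0.5 + 1.2 * x + rng.normal(scale=0.3, size=200)
ols_fit = sm.GLM(y_normal, X, family=sm.families.Gaussian()).fit()

print(logit_fit.params, ols_fit.params)   # both near (0.5, 1.2)
```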
 
  • #8
The main difference is that GLM allows for error distributions that are not normal or that have autocorrelation. In finance, Generalized Method of Moments (GMM) is most often used, which is an equivalent technique. It corrects the standard errors of the estimates by inserting a covariance matrix ##W## into the minimization of the norm (OLS can be thought of this way, but with an identity matrix for ##W##):

$$\hat{\theta} = \operatorname*{arg\,min}_{\theta} \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\theta)\right)^{\mathsf T} \hat{W} \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\theta)\right)$$


https://en.wikipedia.org/wiki/Generalized_method_of_moments
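To illustrate the remark that OLS is the special case with ##W## equal to the identity, here is a minimal sketch (made-up linear-model data) minimizing that quadratic form over the sample moment conditions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=300)

def gmm_objective(theta, W):
    # Sample moments for the linear model: g_bar = (1/T) X'(y - X theta).
    g_bar = X.T @ (y - X @ theta) / len(y)
    return g_bar @ W @ g_bar

# With W = I, minimizing g_bar' W g_bar reproduces the OLS solution.
theta_gmm = minimize(gmm_objective, x0=np.zeros(2), args=(np.eye(2),)).x
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_gmm, theta_ols)   # the two estimates agree
```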
 
  • #9
BWV said:
The main difference is that GLM allows for error distributions that are not normal or that have autocorrelation. In finance, Generalized Method of Moments (GMM) is most often used, which is an equivalent technique.
Thanks
How does one determine in practice the distribution of the errors?
 
  • #10
In practice you don't really care, other than knowing that they diverge from the OLS assumptions due to heteroskedasticity and/or autocorrelation. The Newey–West estimator, which is available in most statistical packages such as R, is generally used:

https://en.wikipedia.org/wiki/Newey–West_estimator
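For instance, in Python's statsmodels the same correction is available as a HAC ("heteroskedasticity and autocorrelation consistent") covariance option on an ordinary OLS fit; a sketch with made-up autocorrelated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=500)
X = sm.add_constant(x)

# Made-up data with an AR(1) disturbance, so the OLS errors are autocorrelated.
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

# Newey-West: same point estimates as plain OLS, but the covariance matrix
# (and hence the standard errors) is corrected for autocorrelation.
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})
print(fit.bse)   # Newey-West standard errors
```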
 
