Derivative of Log Likelihood Function


Discussion Overview

The discussion revolves around the differentiation of the log likelihood function in a mixture of Gaussians model. Participants express confusion regarding specific steps in the differentiation process, particularly focusing on the terms involving the covariance matrix and the multivariate Gaussian function.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant expresses confusion about how differentiating the log likelihood produces the numerator term π_{k} N(x_{n}|μ_{k}, Σ_{k}).
  • Another participant suggests that the complexity arises because the covariance matrix Σ_k appears both inside and outside the exponent of the multivariate Gaussian probability density function.
  • There is a discussion about applying the product rule to differentiate the density, which leads to a proposed expression involving the covariance matrix.
  • A participant questions the use of CΣ_{k}^{-1} in the differentiation, pointing out that the multivariate Gaussian formula contains the determinant of the covariance matrix, not its inverse.
  • Further clarification is provided: the previous expression was only meant to indicate the structure, and the corrected form involves the determinant of the covariance matrix.
  • One participant suggests that working through the univariate case might clarify the differentiation process for the multivariate case.
  • Another participant rewrites the expression with an exponent of -1/2 on the determinant and asks how to handle the extra factor left over after factoring.
  • Finally, the original poster reports having resolved the confusion using identities from The Matrix Cookbook.

Areas of Agreement / Disagreement

Participants do not reach an explicit consensus on the differentiation process: the thread contains several partial approaches (the product rule, the univariate case) and some confusion about the correct form of the density. The original poster eventually reports resolving the issue using The Matrix Cookbook but does not post the final derivation.

Contextual Notes

Participants express uncertainty about specific mathematical steps and the implications of using the determinant versus the inverse of the covariance matrix. There are unresolved aspects regarding the differentiation of the multivariate Gaussian function.

NATURE.M
So, looking through my notes, I can't understand how to get from one step to the next. I have attached a screenshot of the two lines I'm very confused about. Thanks.

BTW: The equations are for the log likelihood in a mixture of Gaussians model.

EDIT: To elaborate, I am particularly confused about how they get the numerator term π_{k} N(x_{n}|μ_{k}, Σ_{k}). I can't see how differentiating produces that. I understand how they obtain the denominator term from differentiating the log, but that's about all. To differentiate the multivariate Gaussian, I would think the log function needs to be used to break up the internal terms, but I can't put this intuition together.
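For reference, here is a sketch of the step in question, assuming the notes use the standard mixture-of-Gaussians log likelihood with ##K## components (the screenshot itself is not reproduced here). Only the ##k##-th term of the inner sum depends on ##\Sigma_k##, and writing ##\partial\mathscr{N} = \mathscr{N}\,\partial\ln\mathscr{N}## (the log idea mentioned above) is what keeps ##\pi_k\mathscr{N}(x_n|\mu_k,\Sigma_k)## in the numerator:

$$\ln p(X|\pi,\mu,\Sigma)=\sum_{n=1}^{N}\ln\left[\sum_{j=1}^{K}\pi_j\,\mathscr{N}(x_n|\mu_j,\Sigma_j)\right]$$
$$\frac{\partial}{\partial \Sigma_k}\ln p(X|\pi,\mu,\Sigma)=\sum_{n=1}^{N}\frac{\pi_k\,\frac{\partial}{\partial \Sigma_k}\mathscr{N}(x_n|\mu_k,\Sigma_k)}{\sum_{j}\pi_j\,\mathscr{N}(x_n|\mu_j,\Sigma_j)}=\sum_{n=1}^{N}\frac{\pi_k\,\mathscr{N}(x_n|\mu_k,\Sigma_k)}{\sum_{j}\pi_j\,\mathscr{N}(x_n|\mu_j,\Sigma_j)}\,\frac{\partial}{\partial \Sigma_k}\ln\mathscr{N}(x_n|\mu_k,\Sigma_k)$$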
 

Attachments

  • Screen Shot 2015-11-15 at 5.37.12 PM.png
I think it's because ##\Sigma_k## appears both inside the exponent and outside it (as an inverse) in the density function ##\mathscr{N}##.
So
$$\frac{\partial}{\partial \Sigma_k}\mathscr{N}(\mu,\Sigma_k)=
\frac{\partial}{\partial \Sigma_k}\left[C\Sigma_k{}^{-1}\exp[f(\mu,\Sigma_k)]\right]$$
for a known constant ##C## and function ##f##.
By the product rule, this is then equal to
$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}\Sigma_k{}^{-1}+\Sigma_k{}^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

There will be some messy algebra involved.

You might find it easier to first work through the univariate case, differentiating wrt ##\sigma## and seeing if you can obtain an analogous expression. If that works out, it shouldn't be too hard to extend it to the multivar case.
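As a sketch of the suggested univariate case (assuming the density is parametrized by ##\sigma##, so that ##C=(2\pi)^{-1/2}## and ##\sigma^{-1}## plays the role of the factor outside the exponent), the same product-rule structure appears:

$$\mathscr{N}(x|\mu,\sigma^2)=(2\pi)^{-1/2}\,\sigma^{-1}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
$$\frac{\partial \mathscr{N}}{\partial \sigma}=(2\pi)^{-1/2}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\left[-\sigma^{-2}+\sigma^{-1}\frac{(x-\mu)^2}{\sigma^3}\right]=\mathscr{N}(x|\mu,\sigma^2)\left[-\frac{1}{\sigma}+\frac{(x-\mu)^2}{\sigma^3}\right]$$

The density factors back out of the bracket; the multivariate calculation has the same shape, with matrix identities replacing the scalar derivatives.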
 
andrewkirk said:
$$\frac{\partial}{\partial \Sigma_k}\mathscr{N}(\mu,\Sigma_k)=
\frac{\partial}{\partial \Sigma_k}\left[C\Sigma_k{}^{-1}\exp[f(\mu,\Sigma_k)]\right]$$

I don't understand how you got ##C\Sigma_{k}^{-1}##. In the multivariate Gaussian we have ##\frac{1}{|\Sigma_{k}|}##. How did you convert that determinant into an inverse? Maybe you meant the same thing but forgot the determinant bars?
 
NATURE.M said:
I don't understand how you got ##C\Sigma_{k}^{-1}##. In the multivariate Gaussian we have ##\frac{1}{|\Sigma_{k}|}##. How did you convert that determinant into an inverse?
I didn't. What I wrote is only broadly indicative of the structure. I didn't look up the multivariate Gaussian formula. With your correction that line becomes:

$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}|\Sigma_k|^{-1}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$
which is
$$C\exp[f(\mu,\Sigma_k)]\left[-|\Sigma_k|^{-2}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

I think if you work through the univariate case first it'll become much clearer.
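The remaining unknown in the expression above is ##\partial|\Sigma_k|/\partial\Sigma_k##. A standard identity (Jacobi's formula, also listed in The Matrix Cookbook) gives ##\partial|\Sigma|/\partial\Sigma = |\Sigma|\,\Sigma^{-T}##, which equals ##|\Sigma|\,\Sigma^{-1}## for a symmetric covariance. Below is a minimal NumPy sketch that checks this numerically by central differences; the 2x2 matrix is a made-up example, and each entry is perturbed independently (symmetry is not enforced during differentiation).

```python
import numpy as np

def numerical_grad(func, S, h=1e-6):
    """Central-difference gradient of a scalar function of a matrix,
    perturbing each entry S[i, j] independently."""
    G = np.zeros_like(S)
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            E = np.zeros_like(S)
            E[i, j] = h
            G[i, j] = (func(S + E) - func(S - E)) / (2 * h)
    return G

# Made-up symmetric positive-definite 2x2 covariance matrix.
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

grad_det = numerical_grad(np.linalg.det, Sigma)
# Jacobi's formula: d|S|/dS = |S| * (S^{-1})^T
closed_form = np.linalg.det(Sigma) * np.linalg.inv(Sigma).T
print(np.allclose(grad_det, closed_form, atol=1e-6))  # True
```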
 
andrewkirk said:
$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}|\Sigma_k|^{-1}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$
which is
$$C\exp[f(\mu,\Sigma_k)]\left[-|\Sigma_k|^{-2}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

Okay, so rewriting with an exponent of -1/2 on the determinant (as in the Gaussian) and repeating the operation, we would have:
$$C\exp[f(\mu,\Sigma_k)]\left[-\frac{1}{2}|\Sigma_k|^{-\frac{3}{2}}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}+|\Sigma_k|^{-\frac{1}{2}}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$
So the problem becomes the extra factor ##|\Sigma_k|^{-1}## that is left over after we factor out ##|\Sigma_k|^{-\frac{1}{2}}##. Any ideas?
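One way to see where the leftover factor goes, assuming the identity ##\partial|\Sigma_k|/\partial\Sigma_k = |\Sigma_k|\,\Sigma_k^{-1}## for symmetric ##\Sigma_k## (Jacobi's formula): the derivative of the determinant itself supplies a factor of ##|\Sigma_k|##, which cancels the extra ##|\Sigma_k|^{-1}##:

$$-\frac{1}{2}|\Sigma_k|^{-\frac{3}{2}}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}=-\frac{1}{2}|\Sigma_k|^{-\frac{3}{2}}\,|\Sigma_k|\,\Sigma_k^{-1}=-\frac{1}{2}|\Sigma_k|^{-\frac{1}{2}}\,\Sigma_k^{-1}$$

With ##f=-\frac{1}{2}(x_n-\mu_k)^T\Sigma_k^{-1}(x_n-\mu_k)## and ##\partial f/\partial\Sigma_k=\frac{1}{2}\Sigma_k^{-1}(x_n-\mu_k)(x_n-\mu_k)^T\Sigma_k^{-1}##, the common factor ##C|\Sigma_k|^{-1/2}\exp[f]## is the density itself, so

$$\frac{\partial}{\partial \Sigma_k}\mathscr{N}(x_n|\mu_k,\Sigma_k)=-\frac{1}{2}\,\mathscr{N}(x_n|\mu_k,\Sigma_k)\left[\Sigma_k^{-1}-\Sigma_k^{-1}(x_n-\mu_k)(x_n-\mu_k)^T\Sigma_k^{-1}\right]$$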
 
So I think I resolved my troubles using a few identities from The Matrix Cookbook.
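To tie the pieces together, here is a minimal NumPy sketch (an illustration, not the method from the original notes) that compares the closed-form gradient ##\sum_n \gamma_{nk}\,(-\frac{1}{2})[\Sigma_k^{-1}-\Sigma_k^{-1}(x_n-\mu_k)(x_n-\mu_k)^T\Sigma_k^{-1}]## against central finite differences of the mixture log likelihood. Here ##\gamma_{nk}## is the ratio of the numerator ##\pi_k\mathscr{N}(x_n|\mu_k,\Sigma_k)## to the denominator ##\sum_j\pi_j\mathscr{N}(x_n|\mu_j,\Sigma_j)## from the original question. The closed form relies on the standard identities ##\partial\ln|\Sigma|/\partial\Sigma=\Sigma^{-1}## and ##\partial(a^T\Sigma^{-1}a)/\partial\Sigma=-\Sigma^{-1}aa^T\Sigma^{-1}## for symmetric ##\Sigma##. The data, weights, means and covariances are made up, and entries of ##\Sigma_k## are perturbed independently, so this checks the unconstrained (non-symmetrized) derivative.

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma), written out explicitly
    so it can also be evaluated at slightly perturbed (non-symmetric) Sigma."""
    D = x.shape[0]
    a = x - mu
    quad = a @ np.linalg.solve(Sigma, a)  # a^T Sigma^{-1} a
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** -0.5
    return norm * np.exp(-0.5 * quad)

def log_likelihood(X, pis, mus, Sigmas):
    """sum_n ln sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    return sum(
        np.log(sum(p * gauss_pdf(x, m, S) for p, m, S in zip(pis, mus, Sigmas)))
        for x in X
    )

def grad_sigma_k(X, pis, mus, Sigmas, k):
    """Closed-form gradient of the log likelihood w.r.t. a symmetric Sigma_k."""
    Sinv = np.linalg.inv(Sigmas[k])
    G = np.zeros_like(Sigmas[k])
    for x in X:
        weighted = [p * gauss_pdf(x, m, S) for p, m, S in zip(pis, mus, Sigmas)]
        gamma = weighted[k] / sum(weighted)  # pi_k N_k / sum_j pi_j N_j
        a = (x - mus[k]).reshape(-1, 1)
        G += gamma * (-0.5) * (Sinv - Sinv @ a @ a.T @ Sinv)
    return G

def numerical_grad_sigma_k(X, pis, mus, Sigmas, k, h=1e-6):
    """Central differences, perturbing each entry of Sigma_k independently."""
    G = np.zeros_like(Sigmas[k])
    for i in range(G.shape[0]):
        for j in range(G.shape[1]):
            E = np.zeros_like(G)
            E[i, j] = h
            plus = [S + E if c == k else S for c, S in enumerate(Sigmas)]
            minus = [S - E if c == k else S for c, S in enumerate(Sigmas)]
            G[i, j] = (log_likelihood(X, pis, mus, plus)
                       - log_likelihood(X, pis, mus, minus)) / (2 * h)
    return G

# Made-up 2-D data and a two-component mixture, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
pis = [0.4, 0.6]
mus = [np.array([0.0, 0.0]), np.array([1.0, -0.5])]
Sigmas = [np.array([[1.0, 0.2], [0.2, 0.8]]),
          np.array([[0.5, -0.1], [-0.1, 1.5]])]

analytic = grad_sigma_k(X, pis, mus, Sigmas, k=1)
numeric = numerical_grad_sigma_k(X, pis, mus, Sigmas, k=1)
print(np.allclose(analytic, numeric, rtol=1e-4, atol=1e-5))
```

Setting this gradient to zero and multiplying by ##\Sigma_k## on both sides gives the familiar update ##\Sigma_k=\frac{1}{N_k}\sum_n\gamma_{nk}(x_n-\mu_k)(x_n-\mu_k)^T## with ##N_k=\sum_n\gamma_{nk}##.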
 
