Multivariate Gaussian - Normalization factor via diagonalization

  • #1
binbagsss

Homework Statement


Hi,

I am trying to follow my book's hint that to find the normalization factor one should

[Attachment: multigaus.png]


"Diagnoalize ##\Sigma^{-1}## to get ##n## Gaussian which will have variance given by the eigenvalues of ##\Sigma## . Then integrate gives ##\sqrt{2\pi}\Lambda_i##, then use that the product of eigenvalues is the determinant ##.

Homework Equations



What I know:

##\Sigma## is symmetric and so it can be diagonalized, ##\Sigma=PDP^{T}##, where ##P## is the orthogonal matrix of eigenvectors and ##D## is the diagonal matrix of eigenvalues.

The Attempt at a Solution


I'm stuck:

To be honest, I am blank as to where to start explicitly, not having done any examples of this for at least four years or so.

Many thanks in advance!
 
  • #2

Just re-express your integral over ##(x_1, x_2, \ldots, x_n)## as an integral over ##(y_1,y_2, \ldots, y_n)##, where ##\vec{y} ={P^T}^{-1} \vec{x}##. Use the standard change-of-variable formula for multivariate integrals.
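For reference, the change-of-variable formula being invoked is
$$\int_{\mathbb{R}^n} f(\vec{x})\,d^n x=\int_{\mathbb{R}^n} f\big(\vec{x}(\vec{y})\big)\,\left|\det\frac{\partial \vec{x}}{\partial \vec{y}}\right|\,d^n y .$$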
 
  • #3
Ray Vickson said:
Just re-express your integral over ##(x_1, x_2, \ldots, x_n)## as an integral over ##(y_1,y_2, \ldots, y_n)##, where ##\vec{y} ={P^T}^{-1} \vec{x}##. Use the standard change-of-variable formula for multivariate integrals.

mmmm okay thanks, I think the general idea is making more sense: by transforming to a diagonal matrix we lose the cross terms in the Gaussian, so the integral reduces to a product over ##n## individual Gaussians.
Because ##D## is the diagonal matrix giving the eigenvalues of ##\Sigma##, the eigenvalues of ##\Sigma^{-1}## will be given by ##D^{-1}##, and then reading off the form of the Gaussian it is easy to see that the eigenvalues of ##\Sigma## will be the variances.

As for the actual algebra of the transformation, it's not looking so good for me.
I have:

(By the way, should the transformation be ##\tilde{y}=P^T \tilde{x}##, where ##\tilde{x}=x-\mu##?)

##\tilde{y}=P^T \tilde{x} \implies \tilde{x}=(P^T)^{-1}\tilde{y}##

So ##\tilde{x}^T\Sigma^{-1}\tilde{x}=\tilde{y^{T}}((P^{T})^{-1})^{T}(P^{T})^{-1}D^{-1}P^{-1}(P^{T})^{-1}\tilde{y}##

Using ##\Sigma^{-1}=(PDP^T)^{-1}##

Whilst we know ##P^T=P^{-1}##, we cannot simplify ##((P^{T})^{-1})^T## or ## (P^T)^{-1} ##, can we?

Thanks in advance
 
  • #4

Since ##\Sigma = P D P^T##, we have ##x^T \Sigma^{-1} x = x^T (P^T)^{-1} D^{-1} P^{-1} x##, so I should have written ##y = P^{-1} x##. Since we do have ##(P^T)^{-1} = (P^{-1})^T## for any invertible matrix ##P##, we have ##x^T \Sigma^{-1} x = y^T D^{-1} y##. The Jacobian of the transformation gives
$$ dy_1\, dy_2 \, \cdots \, dy_n = J \: dx_1 \, dx_2 \, \cdots \, dx_n,$$
where
$$J = \left| \det \left( \frac{\partial y_i}{\partial x_j} \right) \right| = |\det(P^{-1})| = 1.$$
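Carrying this through, with ##\Lambda_i## the eigenvalues of ##\Sigma## (the diagonal entries of ##D##), the integral factorizes into ##n## one-dimensional Gaussian integrals:
$$\int e^{-\frac{1}{2}x^T\Sigma^{-1}x}\,d^nx=\int e^{-\frac{1}{2}y^TD^{-1}y}\,d^ny=\prod_{i=1}^{n}\int_{-\infty}^{\infty}e^{-y_i^2/(2\Lambda_i)}\,dy_i=\prod_{i=1}^{n}\sqrt{2\pi\Lambda_i}=(2\pi)^{n/2}\sqrt{\det\Sigma}.$$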
 
  • #5

I am looking at Kardar, which then says that 'similar manipulations can be used to find the characteristic function'; my lecture notes also say this, and mention we should convert back too!

##\tilde{p}(\vec{k})=e^{-i \vec{k}\cdot\vec{\lambda} - \frac{1}{2} \vec{k}\cdot C\cdot \vec{k}} ##

So the definition of ##\tilde{p}(\vec{k})## is ##\tilde{p}(\vec{k}) = \langle e^{-i\vec{k}\cdot\vec{x}}\rangle##

So using the same transformation as above I have ## \frac{1}{\sqrt{\det(2\pi C)}}\int e^{-\frac{1}{2} y^T D^{-1} y}\, e^{-i \vec{k}\cdot(P\vec{y})}\, d^ny ##

Which doesn't look like it has simplified anything, since ##P## isn't of an easy form?
I'm guessing I need a different substitution?
Any help much appreciated, thank you!
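One way to see that the same substitution does in fact work (sketching the remaining steps): since ##\vec{k}\cdot(P\vec{y}) = (P^T\vec{k})\cdot\vec{y}##, define ##\vec{k}' = P^T\vec{k}##; the integral then factorizes into one-dimensional Gaussian characteristic functions,
$$\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\Lambda_i}}\int_{-\infty}^{\infty} e^{-y_i^2/(2\Lambda_i)}\,e^{-ik_i' y_i}\,dy_i=\prod_{i=1}^{n}e^{-\frac{1}{2}\Lambda_i k_i'^2}=e^{-\frac{1}{2}\vec{k}^T C\vec{k}},$$
using ##PDP^T=C## in the last step; restoring the mean brings back the ##e^{-i\vec{k}\cdot\vec{\lambda}}## factor.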
 
  • #6
I too find the multivariate Gaussian hard to interpret at times. What I like to do is start with easy problems and build up.

When working with single variable Gaussians, you'll frequently use a standard normal r.v. -- i.e. zero mean, unit variance. Let's do the first part of that and make your variables zero mean

\begin{align}
p(v; \mu, \Sigma) \propto \exp \Big(-\frac{1}{2} v^T \Sigma^{-1} v\Big)
\end{align}

where ##v## is just the zero mean (i.e. centered) version of ##x##. To put this mathematically, I'd probably say: ##v_i = x_i - \mu_i##, or something along those lines. Note the above setup assumes your covariance matrix is computed from ##v## not from ##x##. (I would have used a new symbol for said covariance matrix, but ##\Sigma## is pretty much the notation for a covariance matrix.)

Can we assume these variables are independent? If so, that means your ##\Sigma## is already diagonalized -- i.e. it is diagonal. Why? Because the diagonal entries are variances and the off-diagonal entries are covariances. Recall that independent variables have zero covariance. (Technically we could skate by with pairwise independence, but let's assume they are mutually independent.)

Is this interpretable now? Note: if we were trying to make ##v## have unit variance we'd use ##\Sigma^{-\frac{1}{2}} v## -- this is possible because ##\Sigma## is symmetric positive definite (irrespective of independence concerns... unless one of your variables or data sets is degenerate, in which case ##\Sigma## has an eigenvalue equal to 0; but this degeneracy needs to be dealt with one way or another before you can move ahead, so by assumption/necessity all eigenvalues are ##> 0##).
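As a concrete sketch of that whitening step (illustrative synthetic data -- the mean and covariance below are made up, and only numpy is assumed):

```python
# Whiten centered data with Sigma^(-1/2), built from the eigendecomposition
# Sigma = P D P^T; the whitened samples then have (approximately) identity covariance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.6], [0.6, 1.0]],
                            size=10_000)

V = X - X.mean(axis=0)                    # center: v = x - mu
Sigma = np.cov(V, rowvar=False)           # sample covariance of the centered data

eigvals, P = np.linalg.eigh(Sigma)        # Sigma is SPD, so all eigenvalues > 0
Sigma_inv_sqrt = P @ np.diag(eigvals ** -0.5) @ P.T

W = V @ Sigma_inv_sqrt.T                  # whitened samples, one per row
print(np.cov(W, rowvar=False))            # approximately the 2x2 identity matrix
```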

From here you can move on to the normalizing constant. It really is just a matter of recognizing that when you multiply the densities of independent random variables, each of which is a constant times an exponential, you add the stuff inside the exponentials and multiply the constants outside. So with respect to the normalizing constant, if you're looking at the joint pdf of ##n## Gaussians, the ##\frac{1}{\sqrt{2\pi}}## term gets multiplied ##n## times, so you could write that part of the constant as ##\frac{1}{(2 \pi)^\frac{n}{2}}##.
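Written out, with ##\Lambda_i## denoting the diagonal entries (variances) of ##\Sigma##, the product of the ##n## one-dimensional densities is
$$\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\Lambda_i}}\exp\Big(-\frac{v_i^2}{2\Lambda_i}\Big)=\frac{1}{(2\pi)^{n/2}\sqrt{\Lambda_1\cdots\Lambda_n}}\exp\Big(-\frac{1}{2}\sum_{i=1}^{n}\frac{v_i^2}{\Lambda_i}\Big).$$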

Also recall from the single variable case that the other part of the normalizing constant is one divided by the square root of the variance. From here, multiplying a bunch of square roots of variances can also be written as the square root of the product of a bunch of variances... and the product of the variances is the product of the diagonal entries of ##\Sigma## in our case -- and since ##\Sigma## is diagonal, we can also call this the determinant. Hence the other piece of the normalizing constant is ##\frac{1}{\det(\Sigma ^{\frac{1}{2}})}## or equivalently ##\frac{1}{\det(\Sigma) ^{\frac{1}{2}}}##. It is perhaps worth pointing out that when I said "if we were trying to make ##v## have unit variance we'd use ##\Sigma^{-\frac{1}{2}} v##" -- if we had followed through on this, we would not have needed to put this determinant factor in the normalizing constant (as said determinant would be 1). If you're actually working with data that is supposed to be independent but perhaps isn't, you would be whitening your data (a term from signal processing) at such a step -- which can also be interpreted as being proportional to setting all singular values of your (non-singular) data matrix equal to 1.
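A small numerical check of the resulting constant (an illustrative 2x2 covariance; numpy and scipy assumed):

```python
# Check numerically that the normalizing constant of a zero-mean Gaussian
# equals prod_i sqrt(2*pi*Lambda_i) = sqrt(det(2*pi*Sigma)) for a small example.
import numpy as np
from scipy import integrate

Sigma = np.array([[2.0, 0.6],          # illustrative SPD covariance matrix
                  [0.6, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def unnormalized_density(u, w):
    v = np.array([u, w])
    return np.exp(-0.5 * v @ Sigma_inv @ v)

# brute-force integral of exp(-1/2 v^T Sigma^{-1} v) over a large box
Z_numeric, _ = integrate.dblquad(unnormalized_density,
                                 -12, 12, lambda _: -12, lambda _: 12)

eigvals = np.linalg.eigvalsh(Sigma)             # eigenvalues of Sigma
Z_eigs = np.prod(np.sqrt(2 * np.pi * eigvals))  # prod_i sqrt(2*pi*Lambda_i)
Z_det = np.sqrt(np.linalg.det(2 * np.pi * Sigma))

print(Z_numeric, Z_eigs, Z_det)                 # all three agree (about 8.05)
```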

Post Script:
For avoidance of doubt, unless explicitly stated otherwise, when I say square root of something I mean the positive square root -- in the same way that standard deviations are square roots of variance.
 
  • #7
StoneTemplePython said:
Note the above setup assumes your covariance matrix is computed from ##v## not from ##x##. (I would have used a new symbol for said covariance matrix, but ##\Sigma## is pretty much the notation for a covariance matrix.)

So I got the correct normalization constant by doing

##\tilde{y}=P^T \tilde{x}##, where ##\tilde{x}=x-\mu##

And ##\Sigma^{-1}=(PDP^T)^{-1}##

##D## the eigenvalue matrix of ##\Sigma## and ##P## the eigenvector matrix.

However I did not take this fact into consideration -- the covariance matrix is the one specified in the question, and so must be the one computed from ##x##? So is this wrong?

Thanks in advance
 
  • #8
The thing is -- variance and covariance measure dispersion, not location. It really doesn't matter whether you subtract the mean first or not -- the variances certainly don't change (they are defined as dispersion about the mean), and if you work through the math, the covariances don't either.

Quick walkthrough of the math: ##cov(X,Y) = E[XY] - E[X]E[Y]##

Consider the case where we shift ##X## by some fixed value ##b##:
$$
cov(X + b,Y) = E[(X+b)Y] - E[X+b]E[Y]\\
cov(X + b,Y) = E[XY+ bY] - (E[X] + b)E[Y]\\
cov(X + b,Y) = E[XY] +E[bY] - (E[X] + b)E[Y]\\
cov(X + b,Y) = E[XY] + bE[Y] - (E[X]E[Y] + bE[Y])\\
cov(X + b,Y) = E[XY] - E[X]E[Y] \\
cov(X + b,Y) = cov(X, Y)
$$
Now set ##b := -E[X]##, and you have demonstrated that the covariance doesn't change after centering ##X##. Apply the same argument again to center ##Y##.
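A quick numerical sanity check of the identity above (synthetic data; numpy only):

```python
# Shifting X by a constant b leaves cov(X, Y) unchanged.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(3.0, 2.0, size=100_000)
Y = 0.5 * X + rng.normal(0.0, 1.0, size=100_000)
b = 7.0

print(np.cov(X, Y)[0, 1])      # cov(X, Y)
print(np.cov(X + b, Y)[0, 1])  # cov(X + b, Y): equal up to floating-point rounding
```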

That said I'm quite biased toward working with zero mean variables -- the expressions are a lot simpler, and if you're working with data, there are very compelling linear algebra related reasons to center the data as part of preprocessing. The only real assumption you make is that your underlying variables have a mean (and that your sampling isn't horrible) -- but you don't need to make any assumption beyond that, so it's quite general.

So if you twist my arm -- I'd say you're OK. I almost always try to work with zero mean variables, and then, if needed, undo the (bijective) shift at the end.

(Note for completeness: the only time I'm aware that you can't really work with zero mean variables, where a mean exists, is when you're looking at things in a different way -- i.e. expected time till failure or expected time till absorption or whatever stochastic process. That doesn't apply here.)

N.B. apparently putting square brackets around ##b## triggers bold and blows up all the associated LaTeX. This kept happening when I was showing that the expectation of the constant ##b## is ##b##.
 

FAQ: Multivariate Gaussian - Normalization factor via diagonalization

1. What is a multivariate Gaussian distribution?

A multivariate Gaussian distribution is a probability distribution that describes the behavior of multiple variables that are normally distributed. It is a generalization of the univariate Gaussian distribution, which describes the behavior of a single variable. The multivariate Gaussian distribution is commonly used in statistics and data analysis.

2. What is the normalization factor in a multivariate Gaussian distribution?

The normalization factor, also known as the normalization constant or partition function, is a constant that ensures the total probability of a multivariate Gaussian distribution integrates to 1. It is calculated by integrating the unnormalized multivariate Gaussian function over all possible values of the variables.

3. How is the normalization factor obtained via diagonalization?

The normalization factor in a multivariate Gaussian distribution can be obtained via diagonalization by using the eigenvalues and eigenvectors of the covariance matrix. The covariance matrix is a square matrix that contains the variances and covariances of the variables in the distribution. By diagonalizing the covariance matrix, we can simplify the calculation of the normalization factor.
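In symbols, for an ##n##-dimensional Gaussian whose covariance matrix ##\Sigma## has eigenvalues ##\Lambda_1,\ldots,\Lambda_n##, the normalization factor is
$$Z=\prod_{i=1}^{n}\sqrt{2\pi\Lambda_i}=(2\pi)^{n/2}\sqrt{\det\Sigma}=\sqrt{\det(2\pi\Sigma)}.$$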

4. Why is the normalization factor important in a multivariate Gaussian distribution?

The normalization factor is important because it ensures that the total probability of a multivariate Gaussian distribution integrates to 1. This allows us to interpret the probability values of the distribution and make meaningful comparisons between different distributions. Additionally, the normalization factor is used in various statistical calculations and models that involve multivariate Gaussian distributions.

5. Can the normalization factor be calculated for any multivariate Gaussian distribution?

Yes, the normalization factor can be calculated for any multivariate Gaussian distribution as long as the covariance matrix is not singular (i.e. its determinant is not 0). If the covariance matrix is singular, the multivariate Gaussian distribution is degenerate and the normalization factor cannot be calculated.
