Layperson's description of multivariate Gaussian distributions?

In summary, the multivariate Gaussian distribution is the joint distribution of a set of jointly Gaussian random variables, specified by a mean vector and a covariance matrix. The EM algorithm can then be used to describe a multivariate data distribution as a density mixture, usually in the form of a weighted set of multivariate Gaussians.
  • #1
xnull
I am a Computer Science student who wants to implement the EM statistical clustering algorithm. I'm doing this in my spare time, outside of any classes I'm taking. I've been doing a lot of reading and understand almost everything I need to. However, I only understand univariate normal distributions.

Could anyone here give me an undergraduate-level mathematical rationale for the formal description of the multivariate Gaussian distribution's pdf?

I might then ask some questions about the underlying math.

Thanks to anyone who steps up to the plate.
 
  • #2
I can't figure out what you are trying to find out. However, the multivariate Gaussian distribution is the joint distribution function for random variables, each of which has a Gaussian distribution.
 
  • #3
Thank you mathman!

I am looking for an undergraduate-level explanation of how the multivariate Gaussian distribution's pdf was derived. Along the way I'm looking for the rationale behind the math: short sentences explaining why each step is justified when it isn't obvious, and short sentences explaining the general approach (such as justifying the use of matrices).

Your post makes sense to me. I understand what the function is. I just don't quite understand how we came to that understanding, what all of the parameters of the distribution are, and why they are formalized as vectors and matrices.
 
  • #4
xnull, I've been studying the same thing, on and off, for a while now, in the context of combination of errors. Can you link to or post the equations?
 
  • #6
mathman said:
However, the multivariate Gaussian distribution is the joint distribution function for random variables, each of which has a Gaussian distribution.

Actually, that's a common mistake. While each individual random variable of a multivariate Gaussian is itself Gaussian, the converse is not true: there exist Gaussian random variables which do not jointly form a multivariate Gaussian (a concrete sketch follows the list below). However, the following are true:

- Independent Gaussian r.v.s form a multivariate Gaussian.
- Any set of linear combinations of independent Gaussians is multivariate Gaussian.
- A vector of random variables is multivariate Gaussian if and only if every linear combination of them is (individually) a Gaussian r.v.
- Any pdf proportional to exp(-Q(x1, x2, ...)) for a quadratic Q whose second-order part is positive definite is a multivariate Gaussian pdf.
- A r.v. X taking values in R^d is multivariate Gaussian if and only if its characteristic function is of the form exp(i t·μ − Q(t)) for some vector μ and positive semidefinite quadratic form Q.
- The central limit theorem says that for a sequence X1, X2, ... of i.i.d. random variables taking values in R^d with mean μ and finite covariance, (X1 + ... + Xn − nμ)/sqrt(n) converges in distribution to a multivariate Gaussian.
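
To make the first caveat concrete, here is a small Python sketch (my own illustration, not from the thread) of the standard counterexample: X is standard Gaussian and Y = WX, where W is an independent random sign. Each of X and Y is Gaussian, but (X, Y) is not multivariate Gaussian, since the linear combination X + Y is not Gaussian.

[code]
# Counterexample sketch: X and Y are each Gaussian, but (X, Y) is not
# jointly (multivariate) Gaussian.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.standard_normal(n)
W = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of X
Y = W * X                             # Y is also standard Gaussian

# If (X, Y) were jointly Gaussian, X + Y would be Gaussian.
# Instead, X + Y is exactly 0 half the time (whenever W = -1),
# which is impossible for a Gaussian r.v. with nonzero variance.
S = X + Y
print(np.mean(S == 0.0))   # ~0.5
[/code]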
 
  • #7
I think Phrak wanted references/links on the expectation maximization algorithm. Now you have a keyword, Phrak! Google away.

One use of the EM algorithm (and this is what I suspect xnull is using the EM algorithm for) is to describe a multivariate data distribution as a density mixture, usually in the form of a weighted set of multivariate Gaussians. This is the approach used in a lot of Bayesian-based supervised and unsupervised learning techniques to create pattern classifiers. A bare-bones sketch of that loop is shown below.
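
For concreteness, here is a minimal sketch of EM for a Gaussian mixture. This is my own illustration, not code from anyone in the thread: the initialization is naive, and a real implementation would add a log-likelihood convergence test and regularize the covariance updates.

[code]
# Bare-bones EM for a mixture of K multivariate Gaussians (a sketch,
# not production code: naive initialization, fixed iteration count).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, K, replace=False)]            # random points as means
    Sigma = np.array([np.cov(X.T) for _ in range(K)])  # shared initial covariance
    w = np.full(K, 1.0 / K)                            # mixture weights

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        r = np.column_stack([
            w[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X)
            for k in range(K)
        ])
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and covariances.
        Nk = r.sum(axis=0)
        w = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
    return w, mu, Sigma
[/code]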

mathman said:
I can't figure out what you are trying to find out. However, the multivariate Gaussian distribution is the joint distribution function for random variables, each of which has a Gaussian distribution.
This is overly simplistic. Specifically, it ignores correlations. It is those correlations that necessitate the matrix/vector formulation of the multivariate Gaussian:

[tex]f(\boldsymbol x) =
\frac 1 {(2\pi)^{N/2}|\mathbf{\Sigma}|^{1/2}}\,
\exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol \mu)^T\,\mathbf{\Sigma}^{-1}\,(\boldsymbol x - \boldsymbol \mu)\right)[/tex]

Examining each part: the leading factor makes the integral of the multivariate Gaussian PDF over all space equal one. This factor is needed because integrating to unity over the entire space is the defining test of whether a function is a probability density.

The important part is the exponential. This is the multidimensional analog of the one-dimensional form [itex]\exp(-1/2\,(x-\mu)^2/\sigma^2)[/itex]. The expression [itex](x-\mu)^2/\sigma^2[/itex] is a one-dimensional positive definite quadratic form. Extending to higher dimensions suggests using a multidimensional positive definite quadratic form, to wit

[tex] (\boldsymbol x - \boldsymbol b)^T\mathbf A(\boldsymbol x - \boldsymbol b)[/tex]

To be a positive definite quadratic form, the matrix A has to be positive definite. This treatment further limits the matrix A to be symmetric, since the covariance matrix is symmetric. To qualify as a PDF the function needs to integrate to one over all space: we need to find a scale factor [itex]\alpha[/itex] such that

[tex]\alpha \int \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol b)^T\,\mathbf A\,(\boldsymbol x - \boldsymbol b)\right) d\boldsymbol x= 1[/tex]

The easiest way to evaluate this integral is to transform to the principal axis system, in which case the transformed A matrix is diagonal. That such a transform exists is guaranteed by the positive definite, symmetric nature of the matrix A. The integral is separable in this system, resulting in

[tex]\int \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol b)^T\,\mathbf A\,(\boldsymbol x - \boldsymbol b)\right) d\boldsymbol x= \frac{(2\pi)^{N/2}}{|\mathbf A|^{1/2}} [/tex]

The scale factor alpha falls out from this. Scaling to form a PDF,

[tex]f(\boldsymbol x) =
\frac{|\mathbf A|^{1/2}}{(2\pi)^{N/2}}\,
\exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol b)^T\,\mathbf A\,(\boldsymbol x - \boldsymbol b)\right)[/tex]

What are its mean and covariance? You can grind through the math if you want. The mean is simply the vector b, and the covariance is simply the inverse of the matrix A. Denoting the mean as mu and the covariance matrix as Sigma leads to the more standard form

[tex]f(\boldsymbol x) =
\frac1{(2\pi)^{N/2}|\mathbf{\Sigma}|^{1/2}}\,
\exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol \mu)^T\,\mathbf{\Sigma}^{-1}\,(\boldsymbol x - \boldsymbol \mu)\right)[/tex]
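
As a numerical sanity check of this result (my own sketch, with an arbitrary example mu and Sigma, not part of the derivation above), the hand-rolled PDF can be compared against scipy's reference implementation:

[code]
# Sketch: evaluate the multivariate Gaussian PDF derived above and
# check it against scipy's implementation. mu, Sigma, x are arbitrary.
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """f(x) = exp(-(x-mu)^T Sigma^{-1} (x-mu)/2) / ((2 pi)^{N/2} |Sigma|^{1/2})"""
    N = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])   # symmetric positive definite
x = np.array([0.5, -1.5])

print(mvn_pdf(x, mu, Sigma))                  # hand-rolled
print(multivariate_normal(mu, Sigma).pdf(x))  # scipy reference, same value
[/code]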
 
Last edited:
  • #8
D H said:
I think Phrak wanted references/links on the expectation maximization algorithm. Now you have a keyword, Phrak! Google away.

Thanks, D H. I think xnull and I have multivariate Gaussian distributions in common, but depart where EM begins. Both, however, utilize N random vectors in some M-dimensional basis space of independent probabilities. We seem to have the same goal of getting a better grasp of how this vector space relates to the space of random variables.
 
Last edited:
  • #9
D H said:
One use of the EM algorithm (and this is what I suspect xnull is using the EM algorithm for) is to describe a multivariate data distribution as a density mixture, usually in the form of a weighted set of multivariate Gaussians. This is the approach used in a lot of Bayesian-based supervised and unsupervised learning techniques to create pattern classifiers.

Yes, precisely. I'm using this for unsupervised learning.

Phrak, if you want to get help on the EM algorithm you should start another thread so that it is more easily searchable by Googlers. I'll be sure to keep my eye out in case you do.

Thank you, D H. That was precisely the type of description I was looking for.

I do not understand why the integral becomes separable when it's transformed to the principal axis system, nor what the principal axis system would be for this n-dimensional space, nor why the matrix A is diagonal in this axis system. But since I now understand where the pdf came from, and since my lack of understanding of these things seems to lie outside the scope of my original question, I'll extend my thanks to you (again!).

You've made one unsupervised robot very happy. Thanks several times, D H.
 
  • #10
xnull said:
Phrak, if you want to get help on the EM algorithm you should start another thread so that it is more easily searchable by Googlers. I'll be sure to keep my eye out in case you do.

No problem, xnull. I don't mean to trample on your thread.
 
  • #11
I edited my previous post. The modification is shown here, with the changes in bold:
D H said:
... To be a positive definite quadratic form, the matrix A has to be positive definite. This treatment further limits the matrix A to be symmetric, since the covariance matrix is symmetric. ...

... The easiest way to evaluate this integral is to transform to the principal axis system, in which case the transformed A matrix is diagonal. That such a transform exists is guaranteed by the positive definite, symmetric nature of the matrix A. The integral is separable in this system ...

These additions are needed to answer xnull's question in post #9.

xnull said:
I do not understand why the integral becomes separable when it's transformed to the principal axis system, nor what the principal axis system would be for this n-dimensional space, nor why the matrix A is diagonal in this axis system. But since I now understand where the pdf came from, and since my lack of understanding of these things seems to lie outside the scope of my original question, I'll extend my thanks to you (again!).
You're welcome.

The reason the system is diagonalizable is that all positive definite symmetric matrices are diagonalizable. Moreover, the diagonalization is done with an orthogonal transformation matrix. This is a direct result of the spectral theorem.
  • How to construct this? Simple: Use either the eigen decomposition or singular value decomposition. (For a positive definite symmetric matrix these yield the same result.)

  • What does it look like? The expression [itex](\boldsymbol x - \boldsymbol b)^T\mathbf A(\boldsymbol x - \boldsymbol b) = c[/itex] is the equation of an N-dimensional ellipsoid. The axes of this ellipsoid might be pointed any which way. They will, however, be orthogonal to one another. The ellipsoid axes form an orthogonal basis for the N-space: call this the principal axis system. There exists some transformation matrix that transforms from the original system to the principal axis system. The covariance matrix in this principal axis system takes on a simple form: it is diagonal. The transformation matrix that transforms to the principal axis system is the orthogonal matrix that diagonalizes the covariance matrix (so, another means of construction).

  • Why is the system separable in the principal axis system? Simple: Do the math. The original quadratic form [itex](\boldsymbol x - \boldsymbol b)^T\mathbf A(\boldsymbol x - \boldsymbol b)[/itex], when transformed to the principal axis system, becomes the much simpler [itex]\sum_i a'_i {x'_i}^2 = a'_1 {x'_1}^2 + a'_2 {x'_2}^2+ \cdots + a'_N {x'_N}^2[/itex]. (Here, the a'i are the diagonal elements of the transformed A matrix and the x'i are the principal independent variables.) Now exponentiate this:

    [tex]\exp\left(-\,\frac 1 2\,(\boldsymbol x - \boldsymbol b)^T\mathbf A(\boldsymbol x - \boldsymbol b)\right) \;\to\; \exp\left(-\,\frac 1 2\,a'_1 {x'_1}^2\right)\exp\left(-\,\frac 1 2\,a'_2 {x'_2}^2\right)\cdots \exp\left(-\,\frac 1 2\,a'_N {x'_N}^2\right)[/tex]

    Integrating over N-space yields

    [tex]\prod_{i=1}^N\sqrt{\frac{2\pi}{a'_i}}
    = \frac{(2\pi)^{N/2}}{|\mathbf A'|^{1/2}}[/tex]

    Since the determinant of a matrix is invariant under orthogonal transformation, [itex]|\mathbf A'| = |\mathbf A|[/itex].
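
To see the diagonalization concretely, here is a small numpy sketch of mine (with an arbitrary example matrix, not from the thread) that constructs the principal axis system via an eigendecomposition and checks the two facts used above: the quadratic form separates in the primed coordinates, and the determinant is unchanged.

[code]
# Sketch: diagonalize a symmetric positive definite A with an orthogonal
# transform (spectral theorem) and verify the properties used above.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # arbitrary symmetric positive definite

eigvals, Q = np.linalg.eigh(A)    # A = Q diag(eigvals) Q^T, Q orthogonal
A_prime = np.diag(eigvals)        # A in the principal axis system

print(np.allclose(Q.T @ Q, np.eye(2)))            # Q is orthogonal: True
print(np.linalg.det(A), np.linalg.det(A_prime))   # |A'| = |A|

# In primed coordinates x' = Q^T x the quadratic form separates:
x = np.array([0.7, -1.2])
xp = Q.T @ x
print(x @ A @ x, np.sum(eigvals * xp**2))         # same value
[/code]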
 
  • #12
Wow, D H, you went all out. Thank you many times. I really understand this now!

My experiences with this forum (the few) have been exceptional. I posted the same question on two different forums - one focused on math and the other with a focus on statistics. Only this forum responded. And the quality of the responses. Wow.

Which means I'm going to try to stick around and help other people as best I can, where I can. Great community, guys.
 

What is a multivariate Gaussian distribution?

A multivariate Gaussian distribution is a probability distribution that describes the joint behavior of multiple, possibly correlated, variables in a dataset. It is often used in statistical analysis to model complex data, as it accounts for multiple variables and the interactions between them.

How is a multivariate Gaussian distribution different from a univariate Gaussian distribution?

While a univariate Gaussian distribution describes the probability of a single variable, a multivariate Gaussian distribution takes into account the relationships between multiple variables. It is represented by a mean vector and a covariance matrix, which capture the average values of the variables and the variances and correlations between them, respectively.

What are the assumptions of a multivariate Gaussian distribution?

The main assumptions of a multivariate Gaussian distribution are that the variables are jointly normally distributed and that any dependence between them is linear, captured by the covariance matrix. Additionally, the variables should be continuous, and fitting the distribution reliably requires a sufficient sample size.

How is a multivariate Gaussian distribution used in data analysis?

A multivariate Gaussian distribution is used in data analysis to model complex datasets and to make predictions based on the relationships between variables. It can also be used to identify outliers or anomalies in the data.
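
As a minimal sketch of that outlier idea (my own illustration, assuming a chi-squared cutoff on the Mahalanobis distance, which is a common but not unique choice):

[code]
# Sketch: flag outliers under a fitted multivariate Gaussian using the
# Mahalanobis distance and a chi-squared cutoff (an assumed convention).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=500)
X = np.vstack([X, [[6.0, -6.0]]])          # plant one obvious outlier

mu = X.mean(axis=0)
Sigma = np.cov(X.T)
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

cutoff = chi2.ppf(0.999, df=X.shape[1])    # squared distances above this are rare
print(np.where(d2 > cutoff)[0])            # indices of flagged points
[/code]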

What are some common applications of multivariate Gaussian distributions?

Multivariate Gaussian distributions are commonly used in fields such as finance, biology, and engineering. They are used to model stock prices, gene expression data, and physical systems, among others. They are also widely used in machine learning algorithms for classification and clustering tasks.
