I think Phrak wanted references/links on the expectation-maximization (EM) algorithm. Now you have a keyword, Phrak! Google away.
One use of the EM algorithm (and this is what I suspect xnull is using it for) is to describe a multivariate data distribution as a density mixture, usually in the form of a weighted set of multivariate Gaussians. This is the approach used in many Bayesian-based supervised and unsupervised learning techniques to build pattern classifiers.
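For a concrete picture, here is a minimal sketch of EM for a two-component Gaussian mixture, kept to one dimension for brevity (the multivariate case just swaps in the density discussed below). The data, initial guesses, and component count are my own illustration, not necessarily what xnull is doing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D data from two Gaussians (ground truth known only for illustration).
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal(3.0, 1.0, 700)])

# Initial guesses for the weights, means, and variances of K = 2 components.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

for _ in range(100):
    # E-step: the posterior "responsibility" of each component for each point.
    resp = w * normal_pdf(data[:, None], mu, var)      # shape (n, K)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, and variances from the responsibilities.
    nk = resp.sum(axis=0)
    w = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk

print(w, mu, var)   # should land near [0.3, 0.7], [-2, 3], [0.25, 1.0]
```

Each iteration is guaranteed not to decrease the data likelihood, which is the main appeal of EM for fitting mixtures like this.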
mathman said:
I can't figure out what you are trying to find out. However, multivariate gaussian distribution is the joint distribution function for random variables each of which has a gaussian distribution.
That is overly simplistic: it ignores correlations among the variables, and it is precisely those correlations that necessitate the matrix/vector formulation of the multivariate Gaussian:
f(\boldsymbol x) = \frac 1 {(2\pi)^{N/2}|\mathbf{\Sigma}|^{1/2}}\, \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol \mu)^T\,\mathbf{\Sigma}^{-1}\,(\boldsymbol x - \boldsymbol \mu)\right)
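To see what the off-diagonal terms of \mathbf{\Sigma} buy you, consider a two-dimensional example with correlation coefficient \rho (the unit variances are my own choice, just for illustration):

\mathbf{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \mathbf{\Sigma}^{-1} = \frac 1 {1-\rho^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}

so the joint density picks up a cross term:

f(x_1, x_2) = \frac 1 {2\pi\sqrt{1-\rho^2}}\, \exp\left(-\,\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1-\rho^2)}\right)

Only when \rho = 0 does this factor into a product of the two one-dimensional marginals, which is exactly the correlation structure a component-by-component description misses.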
Examining each part: the leading factor normalizes the distribution so that the multivariate Gaussian PDF integrates to one over all space, which is the defining requirement for any function to serve as a probability density.
The important part is the exponential. This is the multidimensional analog of the one-dimensional form \exp(-\frac 1 2\,(x-\mu)^2/\sigma^2). The expression (x-\mu)^2/\sigma^2 is a one-dimensional positive definite quadratic form. Extending to higher dimensions suggests using a multidimensional positive definite quadratic form, to wit
(\boldsymbol x - \boldsymbol b)^T\mathbf A(\boldsymbol x - \boldsymbol b)
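As a quick check that this really generalizes the one-dimensional exponent (my own sanity check, not part of the derivation), take \mathbf A diagonal:

(\boldsymbol x - \boldsymbol b)^T \mathbf A (\boldsymbol x - \boldsymbol b) = \sum_{i=1}^N \frac{(x_i - b_i)^2}{\sigma_i^2} \qquad \text{when } \mathbf A = \mathrm{diag}(1/\sigma_1^2, \ldots, 1/\sigma_N^2)

That is N copies of the one-dimensional quadratic form, one per coordinate; the exponential then factors into N independent one-dimensional Gaussians.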
To be a positive definite quadratic form, the matrix \mathbf A has to be positive definite. This treatment also requires \mathbf A to be symmetric, since the covariance matrix it will turn out to represent is symmetric. To qualify as a PDF, the function needs to integrate to one over all space: we need to find a scale factor \alpha such that
\alpha \int \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol b)^T\,\mathbf A\,(\boldsymbol x - \boldsymbol b)\right) d\boldsymbol x= 1
The easiest way to evaluate this integral is to transform to the principal axis system, in which the transformed \mathbf A matrix is diagonal. That such a transform exists is guaranteed by the symmetric, positive definite nature of \mathbf A (the spectral theorem). The integral is separable in that system, resulting in
\int \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol b)^T\,\mathbf A\,(\boldsymbol x - \boldsymbol b)\right) d\boldsymbol x= \frac{(2\pi)^{N/2}}{|\mathbf A|^{1/2}}
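Spelling out the principal axis step, since it is the crux of the computation (the orthogonal matrix \mathbf Q and eigenvalues \lambda_i are my own notation): write \mathbf A = \mathbf Q \boldsymbol\Lambda \mathbf Q^T with \mathbf Q orthogonal and \boldsymbol\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N), all \lambda_i > 0, and substitute \boldsymbol y = \mathbf Q^T(\boldsymbol x - \boldsymbol b), whose Jacobian determinant is one. Then

\int \exp\left(-\,\frac 1 2\, \boldsymbol y^T \boldsymbol\Lambda\, \boldsymbol y\right) d\boldsymbol y = \prod_{i=1}^N \int_{-\infty}^{\infty} e^{-\lambda_i y_i^2/2}\, dy_i = \prod_{i=1}^N \sqrt{\frac{2\pi}{\lambda_i}} = \frac{(2\pi)^{N/2}}{|\mathbf A|^{1/2}}

where the last step uses |\mathbf A| = \prod_i \lambda_i.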
The scale factor \alpha = |\mathbf A|^{1/2}/(2\pi)^{N/2} falls out from this. Scaling to form a PDF,
f(\boldsymbol x) = \frac{|\mathbf A|^{1/2}}{(2\pi)^{N/2}}\, \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol b)^T\,\mathbf A\,(\boldsymbol x - \boldsymbol b)\right)
What are its mean and covariance? You can grind through the math if you want; the mean is simply the vector \boldsymbol b, and the covariance is simply the inverse of the matrix \mathbf A. Denoting the mean as \boldsymbol\mu and the covariance matrix as \mathbf{\Sigma} = \mathbf A^{-1} leads to the more standard form
f(\boldsymbol x) = \frac 1 {(2\pi)^{N/2}|\mathbf{\Sigma}|^{1/2}}\, \exp\left(-\,\frac 1 2\, (\boldsymbol x - \boldsymbol \mu)^T\,\mathbf{\Sigma}^{-1}\,(\boldsymbol x - \boldsymbol \mu)\right)
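If you would rather check than grind, here is a quick numerical sanity check of all three claims (normalization, mean, covariance) by brute-force grid integration; the particular \boldsymbol b and \mathbf A values are mine, chosen only for illustration:

```python
import numpy as np

# Hypothetical 2-D example; b and A are chosen only for illustration.
b = np.array([1.0, -2.0])
A = np.array([[2.0, 0.6],
              [0.6, 1.0]])    # symmetric, positive definite

# Grid wide enough to capture essentially all of the probability mass.
n = 400
xs = np.linspace(-8.0, 10.0, n)
ys = np.linspace(-10.0, 8.0, n)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys, indexing="ij")
d = np.stack([X - b[0], Y - b[1]], axis=-1)          # x - b at each grid point

# The quadratic form (x - b)^T A (x - b), evaluated over the whole grid.
quad = np.einsum("...i,ij,...j->...", d, A, d)
g = np.exp(-0.5 * quad)

# 1) Normalization: the integral should equal (2*pi)^(N/2) / |A|^(1/2).
integral = g.sum() * dx * dy
print(integral, (2.0 * np.pi) / np.sqrt(np.linalg.det(A)))   # both ~ 4.906

# 2) Mean of the normalized density: should equal b.
f = g / integral
mean = np.array([(X * f).sum(), (Y * f).sum()]) * dx * dy
print(mean)                                                  # ~ [ 1.0, -2.0 ]

# 3) Covariance of the normalized density: should equal A^{-1}.
cxx = ((X - mean[0]) ** 2 * f).sum() * dx * dy
cxy = ((X - mean[0]) * (Y - mean[1]) * f).sum() * dx * dy
cyy = ((Y - mean[1]) ** 2 * f).sum() * dx * dy
print(np.array([[cxx, cxy], [cxy, cyy]]))                    # ~ [[0.61, -0.37],
print(np.linalg.inv(A))                                      #    [-0.37, 1.22]]
```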