Mahalanobis Distance using Eigen-Values of the Covariance Matrix

SUMMARY

The discussion centers on the Mahalanobis distance formula, specifically its simplification via eigenvalue decomposition (EVD) of the covariance matrix. The simplified expression fails to reproduce the correct Mahalanobis distance when the covariance matrix has zero eigenvalues, i.e. when it is singular rather than positive definite. Users are advised to consider principal component analysis (PCA), or to remove the problematic variables, when the covariance matrix is singular or ill-conditioned.

PREREQUISITES
  • Understanding of Mahalanobis Distance and its formula
  • Familiarity with eigenvalue decomposition (EVD)
  • Knowledge of covariance matrices and their properties
  • Experience with Principal Component Analysis (PCA)
NEXT STEPS
  • Explore the implications of singular covariance matrices in statistical analysis
  • Learn about Principal Component Analysis (PCA) for dimensionality reduction
  • Investigate methods for handling zero eigenvalues in covariance matrices
  • Study the gradient calculation of Mahalanobis Distance with respect to projections
USEFUL FOR

Statisticians, data scientists, and machine learning practitioners dealing with multivariate data analysis and covariance matrix issues.

orajput
Given the formula of Mahalanobis Distance:

$$D^2_M = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

If I simplify the above expression using the eigenvalue decomposition (EVD) of the covariance matrix:

$$\mathbf{S} = \mathbf{P} \Lambda \mathbf{P}^T$$

Then,

$$D^2_M = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{P} \Lambda^{-1} \mathbf{P}^T (\mathbf{x} - \boldsymbol{\mu})$$

Let $\mathbf{b}$ be the vector of projections of $(\mathbf{x}-\boldsymbol{\mu})$ onto the eigenvectors in $\mathbf{P}$:

$$\mathbf{b} = \mathbf{P}^T(\mathbf{x} - \boldsymbol{\mu})$$

And,

$$D^2_M = \mathbf{b}^T \Lambda^{-1} \mathbf{b}$$

$$D^2_M = \sum_i \frac{b^2_i}{\lambda_i}$$
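When $\mathbf{S}$ is full rank, the two forms agree numerically. A minimal NumPy sketch (the dataset is synthetic and the variable names are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))    # 100 observations, 3 variables
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)          # 3x3 covariance matrix (full rank here)

x = X[0]
d = x - mu

# Direct form: (x - mu)^T S^{-1} (x - mu)
d2_direct = d @ np.linalg.inv(S) @ d

# EVD form: S = P diag(lam) P^T,  b = P^T (x - mu),  D^2 = sum_i b_i^2 / lam_i
lam, P = np.linalg.eigh(S)
b = P.T @ d
d2_evd = np.sum(b**2 / lam)

print(np.isclose(d2_direct, d2_evd))   # agrees when S is full rank
```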

The problem that I am facing right now is as follows:

The covariance matrix $\mathbf{S}$ is calculated on a dataset in which the number of observations is less than the number of variables. This produces some zero eigenvalues in the EVD of $\mathbf{S}$.
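This rank deficiency is easy to reproduce; a short sketch, assuming a NumPy workflow with more variables than observations (the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 10))     # 5 observations, 10 variables
S = np.cov(X, rowvar=False)          # 10x10, but rank at most n_obs - 1 = 4

lam = np.linalg.eigvalsh(S)
rank = np.linalg.matrix_rank(S)
n_zero = int(np.sum(lam < 1e-10))    # eigenvalues that are numerically zero

print(rank)      # at most 4, because the mean has been subtracted
print(n_zero)    # at least 10 - 4 = 6 (near-)zero eigenvalues
```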

In these cases the above simplified expression does not give the same Mahalanobis distance as the original expression, i.e.:

$$(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \neq \sum_i \frac{b^2_i}{\lambda_i} \quad \text{(summing over the non-zero } \lambda_i\text{)}$$

My question is: does the simplified expression still functionally represent the Mahalanobis distance?

P.S.: My motivation for using the simplified expression of the Mahalanobis distance is to calculate its gradient with respect to $\mathbf{b}$.
 
Hello,

For $\mathbf{S}$ to be invertible, it must not have any zero eigenvalues; that is, it must be positive definite or negative definite (and since a covariance matrix is always positive semi-definite, in practice this means positive definite). Apart from that, the expression should work.

All the best

GoodSpirit
 
Hey orajput and welcome to the forums.

For your problem, if you have a singular or ill-conditioned covariance matrix, I would try something like principal components, or remove the offending variable from your system and redo the analysis.
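One concrete way to carry out this suggestion (a sketch under the assumption that dropping the zero-eigenvalue directions is acceptable; the helper name is made up, not from the thread) is to sum only over the non-zero eigenvalues, which is equivalent to using the Moore-Penrose pseudo-inverse of $\mathbf{S}$:

```python
import numpy as np

def mahalanobis_pinv(x, mu, S, tol=1e-10):
    """Squared Mahalanobis distance restricted to the non-zero
    eigenvalues of S; equivalent to (x - mu)^T pinv(S) (x - mu)."""
    lam, P = np.linalg.eigh(S)       # S = P diag(lam) P^T
    b = P.T @ (x - mu)               # projections onto the eigenvectors
    keep = lam > tol                 # drop (near-)zero eigenvalues
    return np.sum(b[keep]**2 / lam[keep])

# Usage: agrees with the pseudo-inverse even when S is singular.
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 10))     # fewer observations than variables
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)          # singular: rank at most 4
x = X[0]

d2 = mahalanobis_pinv(x, mu, S)
d2_check = (x - mu) @ np.linalg.pinv(S) @ (x - mu)
print(np.isclose(d2, d2_check))
```

This is exactly the poster's $\sum_i b_i^2/\lambda_i$ taken over the non-zero $\lambda_i$ only, i.e. the Mahalanobis distance within the subspace the data actually spans.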
 
