Mahalanobis Distance using Eigen-Values of the Covariance Matrix

In summary, if the covariance matrix has zero eigenvalues (i.e. it is singular), the simplified eigenvalue-based expression for the Mahalanobis distance no longer equals the original expression. To address this, one can use Principal Components or remove the problematic variables from the dataset.
  • #1
orajput
Given the formula of Mahalanobis Distance:

[itex]D^2_M = (\mathbf{x} - \mathbf{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{\mu})[/itex]

If I simplify the above expression using Eigen-value decomposition (EVD) of the Covariance Matrix:

[itex]S = \mathbf{P} \Lambda \mathbf{P}^T[/itex]

Then,

[itex]D^2_M = (\mathbf{x} - \mathbf{\mu})^T \mathbf{P} \Lambda^{-1} \mathbf{P}^T (\mathbf{x} - \mathbf{\mu})[/itex]

Let the projections of [itex](\mathbf{x}-\mu)[/itex] onto the eigenvectors in [itex]\mathbf{P}[/itex] be [itex]\mathbf{b}[/itex]; then:

[itex]\mathbf{b} = \mathbf{P}^T(\mathbf{x} - \mathbf{\mu})[/itex]

And,

[itex]D^2_M = \mathbf{b}^T \Lambda^{-1} \mathbf{b}[/itex]

[itex]D^2_M = \sum_i{\frac{b^2_i}{\lambda_i}}[/itex]
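The equivalence of the two forms is easy to check numerically when [itex]\mathbf{S}[/itex] is full rank. A minimal NumPy sketch (with illustrative random data, not the poster's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: more observations (100) than variables (3), so S is full rank.
X = rng.normal(size=(100, 3))
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

x = rng.normal(size=3)
d = x - mu

# Direct form: (x - mu)^T S^{-1} (x - mu)
d2_direct = d @ np.linalg.inv(S) @ d

# EVD form: b = P^T (x - mu), then D^2 = sum_i b_i^2 / lambda_i
lam, P = np.linalg.eigh(S)
b = P.T @ d
d2_evd = np.sum(b**2 / lam)

print(np.isclose(d2_direct, d2_evd))  # True
```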

The problem that I am facing right now is as follows:

The covariance matrix [itex]\mathbf{S}[/itex] is calculated on a dataset in which the number of observations is less than the number of variables. This makes [itex]\mathbf{S}[/itex] singular, so some eigenvalues from the EVD of [itex]\mathbf{S}[/itex] are zero.

In these cases the above simplified expression does not result in the same Mahalanobis Distance as the original expression, i.e.:

[itex](\mathbf{x} - \mathbf{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{\mu}) \neq \sum_i{\frac{b^2_i}{\lambda_i}}[/itex] (for non-zero [itex]\lambda_i[/itex])
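Note that when [itex]\mathbf{S}[/itex] is singular, [itex]\mathbf{S}^{-1}[/itex] does not exist, so the left-hand side can only be interpreted via a pseudo-inverse. A quick NumPy check (a sketch with illustrative random data) shows that the truncated sum over non-zero eigenvalues equals the quadratic form with the Moore-Penrose pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fewer observations (4) than variables (6): S is rank-deficient.
X = rng.normal(size=(4, 6))
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

x = rng.normal(size=6)
d = x - mu

lam, P = np.linalg.eigh(S)
nonzero = lam > 1e-10  # treat tiny eigenvalues as exact zeros

# Truncated sum over the non-zero eigenvalues only...
b = P.T @ d
d2_trunc = np.sum(b[nonzero]**2 / lam[nonzero])

# ...equals the quadratic form with the pseudo-inverse of S
# (S^{-1} itself does not exist here).
d2_pinv = d @ np.linalg.pinv(S, rcond=1e-10) @ d

print(np.isclose(d2_trunc, d2_pinv))  # True
```

So the truncated sum is the pseudo-inverse Mahalanobis distance, restricted to the subspace the data actually spans.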

My question is: Does the simplified expression still functionally represent the Mahalanobis distance?

P.S.: The motivation for using the simplified expression of the Mahalanobis distance is to calculate its gradient with respect to [itex]\mathbf{b}[/itex].
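For completeness, that gradient follows directly from the sum form (restricting the sum to non-zero [itex]\lambda_i[/itex]): since [itex]D^2_M = \sum_i \frac{b^2_i}{\lambda_i}[/itex], each partial derivative is

[itex]\frac{\partial D^2_M}{\partial b_i} = \frac{2 b_i}{\lambda_i}[/itex]

or, in vector form, [itex]\nabla_{\mathbf{b}} D^2_M = 2 \Lambda^{-1} \mathbf{b}[/itex].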
 
  • #2
Hello,

For [itex]\mathbf{S}[/itex] to be invertible, it must have no zero eigenvalues; since a covariance matrix is always positive semi-definite, this means it must be positive definite. Apart from that, the expression should work.

All the best

GoodSpirit
 
  • #3
Hey orajput and welcome to the forums.

For your problem, if you have a singular or ill-conditioned covariance matrix, I would try something like Principal Components, or remove the offending variable from your system and redo the analysis.
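The Principal Components suggestion above can be sketched as follows: keep only the eigenvectors with non-negligible eigenvalues and compute the distance entirely in that reduced subspace (a NumPy sketch with illustrative random data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Rank-deficient case: 5 observations, 8 variables.
X = rng.normal(size=(5, 8))
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

# Keep only principal components with non-negligible eigenvalues.
lam, P = np.linalg.eigh(S)
keep = lam > 1e-10
P_r, lam_r = P[:, keep], lam[keep]

def mahalanobis_pca(x):
    """Mahalanobis distance computed in the retained PC subspace."""
    b = P_r.T @ (x - mu)
    return np.sqrt(np.sum(b**2 / lam_r))

x = rng.normal(size=8)
print(mahalanobis_pca(x))   # distance of x from the sample mean
print(mahalanobis_pca(mu))  # 0.0 at the mean itself
```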
 

1. What is Mahalanobis Distance and how is it calculated?

Mahalanobis Distance is a measure of the distance between a point and a distribution (or between two points) in a multivariate space. It takes into account the correlations between variables and the variability of each variable. It is calculated as the square root of the quadratic form [itex](\mathbf{x} - \mathbf{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{\mu})[/itex], where [itex]\mathbf{S}[/itex] is the covariance matrix.
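A small worked example of the calculation, with an illustrative hand-picked covariance matrix:

```python
import numpy as np

# Two correlated variables with a known (illustrative) covariance.
mu = np.array([0.0, 0.0])
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

x = np.array([1.0, 1.0])
d = x - mu

# D = sqrt((x - mu)^T S^{-1} (x - mu))
D = np.sqrt(d @ np.linalg.inv(S) @ d)
print(D)  # sqrt(2/3) ≈ 0.8165
```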

2. How does Mahalanobis Distance differ from other distance measures?

Mahalanobis Distance differs from other distance measures because it takes into account the correlation between variables, while other measures such as Euclidean Distance do not. This makes Mahalanobis Distance more suitable for datasets with correlated variables.
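The relationship to Euclidean distance can be seen directly: with an identity covariance matrix the two coincide, and with correlated variables they generally differ (a sketch with illustrative values):

```python
import numpy as np

mu = np.array([0.0, 0.0])
x = np.array([3.0, 4.0])
d = x - mu

# Identity covariance: Mahalanobis reduces to Euclidean distance.
S_id = np.eye(2)
D_m = np.sqrt(d @ np.linalg.inv(S_id) @ d)
D_e = np.linalg.norm(d)
print(D_m, D_e)  # 5.0 5.0

# With correlation present, the two distances differ.
S_corr = np.array([[1.0, 0.9],
                   [0.9, 1.0]])
D_corr = np.sqrt(d @ np.linalg.inv(S_corr) @ d)
print(np.isclose(D_corr, D_e))  # False
```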

3. What are the applications of Mahalanobis Distance?

Mahalanobis Distance has various applications in fields such as statistics, machine learning, and pattern recognition. It is commonly used for outlier detection, clustering analysis, and classification problems.

4. What is the role of eigenvalues in calculating Mahalanobis Distance?

The eigenvalues of the covariance matrix represent the variance along each principal direction of the data. In the Mahalanobis Distance, each component of the deviation along an eigenvector is divided by the corresponding eigenvalue, so directions of high variance contribute less to the distance than directions of low variance. This weighting gives a scale-aware measure of distance in datasets with varying levels of variability.

5. How is Mahalanobis Distance used in real-world scenarios?

Mahalanobis Distance is used in a variety of real-world scenarios, such as in finance for fraud detection, in healthcare for disease diagnosis, and in marketing for customer segmentation. It is also used in image and speech recognition, as well as in anomaly detection in network traffic data.
