Understanding Mahalanobis distance

In summary, the Mahalanobis distance is a measure used in statistics for the distance between a point and a distribution, or between two points, in a multidimensional space. It is particularly useful for variables measured on different scales, because it takes their variances and correlations into account. Its main advantage is that it is equivalent to the Euclidean distance after the data have been rescaled and decorrelated. Its level sets are ellipsoids, and it is commonly used in pattern recognition. Resources for understanding the Mahalanobis distance range from intuitive explanations to mathematical derivations.
  • #1
Avatrin
I am currently taking a course in pattern recognition, and several times I have encountered the multivariate normal distribution and thus the Mahalanobis distance. I want to understand the Mahalanobis distance, primarily to understand the normal distribution, but also to understand the measure itself.

I have read several intuitive explanations here and in books, but how can I do this rigorously? I have had some, but not much, measure theory (in the part of a real analysis course that covered integration theory). I have had one introductory statistics and probability theory course.

What should I read to understand the Mahalanobis distance? Clearly it is related to ellipsoids, but how? Again, I don't want a rough intuitive explanation.
 
  • #3
MarneMath said:
I'm not really sure what else you're looking for with regards to this. It's simply a metric.
Well, I did write that I want to understand it in relation to the N-dimensional normal distribution. According to Wikipedia, it tells me how many standard deviations a point x is from the mean of the distribution. The books on pattern recognition don't even really mention it.

So, I guess what I am looking for is the properties of the metric. Why is it used in statistics rather than some other metric?
 
  • #4
Imagine you have variables measured on different scales. Then the Euclidean distance doesn't really make sense, since you're simply adding up squared quantities in different units. If everything is on the same scale, we're fine. However, if you have variables of different scales and types, the idea of distance becomes a bit more complicated. In fact, I don't really like to think of the Mahalanobis distance as a distance but rather as a measure of intensity.

The top answer here does a good job of explaining how the Mahalanobis distance transforms the data into something reasonable: http://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance

Overall, though, the main advantages are that it accounts for variances and covariances: in effect it rescales the variables to unit variance and removes their correlation, so that the ordinary Euclidean distance applies.
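
To make the "transforming the data" point concrete, here is a minimal NumPy sketch (not taken from the linked answer). It checks numerically that the Mahalanobis distance equals the Euclidean norm of the point after whitening with a Cholesky factor of the covariance. The mean, covariance, and point are made-up illustrative values, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from mean under covariance cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def whiten(x, mean, cov):
    """Map x to coordinates with zero mean and identity covariance."""
    L = np.linalg.cholesky(cov)  # cov = L @ L.T, with L lower triangular
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return np.linalg.solve(L, diff)

# Hypothetical example: two correlated variables on very different scales.
mean = np.array([170.0, 70000.0])
cov = np.array([[100.0, 30000.0],
                [30000.0, 2.5e8]])
x = np.array([185.0, 60000.0])

d_mahal = mahalanobis(x, mean, cov)
d_white = float(np.linalg.norm(whiten(x, mean, cov)))
print(d_mahal, d_white)  # the two values agree up to rounding error
```

The agreement follows from cov = L Lᵀ: the quadratic form diffᵀ cov⁻¹ diff is exactly the squared length of L⁻¹ diff.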
 

1. What is Mahalanobis distance?

Mahalanobis distance is a statistical measure of the distance between a point and a distribution. It accounts for the variances and correlations of the variables, so it is often more appropriate than Euclidean distance when the variables are correlated or measured on different scales.
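
For readers who want the definition written out, the following is a short sketch of the standard formulas. The notation is an assumption introduced here, not taken from the thread: μ is the mean, Σ the (invertible) covariance matrix, n the dimension, and Σ = UΛUᵀ an eigendecomposition with eigenvalues λ_i.

```latex
\begin{align*}
d_M(x)^2 &= (x-\mu)^{\mathsf T}\,\Sigma^{-1}\,(x-\mu), \\
p(x) &= (2\pi)^{-n/2}\,\lvert\Sigma\rvert^{-1/2}
        \exp\!\left(-\tfrac12\, d_M(x)^2\right), \\
d_M(x)^2 &= \sum_{i=1}^{n}\frac{y_i^2}{\lambda_i}
        \quad\text{where } \Sigma = U\Lambda U^{\mathsf T},\ 
        y = U^{\mathsf T}(x-\mu).
\end{align*}
```

The second line is the multivariate normal density, which depends on x only through d_M(x). The third line shows that the set of points with d_M(x) = c is an ellipsoid centred at μ, with axes along the eigenvectors of Σ and semi-axis lengths c√λ_i. This is the sense in which the distance is related to ellipsoids: the contours of constant normal density are exactly the contours of constant Mahalanobis distance.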

2. How is Mahalanobis distance used in data analysis?

Mahalanobis distance is used to identify outliers, or data points that are significantly different from the rest of the dataset. It can also be used for clustering and classification in machine learning algorithms.
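
As an illustration of the outlier use case, here is a minimal sketch on synthetic two-dimensional data. It assumes the data are roughly Gaussian, so that the squared distance approximately follows a chi-square distribution with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - p). The data, the planted point, and the 0.99 level are all illustrative choices. The mean and covariance are estimated from the sample itself, so strong outliers can distort them.

```python
import numpy as np

def squared_mahalanobis(X, mean, cov):
    """Squared Mahalanobis distance of every row of X from mean."""
    diffs = X - mean
    # Row-wise quadratic form diffs[i] @ inv(cov) @ diffs[i].
    return np.einsum("ij,ij->i", diffs, np.linalg.solve(cov, diffs.T).T)

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], true_cov, size=500)
# Planted point: each coordinate is only about two standard deviations
# from zero, but the pair goes against the positive correlation.
X = np.vstack([X, [[2.0, -2.0]]])

mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
d2 = squared_mahalanobis(X, mean, cov)

p = 0.99
threshold = -2.0 * np.log(1.0 - p)  # chi-square quantile, 2 degrees of freedom
flagged = np.flatnonzero(d2 > threshold)
print(threshold, flagged)
```

The planted point at index 500 is flagged because it lies across the direction in which the data vary together, even though neither coordinate is extreme on its own.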

3. What are the advantages of using Mahalanobis distance?

One major advantage of Mahalanobis distance is that it handles correlated variables, which are common in data analysis. It also accounts for the variability of each variable, which makes it scale-invariant: changing the units of a variable does not change the distance.

4. How do you calculate Mahalanobis distance?

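Mahalanobis distance is calculated by taking the difference between a point x and the mean μ of the distribution, forming the quadratic form of that difference with the inverse of the covariance matrix Σ, and then taking the square root: d(x) = sqrt((x - μ)^T Σ^(-1) (x - μ)). A matrix cannot be divided by directly; the inverse covariance matrix plays that role.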
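
A small worked example may help. The usual formula is d(x) = sqrt((x - μ)^T Σ^(-1) (x - μ)), and the sketch below evaluates it with NumPy for a made-up 2×2 covariance matrix. The function name and the numbers are illustrative assumptions.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)) for an invertible cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # solve(cov, diff) computes cov^{-1} @ diff without an explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])  # positively correlated variables

along = mahalanobis_distance([1.0, 1.0], mean, cov)     # sqrt(2/3), about 0.816
against = mahalanobis_distance([1.0, -1.0], mean, cov)  # sqrt(2), about 1.414
print(along, against)
```

Both points are at Euclidean distance sqrt(2) from the mean, but the one lying along the direction of positive correlation is closer in the Mahalanobis sense.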

5. Can Mahalanobis distance be used with any type of data?

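Not directly. Mahalanobis distance requires numeric variables for which a mean and an invertible covariance matrix are meaningful, so it is best suited to continuous data. Categorical or mixed data must first be encoded numerically or handled with a measure designed for them, such as the Gower distance. The covariance matrix must also be estimated reliably, which becomes difficult when there are few observations relative to the number of variables.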
