Intuitive explanations for the Gaussian distribution function and Mahalanobis distance

In summary, the multivariate Gaussian density is the univariate Gaussian formula extended to several variables, and the Mahalanobis distance measures how far a point lies from the center of a distribution, in units of that distribution's spread.
  • #1
jjepsuomi
Hello

I was wondering if anyone could give intuitive explanations for the multivariate Gaussian distribution function and the Mahalanobis distance? My professor didn't explain these in probability class; they were only defined...

Where did the formula come from? Why is the Gaussian function the way it is? Is there a way to intuitively explain the Mahalanobis distance?

Thank you for any support
 
  • #2


jjepsuomi said:
Where did the formula come from? Why is the Gaussian function the way it is? Is there a way to intuitively explain the Mahalanobis distance?

Do you understand the 1-dimensional case? Do you know about the use of "Z-scores" in statistics?

I myself don't have a good understanding of why the 1-dimensional Gaussian distribution has its particular formula. To get that intuition, one would need to study why the Gaussian shape is a limiting shape for binomial distributions. You could get intuition in the way that a person who has intuition for algebra and calculus gets intuition. As for man-in-the-street intuition, have you seen those demonstrations where balls are dropped into a pyramid of dividers and land in a bell-shaped pattern?
 
  • #3


Answers to both of your questions are: NO
I know what Z-score is "= How many standard deviations from the mean" but that's all I know about it.

And no I have not seen the pyramid divider, I'll try to look for it. Maybe you know where to look? Thank you! =)
 
  • #4


The video of a conceptual normal distribution machine is clearer than the videos of the physical machines that I found.
 
  • #5


jjepsuomi said:
I know what Z-score is "= How many standard deviations from the mean" but that's all I know about it.

People measuring things that have normal distributions would work with a variety of units. They might be measuring things in dollars, Ohms, grams etc. It would be inconvenient to use different formulae for each different unit of measure. The use of the Z-score is a way to convert all such measurements to a standard reference whose units are dimensionless.

Think about what you must do to convert two histograms to the "same scale". If one is measuring dollars on the x-axis and the other is measuring Ohms, there is no law of economics or physics that establishes a definite relation between dollars and Ohms, so you can't convert the measurements to a common unit. Even if both distributions were measuring dollars on the x-axis, there is no law that tells you that $1000 and $5000 are measurements "a long way" apart. If you're talking about car prices, they might be. If you're talking about the national GNP, they aren't.

Next, think about how you would convert two 2-dimensional histograms to the same scale. Suppose one histogram has measurement of ordered pairs (weight of person, blood glucose level of person) and the other has measurements of (mileage on car, price of car).
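
To make the "same scale" idea concrete, here is a minimal sketch of the Z-score conversion in one dimension. The data are made up for illustration (hypothetical car prices and resistances), and the function name `z_scores` is my own choice, not anything from the thread.

```python
import statistics

def z_scores(values):
    """Convert raw measurements to dimensionless Z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical data: car prices in dollars and resistances in Ohms.
car_prices = [18000.0, 21000.0, 24000.0, 27000.0, 30000.0]
resistances = [98.0, 99.0, 100.0, 101.0, 102.0]

price_z = z_scores(car_prices)
ohm_z = z_scores(resistances)

# Both lists come out the same, even though the raw units differ.
print([round(z, 3) for z in price_z])
print([round(z, 3) for z in ohm_z])
```

After the conversion the two data sets are indistinguishable, because each value is now expressed as "how many standard deviations from its own mean". The 2-dimensional question above is harder because the two coordinates can also be correlated, which a separate Z-score per axis does not capture.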
 
  • #6


The roots of the Gaussian distribution lie in the method of least squares, which was developed around the start of the 19th century for astronomy and navigation. The Gaussian distribution arose as the error law that justifies the method of least squares.
 
  • #7


A Gaussian distribution arises when you have the sum of a large number of independent factors, all of which are small. It is the limit as the number of factors becomes infinitely large and the size of each factor becomes infinitely small. In actual practice many situations converge to this limit quite quickly.

The basic Gaussian is one dimensional, but it is possible to have any number of dimensions. You can think of a large number of tiny insects flying around at random in an empty space.
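
The "sum of many small independent factors" picture can be checked with a quick simulation. This is only a sketch under assumptions of my own: each factor is a uniform nudge, there are 200 factors per sample, and the seed is fixed so the run is repeatable.

```python
import random
import statistics

def sum_of_small_factors(n_factors, rng):
    """Add up n_factors independent uniform nudges, each scaled down by sqrt(n_factors)."""
    scale = 1.0 / n_factors ** 0.5
    return sum(rng.uniform(-1.0, 1.0) * scale for _ in range(n_factors))

rng = random.Random(0)  # fixed seed, an arbitrary choice
samples = [sum_of_small_factors(200, rng) for _ in range(5000)]

mean = statistics.fmean(samples)
sd = statistics.pstdev(samples)

# Fraction of samples within one and two standard deviations of the mean.
within_one_sd = sum(abs(s - mean) <= sd for s in samples) / len(samples)
within_two_sd = sum(abs(s - mean) <= 2 * sd for s in samples) / len(samples)

print(f"mean={mean:.3f} sd={sd:.3f}")
print(f"within 1 sd: {within_one_sd:.3f}, within 2 sd: {within_two_sd:.3f}")
```

None of the individual factors is Gaussian, yet the fractions should land close to the roughly 68% and 95% that a Gaussian gives.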
 
  • #8


The probability density function for the multivariate case of the Gaussian distribution is basically the formula for the univariate Gaussian distribution expanded to multiple variables.

You can see that [itex]f_{\mathbf x}(x_1,\ldots,x_k)\, =
\frac{1}{(2\pi)^{k/2}|\boldsymbol\Sigma|^{1/2}}
\exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})
\right)[/itex] becomes [itex]f(x) = \frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2\right)[/itex] if k = 1, so that Σ is a real number (a 1 × 1 matrix) equal to the variance σ².
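
A small numerical check of how the two formulas fit together, written out for k = 2 so that no matrix library is needed. The parametrisation by `sigma_x`, `sigma_y` and a correlation `rho` is my own choice for the sketch; it corresponds to the covariance matrix Σ = [[σx², ρσxσy], [ρσxσy, σy²]].

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bivariate_gaussian_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Multivariate Gaussian density for k = 2, with correlation rho between the variables."""
    det = (sigma_x * sigma_y) ** 2 * (1.0 - rho ** 2)  # |Sigma|
    dx, dy = x - mu_x, y - mu_y
    # (x - mu)^T Sigma^{-1} (x - mu), expanded for the 2 x 2 case
    quad = (dx ** 2 / sigma_x ** 2
            - 2.0 * rho * dx * dy / (sigma_x * sigma_y)
            + dy ** 2 / sigma_y ** 2) / (1.0 - rho ** 2)
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# With no correlation, the joint density is just the product of two univariate densities.
joint = bivariate_gaussian_pdf(1.0, 2.0, 0.0, 1.0, 2.0, 0.5, 0.0)
product = gaussian_pdf(1.0, 0.0, 2.0) * gaussian_pdf(2.0, 1.0, 0.5)
print(joint, product)

# With positive correlation, a point where both variables are high is more likely
# than a point the same distance away where one is high and the other is low.
along = bivariate_gaussian_pdf(1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.8)
across = bivariate_gaussian_pdf(1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 0.8)
print(along > across)
```

So when the variables are independent, the multivariate formula really is several copies of the univariate one multiplied together; the off-diagonal entries of Σ are what add the tilt.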

Mahalanobis distance is Mahalanobis distance. Tautology is always intuitive. :tongue:
 
  • #9


Imagine a swarm of insects. Choose one insect. How can you tell whether or not it is in the swarm? The first thing you do is find the distance of that insect from the center of the swarm. Then you consider the standard deviation of the swarm, which shows how big the swarm is. If the swarm is big then the insect does not need to be all that close to the center. That's the Mahalanobis distance: the distance from the center, measured in standard deviations of the swarm. So if the insect is more than three standard deviations from the center then the insect is very likely not in the swarm.
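
Here is a rough sketch of the swarm picture in two dimensions. The swarm centers and covariance matrices are invented for illustration, and `mahalanobis_2d` is a hypothetical helper that inverts the 2 × 2 covariance by hand.

```python
import math

def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of a 2-D point from a swarm with the given center and 2 x 2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - center[0], point[1] - center[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)

center = (0.0, 0.0)

# Elongated swarm: standard deviation 3 along x, 1 along y.
elongated = ((9.0, 0.0), (0.0, 1.0))
d_long_axis = mahalanobis_2d((6.0, 0.0), center, elongated)   # 2.0
d_short_axis = mahalanobis_2d((0.0, 6.0), center, elongated)  # 6.0
print(d_long_axis, d_short_axis)

# Tilted swarm: the two coordinates are positively correlated.
tilted = ((4.0, 3.0), (3.0, 4.0))
d_along = mahalanobis_2d((2.0, 2.0), center, tilted)
d_across = mahalanobis_2d((2.0, -2.0), center, tilted)
print(d_along < d_across)
```

Both insects in the elongated swarm are 6 units from the center in the ordinary sense, but one is only 2 standard deviations out (plausibly in the swarm) while the other is 6 standard deviations out (very likely not). That direction dependence is what the covariance matrix adds over a single standard deviation.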
 

1. What is a Gaussian distribution function?

A Gaussian distribution function is the probability density of the normal distribution, a commonly observed probability distribution whose density has the shape of a bell curve. It is often used to model natural phenomena, because it arises when a quantity is the sum of a large number of small, independent random effects.

2. How is the Gaussian distribution function related to the Mahalanobis distance?

The Mahalanobis distance is a measure of the distance between a data point and the center of a multivariate dataset, expressed in units of the dataset's spread. It takes into account the variances of the variables and the correlations between them, which makes it more appropriate than Euclidean distance for correlated data. The two are related through the exponent of the multivariate Gaussian density: the density at a point depends on that point only through its Mahalanobis distance from the mean, so points at equal Mahalanobis distance are equally probable.
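
Written out with the same symbols as the multivariate density quoted in the thread, the relationship looks like this (a restatement of that formula, with the squared Mahalanobis distance given the name d_M² for convenience):

```latex
d_M(\mathbf x) = \sqrt{(\mathbf x-\boldsymbol\mu)^T \boldsymbol\Sigma^{-1} (\mathbf x-\boldsymbol\mu)},
\qquad
f_{\mathbf x}(\mathbf x) = \frac{1}{(2\pi)^{k/2}|\boldsymbol\Sigma|^{1/2}}
\exp\left(-\frac{1}{2}\, d_M(\mathbf x)^2\right)
```

For k = 1 this reduces to d_M = |x - μ| / σ, which is the absolute value of the Z-score.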

3. What does the shape of the Gaussian distribution function tell us about the data?

The shape of the Gaussian distribution function tells us the mean and standard deviation of the data. The peak of the curve sits at the mean, while the width of the curve is set by the standard deviation. A wider curve indicates a larger standard deviation, meaning that the data is more spread out, while a narrower curve indicates a smaller standard deviation and a more concentrated dataset.
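
How much of the data falls within a given number of standard deviations of the mean can be computed directly from the error function. The helper name `prob_within` is my own; the formula P(|X - μ| ≤ kσ) = erf(k / √2) is the standard one for a Gaussian.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for a Gaussian X; the value does not depend on mu or sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

This is where the "three standard deviations" rule of thumb comes from: only about 0.27% of a Gaussian population lies farther than that from the mean.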

4. How is the Gaussian distribution function used in statistics?

The Gaussian distribution function is used in statistics to model and analyze data. It is the foundation of many statistical methods, such as hypothesis testing, confidence intervals, and linear regression. It is also used in machine learning and data science to represent and analyze data, as well as to make predictions.

5. What are some real-world applications of the Gaussian distribution function and Mahalanobis distance?

The Gaussian distribution function and Mahalanobis distance have many applications in various fields, including finance, engineering, and biology. In finance, they are used to model stock prices and portfolio risk. In engineering, they are used to analyze and improve product quality. In biology, they are used to study population growth and genetic variation. They are also commonly used in data analysis and machine learning to identify anomalies and make accurate predictions.
