Normalizing a PDF: f(x) = e^{ - x^2 }

  • Thread starter: Watts
SUMMARY

The discussion centers on normalizing the Gaussian function f(x) = e^{-x^2} to obtain a probability density function (PDF). Two methods are presented: the conventional method, which multiplies the function by the reciprocal of its integral, \(\sqrt{\pi}\), and an alternative method that places the integral result inside the function beside the variable, giving P(x) = e^{-\pi x^2}. Both results integrate to 1, but they describe different distributions. One participant suggests choosing between them by empirical fit to the data. Another points out that only multiplication by a constant preserves the relative probabilities implied by the unnormalized density, so the second method is generally inappropriate unless it is treated as a change of variable.

PREREQUISITES
  • Understanding of Gaussian distributions and their properties
  • Familiarity with probability density functions (PDFs)
  • Knowledge of integration techniques in calculus
  • Basic statistics concepts, including mean and variance
NEXT STEPS
  • Research the properties of Gaussian distributions and their applications in statistics
  • Learn about the normalization of probability density functions and its implications
  • Explore empirical fitting techniques for statistical data analysis
  • Study the error function and its role in probability and statistics
USEFUL FOR

Statisticians, data scientists, and researchers involved in data analysis and modeling who seek to understand the nuances of normalizing distributions and selecting appropriate probability density functions for their datasets.

Watts
Assume I have a data set and I am trying to find a distribution that describes how the data is distributed. Assume I have found a function, say f(x) = e^{ - x^2 }, that describes the distribution of the data. Statistics tells me that my first move is to normalize this function so that \int\limits_{ - \infty }^\infty {P(x)dx} = 1. The common approach is to integrate, \int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = \sqrt \pi, and multiply the function by the reciprocal of that integral, so that \frac{1}{{\sqrt \pi }} \cdot \int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = 1. But what if I normalize it a different way: integrate the function as before, \int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = \sqrt \pi, and place the result of that integral inside the parentheses beside the variable, P(x) = e^{ - (\sqrt \pi \cdot x)^2 } = e^{ - \pi \cdot x^2 }, instead of in front of the function? If you now integrate this function, \int\limits_{ - \infty }^\infty {e^{ - \pi \cdot x^2 } dx} = 1, so it still equals one. So my question is: which PDF do I use? I have normalized the same function two different ways. Any thoughts on this?
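Both integrals quoted in the post can be checked numerically. The following is a minimal sketch using only the standard library; the helper `integrate` and the names `p_scaled` and `p_rescaled` are illustrative choices, not anything from the thread, and the range [-10, 10] is assumed to be wide enough that the neglected tails do not matter.

```python
import math

def integrate(f, a, b, n=20_000):
    """Composite trapezoid rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def p_scaled(x):
    # Conventional normalization: constant factor 1/sqrt(pi) in front.
    return math.exp(-x * x) / math.sqrt(math.pi)

def p_rescaled(x):
    # Alternative from the post: sqrt(pi) placed beside the variable.
    return math.exp(-math.pi * x * x)

# Both functions are negligibly small beyond |x| = 10.
area_scaled = integrate(p_scaled, -10.0, 10.0)
area_rescaled = integrate(p_rescaled, -10.0, 10.0)
print(area_scaled, area_rescaled)
```

Both areas come out equal to 1 to within numerical error, which confirms the arithmetic in the question. It does not show that the two functions describe the same distribution.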
 
I think you are missing the point that the normal distribution has 2 parameters which you are not using, the mean and the variance. If you set your problem up with this in mind, you can get a best fit to the data you have, and there won't be any loose ends.

The density is K e^{-(x-m)^2/(2V)}, where m is the mean, V is the variance, and K is defined to make the integral equal to 1.
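To connect this two-parameter form to the functions in the question, here is a short sketch. It assumes the standard value K = 1/\sqrt{2 \pi V}, which is what makes the integral equal 1; the function name `normal_pdf` is illustrative.

```python
import math

def normal_pdf(x, m, V):
    """Normal density with mean m and variance V."""
    K = 1.0 / math.sqrt(2.0 * math.pi * V)  # constant that makes the integral 1
    return K * math.exp(-(x - m) ** 2 / (2.0 * V))

x = 0.3  # arbitrary test point

# m = 0, V = 1/2 reproduces e^{-x^2} / sqrt(pi)
scaled_form = normal_pdf(x, 0.0, 0.5)
scaled_direct = math.exp(-x * x) / math.sqrt(math.pi)

# m = 0, V = 1/(2*pi) reproduces e^{-pi x^2}
rescaled_form = normal_pdf(x, 0.0, 1.0 / (2.0 * math.pi))
rescaled_direct = math.exp(-math.pi * x * x)

print(scaled_form, scaled_direct)
print(rescaled_form, rescaled_direct)
```

Seen this way, the two normalizations in the question are both normal densities with mean 0, one with variance 1/2 and the other with variance 1/(2\pi), so they are different distributions rather than two versions of the same one.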
 
Normal Distribution

The distribution I have chosen is not a normal distribution but simply a gaussian distribution. My point is that statistics has taught everybody to normalize a function in one single way to make it a true PDF, but this doesn't necessarily have to be the case.
 
Seems to me it's an empirical question. Which of the two CDFs describes your data better? I would use the answer to this question as my selection criterion.
 
Normalization

My question is not which distribution is appropriate but which PDF has been normalized correctly. That may not even be the right way to put it: both methods may be correct, or only one of them. I have successfully done this with several other functions that occur in statistics and physics. All of the textbooks show normalization being done one way, by integrating the function and multiplying it by the reciprocal of the integral. Maybe the conventional method is wrong and you should normalize the other way I demonstrated, or maybe both are correct. Maybe mathematics has been normalizing distributions incorrectly all along. I don't know; I am just saying maybe there is more than one way.
 
An implicit proportionality assumption could justify the rote method: by dividing the integral by a constant, the proportion of "less than x" to "less than y" is preserved for all functions.

But if there is no such assumption, and the only theoretical criterion is that the CDF tends to 1, then empirical fit seems to be an obvious tie breaker.
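One common way to make "empirical fit" concrete is to compare the log-likelihood of the data under each candidate PDF. The sketch below is only an illustration: the data are synthetic (drawn from a normal distribution with variance 1/2, an assumption made purely so the example runs), and the function names are hypothetical.

```python
import math
import random

def loglik_scaled(data):
    # Sum of log( e^{-x^2} / sqrt(pi) ) over the sample.
    return sum(-x * x - 0.5 * math.log(math.pi) for x in data)

def loglik_rescaled(data):
    # Sum of log( e^{-pi x^2} ) over the sample.
    return sum(-math.pi * x * x for x in data)

# Synthetic stand-in for a real data set (assumption for illustration only).
rng = random.Random(0)
data = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(5000)]

ll_scaled = loglik_scaled(data)
ll_rescaled = loglik_rescaled(data)
better = "scaled" if ll_scaled > ll_rescaled else "rescaled"
print(ll_scaled, ll_rescaled, better)
```

With real data, the candidate with the higher log-likelihood is the one that describes the sample better in this sense. Here the synthetic data were generated to match the first candidate, so that one wins by construction.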
 
Method

Meaning the PDF (1st or 2nd) that conforms best to the distribution of my data.
 
The distribution I have chosen is not a normal distribution but simply a gaussian distribution.

The standard definition treats the term "normal distribution" as synonymous with "Gaussian distribution".
 
It is usually the case that when one has an unnormalized density function that the true p.d.f. is known to be proportional to it.

For example, suppose I want to pick a random point in the triangle with vertices (0, 0), (0, 1), and (1, 0), and I'm interested in X, the x-coordinate of the point I chose.

I can immediately write down an unnormalized density function for the random variable X: ρ(a) = 1 - a. This is clear because the probability density at X = a is obviously proportional to the length of the vertical line segment with x-coordinate a.

The only way to normalize a density function that is known to be correct in this manner is multiplication by a constant. You'll notice that your second method of normalizing a density function does not preserve proportionality, so it is generally inappropriate to use.
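The triangle example can be checked by simulation. In this sketch the integral of 1 - a over [0, 1] is 1/2, so multiplying by the constant 2 gives the normalized density p(a) = 2(1 - a), with CDF 2a - a^2. The sampling scheme (rejection from the unit square), the seed, and the function names are choices made for illustration.

```python
import random

def sample_x(rng):
    """x-coordinate of a uniform random point in the triangle (0,0), (0,1), (1,0)."""
    while True:
        x, y = rng.random(), rng.random()
        if x + y <= 1.0:  # keep only points inside the triangle
            return x

def cdf(a):
    # CDF of the normalized density p(a) = 2 * (1 - a) on [0, 1].
    return 2.0 * a - a * a

rng = random.Random(1)
n = 100_000
xs = [sample_x(rng) for _ in range(n)]

frac_below_half = sum(x < 0.5 for x in xs) / n
exact_below_half = cdf(0.5)
print(frac_below_half, exact_below_half)
```

The simulated fraction should land close to the exact value from the constant-multiple normalization, which is the sense in which that normalization is known to be correct here.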
 
  • #10
So this is where the error function came from?

\text{erf}(x)=\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^{2}}dt
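The link to the normalized density can be made explicit: for a random variable with density e^{-t^2}/\sqrt{\pi}, the probability of landing in [-x, x] is exactly erf(x), by symmetry of the integrand. A small numerical check, using the standard library's `math.erf` and an illustrative trapezoid helper:

```python
import math

def integrate(f, a, b, n=10_000):
    """Composite trapezoid rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def p(t):
    # Conventionally normalized density e^{-t^2} / sqrt(pi).
    return math.exp(-t * t) / math.sqrt(math.pi)

x = 1.0
prob_within = integrate(p, -x, x)  # P(-x < X < x)
erf_value = math.erf(x)
print(prob_within, erf_value)
```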
 
  • #11
Hurkyl said:
The only way to normalize a density function that is known to be correct in this manner is multiplication by a constant. You'll notice that your second method of normalizing a density function does not preserve proportionality, so it is generally inappropriate to use.

Hurkyl


I see your point, but either method reduces the area under the unnormalized density function by the same proportion, so that it equals 1. If so, why doesn't it preserve proportionality? In a sense, when you multiply by the reciprocal of the area, the function's height is reduced by some factor; if I instead place the area beside the variable, the height is preserved but the width of the distribution is changed by the same proportion of area. Basically, instead of squishing it, you're stretching it by the same proportion of area.
 
  • #12
Let's do an example.

If the random variable X has f(x) = exp(-x²) as an unnormalized density function, then I can state things like the outcome X = 0 happens about 2.7 times more often than the outcome X = 1. We can see this by computing that f(0) / f(1) ~ 2.7.

If we normalize f by multiplying by a constant, then this ratio will remain 2.7.

However, if we use your normalization g(x) = exp(-π x²), then we have g(0) / g(1) ~ 23.1: we see that this normalization greatly disturbs the relative probability!
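The three ratios in this example are quick to reproduce. In the sketch below, `f` is the unnormalized function, `p` is the constant-multiple normalization, and `g` is the rescaled version; the names `p`, `ratio_f`, `ratio_p`, and `ratio_g` are illustrative.

```python
import math

def f(x):
    return math.exp(-x * x)            # unnormalized

def p(x):
    return f(x) / math.sqrt(math.pi)   # normalized by a constant factor

def g(x):
    return math.exp(-math.pi * x * x)  # normalized by rescaling x

ratio_f = f(0.0) / f(1.0)  # equals e
ratio_p = p(0.0) / p(1.0)  # the constant cancels, so this is also e
ratio_g = g(0.0) / g(1.0)  # equals e**pi

print(ratio_f, ratio_p, ratio_g)
```

The constant factor cancels in any ratio p(a) / p(b), which is why only that normalization leaves relative probabilities untouched.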


Now, squishing is a useful concept, but in the following way: you can make the change of variable X = √π Y, and then the random variable Y will have the normalized density function g(x).

Changes of variables are used extensively, but the usual goal is to have a random variable whose mean is 0 and whose variance is 1.
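The change of variable X = √π Y can also be checked by simulation. This sketch assumes X has the conventionally normalized density e^{-x^2}/\sqrt{\pi}, which is a normal distribution with mean 0 and variance 1/2, and compares the empirical probability that |Y| < 0.5 with the integral of g over [-0.5, 0.5], which works out to erf(√π / 2). The seed, sample size, and cutoff 0.5 are arbitrary choices.

```python
import math
import random

rng = random.Random(2)
n = 100_000

# X has density e^{-x^2} / sqrt(pi): normal with mean 0, standard deviation sqrt(1/2).
xs = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(n)]

# Change of variable X = sqrt(pi) * Y, i.e. Y = X / sqrt(pi).
ys = [x / math.sqrt(math.pi) for x in xs]

# Empirical P(|Y| < 0.5) versus the integral of g(y) = e^{-pi y^2} over [-0.5, 0.5].
empirical = sum(abs(y) < 0.5 for y in ys) / n
exact = math.erf(0.5 * math.sqrt(math.pi))
print(empirical, exact)
```

The two numbers agree to within sampling noise, which is consistent with g being the density of the rescaled variable Y rather than a second normalization of the density of X.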
 