Food for Thought

  • Thread starter Watts
38
0

Main Question or Discussion Point

Assume I have a data set and I am trying to find a distribution that describes how the data is distributed. Assume I have found a function, say [itex] f(x) = e^{ - x^2 } [/itex], that describes the distribution of the data. Statistics tells me that my first move is to normalize this function so that [itex]\int\limits_{ - \infty }^\infty {P(x)dx} = 1 [/itex]. The common approach is to integrate [itex]\int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = \sqrt \pi [/itex] and multiply the function by the reciprocal of that integral, giving [itex]P(x) = \frac{1}{{\sqrt \pi }} e^{ - x^2 }[/itex] with [itex]\frac{1}{{\sqrt \pi }} \cdot \int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = 1[/itex]. But what if I normalize it a different way: take the result of the same integral, [itex]\sqrt \pi [/itex], and place it inside the parentheses beside the variable, [itex]P(x) = e^{ - (\sqrt \pi \cdot x)^2 } = e^{ - \pi \cdot x^2 }[/itex], instead of in front of the function? If you now integrate this function, [itex]\int\limits_{ - \infty }^\infty {e^{ - \pi \cdot x^2 } dx} = 1[/itex], it is still equal to one. So my question is: which PDF do I use? I have normalized the same function in two different ways. Any thoughts on this?
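As a quick numerical sanity check (a minimal sketch, not part of the original post; the helper `integrate` and the names `scaled_height` and `scaled_width` are made up for illustration), the following Python confirms that both candidates integrate to 1 but have different variances, so they are two different PDFs rather than two forms of the same one:

```python
import math

def integrate(func, lo=-10.0, hi=10.0, steps=200_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))

def scaled_height(x):
    # Conventional normalization: multiply exp(-x^2) by 1/sqrt(pi).
    return math.exp(-x * x) / math.sqrt(math.pi)

def scaled_width(x):
    # Alternative from the post: put sqrt(pi) beside the variable.
    return math.exp(-math.pi * x * x)

area_height = integrate(scaled_height)
area_width = integrate(scaled_width)

# Both means are 0 by symmetry, so the variance is the integral of x^2 * p(x).
var_height = integrate(lambda x: x * x * scaled_height(x))  # theory: 1/2
var_width = integrate(lambda x: x * x * scaled_width(x))    # theory: 1/(2*pi)

print(area_height, area_width)
print(var_height, var_width)
```

The integration range [-10, 10] is an assumption that is safe here because both functions are negligible outside it.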
 

Answers and Replies

mathman
Science Advisor
7,716
398
I think you are missing the point that the normal distribution has 2 parameters which you are not using, the mean and the variance. If you set your problem up with this in mind, you can get a best fit to the data you have, and there won't be any loose ends.

The density is [itex]K e^{ - (x - m)^2 /(2V)}[/itex], where K is defined to make the integral = 1.
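A minimal sketch of that fitting step, assuming synthetic data (the sample below is generated for illustration, not real measurements) and using the sample mean and variance as estimates of m and V. The names `m_hat`, `V_hat`, and `density` are illustrative only:

```python
import math
import random
import statistics

random.seed(0)
# Synthetic stand-in for a data set: true mean 2.0, true variance 1.5**2 = 2.25.
data = [random.gauss(2.0, 1.5) for _ in range(10_000)]

m_hat = statistics.mean(data)                # estimate of the mean m
V_hat = statistics.pvariance(data, m_hat)    # estimate of the variance V
K = 1.0 / math.sqrt(2.0 * math.pi * V_hat)   # constant that makes the integral 1

def density(x):
    """Fitted density K * exp(-(x - m)^2 / (2 V))."""
    return K * math.exp(-((x - m_hat) ** 2) / (2.0 * V_hat))

print(m_hat, V_hat, K)
```

With m and V fitted to the data, K is fixed by V, so there is no remaining freedom in how to normalize.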
 
38
0
Normal Distribution

The distribution I have chosen is not a normal distribution but simply a gaussian distribution. My point is that statistics has taught everybody to normalize a function in one single way to make it a true PDF, but this doesn't necessarily have to be the case.
 
EnumaElish
Science Advisor
Homework Helper
2,285
123
Seems to me it's an empirical question. Which of the two CDFs describes your data better? I would use the answer to this question as my selection criterion.
 
38
0
Normalization

The question is not which distribution is appropriate but which PDF has been normalized correctly. That may not even be the right question: both methods may be correct, or only one or the other. I have successfully done this with other functions, several of which occur in statistics and physics. All of the texts show this being done one way: integrate the function and multiply it by the reciprocal of the integral to normalize. Maybe the conventional method is wrong and you should normalize it the other way I demonstrated, or maybe they are both correct. Maybe mathematics has been normalizing distributions incorrectly all along. I don't know; I am just saying maybe there is more than one way.
 
EnumaElish
Science Advisor
Homework Helper
2,285
123
An implicit proportionality assumption could justify the rote method: dividing the density by a constant preserves the ratio of the probability of "less than x" to the probability of "less than y", whatever the function is.

But if there is no such assumption, and the only theoretical criterion is to get the CDF to tend to 1, then empirical fit seems to be an obvious tie breaker.
 
38
0
Method

Meaning I should use whichever PDF (1st or 2nd) conforms best to the distribution of my data?
 
mathman
Science Advisor
7,716
398
The distribution I have chosen is not a normal distribution but simply a gaussian distribution.
The standard definition treats the term "normal distribution" as synonymous with "Gaussian distribution".
 
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,843
17
When one has an unnormalized density function, it is usually the case that the true p.d.f. is known to be proportional to it.

For example, suppose I want to pick a random point in the triangle with vertices (0, 0), (0, 1), and (1, 0), and I'm interested in X, the x-coordinate of the point I chose.

I can immediately write down an unnormalized density function for the random variable X: ρ(a) = 1 - a for 0 ≤ a ≤ 1. This is clear because the probability density at X = a is proportional to the length of the vertical line segment with x-coordinate a that lies inside the triangle.

The only way to normalize a density function that is known to be correct in this manner is multiplication by a constant. You'll notice that your second method of normalizing a density function does not preserve proportionality, so it is generally inappropriate to use.
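The triangle example can be checked by simulation. This is a rough sketch under stated assumptions: points are drawn uniformly from the triangle by rejection sampling from the unit square, and `sample_x` is a hypothetical helper name. Since ρ(a) = 1 - a integrates to 1/2 on [0, 1], multiplying by the constant 2 gives the normalized density 2(1 - a), which predicts P(X ≤ t) = 2t - t²:

```python
import random

random.seed(1)

def sample_x():
    """x-coordinate of a uniform random point in the triangle (0,0), (0,1), (1,0)."""
    while True:
        x, y = random.random(), random.random()
        if x + y <= 1.0:  # keep only points that fall inside the triangle
            return x

n = 100_000
xs = [sample_x() for _ in range(n)]

# Normalized density 2 * (1 - a) gives the CDF P(X <= t) = 2t - t^2.
t = 0.5
empirical = sum(x <= t for x in xs) / n
predicted = 2 * t - t * t

print(empirical, predicted)
```

The empirical fraction should land close to the predicted value, which supports normalizing by a multiplicative constant in this setting.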
 
663
0
So this is where the error function came from?

[tex]\text{erf}(x)=\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^{2}}dt[/tex]
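For what it's worth, the formula can be compared against Python's built-in `math.erf`. This is only a sketch; `erf_numeric` is a made-up name and the midpoint rule is an arbitrary choice of quadrature:

```python
import math

def erf_numeric(x, steps=100_000):
    """Midpoint-rule estimate of (2 / sqrt(pi)) * integral of exp(-t^2) from 0 to x."""
    width = x / steps
    total = sum(math.exp(-((i + 0.5) * width) ** 2) for i in range(steps))
    return 2.0 / math.sqrt(math.pi) * width * total

approx = erf_numeric(1.0)
exact = math.erf(1.0)
print(approx, exact)
```

The factor 2/√π is the same conventional normalization constant, doubled because the integral starts at 0 rather than at -∞.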
 
38
0
Hurkyl said:
The only way to normalize a density function that is known to be correct in this manner is multiplication by a constant. You'll notice that your second method of normalizing a density function does not preserve proportionality, so it is generally inappropriate to use.


I see your point, but either method changes the area under the unnormalized density function by the same proportion so that it equals 1. If so, why doesn't it preserve proportionality? In a sense, when you multiply by the reciprocal of the area, the function's height is reduced by some factor; if I instead place the area beside the variable, the height is preserved but the width of the distribution is rescaled so that the area changes by the same proportion. Basically, instead of squishing it vertically you're rescaling it horizontally by the same proportion of area.
 
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,843
17
Let's do an example.

If the random variable X has f(x) = exp(-x²) as an unnormalized density function, then I can state things like the outcome X = 0 happens about 2.7 times more often than the outcome X = 1. We can see this by computing that f(0) / f(1) ~ 2.7.

If we normalize f by multiplying by a constant, then this ratio will remain 2.7.

However, if we use your normalization g(x) = exp(-π x²), then we have g(0) / g(1) ~ 23.1: we see that this normalization greatly disturbs the relative probability!


Now, squishing is a useful concept, but in the following way: you can make the change of variable X = √π Y, and then the random variable Y will have the normalized density function g(x).

Changes of variables are used extensively, but the usual goal is to obtain a random variable whose mean is 0 and whose variance is 1.
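Both points of this example can be sketched in Python. Assumptions: X is simulated with `random.gauss`, using the fact that the normalized density exp(-x²)/√π is a normal density with mean 0 and variance 1/2; the variable names are illustrative:

```python
import math
import random

def f(x):
    return math.exp(-x * x)

def g(x):
    return math.exp(-math.pi * x * x)

c = 1.0 / math.sqrt(math.pi)             # conventional normalizing constant
ratio_f = f(0) / f(1)                    # equals e, about 2.7
ratio_scaled = (c * f(0)) / (c * f(1))   # unchanged by the constant
ratio_g = g(0) / g(1)                    # equals e**pi, about 23.1

random.seed(2)
n = 100_000
# X has density exp(-x^2)/sqrt(pi): normal with mean 0 and variance 1/2.
xs = [random.gauss(0.0, math.sqrt(0.5)) for _ in range(n)]
ys = [x / math.sqrt(math.pi) for x in xs]   # Y defined by X = sqrt(pi) * Y

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n

# If Y has density g, its variance should be 1 / (2 * pi).
print(ratio_f, ratio_scaled, ratio_g)
print(var_y, 1.0 / (2.0 * math.pi))
```

So g is a legitimate density, but for the rescaled variable Y, not for the original X.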
 
