Food for Thought

  1. Sep 7, 2005 #1
    Assume I have a data set and I am trying to find a distribution that describes it. Assume I have found a function, say [itex] f(x) = e^{ - x^2 } [/itex], that describes the distribution of the data. Statistics tells me that my first move is to normalize this function so that [itex]\int\limits_{ - \infty }^\infty {P(x)dx} = 1 [/itex].

    The common approach is to integrate [itex]\int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = \sqrt \pi [/itex] and multiply the function by the reciprocal of that integral, so that [itex]\frac{1}{{\sqrt \pi }} \cdot \int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = 1[/itex], giving [itex]P(x) = \frac{1}{{\sqrt \pi }} e^{ - x^2 }[/itex].

    But what if I normalize it a different way? Say I take the same integral [itex]\int\limits_{ - \infty }^\infty {e^{ - x^2 } dx} = \sqrt \pi [/itex] and place its value in the parentheses beside the variable, [itex]P(x) = e^{ - (\sqrt \pi \cdot x)^2 } = e^{ - \pi \cdot x^2 }[/itex], instead of in front of the function. If you now integrate this function, [itex]\int\limits_{ - \infty }^\infty {e^{ - \pi \cdot x^2 } dx} = 1[/itex], so it is still equal to one.

    So my question is: which PDF do I use? I have normalized the same function two different ways. Any thoughts on this?
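As a quick numerical check of the two normalizations described in this post, here is a minimal Python sketch. The helper `integrate` and the names `p_scaled_height` and `p_scaled_width` are made up for illustration, and the infinite range is truncated to [-10, 10] on the assumption that the tails beyond that are negligible.

```python
import math

def integrate(func, a, b, n=20_000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

def f(x):
    # Unnormalized function from the post.
    return math.exp(-x * x)

def p_scaled_height(x):
    # Conventional normalization: multiply by 1/sqrt(pi).
    return f(x) / math.sqrt(math.pi)

def p_scaled_width(x):
    # Alternative from the post: put sqrt(pi) next to the variable.
    return f(math.sqrt(math.pi) * x)

area_f = integrate(f, -10.0, 10.0)
area_height = integrate(p_scaled_height, -10.0, 10.0)
area_width = integrate(p_scaled_width, -10.0, 10.0)

print("area of f:           ", area_f, "vs sqrt(pi) =", math.sqrt(math.pi))
print("area, height-scaled: ", area_height)
print("area, width-scaled:  ", area_width)
print("values at x = 1:     ", p_scaled_height(1.0), p_scaled_width(1.0))
```

Both candidates integrate to 1, but they are different functions: they assign different densities at the same x, which is why the choice between them matters.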
     
  3. Sep 7, 2005 #2

    mathman

    Science Advisor
    Gold Member

    I think you are missing the point that the normal distribution has 2 parameters which you are not using, the mean and the variance. If you set your problem up with this in mind, you can get a best fit to the data you have, and there won't be any loose ends.

    The density is [itex]K e^{ - (x - m)^2 / (2V) }[/itex], where m is the mean, V is the variance, and K is defined to make the integral equal to 1.
     
    Last edited: Sep 7, 2005
  4. Sep 7, 2005 #3
    Normal Distribution

    The distribution I have chosen is not a normal distribution but simply a Gaussian distribution. My point is that statistics has taught everybody to normalize a function one single way to make it a true PDF, but this doesn't necessarily have to be the case.
     
  5. Sep 7, 2005 #4

    EnumaElish

    Science Advisor
    Homework Helper

    Seems to me it's an empirical question. Which of the two CDFs describes your data better? I would use the answer to this question as my selection criterion.
     
  6. Sep 7, 2005 #5
    Normalization

    My question is not which distribution is appropriate but which PDF has been normalized correctly. Even that may not be the right question: both methods may be correct, or only one of them. I have successfully done this with several other functions that occur in statistics and physics. Every text shows normalization done one way, by integrating the function and multiplying it by the reciprocal of the integral. Maybe the conventional method is wrong and you should normalize the other way I demonstrated, or maybe both are correct. Maybe mathematics has been normalizing distributions incorrectly all along. I don't know; I am just saying maybe there is more than one way.
     
  7. Sep 8, 2005 #6

    EnumaElish

    Science Advisor
    Homework Helper

    An implicit proportionality assumption could justify the rote method: by dividing the integral by a constant, the proportion of "less than x" to "less than y" is preserved for all functions.

    But if there is no such assumption, and the only theoretical criterion is that the CDF tends to 1, then empirical fit seems to be an obvious tie breaker.
     
  8. Sep 8, 2005 #7
    Method

    Meaning the PDF (1st or 2nd) that contours to the distribution of my data the best.
     
  9. Sep 8, 2005 #8

    mathman

    Science Advisor
    Gold Member

    The standard definition has the term "normal distribution" synonymous with "Gaussian distribution".
     
  10. Sep 8, 2005 #9

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    It is usually the case that when one has an unnormalized density function, the true p.d.f. is known to be proportional to it.

    For example, suppose I want to pick a random point in the triangle with vertices (0, 0), (0, 1), and (1, 0), and I'm interested in X, the x-coordinate of the point I chose.

    I can immediately write down an unnormalized density function for the random variable X: ρ(a) = 1 - a. This is clear because the probability of X = a is obviously proportional to the length of the vertical line segment with x-coordinate a.

    The only way to normalize a density function that is known to be correct in this manner is multiplication by a constant. You'll notice that your second method of normalizing a density function does not preserve proportionality, so it is generally inappropriate to use.
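The triangle example can be checked by simulation. The sketch below is only an illustration under stated assumptions: "random point in the triangle" is taken to mean uniformly distributed, points are drawn by rejection sampling from the unit square, and the names `sample_x` and `pdf` are hypothetical. Since the integral of 1 - a over [0, 1] is 1/2, multiplying by the constant 2 gives the normalized density 2(1 - a), which implies a mean of 1/3 and P(X < 0.5) = 0.75.

```python
import random

def sample_x(n_points, seed=0):
    """Draw x-coordinates of points uniform in the triangle (0,0), (0,1), (1,0)."""
    rng = random.Random(seed)
    xs = []
    while len(xs) < n_points:
        x, y = rng.random(), rng.random()
        if x + y <= 1.0:  # keep only points inside the triangle
            xs.append(x)
    return xs

def pdf(a):
    # rho(a) = 1 - a multiplied by the constant 2, so that it integrates to 1.
    return 2.0 * (1.0 - a)

xs = sample_x(100_000)
mean_x = sum(xs) / len(xs)
frac_below_half = sum(1 for x in xs if x < 0.5) / len(xs)

print("sample mean of X:", mean_x, "(analytic value 1/3)")
print("sample P(X < 0.5):", frac_below_half, "(analytic value 0.75)")
```

The simulated values should land close to the analytic ones, which is what one expects if the true p.d.f. is the unnormalized density times a constant.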
     
  11. Sep 8, 2005 #10
    So this is where the error function came from?

    [tex]\text{erf}(x)=\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^{2}}dt[/tex]
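The prefactor in the error function plays the same role as a normalizing constant: the integral of exp(-t^2) from 0 to infinity is sqrt(pi)/2, so multiplying by 2/sqrt(pi) makes erf(x) tend to 1. A rough Python sketch comparing a numerical integral with the standard library's `math.erf` follows; `gaussian_area` and `erf_numeric` are names invented for this illustration.

```python
import math

def gaussian_area(x, n=10_000):
    """Trapezoid-rule estimate of the integral of exp(-t^2) for t in [0, x]."""
    h = x / n
    total = 0.5 * (1.0 + math.exp(-x * x))
    for i in range(1, n):
        t = i * h
        total += math.exp(-t * t)
    return total * h

def erf_numeric(x):
    # Scale the area by 2/sqrt(pi), as in the definition of erf.
    return 2.0 / math.sqrt(math.pi) * gaussian_area(x)

for x in (0.5, 1.0, 2.0):
    print(x, erf_numeric(x), math.erf(x))
```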
     
  12. Sep 9, 2005 #11
    Hurkyl,

    I see your point, but either method rescales the unnormalized density function so that its area equals 1. If so, why doesn't the second method preserve proportionality? When you multiply by the reciprocal of the area, the height of the function is reduced by that factor. If I instead place the area beside the variable, the height is preserved but the width of the distribution is rescaled by the same factor. Basically, instead of squishing the height, you're rescaling the width by the same proportion of area.
     
  13. Sep 9, 2005 #12

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    Let's do an example.

    If the random variable X has f(x) = exp(-x²) as an unnormalized density function, then I can state things like the outcome X = 0 happens about 2.7 times more often than the outcome X = 1. We can see this by computing that f(0) / f(1) ~ 2.7.

    If we normalize f by multiplying by a constant, then this ratio will remain 2.7.

    However, if we use your normalization g(x) = exp(-π x²), then we have g(0) / g(1) ~ 23.1: we see that this normalization greatly disturbs the relative probability!


    Now, squishing is a useful concept, but in the following way: you can make the change of variable X = √π Y, and then the random variable Y will have the normalized density function g(x).

    Changes of variable are used extensively, but the usual goal is to have a random variable whose mean is 0 and whose variance is 1.
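The example in this post can be sketched in Python. This is an illustration, not a definitive treatment: it assumes X has the normalized density exp(-x²)/√π, which is a Gaussian with mean 0 and variance 1/2, so it can be drawn with `random.gauss`. The variable names are made up. If Y = X/√π has density exp(-π y²), its variance should be 1/(2π).

```python
import math
import random

def f(x):
    return math.exp(-x * x)            # unnormalized density

def f_normalized(x):
    return f(x) / math.sqrt(math.pi)   # multiplied by a constant

def g(x):
    return math.exp(-math.pi * x * x)  # width-rescaled version

# Relative likelihood of the outcomes 0 and 1 under each function.
ratio_f = f(0.0) / f(1.0)
ratio_f_normalized = f_normalized(0.0) / f_normalized(1.0)
ratio_g = g(0.0) / g(1.0)
print("f(0)/f(1) =", ratio_f)
print("normalized f, same ratio =", ratio_f_normalized)
print("g(0)/g(1) =", ratio_g)

# Change of variable X = sqrt(pi) * Y, i.e. Y = X / sqrt(pi).
rng = random.Random(0)
n = 100_000
ys = [rng.gauss(0.0, math.sqrt(0.5)) / math.sqrt(math.pi) for _ in range(n)]
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
print("sample variance of Y:", var_y, "vs 1/(2*pi) =", 1.0 / (2.0 * math.pi))
```

So g is a legitimate density, but for the rescaled variable Y rather than for X itself.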
     
    Last edited: Sep 9, 2005