Generated random data: how can I check whether it is Gaussian?

  1. Oct 15, 2008 #1
    Hi, I have used some code that generates random numbers from a Gaussian distribution.
    When I plot, say, 100 numbers I don't get the bell curve, although all the data seem to be around the correct mean value and changing the variance affects the spread.

    Am I wrong to expect the bell curve, or am I not plotting it right? I am not entirely sure what to put on the other axis. Thanks.
  3. Oct 15, 2008 #2
    To obtain the bell-shaped curve you expect, you need to plot the relative frequencies of your sampled data. For this you have to define categories (bins), for example [itex]I_k=(k,k+1][/itex]. For each category, count how many of the numbers you generated fall into it and divide the result by the size of your sample; call this number [itex]n_k[/itex]. Then plot the points [itex](k+1/2, n_k)[/itex], that is, each relative frequency against the midpoint of its category, and you will hopefully see what you expect.

    Note that you can choose the categories as you like. You will get better results if you work with a larger sample and smaller (and thus more) categories.
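
The binning recipe above can be sketched in Python using only the standard library. This is a rough illustration, not the original poster's code: the function name `relative_frequencies`, the `bin_width` parameter, the seed, and the choice of mean 0 and standard deviation 1 are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def relative_frequencies(samples, bin_width=1.0):
    """Return sorted (bin midpoint, relative frequency) pairs.

    A value x is counted in the bin I_k = (k*w, (k+1)*w], where w is
    bin_width; with w = 1 this is the category (k, k+1] from the post.
    """
    # ceil(x / w) - 1 gives the index k of the half-open bin (k*w, (k+1)*w]
    counts = Counter(math.ceil(x / bin_width) - 1 for x in samples)
    n = len(samples)
    return [((k + 0.5) * bin_width, c / n) for k, c in sorted(counts.items())]

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)
# Assumed example data: 10,000 draws with mean 0 and standard deviation 1
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Crude text histogram: one row per bin, bar length proportional to frequency
for midpoint, freq in relative_frequencies(samples, bin_width=0.5):
    print(f"{midpoint:6.2f} {freq:.4f} {'#' * round(freq * 200)}")
```

Rerunning this with only 100 samples should show why the original plot looked ragged: with so few points per bin, the relative frequencies fluctuate a lot.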
  4. Oct 15, 2008 #3


    Homework Helper

    Even if you follow the above suggestions, you will never get a perfect bell curve with your sample, or with any sample, whether you graph a histogram or a type of density plot (from R) to represent your data.
    Visual evidence from those graphs will be, at best, as you describe: you can see the general outline of the "bell curve", and the center of your data seems to be close to the mean of the simulated distribution. From graphs alone, you will never have "proof" that your data come from a normal distribution rather than from a distribution that is "normal-like" in the center but has slightly longer tails.
    There are tests you could use to examine the hypothesis of normality; however, each one has its own assumptions, strengths, and drawbacks.
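
The post does not name a particular test, so as one possible illustration here is a sketch of the Jarque-Bera test, which compares the sample skewness and kurtosis with the values 0 and 3 expected for a normal distribution. The choice of this test and the function name `jarque_bera` are assumptions made for the example. The p-value uses the large-sample chi-square approximation with 2 degrees of freedom, which can be poor for small samples such as 100 points.

```python
import math
import random

def jarque_bera(samples):
    """Return (JB statistic, approximate p-value) for a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments of order 2, 3 and 4 (population form, divisor n)
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5   # 0 for a normal distribution
    kurtosis = m4 / m2 ** 2     # 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For a chi-square variable with 2 degrees of freedom, P(X > x) = exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

random.seed(0)  # arbitrary seed for repeatability
gaussian = [random.gauss(0.0, 1.0) for _ in range(5000)]
uniform = [random.random() for _ in range(5000)]

for name, data in (("gaussian", gaussian), ("uniform", uniform)):
    jb, p = jarque_bera(data)
    print(f"{name}: JB = {jb:.2f}, approximate p-value = {p:.4g}")
```

A small p-value is evidence against normality. A large one only means this particular test found no evidence against it, which is not the same as proof that the data are normal.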
  5. Oct 15, 2008 #4


    Science Advisor

    The typical way to decide whether data are approximately normally distributed is to do one of three things:

    1. Create a relative frequency histogram and visually check that its shape is that of a bell curve.

    2. Find the ratio of the interquartile range to the standard deviation. If the ratio is approximately 1.35 (the exact value for a normal distribution is about 1.349), the data are approximately normally distributed.

    3. Construct a normal probability plot for the data. If the points fall on an approximately straight line, the data are approximately normally distributed.

    Keep in mind (as statdad alluded to) that the checks for normality given above are only descriptive in nature. It is possible (but unlikely) that the data are non-normal even when the checks are reasonably satisfied. Thus, one should be careful not to claim that the data are in fact normally distributed. One can only say that it is reasonable to believe that the data are from a normal distribution.
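
Checks 2 and 3 can be sketched with Python's standard `statistics` module. The helper names, the seed, the example mean and standard deviation, and the plotting positions (i - 0.5)/n are assumptions made for illustration; other plotting-position conventions exist. Instead of drawing the probability plot, the sketch summarizes how straight it is with the correlation between the sorted data and the matching standard normal quantiles.

```python
import random
import statistics

def iqr_to_sd_ratio(samples):
    """Interquartile range divided by the sample standard deviation."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return (q3 - q1) / statistics.stdev(samples)

def probability_plot_points(samples):
    """Pair each sorted value with the standard normal quantile for its rank."""
    n = len(samples)
    std_normal = statistics.NormalDist()
    # (i - 0.5) / n is one common plotting-position convention (assumed here)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(samples)))

def straightness(points):
    """Pearson correlation of the plotted points; 1.0 means a perfect line."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)  # arbitrary seed for repeatability
# Assumed example data: 1000 draws with mean 5 and standard deviation 2
samples = [random.gauss(5.0, 2.0) for _ in range(1000)]

print(f"IQR / SD = {iqr_to_sd_ratio(samples):.3f} (about 1.35 for normal data)")
r = straightness(probability_plot_points(samples))
print(f"probability plot correlation = {r:.4f} (close to 1 for normal data)")
```

To see the plot itself, pass the pairs from `probability_plot_points` to any plotting tool, with the theoretical quantiles on the horizontal axis and the sorted data on the vertical axis.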
