Generated random data: how can I check whether it is Gaussian?

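In summary, to determine whether data are approximately normal, you can check a relative frequency histogram for a bell-curve shape, compare the ratio of the interquartile range to the standard deviation with 1.34, or construct a normal probability plot.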
  • #1
raisin_raisin
Hi, I have used some code that generates random numbers from a Gaussian distribution.
When I plot, say, 100 numbers, I don't get the bell curve, although all the data seem to be around the correct mean value and changing the variance affects the spread.

Am I wrong to expect the bell curve, or am I not plotting it right? I am not entirely sure what to put on the other axis. Thanks.
 
  • #2
In order to obtain the bell-shaped curve you expect, you need to plot the relative frequencies of your sampled data. For this you have to define categories (bins), for example [itex]I_k=(k,k+1][/itex]. Then you count how many of the numbers you generated fall into each category and divide that count by the size of your sample; call this number [itex]n_k[/itex]. Then you can plot the points [itex](k+1/2, n_k)[/itex], and you will hopefully see what you expect.

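Note that you can choose the categories as you like. You will get better results with a larger sample and smaller (and thus more) categories. If the categories have a width [itex]h[/itex] other than 1, divide [itex]n_k[/itex] by [itex]h[/itex] before comparing the points with the Gaussian density curve.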
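A minimal Python sketch of this binning procedure might look like the following. It assumes Python's standard `random.gauss` as the generator and an arbitrary sample of 10,000 draws with mean 0 and standard deviation 2; `relative_frequencies` is a hypothetical helper name, and a crude text bar chart stands in for a real plot.

```python
import math
import random


def relative_frequencies(sample):
    """Return {k: n_k}, where n_k is the fraction of values in the bin I_k = (k, k+1]."""
    counts = {}
    for x in sample:
        # x lies in (k, k+1] exactly when k = ceil(x) - 1
        k = math.ceil(x) - 1
        counts[k] = counts.get(k, 0) + 1
    n = len(sample)
    return {k: c / n for k, c in sorted(counts.items())}


random.seed(1)
# assumed setup: 10,000 draws from a normal with mean 0 and standard deviation 2
sample = [random.gauss(0.0, 2.0) for _ in range(10_000)]

for k, n_k in relative_frequencies(sample).items():
    # the point to plot is (k + 1/2, n_k); a text bar stands in for the plot here
    print(f"{k + 0.5:6.1f} {n_k:.4f} " + "#" * round(200 * n_k))
```

Reducing the sample to 100 draws makes the bars noticeably ragged, which is the effect described in the original question.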
 
  • #3
Even if you follow the above suggestions, you will never get a perfect bell curve from your sample (or any sample), whether you graph a histogram or a density plot (such as those produced in R) to represent your data.
Visual evidence from those graphs will be, at best, what you describe: you can see the general outline of the "bell curve", and the center of your data seems to be close to the mean of the simulated distribution. From graphs alone, you will never have "proof" that your data come from a normal distribution rather than from a distribution that is "normal-like" in the center but has slightly longer tails.
There are formal tests you can use to examine the hypothesis of normality; however, each one has its own assumptions, strengths, and drawbacks.
 
  • #4

A typical way to judge whether data are approximately normal is to do one of three things:

1. Create a relative frequency histogram and visually check whether its shape is that of a bell curve.

2. Find the ratio of the interquartile range to the standard deviation. If the ratio is approximately 1.34 (for an exact normal distribution, the IQR is about 1.349 standard deviations), the data are consistent with an approximately normal distribution.

3. Construct a normal probability plot for the data. If the points fall on an approximately straight line, the data are consistent with an approximately normal distribution.
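The IQR-to-standard-deviation check in item 2 can be sketched with Python's standard `statistics` module (Python 3.8 or later). The 1.349 constant comes from the normal quartiles lying at [itex]\mu \pm 0.6745\sigma[/itex]. The helper name `iqr_to_sd_ratio`, the parameters, and the uniform comparison sample are illustrative assumptions; note that `statistics.quantiles` uses its "exclusive" method by default, so other software may report slightly different quartiles.

```python
import random
import statistics


def iqr_to_sd_ratio(data):
    """Interquartile range divided by the sample standard deviation."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return (q3 - q1) / statistics.stdev(data)


# For an exact normal distribution the ratio is 2 * 0.6745...
print(round(statistics.NormalDist().inv_cdf(0.75) * 2, 3))  # prints 1.349

random.seed(2)
normal_sample = [random.gauss(5.0, 3.0) for _ in range(5_000)]
uniform_sample = [random.uniform(0.0, 1.0) for _ in range(5_000)]
print(round(iqr_to_sd_ratio(normal_sample), 2))   # should land near 1.35
print(round(iqr_to_sd_ratio(uniform_sample), 2))  # uniform: 0.5 / sqrt(1/12), about 1.73
```

A ratio near 1.35 is necessary but not sufficient: other distributions can produce a similar value, which is why this check is best combined with the histogram or probability plot.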

Keep in mind (as statdad alluded to) that the checks for normality given above are only descriptive in nature. It is possible (but unlikely) that the data are non-normal even when the checks are reasonably satisfied. Thus, one should be careful not to claim that the data are in fact normally distributed; one can only say that it is reasonable to believe that the data come from a normal distribution.

CS
 

1. What is a Gaussian distribution?

A Gaussian distribution, also known as a normal distribution, is a type of probability distribution that is commonly used in statistics. It is characterized by a bell-shaped curve and is often used to model natural phenomena in which the data clusters around a central value with a symmetrical distribution.
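For reference, the probability density of a Gaussian distribution with mean [itex]\mu[/itex] and standard deviation [itex]\sigma[/itex] is

[tex]f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)[/tex]

For a relative frequency histogram with bins of width [itex]h[/itex], the fraction of a large sample falling in a bin centred at [itex]x[/itex] is roughly [itex]h\,f(x)[/itex], so with bins of width 1 the plotted points should approximately trace [itex]f[/itex] itself.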

2. How can I generate random data that follows a Gaussian distribution?

There are various methods for generating random data that follow a Gaussian distribution: you can transform uniform random numbers yourself (for example, with the Box-Muller transform) or call a built-in generator such as randn in MATLAB or rnorm in R. Either way, an algorithm maps pseudorandom numbers to values whose distribution matches the requested Gaussian, up to sampling variation.
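As an illustration of transforming uniform random numbers, here is a minimal Python sketch of the Box-Muller transform. It is one classic method; the generators built into MATLAB, R, or Python's `random.gauss` may use different algorithms. The helper name `box_muller` and the parameters (mean 10, standard deviation 2, 10,000 draws) are illustrative assumptions.

```python
import math
import random
import statistics


def box_muller(mu, sigma, rng=random):
    """Return two independent N(mu, sigma^2) draws built from two uniform draws."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return mu + sigma * r * math.cos(theta), mu + sigma * r * math.sin(theta)


random.seed(3)
draws = []
for _ in range(5_000):
    draws.extend(box_muller(10.0, 2.0))

# sample mean and standard deviation should be close to 10 and 2
print(len(draws), round(statistics.fmean(draws), 2), round(statistics.stdev(draws), 2))
```

Matching the mean and spread is only a first sanity check; the shape still needs one of the histogram, probability-plot, or formal-test checks.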

3. Why is it important to check if my data is Gaussian?

It is important to check if your data follows a Gaussian distribution because many statistical tests and models assume that the data is normally distributed. If your data is not Gaussian, these tests and models may not be appropriate or accurate. Additionally, a non-Gaussian distribution may indicate that there are underlying factors or biases affecting your data.

4. How can I visually check if my data is Gaussian?

One way to visually check whether your data are Gaussian is to plot a histogram of the data and see whether it resembles a bell-shaped curve. You can also overlay a Gaussian density curve on the histogram to compare the shapes; for the comparison to be meaningful, scale the histogram so that its total area is 1. Another method is a Q-Q plot, which plots the quantiles of your data against the quantiles of a theoretical Gaussian distribution. If the points fall approximately along a straight line, the data are consistent with a normal distribution.
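A Q-Q plot needs only the sorted data and the matching standard-normal quantiles, so the coordinates can be computed with Python's standard library (3.8+) and handed to any plotting tool. The sketch below assumes the plotting position (i - 0.5)/n, one of several conventions, and summarizes straightness with the correlation coefficient of the points (the idea behind the probability plot correlation coefficient). The helper names and the exponential comparison sample are illustrative. For data from N(μ, σ²), the points lie near a line with intercept μ and slope σ.

```python
import random
import statistics


def qq_points(data):
    """Pair each sorted data value with the matching standard-normal quantile."""
    n = len(data)
    std_normal = statistics.NormalDist()
    # (i - 0.5) / n is one common choice of plotting position; others exist
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(4)
normal_data = [random.gauss(0.0, 1.0) for _ in range(100)]
skewed_data = [random.expovariate(1.0) for _ in range(100)]
print(round(straightness(qq_points(normal_data)), 3))
print(round(straightness(qq_points(skewed_data)), 3))
```

For the normal sample the correlation should come out very close to 1, while the skewed exponential sample bends away from the line at the ends and gives a noticeably lower value. With only 100 points, expect some wiggle in the tails even for genuinely Gaussian data.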

5. Are there statistical tests to determine if my data is Gaussian?

Yes, there are several statistical tests that can be used to check whether your data are consistent with a Gaussian distribution. These include the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the Anderson-Darling test. Each compares your data with a normal distribution and reports a p-value: the probability of seeing a discrepancy at least as large as the observed one if the data really were Gaussian. A small p-value is evidence against normality, but a large p-value does not prove normality. Sample size matters: with small samples these tests have little power to detect departures from normality, while with very large samples they can flag deviations too small to matter in practice.
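If SciPy is available, `scipy.stats.kstest`, `scipy.stats.shapiro`, and `scipy.stats.anderson` implement these tests. To show what one of them does, here is a standard-library sketch of the one-sample Kolmogorov-Smirnov test. It assumes the distribution is fully specified in advance, which is natural for generated data because you know the mean and standard deviation you asked for; if you instead estimate them from the same data, the ordinary KS p-values are too large and the Lilliefors variant should be used. The p-value uses the large-n Kolmogorov series, so treat it as approximate for small samples. Helper names and parameters are illustrative assumptions.

```python
import math
import random
import statistics


def ks_statistic(data, dist):
    """Largest gap between the empirical CDF of data and the CDF of dist."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # the empirical CDF jumps from (i - 1)/n to i/n at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Large-n approximation: P(D > d) ~ 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 n d^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.3:
        # K(0.3) is about 1e-5, so P(D > d) is 1 to within that
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


random.seed(5)
# the generator was asked for N(3, 1.5^2), so test against that exact distribution
hypothesized = statistics.NormalDist(mu=3.0, sigma=1.5)
good = [random.gauss(3.0, 1.5) for _ in range(1_000)]
wrong_scale = [random.gauss(3.0, 2.5) for _ in range(1_000)]

for name, data in [("good", good), ("wrong_scale", wrong_scale)]:
    d = ks_statistic(data, hypothesized)
    print(name, round(d, 4), round(ks_pvalue_asymptotic(d, len(data)), 4))
```

The sample drawn with the wrong standard deviation should produce a much larger D and a tiny p-value, while the correctly generated sample's p-value will vary from run to run, as p-values do under the null hypothesis.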
