How to tell if data is normally distributed?


by jimmy1
Tags: data, distributed
jimmy1
#1
Jan17-07, 06:04 AM
Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks
HallsofIvy
#2
Jan17-07, 06:58 AM
Try this:

http://www.itl.nist.gov/div898/handb...on2/prc213.htm

or, more generally, search for "normality tests".
EnumaElish
#3
Jan17-07, 09:22 AM
See also http://en.wikipedia.org/wiki/Normality_test

Sibelius19
#4
Apr14-07, 03:15 PM
I know one characteristic a normal distribution must have is that its mean, mode, and median are equal, and that it is unimodal. I'd simply check each of these and see whether the numbers agree, though I'm not sure whether they have to match to the nearest tenth. For example, I think that if the mode is 71, the mean is 70.6, the median is 71.2, and 71 is the only mode, then the data would be considered normally distributed.

I know you've probably already figured this out, but I'm adding my comment in case someone else has the same problem. Or maybe I'm completely wrong about this and someone can help me.
shehpar
#5
Oct28-09, 11:05 PM
Quote by jimmy1:
Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks
For normally distributed data, the skewness should be zero and the kurtosis should equal 3.

Hope this helps.
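To make the skewness and kurtosis check concrete, here is a minimal sketch using only the Python standard library. The function names and the simulated data are assumptions for illustration; the formulas are the simple moment-based (biased) estimators, which is only one of several conventions in use.

```python
import random


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (biased estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def sample_kurtosis(xs):
    """Moment-based sample kurtosis: m4 / m2**2 (3 for an exact normal)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated stand-in for real measurements (assumed mean 70, sd 5).
data = [rng.gauss(70.0, 5.0) for _ in range(5000)]

print("skewness:", round(sample_skewness(data), 3))
print("kurtosis:", round(sample_kurtosis(data), 3))
```

Even for data drawn from a true normal distribution, the printed values will be close to 0 and 3 rather than exactly equal to them, so some tolerance that depends on the sample size is always needed.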
statdad
#6
Oct29-09, 09:05 AM
The conditions mentioned above (mean = median = mode, skewness = 0, kurtosis = 3) are very unlikely to hold exactly for real data. The normal distribution is an idealized model that describes general characteristics very well, but it is rarely (I would argue never) exactly correct.

The tests typically allow you to conclude only that your data "aren't significantly different" from what you would expect under the normal model. Histograms are decidedly poor as an aid, since too much depends on the choice of bin width (and hence the number of bins) and on the sample size.

You might look at the Kolmogorov-Smirnov test (http://mathworld.wolfram.com/Kolmogo...irnovTest.html), which compares your sample's empirical distribution to a normal distribution, although it works best when you don't estimate the mean and standard deviation from the sample values.
Q-Q plots (quantile-quantile plots) are a useful visual tool.
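As a rough illustration of what the Kolmogorov-Smirnov statistic measures, the sketch below computes the largest gap between a sample's empirical CDF and a normal CDF whose mean and standard deviation are fixed in advance. The function names and simulated samples are assumptions for illustration, and the critical value used is the usual large-sample approximation of about 1.358 / sqrt(n) at the 5% level. It does not apply when the parameters are estimated from the same sample.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(xs, dist):
    """Largest gap between the empirical CDF of xs and dist.cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_5pct(n):
    """Approximate large-sample 5% critical value for a fully specified null."""
    return 1.358 / math.sqrt(n)


rng = random.Random(1)
null = NormalDist(mu=0.0, sigma=1.0)  # hypothesised in advance, not fitted

# Two simulated samples: one normal, one clearly not.
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
uniform_sample = [rng.uniform(-3.0, 3.0) for _ in range(500)]

crit = ks_critical_5pct(500)
for name, sample in (("normal", normal_sample), ("uniform", uniform_sample)):
    d = ks_statistic(sample, null)
    verdict = "reject normality" if d > crit else "no evidence against normality"
    print(f"{name}: D = {d:.4f}, critical value = {crit:.4f} -> {verdict}")
```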

What often happens is that your data set resembles a normal distribution "in the middle" but departs from it in the extremes (the tails). Sadly, that's often the region in which you have the most interest.
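The "normal in the middle, trouble in the tails" pattern is what a Q-Q plot is meant to expose. The sketch below builds the plotted pairs by hand without drawing anything. The function name, the plotting position (i + 0.5) / n, and the contaminated sample are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def qq_pairs(xs):
    """Pair each sorted observation with the standard normal quantile
    at plotting position (i + 0.5) / n."""
    xs = sorted(xs)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


rng = random.Random(2)
# Hypothetical sample: mostly N(0, 1), with about 5% drawn from a much
# wider normal (sd 4), so the centre looks normal but the tails are heavy.
data = [rng.gauss(0.0, 4.0 if rng.random() < 0.05 else 1.0) for _ in range(2000)]

pairs = qq_pairs(data)
# Print a few points from each tail and from the middle of the plot.
for idx in (0, 1, 500, 1000, 1500, 1998, 1999):
    theoretical, observed = pairs[idx]
    print(f"theoretical {theoretical:+.2f}   observed {observed:+.2f}")
```

If the data were normal, the observed values would track the theoretical ones along a straight line; here the central points should roughly agree while the most extreme observations sit well beyond their theoretical quantiles.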

Good luck with your investigations.
wvguy8258
#7
Nov4-09, 10:32 AM
A problem with the Shapiro-Wilk test and some other tests is that they take the normal distribution as the null hypothesis and then check whether the data give a p-value low enough to reject it. This is an issue because, if you have a lot of data points, it is easy to reject the null hypothesis of normality. It is part of a bigger issue with significance testing in general: with a really large sample size you'll find all sorts of relationships in the data. This is one reason why people often just inspect the data visually.
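The sample-size effect can be sketched with simple arithmetic. The example below uses the Kolmogorov-Smirnov statistic rather than Shapiro-Wilk, only because its large-sample 5% critical value has a simple approximate form, about 1.358 / sqrt(n). The fixed gap of 0.02 is a hypothetical figure standing in for a small, practically unimportant departure from normality.

```python
import math


def ks_critical_value(n, c_alpha=1.358):
    """Approximate large-sample critical value for the one-sample
    Kolmogorov-Smirnov statistic; c_alpha = 1.358 is the 5% level."""
    return c_alpha / math.sqrt(n)


# Hypothetical: the true CDF differs from the normal CDF by at most 0.02.
fixed_gap = 0.02

for n in (100, 1_000, 10_000, 100_000):
    crit = ks_critical_value(n)
    verdict = "exceeds" if fixed_gap > crit else "is below"
    print(f"n = {n:>7}: critical value {crit:.4f}; a gap of {fixed_gap} {verdict} it")
```

Because the threshold shrinks like 1 / sqrt(n) while the underlying departure stays the same size, a large enough sample will eventually flag even a tiny departure as significant.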
statdad
#8
Nov4-09, 10:57 AM
Quote by wvguy8258:
A problem with the Shapiro-Wilk test and some other tests is that they take the normal distribution as the null hypothesis and then check whether the data give a p-value low enough to reject it. This is an issue because, if you have a lot of data points, it is easy to reject the null hypothesis of normality. It is part of a bigger issue with significance testing in general: with a really large sample size you'll find all sorts of relationships in the data. This is one reason why people often just inspect the data visually.
The comment about the downsides of the Shapiro-Wilk test, and of tests in general, is valid. But while "This is one reason why people often just inspect the data visually" may be true, it's an incredibly bad thing to do. Again, most data are "normal in the middle" with problems in the tails. Because histograms are unreliable and yet so commonly used, the "assumption" of normality is made more often than it should be.

"This is one reason why people should use robust methods" would be a better comment.
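As one small example of what "robust methods" can mean in practice, the sketch below compares the mean and standard deviation with the median and the scaled median absolute deviation (MAD) on a sample containing a single wild value. The data and the function name are hypothetical; the factor 1.4826 is the standard scaling that makes the MAD comparable to the standard deviation when the data really are normal.

```python
import statistics


def median_and_scaled_mad(xs):
    """Return the median and 1.4826 * MAD as robust location and scale."""
    xs = list(xs)
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return med, 1.4826 * mad


# Hypothetical measurements with one gross outlier in the upper tail.
data = [68.0, 69.5, 70.0, 70.5, 71.0, 71.5, 72.0, 73.0, 140.0]

mean = statistics.mean(data)
sd = statistics.stdev(data)
med, robust_sd = median_and_scaled_mad(data)

print(f"mean   = {mean:.2f}, standard deviation = {sd:.2f}")
print(f"median = {med:.2f}, scaled MAD         = {robust_sd:.2f}")
```

The single outlier drags the mean and inflates the standard deviation, while the median and MAD stay close to what the bulk of the data suggests.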

