Thread Closed

How to tell if data is normally distributed?

 
Jan17-07, 06:04 AM   #1
 



Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks
 
Jan17-07, 06:58 AM   #2
 
Recognitions:
Gold Member
Science Advisor
Staff Emeritus
Try this:

http://www.itl.nist.gov/div898/handb...on2/prc213.htm

or, more generally, google on "normality tests".
 
Jan17-07, 09:22 AM   #3
 
Recognitions:
Homework Helper
Science Advisor
See also http://en.wikipedia.org/wiki/Normality_test
 
Apr14-07, 03:15 PM   #4
 



I know that one characteristic of the normal distribution is that its mean, mode, and median are the same, and that it can only be unimodal. I'd simply compute all three and see whether the numbers agree, though I'm not sure how closely they have to match. For example, I think that if the mode is 71, the mean is 70.6, the median is 71.2, and 71 is the only mode, then the data would be considered normally distributed.

I know you have probably already figured this out, but I'm adding my comment in case someone else has the same problem. Or maybe I'm completely wrong about this, and someone can correct me.
 
Oct28-09, 11:05 PM   #5
 
Quote by jimmy1
Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks
For normally distributed data, the skewness should be zero and the kurtosis should equal 3.

Hope this helps.
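To make the skewness and kurtosis check concrete, here is a minimal sketch in Python using only the standard library. The helper names (central_moment, sample_skewness, sample_kurtosis) and the simulated data (mean 70, standard deviation 5, 200 points) are illustrative assumptions, not anything from the thread. The moment-based definitions used here divide by n; statistics packages often apply small-sample corrections, so their values can differ slightly.

```python
import random
import statistics


def central_moment(xs, k):
    """k-th central moment of the sample (divides by n, not n - 1)."""
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def sample_skewness(xs):
    """Moment-based skewness m3 / m2**1.5 (0 for a normal distribution)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def sample_kurtosis(xs):
    """Moment-based, non-excess kurtosis m4 / m2**2 (3 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Simulated data drawn from a genuine normal distribution.
# The mean, standard deviation, sample size, and seed are arbitrary choices.
rng = random.Random(0)
data = [rng.gauss(70, 5) for _ in range(200)]

print("mean    :", statistics.fmean(data))
print("median  :", statistics.median(data))
print("skewness:", sample_skewness(data))
print("kurtosis:", sample_kurtosis(data))
```

Even though this sample really is drawn from a normal distribution, the printed mean and median will generally not coincide, and the skewness and kurtosis will generally not be exactly 0 and 3. The numbers are best read as rough diagnostics rather than a pass/fail criterion.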
 
Oct29-09, 09:05 AM   #6
 
Recognitions:
Homework Helper
The conditions mentioned above (mean = median = mode, skewness = 0, kurtosis = 3) are very unlikely to hold exactly for real data. The normal distribution is an idealized model that describes general characteristics very well, but it is rarely (I would argue never) exactly correct.

The tests typically allow you to conclude only that your data "isn't significantly different" from what you would expect under the normal model. Histograms are decidedly poor as an aid, since too much depends on the choice of bin width (and hence the number of bins) and on the sample size.

You might look at the Kolmogorov-Smirnov test (http://mathworld.wolfram.com/Kolmogo...irnovTest.html),
which compares your sample's empirical distribution to a normal distribution, although it works best when you don't estimate the mean and standard deviation from the sample values.
Q-Q plots (quantile-quantile plots) are a useful visual tool.
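As a rough sketch of both ideas, the Python below computes the Kolmogorov-Smirnov statistic against a normal distribution whose mean and standard deviation are fixed in advance, and builds the point pairs for a normal Q-Q plot. It uses only the standard library. The function names, the simulated sample, and the plotting positions (i - 0.5)/n are assumptions made for illustration, and the p-value comes from the large-sample Kolmogorov series, so it is only approximate for small samples.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(xs, dist):
    """Largest gap between the empirical CDF of xs and dist.cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here, and the p-value is 1 to many decimals.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def qq_points(xs):
    """(standard-normal quantile, ordered sample value) pairs for a Q-Q plot."""
    xs = sorted(xs)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


# Simulated sample; the seed and size are arbitrary.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]

# Reference distribution specified in advance, NOT fitted to the sample.
reference = NormalDist(mu=0.0, sigma=1.0)

d = ks_statistic(sample, reference)
print("D =", d, " approx. p-value =", ks_pvalue(d, len(sample)))

# For normal data the Q-Q points fall close to a straight line.
for q, x in qq_points(sample)[:5]:
    print(f"theoretical {q:+.3f}   observed {x:+.3f}")
```

If the mean and standard deviation are instead estimated from the same sample, the p-value computed this way is too large, which is the caveat raised above. Plotting the Q-Q pairs and looking at how the ends of the line bend is a direct way to examine the tails.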

What often happens is that your data set resembles a normal distribution "in the middle" but departs from it in the extremes (the tails). Sadly, that is often the region in which you have the most interest.

Good luck with your investigations.
 
Nov4-09, 10:32 AM   #7
 
A problem with the Shapiro-Wilk test and some other tests is that they take the normal distribution as the null hypothesis and then check whether the data give a p-value low enough to reject it. This is an issue because, if you have a lot of data points, it is easy to reject the null of normality even when the departure from normality is small. It is a broader issue with significance testing in general: with a really large sample size you'll find all sorts of statistically significant relationships in the data. This is one reason why people often just inspect the data visually.
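The sample-size effect can be sketched with a small simulation. Shapiro-Wilk is awkward to implement by hand, so this sketch swaps in the Jarque-Bera test, another test with normality as the null hypothesis, which needs only the sample skewness and kurtosis. The nearly_normal generator, its parameter a = 0.02, the seed, and the sample sizes are all hypothetical choices for illustration. The chi-square approximation behind the p-value is itself rough for small samples.

```python
import math
import random
import statistics


def jarque_bera(xs):
    """Return (JB statistic, approximate p-value) for a sample."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is approximately chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-jb / 2).
    return jb, math.exp(-jb / 2.0)


def nearly_normal(rng, a=0.02):
    """Hypothetical, very slightly skewed variable: z + a * (z**2 - 1)."""
    z = rng.gauss(0.0, 1.0)
    return z + a * (z * z - 1.0)


rng = random.Random(2)
for n in (100, 1_000, 100_000):
    xs = [nearly_normal(rng) for _ in range(n)]
    jb, p = jarque_bera(xs)
    print(f"n = {n:>7}: JB = {jb:.2f}, approx. p-value = {p:.3g}")
```

The distribution is the same at every sample size, and its departure from normality is tiny. What changes is the power of the test: at the largest n the null is rejected decisively, while at small n the same departure will usually go undetected.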
 
Nov4-09, 10:57 AM   #8
 
Recognitions:
Homework Helper
Quote by wvguy8258
A problem with the Shapiro-Wilk test and some other tests is that they take the normal distribution as the null hypothesis and then check whether the data give a p-value low enough to reject it. This is an issue because, if you have a lot of data points, it is easy to reject the null of normality even when the departure from normality is small. It is a broader issue with significance testing in general: with a really large sample size you'll find all sorts of statistically significant relationships in the data. This is one reason why people often just inspect the data visually.
The comment about the downsides of the Shapiro-Wilk test, and of tests in general, is valid. But while "This is one reason why people often just inspect the data visually" may be true, it's an incredibly bad thing to do. Again, most data is "normal in the middle" with problems in the tails. With histograms being so unreliable and yet so commonly used, the "assumption" of normality is made more often than it should be.

"This is one reason why people should use robust methods" would be a better comment.
 
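As one small illustration of what a robust method can look like, the sketch below compares the mean and standard deviation with the median and the median absolute deviation (MAD) on a data set containing a single extreme value. The data values are made up, and the choice of median and MAD is an assumption about what "robust methods" might mean here; it is only one of many options. The factor 1.4826 rescales the MAD so that it estimates the standard deviation when the data are normal.

```python
import statistics


def mad(xs, scale=1.4826):
    """Median absolute deviation, scaled to estimate the sd for normal data."""
    med = statistics.median(xs)
    deviations = [abs(x - med) for x in xs]
    return scale * statistics.median(deviations)


# Made-up measurements; the last value is a deliberate outlier in the tail.
values = [69.8, 70.4, 71.0, 70.1, 69.5, 70.7, 70.2, 120.0]

print("mean  :", statistics.fmean(values), "  sd        :", statistics.stdev(values))
print("median:", statistics.median(values), "  scaled MAD:", mad(values))
```

The single outlier pulls the mean and inflates the standard deviation substantially, while the median and MAD barely move. That insensitivity to tail behavior is what makes such estimators useful when normality holds only "in the middle".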