# How to tell if data is normally distributed?

## Main Question or Discussion Point

Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks

EnumaElish
Homework Helper

One characteristic of the normal distribution is that its mean, mode, and median are all equal, and it is unimodal. I'd simply check each of these and see whether the numbers agree, though I'm not sure whether they have to match to the nearest tenth. For example, I think that if the mode is 71, the mean is 70.6, the median is 71.2, and 71 is the only mode, then the data would be considered normally distributed.

I know you've probably already figured this out, but I'm adding my comment in case someone else has the same problem. Or maybe I'm completely wrong on this, and someone can correct me.
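As a quick sanity check on this rule, here is a minimal sketch that compares the mean and median of a sample that is clearly not normal. The uniform data, the seed, and the sample size are my own assumptions for illustration, not anything from the thread. The mode is left out because, for continuous data, it depends on how the values are binned.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: 10,000 draws from a uniform distribution on [0, 100].
# This is symmetric, but flat rather than bell shaped.
uniform_sample = [rng.uniform(0, 100) for _ in range(10_000)]

mean = statistics.fmean(uniform_sample)
median = statistics.median(uniform_sample)

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"gap    = {abs(mean - median):.2f}")
```

The mean and median land close together even though the data are not normal, so agreement between them is at best a necessary condition, not a test of normality.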

For normally distributed data, the skewness should be zero and the kurtosis should be equal to 3.

Hope it helps.
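For anyone who wants to compute these two numbers, here is a rough sketch using the usual moment definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2). The function name and the simulated data are assumptions made up for this example. Note that some libraries, for instance `scipy.stats.kurtosis` with its default settings, report excess kurtosis (kurtosis minus 3), in which case the normal reference value is 0 rather than 3.

```python
import random

def sample_skewness_kurtosis(data):
    """Return (skewness, kurtosis) from central moments; a normal model gives (0, 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical data: 2,000 draws from a normal distribution (mean 70, sd 5).
normal_sample = [rng.gauss(70, 5) for _ in range(2000)]

skewness, kurtosis = sample_skewness_kurtosis(normal_sample)
print(f"skewness = {skewness:.3f}")
print(f"kurtosis = {kurtosis:.3f}")
```

Even for data that really are drawn from a normal distribution, the sample values come out near 0 and 3 rather than exactly equal to them.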

Homework Helper
The conditions mean = median = mode, skewness = 0, and kurtosis = 3 are very unlikely to hold exactly for real data. The normal distribution is an idealized model that describes general characteristics very well, but rarely (I would argue never) is exactly correct.

The tests typically allow you to conclude only that your data "isn't significantly different" from what you would expect under the normal model. Histograms are decidedly poor as an aid, since too much depends on the choice of bin width (and hence the number of bins) and on the sample size.

You might look at the Kolmogorov-Smirnov test (http://mathworld.wolfram.com/Kolmogorov-SmirnovTest.html),
which compares your sample's empirical distribution function to that of a normal distribution, although it works best when the mean and standard deviation are specified in advance rather than estimated from the sample values.
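To make the idea concrete, here is a minimal sketch of the Kolmogorov-Smirnov statistic using only the Python standard library. The function names, the simulated samples, and the hypothesized parameters are assumptions for illustration. The cutoff 1.358 / sqrt(n) is the usual large-sample approximation to the 5% critical value, and it applies when the mean and standard deviation are fixed in advance. If SciPy is available, `scipy.stats.kstest` gives a p-value directly.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(sample, mu, sigma):
    """Largest gap between the empirical CDF and the N(mu, sigma) CDF."""
    model = NormalDist(mu, sigma)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = model.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def ks_critical_5pct(n):
    """Large-sample approximation to the 5% critical value."""
    return 1.358 / math.sqrt(n)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500
normal_sample = [rng.gauss(70, 5) for _ in range(n)]      # hypothetical normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(n)]  # hypothetical skewed data

# Parameters are treated as known beforehand, not estimated from the samples.
d_normal = ks_statistic(normal_sample, 70, 5)
d_skewed = ks_statistic(skewed_sample, 1, 1)

print(f"critical value (5%): {ks_critical_5pct(n):.4f}")
print(f"D, normal sample:    {d_normal:.4f}")
print(f"D, skewed sample:    {d_skewed:.4f}")
```

A statistic above the critical value means the sample differs significantly from the hypothesized normal distribution; a statistic below it only means no significant difference was detected.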
Q-Q plots (quantile-quantile plots) are a useful visual tool.
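Here is a rough sketch of how the points of a normal Q-Q plot can be computed. The function names and the simulated data are assumptions for illustration, and the plotting position (i - 0.5) / n is just one common convention. The optional `plot_qq` helper assumes matplotlib is installed and is not called here.

```python
import random
from statistics import NormalDist

def qq_points(sample):
    """Pair each sorted observation with the matching standard normal quantile."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common choice of plotting position.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, xs

def plot_qq(sample):
    """Optional: draw the plot (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt
    theoretical, observed = qq_points(sample)
    plt.scatter(theoretical, observed, s=8)
    plt.xlabel("Standard normal quantiles")
    plt.ylabel("Sample quantiles")
    plt.show()

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical data: 200 draws from a normal distribution (mean 70, sd 5).
sample = [rng.gauss(70, 5) for _ in range(200)]
theoretical, observed = qq_points(sample)

# Show the extreme and central points; the tails are where departures show up.
for i in (0, len(sample) // 2, len(sample) - 1):
    print(f"theoretical = {theoretical[i]: .3f}, observed = {observed[i]:.3f}")
```

If the data are roughly normal, the points fall close to a straight line whose slope and intercept reflect the standard deviation and mean; curvature at the ends points to problems in the tails.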

What often happens is that your data set resembles a normal distribution "in the middle", but problems show up in the extremes (tails). Sadly, that's often the region in which you have the most interest.

A problem with Shapiro-Wilk and some other tests is that they take normality as the null hypothesis and then check whether the data give a p-value low enough to reject it. This is an issue because, with a lot of data points, even tiny departures from normality are enough to reject the null. It is a broader problem with significance testing in general: with a really large sample size, you'll find all sorts of "significant" relationships in the data. This is one reason why people often just inspect the data visually.
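The sample-size effect is easy to see in a small simulation. The sketch below does not use Shapiro-Wilk; as a stand-in that needs only the standard library, it uses the Jarque-Bera test, which is built from skewness and kurtosis and has an approximate chi-square distribution with 2 degrees of freedom under normality (so the p-value is exp(-JB/2), and only approximate for small samples). The mixture distribution, the function names, and the sample sizes are assumptions chosen for illustration.

```python
import math
import random

def jarque_bera_pvalue(data):
    """Approximate p-value of the Jarque-Bera normality test."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Upper tail of a chi-square distribution with 2 degrees of freedom.
    return math.exp(-jb / 2)

def nearly_normal(rng, n):
    """98% N(0, 1) mixed with 2% N(0, sd 2): a mild, hypothetical departure."""
    return [rng.gauss(0, 2 if rng.random() < 0.02 else 1) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = {}
for n in (100, 1_000, 100_000):
    p_values[n] = jarque_bera_pvalue(nearly_normal(rng, n))
    print(f"n = {n:>7}: p-value = {p_values[n]:.3g}")
```

The underlying distribution is the same in every run; only the sample size changes. At the largest size the test rejects normality decisively, even though the departure is small enough that it may not matter in practice.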