How to tell if data is normally distributed?

  • Context: Undergrad 
  • Thread starter: jimmy1
  • Tags: Data, Distributed

Discussion Overview

The discussion centers on methods for determining if a dataset is normally distributed. Participants explore formal statistical tests, visual assessments, and the characteristics of normal distribution, addressing both theoretical and practical aspects of normality testing.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants suggest using formal tests for normality, such as the Kolmogorov-Smirnov test and Shapiro-Wilk test, while noting the limitations of these methods.
  • Others argue that characteristics like mean, median, and mode being equal, along with skewness and kurtosis values, are not reliable indicators for real data.
  • A participant mentions that visual inspections, such as histograms and Q-Q plots, can be misleading due to sample size and bin width choices.
  • Concerns are raised about the implications of having large sample sizes, which may lead to rejecting the null hypothesis of normality too easily.
  • There is a discussion about the tendency for data to appear normal in the center but exhibit issues in the tails, which are often of greater interest.
  • Some participants express skepticism about relying solely on visual methods for assessing normality, suggesting that robust statistical methods should be preferred.

Areas of Agreement / Disagreement

Participants express differing views on the reliability of various methods for assessing normality, with no consensus reached on the best approach. There is acknowledgment of the limitations of both formal tests and visual inspections.

Contextual Notes

Limitations include the dependence on sample size, the choice of bin width in histograms, and the potential for misleading results when using visual assessments. The discussion highlights the complexity of determining normality in real-world data.

jimmy1
Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks
 
I know one characteristic of the normal distribution is that the mean, mode, and median are all equal, and that it is unimodal. I'd simply check each of these and see whether the numbers agree. I'm not sure how closely they have to match, though. For example, I think if the mode is 71, the mean is 70.6, the median is 71.2, and 71 is the only mode, then the data would be considered normally distributed.

I know you have probably already figured this out, but I'm adding my comment in case someone else has the same problem. Or maybe I'm completely wrong on this, and someone can correct me.
 
jimmy1 said:
Is there a formal way of telling if my data is normally distributed?
I know I could plot a histogram for the data, and see if it follows a bell shaped curve, but I need something a lot more formal than this.
Is there a way to do it?
Thanks

For normally distributed data:
skewness should be zero
kurtosis should be equal to 3 (that is, excess kurtosis of 0)

Hope this helps.
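To make the skewness and kurtosis check concrete, here is a minimal sketch using only the Python standard library. The function name `sample_skewness_kurtosis` and the simulated data are assumptions for illustration; the estimators are the simple moment-based ones (dividing by n), and the kurtosis is the non-excess kind, so 3 is the normal reference value.

```python
import random
from statistics import fmean

def sample_skewness_kurtosis(data):
    """Return (skewness, kurtosis) from simple moment estimators.

    Kurtosis is the non-excess kind, so a normal distribution
    has a theoretical value of 3.
    """
    n = len(data)
    mean = fmean(data)
    # Central moments of order 2, 3 and 4 (dividing by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical data: 5000 draws from a standard normal, fixed seed.
rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
skew, kurt = sample_skewness_kurtosis(normal_sample)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

Even for simulated normal data the printed values will only be close to 0 and 3, not equal to them, so these numbers are a rough screen rather than a test.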
 
The conditions mentioned above (mean = median = mode, skewness = 0, kurtosis = 3) are very unlikely to hold exactly for real data. The normal distribution is an idealized model that describes general characteristics very well, but rarely (I would argue never) is exactly correct.

The tests typically allow you to conclude only that your data "isn't significantly different" from what you would expect under the normal model. Histograms are decidedly poor as an aid, since too much depends on the choice of bin width (and so the number of bins) and on the sample size.
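As a small illustration of the bin-width point, the sketch below bins the same made-up sample with two different widths. The helper `histogram_counts` and the data values are hypothetical, chosen only so the effect is easy to see.

```python
from collections import Counter
from math import floor

def histogram_counts(data, bin_width, origin=0.0):
    """Map each bin's left edge to the number of points that fall in it."""
    counts = Counter(
        origin + floor((x - origin) / bin_width) * bin_width for x in data
    )
    return dict(sorted(counts.items()))

# Hypothetical sample with a slight dip in the middle.
data = [0.5, 1.6, 1.8, 2.0, 2.2, 2.4, 3.6, 3.8, 4.0, 4.2, 4.4, 5.5]

for width in (1.0, 2.0):
    print(f"bin width {width}:")
    for left_edge, count in histogram_counts(data, width).items():
        print(f"  [{left_edge:.1f}, {left_edge + width:.1f}) {'#' * count}")
```

With width 1.0 the counts are 1, 2, 3, 2, 3, 1, which looks two-humped; with width 2.0 they are 3, 5, 4, which looks like a single hump. Same data, different impression.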

You might look at the Kolmogorov-Smirnov test (http://mathworld.wolfram.com/Kolmogorov-SmirnovTest.html),
which compares your sample's empirical distribution to a normal distribution, although it works best when you don't estimate the mean and standard deviation from the sample values.
Q-Q plots (quantile-quantile plots) are a useful visual tool.
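Here is a rough sketch of both ideas using only the Python standard library (`statistics.NormalDist`). The function names and the simulated data are assumptions for illustration. The p-value uses Kolmogorov's asymptotic series, which is only appropriate when the reference mean and standard deviation are fixed in advance rather than fitted to the same data.

```python
import random
from math import exp, sqrt
from statistics import NormalDist

def ks_statistic(data, dist):
    """Largest gap between the empirical CDF of data and dist.cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_asymptotic_p_value(d, n, terms=100):
    """Asymptotic p-value; assumes a fully specified reference distribution."""
    lam = sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the p-value is essentially 1 for such small statistics
    series = sum(
        (-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * series))

def qq_points(data):
    """Pairs (standard normal quantile, sample value) for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# Hypothetical data; the reference N(10, 2) is chosen in advance, not fitted.
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
d = ks_statistic(sample, NormalDist(10.0, 2.0))
print(f"D = {d:.4f}, p = {ks_asymptotic_p_value(d, len(sample)):.3f}")
print("lowest Q-Q points:", qq_points(sample)[:3])
```

If the points from `qq_points` fall close to a straight line when plotted, the sample quantiles track the normal quantiles; curvature at either end points to trouble in the tails.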

What often happens is that your data set resembles a normal distribution "in the middle", but problems occur in the extremes (the tails). Sadly, that's often the region in which you have the most interest.

Good luck with your investigations.
 
A problem with Shapiro-Wilk and some other tests is that they set the normal distribution as the null hypothesis and then see whether the data give a p-value low enough to reject it. This is an issue because, with a lot of data points, it is easy to reject the null of normality. It is a bigger issue with significance testing in general: if you have a really large sample size, you'll find all sorts of relationships in the data. This is one reason why people often just inspect the data visually.
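A back-of-the-envelope sketch of the sample-size effect, using the Kolmogorov-Smirnov statistic rather than Shapiro-Wilk because its large-sample critical value has a simple closed form. The fixed gap of 0.02 between the true CDF and the normal CDF is a made-up number, and treating the observed statistic as equal to that gap ignores sampling noise, so this is an idealization.

```python
from math import log, sqrt

def ks_critical_value(n, alpha=0.05):
    """Approximate large-sample critical value for the one-sample KS statistic."""
    return sqrt(-0.5 * log(alpha / 2.0)) / sqrt(n)

# Hypothetical, practically negligible departure from normality.
true_gap = 0.02

for n in (100, 1_000, 10_000, 100_000):
    crit = ks_critical_value(n)
    verdict = "reject normality" if true_gap > crit else "fail to reject"
    print(f"n = {n:>7}: critical D = {crit:.4f} -> {verdict}")
```

The critical value shrinks like 1/sqrt(n), so the same small departure goes from undetectable at n = 100 to "significant" at n = 10,000.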
 
wvguy8258 said:
A problem with Shapiro-Wilk and some other tests is that they set the normal distribution as the null hypothesis and then see whether the data give a p-value low enough to reject it. This is an issue because, with a lot of data points, it is easy to reject the null of normality. It is a bigger issue with significance testing in general: if you have a really large sample size, you'll find all sorts of relationships in the data. This is one reason why people often just inspect the data visually.

The comment about the downsides of the Shapiro-Wilk test, and of tests in general, is valid. But while "This is one reason why people often just inspect the data visually" may be true, it's an incredibly bad thing to do. Again, most data is "normal in the middle" with problems in the tails. Because histograms are unreliable yet so commonly used, normality is assumed more often than it should be.

"This is one reason why people should use robust methods" would be a better comment.
 
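As one small illustration of what "robust" can mean here, the sketch below compares the mean and standard deviation with the median and the median absolute deviation (MAD) when a single tail value is replaced by an outlier. The data and the helper `mad` are made up for the example, and this is only one of many robust approaches.

```python
from statistics import fmean, median, stdev

def mad(data):
    """Median absolute deviation, scaled by 1.4826 so that it
    estimates the standard deviation when the data are normal."""
    med = median(data)
    return 1.4826 * median([abs(x - med) for x in data])

# Hypothetical measurements, then the same set with one tail outlier.
clean = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 9.9, 10.1]
contaminated = clean[:-1] + [25.0]

for label, values in (("clean", clean), ("contaminated", contaminated)):
    print(
        f"{label:>12}: mean = {fmean(values):.2f}, sd = {stdev(values):.2f}, "
        f"median = {median(values):.2f}, MAD = {mad(values):.2f}"
    )
```

The mean and standard deviation move a lot with the single outlier, while the median and MAD barely change, which is why they are less sensitive to tails that are not normal.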
