Skew & Kurtosis: Weighting Significance

  • Thread starter kimberley
  • #1
kimberley
Hello everyone. This is my first post to the forum and I'm pleased to be a member.

ISSUE

I have 97 samples of different sizes (from n = 4 to n = 100). All of these samples are drawn from the same population, and the population data are NOT normally distributed.

Although the population data are not normally distributed, I want to determine which of these 97 samples comes closest to a normal distribution. I have calculated the skew and kurtosis for each sample. I then squared the results so that every value is positive, ranked the squared skew from smallest to largest, and did the same for the squared kurtosis. I concluded that the sample with the smallest squared skew and the smallest squared kurtosis most closely approximates a normal distribution. Something tells me, however, that this ranking system is inadequate, because a skew of .01 with a sample size of 9 seems less significant than a skew of .07 with a sample size of 66. Similarly, a kurtosis of 0 with a sample size of 12 seems like it might be less significant than a kurtosis of .09 with a sample size of 40.

Obviously, I'd like to come up with a way to rank the skew and kurtosis of each sample by weighting them somehow. How would you go about weighting them to determine which sample is the best relative representation of a normal distribution? Thank you in advance.

Kimberley
 
  • #2
kimberley said:
I have calculated the skew & kurtosis for each sample. I then squared the results to make each value of skew & kurtosis a positive number, and then I ranked the skew from smallest to largest and I did the same for the kurtosis. I concluded that the sample with the smallest squared skew and the smallest squared kurtosis best represents the sample that most closely approximates a normal distribution.
Obviously you were using your judgment here. There are formal tests of normality based on skewness and kurtosis, e.g. the Jarque–Bera (JB) test.

Something tells me, however, that this ranking system is inadequate because a skew of .01 where sample size equals 9 seems less significant than a skew of .07 where the sample size equals 66. Similarly, a kurtosis of 0 where sample size equals 12 seems like it might be less significant than where kurtosis is .09 and sample size equals 40.

Obviously, I'd like to come up with a way to rank the skew and kurtosis of each sample by weighting them somehow. How would you go about weighting them to determine which sample is the best relative representation of a normal distribution?
JB test takes the sample size into account; I guess you could look at the sample size as a weight.
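For reference, the Jarque–Bera statistic in its usual form, where n is the sample size, S the sample skewness and K the sample kurtosis (the raw kurtosis, which equals 3 for a normal distribution):

```latex
\mathrm{JB} = \frac{n}{6}\left( S^2 + \frac{(K-3)^2}{4} \right)
```

Under the null hypothesis of normality, JB is asymptotically chi-squared distributed with 2 degrees of freedom, so larger values are stronger evidence against normality.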
 
  • #3
EnumaElish et al.,

Thank you. This is precisely what I was looking for and I really like the JB Test because it combines skew, kurtosis and sample size all into one. I PM'd EE with some follow-up, but thought it best to post it here as well for ease of response and edification.

While the JB Test is clearly what I was looking for, I am a bit unclear on the formula and want to shore up my understanding.

1. Am I correct that a lower JB statistic (e.g., .14) indicates greater relative normality for a sample than a JB statistic of (e.g., 1.7) for another sample?

2. If the answer to 1. is "Yes", and the first part of the formula places your sample size as the numerator over 6, doesn't this penalize larger sample sizes? In other words, if your sample size is n=100, the first part of the equation is 100/6 whereas a sample size of n=20 would only be 20/6, almost guaranteeing that n=20 is going to have a lower JB statistic as compared to n=100.

3. Is S^2 in the formula simply the skew of each sample, squared? (Abundance of caution here; please excuse the poli sci background.)

4. MOST SIGNIFICANTLY, in the part of the formula, (K-3)^2/4, don't most basic spreadsheet calculators like MS Excel and OpenOffice already account for the -3 when calculating Kurtosis and, therefore, the -3 is redundant? If so, shouldn't that part of the formula simply be K^2/4?

Thank you in advance for any further guidance that you can give me.

Kimberley
 
  • #4
kimberley said:
1. Am I correct that a lower JB statistic (e.g., .14) indicates greater relative normality for a sample than a JB statistic of (e.g., 1.7) for another sample?
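It indicates less evidence against normality. Strictly speaking, the JB statistic does not give the probability that a sample is normal; a smaller statistic corresponds to a larger p-value, so normality is less likely to be rejected.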
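To make this concrete: under the null hypothesis of normality, JB is asymptotically chi-squared with 2 degrees of freedom, and that distribution's survival function is exactly exp(-x/2). A minimal sketch (the function name `jb_p_value` is hypothetical) converting the two example statistics into approximate p-values:

```python
import math


def jb_p_value(jb: float) -> float:
    """Approximate p-value for a Jarque-Bera statistic.

    Uses the asymptotic chi-squared(2) distribution, whose survival
    function is exp(-x / 2). In small samples this is only approximate.
    """
    if jb < 0:
        raise ValueError("JB statistic cannot be negative")
    return math.exp(-jb / 2.0)


for jb in (0.14, 1.7):
    print(f"JB = {jb} -> p ~ {jb_p_value(jb):.3f}")
```

Here JB = .14 gives p of roughly 0.93 and JB = 1.7 gives roughly 0.43. Both are well above conventional cutoffs such as 0.05, so neither sample would reject normality; the smaller statistic simply carries less evidence against it.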

2. If the answer to 1. is "Yes", and the first part of the formula places your sample size as the numerator over 6, doesn't this penalize larger sample sizes? In other words, if your sample size is n=100, the first part of the equation is 100/6 whereas a sample size of n=20 would only be 20/6, almost guaranteeing that n=20 is going to have a lower JB statistic as compared to n=100.
I need to think about this. I'd encourage obtaining and reading the original JB articles (see the links at the end of the Wikipedia article).
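On question 2, one way to see why the n/6 factor does not penalize large samples: for normal data, the sample skewness shrinks roughly like 1/sqrt(n) (its variance is about 6/n), so the product n*S^2/6 stays on the same scale whatever n is. A quick simulation sketch using only Python's standard library; the helper names are hypothetical and the moment-based skewness has no bias correction:

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def mean_scaled_skew_sq(n, reps=2000, seed=0):
    """Average of n * S**2 / 6 over `reps` simulated normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(n * sample_skewness(xs) ** 2 / 6.0)
    return statistics.fmean(values)


for n in (20, 100):
    print(n, round(mean_scaled_skew_sq(n), 2))
```

Both averages should land a little below 1, the mean of a chi-squared(1) variable, with the n = 20 value somewhat lower; the scaled term does not grow with n. That small-sample shortfall is one reason the test is only asymptotically exact.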

3. Is S^2 in the formula simply the skew of each sample, squared? (Abundance of caution here; please excuse the poli sci background.)
Yes.

4. MOST SIGNIFICANTLY, in the part of the formula, (K-3)^2/4, don't most basic spreadsheet calculators like MS Excel and OpenOffice already account for the -3 when calculating Kurtosis and, therefore, the -3 is redundant? If so, shouldn't that part of the formula simply be K^2/4?
You can use the "Help" feature of Excel, etc., to see which formula each program uses. If the -3 is already built in, use the program's value directly as K* = K - 3 in place of JB's K - 3, so that part of the formula becomes (K*)^2/4. See this link.
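To illustrate the substitution: Excel's KURT returns excess kurtosis (with a small-sample adjustment), so the -3 is already built in; OpenOffice's KURT appears to follow the same convention, but check its Help page. The sketch below compares the plain moment-based kurtosis minus 3 with an Excel-style KURT, written from our reading of the formula in Excel's documentation, and computes JB directly from excess kurtosis. All function names are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment, dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def moment_skew(xs):
    """Moment skewness S = m3 / m2**1.5, as used in the usual JB formula."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def moment_excess_kurtosis(xs):
    """K - 3, where K = m4 / m2**2 (a normal distribution has K = 3)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def excel_style_kurt(xs):
    """Excess kurtosis with the small-sample adjustment documented for Excel's KURT."""
    n = len(xs)
    if n < 4:
        raise ValueError("KURT needs at least 4 values")
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5  # sample std dev
    total = sum(((x - mean) / s) ** 4 for x in xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction


def jarque_bera(xs):
    """JB = n/6 * (S**2 + K_star**2 / 4), with K_star = K - 3 already subtracted."""
    n = len(xs)
    k_star = moment_excess_kurtosis(xs)
    return n / 6.0 * (moment_skew(xs) ** 2 + k_star ** 2 / 4.0)


data = [1, 2, 3, 4, 5]
print(moment_excess_kurtosis(data), excel_style_kurt(data), jarque_bera(data))
```

For the five values 1 to 5 the moment-based excess kurtosis is -1.3 while the Excel-style value is -1.2: both already have the 3 removed, so neither should have 3 subtracted again. The small gap comes from the sample-size adjustment and shrinks as n grows.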

Note that this is an asymptotic test: the chi-squared reference distribution is only approximately right in finite samples, so the test is more reliable with large samples.
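Putting the pieces together for the original question, here is a minimal sketch that ranks samples by their JB p-value, largest first (least evidence against normality). It assumes each sample is a list of numbers that are not all identical. Given the asymptotic caveat, rankings among very small samples (n = 4, say) should be read loosely. The sample names and values below are made-up placeholders, not data from this thread.

```python
import math


def jb_statistic(xs):
    """Jarque-Bera statistic from moment-based skewness and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    s = m3 / m2 ** 1.5           # skewness
    k_excess = m4 / m2 ** 2 - 3  # excess kurtosis (K - 3)
    return n / 6.0 * (s ** 2 + k_excess ** 2 / 4.0)


def rank_by_normality(samples):
    """Return (name, n, JB, p) tuples, least evidence against normality first."""
    rows = []
    for name, xs in samples.items():
        jb = jb_statistic(xs)
        p = math.exp(-jb / 2.0)  # chi-squared(2) survival function
        rows.append((name, len(xs), jb, p))
    return sorted(rows, key=lambda row: row[3], reverse=True)


samples = {  # made-up placeholder data
    "A": [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.5, 1.8],
    "B": [1.0, 1.1, 1.0, 1.2, 1.1, 5.0, 1.0, 1.1],
}
for name, n, jb, p in rank_by_normality(samples):
    print(f"{name}: n={n}, JB={jb:.3f}, p~{p:.3f}")
```

Ranking by p-value rather than by JB itself puts every sample on the same probability scale; because each sample's own n enters the statistic, this is one way to get the sample-size weighting asked about in the first post.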

See also: these tests.
 

1. What is skewness and kurtosis?

Skewness and kurtosis are statistical measures that describe the shape of a distribution. Skewness measures its asymmetry, while kurtosis measures how heavy its tails are relative to a normal distribution (it is often described as peakedness or flatness, but the tails dominate it).

2. How are skewness and kurtosis calculated?

Skewness is the third standardized moment of a distribution (the third central moment divided by the standard deviation cubed), and kurtosis is the fourth standardized moment (the fourth central moment divided by the standard deviation to the fourth power). For samples, software often applies small-sample bias corrections, so different programs can report slightly different values for the same data.
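In symbols, the plain moment-based sample versions (the ones that appear in the JB formula, before any small-sample correction) are:

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
S = \frac{m_3}{m_2^{3/2}}, \qquad
K = \frac{m_4}{m_2^{2}}, \qquad
\text{excess kurtosis} = K - 3
```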

3. Why are skewness and kurtosis important?

Skewness and kurtosis summarize the shape of a dataset, which matters because many statistical methods assume approximately normal data. Unusually large values can also signal outliers or other unusual data points worth investigating.

4. What is the significance of weighting in skewness and kurtosis?

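Weighting can mean two things here. In this thread it means accounting for sample size when comparing samples: the same skewness is stronger evidence of non-normality in a large sample than in a small one, and the JB statistic builds this in through its n/6 factor. More generally, weighted skewness and kurtosis give some data points more influence than others, which is useful when observations carry different importance, for example frequency counts or survey weights.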
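For the second meaning, a minimal sketch of weighted moment-based skewness and excess kurtosis, assuming non-negative weights that are not all zero and data that are not all identical; the function name is hypothetical. With integer weights it matches the result of repeating each value that many times, which makes a handy check.

```python
def weighted_skew_kurt(xs, ws):
    """Weighted skewness and excess kurtosis from weighted central moments."""
    if len(xs) != len(ws) or not xs:
        raise ValueError("xs and ws must be non-empty and the same length")
    if any(w < 0 for w in ws) or sum(ws) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / total

    def moment(k):
        # Weighted k-th central moment, normalized by the total weight.
        return sum(w * (x - mean) ** k for x, w in zip(xs, ws)) / total

    m2 = moment(2)
    if m2 == 0:
        raise ValueError("all weighted values are identical; shape is undefined")
    return moment(3) / m2 ** 1.5, moment(4) / m2 ** 2 - 3.0


print(weighted_skew_kurt([1, 2, 10], [2, 1, 1]))
print(weighted_skew_kurt([1, 1, 2, 10], [1, 1, 1, 1]))
```

The two calls describe the same data (the value 1 counted twice), so they should print the same pair up to floating-point rounding.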

5. How can skewness and kurtosis be interpreted?

Skewness and kurtosis values are interpreted by sign and magnitude. For skewness, a positive value indicates a right-skewed distribution (a longer right tail), while a negative value indicates a left-skewed distribution. For kurtosis, a normal distribution has a value of 3 (excess kurtosis of 0); values greater than 3 indicate heavier tails than the normal, and values less than 3 indicate lighter tails. A kurtosis of 3 on its own does not prove a distribution is normal. Many programs, including Excel's KURT, report excess kurtosis, so check which convention is in use.
