Undergrad: Understanding the transformation of the skewness formula

SUMMARY

The discussion focuses on the formula for sample skewness, defined as Sk = [n / ((n-1)(n-2))] × [∑ (Xi - X̄)³ / s³]. The term n/[(n-1)(n-2)] corrects for downward bias in small samples, ensuring more accurate skewness calculations. As sample size n increases, this term asymptotically approaches 1/n, simplifying the expression for large datasets. Understanding the relationship between sample moments and their corrections is crucial for accurate statistical analysis.

PREREQUISITES
  • Familiarity with sample moments and their calculations
  • Understanding of the Central Limit Theorem
  • Knowledge of variance calculation and the (n-1) correction
  • Basic proficiency in statistical concepts and formulas
NEXT STEPS
  • Study the derivation of sample skewness corrections
  • Learn about the implications of the Central Limit Theorem on skewness
  • Explore the differences between sample moments and population moments
  • Investigate the behavior of skewness in large datasets
USEFUL FOR

Statisticians, data analysts, and researchers interested in understanding skewness in data distributions and its implications for statistical analysis.

Vital
Hello.

Here is the formula that computes the sample skewness:

##Sk = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^3##

where ##n## is the number of elements in the sample, ##X_i## is the i-th value of the sample (with ##i## running from 1 to ##n##), ##\bar{X}## is the arithmetic mean, and ##s## is the sample standard deviation.

I have two questions about this formula:

1) It is said in the book that the term ##\frac{n}{(n-1)(n-2)}## in the above equation corrects for a downward bias in small samples. What does that mean, and how does the correction happen? For example, if ##n = 5##, then ##\frac{n}{(n-1)(n-2)} = \frac{5}{12} \approx 0.4167##.

I read this as taking only about 42 percent of the second part of the formula, ##\sum \left( \frac{X_i - \bar{X}}{s} \right)^3##. How does that help correct for the downward bias?

2) The book also says that as ##n## becomes large, the expression reduces to the mean cubed deviation:

##Sk \approx \frac{1}{n} \sum \left( \frac{X_i - \bar{X}}{s} \right)^3##

How does this happen mathematically? I don't see it. For example, if ##n = 1000##, how does ##\frac{1000}{999 \times 998}## turn into ##\frac{1}{n}##?

Thank you very much.
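
For concreteness, here is a minimal sketch of the formula as written above (Python with NumPy assumed; the function name and the data values are made up purely for illustration):

```python
import numpy as np

def sample_skewness(x):
    """Sample skewness with the n/((n-1)(n-2)) small-sample correction."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    s = x.std(ddof=1)                     # sample standard deviation (n - 1 in the denominator)
    correction = n / ((n - 1) * (n - 2))  # the correction factor asked about above
    return correction * np.sum(((x - xbar) / s) ** 3)

data = [2.0, 3.0, 5.0, 8.0, 13.0]         # a made-up sample with n = 5
print(sample_skewness(data))
```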
 
This post really needs LaTeX. Consider using:

https://www.physicsforums.com/help/latexhelp/
and also

https://www.codecogs.com/latex/eqneditor.php
- - - -
Let's work backward:

Question 2:

are you familiar with the fact that
##\lim_{n \to \infty} \frac{n}{n-2} = 1##

or as people say
## \frac{n}{n-2} \approx 1##

for large enough ##n##.
- - - -
now consider

##\frac{1}{n-1} \approx \frac{1}{n}##

for large enough ##n## (why?)

putting these together

## \frac{n}{(n-1)(n-2)} = \big(\frac{n}{(n-2)}\big)\big(\frac{1}{n-1}\big) \approx \big(1\big)\big( \frac{1}{n}\big)= \frac{1}{n}##

for large enough ##n## -- i.e. you can consider the asymptotic behavior separately. (If you prefer: take advantage of positivity and consider the limiting effects while working in logspace and then exponentiate your way back at the end.)
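
A quick numerical check of this approximation (a plain-Python sketch, no libraries needed):

```python
# Compare the exact factor n/((n-1)(n-2)) with the 1/n approximation for a few sample sizes.
for n in [5, 30, 100, 1000]:
    exact = n / ((n - 1) * (n - 2))
    approx = 1 / n
    print(f"n = {n:5d}: n/((n-1)(n-2)) = {exact:.6f}, 1/n = {approx:.6f}")
```

For small ##n## the two differ noticeably, but by ##n = 1000## they agree to three significant figures.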
- - - -
An outside-the-scope thought: the rate of convergence here is actually pretty good. In your example, compare ##\frac{1}{1000}## with ##\frac{1000}{(999)(998)}##: they are actually pretty close to each other -- i.e. the first is ##0.001000## and the second is ##0.001003## -- where I rounded to the 6th decimal place.

There are certain results that are asymptotically correct but require exponentially more data to be a valid approximation -- these are things to be suspicious of -- but they don't apply here because convergence is pretty good.
- - - -
Question 1:

Are you familiar with the ##(n-1)## correction used in calculating sample variance? I think you should know this inside and out before considering sample skewness -- (a) because it is simpler and (b) because variance is much more important than skew -- in particular for the Central Limit Theorem, but also because estimates get less and less reliable the further up the moment curve that you go when you have noisy data.

The corrections for skew are the same logic, just one moment higher -- i.e. if you look at the moments involved, it's 3rd moment, 1st moment and 2nd moment. But the first and second moment are not 'pure' -- they are each sample moments and hence there's a data point / degree of freedom being chewed up -- the ## \frac{n}{(n-1)(n-2)}## corrects for that. I'm sure you can dig around and find the exact derivation for these skew corrections, but I don't think it's going to be that insightful. And more to the point: as noted in Question 2, asymptotically these corrections don't matter. Put differently: if you are dealing with small amounts of data, you need to pay attention to this stuff. But if you are dealing with medium or big data, it really doesn't matter.
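
To make the "downward bias in small samples" point concrete, here is a minimal simulation sketch (NumPy and SciPy assumed; the exponential distribution, whose true skewness is 2, is just an illustrative choice). `scipy.stats.skew` with `bias=False` applies the standard small-sample correction, which works out to the same statistic as the corrected formula above; the correction does not remove all of the bias of the ratio, but it pushes the estimate in the right direction:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
n, trials = 5, 100_000            # many small samples of size 5
true_skewness = 2.0               # skewness of the exponential distribution

samples = rng.exponential(size=(trials, n))
raw = skew(samples, axis=1, bias=True)         # no small-sample correction
corrected = skew(samples, axis=1, bias=False)  # with the correction

print("true skewness:           ", true_skewness)
print("average uncorrected skew:", raw.mean())
print("average corrected skew:  ", corrected.mean())
```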
 
The (n-1)(n-2) term corrects for the fact that the expression uses the sample mean, rather than the true mean. Another way of looking at it is that the mathematical expectation of the expression using (n-1)(n-2) is the theoretical value for the given distribution.
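
A small sketch of that point (NumPy assumed), reading "the expression" as the third-moment part ##\frac{n}{(n-1)(n-2)} \sum (X_i - \bar{X})^3##: averaged over many small samples from an exponential(1) distribution, whose third central moment is 2, the corrected expression lands close to 2 while the plain ##\frac{1}{n}## version falls short:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 500_000
true_third_moment = 2.0   # third central moment of the exponential(1) distribution

x = rng.exponential(size=(trials, n))                        # many small samples at once
d3 = np.sum((x - x.mean(axis=1, keepdims=True)) ** 3, axis=1)

corrected = n / ((n - 1) * (n - 2)) * d3   # expectation matches the population moment
uncorrected = d3 / n                       # plain average of cubed deviations

print("true third central moment:", true_third_moment)
print("corrected, averaged:      ", corrected.mean())
print("uncorrected, averaged:    ", uncorrected.mean())
```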
 
StoneTemplePython said:
...

Thank you very much for this truly helpful post. I am sorry I didn't reply earlier. Now I understand the concept; I simply wasn't familiar with these corrections before.
Thank you once again.
 
