Skewness: large sample, but few unique observations

  • Context: Graduate 
  • Thread starter: BigBugBuzz
SUMMARY

This discussion focuses on assessing the skewness of two samples, each containing 100 observations, to determine whether they originate from the same population. The population skewness is defined as γ = Σ(X - μ)³ / (nσ³), and it is estimated from a sample as g = nΣZ³ / ((n-1)(n-2)). The difficulty is that one sample has far fewer unique values than the other, which raises the question of whether the skewness measure is still valid. The reply suggests either normalizing the data and applying parametric methods, or using non-parametric tests such as the Mann-Whitney or Kolmogorov-Smirnov tests to compare the samples.

PREREQUISITES
  • Understanding of skewness and its mathematical definition
  • Familiarity with sample size impact on statistical measures
  • Knowledge of parametric and non-parametric statistical tests
  • Experience with data normalization techniques
NEXT STEPS
  • Research the calculation of skewness and its implications in statistical analysis
  • Learn about the Mann-Whitney U test for comparing two independent samples
  • Explore the Kolmogorov-Smirnov test for assessing the equality of distributions
  • Investigate data normalization methods and their effects on statistical tests
USEFUL FOR

Statisticians, data analysts, and researchers interested in comparing distributions and understanding the impact of sample uniqueness on skewness and statistical validity.

BigBugBuzz
Hi Everyone,

I wonder if anyone can help me here.

Suppose I have two samples with, say, 100 observations in each, and I am not sure if the samples are drawn from the same population.

I wish to determine:

(A) The skewness of the distributions
(B) Whether the skewness of the two distributions is likely to be equal.

The problem is (perhaps it is a problem, but of this I am uncertain) that within one of the distributions, many observations are the same. So, while there may be 100 observations in each sample, the number of unique values in one of the samples is much smaller than in the other.

I realize that the number of observations has an impact on skewness measures, so that a correction must be performed for small sample sizes, but is the fact that there are few UNIQUE values in a sample problematic too? If so, how could I proceed?

Please note that the two distributions are generated by two distinct processes (two settings in a simulation). There is no problem, from the point of view of my theory, that one of these processes constrains the diversity of outcomes, but is there something I must correct for, besides simply sample size?
 
BigBugBuzz said:
Hi Everyone,

I wonder if anyone can help me here.

Suppose I have two samples with, say, 100 observations in each, and I am not sure if the samples are drawn from the same population.

I wish to determine:

(A) The skewness of the distributions
(B) Whether the skewness of the two distributions is likely to be equal.

Skewness is defined as:

[tex]\gamma=\frac{\sum(X-\mu)^{3}}{n\sigma^{3}}[/tex].

Calculating it from a sample, where Z is each observation standardized by the sample mean and the sample standard deviation:

[tex]g=\frac{n\sum Z^{3}}{(n-1)(n-2)}[/tex]

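You can use [tex]SE=\sqrt{6/n}[/tex] as an approximate standard error to calculate the confidence intervals. The approximation assumes a large sample from a normal population.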
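As a rough illustration of the two formulas above, here is a minimal Python sketch. The function names are my own, the 1.96 critical value assumes an approximately normal sampling distribution, and the comparison of two samples simply combines the two approximate standard errors as if the samples were independent. None of this is specific to the original poster's simulation, and the normal-theory standard error may be a poor guide for data with many repeated values.

```python
import math


def sample_skewness(xs):
    """Adjusted sample skewness g = n * sum(Z^3) / ((n - 1)(n - 2)).

    Z = (x - mean) / s, with s the sample standard deviation (n - 1 divisor).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("all observations are identical; skewness is undefined")
    z_cubed = sum(((x - mean) / s) ** 3 for x in xs)
    return n * z_cubed / ((n - 1) * (n - 2))


def skewness_ci(xs, z_crit=1.96):
    """Approximate confidence interval g +/- z_crit * sqrt(6 / n).

    Assumption: large sample from a roughly normal population.
    """
    g = sample_skewness(xs)
    se = math.sqrt(6 / len(xs))
    return g - z_crit * se, g + z_crit * se


def skewness_difference_z(xs, ys):
    """z-score for the difference in skewness of two independent samples.

    Assumption: each estimate has standard error sqrt(6 / n), so the
    difference has standard error sqrt(6 / n_x + 6 / n_y).
    """
    se_diff = math.sqrt(6 / len(xs) + 6 / len(ys))
    return (sample_skewness(xs) - sample_skewness(ys)) / se_diff
```

Calling `skewness_ci(sample)` on each of the two 100-observation samples gives intervals that can be compared by eye, and `skewness_difference_z(sample_a, sample_b)` gives a single number to compare against a normal critical value. Note that a sample with few distinct values is handled without any special correction; the only case the sketch rejects is a sample where every value is the same.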

The problem is (perhaps it is a problem, but of this I am uncertain) that within one of the distributions, many observations are the same. So, while there may be 100 observations in each sample, the number of unique values in one of the samples is much smaller than in the other.

I'm not sure what you mean by "unique values". Do you mean values that only occur once in the sample? You get what you get with random sampling or random generation. If you think your random generator is faulty, that's another question.

Please note that the two distributions are generated by two distinct processes (two settings in a simulation). There is no problem, from the point of view of my theory, that one of these processes constrains the diversity of outcomes, but is there something I must correct for, besides simply sample size?

I don't know what your theory is, but if you want to test whether two "samples" come from the same population, you can either normalize your data and apply standard parametric methods, or use non-parametric methods on the data as it is, such as the Mann-Whitney or Kolmogorov-Smirnov tests.
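To make the non-parametric route concrete, here is a small pure-Python sketch of the two statistics named above. The function names are hypothetical. The permutation p-value is my own choice rather than something proposed in this thread: the usual reference distribution for the Kolmogorov-Smirnov statistic assumes continuous data, and a sample with many repeated values violates that, whereas a permutation test only assumes the pooled observations are exchangeable under the null hypothesis. Library versions of both tests also exist, for example `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp`.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    two empirical CDFs, evaluated at every distinct pooled value."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    for v in sorted(set(a_sorted) | set(b_sorted)):
        # Advance past all observations <= v, so ties are handled together.
        while i < n and a_sorted[i] <= v:
            i += 1
        while j < m and b_sorted[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a: each pair with a > b counts 1,
    each tied pair counts 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_two_sided(a, b):
    """Distance of U from its null expectation n*m/2 (large = extreme)."""
    return abs(mann_whitney_u(a, b) - len(a) * len(b) / 2)


def permutation_p_value(a, b, statistic, n_perm=2000, seed=0):
    """Permutation p-value for a statistic where larger means more extreme.

    Assumption: under the null hypothesis the pooled observations are
    exchangeable, so relabelling them at random is legitimate.
    """
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For example, `permutation_p_value(sample_a, sample_b, ks_statistic)` tests for any difference in distribution, while passing `mann_whitney_two_sided` instead targets a shift in location. Neither test is aimed specifically at skewness, so a small p-value says the samples differ somehow, not that their skewness differs.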
 