Skewness: large sample, but few unique observations

  • Thread starter BigBugBuzz
In summary, the original poster wants to estimate the skewness of two samples of 100 observations each and to judge whether the two skewness values are likely to be equal. They are concerned that one sample contains few unique values and ask whether this affects the results. They also note that the two samples come from different processes (two settings in a simulation) and ask whether anything besides sample size must be corrected for. The reply gives the formula for sample skewness with its small-sample adjustment and an approximate standard error, and suggests non-parametric tests (Mann-Whitney, Kolmogorov-Smirnov) for comparing the two samples.
  • #1
BigBugBuzz
Hi Everyone,

I wonder if anyone can help me here.

Suppose I have two samples with, say, 100 observations in each, and I am not sure if the samples are drawn from the same population.

I wish to determine:

(A) The skewness of the distributions
(B) Whether the skewnesses of the two distributions are likely to be equal.

The problem is (perhaps it is a problem, but of this I am uncertain) that within one of the distributions, many observations are the same. So, while there may be 100 observations in each sample, the number of unique values in one of the samples is much smaller than in the other.

I realize that the number of observations has an impact on skewness measures, so that a correction must be performed for small sample sizes. But is the fact that there are few UNIQUE values in a sample problematic too? If so, how should I proceed?

Please note that the two distributions are generated by two distinct processes (two settings in a simulation). From the point of view of my theory, it is not a problem that one of these processes constrains the diversity of outcomes, but is there something I must correct for besides sample size?
 
  • #2
BigBugBuzz said:
Hi Everyone,

I wonder if anyone can help me here.

Suppose I have two samples with, say, 100 observations in each, and I am not sure if the samples are drawn from the same population.

I wish to determine:

(A) The skewness of the distributions
(B) Whether the skewnesses of the two distributions are likely to be equal.

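Skewness is defined as:

[tex]\gamma=\frac{\sum(X-\mu)^{3}}{n\sigma^{3}}[/tex]

where the sum runs over all n values of the population, [tex]\mu[/tex] is the population mean and [tex]\sigma[/tex] is the population standard deviation.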
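As a minimal sketch (Python, not part of the original reply), the defining formula can be evaluated directly by treating a list of numbers as the whole population, so that μ is its mean and σ its standard deviation with divisor n. The function name `moment_skewness` is illustrative only.

```python
import math


def moment_skewness(xs):
    """Evaluate gamma = sum((x - mu)**3) / (n * sigma**3), treating xs as the
    whole population: mu is the mean and sigma uses divisor n."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mu) ** 3 for x in xs) / (n * sigma ** 3)


print(moment_skewness([1, 2, 3, 4, 5]))   # symmetric values: 0.0
print(moment_skewness([1, 1, 1, 2, 10]))  # long right tail: positive
```

Mirroring the data (negating every value) flips the sign of the result, which is a quick sanity check on any implementation.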

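To calculate it from a sample:

[tex]g=\frac{n\sum Z^{3}}{(n-1)(n-2)}[/tex]

where [tex]Z=(X-\bar{X})/s[/tex], [tex]\bar{X}[/tex] is the sample mean and s is the sample standard deviation (computed with divisor n-1).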
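A hedged Python sketch of the sample formula follows. It also computes the unadjusted moment estimate g1 (divisor n throughout) to show that the small-sample adjustment is a pure rescaling, g = g1·sqrt(n(n-1))/(n-2). The function names and the data list are hypothetical.

```python
import math


def plug_in_skewness(xs):
    """Unadjusted moment estimate g1 = m3 / m2**1.5 (divisor n throughout)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def sample_skewness(xs):
    """Adjusted coefficient g = n * sum(Z**3) / ((n - 1) * (n - 2)),
    where Z = (x - mean) / s and s is the sample standard deviation (divisor n - 1)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n * sum(((x - mean) / s) ** 3 for x in xs) / ((n - 1) * (n - 2))


data = [1, 1, 1, 2, 10]  # hypothetical sample
n = len(data)
# The two printed numbers agree: g = g1 * sqrt(n * (n - 1)) / (n - 2)
print(sample_skewness(data), plug_in_skewness(data) * math.sqrt(n * (n - 1)) / (n - 2))
```

The correction factor shrinks toward 1 as n grows; for n = 100 it is about 1.015, so with 100 observations per sample the adjustment matters little.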

You can use [tex]SE\approx\sqrt{6/n}[/tex], the large-sample standard error of g for a normal population, to calculate approximate confidence intervals.
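The sketch below (Python, standard library only) turns that standard error into a confidence interval and into an approximate z test of equal skewness for two independent samples, which is question (B). It assumes approximately normal parent populations, since both the sqrt(6/n) approximation and the finite-sample formula used by default, sqrt(6n(n-1)/((n-2)(n+1)(n+3))), are derived under normality. The skewness values 0.9 and 0.3 are hypothetical.

```python
import math
from statistics import NormalDist


def skewness_se(n, large_sample=False):
    """Standard error of the sample skewness g under a normal parent population.
    large_sample=True returns the sqrt(6/n) approximation; otherwise the
    finite-sample normal-theory formula is used."""
    if large_sample:
        return math.sqrt(6 / n)
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_ci(g, n, level=0.95):
    """Approximate confidence interval g +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = skewness_se(n)
    return g - z * se, g + z * se


def compare_skewness(g_a, n_a, g_b, n_b):
    """Approximate two-sided z test of H0: equal skewness (independent samples)."""
    se_diff = math.sqrt(skewness_se(n_a) ** 2 + skewness_se(n_b) ** 2)
    z = (g_a - g_b) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical skewness values for two simulation settings, 100 runs each
print(skewness_ci(0.9, 100))
print(compare_skewness(0.9, 100, 0.3, 100))
```

Because these standard errors assume normal data, they may understate the variability of g for heavy-tailed or strongly discrete simulation output; resampling (a bootstrap of g) is one common alternative in that case.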

The problem is (perhaps it is a problem, but of this I am uncertain) that within one of the distributions, many observations are the same. So, while there may be 100 observations in each sample, the number of unique values in one of the samples is much smaller than in the other.

I'm not sure what you mean by "unique values". Do you mean values that occur only once in the sample? You get what you get with random sampling or random generation. If you think your random generator is faulty, that's another question.
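To illustrate that repeated values are simply part of the sample, this Python sketch computes the same sample skewness g from 100 raw observations and from a table of their 3 distinct values with counts. The results agree, so few unique values do not require a separate correction of the formula, although the normal-theory standard error may fit such discrete data poorly. The data and function names are hypothetical.

```python
import math
from collections import Counter


def sample_skewness(xs):
    """Adjusted sample skewness g computed from raw observations."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return n * sum(((x - mean) / s) ** 3 for x in xs) / ((n - 1) * (n - 2))


def skewness_from_counts(counts):
    """Same statistic from a {value: frequency} table: each distinct value
    is weighted by how many times it occurs."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    s = math.sqrt(sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1))
    total = sum(c * ((v - mean) / s) ** 3 for v, c in counts.items())
    return n * total / ((n - 1) * (n - 2))


# Hypothetical sample: 100 observations but only 3 distinct values
raw = [0] * 60 + [1] * 30 + [5] * 10
print(sample_skewness(raw), skewness_from_counts(Counter(raw)))
```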

Please note that the two distributions are generated by two distinct processes (two settings in a simulation). From the point of view of my theory, it is not a problem that one of these processes constrains the diversity of outcomes, but is there something I must correct for besides sample size?

I don't know what your theory is, but if you want to test whether two "samples" come from the same population, you can either transform your data toward normality and apply standard parametric methods, or use non-parametric methods on the data as is, such as the Mann-Whitney or Kolmogorov-Smirnov tests.
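As a rough illustration of one of the tests named above, here is a standard-library Python sketch of the two-sample Kolmogorov-Smirnov test. It computes D, the largest gap between the two empirical CDFs, evaluated at every distinct observed value so that ties are handled, and an asymptotic p-value from the Kolmogorov distribution. The asymptotic p-value is only approximate at n = 100, and with many tied values the KS test tends to be conservative. For real analyses a tested routine such as SciPy's `scipy.stats.ks_2samp` would normally be preferred; the samples below are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D with an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Evaluate both empirical CDFs at every distinct observed value (handles ties).
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in set(a) | set(b))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # the Kolmogorov tail probability is 1 to about 12 digits here
    # Q(lam) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2)
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical samples: one with many distinct values, one with few
a = [0.1 * i for i in range(100)]
b = [round(x) for x in a]
print(ks_two_sample(a, b))
```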
 

What is skewness and why is it important?

Skewness is a measure of the asymmetry of a distribution about its mean. A symmetric distribution, normal or not, has zero skewness, while the sign and size of the skewness describe which tail is longer and by how much. It is important because it summarizes an aspect of a distribution's shape that the mean and variance do not capture, and can provide insights into the underlying data.

How is skewness calculated for a large sample with few unique observations?

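Skewness is calculated the same way regardless of how many distinct values occur, for example with Pearson's moment coefficient of skewness or with Bowley's (quartile) coefficient of skewness. Repeated values are not a special case: each observation enters the calculation once per occurrence, so a value that appears 30 times carries 30 times the weight of a value that appears once. Quartile-based measures such as Bowley's can, however, become undefined when there are so few distinct values that the first and third quartiles coincide.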
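A minimal Python sketch of Bowley's coefficient, (Q3 + Q1 - 2·Q2)/(Q3 - Q1), is shown below. It uses `statistics.quantiles` with the "inclusive" method; quartile conventions differ between packages, so other tools may give slightly different values. The data and the choice to return `None` for the degenerate case are assumptions for illustration.

```python
from statistics import quantiles


def bowley_skewness(xs):
    """Bowley (quartile) skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).
    Returns None when Q1 == Q3, i.e. the middle half of the data is one value."""
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return None
    return (q3 + q1 - 2 * q2) / iqr


print(bowley_skewness([1, 2, 3, 4, 100]))               # 0.0: the extreme value is ignored
print(bowley_skewness([0] * 20 + [1] * 60 + [5] * 20))  # None: Q1 == Q3 == 1
```

Unlike the moment coefficient, Bowley's measure ignores everything outside the quartiles, which makes it robust to extreme values but blind to tail behaviour.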

What does a positive or negative skewness value indicate?

A positive skewness value indicates that the right tail of the distribution is longer or heavier, while a negative skewness value indicates a longer left tail. In other words, a positively skewed distribution stretches further toward higher values than toward lower ones, and a negatively skewed distribution does the reverse.

Can skewness be used to determine the presence of outliers?

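Yes, but only as a rough indicator. Because skewness cubes each deviation from the mean, a single extreme value can dominate it, so a highly skewed distribution may suggest the presence of outliers. A genuinely long-tailed distribution with no anomalous points can produce a similar value, however, so skewness is not a substitute for inspecting the data or using a dedicated outlier method.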
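A small Python sketch shows how strongly one extreme value can drive the moment skewness. The base data (the integers 1 to 20) and the appended value 200 are hypothetical.

```python
import math


def moment_skewness(xs):
    """Moment skewness: mean cubed deviation divided by sigma**3 (divisor n)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return sum((x - mu) ** 3 for x in xs) / (n * sigma ** 3)


base = list(range(1, 21))    # hypothetical symmetric data: skewness 0
with_outlier = base + [200]  # one extreme value appended
print(moment_skewness(base), moment_skewness(with_outlier))
```

The single added point moves the skewness from 0 to about 4.1, which is why a large value warrants a look at the data rather than an automatic conclusion.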

How can skewness be corrected in a dataset?

Skewness can be corrected by applying mathematical transformations, such as logarithmic or square root transformations, to the data. These transformations can help make the data more symmetrical and reduce the skewness. It is important to note that transformations should only be applied if they make sense in the context of the data and the research question.
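As a hedged illustration of the transformation idea, this Python sketch applies a natural logarithm to strictly positive, right-skewed data. The data are hypothetical and chosen as powers of 2 so that their logarithms are exactly evenly spaced; real data will rarely become perfectly symmetric.

```python
import math


def moment_skewness(xs):
    """Moment skewness: mean cubed deviation divided by sigma**3 (divisor n)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return sum((x - mu) ** 3 for x in xs) / (n * sigma ** 3)


raw = [1, 2, 4, 8, 16, 32, 64]       # hypothetical right-skewed, strictly positive data
logged = [math.log(x) for x in raw]  # the logarithm requires x > 0
print(moment_skewness(raw), moment_skewness(logged))
```

Here the skewness drops from about 1.28 to essentially 0. Note that any skewness comparison made after transforming describes the transformed variable, not the original one.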
