Testing/Comparing Distributions

WWGD · Aug 21, 2014

Hi all, I was going over the poll at https://www.physicsforums.com/showthread.php?t=766275, and I was wondering how one would go about testing whether the distribution of PF members' nationalities is "the same" (at some confidence level) as the distribution of the world's population.

Would this be a sort of ANOVA, testing for equality of proportions region by region (e.g., the % of PF members from Asia vs. the % of the world's population in Asia) to decide statistically which (PF region, world region) pairs are equally distributed? Or would it make more sense, by some reasonable standard, to use a goodness-of-fit test, maybe a χ² test with the world's proportions as the expected ones?

I think the χ² test would only tell us about the distributions as a whole; it would not help us decide statistically which regions are similarly distributed and which are not. An ANOVA-style comparison of the differences in proportions (if there is such a thing) would tell us about the differences between the two distributions region by region.

FactChecker · Aug 23, 2014

WWGD said:

I think the χ² test would only tell us about the distributions as a whole; it would not help us decide statistically which regions are similarly distributed and which are not. An ANOVA-style comparison of the differences in proportions (if there is such a thing) would tell us about the differences between the two distributions region by region.

The Chi-squared goodness-of-fit test compares the sample counts with the theoretically expected counts. For an individual region, the binomial distribution should work: use the standard deviation of the binomial to check whether the sample count from that region falls within the confidence interval. The Chi-squared test should also work there. I suspect that the binomial and the Chi-squared tests are identical in that case.
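A minimal Python sketch of that suspicion, using only the standard library and the normal approximation to the binomial (an assumption; an exact binomial test would differ slightly for small counts). The function names and the numbers in the usage lines are hypothetical, not taken from the poll. For one region tested against everyone else, the squared z statistic and the two-cell chi-square statistic come out the same, and so do their p-values.

```python
import math


def binomial_z_test(count: int, n: int, p0: float) -> tuple[float, float]:
    """Normal-approximation test of one region's count against its expected share p0."""
    expected = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))  # standard deviation of Binomial(n, p0)
    z = (count - expected) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided: P(|Z| >= |z|)
    return z, p_value


def one_vs_rest_chi_square(count: int, n: int, p0: float) -> tuple[float, float]:
    """Chi-square goodness of fit with two cells: this region and all others (df = 1)."""
    observed = [count, n - count]
    expected = [n * p0, n * (1 - p0)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with df = 1
    return stat, p_value


# Hypothetical numbers: 40 of 400 respondents from a region expected to hold 25%.
z, p_z = binomial_z_test(40, 400, 0.25)
stat, p_chi = one_vs_rest_chi_square(40, 400, 0.25)
print(round(z * z, 6), round(stat, 6))  # 48.0 48.0
```

The algebra behind it: the "everyone else" cell deviates from its expectation by the same amount in the opposite direction, so (O - E)^2/(np) + (O - E)^2/(n(1 - p)) = (O - E)^2/(np(1 - p)) = z^2.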

WWGD · Aug 23, 2014

Thanks. An issue I was considering: if we used a Chi-squared test and rejected the null hypothesis at a given confidence level, we might lose information, in that there may be individual nationalities that match well, but some outliers (maybe a single one) may lead us to reject the hypothesis. Is there a way of dealing with this?

FactChecker · Aug 23, 2014

That's an interesting question. Suppose the entire sample fails the Chi-squared. You might take out the region with the worst matching percentage and apply the test to the remaining sample. That would be like throwing out an outlier. If that test fails, remove the samples of the next worst region. Continue removing region samples till the remaining data passes a Chi-squared. I don't know what a statistician would think of that process. I would not expect the PF poll to fit the general population numbers well.
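A rough Python sketch of that drop-the-worst-region loop (standard library only; all names and data are hypothetical). Two choices are assumptions rather than anything stated above: "worst" is taken to be the region with the largest contribution (O - E)^2/E to the statistic, and after each removal the remaining world shares are rescaled to sum to 1. The helper `chi2_sf` uses the closed-form chi-square upper tail for integer degrees of freedom. Because each round chooses which region to drop by looking at the same data, the p-values from later rounds are not valid p-values in the usual sense and are best read as descriptive.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X >= x) of a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
    else:
        # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/(3*5) + ...)
        total = math.erfc(math.sqrt(half))
        term = math.sqrt(2 * x / math.pi) * math.exp(-half)
        for i in range(1, (df - 1) // 2 + 1):
            total += term
            term *= x / (2 * i + 1)
    return min(total, 1.0)


def drop_worst_until_fit(names, observed, shares, alpha=0.05):
    """Remove the worst-fitting region until the chi-square GOF test is not rejected.

    Returns (removed names, kept names, final p-value, or None if < 2 regions remain).
    """
    names, observed, shares = list(names), list(observed), list(shares)
    removed = []
    while len(observed) >= 2:
        n = sum(observed)
        total_share = sum(shares)
        expected = [n * s / total_share for s in shares]  # rescale shares to sum to 1
        contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
        p = chi2_sf(sum(contributions), len(observed) - 1)
        if p >= alpha:
            return removed, names, p
        worst = max(range(len(contributions)), key=contributions.__getitem__)
        removed.append(names.pop(worst))
        observed.pop(worst)
        shares.pop(worst)
    return removed, names, None


# Hypothetical data: three regions match their shares exactly, one is an outlier.
removed, kept, p = drop_worst_until_fit(["A", "B", "C", "D"], [25, 25, 25, 100], [0.25] * 4)
print(removed, kept, p)  # ['D'] ['A', 'B', 'C'] 1.0
```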

blue_raver22 · Aug 30, 2014

I would approach this question by first clarifying the specific hypothesis being tested. In this case, it seems that the question is whether the distribution of PF's member nationalities is significantly different from the distribution of the world's population.

To test this hypothesis, I would use a chi-square goodness-of-fit test. This test compares the observed frequency of each category (in this case, the number of members from each region) to the expected frequency based on the overall distribution (in this case, the distribution of the world's population). The test will provide a p-value, which can be used to determine the statistical significance of any differences between the observed and expected frequencies.
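A minimal Python sketch of that test, standard library only. The helper names (`chi2_sf`, `chi_square_gof`) are hypothetical, and so are all the numbers: the counts and world shares below are made-up placeholders to be replaced with the real poll results and population figures. `chi2_sf` computes the chi-square upper tail from its closed form for integer degrees of freedom, so SciPy is not required (where SciPy is available, `scipy.stats.chisquare` does the same job). The sketch also checks the common rule of thumb that every expected count should be at least about 5, since the chi-square approximation becomes unreliable below that.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X >= x) of a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
    else:
        # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/(3*5) + ...)
        total = math.erfc(math.sqrt(half))
        term = math.sqrt(2 * x / math.pi) * math.exp(-half)
        for i in range(1, (df - 1) // 2 + 1):
            total += term
            term *= x / (2 * i + 1)
    return min(total, 1.0)


def chi_square_gof(observed, expected_shares):
    """Pearson chi-square goodness-of-fit test of observed counts against expected shares.

    Returns (statistic, degrees of freedom, p-value, expected counts).
    """
    n = sum(observed)
    total_share = sum(expected_shares)
    expected = [n * s / total_share for s in expected_shares]  # shares rescaled to sum to 1
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df), expected


# Made-up placeholder numbers, NOT the poll's results or real population figures.
regions = ["Africa", "Asia", "Europe", "North America", "Oceania", "South America"]
poll_counts = [10, 60, 120, 150, 15, 25]
world_shares = [0.17, 0.59, 0.09, 0.08, 0.02, 0.05]

stat, df, p, expected = chi_square_gof(poll_counts, world_shares)
if min(expected) < 5:
    print("warning: some expected counts are below 5; the approximation may be poor")
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.3g}")
```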

If the p-value is less than the chosen significance level (usually 0.05), we reject the null hypothesis and conclude that there is a statistically significant difference between the two distributions. If the p-value is greater than the significance level, we fail to reject the null hypothesis: the data do not show a significant difference, though this does not prove that the two distributions are the same.

Additionally, to determine which regions contribute most to any significant difference, I would conduct post-hoc analyses, such as pairwise comparisons or follow-up chi-square tests on specific regions, adjusting the significance level for the number of comparisons made.
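The post-hoc test isn't specified above, so here is one common option, offered as an assumption rather than the poster's stated method: a one-vs-rest test for each region (the binomial / 1-df chi-square test from earlier in the thread) with a Bonferroni correction, i.e. each region is judged at alpha divided by the number of regions. The function name and all numbers are hypothetical placeholders, standard library only.

```python
import math


def per_region_tests(observed, expected_shares, alpha=0.05):
    """One-vs-rest follow-up test per region with a Bonferroni-adjusted threshold.

    Each region's count is compared with Binomial(n, p) via the normal approximation;
    z**2 equals the 1-df chi-square statistic for the table (this region, all others).
    Returns a list of (z, two-sided p-value, significant after Bonferroni) tuples.
    """
    n = sum(observed)
    total_share = sum(expected_shares)
    threshold = alpha / len(observed)  # Bonferroni: split alpha across all regions
    results = []
    for count, share in zip(observed, expected_shares):
        p0 = share / total_share
        z = (count - n * p0) / math.sqrt(n * p0 * (1 - p0))
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((z, p_value, p_value < threshold))
    return results


# Made-up placeholder numbers, NOT the poll's results or real population figures.
regions = ["Africa", "Asia", "Europe", "North America", "Oceania", "South America"]
poll_counts = [10, 60, 120, 150, 15, 25]
world_shares = [0.17, 0.59, 0.09, 0.08, 0.02, 0.05]

for name, (z, p, flagged) in zip(regions, per_region_tests(poll_counts, world_shares)):
    verdict = "significant" if flagged else "not significant"
    print(f"{name:<14} z = {z:+7.2f}  p = {p:.2e}  {verdict}")
```

Bonferroni is conservative; it is used here only because it is the simplest correction to state.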

Overall, using a chi-square goodness-of-fit test would be the most appropriate approach to test for differences in distribution between PF's member nationalities and the world's population.

