# Testing/Comparing Distributions

1. Aug 21, 2014

### WWGD

Hi all, I was going over the poll at https://www.physicsforums.com/showthread.php?t=766275, and I was wondering how one would go about testing whether the distribution of PF members' nationalities is "the same" (up to some confidence level) as the distribution of the world's population.

Would this be a sort of ANOVA on proportions, i.e., comparing each pair (PF region, world region) to decide statistically which pairs are equally distributed (e.g., % of PF members from Asia vs. % of the world's population in Asia)? Or would it make more sense, by some reasonable standard, to use a goodness-of-fit test, maybe a χ^2 with the world's proportions as the expected ones?

I think the χ^2 would just tell us about the distributions in general, but would not help us decide statistically which regions are similarly distributed and which are not, whereas an ANOVA equivalent (if there is one) for differences of proportions would tell us about differences in distribution between regions.

Last edited: Aug 21, 2014

2. Aug 23, 2014

### FactChecker

The chi-squared goodness-of-fit test compares the sample counts with the theoretically expected counts. For an individual region, the binomial distribution should work: use the binomial standard deviation to check whether the sample count from that region falls within the confidence interval around its expected count. The chi-squared test should also work there. I suspect that the binomial and the chi-squared are identical in that case.
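A minimal sketch of both checks in Python, using only the standard library. The region names, counts, and reference proportions are made up for illustration; they are not the actual poll results or real population figures. The helper names (`chi2_statistic`, `binomial_z`, `two_cell_chi2`) are hypothetical. The sketch also checks the "identical" suspicion in one specific sense: for one region versus everything else, the two-cell chi-squared statistic equals the square of the binomial z-score under the normal approximation.

```python
import math

# Hypothetical counts for illustration only; NOT the actual poll results.
observed = {"Asia": 30, "Europe": 80, "North America": 70, "Other": 20}
# Hypothetical reference proportions (sum to 1); NOT real population data.
world_share = {"Asia": 0.60, "Europe": 0.10, "North America": 0.08, "Other": 0.22}

n = sum(observed.values())


def chi2_statistic(obs, probs):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    total = sum(obs.values())
    return sum((obs[k] - total * probs[k]) ** 2 / (total * probs[k]) for k in obs)


def binomial_z(count, total, p):
    """Distance of a count from its binomial mean, in binomial standard deviations."""
    return (count - total * p) / math.sqrt(total * p * (1 - p))


def two_cell_chi2(count, total, p):
    """Chi-squared statistic for one region versus all other regions pooled."""
    return chi2_statistic({"in": count, "out": total - count},
                          {"in": p, "out": 1 - p})


# Overall test statistic; compare it with a chi-squared distribution
# on (number of regions - 1) degrees of freedom.
overall = chi2_statistic(observed, world_share)

# Per-region check: |z| > 1.96 lies outside an approximate 95% interval.
z_scores = {r: binomial_z(observed[r], n, world_share[r]) for r in observed}

print(f"overall chi-squared = {overall:.2f} on {len(observed) - 1} df")
for region, z in z_scores.items():
    print(f"{region}: z = {z:+.2f}, "
          f"two-cell chi-squared = {two_cell_chi2(observed[region], n, world_share[region]):.2f}")
```

Note that the match between the two-cell chi-squared and z squared holds for the normal approximation to the binomial; an exact binomial test can give a slightly different answer, especially when expected counts are small.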

3. Aug 23, 2014

### WWGD

Thanks. An issue I was considering: if we used a chi-squared test and rejected the hypothesis at a given confidence level, we may lose information, in that there may be individual nationalities which do match well, while some outliers (maybe a single one) lead us to reject the hypothesis. Is there a way of dealing with this?

4. Aug 23, 2014

### FactChecker

That's an interesting question. Suppose the entire sample fails the chi-squared test. You might take out the region with the worst-matching percentage and apply the test to the remaining sample. That would be like throwing out an outlier. If that test fails, remove the samples of the next-worst region. Continue removing regions until the remaining data passes a chi-squared test. I don't know what a statistician would think of that process. I would not expect the PF poll to fit the general population numbers well.
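The removal procedure above could be sketched roughly as follows. Several choices here are assumptions, not part of the description: "worst matching" is taken to mean the largest single contribution to the chi-squared statistic, the expected proportions are renormalized over the regions that remain, and the cutoff is a 5% significance level. The function names and the region counts are hypothetical, and `chi2_sf` is a small hand-written tail probability for integer degrees of freedom so the example runs without extra libraries.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail term plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def prune_until_fit(observed, probs, alpha=0.05):
    """Drop the worst-fitting region until the rest passes the chi-squared test.

    Returns (removed regions in order, remaining counts, final p-value),
    with a p-value of None if fewer than two regions are left.
    """
    obs = dict(observed)
    removed = []
    while len(obs) >= 2:
        n = sum(obs.values())
        share = sum(probs[k] for k in obs)  # renormalize over remaining regions
        expected = {k: n * probs[k] / share for k in obs}
        contrib = {k: (obs[k] - expected[k]) ** 2 / expected[k] for k in obs}
        p_value = chi2_sf(sum(contrib.values()), len(obs) - 1)
        if p_value >= alpha:
            return removed, obs, p_value
        worst = max(contrib, key=contrib.get)  # largest chi-squared contribution
        removed.append(worst)
        del obs[worst]
    return removed, obs, None


# Hypothetical data for illustration only; NOT the actual poll results.
counts = {"Region A": 80, "Region B": 62, "Region C": 38, "Region D": 60}
shares = {"Region A": 0.4, "Region B": 0.3, "Region C": 0.2, "Region D": 0.1}

removed, remaining, p_value = prune_until_fit(counts, shares)
print("removed:", removed)
print("remaining:", sorted(remaining))
print("final p-value:", p_value)
```

One caution about reading the result: because the same data is tested repeatedly and regions are dropped based on how badly they fit, the final p-value is not a clean significance level for the regions that remain. It is probably best treated as a descriptive summary of which regions drive the mismatch.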