Testing/Comparing Distributions

In summary: It's possible that some nationalities match the world distribution very well while others don't. For a single region, I suspect that a binomial test would give the same result as the Chi-squared.
  • #1
WWGD
Hi all, I was going over the poll

https://www.physicsforums.com/showthread.php?t=766275, and I was wondering how one would go about testing whether the distribution of PF's member nationalities is "the same" (up to some confidence level) as the distribution of the world's population.

Would this be a sort of ANOVA for proportions, i.e., testing for equality of proportions region by region (e.g., % of PF members from Asia vs. % of the world's population in Asia) and deciding statistically which pairs (PF region, world region) are equally distributed? Or would it make more sense, by some reasonable standard, to use a goodness-of-fit test, maybe a χ^2 with the world's distribution proportions as the expected ones?

I think the χ^2 would just tell us about the distributions in general, but would not help us decide statistically which regions are similarly distributed and which are not, while the ANOVA equivalent (if there is one) for differences of proportions would tell us about differences in distribution between regions.
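The region-by-region comparison of proportions can be sketched as one z statistic per region, (p_hat - p0) / sqrt(p0 (1 - p0) / n). This is a minimal sketch, assuming the world shares p0 are treated as known constants (so only the PF side carries sampling error); the region labels A to D, their counts and their shares are made-up numbers for illustration, not data from the poll.

```python
import math

# Made-up numbers for illustration only (NOT the PF poll or real population data).
pf_counts = {"A": 64, "B": 56, "C": 40, "D": 40}
world_props = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}


def region_z_scores(counts, props):
    """Return {region: z} with z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n).

    p_hat is the PF proportion in the region and p0 the world proportion,
    which is treated as a known constant rather than an estimate.
    """
    n = sum(counts.values())
    scores = {}
    for region, count in counts.items():
        p0 = props[region]
        p_hat = count / n
        scores[region] = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return scores


n_total = sum(pf_counts.values())
for region, z in region_z_scores(pf_counts, world_props).items():
    print(f"region {region}: PF share {pf_counts[region] / n_total:.2f}, "
          f"world share {world_props[region]:.2f}, z = {z:+.2f}")
```

With these invented numbers region C matches exactly (z = 0) and region D stands out (z about +4.71). Each |z| can be compared with 1.96 for a 5% two-sided test of that region alone; when several regions are tested, the significance level should be adjusted for multiple comparisons.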
 
  • #2
WWGD said:
I think the χ^2 would just tell us about the distributions in general, but would not help us decide statistically which regions are similarly distributed and which are not, while the ANOVA equivalent (if there is one) for differences of proportions would tell us about differences in distribution between regions.

The Chi-squared goodness of fit test can test the sample numbers versus the theoretical expected numbers. For an individual region, the binomial distribution should work: use the binomial standard deviation, sqrt(n p (1 - p)), where n is the number of members polled and p is the region's share of the world population, to see whether the sample count from that region falls within the confidence interval around n p. The Chi-squared should also work. I suspect that the binomial and the Chi-squared are identical in that case.
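The suspicion that the binomial and Chi-squared checks agree for a single region can be tested numerically. This sketch uses the normal approximation to the binomial (an exact binomial test can differ slightly), and compares z^2 with the Chi-squared statistic for the two categories "in the region" and "not in the region". The counts (64 of 200 members, region share 40%) are made up for illustration.

```python
import math


def binomial_check(count, n, p0, z_crit=1.96):
    """Normal-approximation check of one region's count against Binomial(n, p0).

    Returns the z score, the approximate 95% interval for the count,
    and whether the observed count falls inside that interval.
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (count - mean) / sd
    interval = (mean - z_crit * sd, mean + z_crit * sd)
    return z, interval, interval[0] <= count <= interval[1]


def two_cell_chi_square(count, n, p0):
    """Chi-squared statistic for the two categories 'in region' vs 'not in region'."""
    observed = (count, n - count)
    expected = (n * p0, n * (1 - p0))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 64 of 200 members from a region holding 40% of the world.
z, interval, inside = binomial_check(64, 200, 0.40)
chi2 = two_cell_chi_square(64, 200, 0.40)
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-squared = {chi2:.3f}")
print(f"approx. 95% interval for the count: {interval[0]:.1f} to {interval[1]:.1f}, inside: {inside}")
```

The agreement is exact, not a coincidence of these numbers: the two-cell statistic is (O - np)^2 / (np) + (O - np)^2 / (n(1 - p)) = (O - np)^2 / (np(1 - p)), which is z^2.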
 
  • #3
Thanks; an issue I was considering was this: if we used a Chi-squared and rejected the null hypothesis at a given significance level, we might lose information, in that there may be individual nationalities which do match well, while some outliers (maybe a single one) lead us to reject the hypothesis. Is there a way of dealing with this?
 
  • #4
That's an interesting question. Suppose the entire sample fails the Chi-squared. You might take out the region with the worst match and apply the test to the remaining sample, rescaling the remaining world proportions so that they sum to 1. That would be like throwing out an outlier. If that test fails, remove the next worst region. Continue removing regions until the remaining data passes a Chi-squared. I don't know what a statistician would think of that process. I would not expect the PF poll to fit the general population numbers well.
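The removal procedure can be sketched as a loop. This is a minimal sketch under assumptions not in the post: "worst match" is taken to mean the region with the largest contribution (O - E)^2 / E, the test uses tabulated 5% critical values, and the counts and shares are made up. Because the dropped regions are chosen by looking at the data, the final test no longer has its nominal 5% error rate.

```python
# Upper 5% critical values of the chi-squared distribution, by degrees of freedom.
# Covers up to 11 regions; extend the table (or use a library) for more.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_contributions(counts, props):
    """Per-region (O - E)^2 / E, with reference shares rescaled to the regions present."""
    n = sum(counts.values())
    total_share = sum(props[r] for r in counts)
    contributions = {}
    for region, observed in counts.items():
        expected = n * props[region] / total_share
        contributions[region] = (observed - expected) ** 2 / expected
    return contributions


def remove_until_fit(counts, props):
    """Drop the worst-matching region until the 5% chi-squared test no longer rejects.

    Returns (removed regions in order, final statistic), or (removed, None)
    if fewer than two regions remain.
    """
    remaining = dict(counts)
    removed = []
    while len(remaining) >= 2:
        contributions = chi_square_contributions(remaining, props)
        stat = sum(contributions.values())
        if stat <= CHI2_CRIT_95[len(remaining) - 1]:
            return removed, stat
        worst = max(contributions, key=contributions.get)
        removed.append(worst)
        del remaining[worst]
    return removed, None


# Made-up data for illustration only.
counts = {"A": 64, "B": 56, "C": 40, "D": 40}
props = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
removed, stat = remove_until_fit(counts, props)
print("removed:", removed, "final chi-squared:", round(stat, 3))
```

Here the full sample gives about 23.47 on 3 degrees of freedom (above 7.815), region D contributes most, and after dropping it the remaining three regions give 1.4 on 2 degrees of freedom.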
 
  • #5


I would approach this question by first clarifying the specific hypothesis being tested. In this case, it seems that the question is whether the distribution of PF's member nationalities is significantly different from the distribution of the world's population.

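To test this hypothesis, I would use a chi-square goodness-of-fit test. This test compares the observed frequency of each category (here, the number of members from each region) with the expected frequency implied by the reference distribution (here, n times the region's share of the world's population, where n is the number of members polled). The statistic is χ^2 = Σ (O_i - E_i)^2 / E_i, which under the null hypothesis approximately follows a chi-square distribution with k - 1 degrees of freedom for k regions, provided the expected counts are not too small (a common rule of thumb is at least 5 per region). The test yields a p-value, which measures how surprising the observed differences would be if the two distributions were the same.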

If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis and conclude that the two distributions differ significantly. If the p-value is above the significance level, we fail to reject the null hypothesis: the data are consistent with the two distributions being the same, but this does not prove that they are identical.
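The test and decision rule above can be sketched with the standard library only. This is a minimal sketch, not a reference implementation: the p-value comes from a hand-written series for the regularized incomplete gamma function, and the four regional counts and world shares are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-squared with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series. Adequate for moderate x;
    far in the tail the subtraction loses relative precision.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the sum t^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= t / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(t) - t - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_gof(observed, expected_props):
    """Chi-squared goodness-of-fit test of observed counts against given proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    if min(expected) < 5:
        print("warning: an expected count is below 5; the chi-squared approximation may be poor")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Made-up counts for four regions and made-up world shares (illustration only).
observed = [64, 56, 40, 40]
world_props = [0.40, 0.30, 0.20, 0.10]
stat, df, p = chi_square_gof(observed, world_props)
print(f"chi-squared = {stat:.2f}, df = {df}, p = {p:.2g}")
print("reject H0 at 5%" if p < 0.05 else "fail to reject H0 at 5%")
```

In practice, `scipy.stats.chisquare(f_obs, f_exp)` performs the same test, with `f_exp` given as expected counts rather than proportions.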

Additionally, to determine which regions contribute most to a significant difference, I would look at each region's contribution (O_i - E_i)^2 / E_i to the statistic, or run follow-up tests for individual regions (for example, a binomial test per region), adjusting the significance level for multiple comparisons.
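The follow-up per-region tests can be sketched as below. The multiple-comparison adjustment here is Holm's step-down procedure, one common choice that the post does not name; the per-region p-values use the normal approximation to the binomial, and the data are made up for illustration.

```python
import math


def per_region_pvalues(counts, props):
    """Two-sided p-value per region from the normal approximation to the binomial."""
    n = sum(counts.values())
    pvals = {}
    for region, count in counts.items():
        p0 = props[region]
        z = (count - n * p0) / math.sqrt(n * p0 * (1 - p0))
        pvals[region] = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pvals


def holm_significant(pvals, alpha=0.05):
    """Holm step-down procedure: regions still significant at family-wise level alpha."""
    ordered = sorted(pvals, key=pvals.get)  # smallest p-value first
    m = len(ordered)
    significant = []
    for i, region in enumerate(ordered):
        if pvals[region] > alpha / (m - i):
            break  # stop at the first region that cannot be rejected
        significant.append(region)
    return significant


# Made-up data for illustration only.
counts = {"A": 64, "B": 56, "C": 40, "D": 40}
props = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
pvals = per_region_pvalues(counts, props)
for region, p in sorted(pvals.items()):
    print(f"region {region}: unadjusted p = {p:.3g}")
print("significant after Holm correction:", holm_significant(pvals))
```

With these numbers region A would pass a 5% test on its own (p about 0.02) but not after the correction, so only region D remains significant.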

Overall, a chi-square goodness-of-fit test is a natural approach for testing whether the distribution of PF's member nationalities differs from that of the world's population.
 

1. What is the purpose of testing and comparing distributions in statistics?

The purpose of testing and comparing distributions in statistics is to determine whether two or more sets of data come from the same underlying distribution, or whether a sample is consistent with a specified reference distribution (as in comparing PF members with the world's population). This is important because it allows us to make inferences and draw conclusions about the population from which the data were collected.

2. What are the common methods used to test and compare distributions?

Some common methods used to test and compare distributions include the Kolmogorov-Smirnov test (for continuous data), the chi-square test (for categorical or binned data), and visual methods such as histograms and box plots. The tests determine whether differences between the distributions are statistically significant; the plots show where the differences lie.
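For continuous data, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes only that statistic (no p-value) for small made-up samples; it would not suit unordered categories such as nationalities, where the chi-square test applies. For p-values, `scipy.stats.ks_2samp` is the usual library route.

```python
def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F1(x) - F2(x)|.

    F1 and F2 are the empirical CDFs; the maximum is attained at one of the
    observed values, so it suffices to walk both sorted samples together.
    """
    a, b = sorted(sample1), sorted(sample2)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every copy of x in both samples (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Made-up samples for illustration only.
print(ks_statistic([1.2, 2.5, 3.1, 4.8], [3.0, 4.1, 5.5, 6.0]))
```

Once one sample is exhausted the gap can only shrink as the other CDF climbs to 1, which is why the loop can stop there.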

3. How do you interpret the results of a distribution comparison test?

The interpretation of the results of a distribution comparison test depends on the specific method used. In general, at the conventional 5% level, a p-value below 0.05 indicates a statistically significant difference between the distributions, while a p-value above 0.05 means no significant difference was detected. A large p-value does not prove that the distributions are the same.

4. What are some factors that can affect the results of a distribution comparison test?

Some factors that can affect the results of a distribution comparison test include sample size, the type of data being compared (e.g. continuous, categorical), and the assumptions of the test being used. It is important to carefully consider these factors when interpreting the results of a distribution comparison test.
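Sample size matters because, for fixed observed proportions, the chi-square statistic grows in proportion to n: multiplying every count by k multiplies Σ (O - E)^2 / E by k. The sketch below shows the same made-up proportions passing at n = 50 and failing at n = 500, using the tabulated 5% critical value for 3 degrees of freedom.

```python
def chi_square_stat(observed, props):
    """Chi-squared goodness-of-fit statistic for counts against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, props))


CRIT_95_DF3 = 7.815  # upper 5% point of chi-squared with 3 degrees of freedom

props = [0.40, 0.30, 0.20, 0.10]   # made-up reference shares
small = [16, 14, 10, 10]           # made-up poll with 50 respondents
large = [10 * c for c in small]    # same proportions, 500 respondents

for label, obs in (("n = 50", small), ("n = 500", large)):
    stat = chi_square_stat(obs, props)
    verdict = "reject" if stat > CRIT_95_DF3 else "do not reject"
    print(f"{label}: chi-squared = {stat:.2f} -> {verdict} at 5%")
```

The statistic goes from about 5.87 to about 58.67. Note also that the smallest expected count in the n = 50 poll is exactly 5, right at the usual rule-of-thumb limit for the chi-square approximation.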

5. Can distribution comparison tests be used for non-parametric data?

Yes. Many distribution comparison tests are non-parametric: they compare distributions without assuming a particular parametric form (such as normality) for the underlying data. The chi-square goodness-of-fit test and the Kolmogorov-Smirnov test are examples. However, it is important to choose a test suited to the type of data being compared, e.g. chi-square for categorical counts and Kolmogorov-Smirnov for continuous measurements.
