Quantify difference between discrete distributions

In summary, the thread discusses methods for quantifying the difference between two discrete distributions. Suggestions include the Kolmogorov-Smirnov (KS) test, the chi-squared test, the Matusita distance, and the sum of squared differences. The correlation coefficient is also proposed, although a later reply points out that it measures only linear association, not statistical distance. The KS distance is the maximum difference between the cumulative probabilities of the two distributions, and its significance is assessed with a p-value. However, many statistical packages assume a continuous null distribution and may return inaccurate p-values for discrete distributions. Ultimately, the best method depends on the situation and what the user is trying to determine.
  • #1
bradyj7
Hello,

I am trying to quantify the difference between two discrete distributions. I have been reading online and there seem to be a few different ways, such as a Kolmogorov-Smirnov test and a chi-squared test.

My first question is which of these is the correct method for comparing the distributions below?

The distributions are discrete distributions with 24 bins.

My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? I'm not sure, but a percentage or a distance perhaps?

I've been told that if you use the Kolmogorov-Smirnov test, a measure of how different the distributions are will be the p-value. Is that correct?

I appreciate any help and comments

Kind Regards

https://dl.dropbox.com/u/54057365/All/phy.JPG
 
  • #2
Note: I am not a statistician. One simple test would be to calculate the correlation coefficient. This is a measure of the statistical difference.
 
  • #3
bradyj7 said:
My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? I'm not sure, but a percentage or a distance perhaps?

I've been told that if you use the Kolmogorov-Smirnov test, a measure of how different the distributions are will be the p-value. Is that correct?

KS and related distances compare the distributions of two random variables X and Y by finding the maximum difference between Prob[X is in A] and Prob[Y is in A] over various sets A. For KS, the sets A are of the form (-inf,b], and for Kuiper they are (a,b]. Both of these distances can be expressed in terms of the cdf, so to visualise them you just need to plot the cumulative probabilities.
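As a rough illustration of the two distances described above, here is a minimal Python sketch. It assumes both distributions are given as lists of probabilities over the same ordered bins (for example the 24 bins in the original question); the function name `ks_and_kuiper` is made up for this example and does not come from any library.

```python
from itertools import accumulate


def ks_and_kuiper(p, q):
    """Return (KS distance, Kuiper distance) for two PMFs on the same ordered bins.

    Hypothetical helper for illustration; p and q are assumed to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Cumulative probabilities Prob[X <= b] at each bin b.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    diffs = [fp - fq for fp, fq in zip(cdf_p, cdf_q)]
    # KS: largest absolute gap between the cdfs, i.e. sets of the form (-inf, b].
    ks = max(abs(d) for d in diffs)
    # Kuiper: largest gap above plus largest gap below, i.e. sets of the form (a, b].
    kuiper = max(max(diffs), 0.0) + max(-min(diffs), 0.0)
    return ks, kuiper


if __name__ == "__main__":
    p = [0.25, 0.5, 0.25]
    q = [0.5, 0.0, 0.5]
    print(ks_and_kuiper(p, q))
```

Both values are distances between 0 and 1, not p-values; how significant a given distance is depends on the sample sizes.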

Now to measure the significance of the observed distance, keep in mind that many statistical packages assume that the null distribution is continuous and may return inaccurate p-values, e.g. in R the standard function ks.test{stats} assumes continuous distributions but ks.test{dgof} allows discrete distributions.
 
  • #4
mathman said:
Note: I am not a statistician. One simple test would be to calculate the correlation coefficient. This is a measure of the statistical difference.

Calculating the correlation coefficient is not a test, nor is it a measure of statistical distance. The simple correlation coefficient is only a measure of linear association between two variables.

The OP may want to look at the Matusita distance.
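For reference, the Matusita distance between two discrete distributions p and q is usually defined as the square root of the sum over bins of (sqrt(p_i) - sqrt(q_i))^2. A minimal Python sketch, assuming both inputs are probability lists over the same bins (the function name is hypothetical):

```python
import math


def matusita_distance(p, q):
    """Matusita distance between two PMFs defined on the same bins.

    Hypothetical helper for illustration. The result ranges from 0
    (identical distributions) to sqrt(2) (no overlapping bins).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    return math.sqrt(
        sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    )


if __name__ == "__main__":
    print(matusita_distance([0.2, 0.3, 0.5], [0.5, 0.3, 0.2]))
```

Dividing the result by sqrt(2) gives the Hellinger distance, which is scaled to lie between 0 and 1.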
 
  • #5
It depends on what is important to you. It's purely subjective. I don't think that there is a standard method. If you tell us what you are trying to determine, we might be better able to help.

Given what we have, I can only guess that you have a model for the situation and are trying to decide whether or not it is a good one. I'm not sure statistics can help you with that. It could help with deciding which of two models fits the data better. For that you could use the sum of the squares of the differences between the model and the data. There isn't a deep reason for this; it is just the standard method that everyone uses, so you may as well use it unless you have some reason not to.
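The sum-of-squares comparison suggested above can be sketched in a few lines of Python. The data and the two candidate models below are invented purely for illustration, and the function name is hypothetical; the model with the smaller sum fits the data better by this criterion.

```python
def sum_squared_differences(model, data):
    """Sum of squared per-bin differences between a model and the data."""
    if len(model) != len(data):
        raise ValueError("model and data must have the same number of bins")
    return sum((m - d) ** 2 for m, d in zip(model, data))


if __name__ == "__main__":
    # Made-up example values, not taken from the thread.
    data = [0.2, 0.5, 0.3]
    model_a = [0.25, 0.5, 0.25]
    model_b = [1 / 3, 1 / 3, 1 / 3]
    print("model A:", sum_squared_differences(model_a, data))
    print("model B:", sum_squared_differences(model_b, data))
```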
 

What is the difference between two discrete distributions?

Two discrete distributions differ when they assign different probabilities to the same outcomes. The difference can show up in the shape of the distributions, the location of the mean and median, and the spread of the data.

How do you quantify the difference between two discrete distributions?

A common way to quantify the difference between two discrete distributions is to compute a distance between their probability mass functions (PMFs) or their cumulative distribution functions (CDFs). Examples include the Kolmogorov-Smirnov statistic, which is computed from the CDFs, and the Jensen-Shannon divergence, which is computed from the PMFs.
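As one concrete instance, the Jensen-Shannon divergence averages the Kullback-Leibler divergences of each distribution from their midpoint mixture. The sketch below assumes two probability lists over the same bins and uses base-2 logarithms, so the result lies between 0 and 1; the function names are chosen for this example only.

```python
import math


def kl_divergence(p, m):
    """Kullback-Leibler divergence KL(p || m) in bits; bins with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two PMFs on the same bins (base 2)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Midpoint mixture; m_i > 0 whenever p_i > 0 or q_i > 0, so the logs are defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    print(jensen_shannon_divergence([0.2, 0.3, 0.5], [0.5, 0.3, 0.2]))
```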

Can you use statistical tests to compare discrete distributions?

Yes. The chi-square test can be used to compare two discrete distributions. A two-sample t-test compares only their means, so it does not detect differences in shape. These tests assess whether the observed differences are statistically significant or plausibly due to chance.
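A minimal sketch of the chi-square statistic for two binned samples, assuming the inputs are raw counts per bin (not probabilities) over the same bins. The function name is hypothetical. Only the statistic and the degrees of freedom are computed here; turning them into a p-value needs a chi-square table or a statistics library.

```python
def chi_square_two_sample(counts_a, counts_b):
    """Chi-square homogeneity statistic and degrees of freedom for two count histograms."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must use the same bins")
    n_a, n_b = sum(counts_a), sum(counts_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must contain at least one observation")
    total = n_a + n_b
    stat = 0.0
    used_bins = 0
    for a, b in zip(counts_a, counts_b):
        bin_total = a + b
        if bin_total == 0:
            continue  # bin empty in both samples: skip it entirely
        used_bins += 1
        # Expected counts if both samples came from the same distribution.
        exp_a = bin_total * n_a / total
        exp_b = bin_total * n_b / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, used_bins - 1


if __name__ == "__main__":
    print(chi_square_two_sample([10, 20], [20, 10]))
```

The usual chi-square approximation is considered unreliable when expected counts in some bins are very small, so sparse bins may need to be merged first.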

Is it possible to visualize the difference between two discrete distributions?

Yes, there are various ways to visualize the difference between two discrete distributions. These include creating histograms or boxplots of the data, plotting the PMFs or the cumulative probabilities of both distributions on the same graph, or using a QQ plot to compare the quantiles of the two distributions.

What factors can influence the difference between two discrete distributions?

The difference between two discrete distributions can be influenced by various factors, including the sample size, the shape of the distribution curves, the location of the mean and median, and the presence of outliers. Additionally, the chosen method for quantifying the difference can also impact the results.
