Quantify difference between discrete distributions


Discussion Overview

The discussion revolves around methods to quantify the difference between two discrete distributions, specifically focusing on statistical tests such as the Kolmogorov-Smirnov test and the chi-squared test. Participants explore how to measure the significance and extent of the differences observed in the distributions, which consist of 24 bins.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about the appropriate method for comparing two discrete distributions, mentioning the Kolmogorov-Smirnov test and the chi-squared test.
  • Another participant suggests calculating the correlation coefficient as a simple test, but later clarifies that it does not measure statistical distance.
  • There is a discussion about the Kolmogorov-Smirnov test, where one participant explains that it compares the maximum difference between the cumulative distribution functions (CDFs) of two random variables.
  • Concerns are raised about the applicability of the Kolmogorov-Smirnov test for discrete distributions, noting that standard statistical packages may return inaccurate p-values for such cases.
  • A participant proposes the Matusita distance as a potential measure for quantifying differences between distributions.
  • Another participant emphasizes the subjective nature of determining what is important in measuring differences and suggests that the context of the model being evaluated is crucial for selecting an appropriate method.
  • There is mention of using the sum of the squares of the differences between a model and the data as a standard method for assessing model fit, although its justification is described as not deeply rooted.

Areas of Agreement / Disagreement

Participants express differing opinions on the best methods for quantifying differences between distributions, with no consensus reached on a single approach. Some methods are challenged or refined, but the discussion remains unresolved regarding the most appropriate statistical techniques.

Contextual Notes

Participants highlight limitations regarding the assumptions of statistical tests, particularly the Kolmogorov-Smirnov test's reliance on continuous distributions, which may not be suitable for the discrete distributions in question.

bradyj7
Hello,

I am trying to quantify the difference between two discrete distributions. I have been reading online and there seem to be a few different approaches, such as the Kolmogorov-Smirnov test and the chi-squared test.

My first question is which of these is the correct method for comparing the distributions below?

The distributions are discrete distributions with 24 bins.

My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? A percentage or a distance, perhaps?

I've been told that if you use the Kolmogorov-Smirnov test, a measure of how different the distributions are will be the p-value. Is that correct?

I appreciate any help and comments

Kind Regards

https://dl.dropbox.com/u/54057365/All/phy.JPG
 
Note: I am not a statistician. One simple test would be to calculate the correlation coefficient. This is a measure of the statistical difference.
 
bradyj7 said:
My second question: it is pretty obvious from looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? A percentage or a distance, perhaps?

I've been told that if you use the Kolmogorov-Smirnov test, a measure of how different the distributions are will be the p-value. Is that correct?

KS and related distances compare the distributions of two random variables X and Y by finding the maximum difference between Prob[X is in A] and Prob[Y is in A] over various sets A. For KS, the sets A are of the form (-inf, b], and for Kuiper they are (a, b]. Both of these distances can be expressed in terms of the CDF, so to visualise them you just need to plot the cumulative probabilities.

Now to measure the significance of the observed distance, keep in mind that many statistical packages assume that the null distribution is continuous and may return inaccurate p-values, e.g. in R the standard function ks.test{stats} assumes continuous distributions but ks.test{dgof} allows discrete distributions.
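Since the KS distance is just the maximum gap between the two CDFs, it can be computed directly from the binned data without any package at all. A minimal sketch in Python/NumPy (the 24-bin counts here are made up for illustration; the thread's actual data is not available):

```python
import numpy as np

# Hypothetical bin counts for two 24-bin discrete distributions.
rng = np.random.default_rng(0)
counts_a = rng.integers(1, 50, size=24)
counts_b = rng.integers(1, 50, size=24)

# Normalise to probabilities, then build the empirical CDFs.
p = counts_a / counts_a.sum()
q = counts_b / counts_b.sum()
cdf_p = np.cumsum(p)
cdf_q = np.cumsum(q)

# KS distance: maximum absolute difference between the two CDFs.
ks_distance = float(np.max(np.abs(cdf_p - cdf_q)))
print(ks_distance)
```

This gives the distance itself (a number between 0 and 1); turning it into a p-value for discrete data is exactly where the caveat above applies, which is why functions like ks.test{dgof} exist.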
 
mathman said:
Note: I am not a statistician. One simple test would be to calculate the correlation coefficient. This is a measure of the statistical difference.

Calculating the correlation coefficient is not a test, nor is it a measure of statistical distance. The simple correlation coefficient measures only the linear association between two variables.

OP may look for Matusita distance.
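For reference, the Matusita distance between two probability vectors p and q is the square root of the sum of squared differences of the square-root probabilities, i.e. sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2). A minimal sketch, with made-up 24-bin distributions standing in for the real data:

```python
import numpy as np

# Hypothetical probability vectors over 24 bins (illustrative only).
p = np.full(24, 1 / 24)               # uniform distribution
q = np.arange(1, 25, dtype=float)
q /= q.sum()                          # linearly increasing distribution

# Matusita distance: sqrt of summed squared differences of sqrt-probabilities.
matusita = float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))
print(matusita)
```

The result is 0 for identical distributions and at most sqrt(2) for distributions with disjoint support, so it behaves like the "distance" the OP asked for.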
 
It depends on what is important to you. It's purely subjective. I don't think that there is a standard method. If you tell us what you are trying to determine, we might be better able to help.

Given what we have, I can only guess that you have a model for the situation and are trying to decide whether or not it is a good one. I'm not sure statistics can help you with that. It could help with deciding which of two models fits the data better. For that you could use the sum of the squares of the differences between the model and the data. There isn't a deep reason for this; it is just the standard method that everyone uses, so you may as well use it unless you have some reason not to.
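The sum-of-squares comparison suggested above is a one-liner; a sketch with hypothetical observed counts and two hypothetical candidate models:

```python
import numpy as np

# Hypothetical observed bin counts and two candidate model predictions.
observed = np.array([5.0, 9.0, 14.0, 10.0, 6.0])
model_a = np.array([6.0, 10.0, 12.0, 10.0, 7.0])
model_b = np.array([4.0, 12.0, 16.0, 7.0, 5.0])

def sse(model, data):
    """Sum of squared differences between model predictions and data."""
    return float(np.sum((model - data) ** 2))

# By this criterion, the model with the smaller SSE fits the data better.
print(sse(model_a, observed), sse(model_b, observed))
```

Here model_a (SSE 7) would be preferred over model_b (SSE 24).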
 
