Quantify difference between discrete distributions
by bradyj7
Tags: difference, discrete, distributions, quantify

#1
Feb 14, 2013, 08:48 AM

P: 122

Hello,
I am trying to quantify the difference between two discrete distributions. I have been reading online and there seems to be a few different ways such as a KolmogorovSmirnov test and a chi squared test. My first question is which of these is the correct method for comparing the distributions below? The distributions are discrete distributions with 24 bins. My second question is that, it pretty obvious looking at the distributions that they will be statistically significantly different, but is there a method to quantify how different they are? I'm not sure, but a percentage or distance perhaps? I've been told that if you use the KolmogorovSmirnov test, a measure of how different the distributions are will be the pvalue. Is that correct? I appreciate any help and comments Kind Regards 



#2
Feb 14, 2013, 03:55 PM

Sci Advisor
P: 5,941

Note: I am not a statistician. One simple approach would be to calculate the correlation coefficient between the two sets of bin frequencies. This gives a measure of how statistically similar the distributions are.
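As a minimal sketch of that suggestion, the two 24-bin frequency vectors can be treated as paired observations and the Pearson correlation computed directly (the bin counts below are made-up illustrative data, not the OP's):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical bin counts for two 24-bin distributions
p = [5, 9, 14, 20, 25, 28, 30, 28, 25, 20, 14, 9,
     5, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]
q = [1, 1, 1, 1, 1, 2, 3, 5, 9, 14, 20, 25,
     28, 30, 28, 25, 20, 14, 9, 5, 3, 2, 1, 1]

r = pearson_r(p, q)  # close to 1 for similar shapes, near 0 or negative otherwise
```

Note that correlation measures whether the bin profiles rise and fall together; it is not a formal test of whether the two samples come from the same distribution.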




#3
Feb 16, 2013, 09:10 PM

P: 523

Now, to measure the significance of the observed distance, keep in mind that many statistical packages assume the null distribution is continuous and may return inaccurate p-values. For example, in R the standard function ks.test{stats} assumes continuous distributions, but ks.test{dgof} allows discrete distributions.
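To illustrate what the KS distance itself measures, here is a minimal Python sketch (the bin counts are invented for illustration) that computes the two-sample KS statistic, i.e. the maximum absolute difference between the two empirical CDFs, directly from binned counts; the R functions mentioned above are what you would use to attach a p-value to it:

```python
from itertools import accumulate

def ks_statistic(counts_a, counts_b):
    """Two-sample Kolmogorov-Smirnov statistic from binned counts:
    the maximum absolute difference between the empirical CDFs."""
    na, nb = sum(counts_a), sum(counts_b)
    cdf_a = [c / na for c in accumulate(counts_a)]
    cdf_b = [c / nb for c in accumulate(counts_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

# Hypothetical 24-bin counts for two samples
obs1 = [3, 5, 8, 12, 18, 22, 25, 22, 18, 12, 8, 5,
        3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
obs2 = [1, 1, 1, 1, 2, 3, 5, 8, 12, 18, 22, 25,
        22, 18, 12, 8, 5, 3, 2, 1, 1, 1, 1, 1]

d = ks_statistic(obs1, obs2)  # a distance in [0, 1]; 0 means identical CDFs
```

The statistic itself is already a distance in [0, 1], which partly answers the OP's second question; the p-value (from ks.test or similar) is a significance measure, not a distance.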



#4
Feb 20, 2013, 12:23 PM

P: 239

The OP may be looking for the Matusita distance.
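For reference, the Matusita distance between two discrete probability distributions p and q is sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2); it is closely related to the Hellinger distance and ranges from 0 (identical distributions) to sqrt(2) (disjoint supports). A minimal sketch, with made-up bin counts:

```python
import math

def matusita_distance(p, q):
    """Matusita distance between two discrete probability distributions:
    sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 ).
    0 for identical distributions, sqrt(2) for disjoint supports."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q)))

def normalize(counts):
    """Convert bin counts to probabilities summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical bin counts for two distributions
p = normalize([10, 20, 40, 20, 10])
q = normalize([40, 20, 10, 20, 10])

d = matusita_distance(p, q)
```

Because it is bounded, the result is easy to interpret as "how far apart" the distributions are, which matches what the OP asked for.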



#5
Feb 21, 2013, 12:13 AM

P: 571

It depends on what is important to you; the choice is largely subjective, and I don't think there is a single standard method. If you tell us what you are trying to determine, we might be better able to help.
Given what we have, I can only guess that you have a model for the situation and are trying to decide whether or not it is a good one. I'm not sure statistics can help you with that directly, but it can help with deciding which of two models fits the data better. For that you could use the sum of the squares of the differences between the model and the data. There isn't a deep reason for this; it is just the standard method that everyone uses, so you may as well use it unless you have some reason not to.
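As a concrete sketch of that model comparison (the observed counts and both candidate models below are invented for illustration), the model with the smaller sum of squared differences is the better fit:

```python
def sum_squared_diff(model, data):
    """Sum of squared differences between model predictions
    and observed data, bin by bin."""
    return sum((m - d) ** 2 for m, d in zip(model, data))

observed = [4, 9, 16, 22, 16, 9, 4]     # hypothetical observed bin counts
model_a = [5, 10, 15, 20, 15, 10, 5]    # candidate model A's predictions
model_b = [10, 10, 10, 20, 10, 10, 10]  # candidate model B's predictions

sse_a = sum_squared_diff(model_a, observed)
sse_b = sum_squared_diff(model_b, observed)
better = "A" if sse_a < sse_b else "B"
```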

