Variance and goodness of fit tests

  • Thread starter: bioman
  • Tags: Fit, Variance
AI Thread Summary
The discussion centers on assessing the goodness of fit of data to an exponential distribution, focusing on the discrepancy between the theoretical variance and the variance of the observed data. Despite a high regression coefficient and a good visual fit in the histogram, the observed variance is consistently almost double the theoretical variance. This raises questions about the reliability of variance as a measure of fit, particularly in light of potential measurement errors affecting the data. The conversation suggests that measurement error could inflate the observed variance, leading to the noted discrepancies. Exploring additional goodness of fit tests is recommended to further validate the model's accuracy.
bioman
I'm trying to see how well my data fit a certain probability distribution (an exponential distribution), and I basically want to know how reliable it is to compare the theoretical variance of the distribution with the variance of the data in order to assess the goodness of fit.

For example, when I plot a histogram of the data and overlay the theoretical distribution, there is an extremely good fit, and this good fit is confirmed by a very high (~0.95) non-linear regression coefficient.

The odd thing, though, is that when I compute the variance of the data, it is completely different from the variance of the theoretical distribution, almost double it every time. Should this be happening, seeing as I get a very good fit with the histogram and the regression?

It's just that I have a very large sample size, ~10,000, so I thought that if everything else fits well, then the variance of the data should match that of the distribution.
So basically, how reliable is the variance as a measure of fit?
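
For concreteness, here is a minimal Python sketch of that kind of comparison; the array `data` and the mean `mu` are hypothetical stand-ins for the real inputs:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for the real inputs: `data` and `mu` are hypothetical here.
rng = np.random.default_rng(0)
mu = 2.0                              # theoretical mean of the exponential
data = rng.exponential(mu, 10_000)    # replace with the actual data set

# Normalized histogram, so the bar heights are comparable to a density.
counts, edges, _ = plt.hist(data, bins=50, density=True, alpha=0.5, label="data")
centers = 0.5 * (edges[:-1] + edges[1:])

# Theoretical exponential density with mean mu: f(x) = (1/mu) exp(-x/mu).
pdf = np.exp(-centers / mu) / mu
plt.plot(centers, pdf, label="exponential pdf")
plt.legend()

# One way to get a "regression coefficient" for the overlay: R^2 of the
# bin heights against the theoretical density at the bin centers.
ss_res = np.sum((counts - pdf) ** 2)
ss_tot = np.sum((counts - counts.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.3f}")
print(f"sample variance = {data.var(ddof=1):.2f}  vs  mu^2 = {mu**2:.2f}")
plt.show()
```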
 
What is the model you are estimating? Are you estimating Pr(x < z) = 1 - exp(-λz) as a function of z, and then testing λ = 1 (or testing λ = λ*)? If not, why not?
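
A rough sketch of what that estimation could look like in Python, fitting the CDF with SciPy's `curve_fit`; the sample and the model's rate λ* are hypothetical stand-ins:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_cdf(z, lam):
    """Exponential CDF: Pr(X < z) = 1 - exp(-lam * z)."""
    return 1.0 - np.exp(-lam * z)

rng = np.random.default_rng(1)
data = rng.exponential(2.0, 10_000)   # stand-in for the observed sample
lam_star = 0.5                        # hypothetical rate the model predicts

# Empirical CDF evaluated at the sorted observations.
z = np.sort(data)
ecdf = np.arange(1, len(z) + 1) / len(z)

popt, pcov = curve_fit(exp_cdf, z, ecdf, p0=[1.0 / z.mean()])
lam_hat, se = popt[0], np.sqrt(pcov[0, 0])
print(f"lambda_hat = {lam_hat:.4f} (se ~ {se:.4f}), model lambda* = {lam_star}")
# Crude check of lambda = lambda*; note the ECDF points are correlated,
# so this standard error is only a rough guide, not a rigorous test.
print(f"(lam_hat - lam_star)/se = {(lam_hat - lam_star) / se:.1f}")
```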
 
Ok, perhaps I wasn't too clear.
My data come from a model which says that, in theory, the data should follow an exponential distribution with mean μ. So I'm simply trying to assess the goodness of fit of the data to an exponential distribution with mean μ. Plotting the data (as a histogram) and the exponential distribution together gives a very good fit (and a very high regression coefficient), so I was presuming that the variance of the data should match the variance of the exponential distribution, i.e. μ². But it doesn't; it is almost double the 'predicted' variance most of the time. So I was wondering whether this is normal? That is, should I expect the variance of the data to equal the predicted variance from the exponential distribution, seeing as the graphs give a very good fit?
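
As a sanity check on that intuition, here is a small simulation (with a hypothetical μ) showing how tightly the sample variance of genuinely exponential data tracks μ² at n ≈ 10,000:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 2.0                              # hypothetical model mean
sample = rng.exponential(mu, 10_000)  # genuinely exponential data

print(f"sample mean     = {sample.mean():.3f}   (theory: {mu})")
print(f"sample variance = {sample.var(ddof=1):.3f}   (theory: mu^2 = {mu**2})")
# With n ~ 10,000, the sample variance of truly exponential data typically
# lands within a few percent of mu^2, so a persistent factor-of-2 gap
# points to a real departure from the model, not sampling noise.
```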
 
My guess is that your data have an error component that is inflating the variance. Not knowing anything else, I'll call it measurement error.

Suppose I am going to draw 4 values from some distribution. The expected values of my draws (e.g. the order statistics) are x(i) = -2, -1, 1, 2. The realized values have a random component r, driven by the underlying theoretical distribution. The realized values also have a measurement error ε, so y*(i) = y(i) + ε(i) = x(i) + r(i) + ε(i). Suppose the realizations are y*(i) = -2.18, -1.88, 1.54, 1.65. The correlation between x and y* is 0.96, so you might say that there is a "good fit," but var(y*) = 4.4 vs. var(x) = 3.3.

In the absence of a measurement error, suppose y(i) = -2.09, -1.44, 1.27, 1.83 (which values are "unobservable" to mere humans, but the probabilistic creatures who hang out in this forum can see them :smile:). Then Corr(x,y) = 0.99, so the fit is somewhat better; more importantly var(y) = 3.78, which is less than var(y*) and much closer to var(x).
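
The numbers above can be reproduced (up to rounding) with a few lines of Python, using sample variances with the n-1 divisor:

```python
import numpy as np

x      = np.array([-2.0, -1.0, 1.0, 2.0])       # expected order statistics
y_star = np.array([-2.18, -1.88, 1.54, 1.65])   # realized, with measurement error
y      = np.array([-2.09, -1.44, 1.27, 1.83])   # realized, error-free

for name, v in [("y*", y_star), ("y", y)]:
    corr = np.corrcoef(x, v)[0, 1]
    print(f"corr(x, {name}) = {corr:.2f}, var({name}) = {v.var(ddof=1):.2f}")
print(f"var(x) = {x.var(ddof=1):.2f}")
```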

You may want to look at other tests for goodness of fit.
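
For example, a one-sample Kolmogorov-Smirnov test against the fully specified exponential is one standard choice. A sketch using SciPy (the data here are a simulated stand-in for the real sample):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu = 2.0                             # the mean the model predicts
data = rng.exponential(mu, 10_000)   # stand-in for the real data

# One-sample Kolmogorov-Smirnov test against the fully specified
# exponential distribution (scipy parameterizes it by loc and scale,
# with scale equal to the mean).
stat, p_value = stats.kstest(data, "expon", args=(0, mu))
print(f"KS statistic = {stat:.4f}, p-value = {p_value:.3f}")
# A small p-value would reject exponential(mu); with the observed variance
# nearly double mu^2, a KS test on the real data would likely flag the mismatch.
```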
 