Variance and goodness of fit tests

bioman · May 14, 2007

I'm trying to see how well my data fit a certain probability distribution (an exponential distribution). Basically, I want to know how reliable it is to compare the theoretical variance of the distribution with the variance of the data as a way of assessing goodness of fit.

For example, when I plot a histogram of the data and overlay the theoretical distribution there is an extremely good fit, and this good fit is verified by a very high (~0.95) non-linear regression coefficient.

The odd thing, though, is that when I compute the variance of the data, it is completely different from the variance of the theoretical distribution: almost double it every time. Should this be happening, seeing as I get a very good fit with the histogram and regression?

It's just that I have a very large sample size, ~10,000, so I thought that if everything else fits well then the variance of the data should match that of the distribution.
So basically, how reliable is the variance?

EnumaElish · May 14, 2007

What is the model you are estimating? Are you estimating Pr(x < z) = 1 - exp(-λz) as a function of z, and then testing λ = 1 (or λ = λ^*)? If not, why not?
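One possible reading of the test suggested here, as a minimal sketch rather than a definitive procedure: estimate λ by maximum likelihood (1 / sample mean) and compare it with a hypothesized λ^* using a large-sample z statistic. The data below are simulated, and the names `true_lam`, `z_stat` and the candidate values of λ^* are illustrative assumptions, not anything from the thread.

```python
import math
import random
import statistics

random.seed(1)

# Assumption: simulated exponential data stand in for the real sample.
true_lam = 1.0
n = 10_000
data = [random.expovariate(true_lam) for _ in range(n)]

sample_mean = statistics.fmean(data)
lam_hat = 1.0 / sample_mean  # maximum-likelihood estimate of lambda


def z_stat(sample_mean, n, lam_star):
    """Large-sample z statistic for H0: lambda = lam_star.

    Under H0 the sample mean has mean 1/lam_star and standard
    deviation 1/(lam_star * sqrt(n)), so the standardized mean is
    sqrt(n) * (lam_star * sample_mean - 1).
    """
    return math.sqrt(n) * (lam_star * sample_mean - 1.0)


z_true = z_stat(sample_mean, n, 1.0)   # H0 matches the simulation
z_wrong = z_stat(sample_mean, n, 1.5)  # H0 deliberately wrong

print(f"lambda_hat = {lam_hat:.4f}")
print(f"z for lambda* = 1.0: {z_true:.2f}")
print(f"z for lambda* = 1.5: {z_wrong:.2f}")
```

A |z| well above roughly 2 would suggest that the hypothesized rate does not match the data, assuming the exponential form itself is correct.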

bioman · May 14, 2007

OK, perhaps I wasn't too clear.
My data come from a model which says that, in theory, the data should follow an exponential distribution with mean [tex]\mu[/tex]. So I'm simply trying to assess the goodness of fit of the data to an exponential distribution with mean [tex]\mu[/tex]. Plotting the data (histogram) and the exponential distribution together gives a very good fit (and a very high regression coefficient), so I was presuming that the variance of the data should match the variance of the exponential distribution, i.e. [tex]\mu^2[/tex]. It doesn't: it is almost double the 'predicted' variance most of the time. Is this normal? That is, should I expect the variance of the data to equal the variance predicted by the exponential distribution, seeing as the graphs give a very good fit?

EnumaElish · May 15, 2007

My guess is that your data have an error component and it is inflating the variance. Not knowing anything else, I'll call it the measurement error.
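A minimal simulation of this guess, under assumptions that are mine and not the original poster's: the error is additive, independent of the true value, Gaussian, and has standard deviation `sigma` equal to `mu`. In that case the variances add, so the observed variance would be about mu^2 + sigma^2, i.e. roughly double mu^2.

```python
import random
import statistics

random.seed(0)

mu = 2.0      # assumed mean of the exponential distribution
sigma = 2.0   # assumed standard deviation of the measurement error
n = 10_000    # sample size of the order mentioned in the thread

# "True" exponential draws, then the same draws with additive noise.
clean = [random.expovariate(1.0 / mu) for _ in range(n)]
noisy = [v + random.gauss(0.0, sigma) for v in clean]

var_clean = statistics.variance(clean)  # should be near mu**2
var_noisy = statistics.variance(noisy)  # should be near mu**2 + sigma**2

print(f"theoretical variance mu^2      = {mu**2:.2f}")
print(f"sample variance without error  = {var_clean:.2f}")
print(f"sample variance with error     = {var_noisy:.2f}")
```

With 10,000 points the error-free sample variance should sit close to mu^2, so a persistent factor of two is unlikely to be sampling noise alone. Whether measurement error is the actual cause in the original data cannot be told from this sketch.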

Suppose I am going to draw 4 values from some distribution. The expected values of my draws (e.g. the order statistics) are x(i) = -2, -1, 1, 2. The realized values y(i) = x(i) + r(i) have a random component r, driven by the underlying theoretical distribution. The observed values also have a measurement error ε, so y^*(i) = y(i) + ε(i) = x(i) + r(i) + ε(i). Suppose the observations are y^*(i) = -2.18, -1.88, 1.54, 1.65. The correlation between x and y^* is 0.96, so you might say that there is a "good fit," but var(y^*) = 4.4 vs. var(x) = 3.3.

In the absence of a measurement error, suppose y(i) = -2.09, -1.44, 1.27, 1.83 (these values are "unobservable" to mere humans, but the probabilistic creatures who hang out in this forum can see them). Then Corr(x,y) = 0.99, so the fit is somewhat better; more importantly var(y) = 3.78, which is less than var(y^*) and much closer to var(x).
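The figures in this four-point example can be reproduced with a short script. This sketch assumes the variances quoted are sample variances (divisor n - 1), which is what matches the numbers; the helper `pearson` is written out by hand here for illustration.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(ss_a * ss_b)


x = [-2.0, -1.0, 1.0, 2.0]          # expected values
y_star = [-2.18, -1.88, 1.54, 1.65]  # observed, with measurement error
y = [-2.09, -1.44, 1.27, 1.83]       # realized, without measurement error

var_x = statistics.variance(x)
var_y_star = statistics.variance(y_star)
var_y = statistics.variance(y)
corr_x_y_star = pearson(x, y_star)
corr_x_y = pearson(x, y)

print(f"var(x)  = {var_x:.2f}")
print(f"var(y*) = {var_y_star:.2f}, corr(x, y*) = {corr_x_y_star:.2f}")
print(f"var(y)  = {var_y:.2f}, corr(x, y)  = {corr_x_y:.2f}")
```

The point of the example survives the check: a correlation of about 0.96 coexists with a variance roughly a third larger than var(x).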

You may want to look at other tests for goodness of fit.
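As one example of such a test (my choice, not one named in the thread), here is a sketch of a one-sample Kolmogorov-Smirnov statistic against an exponential distribution with a known mean `mu`. It is hand-rolled with the standard library; the data are simulated, and the noise model is an assumption for illustration. The threshold 1.36 / sqrt(n) is the usual large-sample 5% critical value, valid when `mu` comes from theory rather than being estimated from the same data.

```python
import math
import random


def ks_statistic_exponential(data, mu):
    """Largest gap between the empirical CDF and 1 - exp(-x / mu)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-v / mu) if v > 0 else 0.0
        # The empirical CDF jumps from (i - 1)/n to i/n at v.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


random.seed(0)
mu = 2.0     # assumed theoretical mean
n = 10_000

clean = [random.expovariate(1.0 / mu) for _ in range(n)]
noisy = [v + random.gauss(0.0, mu) for v in clean]  # assumed additive error

critical = 1.36 / math.sqrt(n)  # approximate 5% critical value
d_clean = ks_statistic_exponential(clean, mu)
d_noisy = ks_statistic_exponential(noisy, mu)

print(f"critical value = {critical:.4f}")
print(f"D (clean)      = {d_clean:.4f}")
print(f"D (noisy)      = {d_noisy:.4f}")
```

Unlike a visual histogram overlay, the statistic looks at the whole cumulative distribution, so with 10,000 points it tends to be sensitive to departures that a binned plot can hide.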
