Variance and goodness of fit tests

  • Context: Undergrad
  • Thread starter: bioman
  • Tags: Fit, Variance

Discussion Overview

The discussion revolves around assessing the goodness of fit of data to an exponential distribution, particularly focusing on the relationship between the theoretical variance of the distribution and the variance of the observed data. Participants explore the implications of a high regression coefficient and a significant discrepancy between the variances.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant questions the reliability of using variance to assess goodness of fit, noting a significant difference between the variance of the data and the theoretical variance of the exponential distribution.
  • Another participant asks for clarification on the model being estimated, suggesting a specific form of the cumulative distribution function for the exponential distribution.
  • A participant clarifies that they are assessing the goodness of fit for data expected to follow an exponential distribution with mean μ, expressing confusion over the variance discrepancy despite a good visual fit and high regression coefficient.
  • One participant hypothesizes that measurement error may be inflating the variance of the observed data, providing an illustrative example of how measurement error can affect variance and correlation.
  • The same participant suggests exploring other tests for goodness of fit as a potential avenue for further analysis.

Areas of Agreement / Disagreement

Participants express differing views on the reliability of variance as a measure of goodness of fit, with some proposing that measurement error could be a factor influencing the observed variance. No consensus is reached regarding the expected relationship between the variances.

Contextual Notes

Participants acknowledge the potential influence of measurement error on variance calculations, but the discussion does not resolve the underlying assumptions or the implications of this error on the goodness of fit assessment.

bioman
I'm trying to see how well my data fit a certain probability distribution (an exponential distribution). Basically, I want to know how reliable it is to compare the theoretical variance of the distribution with the variance of the data as a way of assessing goodness of fit.

For example, when I plot a histogram of the data and overlay the theoretical distribution there is an extremely good fit, and this good fit is verified by a very high (~0.95) non-linear regression coefficient.

The odd thing, though, is that when I compute the variance of the data, it is completely different from the variance of the theoretical distribution: almost double it every time. Should this be happening, seeing as I get a very good fit with the histogram and regression?

It's just that I have a very large sample size (~10,000), so I thought that if everything else fits well, then the variance of the data should match that of the distribution.
So basically, how reliable is the variance?
 
What is the model you are estimating? Are you estimating Pr(x < z) = 1 - exp(-λz) as a function of z, and then testing λ = 1 (or λ = λ*)? If not, why not?
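One way to read the suggestion above is as a test on the rate λ. The following is only a sketch under assumptions: the data are simulated with a fixed seed, the hypothesised rate is 1, and the helper name rate_test is made up for illustration. It uses the fact that the maximum-likelihood estimate of λ is 1 / (sample mean) and that, under the null hypothesis, the sample mean is approximately normal with mean 1/λ* and standard deviation (1/λ*)/sqrt(n).

```python
import math
import random
import statistics

def rate_test(sample, rate_null):
    """Return (MLE of the rate, z statistic) for H0: rate == rate_null.

    Under H0 the sample mean has expectation 1/rate_null and standard
    deviation (1/rate_null)/sqrt(n), so z is approximately N(0, 1).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    rate_hat = 1.0 / mean                  # MLE of the exponential rate
    mu_null = 1.0 / rate_null
    z = (mean - mu_null) / (mu_null / math.sqrt(n))
    return rate_hat, z

rng = random.Random(0)                     # fixed seed, illustration only
sample = [rng.expovariate(1.0) for _ in range(10_000)]  # simulated, not real data
rate_hat, z = rate_test(sample, rate_null=1.0)
print(f"rate_hat = {rate_hat:.3f}, z = {z:.2f}")
```

A |z| well above 2 would suggest that the hypothesised rate does not match the data; this checks the mean only, not the shape of the distribution.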
 
OK, perhaps I wasn't too clear.
My data come from a model which says that, in theory, the data should follow an exponential distribution with mean μ. So I'm simply trying to assess the goodness of fit of the data to an exponential distribution with mean μ. Plotting the histogram of the data together with the exponential distribution gives a very good fit (and a very high regression coefficient), so I presumed that the variance of the data should match the variance of the exponential distribution, i.e. μ². It doesn't: it is almost double the 'predicted' variance most of the time. Is this normal? That is, should I expect the variance of the data to equal the variance predicted by the exponential distribution, seeing as the graphs give a very good fit?
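As a rough check on how far the sample variance of genuinely exponential data can stray from μ² at this sample size, here is a minimal simulation sketch. The mean mu = 2.0 and the seed are arbitrary assumptions, not values from the thread. The standard-error line uses the large-sample approximation Var(s²) ≈ (μ₄ − σ⁴)/n, with fourth central moment μ₄ = 9μ⁴ for an exponential distribution.

```python
import math
import random
import statistics

mu = 2.0        # assumed mean, illustration only
n = 10_000      # sample size quoted in the thread
rng = random.Random(1)

# random.expovariate takes the rate (1 / mean), not the mean.
data = [rng.expovariate(1.0 / mu) for _ in range(n)]

sample_var = statistics.variance(data)   # unbiased sample variance
theory_var = mu ** 2                     # variance of an exponential with mean mu

# Approximate standard error of the sample variance for exponential data:
# Var(s^2) ~ (mu_4 - sigma^4) / n = 8 * mu^4 / n.
se_var = math.sqrt(8.0 / n) * theory_var

ratio = sample_var / theory_var
rel_se = se_var / theory_var
print(f"sample variance / mu^2 = {ratio:.3f}")
print(f"approx. relative standard error = {rel_se:.3f}")
```

Under these assumptions the relative standard error is about 3%, so a sample variance close to twice μ² would be far outside ordinary sampling noise for data that really are exponential with mean μ.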
 
My guess is that your data have an error component and it is inflating the variance. Not knowing anything else, I'll call it the measurement error.

Suppose I am going to draw 4 values from some distribution. The expected values of my draws (e.g. the order statistics) are x(i) = -2, -1, 1, 2. The realized values have a random component r, driven by the underlying theoretical distribution. The realized values also have a measurement error ε, so y*(i) = y(i) + ε(i) = x(i) + r(i) + ε(i). Suppose the realizations are y*(i) = -2.18, -1.88, 1.54, 1.65. The correlation between x and y* is 0.96, so you might say that there is a "good fit," but var(y*) = 4.4 vs. var(x) = 3.3.

In the absence of measurement error, suppose y(i) = -2.09, -1.44, 1.27, 1.83 (these values are "unobservable" to mere humans, but the probabilistic creatures who hang out in this forum can see them). Then Corr(x,y) = 0.99, so the fit is somewhat better; more importantly, var(y) = 3.79, which is less than var(y*) and much closer to var(x).
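The four-point example can be checked numerically. This sketch simply recomputes the sample variances (with the n − 1 divisor) and Pearson correlations from the values quoted in the post; the helper name pearson is introduced here for illustration.

```python
import math
import statistics

x      = [-2.0, -1.0, 1.0, 2.0]        # expected values of the draws
y_star = [-2.18, -1.88, 1.54, 1.65]    # realized values with measurement error
y      = [-2.09, -1.44, 1.27, 1.83]    # realized values without measurement error

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

var_x = statistics.variance(x)
var_y_star = statistics.variance(y_star)
var_y = statistics.variance(y)
corr_x_y_star = pearson(x, y_star)
corr_x_y = pearson(x, y)

print(f"var(x)  = {var_x:.2f}")
print(f"var(y*) = {var_y_star:.2f}, corr(x, y*) = {corr_x_y_star:.2f}")
print(f"var(y)  = {var_y:.2f}, corr(x, y)  = {corr_x_y:.2f}")
```

The point of the example carries through: the correlation stays high in both cases, while the variance is noticeably larger for the series that includes measurement error.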

You may want to look at other tests for goodness of fit.
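One such test is the one-sample Kolmogorov-Smirnov test, which compares the empirical CDF with the theoretical one. The sketch below is an illustration under assumptions, not a description of the poster's data: the samples are simulated, mu = 2.0 is arbitrary, and the "noisy" series adds hypothetical Gaussian measurement error with standard deviation mu, chosen so that the variance roughly doubles. The critical value 1.36/sqrt(n) is the usual asymptotic 5% threshold and applies when μ is fixed by the model rather than estimated from the same data.

```python
import math
import random

def ks_statistic_exponential(sample, mu):
    """One-sample Kolmogorov-Smirnov distance to an exponential with mean mu."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Exponential CDF; zero for non-positive values.
        cdf = 1.0 - math.exp(-x / mu) if x > 0 else 0.0
        # Largest gap just after and just before each jump of the empirical CDF.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

mu = 2.0                 # assumed mean, illustration only
n = 10_000
rng = random.Random(2)   # fixed seed, illustration only

clean = [rng.expovariate(1.0 / mu) for _ in range(n)]
noisy = [v + rng.gauss(0.0, mu) for v in clean]   # hypothetical error, sd = mu

critical = 1.36 / math.sqrt(n)   # asymptotic 5% critical value, mu known
d_clean = ks_statistic_exponential(clean, mu)
d_noisy = ks_statistic_exponential(noisy, mu)

print(f"critical value = {critical:.4f}")
print(f"D (clean) = {d_clean:.4f}")
print(f"D (noisy) = {d_noisy:.4f}")
```

With n around 10,000 the test is sensitive to small departures, so a statistic above the critical value does not by itself say whether the departure is large enough to matter in practice.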
 