Interpreting a very small reduced chi squared value

  • Context: Graduate
  • Thread starter: X-Kirk
  • Tags: Chi Value
SUMMARY

The discussion centers on the interpretation of a very small reduced chi-squared value (0.007) obtained from a chi-squared goodness-of-fit test applied to a linear model using Excel's LINEST function. The user has eight data points and is confused about the implications of the low chi-squared value, which typically indicates overestimated errors. The conversation highlights the necessity of understanding the underlying assumptions of the chi-squared test, particularly regarding the distribution of errors and the importance of using the correct degrees of freedom in calculations.

PREREQUISITES
  • Understanding of chi-squared goodness-of-fit tests
  • Familiarity with Excel's LINEST function for linear regression
  • Knowledge of statistical concepts such as degrees of freedom and reduced chi-squared
  • Basic grasp of error analysis and uncertainty estimation in measurements
NEXT STEPS
  • Research the implications of reduced chi-squared values in statistical modeling
  • Learn about the assumptions of normal distribution in error analysis
  • Explore alternative methods for estimating uncertainties in linear regression
  • Study the differences between fitting models and hypothesis testing in statistical analysis
USEFUL FOR

Researchers, data analysts, and experimental physicists who are applying statistical methods to analyze experimental data and seeking to understand the implications of goodness-of-fit tests in their analyses.

X-Kirk
So I have been analyzing data I took in an experiment recently, and have been using chi-squared as a "goodness of fit" test against a linear model.
I am using Excel and used the LINEST function (least-squares fitting method) to get an idea of a theoretical gradient and intercept for my data. Using these I found a series of normalized residuals:
##R_i = \frac{\text{obs} - \text{exp}}{\text{error}}##
and my χ² is the sum of the squares of these normalized residuals. I have then calculated my reduced χ² by dividing this value by my number of degrees of freedom minus 2 (as I have a gradient and intercept).
However my reduced χ² is much less than 1 (specifically 0.007). I understand that this typically means I have overestimated my errors. I have checked these in great detail now; my errors are all in measurements which have very clearly defined uncertainties that I can't change. In light of this I am confused about how I should interpret this result. What does this test tell me?

I only have 8 data points for this particular set, which I know is not a lot. Is it still reasonable to use chi-squared like this for such a small data set, or might that be the reason I am not getting a very good result?
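
For concreteness, here is a minimal sketch of the calculation being described, in Python with NumPy. The arrays x, y and err are hypothetical stand-ins for the real measurements, and np.polyfit plays the role that LINEST plays in the spreadsheet:

```python
import numpy as np

# Hypothetical stand-ins for the 8 measured points and their fixed uncertainties
x = np.array([95., 97., 99., 101., 103., 105., 107., 109.])
y = np.array([0.41, 0.44, 0.46, 0.50, 0.52, 0.55, 0.57, 0.60])
err = np.full_like(y, 0.01)

# Straight-line fit (the role LINEST plays in Excel)
m, c = np.polyfit(x, y, 1)

# Normalised residuals R_i = (obs - exp) / error
R = (y - (m * x + c)) / err

# chi^2 is the sum of the squared normalised residuals; the reduced chi^2
# divides by the degrees of freedom, N - 2 here, because two parameters
# (gradient and intercept) were fitted.
chi2 = np.sum(R ** 2)
nu = len(y) - 2
print("reduced chi^2:", chi2 / nu)
```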
 
X-Kirk said:
##R_i = \frac{\text{obs} - \text{exp}}{\text{error}}##

Confusing description. What is "exp" (do you mean estimated?), and what is "error"? Please state the hypothesis to test and against what alternative. Have you checked that your test statistic is distributed as chi-squared under the null hypothesis?

Though I have not followed your problem and method exactly, for goodness of fit there should be some classes (bins). The frequency chi-squared is then of the form ##\sum \frac{(\text{obs freq} - \text{exp freq})^2}{\text{exp freq}}##.
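
For reference, a minimal sketch of that binned (frequency) form, with made-up counts purely to show the shape of the calculation:

```python
import numpy as np
from scipy.stats import chi2

# Made-up observed counts per class and the counts a model predicts for them
observed = np.array([18., 25., 30., 27.])
expected = np.array([20., 24., 28., 28.])

# Pearson's frequency chi-squared: sum over classes of (obs - exp)^2 / exp
stat = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1          # reduce further by any parameters fitted from the data
p_value = chi2.sf(stat, dof)     # probability of a value at least this large under the null
print(stat, p_value)
```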
 
Sorry, "exp" means the expected/theoretical value.

The experiment deals with the resistivity of a superconductor at different temperatures. I have the characteristic result where the resistivity is zero until reaching a critical temperature, then a rapid, near-vertical increase, followed by a straight line above the critical temperature, like so: http://www-outreach.phy.cam.ac.uk/physics_at_work/2011/exhibit/images/irc1.jpg

What I am trying to do is take the section of the graph above Tc and use chi-squared to tell me how good a fit it is to a straight line. Then from there I could determine the best-fitting gradient and its uncertainty. Is this clearer?

My understanding was that the equation you quoted there is specifically for situations involving counting, where the distribution of measurements is a Poisson probability distribution? Whereas I am doing a least-squares fit to a straight line with non-uniform error bars. I may have misunderstood which method to use?
 
X-Kirk said:
So I have been analyzing data I took in an experiment recently, and have been using chi-squared as a "goodness of fit" test against a linear model.
I am using Excel and used the LINEST function (least-squares fitting method) to get an idea of a theoretical gradient and intercept for my data. Using these I found a series of normalized residuals:
##R_i = \frac{\text{obs} - \text{exp}}{\text{error}}##
and my χ² is the sum of the squares of these normalized residuals. I have then calculated my reduced χ² by dividing this value by my number of degrees of freedom minus 2 (as I have a gradient and intercept).

It appears you are following the same procedure as in the Wikipedia article http://en.wikipedia.org/wiki/Goodness_of_fit shown in the Example for Regression Analysis.

Why didn't you divide by 2-1=1 instead of by 2?

That example says you need to know the standard deviation of the population of errors. The ##\sigma## does not denote a value that you calculate from your sample of data.


What I am trying to do is take the section of the graph above Tc and use chi-squared to tell me how good a fit it is to a straight line. Then from there I could determine the best-fitting gradient and its uncertainty. Is this clearer?

Since you determined the slope of the line that minimized the sum of the squares of the errors, I don't understand what you mean by finding the best-fitting gradient after you do a chi-square test.

The question of what the "uncertainty" of the gradient means is complicated, although many curve-fitting software packages purport to give a number for it. I think the "uncertainty" of the gradient amounts to an estimate of its standard deviation, but this is not straightforward. You don't have several samples of the best-fitting gradient; you only have one value of it. So trying to compute its standard deviation requires some assumptions.
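
One way to make those assumptions explicit is a quick Monte Carlo sketch: assume the errors are independent Gaussians with the stated sigmas, generate many synthetic data sets from the fitted line, refit each one, and take the spread of the refitted gradients as an estimate of the gradient's standard deviation. The numbers below are hypothetical, not the thread's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and known measurement uncertainties
x = np.array([95., 97., 99., 101., 103., 105., 107., 109.])
y = np.array([0.41, 0.44, 0.46, 0.50, 0.52, 0.55, 0.57, 0.60])
err = np.full_like(y, 0.01)

# Weighted least-squares fit (np.polyfit expects weights of 1/sigma for Gaussian errors)
m_best, c_best = np.polyfit(x, y, 1, w=1.0 / err)

# Assumption: errors are independent Gaussians with the stated sigmas.
# The spread of gradients refitted to synthetic data estimates the gradient's
# standard deviation, which is what "uncertainty" is usually taken to mean.
m_samples = []
for _ in range(5000):
    y_sim = m_best * x + c_best + rng.normal(0.0, err)
    m_sim, _ = np.polyfit(x, y_sim, 1, w=1.0 / err)
    m_samples.append(m_sim)

print("gradient:", m_best, "+/-", np.std(m_samples))
```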
 
The reference book I am working from (Measurements and their Uncertainties by I.G. Hughes & T. Hase) defines reduced chi-squared as:
##\chi_\nu^2 = \frac{1}{\nu}\sum \frac{(y_{\text{obs}} - y_{\text{exp}})^2}{\alpha^2}##
where ##\alpha## is the uncertainty on the individual ##y_{\text{obs}}##,
and ##\nu## is the number of data points minus the number of fitted parameters. On the Wikipedia page this is effectively the same thing as N - n - 1. Here I have 8 data points and 2 fitted parameters, so my ##\nu## = 6. I don't see why I need the standard deviation?

I determined a rough estimate for the slope using a poor least-squares fitting method. Then, using the Solver add-in built into Excel, I minimized my chi-squared by varying my estimates for the gradient and intercept slightly. This gives me the best approximation (that I know of) to a straight line. Then I can use Solver again to vary the gradient until my chi-squared is equal to ##\chi^2_{\min} + 1##. The difference between these gradients gives me the uncertainty.
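
As a cross-check of that Solver procedure, here is a sketch of the same Δχ² = +1 recipe in Python on hypothetical numbers. One common convention, assumed here, is to re-minimise over the intercept at each trial gradient; for a straight-line model the result reproduces the standard analytic error on the slope:

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical stand-ins for the points above Tc and their fixed uncertainties
x = np.array([95., 97., 99., 101., 103., 105., 107., 109.])
y = np.array([0.41, 0.44, 0.46, 0.50, 0.52, 0.55, 0.57, 0.60])
err = np.full_like(y, 0.01)

# Closed-form weighted least squares (what minimising chi^2 over m and c gives)
w = 1.0 / err**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
delta = S * Sxx - Sx**2
m_best = (S * Sxy - Sx * Sy) / delta
c_best = (Sxx * Sy - Sx * Sxy) / delta

def chi2(m, c):
    return np.sum(((y - (m * x + c)) / err) ** 2)

def profile(m):
    # re-minimise over the intercept for each trial gradient (closed form)
    return chi2(m, (Sy - m * Sx) / S)

chi2_min = profile(m_best)

# Increase the gradient until chi^2 rises by exactly 1 above its minimum
sigma_m_analytic = np.sqrt(S / delta)            # used only to bracket the root
m_up = brentq(lambda m: profile(m) - (chi2_min + 1.0),
              m_best, m_best + 5 * sigma_m_analytic)

print("gradient:", m_best, "+/-", m_up - m_best)  # matches sqrt(S / delta)
```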

The original question was: what does it mean to have such a small χ²? Is there any situation in which this is appropriate, or does it always mean you have overestimated your uncertainties?
 
X-Kirk said:
My understanding was that the equation you quoted there is specifically for situations involving counting, where the distribution of measurements is a Poisson probability distribution? Whereas I am doing a least-squares fit to a straight line with non-uniform error bars. I may have misunderstood which method to use?
My method has nothing to do with Poisson or any other inherent error distribution.
Your method is not applicable unless the errors follow a normal distribution.
Do you have three series of data: fitted, theoretical and observed?
One descriptive way to judge the fit quality is the square of the correlation coefficient between the fitted and observed values. This value is the fraction of the total variance explained by the regression.
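
A minimal sketch of that descriptive measure, with hypothetical observed and fitted values:

```python
import numpy as np

# Hypothetical observed values and the corresponding values from the fitted line
y_obs = np.array([0.41, 0.44, 0.46, 0.50, 0.52, 0.55, 0.57, 0.60])
y_fit = np.array([0.413, 0.437, 0.462, 0.487, 0.512, 0.537, 0.561, 0.586])

# Square of the correlation coefficient between fitted and observed values:
# the fraction of the total variance accounted for by the regression.
r = np.corrcoef(y_obs, y_fit)[0, 1]
print("R^2 =", r ** 2)
```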
 
X-Kirk said:
I don't see why I need the standard deviation?
What do you mean by "uncertainty", if you don't take it to mean "standard deviation"?

The difference between these gradients gives me the uncertainty.
Is that what your reference book says to do to find "the uncertainty" or is this your own concept?

The original question was: what does it mean to have such a small χ²? Is there any situation in which this is appropriate, or does it always mean you have overestimated your uncertainties?

Chi-square is a statistic, so it is a random variable. If you have chosen the correct model, there is still a small probability that chi-square can take on an extremely large or small value. So a particular value of chi-square doesn't "mean" anything, in the sense of guaranteeing a particular conclusion or hypothesis. If we set up the formalities of a statistical "hypothesis test", then we can say precisely what various statistics tell us. I don't know if that's what you are trying to do.
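
As a rough sketch of what such a formal test could look like with the numbers quoted in this thread (reduced χ² = 0.007 with ν = 6), taking the linear model and the stated errors as the null hypothesis:

```python
from scipy.stats import chi2

# Reduced chi^2 of 0.007 with nu = 6 degrees of freedom, as reported above
reduced, nu = 0.007, 6

# Left-tail probability: how often an unreduced chi^2 this small (or smaller)
# would occur if the model were correct and the errors were stated correctly
p_low = chi2.cdf(reduced * nu, df=nu)
print(p_low)   # vanishingly small, so either the errors are overstated or
               # something else in the setup deserves a second look
```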

Here are two scenarios. Which is more likely to produce a small chi-square value?

1) Compare a model, known (or assumed) to be correct, to the data and compute the reduced chi-square statistic.
2) Take the data and fit whatever model minimizes the reduced chi-square statistic for that particular data.

Scenario 2) always gives you the smaller chi-square value, since you can adjust the model to fit the particular data. You can pick the model to be the correct model if it minimizes chi-square or you can pick the model to be a wrong model if the wrong model minimizes chi-square.

The Wikipedia article is talking about scenario 1). You are doing scenario 2). I don't know what your book is doing, but I suspect it is scenario 1).

Applying statistics and curve-fitting to a real-world problem is a subjective process. To pose it as a mathematical problem that has a definite solution takes more information or assumptions than most people care to deal with.

The way that I'd explain the subjective statements (in the Wikipedia article and in other textbooks) about "over-fitting" and "under-fitting" is this: if you compare the correct model to typical experimental data, its predictions will most likely show "an average amount of error", not an extremely large amount or an extremely small amount. To make practical use of this idea, you have to know what "an average amount of error" is. You can't know what "an average amount of error" is if all you have is the data. To know what "average error" is, you need to know more facts, such as the precision of the measuring equipment used in the experiment. This is why the Wikipedia article says you must know ##\sigma## (instead of estimating ##\sigma## from the raw data).
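
A toy simulation (all numbers made up) of that last point: when the stated σ matches the true scatter, the reduced χ² of a fitted straight line averages about 1, and when the stated errors are ten times too large it drops by a factor of a hundred, which is the kind of value reported at the top of this thread:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 8 points on a known straight line with Gaussian scatter true_sigma
x = np.linspace(90.0, 110.0, 8)
true_m, true_c, true_sigma = 0.02, -1.4, 0.005
nu = len(x) - 2

def mean_reduced_chi2(assumed_sigma, n_trials=5000):
    vals = []
    for _ in range(n_trials):
        y = true_m * x + true_c + rng.normal(0.0, true_sigma, size=x.size)
        m, c = np.polyfit(x, y, 1)                      # refit each synthetic data set
        vals.append(np.sum(((y - (m * x + c)) / assumed_sigma) ** 2) / nu)
    return np.mean(vals)

print(mean_reduced_chi2(true_sigma))        # ~1 when the errors are stated correctly
print(mean_reduced_chi2(10 * true_sigma))   # ~0.01 when the errors are overstated 10x
```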
 
