Reduced chi square test in physics

1. Jul 15, 2016

JulienB

1. The problem statement, all variables and given/known data

Hi everybody! Our experiments teacher asked us to perform a reduced chi square test in order to estimate how well a model fits our measured data. The experiment was Melde's experiment (vibration of a string), and we measured the frequency $f_n$ for $n=1$ to $9$. The string had a fixed length $l=0.6$m and an unknown tension $F_0$ acting on it. We therefore had to perform a linear regression in order to find $c_{transverse}$.

To find the standard deviation, we measured $f_9$ six times, calculated its standard deviation and applied it to all $f_n$ as an estimate. Therefore, $\sigma = 2.2358$Hz for each measured frequency. We nevertheless got a different overall Gaussian uncertainty for each frequency because of the apparatus, but after rounding every uncertainty is equal to $\Delta f_n = 2$Hz.

The (linear) fit model is of the form $f(n) = a \cdot n$ with $a = 159.4$ (see fit in attached picture). Here is a table with the relevant data: what we measured and what the fit model predicts. Note that in the following calculations I'm using the rounded data, although I imagine it should normally not be rounded in the reduced chi square test. The reason is that I don't have the original data with me right now, but I'll use it instead as soon as I get it. Until then I assume it makes only a small difference.

$$\begin{array}{l ccccccccc} \hline n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline f_{n,measured} & 158 & 316 & 475 & 637 & 793 & 963 & 1111 & 1281 & 1432 \\ f_{n,expected} & 159.4 & 318.8 & 478.2 & 637.6 & 797 & 956.4 & 1115.8 & 1275.2 & 1434.6 \\ \hline \end{array}$$
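As a cross-check of the quoted slope, here is a minimal Python sketch that refits $f(n) = a \cdot n$ to the rounded table values. It assumes an ordinary least-squares fit through the origin (with equal uncertainties the weights cancel) and, for the last step, the usual standing-wave relation $f_n = n \cdot c / (2l)$ for a string fixed at both ends. The variable names are my own, and the fit in the attached picture was presumably made with the unrounded data, so small differences are to be expected.

```python
# Rounded measurements from the table above (Hz), harmonic numbers n = 1..9.
n_values = list(range(1, 10))
f_measured = [158, 316, 475, 637, 793, 963, 1111, 1281, 1432]

# Least-squares slope for a line through the origin, f(n) = a * n.
# With the same sigma on every point the weights cancel, so
# a = sum(n * f) / sum(n^2).
sum_nf = sum(n * f for n, f in zip(n_values, f_measured))
sum_nn = sum(n * n for n in n_values)
a = sum_nf / sum_nn

# Assumption: standing waves on a string fixed at both ends,
# f_n = n * c / (2 * l), hence c = 2 * l * a.
length_m = 0.6  # string length in metres, from the problem statement
c_transverse = 2 * length_m * a

print(f"a = {a:.2f} Hz, c_transverse = {c_transverse:.1f} m/s")
```

The refitted slope should land close to the $a = 159.4$ quoted above.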

2. Relevant equations

So here is my first confusion: the equations to calculate chi square are not the same depending on the source! Maybe there's an explanation I don't understand. I find mostly

$\chi^2 = \sum \frac{(\mbox{observed} - \mbox{expected})^2}{\mbox{expected}}$

but sometimes it is

$\chi^2 = \sum \bigg( \frac{\mbox{observed} - \mbox{expected}}{\sigma} \bigg)^2$

and those don't seem equivalent to me, unless I misunderstand what "observed" and "expected" mean... In some sources "expected" refers to what the model predicts, but sometimes it seems to refer to the mean value! I would assume "expected" = mean value when random events are measured, but it is hard to find a confirmation through Google. Anyway the reduced chi square test is then

$\chi_{red}^2 = \frac{\chi^2}{\mbox{DoF}}$

3. The attempt at a solution

Okay, I don't have much to write in this section. I have tried MANY ways to make this work, but I can't trust any result. If I stick to my first understanding of $\chi^2$, I get

$\chi^2 = \sum \frac{(f_{n,measured} - f_{n,expected})^2}{f_{n,expected}} = 0.1762$

which doesn't really look like what I expected. As a matter of fact, the reduced chi square test gives a very unsatisfying value of

$\chi_{red}^2 = 0.0220$

where I considered $\mbox{DoF} = 8$ (is that right?). The value indicates I would be overfitting the data, or that the error was overestimated. But how is the error even playing a role here? I find it also hard to believe that it would be overestimated. I must be doing something wrong, but I can't find where.
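For reference, the two numbers above can be reproduced from the rounded table with a short Python sketch (variable names are my own). One thing it makes visible: this form divides by the expected value, which is the natural choice for counts, where the variance is roughly equal to the expected count. Applied to frequencies, the sum carries units of Hz and the measurement uncertainty never enters, which may be part of why the result looks odd.

```python
# Rounded measurements (Hz) and the fit model f(n) = a * n from the post.
f_measured = [158, 316, 475, 637, 793, 963, 1111, 1281, 1432]
a = 159.4
f_expected = [a * n for n in range(1, len(f_measured) + 1)]

# "Counting" form of chi-square: squared residual divided by the expected value.
chi2_pearson = sum((obs - exp) ** 2 / exp
                   for obs, exp in zip(f_measured, f_expected))

# 9 data points minus 1 fitted parameter (the slope a).
dof = len(f_measured) - 1
chi2_pearson_red = chi2_pearson / dof

print(f"chi2 = {chi2_pearson:.4f}, chi2 / DoF = {chi2_pearson_red:.4f}")
```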

Julien.

Attached file: image.png (84 KB)

2. Jul 15, 2016

JulienB

I'm adding a more general question to this post: when searching for information about chi square tests, what struck me is that they mostly refer to "counts", that is, how many measurements fall in a bin. The test seems to have been designed this way for population tests, and that makes sense to me. I performed a chi square test to check which type of distribution suited the data from a radioactivity experiment, and that made total sense to me. But in a physics experiment like this one, where my hypothesis is that the linear fit model represents the measured data well, I'm confused: I don't understand why 1 would be the ideal result for the reduced chi square test and not simply 0.
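One way to build intuition for the "1, not 0" question is a small simulation. The sketch below is only an illustration under stated assumptions: it generates synthetic data from the model $f(n) = a \cdot n$ with Gaussian noise of width $\sigma$, refits the slope each time, and averages the reduced chi square over many runs. The slope and $\sigma$ are borrowed from the first post; the number of runs and the seed are arbitrary choices of mine. If the model and the uncertainty are both right, each residual is typically about one $\sigma$ in size, so the average should come out near 1. A value of 0 would require noisy data to sit exactly on the line.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n_values = list(range(1, 10))
true_a = 159.4   # slope used to generate synthetic data (borrowed from the fit)
sigma = 2.0      # same Gaussian uncertainty on every point, in Hz
dof = len(n_values) - 1  # 9 points minus 1 fitted parameter
sum_nn = sum(n * n for n in n_values)


def reduced_chi2_of_one_experiment():
    # Synthetic "measurement": the model plus Gaussian noise of width sigma.
    f = [true_a * n + random.gauss(0.0, sigma) for n in n_values]
    # Refit the slope to the noisy data (least squares through the origin).
    a_fit = sum(n * y for n, y in zip(n_values, f)) / sum_nn
    chi2 = sum(((y - a_fit * n) / sigma) ** 2 for n, y in zip(n_values, f))
    return chi2 / dof


trials = [reduced_chi2_of_one_experiment() for _ in range(5000)]
mean_reduced_chi2 = sum(trials) / len(trials)
print(f"mean reduced chi-square over {len(trials)} runs: {mean_reduced_chi2:.3f}")
```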

3. Jul 15, 2016

JulienB

Actually when I use the formula $\chi_{red}^2 = \frac{1}{\mbox{DoF}} \cdot \sum \bigg( \frac{\mbox{obs} - \mbox{exp}}{\sigma} \bigg)^2$, something reasonable comes out. I get $\chi^2 = 35.85$ but with 8 degrees of freedom that gives $\chi_{red}^2 = 4.4813$. This makes sense to me, because it would suggest that we may have set our uncertainties too low. When looking at the numbers, that's probably true: $\sigma = 2.2358$Hz and still the Gaussian uncertainties are even smaller than that. Would you guys agree with that assumption?
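The numbers in this post can be checked with a few lines of Python. This is a sketch using the rounded table values and the rounded uncertainty of $2$Hz on every point; the variable names are my own.

```python
# Rounded measurements (Hz) and the fit model f(n) = a * n from the first post.
f_measured = [158, 316, 475, 637, 793, 963, 1111, 1281, 1432]
a = 159.4
sigma = 2.0  # rounded uncertainty applied to every point, in Hz

# Residuals between measurement and model, then the sigma-weighted sum.
residuals = [f - a * n for n, f in enumerate(f_measured, start=1)]
chi2 = sum((r / sigma) ** 2 for r in residuals)

# 9 data points minus 1 fitted parameter (the slope a).
dof = len(f_measured) - 1
chi2_red = chi2 / dof

print(f"chi2 = {chi2:.2f}, chi2 / DoF = {chi2_red:.4f}")
```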

Thanks a lot.

Julien.

Last edited: Jul 15, 2016
4. Jul 15, 2016

gleem

$\chi_{red}^2 = \frac{1}{\mbox{DoF}} \cdot \sum \bigg( \frac{\mbox{obs} - \mbox{exp}}{\sigma} \bigg)^2$ is what you should use. It is the general definition of the reduced $\chi^2$.

A large $\chi_{red}^2$ can mean that the fit is not that good, that your error estimate is too low as you say, both, or that you have a random outlier. Have you checked a $\chi_{red}^2$ table? A $\chi_{red}^2 > 4$ for 8 DoF would be extremely rare if the measurements corresponded to the assumed model and the uncertainties were reasonable.
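Instead of a printed table, the tail probability can be estimated directly. The sketch below relies on a closed form that holds only for an even number of degrees of freedom, $P(\chi^2 > x) = e^{-x/2} \sum_{i=0}^{k/2-1} (x/2)^i / i!$ for $k$ DoF, which happens to cover the 8 DoF here. The function name is hypothetical, and for odd DoF a statistics library would be needed instead.

```python
import math


def chi2_survival_even_dof(x, dof):
    """P(chi-square with `dof` degrees of freedom exceeds x); even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    # Poisson-type sum with dof / 2 terms.
    terms = (half ** i / math.factorial(i) for i in range(dof // 2))
    return math.exp(-half) * sum(terms)


dof = 8
chi2 = 35.85  # value quoted earlier in the thread (sigma = 2 Hz)
p_value = chi2_survival_even_dof(chi2, dof)
print(f"P(chi2 >= {chi2}) for {dof} DoF: {p_value:.2e}")
```

A probability this small is what "extremely rare" means in practice: either the model or the quoted uncertainties are unlikely to be right as they stand.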

5. Jul 15, 2016

JulienB

@gleem Thanks for your answer. I have not checked a table yet, but I will do it asap. Have you looked at my attached picture though? The fit looks good (and the relationship we're trying to demonstrate should hold), but my uncertainties are very low ($\pm 2$Hz, while using lousy equipment with, for example, a piece of paper under the "string holder" to make it straight). Is there another way than "my fit looks good" to find out whether the model or the uncertainties are responsible for the unsatisfying $\chi^2$? I'm planning on writing a MATLAB function that could distinguish between the two cases, if that's mathematically possible.

Julien.

6. Jul 15, 2016

gleem

I looked at your fit, and the only curious aspect is that the first four data points have very little variation from the fit and their residuals have the same sign, not that this is that unlikely. It might just be coincidental, but you determined your experimental error from $n=9$, and its deviation is consistent with the first four points, which contribute significantly less (only 18% of the total) to $\chi^2$ than $n=5$ to $9$. The question could be: why do you have so much variation for $n=5$ to $9$? Could there be other sources of error?
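To look at the residuals point by point, a sketch like the following can help. It assumes the rounded table values, the slope $a = 159.4$ and a common $\sigma = 2$Hz; the exact shares depend on which values and uncertainties are used, so they need not match a figure obtained from other inputs.

```python
# Rounded measurements (Hz) and the fit model f(n) = a * n from the first post.
f_measured = [158, 316, 475, 637, 793, 963, 1111, 1281, 1432]
a = 159.4
sigma = 2.0  # common uncertainty in Hz (an assumption; it cancels in the shares)

residuals = [f - a * n for n, f in enumerate(f_measured, start=1)]
contributions = [(r / sigma) ** 2 for r in residuals]
chi2 = sum(contributions)

# Sign and size of each residual, plus its share of the total chi-square.
for n, (r, c) in enumerate(zip(residuals, contributions), start=1):
    print(f"n={n}: residual {r:+.1f} Hz, share of chi2 {100 * c / chi2:.1f}%")

share_first_four = sum(contributions[:4]) / chi2
print(f"n = 1..4 together: {100 * share_first_four:.1f}% of chi2")
```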

7. Jul 16, 2016

JulienB

Hi @gleem and thanks again for your answer. First, I've redone the reduced chi square test with the unrounded values, and as expected I get a better value, $\chi^2/\mbox{DoF} = 3.4932$, since the unrounded errors are higher than $2$Hz.

Regarding what you said about the fit, I am a bit clueless. The error is calculated via a Pythagorean addition (addition in quadrature) of the standard deviation and the machine error. The machine uncertainty depends on the range, which means that above $1000$Hz the uncertainty gets higher, but the difference is so small that it is almost negligible (for reference $\Delta f_6 = 2.2902$ and $\Delta f_9 = 2.3008$), and it applies only for $n=7$ to $9$. There could of course be other sources of error, for example when placing the magnetic detector under the antinodes of the oscillation, but those can hardly be quantified.

I already handed in my homework, but just out of curiosity for future reports, would it be acceptable to say "the reduced chi square test seems to indicate we have underestimated our uncertainty, possibly because of sources of error that were not considered. It would be more realistic to define an error of $1$% for each measurement"? The percentage I gave is only an example (it gives $\chi^2/\mbox{DoF} = 0.35964$). It would not be the first time I estimate an uncertainty, but this does not sound quite rigorous enough, since the high value of the reduced chi square test could also imply a poorly fitting model.
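The Pythagorean addition mentioned above can be written out as a short sketch. It assumes that the quoted totals $\Delta f_6$ and $\Delta f_9$ were obtained exactly as $\sqrt{\sigma^2 + \sigma_{machine}^2}$ with $\sigma = 2.2358$Hz; under that assumption the machine term can be backed out, which shows how little it adds. The helper name is hypothetical.

```python
import math


def combine_in_quadrature(*components):
    """Combine independent uncertainty components: sqrt(sum of squares)."""
    return math.sqrt(sum(c * c for c in components))


sigma_stat = 2.2358  # standard deviation from the six f_9 measurements, in Hz
delta_f6 = 2.2902    # total uncertainty quoted for n = 6, in Hz
delta_f9 = 2.3008    # total uncertainty quoted for n = 9, in Hz

# Assumption: total^2 = sigma_stat^2 + machine^2, so the machine term is
# what remains after removing the statistical part.
machine_f6 = math.sqrt(delta_f6 ** 2 - sigma_stat ** 2)
machine_f9 = math.sqrt(delta_f9 ** 2 - sigma_stat ** 2)

print(f"implied machine uncertainty: {machine_f6:.2f} Hz (n=6), "
      f"{machine_f9:.2f} Hz (n=9)")
```

Because the components add in quadrature, the larger one dominates the total, which is why the range change above $1000$Hz barely shows up.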

Julien.

8. Jul 16, 2016

gleem

Are you sure? If they affect the reading then they must be evaluated. In determining your error for $n=9$, did you reposition the sensor for each measurement or not? If you reposition the sensor, you can account for the positioning variation.

A good uncertainty estimate is crucial to the assessment of experimental results, so spend enough time on its determination. Remember that an uncertainty that is too large or too small could obscure valid conclusions or lead to false ones.

Use sound statistical and physical reasoning to estimate uncertainties. If for some reason they do not seem correct, go back and try to find the problem: search for sources of error that might have been overlooked, reevaluate assumptions concerning errors, look for problems in the execution of the experiments themselves, question the performance of your equipment, and reexamine any calculations that were performed to obtain the final values used in the fit.

The calibration of your instruments will affect your readings and you must account for that. You are assuming only random uncertainties, but there may be a systematic shift in your data which is inconsistent with your model. Maybe more terms are needed to model your data; if so, they must be explained. If you take data over a long period of time, there may be drift if you do not recalibrate or check periodically. There may be nonlinearities in the readings, temperature effects, sensitivity to electrical power, or unexpected interference. Experimental science can be very challenging.

If the data is copacetic and all errors, systematic and random, are accounted for, statistical tests can only support conclusions about the data; they cannot prove anything, except maybe that you might have a problem. In the case of the $\chi^2$ test, the error estimates and the variation of the data from the expected values are intertwined, so you must look elsewhere for an explanation of an unacceptable value, whether too large or too small.

One final thing: the F-test can be used to compare two or more $\chi^2$ tests to see if they are statistically different, as when you do one fit with 2 parameters and another with 3. This test will tell you if the added parameter has a significant effect on the fit. Or, if you compare two fits of the same type of data with the same parameters, you can determine if they are from the same population, i.e. statistically consistent.
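As an illustration of the added-parameter case, the sketch below compares the one-parameter model $f = a \cdot n$ with a two-parameter model $f = a \cdot n + b$ on the rounded table values from the first post. It uses one common form of the statistic, $F = \frac{(\chi_1^2 - \chi_2^2)/(p_2 - p_1)}{\chi_2^2/(N - p_2)}$, and assumes a common $\sigma = 2$Hz (which cancels in the ratio). The intercept model is my own choice of example, not something proposed in the thread.

```python
n_values = list(range(1, 10))
f_measured = [158, 316, 475, 637, 793, 963, 1111, 1281, 1432]
sigma = 2.0  # common uncertainty in Hz (assumed; it cancels in the F ratio)
num_points = len(n_values)


def chi2(predicted):
    """Sigma-weighted sum of squared residuals for a list of predictions."""
    return sum(((f - p) / sigma) ** 2 for f, p in zip(f_measured, predicted))


sum_n = sum(n_values)
sum_f = sum(f_measured)
sum_nn = sum(n * n for n in n_values)
sum_nf = sum(n * f for n, f in zip(n_values, f_measured))

# Model 1 (one parameter): f = a1 * n, least squares through the origin.
a1 = sum_nf / sum_nn
chi2_one = chi2([a1 * n for n in n_values])

# Model 2 (two parameters): f = a2 * n + b2, ordinary least squares.
a2 = (num_points * sum_nf - sum_n * sum_f) / (num_points * sum_nn - sum_n ** 2)
b2 = (sum_f - a2 * sum_n) / num_points
chi2_two = chi2([a2 * n + b2 for n in n_values])

# F statistic for the one extra parameter.
extra_params = 1
dof_two = num_points - 2
f_stat = ((chi2_one - chi2_two) / extra_params) / (chi2_two / dof_two)

print(f"chi2 (1 param) = {chi2_one:.2f}, chi2 (2 params) = {chi2_two:.2f}")
print(f"F = {f_stat:.2f} with ({extra_params}, {dof_two}) degrees of freedom")
```

The resulting $F$ would then be compared with a tabulated critical value for $(1, 7)$ degrees of freedom. As a rough guide, an $F$ of order 1 or less suggests the extra term is not justified by the data.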