Goodness of Fit Test Without Uncertainties

1. Sep 24, 2011

nicholls

In what way can one do a goodness of fit test on a set of data that contains no uncertainties?

I am doing a simulated data analysis assignment for a physics lab course and they provided me with data for a supposed X-ray spectrum for some material. The data consists of energy bins with a count of photon detections for each bin (or something like that). There is no uncertainty on these values.

I can fit a couple of Gaussian distributions to it, but without uncertainties I am confused as to how to approach a goodness of fit test. I do not have much statistics knowledge so any help would be great!

2. Sep 25, 2011

Stephen Tashi

It isn't clear what you mean by "uncertainties". It also isn't clear why you can fit a couple of gaussian distributions to the data. Did you use two different methods of fitting?

The use of "bins" in a statistics problem hints at using a test like Pearson's Chi-Squared test. However, since this is a physics class, it would be wise to consult your course materials to see what they have in mind.

3. Sep 25, 2011

nicholls

Basically, the data I am given is a histogram where the x values are energy bins of width 0.05 keV, and the y values are the counts of photon detections per energy bin. There is no uncertainty given for the y values.

The data consists of two peaks, both Gaussian in shape. Using Python, I have managed to fit two Gaussian functions to the two peaks, along with a linear function describing the background noise.
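As a rough sketch of the model described here, assuming plain Python and hypothetical parameter names (none of them come from the assignment), the fitted function could be written as two Gaussian peaks plus a straight-line background:

```python
import math


def gaussian(x, amplitude, mean, sigma):
    """Unnormalized Gaussian peak evaluated at energy x."""
    return amplitude * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def spectrum_model(x, a1, mu1, s1, a2, mu2, s2, slope, intercept):
    """Two Gaussian peaks on top of a linear background.

    All parameter names are illustrative assumptions:
    a1, mu1, s1 describe the first peak, a2, mu2, s2 the second,
    and slope, intercept describe the background line.
    """
    background = slope * x + intercept
    return gaussian(x, a1, mu1, s1) + gaussian(x, a2, mu2, s2) + background
```

A function of this shape can be handed to a least-squares routine such as `scipy.optimize.curve_fit`, provided it is rewritten to accept NumPy arrays (for example with `numpy.exp` in place of `math.exp`).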

I would now like to perform a goodness of fit test on this fit. However, without explicit y uncertainties I am not sure how to proceed. Someone told me, although they did not give me a good explanation, that for the y uncertainty I can use 1/sqrt(N) where N is the count or y value for each bin.
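For counting data, the usual assumption is that each bin count N follows a Poisson distribution, whose standard deviation is sqrt(N); the quantity 1/sqrt(N) is the corresponding relative uncertainty, sqrt(N)/N. The sketch below assumes that Poisson model and uses hypothetical function names. With the observed count in the denominator, this form is sometimes called Neyman's chi-squared.

```python
import math


def poisson_sigma(count):
    """Absolute uncertainty on a bin count, assuming Poisson statistics."""
    return math.sqrt(count)


def relative_uncertainty(count):
    """Relative uncertainty sigma / N = 1 / sqrt(N) for a Poisson count."""
    return 1.0 / math.sqrt(count)


def chi_squared_poisson(observed, expected):
    """Sum of squared residuals weighted by sigma_i**2 = observed count.

    Empty bins would give sigma = 0, so they are rejected here; how to
    treat them (merge bins, skip them, etc.) is a separate choice.
    """
    total = 0.0
    for n, e in zip(observed, expected):
        if n <= 0:
            raise ValueError("bin with zero counts has no Poisson sigma estimate")
        total += (n - e) ** 2 / poisson_sigma(n) ** 2
    return total
```

For a bin with 100 counts this gives an absolute uncertainty of 10 counts and a relative uncertainty of 0.1, that is, 10%.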

Does this make more sense?

4. Sep 25, 2011

Stephen Tashi

You still haven't managed to define what "uncertainty" is. It may be obvious to physicists what it means in this context, but this is the math section and it isn't obvious to me what it means in the context of statistical tests. Are you talking about "standard deviation"? "probability"? "precision of measurement"?

If you have fit a distribution (which in your case is a mixture of distributions) to the data then the distribution you fit can be used to do the Pearson Chi-squared test or any other test which relies on computing the probability that $y_i$ values fell in bin $i$.
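To illustrate how a fitted distribution yields a probability per bin, here is a minimal sketch for a single Gaussian component, using only the standard library. The function names and the idea of scaling by a total count are assumptions for illustration; a mixture would sum such terms with its own weights.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def expected_counts(bin_edges, total_count, mean, sigma):
    """Expected count in each bin: total_count times the bin probability.

    bin_edges is a sorted list of n + 1 edges defining n bins.
    """
    counts = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        probability = normal_cdf(hi, mean, sigma) - normal_cdf(lo, mean, sigma)
        counts.append(total_count * probability)
    return counts
```

The resulting list plays the role of the expected values that a binned test compares against the observed counts.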

5. Sep 25, 2011

nicholls

What I mean by uncertainty is that there are no error bars on the y values. I'm not exactly sure but I suppose you would just call them standard deviations. Thus with 0 uncertainty, the chi-squared value of the fit tends to infinity.
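A note on why the statistic diverges: the form of chi-squared commonly used in curve fitting weights each residual by a per-point standard deviation $\sigma_i$ (here $f$ stands for the fitted model and $x_i$ for the bin centre, symbols that are not used elsewhere in the thread):

$$\chi^2 = \sum_{i=1}^n \frac{(y_i - f(x_i))^2}{\sigma_i^2}$$

Setting $\sigma_i = 0$ makes every term with a nonzero residual infinite. If instead each count is assumed to be Poisson distributed, so that $\sigma_i^2$ is estimated by the model value $f(x_i)$, each term becomes $(y_i - f(x_i))^2 / f(x_i)$, which needs no separately supplied error bars.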

6. Sep 25, 2011

Stephen Tashi

Look at the article on Pearson's Chi-square test in the wikipedia: http://en.wikipedia.org/wiki/Pearson's_chi-squared_test

The test statistic is $\chi^2 = \sum_{i=1}^n \frac { (O_i - E_i)^2} { E_i}$

There is nothing in the computation of that test statistic that involves error bars, so I don't understand why it would tend to be infinite.
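The statistic quoted in this post can be computed directly from the observed and expected counts. The sketch below uses hypothetical function names; the degrees-of-freedom convention (number of bins minus number of fitted parameters) is a common choice for curve fits and is stated here as an assumption, not as something the thread specifies.

```python
def pearson_chi_squared(observed, expected):
    """Pearson's statistic: sum over bins of (O_i - E_i)**2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reduced_chi_squared(chi2, n_bins, n_fit_params):
    """Chi-squared per degree of freedom.

    Assumes dof = n_bins - n_fit_params, which must be positive.
    """
    dof = n_bins - n_fit_params
    if dof <= 0:
        raise ValueError("need more bins than fitted parameters")
    return chi2 / dof


# Example with made-up counts (not data from the assignment).
chi2 = pearson_chi_squared([90, 110, 100], [100, 100, 100])
```

A reduced value near 1 is usually read as a reasonable fit. If SciPy is available, `scipy.stats.chi2.sf(chi2, dof)` converts the statistic into a p-value.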

7. Sep 25, 2011

Stephen Tashi

One thing we need to clarify is the physical model for how the data is generated.

Are we assuming that photon energy is a random variable and that each photon energy measurement is an independent realization of that random variable?

Or is the model some deterministic process?

8. Sep 25, 2011

nicholls

Pearson's chi squared seems to do the trick. Thanks for showing me that.

I'm not quite sure whether to say that the photon energy is a random variable although I am inclined to say yes.

You could have a quick read of the outline of the experiment if you want:

http://www.physics.utoronto.ca/~phy326/simdata/SDA%20Assignment.pdf

I am doing experiment A which is on the second page.

Last edited by a moderator: May 5, 2017