# How to compare two data sets with statistics?

1. Dec 9, 2013

### elegysix

I have two questions:

I have a set of data, a measured spectrum. When I model the spectrum with a function, I calculate $r^{2} = 1 - \frac{\sum(y - y_{model})^{2}}{\sum(y - y_{avg})^{2}}$.

Q1) However, I now have reference data, which is what the spectrum should be. So is it right to use the same calculation for $r^{2}$, but with $y_{reference}$ in place of $y_{model}$?

Q2) The model function I was fitting to the data is
$S_{\lambda} = \frac{2\pi h c^{2}}{\lambda^{5}\left(e^{hc/\lambda kT} - 1\right)}$
Is it correct to calculate goodness of fit in that way for such a distribution?
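
For readers following along, here is a minimal sketch of the calculation described above, using only the Python standard library. The function names, the 5800 K temperature, the wavelength grid and the ±2% perturbation are all assumptions made for illustration; they are not the poster's data or code.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K


def planck_exitance(wavelength_m, temperature_k):
    """S_lambda = 2*pi*h*c^2 / (lambda^5 * (exp(hc / (lambda*k*T)) - 1))."""
    x = H * C / (wavelength_m * K * temperature_k)
    # math.expm1(x) computes exp(x) - 1 accurately, also for small x
    return 2.0 * math.pi * H * C**2 / (wavelength_m**5 * math.expm1(x))


def r_squared(y, y_model):
    """r^2 = 1 - sum((y - y_model)^2) / sum((y - mean(y))^2)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - mi) ** 2 for yi, mi in zip(y, y_model))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the model at an assumed 5800 K, and synthetic
# "measurements" made by perturbing the model by +/- 2 %.
wavelengths = [w * 1e-9 for w in range(300, 2001, 100)]   # 300 nm to 2000 nm
model = [planck_exitance(w, 5800.0) for w in wavelengths]
measured = [m * (1.0 + 0.02 * (-1) ** i) for i, m in enumerate(model)]

print(r_squared(measured, model))
```

Note that this only evaluates the formula. It says nothing about whether $r^{2}$ is the right statistic for a nonlinear model like this one, which is the actual question in Q2.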

Here is a plot of my two data sets

thanks!

2. Dec 10, 2013

### Simon Bridge

Q1> What does "what the spectrum should be" mean?
There is what the spectrum is and what the model predicts - surely it "should be" whatever it actually is.

Q2> To decide what to do you need, first, to define the problem.
What is it you are trying to find out?

If you want to see if the model is a good fit to the data, then a goodness-of-fit test is probably warranted.
Make sure that the approach you use answers the questions you are asking.

What I am reading above is that you have not asked a clear enough question to know how to proceed.

Suspect you may need these:
http://home.comcast.net/~szemengtan/ [Broken]
... "Inverse Problems" towards the bottom of the page.

Those data plots are seriously cool btw.

Last edited by a moderator: May 6, 2017

3. Dec 10, 2013

### elegysix

Thanks, sorry for being unclear.
Forget that I mentioned a "model"

"What the spectrum should be" is the ASTM G173 reference spectrum.
We captured the solar spectrum and want to compare it with a reference spectrum (ASTM G173) to show that our measurements are accurate.

The question is: how can I properly use statistics to say how well these two data sets match?

Is it appropriate to use this calculation: $r^{2} = 1 - \frac{\sum(y_{r} - y_{s})^{2} }{\sum(y_{r} - \bar{y_{r}})^{2} }$

where $y_{r}$ is the reference y data, and $y_{s}$ is our measured y data, and $\bar{y_{r}}$ is the mean of the reference y data.
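
A minimal sketch of that calculation, assuming the two data sets are sampled at different wavelengths, so the reference is first linearly interpolated onto the measured wavelengths. The helper names and all numbers are made up for illustration; they are not ASTM G173 values or real measurements.

```python
def interp_linear(x_new, xs, ys):
    """Linearly interpolate the table (xs, ys) at x_new.

    Assumes xs is strictly increasing and x_new lies inside its range.
    """
    if not xs[0] <= x_new <= xs[-1]:
        raise ValueError("x_new lies outside the tabulated range")
    for i in range(1, len(xs)):
        if x_new <= xs[i]:
            t = (x_new - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def r_squared_vs_reference(y_ref, y_meas):
    """r^2 = 1 - sum((y_r - y_s)^2) / sum((y_r - mean(y_r))^2)."""
    y_ref_mean = sum(y_ref) / len(y_ref)
    ss_diff = sum((r - s) ** 2 for r, s in zip(y_ref, y_meas))
    ss_ref = sum((r - y_ref_mean) ** 2 for r in y_ref)
    return 1.0 - ss_diff / ss_ref


# Hypothetical reference table (wavelength in nm, irradiance in arbitrary units)
ref_wl = [400.0, 500.0, 600.0, 700.0, 800.0]
ref_y = [1.0, 1.5, 1.4, 1.2, 1.0]

# Hypothetical measurement on a different wavelength grid
meas_wl = [450.0, 550.0, 650.0, 750.0]
meas_y = [1.30, 1.40, 1.35, 1.05]

# Put both data sets on the same grid before comparing point by point
ref_on_meas_grid = [interp_linear(w, ref_wl, ref_y) for w in meas_wl]

r2 = r_squared_vs_reference(ref_on_meas_grid, meas_y)
r2_swapped = r_squared_vs_reference(meas_y, ref_on_meas_grid)
print(r2, r2_swapped)
```

The two printed values differ because the denominator uses the spread of whichever data set is passed first, so the statistic is not symmetric in the two data sets.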

thanks

4. Dec 11, 2013

### Simon Bridge

So you are testing the measuring method, to show that it is sound?

You want to use the coefficient of determination test?
I think you have the roles of the data-sets reversed.

There are other goodness-of-fit tests, e.g. chi-squared. What led you to choose this one?

5. Dec 11, 2013

### elegysix

yes.

Not necessarily. I want to use whatever test is appropriate for this.

I am not familiar with the others, that is why I made this thread. Which test should I use? what would you use?

thanks

6. Dec 11, 2013

### Simon Bridge

I see ... I cannot see anything immediately ruling out a CoD test.
I would use Chi-squared... but that's me.

Really you are comparing two data-sets and asking if they are close enough to come from the same forward function rather than checking a data set against a theoretical model of a forward function.
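
As a rough illustration of the chi-squared alternative for two data sets, here is a sketch under some strong assumptions that the thread does not establish: both sets are on the same wavelength grid, each point has a known standard deviation, and the errors are independent and roughly Gaussian. The function name and every number below are hypothetical.

```python
def chi_squared_two_sets(y_a, y_b, sigma_a, sigma_b):
    """Sum of squared differences, each scaled by the combined variance.

    Assumes independent errors, so the variance of (a - b) is
    sigma_a^2 + sigma_b^2 at each point.
    """
    return sum(
        (a - b) ** 2 / (sa ** 2 + sb ** 2)
        for a, b, sa, sb in zip(y_a, y_b, sigma_a, sigma_b)
    )


# Hypothetical values on a common grid, with assumed uncertainties
y_meas = [1.30, 1.40, 1.35, 1.05]
y_ref = [1.25, 1.45, 1.30, 1.10]
sigma_meas = [0.05, 0.05, 0.05, 0.05]
sigma_ref = [0.05, 0.05, 0.05, 0.05]

chi2 = chi_squared_two_sets(y_meas, y_ref, sigma_meas, sigma_ref)

# No parameters are fitted here, so the number of points is used as the
# degrees of freedom (an assumption of this sketch).
reduced_chi2 = chi2 / len(y_meas)
print(chi2, reduced_chi2)
```

A reduced chi-squared near 1 would suggest the differences are about the size the stated uncertainties predict; values much larger than 1 suggest the two sets disagree by more than the uncertainties allow, and values much smaller than 1 often mean the uncertainties were overestimated.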

The inverse problems papers I linked you to (post #2) give a lot of detail on different rationales for goodness of fit in different circumstances.