How to compare two data sets with statistics?

The discussion focuses on comparing a measured solar spectrum with a reference spectrum (ASTMG173) to assess measurement accuracy. The main questions involve the appropriate use of the coefficient of determination (r²) for this comparison and whether it is suitable for the given data sets. Participants suggest that while r² can be used, alternative goodness-of-fit tests like chi-squared may be more appropriate depending on the context. There is an emphasis on clearly defining the problem and the goal of the analysis before selecting a statistical method. The conversation highlights the importance of choosing the right statistical test to determine how well the two data sets match.
elegysix
I have two questions:

I have a set of data, a measured spectrum. When I model the spectrum with a function, I calculate ## r^{2} = 1 - \frac{\sum (y - y_{model})^{2}}{\sum (y - y_{avg})^{2}} ##.

Q1) However, I now have reference data, which is what the spectrum should be. Is it right to use the same calculation for ## r^{2} ##, but with ## y_{reference} ## in place of ## y_{model} ##?

Q2) The model function I was fitting to the data is
## S_{\lambda} = \frac{2 \pi h c^{2}}{\lambda^{5} \left( e^{hc/\lambda k T} - 1 \right)} ##
Is it correct to calculate goodness of fit in that way for such a distribution?
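The calculation described in this post can be sketched as follows. This is a minimal illustration, not the poster's actual code: the function names `planck_exitance` and `r_squared`, the 5800 K temperature, the wavelength grid, and the 2% synthetic noise are all assumptions made for the example.

```python
import math
import random

# Physical constants (SI units)
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K

def planck_exitance(wavelength_m, temperature_k):
    """Blackbody spectral exitance 2*pi*h*c^2 / (lambda^5 * (exp(hc/(lambda k T)) - 1))."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * math.pi * H * C**2 / (wavelength_m**5 * math.expm1(x))

def r_squared(y, y_model):
    """Coefficient of determination: 1 - sum((y - y_model)^2) / sum((y - y_avg)^2)."""
    y_avg = sum(y) / len(y)
    ss_res = sum((yi - mi) ** 2 for yi, mi in zip(y, y_model))
    ss_tot = sum((yi - y_avg) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for a measured spectrum (assumption: 2% multiplicative noise)
random.seed(0)
wavelengths = [300e-9 + i * 11e-9 for i in range(201)]   # 300 nm to 2500 nm
y_model = [planck_exitance(w, 5800.0) for w in wavelengths]
y_measured = [m * (1.0 + random.gauss(0.0, 0.02)) for m in y_model]

r2 = r_squared(y_measured, y_model)
print(f"r^2 = {r2:.4f}")
```

Note that a value of r² close to 1 only says the residuals are small compared with the overall spread of the data. It does not take measurement uncertainties into account.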


Here is a plot of my two data sets

[Attached image: unnamed.jpg]


thanks!
 
Q1> What does "what the spectrum should be" mean?
There is what the spectrum is and what the model predicts - surely it "should be" whatever it actually is.

Q2> To decide what to do you need, first, to define the problem.
What is it you are trying to find out?

If you want to see if the model is a good fit to the data, then a goodness-of-fit test is probably warranted.
Make sure that the approach you use answers the questions you are asking.

What I am reading above is that you have not asked a clear enough question to know how to proceed.

Suspect you may need these:
http://home.comcast.net/~szemengtan/
... "Inverse Problems" towards the bottom of the page.

Those data plots are seriously cool btw.
 
Thanks, sorry for being unclear.
Forget that I mentioned a "model"

"what the spectrum should be" is the ASTMG173.
We captured the solar spectrum and want to compare it with a reference spectrum (the ASTMG173) to show that our measurements are accurate.

the question is - how can I properly use statistics to say how well these two data sets match?

Is it appropriate to use this calculation: ## r^{2} = 1 - \frac{\sum (y_{r} - y_{s})^{2}}{\sum (y_{r} - \bar{y}_{r})^{2}} ##?

where ## y_{r} ## is the reference y data, ## y_{s} ## is our measured y data, and ## \bar{y}_{r} ## is the mean of the reference y data.
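A minimal sketch of this calculation, assuming the two spectra are sampled at different wavelengths so that the reference has to be interpolated onto the measured grid first. The helper names `interp_linear` and `r_squared_vs_reference` are hypothetical, and the Gaussian-shaped "spectrum" is synthetic stand-in data, not the ASTMG173 values.

```python
import math
from bisect import bisect_right

def interp_linear(x, xs, ys):
    """Linear interpolation of ys(xs) at x.

    Assumes xs is strictly increasing and xs[0] <= x <= xs[-1].
    """
    i = bisect_right(xs, x)
    if i == len(xs):          # x coincides with the last grid point
        return ys[-1]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def r_squared_vs_reference(wl_ref, y_ref, wl_meas, y_meas):
    """r^2 = 1 - sum((y_r - y_s)^2) / sum((y_r - mean(y_r))^2) on the overlapping range."""
    lo, hi = wl_ref[0], wl_ref[-1]
    # Pair each measured point with the reference value at the same wavelength
    pairs = [(interp_linear(w, wl_ref, y_ref), y)
             for w, y in zip(wl_meas, y_meas) if lo <= w <= hi]
    y_r_avg = sum(r for r, _ in pairs) / len(pairs)
    ss_res = sum((r - s) ** 2 for r, s in pairs)
    ss_tot = sum((r - y_r_avg) ** 2 for r, _ in pairs)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): wavelengths in nm, arbitrary intensity units
wl_ref = [300 + 2 * i for i in range(1101)]            # 300 to 2500 nm
y_ref = [math.exp(-((w - 600) / 400) ** 2) for w in wl_ref]
wl_meas = [350 + 17 * i for i in range(120)]           # coarser measured grid
y_same = [interp_linear(w, wl_ref, y_ref) for w in wl_meas]
y_scaled = [1.2 * y for y in y_same]                   # 20% calibration error

r2_same = r_squared_vs_reference(wl_ref, y_ref, wl_meas, y_same)
r2_scaled = r_squared_vs_reference(wl_ref, y_ref, wl_meas, y_scaled)
print(r2_same, r2_scaled)
```

In this sketch a pure scale error in the measurement lowers r² even though the spectral shape is identical, so the number mixes shape agreement and calibration agreement.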

thanks
 
So you are testing the measuring method, to show that it is sound?

You want to use the coefficient of determination test?
I think you have the roles of the data-sets reversed.

There are other goodness-of-fit tests - e.g. chi-squared - what led you to choose this one?
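One possible reading of the "roles reversed" remark, offered here as an assumption rather than something confirmed in the thread: the coefficient of determination conventionally treats the observed values as the data and the predicted values as the thing being tested. With the measured spectrum ## y_{s} ## as the data and the reference ## y_{r} ## standing in for the model, the numerator is unchanged and only the denominator differs:

```latex
r^{2} = 1 - \frac{\sum (y_{s} - y_{r})^{2}}{\sum (y_{s} - \bar{y}_{s})^{2}}
```

Here ## \bar{y}_{s} ## is the mean of the measured y data.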
 
Simon Bridge said:
So you are testing the measuring method, to show that it is sound?
yes.

Simon Bridge said:
You want to use the coefficient of determination test?
Not necessarily. I want to use whatever test is appropriate for this.


Simon Bridge said:
There are other goodness-of-fit tests - e.g. chi-squared - what led you to choose this one?
I am not familiar with the others, which is why I made this thread. Which test should I use? What would you use?

thanks
 
I see ... I cannot see anything immediately ruling out a CoD test.
I would use Chi-squared... but that's me.

Really you are comparing two data-sets and asking if they are close enough to come from the same forward function rather than checking a data set against a theoretical model of a forward function.
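A chi-squared comparison of two data sets could look roughly like the sketch below. It assumes something the thread does not provide: a known measurement uncertainty `sigma` for each point. The function name `reduced_chi_squared`, the flat reference values, and the noise level are all hypothetical and chosen only for illustration.

```python
import random

def reduced_chi_squared(y_measured, y_reference, sigma, n_fitted=0):
    """Sum of squared, uncertainty-weighted residuals divided by the degrees of freedom.

    sigma holds the assumed 1-sigma uncertainty of each measured point.
    n_fitted is the number of parameters tuned to the data (0 when comparing
    against a fixed reference).
    """
    chi2 = sum(((s - r) / e) ** 2
               for s, r, e in zip(y_measured, y_reference, sigma))
    return chi2 / (len(y_measured) - n_fitted)

# Synthetic stand-in data (assumption): 500 points, uncertainty 0.02 everywhere
random.seed(1)
n = 500
y_reference = [1.0 for _ in range(n)]
sigma = [0.02] * n

# Case 1: measurement scatters around the reference by exactly the stated uncertainty
y_consistent = [r + random.gauss(0.0, e) for r, e in zip(y_reference, sigma)]
# Case 2: same scatter plus a systematic offset of 0.1 (five sigma)
y_offset = [y + 0.1 for y in y_consistent]

chi2_consistent = reduced_chi_squared(y_consistent, y_reference, sigma)
chi2_offset = reduced_chi_squared(y_offset, y_reference, sigma)
print(chi2_consistent, chi2_offset)
```

A reduced chi-squared near 1 indicates that the differences are about as large as the stated uncertainties predict. Values much larger than 1 point to a real discrepancy or to underestimated uncertainties.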

The inverse problems papers I linked you to (post #2) give a lot of detail on different rationales for goodness of fit in different circumstances.
 