# Statistics: How to assess the resemblance of two curves?

1. Nov 8, 2015

### 24forChromium

There are two series on a graph: series A is the prediction of a value over time, and series B is a curve of observed values over time. How can one quantify how closely series A resembles series B?

2. Nov 8, 2015

3. Nov 9, 2015

### 24forChromium

From what I have seen, this technique produces a value that depends on the magnitude of the points on the curve. For example, at time = 10 s, if the measured value is 10 and the predicted value is 9, then the squared residual for that time is 1; but if the measured value is 1000 and the predicted value is 998, then the squared residual is 4. How can I transform this so that the squared residual, or whatever value is reported, takes the magnitude of the measured variable into account?

4. Nov 9, 2015

### micromass

https://en.wikipedia.org/wiki/Coefficient_of_determination

5. Nov 9, 2015

### andrewkirk

The coefficient of determination ('R-squared') that Micromass linked uses the sum of squared errors (SSE) together with a measure of the spread of the observed values to calculate the measure of fit, so it gives some of the consideration you are seeking. R-squared is the most commonly used measure of fit in simple regressions.
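For reference, the usual textbook definition can be written as follows. The symbols $\bar{y}$ (the mean of the observed values) and SST (the total sum of squares, which plays the role of the "measure of spread") are notation assumed here, not taken from the post:

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \mathrm{SSE} = \sum_{k=1}^n (y_k-\hat{y}_k)^2, \qquad \mathrm{SST} = \sum_{k=1}^n (y_k-\bar{y})^2$$

Because SSE is divided by SST, errors are judged relative to how much the observed values themselves vary.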

If for some case-specific reason you wanted to give even stronger consideration to the values being predicted you could replace the SSE, which is an equally-weighted sum, by a weighted sum that gave more weight to the squares that you wanted to have more influence. For example you might replace the SSE by $\sum_{k=1}^n (y_k-\hat{y}_k)^2|y_k|$ if you wanted to put more emphasis on observations of larger values. You'd need to also change your method for calculating R-squared though, to reflect the different weighting scheme.
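A minimal Python sketch of both versions, assuming the definition R-squared = 1 - SSE/SST. The weighted variant below (weighted mean and weighted total sum of squares) is only one possible convention, since the post does not say how R-squared should be adjusted. The function name `r_squared` and the sample data are made up for illustration.

```python
def r_squared(observed, predicted, weights=None):
    """Return 1 - (weighted SSE) / (weighted total sum of squares).

    With weights=None every point counts equally, which gives the
    ordinary R-squared. The weighted form is an assumed convention.
    """
    if weights is None:
        weights = [1.0] * len(observed)
    w_total = sum(weights)
    # Weighted mean of the observed values
    mean_obs = sum(w * y for w, y in zip(weights, observed)) / w_total
    # Weighted sum of squared errors between observed and predicted
    sse = sum(w * (y - p) ** 2 for w, y, p in zip(weights, observed, predicted))
    # Weighted spread of the observed values around their mean
    sst = sum(w * (y - mean_obs) ** 2 for w, y in zip(weights, observed))
    return 1.0 - sse / sst


# Illustrative data only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

plain = r_squared(observed, predicted)
# Weights |y_k|, as in the weighted sum suggested above
weighted = r_squared(observed, predicted, weights=[abs(y) for y in observed])
print(plain, weighted)
```

With these numbers the larger observations also have the larger errors, so the weighted value comes out slightly lower than the plain one.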

6. Nov 9, 2015

### 24forChromium

Okay, I will be honest with you: I didn't understand very much of what you were saying, because you brought up a lot of concepts that I have never heard of. Please don't take this as me blaming you for not explaining well; you have already given me a method that I had never thought of.

The things I don't understand in the first paragraph include:
-Is the coefficient of determination (R-squared) the same as the sum of all the squares of individual differences between two curves, like what you told me?
-What is "Micromass" / "Micromass linked"? Is it a certain computer program?
-Is the sum of squared errors (SSE) the same as the sum of squared differences?
-"A measure of spread": Is this some general expression for a property of the measured data, such as its average magnitude?
-I suppose the "measure of fit" just means the reported value for the "goodness of fit" generated by software?
-Simple regression: such as a linear relationship between the dependent and independent variables?

Given the magnitude of my ignorance, I did not believe that the weighting described in the second paragraph would be much help to me, so I pretty much just skimmed over it. Sorry if that comes across as disrespectful.

In conclusion, what I understand from your message is that some software can automatically report the resemblance of two curves, taking their properties into account. The trouble is that not only is my understanding of technology rather basic, but I am also required to give a clear explanation of the meaning of the "goodness of fit" that I report. I would appreciate it if you would show me a way to calculate (manually, dare I say?) the goodness of fit, maybe something like:
(Sum of squared differences) / (Average values of prediction)*(Average of actual data)
Of course that was just a wild guess that even I am skeptical about, but I hope it demonstrates my intention.

7. Nov 9, 2015

### Dr. Courtney

Put the predicted values and the observed values in two columns of a spreadsheet and use the correl function.

Say the predicted values are A1 to A10 and the observed values are B1 to B10. Compute the correlation with =correl(A1:A10,B1:B10).

The coefficient of determination discussed above (r squared) is this value (r) squared. This value (r) is the correlation between the predicted and observed values. It ranges from -1 to 1.
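For anyone who wants to do the calculation by hand rather than in a spreadsheet, here is a minimal Python sketch of the Pearson correlation formula. The function name `pearson_r` and the data are made up for illustration, and the code is only assumed to match what a spreadsheet correlation function computes.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    # Sums of squared deviations for each series
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Illustrative data only: predictions are exactly half the observations
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(predicted, observed)
r_sq = r ** 2
print(r, r_sq)
```

In this made-up data the predictions are exactly half the observations, yet r comes out as 1. Correlation measures linear association rather than agreement, so a measure built directly on the residuals (such as 1 - SSE/SST) can give a different answer when the predictions do not come from a least-squares fit.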

8. Nov 9, 2015

### 24forChromium

I would just like to make sure that this tells me how closely A1-A10 resemble B1-B10, and that the r squared is in some way "independent" of the magnitude of the values for A and B.
Also, I have heard that by multiplying r^2 by 100%, one can claim that that percentage of the variation in the observed dependent variable is explained by the model. Would this be true when the inputs are not the independent and dependent variables but rather the theoretical and empirical values?

9. Nov 10, 2015

### Dr. Courtney

Yes. For example, you could change the units of A and B without changing the value of the resulting r squared.
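A quick way to check the unit claim numerically is sketched below. The data are invented, and treating the values as temperatures converted from Celsius to Fahrenheit is purely an assumption for illustration; `r_squared_corr` is a hypothetical helper name.

```python
def r_squared_corr(a, b):
    """Square of the Pearson correlation between sequences a and b."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)


def to_fahrenheit(celsius):
    """Convert a Celsius value to Fahrenheit (scale and shift)."""
    return celsius * 9.0 / 5.0 + 32.0


# Illustrative data only, assumed to be in degrees Celsius
predicted_c = [9.0, 21.0, 29.5, 41.0]
observed_c = [10.0, 20.0, 30.0, 40.0]

r2_c = r_squared_corr(predicted_c, observed_c)
r2_f = r_squared_corr(
    [to_fahrenheit(c) for c in predicted_c],
    [to_fahrenheit(c) for c in observed_c],
)
print(r2_c, r2_f)  # the two values agree up to floating-point rounding
```

The conversion both rescales and shifts the values, and r squared is unaffected by either, which is the sense in which it is independent of the magnitude of the data.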

This is a common interpretation, but there are a lot of subtleties relating to the uncertainties, whether errors are random, and whether measurement errors are normally distributed.

You can report a correlation coefficient (r) and/or a coefficient of determination (r squared) simply as quantification of how much two series resemble each other (your original question) without trying to make the deeper interpretation.