Statistics: How to assess the resemblance of two curves?

24forChromium · Nov 8, 2015

There are two series on a graph: series A is the prediction of a value over time, and series B is a curve of observed values over time. How can one quantify how closely series A resembles series B?

andrewkirk · Nov 8, 2015

The simplest and most usual method of measuring goodness of fit is the sum of squared errors.
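
As a minimal sketch of that calculation, assuming both series are sampled at the same time points (the function name `sum_squared_errors` and the sample numbers are illustrative, not taken from the thread):

```python
def sum_squared_errors(observed, predicted):
    """Sum of squared differences between paired observations and predictions."""
    if len(observed) != len(predicted):
        raise ValueError("both series must have the same number of points")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Made-up sample values, purely for illustration.
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [9.0, 12.5, 14.0, 20.0]

print(sum_squared_errors(observed, predicted))
```

A smaller value means a closer fit, and zero means the two series coincide at every sampled point.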

24forChromium · Nov 9, 2015

From what I have seen, this technique produces a value that depends on the magnitude of the points on the curve. For example, at time = 10 s, if the measured value is 10 and the predicted value is 9, the squared residual for that time is 1; but if the measured value is 1000 and the predicted value is 998, the squared residual is 4. How can I transform this so that the squared residual, or whatever value is reported, takes the magnitude of the measured variable into account?

micromass · Nov 9, 2015

https://en.wikipedia.org/wiki/Coefficient_of_determination

andrewkirk · Nov 9, 2015

24forChromium said:

How can I transform this so that the squared residual, or whatever value is reported, takes the magnitude of the measured variable into account?

The coefficient of determination ('R-squared') that Micromass linked uses the sum of squared errors (SSE) together with a measure of the spread of the observed values to calculate the measure of fit, so it gives some of the consideration you are seeking. R-squared is the most commonly used measure of fit in simple regressions.
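
A minimal sketch of that calculation, using the usual definition R-squared = 1 - SSE/SST, where SST is the sum of squared deviations of the observed values from their own mean (this is the "measure of spread"). The function name `r_squared` and the sample numbers are illustrative assumptions:

```python
def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SSE / SST."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("both series must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    # SSE: squared differences between observation and prediction.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # SST: spread of the observed values around their mean.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values are all identical; R-squared is undefined")
    return 1.0 - sse / sst


# Made-up sample values, purely for illustration.
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [9.0, 12.5, 14.0, 20.0]

print(r_squared(observed, predicted))
```

Because SSE is divided by SST, multiplying every value by the same factor leaves the result unchanged. A value of 1 means the predictions match the observations exactly.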

If for some case-specific reason you wanted to give even stronger consideration to the values being predicted you could replace the SSE, which is an equally-weighted sum, by a weighted sum that gave more weight to the squares that you wanted to have more influence. For example you might replace the SSE by ##\sum_{k=1}^n (y_k-\hat{y}_k)^2|y_k|## if you wanted to put more emphasis on observations of larger values. You'd need to also change your method for calculating R-squared though, to reflect the different weighting scheme.
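
A rough sketch of the weighted sum above, with weights ##|y_k|##. The matching R-squared adjustment shown here (a weighted total sum of squares taken around the weighted mean) is only one possible choice and is an assumption, as are the names `weighted_sse` and `weighted_r_squared`:

```python
def weighted_sse(observed, predicted, weights=None):
    """Weighted sum of squared errors; weights default to |y_k|."""
    if weights is None:
        weights = [abs(o) for o in observed]
    return sum(w * (o - p) ** 2 for w, o, p in zip(weights, observed, predicted))


def weighted_r_squared(observed, predicted, weights=None):
    """One possible weighted analogue of R-squared (an assumption, not a standard)."""
    if weights is None:
        weights = [abs(o) for o in observed]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    # Weighted mean of the observations, used as the baseline prediction.
    weighted_mean = sum(w * o for w, o in zip(weights, observed)) / total_weight
    wsst = sum(w * (o - weighted_mean) ** 2 for w, o in zip(weights, observed))
    if wsst == 0:
        raise ValueError("no weighted spread in the observed values")
    return 1.0 - weighted_sse(observed, predicted, weights) / wsst


# Made-up sample values, purely for illustration.
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [9.0, 12.5, 14.0, 20.0]

print(weighted_sse(observed, predicted))
print(weighted_r_squared(observed, predicted))
```

With these weights, an error of a given size at a large observed value counts for more than the same error at a small one.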

24forChromium · Nov 9, 2015

Okay, I will be honest with you: I didn't understand very much of what you were saying, because you brought up a lot of concepts that I have never heard of. Please don't take this as me blaming you for not explaining well; you have already given me a method that I had never thought of.

The things I don't understand in the first paragraph include:
-Is the coefficient of determination (R-squared) the same as the sum of the squares of the individual differences between the two curves, as you described earlier?
-What is "Micromass" / "Micromass linked"? Is it a certain computer program?
-Is the sum of squared errors (SSE) the same as the sum of squared differences?
-"A measure of spread": is this a general expression for some property of the measured data, such as its average magnitude?
-I suppose the "measure of fit" just means the reported value for the "goodness of fit" generated by software?
-Simple regression: such as a linear relationship between the dependent and independent variables?

Given the magnitude of my ignorance, I did not think the weighting described in the second paragraph would be much help to me, so I pretty much just skimmed over it. Sorry if that comes across as disrespectful.

In conclusion, what I understand from your message is that some software can automatically report on the resemblance of two curves, taking their properties into account. The trouble is that not only is my understanding of the technology rather basic, but I am also required to give a clear explanation of the meaning of the "goodness of fit" that I report. I would appreciate it if you could show me a way to calculate (manually, dare I say?) the goodness of fit, maybe something like:
(Sum of squared differences) / ((Average of predicted values) * (Average of actual data))
Of course that was just a wild guess that even I am skeptical about, but I hope it demonstrates my intention.

Dr. Courtney · Nov 9, 2015

Put the predicted values and the observed values in two columns of a spreadsheet and use the correl function.

Say the predicted values are A1 to A10 and the observed values are B1 to B10. Compute the correlation with =correl(A1:A10,B1:B10).

The coefficient of determination discussed above (r squared) is this value (r) squared. This value (r) is the correlation between the predicted and observed values. It ranges from -1 to 1.
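
For anyone working outside a spreadsheet, here is a minimal Python sketch of the same Pearson correlation that `correl` computes. The function name `pearson_r` and the sample numbers are illustrative assumptions:

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    if n < 2 or n != len(b):
        raise ValueError("both series need the same length, at least 2 points")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


# Made-up sample values, purely for illustration.
predicted = [9.0, 12.5, 14.0, 20.0]
observed = [10.0, 12.0, 15.0, 19.0]

r = pearson_r(predicted, observed)
print(r, r ** 2)

# A prediction that is always 5 units too high still correlates perfectly.
shifted = [o + 5.0 for o in observed]
print(pearson_r(shifted, observed))
```

One caveat: r measures how well the two series follow a straight-line relationship, so a prediction that is consistently offset from the observations (the `shifted` series) still gives r = 1, whereas a measure based on the sum of squared errors would penalise it. The squared correlation matches 1 - SSE/SST when the predictions come from a least-squares straight-line fit (with intercept) to the observed data, but not in general.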

24forChromium · Nov 9, 2015

I would just like to make sure that this tells me how closely A1-A10 resemble B1-B10, and that r squared is in some way "independent" of the magnitude of the values in A and B.
Also, I have heard that by multiplying r^2 by 100%, one can claim that that percentage of the variation in the observed dependent variable is explained by the model. Would this still be true when the inputs are not the independent and dependent variables but rather the theoretical and empirical values?

Dr. Courtney · Nov 10, 2015

24forChromium said:

I would just like to make sure that this tells me how closely A1-A10 resemble B1-B10, and that r squared is in some way "independent" of the magnitude of the values in A and B.

Yes. For example, you could change the units of A and B without changing the value of the resulting r squared.
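
A quick way to check this is to rescale both series by the same factor and compare the results. This is a sketch with made-up numbers; the helper names `sse` and `squared_correlation` and the factor of 1000 are illustrative assumptions:

```python
import math


def sse(observed, predicted):
    """Plain sum of squared errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def squared_correlation(a, b):
    """Square of the Pearson correlation between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)


# Made-up sample values, purely for illustration.
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [9.0, 12.5, 14.0, 20.0]

# Change of units applied to both series, e.g. a factor of 1000.
scale = 1000.0
observed_scaled = [scale * o for o in observed]
predicted_scaled = [scale * p for p in predicted]

# The SSE grows by scale**2, while r squared stays the same
# (up to floating-point rounding).
print(sse(observed, predicted), sse(observed_scaled, predicted_scaled))
print(squared_correlation(predicted, observed),
      squared_correlation(predicted_scaled, observed_scaled))
```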

24forChromium said:

Also, I have heard that by multiplying r^2 by 100%, one can claim that that percentage of the variation in the observed dependent variable is explained by the model. Would this still be true when the inputs are not the independent and dependent variables but rather the theoretical and empirical values?

This is a common interpretation, but there are a lot of subtleties relating to the uncertainties, whether errors are random, and whether measurement errors are normally distributed.

You can report a correlation coefficient (r) and/or a coefficient of determination (r squared) simply as a quantification of how much two series resemble each other (your original question) without trying to make the deeper interpretation.
