24forChromium said:There are two series on a graph: series A is the prediction of a value over time, and series B is a curve of observed values over time. How can one quantify how much series A resembles series B?
andrewkirk said:The simplest and most usual method of measuring goodness of fit is the sum of squared errors.
Micromass said:https://en.wikipedia.org/wiki/Coefficient_of_determination
24forChromium said:From what I have seen, this technique produces a value that depends on the magnitude of the points on the curve. For example, at time = 10 s, if the measured value is 10 and the predicted value is 9, then the squared residual for that time is 1; but if the measured value is 1000 and the predicted value is 998, then the squared residual is 4. How can I transform this into a format such that the squared residual (or whatever the reported value is) gives consideration to the magnitude of the measured variable?
andrewkirk said:The coefficient of determination ('R-squared') that Micromass linked uses the sum of squared errors (SSE) together with a measure of the spread of the observed values to calculate the measure of fit, so that gives some of the consideration you are seeking. The R-squared is the most commonly used measure of fit in simple regressions.
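As a sketch of how the SSE and the spread of the observations combine into R-squared, here is a minimal example; the data values are made up for illustration:

```python
import numpy as np

# Hypothetical observed (series B) and predicted (series A) values.
observed = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
predicted = np.array([11.0, 19.0, 29.0, 42.0, 48.0])

sse = np.sum((observed - predicted) ** 2)          # sum of squared errors
sst = np.sum((observed - observed.mean()) ** 2)    # spread of the observations
r_squared = 1.0 - sse / sst                        # R-squared = 1 - SSE/SST
print(r_squared)  # → 0.989
```

Because SSE is divided by the total spread of the observed values, the result is a dimensionless number that no longer depends on the raw scale of the data.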
If for some case-specific reason you wanted to give even stronger consideration to the values being predicted, you could replace the SSE, which is an equally weighted sum, by a weighted sum that gives more weight to the squares you want to have more influence. For example, you might replace the SSE by ##\sum_{k=1}^n (y_k-\hat{y}_k)^2|y_k|## if you wanted to put more emphasis on observations of larger values. You'd also need to change your method for calculating R-squared to reflect the different weighting scheme.
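A minimal sketch of this weighting idea follows; the ##|y_k|## weights come from the formula above, but the matching "weighted R-squared" (using a weighted mean and weighted total sum of squares) is one illustrative choice, not a standard definition:

```python
import numpy as np

observed = np.array([10.0, 20.0, 1000.0])
predicted = np.array([9.0, 21.0, 998.0])

weights = np.abs(observed)                          # more weight to larger observations
wsse = np.sum(weights * (observed - predicted) ** 2)

# Weighted analogue of the total sum of squares, built around a weighted
# mean so the weighted R-squared stays on a comparable 0-to-1 scale.
wmean = np.average(observed, weights=weights)
wsst = np.sum(weights * (observed - wmean) ** 2)
weighted_r_squared = 1.0 - wsse / wsst
print(weighted_r_squared)
```

With this scheme the large observation at 1000 dominates both the numerator and the denominator, which is exactly the "more emphasis on larger values" behaviour described above.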
Dr. Courtney said:Put the predicted values and the observed values in two columns of a spreadsheet and use the correl function.
Say the predicted values are A1 to A10 and the observed values are B1 to B10. Compute the correlation with =correl(A1:A10,B1:B10).
The coefficient of determination discussed above (r squared) is this value (r) squared. This value (r) is the correlation between the predicted and observed values. It ranges from -1 to 1.
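The spreadsheet recipe above can be reproduced outside a spreadsheet; `np.corrcoef` computes the same Pearson r that CORREL would return (the data here are made up):

```python
import numpy as np

predicted = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
observed = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Off-diagonal entry of the 2x2 correlation matrix is Pearson r,
# the same value CORREL(A1:A10, B1:B10) would give.
r = np.corrcoef(predicted, observed)[0, 1]
r_squared = r ** 2
print(r, r_squared)
```

Squaring r then gives the coefficient of determination discussed above.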
24forChromium said:I would like to just make sure that this tells me how closely A1-A10 resemble B1-B10, and that the r squared is in some way "independent" of the magnitude of the values for A and B.
24forChromium said:Also, I have heard that by multiplying the r^2 by 100%, one can claim that that percentage of the variation in the observed dependent variable is explained by the model. Would this be true when the inputs are not the independent and dependent variables but rather the theoretical and empirical values?
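On the "independent of magnitude" question: Pearson r is unchanged by scaling or shifting either series, which can be checked directly (the data values here are illustrative):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 3.9, 6.1, 8.0])

r1 = np.corrcoef(a, b)[0, 1]
# Rescale and shift series a: Pearson r is invariant under this,
# so r2 equals r1 even though the magnitudes are now very different.
r2 = np.corrcoef(a * 1000.0 + 5.0, b)[0, 1]
print(r1, r2)
```

This invariance is exactly why the correlation-based answer addresses the magnitude concern raised earlier in the thread.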
In statistics, assessing the resemblance of two curves means quantifying how similar or dissimilar they are. This can help identify patterns and relationships between variables, make predictions, and draw conclusions about the data.
Some common methods for assessing the resemblance of two curves include visual inspection, correlation analysis, and regression analysis. These methods involve comparing the shape, direction, and strength of the two curves.
The result of a correlation analysis is a correlation coefficient, a numerical value between -1 and 1. A positive correlation coefficient indicates a positive relationship between the two curves, meaning they tend to increase or decrease together. A negative correlation coefficient indicates a negative relationship, meaning one curve tends to increase while the other decreases. A correlation coefficient close to 0 indicates little or no linear relationship between the curves.
The p-value in regression analysis is used to judge the significance of the relationship between the two curves. By convention, a p-value less than 0.05 is considered statistically significant, indicating that the observed relationship is unlikely to be due to chance alone. A p-value greater than 0.05 suggests that the apparent relationship may be due to chance.
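A quick way to get both the correlation coefficient and its p-value in one call, assuming SciPy is available (the data values are made up):

```python
from scipy.stats import pearsonr

predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]

# pearsonr returns Pearson r and the two-sided p-value for the
# null hypothesis that the true correlation is zero.
r, p_value = pearsonr(predicted, observed)
print(r, p_value)
```

Here the tight agreement between the two series yields a large r and a p-value well below the conventional 0.05 threshold.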
Data smoothing techniques, such as moving averages or polynomial fitting, can affect the assessment of resemblance between two curves by altering the shape of the curves before they are compared. It is important to choose a smoothing technique appropriate to the data so that the resemblance between the curves is assessed accurately.
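A minimal moving-average sketch illustrates the point; the window size of 3 is an arbitrary choice, and the data are made up:

```python
import numpy as np

noisy = np.array([1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0])
window = 3
# Moving average via convolution with a uniform kernel; "valid" mode
# drops the edges, so the smoothed series is shorter than the input.
smoothed = np.convolve(noisy, np.ones(window) / window, mode="valid")
print(smoothed)  # → approximately [2. 3. 3. 4. 4.]
```

Note that smoothing changes both the values and the length of the series, so any correlation or R-squared computed afterwards describes the smoothed curves, not the raw data.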