https://www.khanacademy.org/math/probability/regression/regression-correlation/v/r-squared-or-coefficient-of-determination

1. The problem statement, all variables and given/known data

When determining how effectively a best-fit line describes the variance of a given set of measured data, the Coefficient of Determination is the value that represents this information. Essentially, we look at the total error associated with our measured data and find the fraction of that error that our line doesn't describe. Subtracting this fraction from 1 then gives the fraction of the variance our line does describe.

2. Relevant equations

That is,

r² = 1 - (error of each measured value from the line)/(total error)

** Where r² is the Coefficient of Determination. This is just shorthand notation; the actual formula requires a bit more background information, which would make this post very, very long.

3. The attempt at a solution

It struck me as nonsense that we can determine the total error associated with our measurements (the y-values) given only the difference between them and a seemingly arbitrary value such as the average of the y-values. This would make sense if y were a constant, say 6: you could measure the total error by taking the difference between each measured y and the value 6. The average, at least to me, really does not represent anything. So how can the deviation of a measured y from the average of all measured y's represent an error of anything? If the measured y's were all taken at the same x value, then a variation in y could be treated as an error. But if y has a relationship with x such that it increases as x increases, how does (y - y_bar) represent error in any sense?

-----------------------------------------

For example: you are given an unknown resistance. You decide to experimentally determine the resistance of the component by measuring its I-V (current vs. voltage) curve (response). Given that X is voltage and Y is current, you might measure something like this:

_In an ideal case:_
X = 10V, Y = 1 Amp
X = 20V, Y = 2 Amp
X = 30V, Y = 3 Amp

If you plot this curve, there is quite obviously a linear relationship. And if you are familiar with Ohm's relationship (LAW, if you like), we have resistance = 10 Ohms.

--

The point is, as voltage increases, current increases as well for any constant resistance R. So we have a positively sloping linear relationship.

From the ideal case above, y_bar = 2 Amps. So, following the video, the total error associated with our measured values of current (Y) is given by:

(y1 - y_bar)^2 + (y2 - y_bar)^2 + (y3 - y_bar)^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2

In an ideal world, where the resistance was EXACTLY equal to 10 Ohms and we measured precisely the expected values of current needed to resolve this, how can we say that the measured data had a total error of 2?
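Just to make the numbers concrete, here is a minimal Python sketch of the computation from the video, applied to the ideal resistor data above. The names `ss_tot` and `ss_res` are my own labels for the two sums, not terminology from the video, and I've hard-coded the fitted line as y = x/10 since the data is exact:

```python
# Ideal resistor data from the example above (R = 10 Ohms exactly)
xs = [10.0, 20.0, 30.0]  # voltage (V)
ys = [1.0, 2.0, 3.0]     # current (A)

# For this ideal data the best-fit line is exactly y = x / 10
def line(x):
    return x / 10.0

y_bar = sum(ys) / len(ys)  # average current = 2 A

# "Total error" as defined in the video: squared deviation of each y from y_bar
ss_tot = sum((y - y_bar) ** 2 for y in ys)  # (1-2)^2 + (2-2)^2 + (3-2)^2 = 2

# Error of each measured value from the line (the residuals)
ss_res = sum((y - line(x)) ** 2 for x, y in zip(xs, ys))  # 0 for ideal data

r_squared = 1 - ss_res / ss_tot
print(ss_tot, ss_res, r_squared)  # 2.0 0.0 1.0
```

So for this ideal data the formula still gives r² = 1 (the line "describes" all of the variance), even though ss_tot = 2 — and that ss_tot = 2 is exactly the quantity I'm questioning as a measure of "error" for error-free measurements.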