Coefficient of Determination (Line Best Fit Error)


Homework Help Overview

The discussion revolves around the Coefficient of Determination (r²) and its role in assessing how well a best-fit line represents the variance of a set of measured data. Participants explore the relationship between measured values and the average, questioning the interpretation of total error in the context of linear relationships.

Discussion Character

  • Conceptual clarification, Assumption checking

Approaches and Questions Raised

  • Participants discuss the definition and implications of the Coefficient of Determination, with some questioning how total error is calculated based on the average of measured values. Others express confusion over the terminology used in instructional materials and seek clarification on the relationship between measured values and the average.

Discussion Status

The conversation is ongoing, with participants offering insights into the interpretation of statistical measures and their application to real data. Some guidance has been provided regarding the terminology and conceptual understanding of error versus variation, but no consensus has been reached.

Contextual Notes

There are indications of confusion stemming from instructional videos and the terminology used to describe statistical concepts, particularly regarding the relationship between measured values and their averages in the context of error analysis.

sherrellbc
https://www.khanacademy.org/math/probability/regression/regression-correlation/v/r-squared-or-coefficient-of-determination

Homework Statement


So, when determining how effectively a best-fit line describes the variance of a given set of measured data, the Coefficient of Determination is the value that represents this information. Essentially, we look at the total error associated with our measured data, and find out the percentage of error that is present that our line doesn't describe. We then subtract this value from 1 to get the percentage of variance our line does describe.

Homework Equations


That is, r² = 1 - (Error of the measured values from the line)/(Total error)
** Where r² is the Coefficient of Determination. Just notation.
The actual formula requires a bit more background information, which would make this post very, very long.
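
For reference, the formula as it is usually written, sketched here under the assumption that the best-fit line has the form ##y = m x + b## and that both "errors" are sums of squared differences (the symbols ##m##, ##b## and the index ##i## are notation chosen for this sketch):

$$r^2 = 1 - \frac{\sum_i \left(y_i - (m x_i + b)\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$

The numerator is the squared distance of each measured value from the line, and the denominator is the squared distance of each measured value from the average ##\bar{y}##.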

The Attempt at a Solution



It struck me as nonsense that we can determine the total error associated with our measurements (y-values) given only the difference between them and a seemingly arbitrary value such as the average of the y values.

This would make sense if the y value was a constant, say 6. You could measure the total error by taking the difference of each measured y and the value 6. The average, at least to me, really does not represent anything. So, how can a measured value of y over the average of all measured y's represent an error of anything? If the measured y's were for the same x value, then a variation in y could be measured as an error. But if the y has a relationship with x such that it increases as x increases, how does y/y_bar represent error in any sense?

-----------------------------------------
For example:

You are given an unknown resistance. You decide to experimentally determine the resistance of the component by measuring its i-V (current, voltage) curve (response).

Given that X is voltage, and Y is current, you may measure something like this:

_In an ideal case:_
X = 10V, Y = 1Amp
X = 20V, Y = 2Amp
X = 30V, Y = 3Amp
If you plot this curve, there is quite obviously a linear relationship. And, if you are familiar with Ohm's relationship (LAW, if you like), we have the resistance = 10 Ohms.

-- The point is, as Voltage increases, current increases as well for any constant resistance R. So, we have a positively sloping linear relationship.

So, from the ideal case above.
y_bar = 2 Amps.
So, given what we have in this video:
The total error associated with our measured values (current, Y) is given by:
(y1-y_bar)^2 + (y2-y_bar)^2 + (y3-y_bar)^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2

Given an ideal world, where the resistance was EXACTLY equal to 10 Ohms and we measured precisely the expected values of current, how can we say that our measured values of current had a total error equal to 2?
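
The arithmetic for the ideal case can be checked with a minimal Python sketch. It assumes an ordinary least-squares line y = m*x + b and sums of squared differences; the variable names (`ss_tot`, `ss_res`, and so on) are chosen for this sketch and do not come from the video.

```python
# Ideal data from the post: voltage (X) in volts, current (Y) in amps.
xs = [10.0, 20.0, 30.0]
ys = [1.0, 2.0, 3.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least-squares slope and intercept for y = m*x + b.
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b = y_bar - m * x_bar

# "Total error" in the video's wording: squared distance of each y from y_bar.
ss_tot = sum((y - y_bar) ** 2 for y in ys)

# Squared distance of each y from the fitted line.
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - ss_res / ss_tot
print("total:", ss_tot, "from line:", ss_res, "r^2:", r_squared)
```

With these three points the sum about the average comes out to 2, as computed by hand above, while the sum about the fitted line is zero to within floating-point rounding.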
 
sherrellbc said:
So, when determining how effectively a best-fit line describes the variance of a given set of measured data,
It doesn't describe the variance of the data. It describes a correlation.
the Coefficient of Determination is the value that represents this information. Essentially, we look at the total error associated with our measured data, and find out the percentage of error that is present that our line doesn't describe.
Not at all. There need not be any error in the data. The coefficient states the error in taking the straight line to be a match for the data. If the straight line happens to be exactly what the data should have looked like, and all the discrepancies came from the measurements, then it would represent the error in the data.
 
haruspex said:
It doesn't describe the variance of the data. It describes a correlation.

Not at all. There need not be any error in the data. The coefficient states the error in taking the straight line to be a match for the data. If the straight line happens to be exactly what the data should have looked like, and all the discrepancies came from the measurements, then it would represent the error in the data.

I didn't mean variance in terms of actual "variance" (sigma squared), but rather how much it varied. If you watch the video, the whole thing is confusing in how it's worded.

And by, "Measure of error not described by the line," I mean after the line is in place how much error is associated with our data points and the line's value.
-If all data points were exactly on the line, the error would be zero.

My whole confusion was how the "entire" error can be summed up by the difference between each y and the average of all y values.
If you watch the video, Sal writes and explains this at about 6:20. Normally these videos are quite informative, but this particular video was just confusing and did not make sense.
-I understand the logic, but not how the total error is the difference between each y and y_bar (the average y value).

Would you mind watching the video? Or at least from 6:20 forward a little bit to see what I am missing?
 
OK, I think I see your problem. The ##\Sigma (y_i-\bar{y})^2## is not a measure of error in the data. It probably shouldn't be called an 'error' at all in this context, just the variation in Y, but the formula arises so often in relation to statistical measures of error that it is often referred to as 'standard error'. You could think of it as the minimum error that would apply if you were to try to represent the Y values as constant (instead of depending on X).
The video then discusses how much that error is reduced by using a sloped line instead of a horizontal one. The greater the proportion of the error that is eliminated, the more confident you can be that the mx+b fit is appropriate.
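
The comparison between a horizontal line and a sloped one can be sketched in Python. The noisy current readings below are hypothetical values made up for illustration (they are not from the thread or the video), and the function name `compare_fits` is likewise an assumption of this sketch.

```python
def compare_fits(xs, ys):
    """Return (error about horizontal line, error about sloped line, fraction eliminated)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Best horizontal line is y = y_bar; its squared error is the variation in Y.
    ss_horizontal = sum((y - y_bar) ** 2 for y in ys)

    # Best sloped line y = m*x + b by ordinary least squares.
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - m * x_bar
    ss_sloped = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

    # Proportion of the horizontal-line error removed by allowing a slope.
    fraction_eliminated = (ss_horizontal - ss_sloped) / ss_horizontal
    return ss_horizontal, ss_sloped, fraction_eliminated


# Hypothetical noisy measurements: voltage in volts, current in amps.
volts = [10.0, 20.0, 30.0, 40.0]
amps = [1.1, 1.9, 3.2, 3.8]

ss_h, ss_s, frac = compare_fits(volts, amps)
print("horizontal:", ss_h, "sloped:", ss_s, "fraction eliminated:", frac)
```

The fraction eliminated is algebraically the same quantity as 1 - (error about the sloped line)/(error about the horizontal line), which is the r² discussed in this thread.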
 
