Line of regression substitution

AI Thread Summary
Linear regression allows for substituting a value of x to estimate y, but not the reverse due to the nature of correlation and regression analysis. The product of the slopes from two linear regressions (y on x and x on y) equals the R² value, which indicates the strength of their correlation. If x and y are uncorrelated, both slopes can be zero, leading to poor estimates when trying to invert the regression. The minimization of errors differs between estimating y from x and vice versa, as each regression targets different axes. Understanding these principles is crucial for accurate predictions in statistical modeling.
Einstein44
Homework Statement
This is relatively straightforward, but I have somehow forgotten the reason:
Why is it that you cannot substitute y to find x? I remember that this was the case, but I can't seem to remember why.
Relevant Equations
$$y=ax+b$$
This is the equation you get for a line of regression of a data set using the GDC. I am not exactly sure what the context was, as I cannot remember much about this and I couldn't find anything on the internet that mentions it. I just hope someone understands what I mean :)
 
I don't understand your question. In general, the point of a linear regression is that you can substitute in a value of x to get a good guess for y. You won't get exactly the right answer, simply because you usually assume there is some noise in your prediction. Is that what you mean?
 
Office_Shredder said:
I don't understand your question. In general, the point of a linear regression is that you can substitute in a value of x to get a good guess for y. You won't get exactly the right answer, simply because you usually assume there is some noise in your prediction. Is that what you mean?
Never mind, I believe I phrased this wrong. I meant to ask why you cannot substitute y to estimate x. I remember the professor saying that you can substitute x to estimate y, but not the other way around. I forgot the reason and couldn't find anything on this on the internet, so I thought maybe someone knows what I mean.
 
Oh yeah. I think the way to think about this is to consider two linear regressions (I'm going to assume the constant term comes out to be zero for both):

##y=\beta_x x##
##x= \beta_y y##.

It's tempting to think that ##\beta_x \beta_y = 1##. But it isn't; in fact, in general the product of the betas is the ##R^2## value of the linear regression, and it only equals 1 when the two variables are perfectly correlated. As a simple example, suppose x and y are totally uncorrelated. Then ##\beta_x = \beta_y = 0##. If they are only slightly correlated, you might get that ##\beta_x## and ##\beta_y## are both small and close to zero. Then trying to invert your linear regression is going to give you a very bad estimate.
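
To check this numerically, here is a minimal Python sketch (using numpy; the data-generating model and variable names are only illustrative, not from the thread):

```python
import numpy as np

# Minimal sketch: verify numerically that the product of the two
# regression slopes equals R^2 (data is centered, so no intercepts are needed).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)      # assumed noisy linear relation

x = x - x.mean()
y = y - y.mean()

beta_x = np.sum(x * y) / np.sum(x * x)   # slope of y regressed on x
beta_y = np.sum(x * y) / np.sum(y * y)   # slope of x regressed on y
r = np.corrcoef(x, y)[0, 1]              # correlation coefficient

print(beta_x * beta_y)  # product of the slopes
print(r ** 2)           # R^2 -- agrees with the product above
```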
 
The regression shown was calculated to minimize the sum of squared errors of the y estimates versus the y sample values. Those errors are the distances parallel to the y-axis. If you want to estimate x, you would want a regression line that minimizes the sum of squared errors of the x estimates versus the x sample values; those errors are the distances parallel to the x-axis. So the minimizations are different.
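
To put that in symbols (the coefficient names ##a, b, c, d## are just labels introduced here): the regression of ##y## on ##x## chooses ##a, b## to minimize
$$\sum_i (y_i - a x_i - b)^2,$$
while the regression of ##x## on ##y## chooses ##c, d## to minimize
$$\sum_i (x_i - c y_i - d)^2.$$
These are different objective functions, so the second fitted line is not, in general, the algebraic inverse of the first.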
 
As Office_Shredder said, the product of the slopes is ##R^2##, where ##R## is the correlation coefficient.

The slope of the regression of y on x is ##R \frac{s_y}{s_x}## and the slope of the regression of x on y is ##R \frac{s_x}{s_y}##, where ##s_x## and ##s_y## are the sample standard deviations. Barring cases where either standard deviation is ##0## (which means that variable's data is constant), the product is

$$R \frac{s_y}{s_x} \cdot R \frac{s_x}{s_y} = R^2.$$

Notice that for nonlinear relations, the relation between the two variables may be invertible only locally, e.g., a quadratic relation ##y = kx^2## can only be inverted on ##x \ge 0## or on ##x \le 0## separately.
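
A quick numerical illustration of the practical consequence (again a Python sketch with numpy; the simulated data is only an assumption for the example):

```python
import numpy as np

# Minimal sketch: algebraically inverting the y-on-x line is not the same
# as running the proper x-on-y regression, unless |R| = 1.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)   # assumed noisy relation

sx, sy = x.std(), y.std()
r = np.corrcoef(x, y)[0, 1]

slope_y_on_x = r * sy / sx            # slope for estimating y from x
slope_x_on_y = r * sx / sy            # slope for estimating x from y

print(1 / slope_y_on_x)  # slope from algebraically inverting y = a*x + b
print(slope_x_on_y)      # slope of the x-on-y regression; smaller in magnitude unless |r| = 1
```

Solving ##y = ax + b## for ##x## gives a line with slope ##\frac{1}{a} = \frac{s_x}{R\, s_y}##, whereas the least-squares line for estimating ##x## has slope ##R \frac{s_x}{s_y}##; the two coincide only when ##R^2 = 1##.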
 