Line of regression substitution

Homework Help Overview

The discussion revolves around linear regression, specifically the substitution of values into regression equations. Participants explore why one can substitute a value of x to estimate y but not the other way around, and the implications of correlation for these estimates.

Discussion Character

  • Conceptual clarification, Assumption checking, Mathematical reasoning

Approaches and Questions Raised

  • Participants question the validity of substituting y to estimate x and discuss the relationship between the slopes of the two linear regressions. There is an exploration of the conditions under which the product of the slopes equals the R-squared value, and of the implications of correlation for regression estimates.

Discussion Status

Some participants have provided insights into the mathematical relationships involved in linear regression and the differences in error minimization between estimating y and estimating x. The discussion is ongoing with various interpretations being explored.

Contextual Notes

Participants note the importance of correlation and the conditions under which linear regression applies, including that nonlinear relationships may be invertible only locally. There is an acknowledgment of the assumptions made in the context of the discussion.

Einstein44
Homework Statement
This is relatively straightforward, but I somehow forgot why this is:
Why is it that you cannot substitute y to find x? I remember that this was the case, but I can't seem to remember why it actually is.
Relevant Equations
$$y=ax+b$$
This is the equation you get for a line of regression of a data set using the GDC...
I am not exactly sure what the context for this was, as I cannot remember much about it and I couldn't find anything on the internet that mentioned it. I just hope someone understands what I mean :)
 
I don't understand your question. In general the point of a linear regression is so you can substitute in a value of x to find a good guess for y. You won't get exactly the right answer, simply because you usually assume there's some noise in your prediction. Is that what you mean?
 
Office_Shredder said:
I don't understand your question. In general the point of a linear regression is so you can substitute in a value of x to find a good guess for y. You won't get exactly the right answer, simply because you usually assume there's some noise in your prediction. Is that what you mean?
Never mind, I believe I phrased this wrong. I meant: why can you not substitute y to estimate x? I remember the prof saying that you can substitute x to estimate y, but not the other way around, and I forgot the reason and didn't find anything on this on the internet, so I thought maybe someone knows what I mean.
 
Oh yeah. I think the way to think about this is that you can consider two linear regressions (I'm going to assume the constant term comes out zero for both)

##y=\beta_x x##
##x= \beta_y y##.

It's tempting to think that ##\beta_x \beta_y = 1##. But it isn't; in fact, in general the product of the betas is the ##R^2## value of the linear regression, and it only equals 1 when the two variables are perfectly correlated. As a simple example, suppose x and y are totally uncorrelated. Then ##\beta_x = \beta_y = 0##. If they are only slightly correlated, you might find that ##\beta_x## and ##\beta_y## are both small and almost zero. Then trying to invert your linear regression is going to give you a very bad estimate.
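A quick numerical illustration of this point (a minimal Python/NumPy sketch; the coefficients, noise level, and variable names here are my own assumptions, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)  # weakly correlated data

# Zero-intercept least-squares slopes, matching the two regressions above:
beta_x = np.sum(x * y) / np.sum(x * x)  # slope for y = beta_x * x
beta_y = np.sum(x * y) / np.sum(y * y)  # slope for x = beta_y * y

r = np.corrcoef(x, y)[0, 1]
print(beta_x * beta_y)  # approximately r**2, well below 1
print(r**2)

# Inverting the y-on-x line does NOT reproduce the x-on-y fit:
print(1 / beta_x, beta_y)  # very different unless |r| = 1
```

With this setup ##1/\beta_x## comes out far larger than ##\beta_y##, so solving the first line for x gives a much worse estimate of x than the regression fitted for that purpose.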
 
The regression shown was calculated to minimize the sum-squared-errors of the y estimates versus the y sample values. Those errors are the distances parallel to the Y-axis. If you want to estimate x, you would want a regression line that minimizes the sum-squared-errors of the x estimates versus the x sample values. Those errors are the distances parallel to the X-axis. So the minimization would be different.
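To make the difference concrete, here is a minimal sketch (with made-up data) that fits both lines and compares them; np.polyfit does ordinary least squares on whichever variable is passed second:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 3.0 + rng.normal(scale=4.0, size=200)

# Fit y on x: minimizes squared errors parallel to the Y-axis.
slope_yx, intercept_yx = np.polyfit(x, y, 1)

# Fit x on y: minimizes squared errors parallel to the X-axis.
slope_xy, intercept_xy = np.polyfit(y, x, 1)

# Rewrite the second line as y in terms of x for comparison:
# x = slope_xy * y + intercept_xy  =>  y = (x - intercept_xy) / slope_xy
print("y on x:          ", slope_yx, intercept_yx)
print("x on y, inverted:", 1 / slope_xy, -intercept_xy / slope_xy)
# The two lines coincide only when the data are perfectly correlated.
```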
 
As Office_Shredder said, the product of the slopes is ##R^2##, where ##R## is the correlation coefficient.

The slopes are given by ##\beta_x = R \frac{s_y}{s_x}## and ##\beta_y = R \frac{s_x}{s_y}##, where ##s_x, s_y## are the standard deviations of the data. So, barring cases where either standard deviation is ##0## (which means the data is constant),

$$\beta_x \beta_y = R \frac{s_y}{s_x} \cdot R \frac{s_x}{s_y} = R^2.$$

Notice that for nonlinear relations, the relation between the two variables may be invertible only locally, e.g. a quadratic relation ##y = kx^2## can be inverted only on ##x \ge 0## or on ##x \le 0##.
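A quick check of these formulas (a sketch with assumed example data; ##s_x, s_y## are computed as sample standard deviations):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.5 * x + rng.normal(scale=2.0, size=500)

r = np.corrcoef(x, y)[0, 1]
s_x, s_y = x.std(), y.std()

slope_yx = np.polyfit(x, y, 1)[0]  # least-squares slope of y on x
slope_xy = np.polyfit(y, x, 1)[0]  # least-squares slope of x on y

print(slope_yx, r * s_y / s_x)    # equal: beta_x = R * s_y / s_x
print(slope_xy, r * s_x / s_y)    # equal: beta_y = R * s_x / s_y
print(slope_yx * slope_xy, r**2)  # product of the slopes is R^2
```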
 
