Line of regression substitution

SUMMARY

The discussion centers on the limitations of substituting values in linear regression, specifically why one can substitute x to estimate y, but not vice versa. It is established that the product of the slopes (βx and βy) in two linear regressions equals the R² value, which only equals 1 when the variables are perfectly correlated. The conversation emphasizes that estimating x from y requires a different regression line that minimizes errors parallel to the x-axis, rather than the y-axis, highlighting the importance of correlation and the nature of the data.

PREREQUISITES
  • Understanding of linear regression concepts
  • Familiarity with R² and correlation coefficients
  • Knowledge of error minimization techniques in regression analysis
  • Basic algebraic manipulation of regression equations
NEXT STEPS
  • Study the derivation and implications of the R² value in regression analysis
  • Learn about error minimization techniques for estimating x from y
  • Explore the differences between simple and multiple linear regression
  • Investigate nonlinear regression models and their properties
USEFUL FOR

Data analysts, statisticians, and students studying regression analysis who seek to deepen their understanding of the limitations and applications of linear regression models.

Einstein44
Homework Statement
This is relatively straightforward, but I somehow forgot why this is:
Why is it that you cannot substitute y to find x? I remember that this was the case, but I can't seem to remember why it actually is.
Relevant Equations
$$y=ax+b$$
This is the equation you get for a line of regression of a data set using the GDC...
I am not exactly sure in what context this is, as I cannot remember much about this and I couldn't find anything on the internet that mentioned this. I just hope someone understands what I mean :)
 
I don't understand your question. In general the point of a linear regression is so you can substitute in a value of x to find a good guess for y. You won't get exactly the right answer just because you usually assume there's some noise in your prediction, is that what you mean?
 
Office_Shredder said:
I don't understand your question. In general the point of a linear regression is so you can substitute in a value of x to find a good guess for y. You won't get exactly the right answer just because you usually assume there's some noise in your prediction, is that what you mean?
Nevermind, I believe I phrased this wrong. I meant why you cannot substitute y to estimate x. Because I remember the prof saying that you can substitute x to estimate y, but not the other way around. And I forgot the reason and didn't find anything on this on the internet, so I thought maybe someone knows what I mean.
 
Oh yeah. I think the way to think about this is that you can consider two linear regressions (I'm going to assume the constant term comes out zero for both):

##y=\beta_x x##
##x= \beta_y y##.

It's tempting to think that ##\beta_x \beta_y =1##. But it's not: in general the product of the betas is the ##R^2## value of the linear regression, and it only equals 1 when the two variables are perfectly correlated. As a simple example, suppose x and y are totally uncorrelated. Then ##\beta_x=\beta_y=0##. If they are only slightly correlated, ##\beta_x## and ##\beta_y## will both be small and almost zero. Then trying to invert your linear regression is going to give you a very bad estimate.
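A quick numerical check of this identity (a sketch in Python with NumPy; the data here are made up, and both variables are centered so the no-intercept slopes match the usual least-squares ones):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)   # noisy, imperfectly correlated

# Center both variables so the constant terms come out zero.
x = x - x.mean()
y = y - y.mean()

# Least-squares slopes of the two no-intercept regressions:
beta_x = np.sum(x * y) / np.sum(x * x)   # y ≈ beta_x * x
beta_y = np.sum(x * y) / np.sum(y * y)   # x ≈ beta_y * y

r = np.corrcoef(x, y)[0, 1]
print(beta_x * beta_y)   # product of the two slopes
print(r**2)              # R² — the same number
```

Since ##R^2 < 1## here, multiplying the two slopes gives something strictly less than 1, which is exactly why naively inverting one regression to get the other fails.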
 
The regression shown was calculated to minimize the sum-squared-errors of the y estimates versus the y sample values. Those errors are the distances parallel to the Y-axis. If you want to estimate x, you would want a regression line that minimizes the sum-squared-errors of the x estimates versus the x sample values. Those errors are the distances parallel to the X-axis. So the minimization would be different.
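The two minimizations really do give two different lines, which is easy to see by fitting both (a minimal sketch with made-up data; `np.polyfit` fits an ordinary least-squares line):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

# y-on-x regression: minimizes squared errors parallel to the Y-axis.
a_yx, b_yx = np.polyfit(x, y, 1)    # y ≈ a_yx * x + b_yx

# x-on-y regression: minimizes squared errors parallel to the X-axis.
a_xy, b_xy = np.polyfit(y, x, 1)    # x ≈ a_xy * y + b_xy

# Solving the second line for y gives slope 1/a_xy, which is NOT a_yx:
print(a_yx)        # ≈ 2
print(1.0 / a_xy)  # steeper than a_yx whenever |R| < 1
```

The two slopes only coincide when the data lie exactly on a line; otherwise the x-on-y line, re-expressed in the (x, y) plane, is steeper than the y-on-x line by a factor of ##1/R^2##.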
 
As Office_Shredder said, the product of the slopes is ##R^2##, where ##R## is the correlation coefficient.
The slope of the y-on-x regression is ##R \frac{s_y}{s_x}## and the slope of the x-on-y regression is ##R \frac{s_x}{s_y}##, so the product is *

##R \frac{s_y}{s_x} \cdot R \frac{s_x}{s_y} = R^2##.

Notice that for nonlinear relations, the relation between the two may be invertible only locally, e.g., for a quadratic relation ##y = kx^2##, where each positive ##y## corresponds to two values of ##x##.

* Barring cases where either standard deviation is ##0##, which means the data are constant.
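These slope formulas are easy to verify numerically (a sketch with made-up data; `np.polyfit` returns the least-squares slope and intercept):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]
sx, sy = x.std(), y.std()

slope_yx = np.polyfit(x, y, 1)[0]   # slope of the y-on-x regression
slope_xy = np.polyfit(y, x, 1)[0]   # slope of the x-on-y regression

print(slope_yx - r * sy / sx)       # ~0: slope_yx = R * s_y / s_x
print(slope_xy - r * sx / sy)       # ~0: slope_xy = R * s_x / s_y
print(slope_yx * slope_xy - r**2)   # ~0: the product is R²
```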
 
