Linear Regression: reversing the roles of X and Y 
#1
May25-09, 03:20 AM

P: 1,270

Simple linear regression:
Y = β0 + β1*X + ε, where ε is random error.
The fitted (predicted) value of Y for each X is: Y hat = b0 + b1*X (e.g. Y hat = 7.2 + 2.6 X).
Consider X hat = b0' + b1'*Y.
[b0, b1, b0', and b1' are the least-squares estimates of the β's.]
Prove whether or not we can get the values of b0, b1 from b0', b1'. If not, why not?

Completely clueless... Any help is greatly appreciated!


#2
May25-09, 05:12 AM

Math
Emeritus
Sci Advisor
Thanks
PF Gold
P: 39,316

Start with [itex]y= b_0+ b_1x[/itex] and solve for x.



#3
May25-09, 02:06 PM

HW Helper
P: 1,361




#4
May25-09, 03:09 PM

P: 1,270

Y hat is a fitted (predicted) value of Y based on fixed values of X. Y hat = b0 + b1*X, with b0 and b1 being the least-squares estimates. For X hat, we are predicting the value of X from values of Y, which would produce a different set of parameters, b0' and b1'. Is there any general mathematical relationship linking b0', b1' and b0, b1? Thanks for answering!


#5
May27-09, 07:23 AM

P: 1,270

Any help?
I think this is called "inverse regression"... 


#6
May27-09, 07:37 AM

HW Helper
P: 1,361

" Is there any general mathematical relationship linking b0', b1' and b0, b1?"
No. If you put some severe restrictions on the Ys and Xs you could come up with a situation in which the two sets are equal, but in general - no. Also, note that in the situation where x is fixed (non-random), regressing x on Y makes no sense - the dependent variable in regression must be random.

This may be off-topic for you, but Graybill ("Theory and Application of the Linear Model": my copy is from 1976, a horrid-green cover) discusses a similar problem on pages 275-283. The problem in the book deals with this: if we observe a value of a random variable Y (say y0) in a regression model, how can we estimate the corresponding value of x?


#7
May27-09, 09:01 PM

P: 330

Kingwinner: the easiest first step is to try an example. Start with a random set of (X,Y) pairs and regress Y on X and see what the coefficients b0,b1 are. Then regress X on Y and see what the coefficients b0',b1' are. Do you see any simple relationship between b0,b1 and b0',b1'? (i.e. can you get b0',b1' by solving the equation y=b0+b1x for x?)
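A minimal sketch of that experiment in Python with NumPy, for anyone who wants to try it. The helper `ols`, the seed, the sample size and the noise level are all arbitrary choices made for illustration; only the coefficients 7.2 and 2.6 are borrowed from the example in post #1.

```python
import numpy as np

def ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
x = rng.normal(size=200)
y = 7.2 + 2.6 * x + rng.normal(scale=2.0, size=200)   # noisy line (assumed noise level)

b0, b1 = ols(x, y)     # regress Y on X
b0p, b1p = ols(y, x)   # regress X on Y

# Line obtained by algebraically solving y = b0 + b1*x for x
inv_intercept, inv_slope = -b0 / b1, 1.0 / b1

r = np.corrcoef(x, y)[0, 1]      # sample correlation coefficient
print("Y on X:           ", b0, b1)
print("X on Y:           ", b0p, b1p)
print("algebraic inverse:", inv_intercept, inv_slope)
print("b1 * b1' =", b1 * b1p, "  r^2 =", r ** 2)
```

With noisy data the "X on Y" coefficients should not match the algebraic inverse, while the product of the two slopes should match r^2 up to rounding.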



#8
May28-09, 05:49 AM

Math
Emeritus
Sci Advisor
Thanks
PF Gold
P: 39,316

It can be shown that the line such that the sum of the vertical distances from points to the line, the line such that the sum of the horizontal distances from points to the line, and the line such that the sum of distances perpendicular to the line are all the same line. That says that reversing x and y will give the same line.



#9
Jun8-09, 11:10 PM

P: 462

When linear regression is used to find the line in slope-intercept form, this is not the case. As a glaring example, consider that a vertical line cannot be represented, whereas a horizontal one can. If your data set is more vertical than horizontal, you will get a much better fit by reversing the order of the X and Y series.

I quickly wrote a program to randomly generate some data points and visually compare the line that minimizes Y error (yellow), the line that minimizes X error (purple), and the line that minimizes point-to-line distance (red). As you can see from this example, they are not always the same line.

To rule out the possibility that these differences are simply due to rounding errors, I repeated the experiment using floating-point precision, double precision, and 320 bits of floating-point precision using GMP bignum. The results are the same in all cases, indicating that precision does not play a factor here. Here's my source code:



#10
Jun8-09, 11:42 PM

Sci Advisor
HW Helper
P: 2,483

Assuming no singularity (vertical or horizontal) exists in the data, the standardized slope coefficient b/s.e.(b) as well as the goodness of fit statistic (R squared) will be identical between a vertical regression (Y = b0 + b1 X + u) and the corresponding horizontal regression (X = a0 + a1 Y + v) .
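This is easy to check numerically. The following is only a sketch: the helper name `slope_t_and_r2`, the seed and the simulated data are made up for illustration, and the standard error is the usual simple-regression formula with n - 2 degrees of freedom.

```python
import numpy as np

def slope_t_and_r2(x, y):
    """Regress y on x; return (t statistic of the slope, R squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()
    sxx = np.sum(dx ** 2)
    slope = np.sum(dx * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sse = np.sum(resid ** 2)                  # residual sum of squares
    se_slope = np.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    r2 = 1.0 - sse / np.sum((y - y.mean()) ** 2)
    return slope / se_slope, r2

rng = np.random.default_rng(1)   # arbitrary seed
x = rng.normal(size=50)
y = 1.0 + 0.5 * x + rng.normal(size=50)

t_yx, r2_yx = slope_t_and_r2(x, y)   # Y = b0 + b1 X + u
t_xy, r2_xy = slope_t_and_r2(y, x)   # X = a0 + a1 Y + v
print("t statistics:", t_yx, t_xy)
print("R squared:   ", r2_yx, r2_xy)
```

The two printed pairs should agree up to rounding, even though the slopes themselves differ.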



#11
Jun9-09, 12:01 AM

P: 462

You can observe the same effect using Excel's built-in linear regression. The graphs are rotated and stretched, but notice that the lines go through different points in relation to each other. The singularity is not present in either of the examples.


#12
Jun9-09, 12:46 AM

Sci Advisor
HW Helper
P: 2,483

Can you provide either the standard errors (of the coefficients) or the t statistics?



#13
Jun9-09, 01:16 AM

P: 462

By showing that numerical precision was not responsible for their differences, this proves that the parameters of the recovered lines are indeed different (i.e., different equations). At least, I cannot think of any other possible way of interpreting those results. Let me know if you can...


#14
Jun9-09, 06:23 AM

HW Helper
P: 1,361

Since [tex] R^2 [/tex] is simply the square of the correlation coefficient, that quantity will be the same whether you regress Y on x or X on y. Sorry - hitting post too soon is the result of posting before morning coffee. The slopes of Y on x and X on y won't be equal (unless you have an incredible stroke of luck), but the t-statistics in each case, used for testing [tex] H\colon \beta = 0 [/tex], will be, since the test statistic for the slope can be written as a function of [tex] r^2 [/tex].
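To make that dependence explicit, here is a sketch using the usual simple-regression formulas. The symbols [itex]S_{xx}[/itex], [itex]S_{yy}[/itex], [itex]S_{xy}[/itex] (centered sums of squares and cross-products) and [itex]SSE[/itex] (residual sum of squares) are notation introduced here, not used elsewhere in the thread:

[tex] \begin{align} b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad \mathrm{s.e.}(b_1) = \sqrt{\frac{SSE}{(n-2)\,S_{xx}}}, \qquad SSE = S_{yy}\,(1 - r^2), \qquad r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} \\ t &= \frac{b_1}{\mathrm{s.e.}(b_1)} = \frac{S_{xy}}{S_{xx}} \sqrt{\frac{(n-2)\,S_{xx}}{S_{yy}\,(1 - r^2)}} = \frac{r\,\sqrt{n-2}}{\sqrt{1 - r^2}} \end{align} [/tex]

The final expression is unchanged when x and y are swapped, which is why both regressions give the same t statistic even though their slopes differ.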


#15
Jun9-09, 08:34 AM

Sci Advisor
HW Helper
P: 2,483

(i) Y is random, (ii) b estimates are a function of Y, (iii) therefore estimated b's are random.



#16
Jun9-09, 08:37 AM

HW Helper
P: 1,361




#17
Jun9-09, 08:40 AM

Sci Advisor
HW Helper
P: 2,483

No, I posted too soon. I was responding to junglebeast's comment "the coefficients are not random variables."



#18
Jun9-09, 10:15 AM

P: 462

If I were to generate X and Y and repeat the experiment multiple times, then yes, I could make m and b into random variables - but this would be meaningless, because the "distribution" of m would have no mean and infinite variance, and that is not a distribution to which Student's t-test can be applied in any meaningful way.

You claimed that all three equations were equivalent. I showed that applying all three equations gives very different results. The only thing that distinguishes an analytical solution from an empirical one is the precision of the arithmetic. By demonstrating that increased precision does not change the results, this proves that the mathematical expressions in my program are not equivalent. This is why I made my source visible. If the source does compute linear regression properly, then this proves that flipping the order in regression is not mathematically equivalent.

Further, I think I can show algebraically that reversing the roles of X and Y does not give the same line. Let (m1, b1) be the line found by minimizing Y-error, and let (m2, b2) be the line found by minimizing X-error (after reversing the roles of X and Y),
[tex] \begin{align} y &= m1 x + b1\\ y &= m2 x + b2 \end{align} [/tex]
By applying http://en.wikipedia.org/wiki/Linear_least_squares, we have
[tex] \begin{align} m1 &= \frac{\sum y \sum x - n \sum x y}{ (\sum x)^2 - n (\sum x^2)}\\ b1 &= \frac{ \sum x \sum x y - \sum y (\sum x^2)}{ (\sum x)^2 - n (\sum x^2)} \end{align} [/tex]
We can also directly calculate the equation after reversing the roles of X and Y, although this also flips the line, so let's refer to that line as (m2b, b2b), meaning x = m2b y + b2b:
[tex] \begin{align} m2b &= \frac{\sum y \sum x - n \sum x y}{ (\sum y)^2 - n (\sum y^2)}\\ b2b &= \frac{ \sum y \sum x y - \sum x (\sum y^2)}{ (\sum y)^2 - n (\sum y^2)} \end{align} [/tex]
Now we need to flip (m2b, b2b) into the same form as (m1, b1) for comparison. This rearrangement can be done by solving x = m2b y + b2b for y and putting the result back into slope-intercept form,
[tex] \begin{align} y &= \left(\frac{1}{m2b}\right)x - \left(\frac{b2b}{m2b}\right) \\ &= m2 x + b2 \end{align} [/tex]
Thus, looking just at the slope,
[tex] m2 = \frac{ (\sum y)^2 - n (\sum y^2)}{\sum y \sum x - n \sum x y} [/tex]
We can see that m1 is not equal to m2 - so we do not obtain the same equation after reversing the roles of X and Y.
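
One way to quantify the gap between the two slopes, as a sketch in the notation of this post, with [itex]r[/itex] denoting the sample correlation coefficient (a symbol not used in this post): multiplying the two least-squares slopes gives

[tex] m1 \cdot m2b = \frac{\left(\sum x \sum y - n \sum x y\right)^2}{\left((\sum x)^2 - n \sum x^2\right)\left((\sum y)^2 - n \sum y^2\right)} = r^2 [/tex]

so that, assuming [itex]r \neq 0[/itex],

[tex] m2 = \frac{1}{m2b} = \frac{m1}{r^2} [/tex]

The two slopes therefore agree only when [itex]r^2 = 1[/itex], i.e. when the points lie exactly on a line; otherwise m2 is steeper than m1 in absolute value.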

