# Linear Regression: reversing the roles of X and Y

1. May 25, 2009

### kingwinner

Simple linear regression:
Y = β0 + β1 *X + ε , where ε is random error

Fitted (predicted) value of Y for each X is:
$\hat{Y} = b_0 + b_1 X$ (e.g. $\hat{Y} = 7.2 + 2.6X$)

Consider
$\hat{X} = b_0' + b_1' Y$

[b0, b1, b0', and b1' are the least-squares estimates of the β's]

Prove whether or not we can get the values of b0, b1 from b0', b1'. If not, why not?

Completely clueless...Any help is greatly appreciated!

2. May 25, 2009

### HallsofIvy

Staff Emeritus
Start with $y= b_0+ b_1x$ and solve for x.

3. May 25, 2009

I'm a little confused about your question. Are you asking whether regressing X on Y will always give the same coefficients, or whether it is ever possible to get the same ones?

4. May 25, 2009

### kingwinner

(X_i, Y_i), i=1,2,...n

Y hat is a fitted (predicted) value of Y based on fixed values of X.
Y hat = b0 + b1 *X with b0 and b1 being the least-square estimates.

For X hat, we are predicting the value of X from values of Y which would produce a different set of parameters, b0' and b1'. Is there any general mathematical relationship linking b0', b1' and b0, b1?

5. May 27, 2009

### kingwinner

Any help?

I think this is called "inverse regression"...

6. May 27, 2009

" Is there any general mathematical relationship linking b0', b1' and b0, b1?"
No. If you put some severe restrictions on the Ys and Xs you could come up with a situation in which the two sets are equal, but in general - no.

Also, note that in the situation where x is fixed (non-random), regressing x on Y makes no sense - the dependent variable in regression must be random.

This may be off-topic for you, but Graybill ("Theory and Application of the Linear Model"; my copy is from 1976, with a horrid green cover) discusses a similar problem on pages 275-283: if we observe a value y0 of the random variable Y in a regression model, how can we estimate the corresponding value of x?

7. May 27, 2009

### mXSCNT

Kingwinner: the easiest first step is to try an example. Start with a random set of (X,Y) pairs and regress Y on X and see what the coefficients b0,b1 are. Then regress X on Y and see what the coefficients b0',b1' are. Do you see any simple relationship between b0,b1 and b0',b1'? (i.e. can you get b0',b1' by solving the equation y=b0+b1x for x?)

8. May 28, 2009

### HallsofIvy

Staff Emeritus
It can be shown that the line minimizing the sum of the vertical distances from the points to the line, the line minimizing the sum of the horizontal distances, and the line minimizing the sum of the distances perpendicular to the line are all the same line. That says that reversing x and y will give the same line.

9. Jun 8, 2009

### junglebeast

I would be interested to see that proof...

When linear regression is used to find the line in slope-intercept form, this is not the case. As a glaring example, consider that a vertical line cannot be represented, whereas a horizontal one can. If your data set is more vertical than horizontal, you will get a much better fit by reversing the roles of the X and Y series.

I quickly wrote a program to randomly generate some data points and visually compare the line that minimizes Y error (yellow) with the line that minimizes X error (purple) and the line that minimizes point-to-line distance (red). As you can see from this example, they are not always the same line.

To rule out the possibility that these differences are simply due to rounding errors, I repeated the experiment using single-precision floating point, double precision, and 320-bit floating-point precision via GMP bignum. The results are the same in all cases, indicating that precision is not a factor here.

Here's my source code:
```cpp
#include "linalg\linear_least_squares.h"
#include "vision\drawing.h"
#include "stat\rng.h"
#include "linalg\null_space.h"
#include "bignum\bigfloat.h"

template<typename Real>
void linear_regression( const std::vector<Real> &X, const std::vector<Real> &Y,
                        Real &m, Real &b )
{
    Real sX=0, sY=0, sXY=0, sXX=0;
    for(unsigned i=0; i<X.size(); ++i)
    {
        Real x = X[i], y = Y[i];
        sX += x;
        sY += y;
        sXY += x*y;
        sXX += x*x;
    }
    Real n = X.size();

    m = (sY*sX - n*sXY)/( sX*sX - n*sXX );
    b = (sX*sXY - sY*sXX)/( sX*sX - n*sXX );
}

int main()
{
    using namespace heinlib;
    using namespace cimgl;

    bigfloat::set_default_precision(300);

    typedef bigfloat Real;

    bigfloat rr;
    printf("precision = %d\n", rr.precision() );

    CImg<float> image(250, 250, 1, 3, 0);

    std::vector<Real> X, Y;

    int N = 10;
    for(unsigned i=0; i<N; ++i)
    {
        Real x = random<Real>::uniform(0, 250);
        Real y = random<Real>::uniform(0, 250);

        image.draw_circle( x, y, 3, color<float>::white() );

        X.push_back(x);
        Y.push_back(y);
    }

    Real m1, b1, m2, b2;
    linear_regression(X, Y, m1, b1 );
    linear_regression(Y, X, m2, b2 );

    //flip second line back into y = m x + b form
    b2 = -b2/m2;
    m2 = 1/m2;

    cimg_draw_line( image, m1, b1, color<float>::yellow() );
    cimg_draw_line( image, m2, b2, color<float>::purple() );

    //find the means of X and Y
    Real mX = 0, mY = 0;
    for(unsigned i=0; i<N; ++i)
    {
        Real x = X[i], y = Y[i];
        mX += x;  mY += y;
    }
    mX /= N;
    mY /= N;

    //find the least-squares line by perpendicular distance to the line
    Real sXX=0, sYY=0, sXY=0;
    for(unsigned i=0; i<N; ++i)
    {
        Real x = X[i] - mX,
             y = Y[i] - mY;
        sXX += x*x;
        sYY += y*y;
        sXY += x*y;
    }

    static_matrix<2,2,Real> A = { sXX, sXY,
                                  sXY, sYY };
    static_matrix<2,1,Real> Norm;

    null_space_SVD(A, Norm);

    //general form of the line
    static_matrix<3,1,Real> line = { Norm[0], Norm[1], -( mX*Norm[0] + mY*Norm[1] ) };
    cimg_draw_line( image, line, color<float>::red() );

    CImgDisplay disp(image);
    system("pause");
}
```

10. Jun 8, 2009

### EnumaElish

Assuming no singularity (vertical or horizontal) exists in the data, the standardized slope coefficient b/s.e.(b) as well as the goodness of fit statistic (R squared) will be identical between a vertical regression (Y = b0 + b1 X + u) and the corresponding horizontal regression (X = a0 + a1 Y + v) .
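A sketch of why the $$R^2$$ part of this claim holds, writing r for the sample correlation:

$$R^2 = r^2 = \frac{\left(\sum (x_i - \bar x)(y_i - \bar y)\right)^2}{\sum (x_i - \bar x)^2 \sum (y_i - \bar y)^2}$$

This expression is unchanged when the roles of x and y are swapped, so both regressions report the same goodness of fit.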

11. Jun 9, 2009

### junglebeast

Well, you can say that... but you haven't given any formal proof or evidence for the claim, and it is contrary to the example I just showed, whose source I have made available for you to see.

You can observe the same effect using Excel's built-in linear regression. The graphs are rotated and stretched, but notice that the lines pass through different points in relation to each other.

The singularity is not present in either of the examples.

Last edited: Jun 9, 2009
12. Jun 9, 2009

### EnumaElish

Can you provide either the standard errors (of the coefficients) or the t statistics?

13. Jun 9, 2009

### junglebeast

Your question does not even make sense, as the coefficients are not random variables. The coefficients are mathematical solutions to the line equation for a fixed data set of points.

Showing that numerical precision was not responsible for their differences proves that the parameters of the recovered lines are indeed different (i.e., different equations). At least, I cannot think of any other possible way of interpreting those results. Let me know if you can...

14. Jun 9, 2009

The coefficients in a regression are statistics, so it certainly does make sense to talk about their standard errors.

Since $$R^2$$ is simply the square of the correlation coefficient, that quantity will be the same whether you regress Y on x or X on y.

Sorry - hitting post too soon is the result of posting before morning coffee.

The slopes of Y on x and X on y won't be equal (unless you have an incredible stroke of luck), but the t statistics in each case, used for testing
$$H\colon \beta = 0$$, will be, since the test statistic for the slope can be written as a function of $$r^2$$.
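For reference, the usual form of that test statistic in simple regression with n points is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

which depends on the data only through r and n; since r is symmetric in x and y, the t statistic is the same whichever variable is treated as the response.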

Last edited: Jun 9, 2009
15. Jun 9, 2009

### EnumaElish

(i) Y is random, (ii) b estimates are a function of Y, (iii) therefore estimated b's are random.

16. Jun 9, 2009

Er, I was agreeing with you earlier (if this post is aimed at me)

17. Jun 9, 2009

### EnumaElish

No, I posted too soon. I was responding to junglebeast's comment "the coefficients are not random variables."

18. Jun 9, 2009

### junglebeast

Initially, I generated X and Y randomly to make a fixed data set. Then I performed 9 tests on that fixed data set to get measurements of m and b. All of these m and b are comparable because they relate to the same data set.

If I were to generate X and Y and repeat the experiment multiple times, then yes, I could make m and b into random variables -- but this would be meaningless, because the "distribution" of m would have no mean and infinite variance, and that is not a distribution to which Student's t-test can be applied in any meaningful way.

You claimed that all three equations were equivalent. I showed that applying all three equations gives very different results. The only thing that differentiates an analytical solution from an empirical one is the precision of the arithmetic. Demonstrating that increased precision does not change the results proves that the mathematical expressions in my program are not equivalent. This is why I made my source visible. If the source does compute linear regression properly, then this proves that reversing the roles in regression is not mathematically equivalent.

Further, I think I can show algebraically that reversing the roles of X and Y does not give the same line. Let (m1, b1) be the line found by minimizing Y-error, and let (m2, b2) be the line found by minimizing X-error (after reversing the roles of X and Y):

\begin{align} y &= m1 x + b1\\ y &= m2 x + b2 \end{align}

By applying the normal equations (see http://en.wikipedia.org/wiki/Linear_least_squares), we have

\begin{align} m1 &= \frac{\sum y \sum x - n \sum x y}{ (\sum x)^2 - n (\sum x^2)}\\ b1 &= \frac{ \sum x \sum x y - \sum y (\sum x^2)}{ (\sum x)^2 - n (\sum x^2)} \end{align}

We can also directly calculate the equation after reversing the roles of X and Y, although this also flips the line, so let's refer to that line as (m2b, b2b):

\begin{align} m2b &= \frac{\sum y \sum x - n \sum x y}{ (\sum y)^2 - n (\sum y^2)}\\ b2b &= \frac{ \sum y \sum x y - \sum x (\sum y^2)}{ (\sum y)^2 - n (\sum y^2)} \end{align}

Now we need to flip (m2b, b2b) into the same form as (m1,b1) for comparison. This rearrangement can be done by reversing x and y and putting back into slope-intercept form,

\begin{align} y &= \left(\frac{1}{m2b}\right)x + \left(-\frac{b2b}{m2b}\right) \\ &= m2 x + b2 \\ \end{align}

Thus, looking just at the slope,

$$m2 = \frac{ (\sum y)^2 - n (\sum y^2)}{\sum y \sum x - n \sum x y}$$

Comparing with m1, the two expressions are not equal in general -- so we do not obtain the same equation after reversing the roles of X and Y.
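The same conclusion can be stated as a standard identity. Writing centered sums $$S_{xx} = \sum (x-\bar x)^2$$, $$S_{yy} = \sum (y-\bar y)^2$$, and $$S_{xy} = \sum (x-\bar x)(y-\bar y)$$, the two slope estimates are $$b_1 = S_{xy}/S_{xx}$$ (Y on X) and $$b_1' = S_{xy}/S_{yy}$$ (X on Y), so

$$b_1 \, b_1' = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2$$

In the notation above, m2 = 1/b1' = S_yy/S_xy, so m1 = m2 exactly when $$r^2 = 1$$, i.e. when the points lie exactly on a line.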

Last edited: Jun 9, 2009
19. Jun 9, 2009

### sylas

Here's another way to see it. Consider three points in an L shape: two points sharing the same x-coordinate, and a third sharing its y-coordinate with one of them.

The line that minimizes vertical distances passes midway between the two points with the same x-coordinate, and through the other point. The line that minimizes horizontal distances passes midway between the two points with the same y-coordinate, and through the other point. These are two different lines.

Therefore the regression line is not in general the same as the inverse regression line.

20. Jun 9, 2009

### EnumaElish

"We can see that m1 is not equal to m2 -- so we do not obtain the same equation after reversing the roles of X and Y."

I see your point. I can't speak for HallsOfIvy, but the phrase "same equation" can be interpreted differently:

1. One might think that, from a statistical point of view, what matters is not the slope estimate itself but the standardized estimate of the slope (i.e., the t statistic of the slope parameter). This statistic is direction-free (vertical vs. horizontal).

2. If one can derive the parameters of the horizontal equation from the parameters of the vertical equation, then in an informational sense the two sets of estimates can be thought of as identical.