B Can We Use Regression Line x on y if y is the Dependent Variable?

songoku
TL;DR Summary
Let's say I have 10 bivariate data points (x and y), where x is the independent variable and y is the dependent variable.

I want to estimate the value of x from a certain given value of y. Which regression line should I use, regression line y on x or regression line x on y?
I have a note that states the regression line x on y is used when we want to calculate x for a given y, but in this case y is the dependent variable. I am pretty sure I can use either line if the value of the product moment correlation coefficient (r) is close to 1, but for a case with, let's say, r = 0.6, can we use the regression line x on y even though y is the dependent variable? Or should we use the regression line y on x to calculate the value of x?

Thanks
 
The important thing is which is measured/known most precisely. That should be the independent variable. The assumption of OLS regression is that all of the error is in the dependent variable.
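As a rough illustration of that assumption, here is a small simulation sketch in which x is known exactly and all of the noise sits in y. The true intercept, slope, and noise level are made-up values, and `ols` is a hypothetical helper written for this example. It only shows what each fit recovers under this assumption; which line gives the better prediction of x is a separate question taken up later in the thread.

```python
import random


def ols(u, v):
    """Least-squares fit v ~ intercept + slope * u. Returns (intercept, slope)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    slope = s_uv / s_uu
    return mean_v - slope * mean_u, slope


random.seed(0)  # fixed seed so the run is repeatable
n = 20000
true_a, true_b, sigma = 2.0, 3.0, 5.0  # hypothetical values, not from the thread

x = [random.uniform(0, 10) for _ in range(n)]                     # measured exactly
y = [true_a + true_b * xi + random.gauss(0, sigma) for xi in x]   # all error in y

a_yx, b_yx = ols(x, y)  # regression of y on x: y = a + b x
c_xy, d_xy = ols(y, x)  # regression of x on y: x = c + d y

# Rearranging the y-on-x line, x = (y - a) / b, has slope 1/b with respect to y.
slope_inverted = 1 / b_yx

print("y on x slope:           ", b_yx)
print("inverted y-on-x slope:  ", slope_inverted)
print("x on y slope:           ", d_xy)
```

With these settings the y-on-x fit recovers a slope near the true value of 3, so its inverse is near 1/3. The x-on-y slope instead tends to ##b\,\mathrm{Var}(x)/(b^2\,\mathrm{Var}(x)+\sigma^2)##, which is 0.25 here, so the two lines describe noticeably different relationships when the noise in y is large.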
 
Dale said:
The important thing is which is measured/known most precisely. That should be the independent variable. The assumption of OLS regression is that all of the error is in the dependent variable.
I see, so I should use regression line y on x even if I want to estimate x.

But I am sorry, I have another question. Can I argue that for the case y is given by the question (y is still the dependent variable), y is known more precisely so the appropriate regression line is x on y?

Thanks
 
songoku said:
I see, so I should use regression line y on x even if I want to estimate x.
Yes, where y is the thing which has a large error and x is measured almost exactly.

songoku said:
But I am sorry, I have another question. Can I argue that for the case y is given by the question (y is still the dependent variable), y is known more precisely so the appropriate regression line is x on y?
This isn’t a matter of argument. How well can you measure the values of y? How well can you measure the values of x? The answer to those questions determines the method you should use.
 
If you want to estimate x based on the values of y, you should do a regression of x on y (x dependent and y independent). Linear regression would minimize the sum-squared-errors of the sampled ##x_i## versus the estimated ##\hat{x_i}(y_i)##. Doing a regression the other way would minimize the wrong sum-squared-errors and all the related statistics would be wrong.
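For reference, the two fits can be written out with the standard simple least-squares formulas. The symbols ##a, b, c, d## are labels introduced here for illustration and do not come from the thread.

$$\hat{y}(x) = a + b\,x, \qquad b = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$

$$\hat{x}(y) = c + d\,y, \qquad d = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (y_i-\bar{y})^2}, \qquad c = \bar{x} - d\,\bar{y}$$

The first pair minimizes ##\sum (y_i-\hat{y}(x_i))^2## and the second minimizes ##\sum (x_i-\hat{x}(y_i))^2##. Both lines pass through ##(\bar{x}, \bar{y})##, but the denominators differ, so the lines generally do not coincide.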
 
Dale said:
Yes, where y is the thing which has a large error and x is measured almost exactly.

This isn’t a matter of argument. How well can you measure the values of y? How well can you measure the values of x? The answer to those questions determines the method you should use.
I understand

FactChecker said:
If you want to estimate x based on the values of y, you should do a regression of x on y (x dependent and y independent). Linear regression would minimize the sum-squared-errors of the sampled ##x_i## versus the estimated ##\hat{x_i}(y_i)##. Doing a regression the other way would minimize the wrong sum-squared-errors and all the related statistics would be wrong.
What if, in the data I have, x is the independent variable and y is the dependent variable, and I need to estimate x for a given y?

Thanks
 
songoku said:
What if, in the data I have, x is the independent variable and y is the dependent variable, and I need to estimate x for a given y?
That is what I am talking about. They are both linear regression problems. However, the coefficients you get from the two linear regressions are not the same or even easily related. The errors in the sum-squared-error that are minimized in the linear regressions are projections onto different axes. (That is, minimizing ##\sum (y_i-\hat{y_i})^2## is not the same as minimizing ##\sum (x_i-\hat{x_i})^2##.)
So you should do a linear regression with X as a linear function of Y.
The issue is not how well the X and Y values can be measured, it is how well the values fit the selected model. That is the sum-squared-error that is being minimized.
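To make the difference concrete, here is a minimal sketch that fits both lines to ten data points and estimates x for one given y in both ways. The data, the value `y0`, and the helper `fit` are all hypothetical and chosen only for illustration. In this simple one-predictor case the two slopes multiply to ##r^2##, which the sketch checks numerically.

```python
from math import sqrt


def fit(u, v):
    """Least-squares fit v ~ intercept + slope * u. Returns (intercept, slope)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    s_uu = sum((p - mean_u) ** 2 for p in u)
    slope = s_uv / s_uu
    return mean_v - slope * mean_u, slope


def pearson_r(u, v):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    s_uu = sum((p - mean_u) ** 2 for p in u)
    s_vv = sum((q - mean_v) ** 2 for q in v)
    return s_uv / sqrt(s_uu * s_vv)


# Hypothetical data: ten (x, y) pairs invented for this example.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 3.2, 6.8, 4.9, 8.5, 6.1, 9.7, 7.4, 11.0]

a, b = fit(x, y)  # y on x: y = a + b x, minimizes sum (y_i - y_hat_i)^2
c, d = fit(y, x)  # x on y: x = c + d y, minimizes sum (x_i - x_hat_i)^2
r = pearson_r(x, y)

y0 = 7.0                           # hypothetical given value of y
x_hat_x_on_y = c + d * y0          # estimate from the x-on-y line
x_hat_inverted = (y0 - a) / b      # estimate from rearranging the y-on-x line

print("r =", r, " b*d =", b * d, " r^2 =", r * r)
print("x estimate from x on y:         ", x_hat_x_on_y)
print("x estimate from inverted y on x:", x_hat_inverted)
```

Because ##b\,d = r^2##, the two estimates agree only when ##r^2 = 1## or when the given y equals ##\bar{y}##; for a weaker correlation such as r = 0.6 the gap grows as y moves away from the mean.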
 
FactChecker said:
That is what I am talking about. They are both linear regression problems. However, the coefficients you get from the two linear regressions are not the same or even easily related. The errors in the sum-squared-error that are minimized in the linear regressions are projections onto different axes. (That is, minimizing ##\sum (y_i-\hat{y_i})^2## is not the same as minimizing ##\sum (x_i-\hat{x_i})^2##.)
So you should do a linear regression with X as a linear function of Y.
The issue is not how well the X and Y values can be measured, it is how well the values fit the selected model. That is the sum-squared-error that is being minimized.
I understand your explanation, but why does it seem to me that your suggestion is different from @Dale's? Or maybe I am misinterpreting something?

Dale said:
How well can you measure the values of y? How well can you measure the values of x? The answer to those questions determines the method you should use.
Dale said:
The important thing is which is measured/known most precisely. That should be the independent variable. The assumption of OLS regression is that all of the error is in the dependent variable.

From those replies, the variable that becomes the independent variable is the one that can be measured more precisely, which is ##x## in my case, so the regression line that should be used is ##y## on ##x##, even though I want to estimate ##x## from ##y##.

But from your reply (@FactChecker), I should use the regression line ##x## on ##y## because I want to estimate ##x## for a given ##y##, so that the estimate suits the model (the error in ##x## is minimized), even though my independent variable is ##x## (I can't change ##y## to be the independent variable).

Am I correct to think that there are two different suggestions for my hypothetical case?

Thanks
 
songoku said:
From those replies, the one that becomes independent variable is the one that can be measured more precisely, which is x in my case so the regression line that should be used is y on x, even though I want to estimate x from y
Yes, this is correct.

@FactChecker can confirm, but I don’t think that he is disagreeing with me. He is just showing you why the two choices are not equivalent.
 