Question about regression line

songoku · Aug 14, 2021

I have a note that states that the regression line of x on y is used when we want to calculate x for a given y, but in this case y is the dependent variable. I am pretty sure I can use either line if the value of the product moment correlation coefficient (r) is close to 1, but for a case with, say, r = 0.6, can we use the regression line of x on y even though y is the dependent variable? Or should we use the regression line of y on x to calculate the value of x?

Thanks

Dale · Aug 14, 2021

The important thing is which is measured/known most precisely. That should be the independent variable. The assumption of OLS regression is that all of the error is in the dependent variable.
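A minimal simulation sketch of the point about measurement error, in plain Python. The true slope of 2, the noise levels, the sample size, and the helper `ols_slope` are all assumptions made up for illustration; nothing here comes from data in this thread. It fits y on x twice: once with x known exactly, and once with x replaced by a noisy reading.

```python
import random


def ols_slope(u, v):
    """Slope b of the least-squares line v = a + b*u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    return suv / suu


random.seed(0)      # fixed seed so the run is repeatable
n = 5000            # assumed sample size
true_slope = 2.0    # assumed underlying relationship: y = 2 * x

x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
# y carries all of the noise: the situation OLS of y on x assumes
y_obs = [true_slope * xt + random.gauss(0.0, 1.0) for xt in x_true]
# a noisy reading of x (noise as large as the spread of x itself)
x_noisy = [xt + random.gauss(0.0, 1.0) for xt in x_true]

slope_exact_x = ols_slope(x_true, y_obs)    # regressor measured exactly
slope_noisy_x = ols_slope(x_noisy, y_obs)   # regressor measured with error

print("slope with exact x:", round(slope_exact_x, 3))
print("slope with noisy x:", round(slope_noisy_x, 3))
```

With these settings the first slope should land near 2 and the second near 1: noise in the regressor pulls the fitted slope toward zero, by a factor of about var(x) / (var(x) + var(noise)) in expectation, which is one half here.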

songoku · Aug 14, 2021

Dale said:

The important thing is which is measured/known most precisely. That should be the independent variable. The assumption of OLS regression is that all of the error is in the dependent variable.

I see, so I should use the regression line of y on x even if I want to estimate x.

But I am sorry, I have another question. In a case where y is given by the question (y is still the dependent variable), can I argue that y is known more precisely, so the appropriate regression line is x on y?

Thanks

Dale · Aug 14, 2021

songoku said:

I see, so I should use the regression line of y on x even if I want to estimate x.

Yes, where y is the thing which has a large error and x is measured almost exactly.

songoku said:

But I am sorry, I have another question. In a case where y is given by the question (y is still the dependent variable), can I argue that y is known more precisely, so the appropriate regression line is x on y?

This isn’t a matter of argument. How well can you measure the values of y? How well can you measure the values of x? The answer to those questions determines the method you should use.

FactChecker · Aug 14, 2021

If you want to estimate x based on the values of y, you should do a regression of x on y (x dependent and y independent). Linear regression would minimize the sum-squared-errors of the sampled ##x_i## versus the estimated ##\hat{x_i}(y_i)##. Doing a regression the other way would minimize the wrong sum-squared-errors and all the related statistics would be wrong.
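A small sketch of that claim, using a made-up six-point data set (the numbers and the helper names `fit_line` and `sse` are illustrative assumptions). It fits x on y directly, also inverts a y-on-x fit, and compares the sum of squared errors in x for the two ways of estimating x.

```python
def fit_line(u, v):
    """Least-squares fit of v = a + b*u; minimizes sum((v_i - a - b*u_i)**2)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    b = suv / suu
    a = mv - b * mu
    return a, b


def sse(actual, predicted):
    """Sum of squared differences between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(actual, predicted))


# made-up sample data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 5.2, 4.8, 7.1, 6.9]

a_xy, b_xy = fit_line(y, x)   # x on y: x_hat = a_xy + b_xy * y
a_yx, b_yx = fit_line(x, y)   # y on x: y_hat = a_yx + b_yx * x

# estimate every x_i from its y_i in the two different ways
x_hat_direct = [a_xy + b_xy * yi for yi in y]       # use the x-on-y line
x_hat_inverted = [(yi - a_yx) / b_yx for yi in y]   # solve the y-on-x line for x

sse_direct = sse(x, x_hat_direct)
sse_inverted = sse(x, x_hat_inverted)

y0 = 5.0                      # a given y value to estimate x for
est_direct = a_xy + b_xy * y0
est_inverted = (y0 - a_yx) / b_yx

print("SSE in x, x-on-y fit:        ", sse_direct)
print("SSE in x, inverted y-on-x fit:", sse_inverted)
print("x estimate at y0, two ways:  ", est_direct, est_inverted)
```

On any data that is not perfectly collinear, the direct x-on-y fit gives the smaller sum of squared errors in x, because that is exactly the quantity it minimizes; the inverted y-on-x line minimizes errors in y instead.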

songoku · Aug 14, 2021

Dale said:

Yes, where y is the thing which has a large error and x is measured almost exactly.

This isn’t a matter of argument. How well can you measure the values of y? How well can you measure the values of x? The answer to those questions determines the method you should use.

I understand

FactChecker said:

If you want to estimate x based on the values of y, you should do a regression of x on y (x dependent and y independent). Linear regression would minimize the sum-squared-errors of the sampled ##x_i## versus the estimated ##\hat{x_i}(y_i)##. Doing a regression the other way would minimize the wrong sum-squared-errors and all the related statistics would be wrong.

What if, in the data I have, x is the independent variable and y is the dependent variable, and I need to estimate x for a given y?

Thanks

FactChecker · Aug 15, 2021

songoku said:

What if, in the data I have, x is the independent variable and y is the dependent variable, and I need to estimate x for a given y?

That is what I am talking about. They are both linear regression problems. However, the coefficients you get from the two linear regressions are not the same or even easily related. The errors in the sum-squared-error that are minimized in the linear regressions are projections onto different axes. (That is, minimizing ##\sum (y_i-\hat{y_i})^2## is not the same as minimizing ##\sum (x_i-\hat{x_i})^2##.)
So you should do a linear regression with X as a linear function of Y.
The issue is not how well the X and Y values can be measured, it is how well the values fit the selected model. That is the sum-squared-error that is being minimized.
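For the simple one-predictor case, the two fits can be written out side by side. This is the standard textbook result, given here as a sketch; the symbols ##S_{xx}##, ##S_{yy}##, ##S_{xy}##, ##b_{yx}## and ##b_{xy}## are notation introduced for this note and are not used elsewhere in the thread.

$$
S_{xx} = \sum (x_i - \bar{x})^2, \qquad
S_{yy} = \sum (y_i - \bar{y})^2, \qquad
S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})
$$

$$
\text{y on x:}\quad \hat{y} = \bar{y} + b_{yx}\,(x - \bar{x}), \qquad b_{yx} = \frac{S_{xy}}{S_{xx}}
\quad \text{(minimizes } \textstyle\sum (y_i - \hat{y_i})^2 \text{)}
$$

$$
\text{x on y:}\quad \hat{x} = \bar{x} + b_{xy}\,(y - \bar{y}), \qquad b_{xy} = \frac{S_{xy}}{S_{yy}}
\quad \text{(minimizes } \textstyle\sum (x_i - \hat{x_i})^2 \text{)}
$$

$$
b_{yx}\, b_{xy} = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = r^2
$$

Both lines pass through ##(\bar{x}, \bar{y})##. Solving the y-on-x line for x gives a slope of ##1/b_{yx}## in x per unit of y, whereas the x-on-y line has slope ##b_{xy} = r^2 / b_{yx}##. The two agree only when ##r^2 = 1##; with ##r = 0.6## they differ by a factor of ##r^2 = 0.36##, so the choice of line matters for the estimate.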

songoku · Aug 15, 2021

FactChecker said:

That is what I am talking about. They are both linear regression problems. However, the coefficients you get from the two linear regressions are not the same or even easily related. The errors in the sum-squared-error that are minimized in the linear regressions are projections onto different axes. (That is, minimizing ##\sum (y_i-\hat{y_i})^2## is not the same as minimizing ##\sum (x_i-\hat{x_i})^2##.)
So you should do a linear regression with X as a linear function of Y.
The issue is not how well the X and Y values can be measured, it is how well the values fit the selected model. That is the sum-squared-error that is being minimized.

I understand your explanation, but why does it seem to me that your suggestion is different from @Dale's? Or maybe I am misinterpreting something?

Dale said:

How well can you measure the values of y? How well can you measure the values of x? The answer to those questions determines the method you should use.

Dale said:

The important thing is which is measured/known most precisely. That should be the independent variable. The assumption of OLS regression is that all of the error is in the dependent variable.

From those replies, the variable that becomes the independent variable is the one that can be measured more precisely, which is ##x## in my case, so the regression line that should be used is ##y## on ##x##, even though I want to estimate ##x## from ##y##.

But from your reply (@FactChecker), I should use the regression line of ##x## on ##y## because I want to estimate ##x## for a given ##y##, so that the estimate suits the model (the error in ##x## is minimized), even though my independent variable is ##x## (I can't change ##y## to be the independent variable).

Am I correct to think that there are two different suggestions for my hypothetical case?

Thanks

Dale · Aug 15, 2021

songoku said:

From those replies, the variable that becomes the independent variable is the one that can be measured more precisely, which is x in my case, so the regression line that should be used is y on x, even though I want to estimate x from y.

Yes, this is correct.

@FactChecker can confirm, but I don’t think that he is disagreeing with me. He is just showing you why the two choices are not equivalent.

Dale · Aug 16, 2021

@FactChecker was, in fact, disagreeing with me. Based on some feedback received I have split that discussion off from this one:

https://www.physicsforums.com/threads/switching-regression-axes.1006148/
