FactChecker said:
In general, if you are trying to get the line, ##\hat{X}=a'Y+b'##, that best estimates X based on Y from a set of ##(x_i,y_i)## data, then it is better to minimize the correct thing, which is ##\sum (x_i-\hat{x_i})^2##, not ##\sum (y_i-\hat{y_i})^2##.
Again, you can just test that sort of claim by running a Monte Carlo simulation. So, similar to what I did before, consider the true values of ##y## going from 0 to 1 in steps of 0.01 and the true values of ##x=2y+5##. I then added zero-mean Gaussian white noise to ##x## and ##y## with ##\sigma_x=0.01## and ##\sigma_y=0.5##. Next I did two fits: a "forward" fit of ##x=a y + b + \epsilon## and an "inverse" fit of ##y = a' x + b'+ \epsilon##, whose parameters were then converted back to the forward form via ##a=1/a'## and ##b=-b'/a'##. I repeated this process 10000 times.
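For concreteness, here is a minimal sketch of one draw of that kind of simulation. This is my own reconstruction in Python with numpy, not the original code; the helper name `one_draw` and the random seed are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

y_true = np.linspace(0, 1, 101)   # true y from 0 to 1 in steps of 0.01
x_true = 2 * y_true + 5           # true relationship x = 2y + 5
sigma_x, sigma_y = 0.01, 0.5      # noise levels from the description above

def one_draw():
    """One Monte Carlo draw: noisy data plus both fits."""
    x = x_true + rng.normal(0, sigma_x, x_true.shape)
    y = y_true + rng.normal(0, sigma_y, y_true.shape)
    a_f, b_f = np.polyfit(y, x, 1)      # forward fit: x = a*y + b
    ap, bp = np.polyfit(x, y, 1)        # inverse fit: y = a'*x + b'
    a_i, b_i = 1 / ap, -bp / ap         # invert: x = (1/a')*y - b'/a'
    return x, y, (a_f, b_f), (a_i, b_i)
```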
So, if we look at the sum of squared residuals on the data, we see that, as you stated, the forward fit indeed has a substantially smaller sum of squared residuals to the data.
However, if we look at the sum of squared residuals to the true regression line, we see a very different outcome.
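Continuing the sketch above, both comparisons, residuals to the noisy data and residuals to the true line, could be computed along these lines (the `ssr` helper is mine):

```python
x, y, (a_f, b_f), (a_i, b_i) = one_draw()

def ssr(a, b, y_vals, x_vals):
    """Sum of squared x-residuals for the line x = a*y + b."""
    return np.sum((x_vals - (a * y_vals + b)) ** 2)

# residuals to the noisy data
print("forward vs data:", ssr(a_f, b_f, y, x))
print("inverse vs data:", ssr(a_i, b_i, y, x))
# residuals to the true line x = 2y + 5
print("forward vs true:", ssr(a_f, b_f, y_true, x_true))
print("inverse vs true:", ssr(a_i, b_i, y_true, x_true))
```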
So the forward fit is closer to the data, but the inverse fit is closer to the true relationship in a least-squares sense. In other words, the forward fit is fitting to the noise rather than to the actual relationship.
More importantly, if we look at the fit parameters, we see that for both the slope and the intercept the forward fit is rather strongly biased, whereas the inverse fit parameters appear unbiased.
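The bias check can be sketched by repeating the draw and averaging the fitted parameters against the true values ##a=2## and ##b=5## (again assuming the `one_draw` helper from above):

```python
n_reps = 10000
draws = [one_draw() for _ in range(n_reps)]
forward = np.array([d[2] for d in draws])   # (a, b) from the forward fit
inverse = np.array([d[3] for d in draws])   # (a, b) from the inverted fit

print("true (a, b):         (2, 5)")
print("forward mean (a, b):", forward.mean(axis=0))
print("inverse mean (a, b):", inverse.mean(axis=0))
```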
Finally, we can compare the fit lines with the true regression. Notice how reliably wrong the forward fit is.
So the forward fit is the "best estimate" only in one very narrow sense. However, that does not mean that it is generally a better choice.
The issue is that the narrow sense in which it is better relies on the assumption that the "independent" variable in the regression has negligible error, and that assumption is strongly violated here because ##\sigma_y## is so large. With that assumption violated, the usual fit is no longer an unbiased minimum-variance estimator. It is therefore better to switch to the inverse model, which does not violate the assumption. Even though the resulting fits are suboptimal in the narrow sense, they are better under a much broader set of criteria, and importantly the parameter estimates are unbiased.
Another alternative is to use an "errors in variables" model that does not make the assumption that the "independent" variable has no errors. But as we see, when one variable approximately satisfies the assumption, you can treat that variable as the regressor, do a standard least-squares fit, and then invert the model.
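As an illustration only, one way to do such an errors-in-variables fit is orthogonal distance regression via `scipy.odr`; this is my choice of tool for the sketch, not necessarily what anyone here used.

```python
from scipy import odr

def linear(beta, y_vals):
    """Model x = beta[0]*y + beta[1]."""
    return beta[0] * y_vals + beta[1]

x, y, _, _ = one_draw()
# We regress x on y, so y is the explanatory variable here: sx is its noise level.
data = odr.RealData(y, x, sx=sigma_y, sy=sigma_x)
fit = odr.ODR(data, odr.Model(linear), beta0=[1.0, 1.0]).run()
print("ODR estimate of (a, b):", fit.beta)
```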