Comparing Approaches: Linear Regression of Y on X vs X on Y

  • Context: High School 
  • Thread starter: FactChecker
  • Tags: Axes, Regression

Discussion Overview

The discussion revolves around the comparison of two linear regression approaches: modeling Y as a function of X versus modeling X as a function of Y. Participants explore the implications of each approach, particularly in terms of error minimization and the impact of measurement accuracy on the choice of model. The conversation includes technical reasoning, simulations, and debates over the appropriateness of each regression method.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants argue that the choice of regression model should depend on how the data will be used and which sum of squared errors (SSE) is minimized.
  • One participant presents a Monte Carlo simulation comparing the two regression models, noting that the model with the noisier variable tends to yield better regression coefficients.
  • Another participant questions whether the goal is to minimize the square error of the estimated regression coefficients or the predictions of the given data.
  • There is a discussion about the formulas that relate E(Y|X) and E(X|Y), suggesting potential methods for estimation.
  • Some participants assert that the Y=aX+b regression minimizes the wrong quantity if the goal is to estimate X.
  • Concerns are raised about the assumptions of ordinary least squares (OLS) regression, particularly regarding errors in independent variables.
  • Participants debate the rigor of quantifying the results of mis-estimating regression coefficients and whether the Monte Carlo results are valid tests of the models.
  • There is acknowledgment that estimating the parameters of the correct model is advantageous, but the choice of model remains contested.
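
The Monte Carlo comparison mentioned above can be illustrated with a minimal sketch. This is not the participant's actual simulation: the true slope, noise levels, sample size, and the helper names `ols_slope` and `simulate` are all assumptions chosen for illustration. It fits both Y = aX + b and X = cY + d, converts the second fit to an implied slope 1/c, and compares each against the true slope when the noise sits only in Y or only in X.

```python
import numpy as np

def ols_slope(u, v):
    """Slope of the least-squares line v = slope * u + intercept."""
    u_c = u - u.mean()
    return np.dot(u_c, v - v.mean()) / np.dot(u_c, u_c)

def simulate(noise_x, noise_y, true_slope=2.0, n=200, trials=2000, seed=0):
    """Mean absolute slope error of both fits under a hypothetical linear model."""
    rng = np.random.default_rng(seed)
    err_y_on_x, err_x_on_y = [], []
    for _ in range(trials):
        x_true = rng.uniform(0.0, 10.0, n)
        y_true = true_slope * x_true + 1.0
        x = x_true + rng.normal(0.0, noise_x, n)  # measurement noise in x
        y = y_true + rng.normal(0.0, noise_y, n)  # measurement noise in y
        # Direct fit y = a*x + b: the slope estimate is a.
        a_direct = ols_slope(x, y)
        # Reverse fit x = c*y + d: the implied slope of y against x is 1/c.
        a_reverse = 1.0 / ols_slope(y, x)
        err_y_on_x.append(abs(a_direct - true_slope))
        err_x_on_y.append(abs(a_reverse - true_slope))
    return np.mean(err_y_on_x), np.mean(err_x_on_y)

noise_in_y = simulate(noise_x=0.0, noise_y=3.0)
noise_in_x = simulate(noise_x=1.5, noise_y=0.0)
print("noise only in y (Y-on-X error, X-on-Y error):", noise_in_y)
print("noise only in x (Y-on-X error, X-on-Y error):", noise_in_x)
```

Under these assumptions, the fit that treats the noisy variable as the dependent variable recovers the slope more accurately. The result depends on the simulation model, as the Contextual Notes caution.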

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of each regression model, with no consensus reached on which approach is superior. The discussion remains unresolved, with multiple competing perspectives on the implications of measurement accuracy and error minimization.

Contextual Notes

Limitations include the dependence on the specific simulation model used and the assumptions regarding measurement errors in independent variables. The discussion highlights the complexity of choosing the appropriate regression model based on the context of the data.

  • #61
Dale said:
The important thing is which is measured/known most precisely.
(Didn't actually find that quote but it is referenced in post #58.)
Alex has a set of experimental data of inputs X and outputs Y.
For the purposes of a further experiment, she wants to input a value x that would give the best chance of an output in the vicinity of y.
How should she choose the value of x?
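
As a rough illustration of the two candidate answers, the sketch below uses synthetic data (the slope, intercept, noise level, and target value are assumptions, not from the thread). It either fits y on x and inverts the line, or fits x on y and reads x off directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experiment: x is set by the experimenter, y is a noisy response.
x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, x.size)

y_target = 20.0  # desired output (the noise-free answer would be x = 6)

# Option 1: fit y = a*x + b, then solve a*x + b = y_target for x.
a, b = np.polyfit(x, y, 1)
x_from_y_on_x = (y_target - b) / a

# Option 2: fit x = c*y + d and evaluate it at the target.
c, d = np.polyfit(y, x, 1)
x_from_x_on_y = c * y_target + d

print("invert Y-on-X fit:", x_from_y_on_x)
print("X-on-Y fit:       ", x_from_x_on_y)
```

With strongly correlated data the two options nearly coincide. They diverge as the scatter grows, because the X-on-Y line pulls the estimate toward the mean of the observed x values.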
 
  • #62
Personally, I would recommend a full Bayesian approach so that you can include all prior information on X and Y, including any manufacturer's information on the precision and accuracy of X and Y, any previous literature and expert opinion, and any physical relationship between them.

Then the posterior predictive distribution P(X|Y,y) would be what you want.
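
For readers unfamiliar with the term, one common way to write such a posterior predictive is sketched below. The notation is an assumption, not taken from the post: ##D## stands for the observed (X, Y) pairs, ##\theta## for the model parameters, and ##x, y## for the new input and target output.

$$ p(x \mid y, D) = \int p(x \mid y, \theta)\, p(\theta \mid D)\, d\theta, \qquad p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta) $$

The prior ##p(\theta)## is where manufacturer specifications, literature values, and known physical relationships would enter.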
 
  • #63
haruspex said:
Alex has a set of experimental data of inputs X and outputs Y.
For the purposes of a further experiment, she wants to input a value x that would give the best chance of an output in the vicinity of y.
How should she choose the value of x?

That's almost a well-defined mathematical problem. One thing that's missing is how Alex will pick the value of y. For example, would she pick it from a uniform distribution over an interval [y0, y1] where the experimental data exists? Or is she trying to predict a value of x that corresponds to a y-value that hasn't been observed yet?

Generally speaking, questions of the form "What is the probability of such-and-such given the observed data?" require a Bayesian approach. Non-Bayesian approaches answer questions of the form "What is the probability of the observed data given that such-and-such is assumed to be true?".
 
  • #64
Jarvis323 said:
Apparently, introducing multiplicative noise to x before OLS is equivalent to doing ridge regression, which is a form of regularized linear regression where the model is biased to reduce the variance. The intuition is that adding noise to x stretches out the distribution of x values, which in turn reduces the slope of the model.

http://madrury.github.io/jekyll/update/statistics/2017/08/12/noisy-regression.html

It is interesting, I guess, because, as the slides I linked show, OLS doesn't really care about the variance in y so long as it's constant and normally distributed. But variance in x will lead to increased model bias (and reduced model variance).

So, for univariate OLS, the choice of using the lower or higher variance variable as the independent variable is actually an act of making a bias-variance tradeoff? And in general, you can always add noise to the independent variables if you want to, to increase bias/reduce model complexity/reduce model variance?
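
The shrinkage described in the quote can be checked numerically. The sketch below is an illustration under assumed values (the data-generating model, `sigma`, and the sample size are hypothetical). It multiplies x by independent noise with mean 1 and compares the resulting OLS slope with a ridge-like formula whose penalty is `sigma**2 * sum(x**2)`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
sigma = 0.5  # relative noise level (hypothetical)

x = rng.normal(3.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 1.0, n)

def ols_slope(u, v):
    """Slope of the least-squares line v = slope * u + intercept."""
    u_c = u - u.mean()
    return np.dot(u_c, v - v.mean()) / np.dot(u_c, u_c)

# Multiplicative noise with mean 1, independent of x and y.
x_noisy = x * rng.normal(1.0, sigma, n)

slope_clean = ols_slope(x, y)
slope_noisy = ols_slope(x_noisy, y)

# Large-sample prediction: Cov(x_noisy, y) = Cov(x, y), while
# Var(x_noisy) = Var(x) + sigma^2 * E[x^2], so the extra term acts
# like a ridge penalty added to the denominator of the slope.
x_c = x - x.mean()
penalty = sigma**2 * np.sum(x**2)
slope_predicted = np.dot(x_c, y - y.mean()) / (np.dot(x_c, x_c) + penalty)

print("clean slope:    ", slope_clean)
print("noisy slope:    ", slope_noisy)
print("predicted slope:", slope_predicted)
```

In this setup the noisy-x slope is pulled toward zero by roughly the amount the penalized formula predicts, which matches the bias-for-variance reading given in the quote.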
Are there reasonable ways to check that the Gauss-Markov assumptions (IID residuals with mean 0, pairwise independent, and independent of the independent variables) are met?
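
A few informal residual checks are sketched below on synthetic data (the model, noise level, and sample size are assumptions). These are heuristics rather than a complete test of the assumptions. Note also that OLS residuals have zero mean and zero sample correlation with x by construction whenever an intercept is fitted, so those two properties cannot be tested from the residuals alone.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 500)
y = 1.5 * x + 4.0 + rng.normal(0.0, 2.0, x.size)  # synthetic data

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# 1. Mean of the residuals (zero by construction with an intercept).
mean_resid = resid.mean()

# 2. Durbin-Watson statistic in the recorded order of the observations:
#    values near 2 suggest no first-order autocorrelation.
durbin_watson = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# 3. Rough heteroscedasticity check: correlation between the squared
#    residuals and x; values near 0 are consistent with constant variance.
hetero_corr = np.corrcoef(x, resid ** 2)[0, 1]

print("mean residual:        ", mean_resid)
print("Durbin-Watson:        ", durbin_watson)
print("corr(x, residual^2):  ", hetero_corr)
```

A plot of residuals against fitted values is the usual visual companion to these numbers.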
 
  • #65
Just an observation; not sure if it has been addressed. Not surprisingly, unlike the purely geometric case of a given line ## y=mx ## and its "reciprocal" ## x= \frac{1}{m} y ##, for which the product of the slopes is ## m \cdot \frac{1}{m} = 1 ##, the regression lines have respective slopes ## r \frac{s_y}{s_x} ## and ## r \frac{s_x}{s_y} ##, whose product is ## r^2 \neq 1 ## when ## r \neq \pm 1 ##. But maybe this is obvious.
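
This can be confirmed numerically with a short sketch on synthetic data (the data-generating values are assumptions). The slope of the Y-on-X fit times the slope of the X-on-Y fit equals the squared sample correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 1000)
y = 0.8 * x + rng.normal(0.0, 0.6, 1000)

r = np.corrcoef(x, y)[0, 1]
slope_y_on_x = np.polyfit(x, y, 1)[0]  # equals r * s_y / s_x
slope_x_on_y = np.polyfit(y, x, 1)[0]  # equals r * s_x / s_y

product = slope_y_on_x * slope_x_on_y
print("product of slopes:", product)
print("r squared:        ", r ** 2)
```

The two printed values agree up to floating-point rounding and fall below 1 whenever the points do not lie exactly on a line.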
 
