So I have Y, the response, and two predictors X1 and X2. I generate Y and X1 from a multivariate normal distribution, then manually set X2 to be nearly identical to X1 (the same except that I change a few entries to make X2 distinct from X1).
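For concreteness, here is a minimal R sketch of that setup. The sample size, the covariance between Y and X1, and which entries of X2 get perturbed are all assumptions, since the post does not specify them:

```r
library(MASS)   # for mvrnorm()
set.seed(1)     # assumed seed, for reproducibility

n <- 200        # assumed sample size
# Draw (Y, X1) jointly normal with an assumed correlation of 0.7
Sigma <- matrix(c(1, 0.7,
                  0.7, 1), nrow = 2)
YX1 <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
Y  <- YX1[, 1]
X1 <- YX1[, 2]

# X2 starts as a copy of X1; a few entries are perturbed to make it distinct
X2  <- X1
idx <- sample(n, 5)              # assumed: 5 entries changed
X2[idx] <- X2[idx] + rnorm(5)
```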
I ran three separate linear regressions.
lm(Y~X1) -> X1 statistically significant
lm(Y~X2) -> X2 statistically significant
lm(Y~X1+X2) -> X1 statistically significant and X2 not statistically significant.
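Using the simulated data from the sketch above, the three fits can be compared side by side. Whether X2 loses significance in the joint model depends on the noise and on how closely X2 tracks X1, so the comments describe the pattern reported in the post rather than a guarantee:

```r
fit1  <- lm(Y ~ X1)
fit2  <- lm(Y ~ X2)
fit12 <- lm(Y ~ X1 + X2)

summary(fit1)$coefficients   # X1 significant on its own
summary(fit2)$coefficients   # X2 significant on its own (it is nearly X1)
summary(fit12)$coefficients  # jointly, X2's t-statistic shrinks
```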
I suppose this makes sense: X1 clearly confounds the relationship between X2 and Y, since X1 is causally related to both X2 and Y. But I'm not clear on what is going on mathematically. How does the fitting procedure detect this? Does it have something to do with holding X1 constant while interpreting X2?
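One way to make the "holding X1 constant" idea concrete is the Frisch-Waugh-Lovell theorem: the coefficient on X2 in lm(Y ~ X1 + X2) is exactly the slope from regressing Y on the residuals of X2 after regressing X2 on X1, i.e. on the part of X2 that X1 cannot explain. Since X2 is nearly a copy of X1, almost nothing is left after that partialling-out, so the joint-model coefficient is estimated from very little variation and its standard error blows up. A sketch, continuing with the simulated data above:

```r
# Partial X1 out of X2: keep only the variation in X2 not explained by X1
X2_resid <- resid(lm(X2 ~ X1))

# Frisch-Waugh-Lovell: these two slope estimates are identical
coef(lm(Y ~ X2_resid))[["X2_resid"]]
coef(lm(Y ~ X1 + X2))[["X2"]]
```

The point estimates match exactly; the joint model's t-test is then a test on a slope fit to those tiny residuals, which is why X2 stops looking significant once X1 is in the model.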