Creating a confounding variable

  • #1
So I have Y, the response, and two predictors X1 and X2. I generate Y and X1 from a multivariate normal distribution, then manually set X2 to be nearly the same as X1 (identical except that I change a few entries to make X2 distinct from X1).

I ran three separate linear regressions.

lm(Y~X1) -> X1 statistically significant

lm(Y~X2) -> X2 statistically significant

lm(Y~X1+X2) -> X1 statistically significant and X2 not statistically significant.

I suppose this makes sense. X1 clearly confounds the relation between X2 and Y, since X1 is causally related to both X2 and Y. But I'm not so clear on what is going on mathematically. How does the fitting procedure detect this? Does it have something to do with holding X1 constant while interpreting X2?
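For reference, the setup above can be sketched in Python using numpy for the regression arithmetic (the correlation 0.7, n = 200, and the number of perturbed entries are illustrative choices, not values from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Draw (Y, X1) jointly normal with correlation 0.7 (illustrative value).
cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])
Y, X1 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# X2 is X1 with a handful of entries perturbed, as described in the post.
X2 = X1.copy()
X2[:5] += rng.normal(0.0, 1.0, size=5)

def ols_t_stats(y, *cols):
    """OLS with an intercept; return coefficients and their t-statistics."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

b1, t1 = ols_t_stats(Y, X1)        # analogue of lm(Y ~ X1)
b2, t2 = ols_t_stats(Y, X2)        # analogue of lm(Y ~ X2)
b12, t12 = ols_t_stats(Y, X1, X2)  # analogue of lm(Y ~ X1 + X2)

# Alone, X1 and X2 each get a large t-statistic; together, X2's collapses.
print(t1[1], t2[1], t12[2])
```

Running this reproduces the pattern in the three `lm()` fits: each predictor is significant on its own, but X2's t-statistic shrinks dramatically once X1 is in the model.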
 

Answers and Replies

  • #2
andrewkirk
Science Advisor
Homework Helper
Insights Author
Gold Member
The algorithm selects coefficients c1 and c2 and an intercept c0 so as to minimise the sum of squares of Y - (c0 + c1 X1 + c2 X2).
Because the fit between X1 and Y is better than between X2 and Y, it puts the explanatory weight on X1, choosing a coefficient of large absolute value for X1 and a small one for X2. The estimate of c2 is then small relative to its standard error, so the range of estimates consistent with the null hypothesis that the true coefficient is zero contains the actual estimate, meaning that X2 is not statistically significant.
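One way to see the mechanics: in OLS the variance of each coefficient estimate is sigma^2 times the corresponding diagonal entry of (X'X)^{-1}, and the entry for X2 blows up as X2 approaches X1. A small numpy sketch (sample size and the numbers of perturbed entries are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)

se_factor = {}
for n_changed in (5, 50, 150):
    # Replace n_changed entries of a copy of x1 with fresh noise, so
    # fewer changed entries means x2 is closer to collinear with x1.
    x2 = x1.copy()
    idx = rng.choice(n, size=n_changed, replace=False)
    x2[idx] = rng.normal(size=n_changed)
    X = np.column_stack([np.ones(n), x1, x2])
    # Var(c2_hat) = sigma^2 * [(X'X)^{-1}]_{22}: this design-dependent
    # factor in the standard error grows as corr(x1, x2) approaches 1.
    se_factor[n_changed] = np.diag(np.linalg.inv(X.T @ X))[2]
    print(n_changed, round(np.corrcoef(x1, x2)[0, 1], 3), se_factor[n_changed])
```

The closer x2 is to x1, the larger the design factor, so the wider c2's confidence interval: with the near-duplicate X2 from the original post, the interval easily covers zero even though X2 predicts Y well on its own.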
 
