Creating a confounding variable

  • #1
So I have Y, the response, and two predictors X1 and X2. I generate Y and X1 from a multivariate normal distribution. Then I manually set X2 to be nearly the same as X1 (identical except that I change a few entries to make X2 distinct from X1).

I ran three separate linear regressions.

lm(Y ~ X1) -> X1 statistically significant

lm(Y ~ X2) -> X2 statistically significant

lm(Y ~ X1 + X2) -> X1 statistically significant, X2 not statistically significant.

I suppose this makes sense. X1 clearly confounds the relation between X2 and Y, since X1 is causally related to both X2 and Y. But I'm not so clear on what is mathematically going on. How does the fitting algorithm detect this? Does it have something to do with holding X1 constant while interpreting X2?
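
For reference, a minimal R sketch of the setup described above. The sample size, the Y-X1 correlation, and the number of perturbed entries in X2 are assumptions for illustration, not values from the original post.

# Simulate the setup: Y and X1 jointly normal, X2 a near-copy of X1
library(MASS)
set.seed(1)

n <- 200
Sigma <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)   # assumed correlation of 0.7
dat <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
Y  <- dat[, 1]
X1 <- dat[, 2]

# X2 is almost identical to X1: overwrite a handful of entries
X2 <- X1
idx <- sample(n, 10)
X2[idx] <- rnorm(10)

summary(lm(Y ~ X1))        # X1 significant
summary(lm(Y ~ X2))        # X2 significant (it is almost X1)
summary(lm(Y ~ X1 + X2))   # X1 significant, X2 not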
 

Answers and Replies

  • #2
andrewkirk
The algorithm selects coefficients c1 and c2 and intercept c0 so as to minimise the sum of squares of (Y - (c0 + c1 X1 + c2 X2)).
Because the fit between X1 and Y is better than between X2 and Y, it will choose a coefficient with high absolute value for X1 and one near zero for X2. The estimate of c2 then falls within the range expected under the null hypothesis that the true coefficient is zero, so X2 is not statistically significant.
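
To make the "holding X1 constant" intuition concrete, here is a small illustrative sketch (my addition, continuing with the simulated Y, X1, X2 from the sketch in post #1). By the Frisch-Waugh-Lovell result, the coefficient on X2 in lm(Y ~ X1 + X2) equals the coefficient from regressing Y on the part of X2 that X1 cannot explain. Because X2 is nearly a copy of X1, that leftover part is tiny and noisy, which is why its coefficient comes out insignificant.

# Variation in X2 with X1 "held constant": residuals of X2 regressed on X1
X2_resid <- resid(lm(X2 ~ X1))

coef(lm(Y ~ X1 + X2))["X2"]          # coefficient of X2 in the joint model
coef(lm(Y ~ X2_resid))["X2_resid"]   # same value, up to numerical error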
 
