Interpreting the Association Between y and x While Holding the Group Constant

Tags: Concept, Interaction
FallenApple
Say we have a phenomenon where we want to see if x is related to y, where x is continuous. Further, x has an opposite effect in group 1 compared to group 2: for group 1, increasing x is associated with increasing y, while for group 2, increasing x is associated with decreasing y. (This is not realistic, but think of a medication that works very well for one group but is poisonous for another.)

So if I do the regression ##y\sim x+\epsilon##, I would expect to get zero association, since the two group effects would cancel out on average.

So I know the more appropriate model is ##y\sim x+I(G2)+x*I(G2)+\epsilon##, where I(G2) is an indicator taking the value 1 if an observation belongs to group 2 and 0 if it belongs to group 1.

But what if I want to interpret the association between y and x while holding the group constant? Then the equation would be ##y\sim x+I(G2)+\epsilon##, since in regression that is what one does. But in this case, how would that make sense?

Would I interpret ##\hat{\beta_{x}}## as the difference in the mean of y per unit increase in x, holding group status constant? What does that even mean in this case? How can we get a unique estimate for ##\beta_{x}## while holding the group constant, when we don't even know which group it is? Does this mean that ##y\sim x+I(G2)+\epsilon## is just invalid as a model?

I know that ##y\sim x+\epsilon## is valid because it just averages over group; that is, ##E(y\mid x)=E_{group}\left(E(y\mid x,group)\right)##, and it is just the model that produces the marginal interpretation. And from the interaction equation ##y\sim x+I(G2)+x*I(G2)+\epsilon## we can get valid interpretations as well.
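As a quick numerical check of the cancellation intuition above, here is a small sketch in R (the data are hypothetical, simulated to have slopes +1 and -1 in the two groups; none of these values come from the thread). Fitting the marginal model should give a slope estimate near zero:

```r
# Hypothetical simulation: opposite slopes that cancel marginally
set.seed(1)
n   <- 200
x   <- runif(2 * n)               # continuous predictor
grp <- rep(0:1, each = n)         # group membership, independent of x
y   <- ifelse(grp == 0, x, -x) + rnorm(2 * n, sd = 0.1)
coef(lm(y ~ x))                   # slope estimate near 0
```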
 
The first thing to note is that the above are not model equations. They are a bit like R code, except that R does not include the epsilon term.

The correct way to write the first two model equations is as
$$y_i= \beta_0+\beta_x x_i+\epsilon_i$$
and
$$y_i= \beta_0+\beta_x x_i+\beta_{G2}I(G2_i)+\beta_{x:I(G2)}x_i I(G2_i)+\epsilon_i$$
where ##i## is the observation number.

In the paragraphs after that it is not clear what you want to do. The way you describe it, it sounds like the info you are after is already provided by a regression using the second model. If that's not it, a more precise description of what you are after is needed.

Try running the following code, which implements the second model above in R by first simulating data with the required relationships then performing a regression to recover estimates of the parameters:
Code:
n<-100
x<-rep(0:n,2)                      # x runs 0..n, once for each group
grp<-c(rep(0,(n+1)),rep(1,(n+1))) # group indicator: first block 0, second block 1
y<- ifelse(grp==0,x,n-x)-n/2      # slope +1 in group 0, slope -1 in group 1
summary(lm(y~x*grp))              # main effects plus interaction
You'll see from the results that the slope of ##y## against ##x## is +1 if ##grp==0## and -1 if ##grp==1##.
 
andrewkirk said:
The first thing to note is that the above are not model equations. They are a bit like R code, except that R does not include the epsilon term.

The correct way to write the first two model equations is as
$$y_i= \beta_0+\beta_x x_i+\epsilon_i$$
and
$$y_i= \beta_0+\beta_x x_i+\beta_{G2}I(G2_i)+\beta_{x:I(G2)}x_i I(G2_i)+\epsilon_i$$
where ##i## is the observation number.

In the paragraphs after that it is not clear what you want to do. The way you describe it, it sounds like the info you are after is already provided by a regression using the second model. If that's not it, a more precise description of what you are after is needed.

Try running the following code, which implements the second model above in R by first simulating data with the required relationships then performing a regression to recover estimates of the parameters:
Code:
n<-100
x<-rep(0:n,2)                      # x runs 0..n, once for each group
grp<-c(rep(0,(n+1)),rep(1,(n+1))) # group indicator: first block 0, second block 1
y<- ifelse(grp==0,x,n-x)-n/2      # slope +1 in group 0, slope -1 in group 1
summary(lm(y~x*grp))              # main effects plus interaction
You'll see from the results that the slope of ##y## against ##x## is +1 if ##grp==0## and -1 if ##grp==1##.

I was just saying if I used the model ##y_i= \beta_x x_i+\beta_{G2}I(G2_i)+\epsilon_i##, can I actually interpret ## \beta_x ##?

It goes: "For a one-unit increase in x, the estimated mean of y increases by ##\hat{ \beta_x}## when ##I(G2_i)## is fixed." That is the textbook interpretation. But logically, that doesn't make sense here, because you can't just fix it abstractly; you have to pick a group.

Now the regression you posted is giving me interesting results. When I do summary(lm(y~x+grp)), x has no effect, which is what you would expect if you just randomly sampled a bunch of x's without paying attention to the group. But by fixing group, you are paying attention to it.
 
FallenApple said:
I was just saying if I used the model ##y_i= \beta_x x_i+\beta_{G2}I(G2_i)+\epsilon_i##, can I actually interpret ## \beta_x ##?
Yes, with that model the intercept varies with group but the slope does not. So in the scenario you describe, ##\beta_x## is going to be pretty useless, likely near zero and not statistically significant because it has to cover both groups.
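To see this numerically, here is a small sketch fitting the additive model to hypothetical simulated data of the kind described (opposite slopes of +1 and -1; the values are made up for illustration, not taken from the thread):

```r
# Hypothetical simulation: additive model cannot separate the two slopes
set.seed(2)
n   <- 200
x   <- runif(2 * n)
grp <- rep(0:1, each = n)
y   <- ifelse(grp == 0, x, -x) + rnorm(2 * n, sd = 0.1)
coef(lm(y ~ x + grp))   # beta_x near 0; grp absorbs the difference in group means
```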

To discriminate between the groups, we need to introduce an interaction term ##\beta_{x:I(G2)}x_i I(G2_i)##. In that model ##\beta_x## is the slope for the first group and ##\beta_x+\beta_{x:I(G2)}## is the slope for the second group. In R we can include all three terms - the two main effects and their interaction - with the compact description
Code:
y~ x*grp
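In R's formula syntax, `x*grp` is shorthand for the main effects plus their interaction, so the two calls below fit the same model. A small sketch with hypothetical data (made up for illustration):

```r
# Check that y ~ x*grp expands to y ~ x + grp + x:grp
set.seed(3)
x    <- runif(100)
grp  <- rep(0:1, 50)
y    <- ifelse(grp == 0, x, -x)
fit1 <- lm(y ~ x * grp)
fit2 <- lm(y ~ x + grp + x:grp)
all.equal(coef(fit1), coef(fit2))   # TRUE: the two formulas are equivalent
```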
 