- #1

fog37

- TL;DR Summary
- Linear Model with independent categorical variable

Hello,

I have been pondering the following: we have data for blood pressure ##BP## (the response variable) along with age and gender (a categorical variable with two levels, coded ##0## or ##1##). We can build two linear regression models: $$BP=b_0+b_1 age$$ $$BP=b_0+b_1 age+b_2 gender$$

The first model does not take gender into account and fits a single best-fit line, disregarding any effect gender may have.

The 2nd model includes ##gender##. Assuming no interaction term, ##gender## can only shift the best-fit regression line up or down, depending on whether its value is ##1## or ##0## and on the sign of its coefficient ##b_2##. Two scenarios are possible. If the shift is very small (##b_2## close to zero), then ##gender## has essentially no effect. But if the vertical shift is meaningful, then ##gender## has an effect: the ##BP## values for males and females form different clusters that require two different best-fit lines (same slope, different intercepts).
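To make the "same slope, different intercepts" picture explicit, the model with ##gender## can be written out once for each level, assuming the ##0##/##1## coding and no interaction term: $$gender=0:\quad BP=b_0+b_1 age$$ $$gender=1:\quad BP=(b_0+b_2)+b_1 age$$ Both lines share the slope ##b_1##, and ##b_2## is the vertical distance between them.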

The 2nd model, by including ##gender##, takes care of that difference.
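To see the shift concretely, here is a minimal sketch that fits the model without ##gender## and the model with ##gender## by ordinary least squares. The data are synthetic and made up purely for illustration (the "true" values 100, 0.5 and 8, the noise level, and the variable names are all my assumptions, not real blood pressure data).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, invented for illustration only.
# Assumption: gender is coded 0/1 and shifts BP by a constant amount.
n = 200
age = rng.uniform(20, 70, size=n)
gender = rng.integers(0, 2, size=n)
bp = 100 + 0.5 * age + 8 * gender + rng.normal(0, 5, size=n)


def ols(X, y):
    """Return least-squares coefficients and the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss


ones = np.ones(n)

# Model without gender: BP = c0 + c1*age (one line for everyone).
(c0, c1), rss_without = ols(np.column_stack([ones, age]), bp)

# Model with gender: BP = b0 + b1*age + b2*gender (two parallel lines).
(b0, b1, b2), rss_with = ols(np.column_stack([ones, age, gender]), bp)

print(f"without gender: intercept={c0:.2f}, slope={c1:.3f}")
print(f"with gender, gender=0: intercept={b0:.2f}, slope={b1:.3f}")
print(f"with gender, gender=1: intercept={b0 + b2:.2f}, slope={b1:.3f}")
print(f"estimated shift b2={b2:.2f}")
print(f"RSS without gender={rss_without:.1f}, with gender={rss_with:.1f}")
```

In this sketch the estimated ##b_2## is the vertical gap between the two parallel lines. Because the model without ##gender## is nested in the model with it, the residual sum of squares cannot increase when ##gender## is added; whether the decrease is meaningful is a separate question.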

**Would the 2nd model be exactly equivalent to creating two separate linear regression models and best-fit lines, one for the male group and one for the female group, once we recognize that males and females form different clusters of points w.r.t. blood pressure ##BP##?**

Thank you!