# Linear Regression Models (1)

In summary: regression models involve two kinds of variables, an independent variable X and a dependent variable Y. Y is modeled as random; X may be modeled as random or as taking a fixed value for each observation. The difference between E(Y|X) and E(Y|X=x) lies in whether X is treated as random or fixed: E(Y|X) leaves X as a random variable, while E(Y|X=x) conditions on the predetermined value x, and the two coincide when X is not random.

#### kingwinner

1) "In regression models, there are two types of variables:
X = independent variable
Y = dependent variable
Y is modeled as random.
X is sometimes modeled as random and sometimes it has fixed value for each observation."

I don't understand the meaning of the last line. When is X random? When is X fixed? Can anyone illustrate each case with a quick example?

2) "Simple linear regression model: Y = β0 + β1X + ε
If X is random, E(Y|X) = β0 + β1X
If X is fixed, E(Y|X=x) = β0 + β1x"

Now what's the difference between E(Y|X) and E(Y|X=x)? The above is supposed to be dealing with two separate cases (X random and X fixed), but I don't see any difference...
Most of the time I see E(Y) = β0 + β1X instead. How come? This is inconsistent with the above: E(Y) is not the same as E(Y|X=x), and I don't think they can ever be equal.

Thanks for explaining!

In many cases the question of whether X is random is theoretical. A clear-cut case of nonrandom X is a time trend (e.g., seconds into the experiment, or years into the Obama administration). Two clear cases of random X are (a) when X is co-determined with Y, and (b) when X is measured with random error.

E(Y|X) implies that the random variable X is not assumed to take on a particular value; E(Y|X=x) implies X is assumed to equal the predetermined, nonrandom value x. E[Y] is being used as a shorthand for "E[Y|X] if X is random, E[Y|x] otherwise."
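As an illustration of the distinction, here is a simulation sketch (made-up coefficients β0 = 1, β1 = 2, not from the thread): conditioning on a particular value x pins the conditional mean at β0 + β1x, while the unconditional mean averages over the distribution of X.

```python
import numpy as np

rng = np.random.default_rng(0)
b0, b1 = 1.0, 2.0  # hypothetical true coefficients

# Random X: first draw X, then Y = b0 + b1*X + noise.
X = rng.normal(5.0, 1.0, size=100_000)
Y = b0 + b1 * X + rng.normal(0.0, 1.0, size=X.size)

# E(Y | X = x): average Y over the draws where X lands near x = 6.
x = 6.0
cond_mean = Y[np.abs(X - x) < 0.05].mean()   # near b0 + b1*x = 13

# Unconditional E(Y): averages over the randomness in X as well.
overall_mean = Y.mean()                      # near b0 + b1*E(X) = 11

print(cond_mean, overall_mean)
```

The two numbers differ precisely because x = 6 is not the mean of X; if X were degenerate at x, they would coincide, which is the point made above.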

For example, if we have height vs. age (Y vs. X), is X fixed or random?

Also, what does it mean for X to be FIXED? If we have five data points x1, x2, ..., x5, and not all of them have the same value of X (e.g. x1 ≠ x2), is X fixed in this case?

Thank you!

"Fixed vs. random" usually depends on your goal. In your example, height vs. age, there may be at least two different contexts:

1. Heights of 10 children are measured at ages 1 through 10. We would like to determine the relationship between height and age for these 10 children.

2. 100 children are selected at random from a population of 10,000; their ages are recorded and their heights are measured. We would like to determine a general relationship between height and age for the entire population, based on this sample.

In case 1, age is fixed. In case 2, it is random.
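The two sampling schemes can be sketched in code (a hypothetical growth model, height ≈ 75 + 6·age + noise, invented for illustration). The practical point is that the same least-squares fit applies either way; only the interpretation of where the ages came from differs.

```python
import numpy as np

rng = np.random.default_rng(1)
b0, b1 = 75.0, 6.0  # made-up model: height (cm) = 75 + 6 * age + noise

# Case 1: fixed design -- the ages 1..10 are chosen by the experimenter.
ages_fixed = np.arange(1, 11, dtype=float)
heights_fixed = b0 + b1 * ages_fixed + rng.normal(0, 3, size=10)

# Case 2: random design -- ages arrive with the randomly sampled children.
ages_random = rng.integers(1, 11, size=100).astype(float)
heights_random = b0 + b1 * ages_random + rng.normal(0, 3, size=100)

# Ordinary least squares fits the same conditional-mean line in both cases.
slope_fixed, _ = np.polyfit(ages_fixed, heights_fixed, 1)
slope_random, _ = np.polyfit(ages_random, heights_random, 1)
print(slope_fixed, slope_random)  # both should land near the true slope 6
```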

Enuma_Elish said:
"Fixed vs. random" usually depends on your goal. [...] In case 1, age is fixed. In case 2, it is random.
Thanks for the concrete examples. Things make a lot more sense now!

2) By definition,

E(Y) = ∫_{−∞}^{∞} y f(y) dy

E(Y|X) = ∫_{−∞}^{∞} y f(y|x) dy
If X is FIXED, does this ALWAYS imply that X and Y are INDEPENDENT and E(Y)=E(Y|X=x)?? Why or why not?

For the simple linear regression model, my textbook typically writes
Y = β0 + β1*X + ε as
E(Y) = β0 + β1*X

However, I have occasionally seen
Y = β0 + β1*X + ε written as
E(Y|X) = β0 + β1*X, which looks a bit inconsistent with the above. The definitions of E(Y) and E(Y|X) are clearly different, as I outlined above, but here they seem to be equal. How come?

Thanks for explaining!

E(Y) (using the shorthand notation) is a function of X: E(Y) = β0 + β1X. That means E(Y) is never independent of X; the question is whether it is a dependence on a nonrandom value ("x") or on a random variable ("X"). As I explained above, E(Y) is a shorthand notation.

In their Econometric Foundations, Mittelhammer, Judge & Miller hold "E[Y] = E[Y|X] whenever X = x," (i.e. always). [Not an exact quotation.]


## Q1. What is a linear regression model?

A linear regression model is a statistical method used to identify and quantify the relationship between a dependent variable and one or more independent variables. It assumes that there is a linear relationship between the variables and seeks to find the best fitting line to describe this relationship.

## Q2. What is the purpose of a linear regression model?

The purpose of a linear regression model is to predict the value of the dependent variable based on the values of the independent variables. It also helps to understand the relationship between the variables and identify which independent variables have the most impact on the dependent variable.

## Q3. How is a linear regression model calculated?

A linear regression model is calculated by finding the best fitting line through the data points using the least squares method. This involves minimizing the sum of the squared differences between the actual values and the predicted values by adjusting the slope and intercept of the line.
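A minimal sketch of that calculation for one predictor (invented data; Python/NumPy), using the closed-form least-squares solution:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least squares: slope = Sxy / Sxx, intercept = ybar - slope*xbar.
# This choice minimizes the sum of squared residuals.
xbar, ybar = x.mean(), y.mean()
slope = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
intercept = ybar - slope * xbar

residuals = y - (intercept + slope * x)
print(slope, intercept)  # 1.96 and 0.14 for this data
```

Library routines such as `np.polyfit` compute the same quantities; the closed form is shown here only to make the "minimize squared differences" step concrete.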

## Q4. What are the assumptions of a linear regression model?

The assumptions of a linear regression model include: linearity, normality, independence of errors, homoscedasticity (constant variance), and absence of multicollinearity (no strong correlations between independent variables).
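One way to see why these assumptions matter is to simulate a violation. The sketch below (made-up data) breaks the constant-variance (homoscedasticity) assumption by letting the noise grow with x; the residual spread then differs clearly between low and high x.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)

# Heteroscedastic data: the noise standard deviation grows with x,
# violating the constant-variance assumption.
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=200) * x

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Compare the residual spread in the low-x half vs. the high-x half.
low_spread = resid[x < 5].std()
high_spread = resid[x >= 5].std()
print(low_spread, high_spread)  # high_spread is clearly larger
```

In practice this kind of pattern is usually spotted with a residuals-versus-fitted plot rather than a split-half comparison, but the idea is the same.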

## Q5. How do you interpret the coefficients in a linear regression model?

The coefficients in a linear regression model represent the change in the dependent variable for every one unit change in the corresponding independent variable, while holding all other variables constant. For example, if the coefficient for x1 is 0.5, it means that for every one unit increase in x1, the predicted value of the dependent variable will increase by 0.5 units.
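A quick check of that interpretation (simulated data with invented true coefficients 0.5 and −2.0): after fitting, raising x1 by one unit while holding x2 fixed changes the prediction by exactly the fitted coefficient on x1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + rng.normal(0, 0.1, size=n)

# Fit y = b0 + b1*x1 + b2*x2 by least squares.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Raise x1 by one unit while holding x2 constant at 0:
pred_lo = b[0] + b[1] * 1.0 + b[2] * 0.0
pred_hi = b[0] + b[1] * 2.0 + b[2] * 0.0
print(pred_hi - pred_lo)  # equals b[1], close to the true 0.5 here
```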