Becoming Familiar with Regression Notation and Terminology

AndreTheGiant · Nov 4, 2011

Hi there.

I am having some trouble understanding the full context of this question.

Suppose we have a categorical variable T E (1...n) and we observe k observations for Y when T = n. If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi

ii) What is the design matrix X

iii) what is b?

So what does it mean when it says there are k observations when T = n? A categorical variable is sort of like a dummy variable right? As in using numbers to represent qualitative measurements such as hair colour etc. So does T only take one value from 1 to n or multiple values?

So if T = n, Y would be represented by B1X1 + ... + BnXn in terms of dummy variables?

How can there be only one b?

Thanks.

Stephen Tashi · Nov 4, 2011

AndreTheGiant said:

Hi there.

I am having some trouble understanding the full context of this question.

I am also. Did this come from a textbook or course? If so, what was the subject of the chapter?

Suppose we have a categorical variable T E (1...n)

Did you mean to write "[itex] T \in \{1,2,\dots,n\} [/itex]"?

and we observe k observations for Y when T = n.

If you quoted that phrase accurately, I agree it is unclear. Could it have said
"we observe [itex] k_n [/itex] observations for [itex] Y [/itex] when [itex]T = n [/itex]"?

If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi

Dummy variables for categorical data are often defined as variables that take only the value 0 or 1. So perhaps you are being asked to encode the category as a vector [itex] (X_1,X_2,\dots,X_n)[/itex] where [itex] X_c = 1 [/itex] when [itex] T = c[/itex] and the rest of the [itex] X_i [/itex] are zero.
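To make that encoding concrete, here is a minimal sketch in Python. The function name one_hot and its arguments are my own, not anything from your course, and it assumes the 0/1 convention described above.

```python
def one_hot(t, n):
    """Return the dummy vector (X_1, ..., X_n) for category t in {1, ..., n}."""
    if not 1 <= t <= n:
        raise ValueError(f"category {t} is outside 1..{n}")
    # X_c = 1 exactly when c == t; every other entry is 0.
    return [1 if c == t else 0 for c in range(1, n + 1)]


print(one_hot(2, 3))  # [0, 1, 0]
```

Your instructor may use a different convention (for example, only n-1 dummies), so check this against your notes.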

What do your text materials say about doing regression with such variables?

AndreTheGiant · Nov 4, 2011

It is a homework question. Regarding your points:

The first one is correct; that is what I meant. The second one is also correct; I forgot to put the subscript n on the k.

As for the first part, I also made a mistake there. It asks to write down E(Y|T), not Y, in terms of the dummy variables X1...Xi. I am not sure if that is the right way to approach it, because that is what I thought as well, but wouldn't that matrix just be the design matrix that the second part is asking for? I thought I would be writing it like E(Y|T) = b0 + b1x1 + b2x2 + ... + bnxn?

Stephen Tashi · Nov 5, 2011

AndreTheGiant said:

It is a homework question

You didn't say what the course was. Are you studying ANCOVA?

I can't give you much help on ANCOVA because I haven't looked at such material since the 1980s, and I think I only took one course that would have included it. (Of course, I can look things up on the web and refresh or educate my mind, but if you are taking a course in this material, you should have been instructed in its fundamentals.)

So what does it mean when it says there are k observations when T = n?

We've established that there are [itex] k_n [/itex] observations when [itex] T = n [/itex].
So my guess is that an example of the data is something like this:

[itex] T = 1, k_1 = 4, [/itex] observed Y values: (2.4, 3.2, 9.8, 3.2)
[itex] T = 2, k_2 = 3, [/itex] observed Y values: (3.3, 3.2, 1.7)
[itex] T = 3, k_3 = 6, [/itex] observed Y values: (2.6, 3.3, 2.0, 3.4, 2.2, 9.0)
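If that guess about the data is right, the observations could be stacked into a response vector and a design matrix along these lines. This is only a sketch: the names groups, y and X are mine, the counts are read off the lists themselves, and I am assuming one 0/1 column per category with no intercept column, which may not be the coding your course uses.

```python
# Y values copied from the example listing above, keyed by the value of T.
groups = {
    1: [2.4, 3.2, 9.8, 3.2],
    2: [3.3, 3.2, 1.7],
    3: [2.6, 3.3, 2.0, 3.4, 2.2, 9.0],
}
n = len(groups)

y = []  # stacked response vector, one entry per observation
X = []  # design matrix: one row per observation, one 0/1 column per category
for t in sorted(groups):
    for value in groups[t]:
        y.append(value)
        X.append([1 if c == t else 0 for c in range(1, n + 1)])

for row, value in zip(X, y):
    print(row, value)

# Column sums of X give the number of observations in each category.
counts = [sum(row[c] for row in X) for c in range(n)]
print(counts)
```

Each row of X has a single 1, in the column of the category that observation belongs to.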

A categorical variable is sort of like a dummy variable right?

I think they are different concepts. Don't your materials define them? Are you trying to work the problem without referring to your book or lecture notes?

As in using numbers to represent qualitative measurements such as hair colour etc. So does T only take one value from 1 to n or multiple values?

The problem says T (which is a variable) takes values only from 1 to n, but in each different situation it may take a different one of those values.

Sometimes dummy variables X1,X2,...are only allowed to take the values of 0 or 1.
So when T = 2, we have X1 = 0, X2 = 1, X3 = 0. You'll have to see what your course materials say about this.

AndreTheGiant · Nov 5, 2011

This is a one-way ANOVA problem. ANCOVA isn't covered just yet. It's a regression analysis course. I have my notes, but I am still confused, and I can't really find anything on the web that explains this in the same notation or way my instructor does. I'm really trying to understand it.

Stephen Tashi · Nov 5, 2011

You could post questions about statements made by your lecturer using his own notation if you don't understand the lectures.

Searching the web for a few minutes, I found this link:

http://www.biomedware.com/files/documentation/spacestat/Statistics/Regression/Categorical_Data_in_Regression_Analyses.htm

The section "Reference cell vs. Effect cell parameterization" suggests that your problem could expect you to encode the possible values of T using n-1 dummy variables and let the case where all dummy variables are 0 represent the "reference" case. It also explains the relationship between the coefficients [itex] b_i [/itex] in the regression and various means.
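As a rough illustration of the reference-cell idea, here is a sketch in Python. The data are made-up toy numbers, category 1 is arbitrarily taken as the reference, and the names groups, row and b are my own. It checks that b0 = mean of the reference group and b_c = (mean of group c) minus b0 satisfy the least-squares normal equations X^T (y - Xb) = 0.

```python
from statistics import mean

# Hypothetical toy data: Y values keyed by category T; category 1 is the reference.
groups = {1: [2.0, 4.0], 2: [5.0, 7.0, 9.0], 3: [1.0, 3.0]}
n = len(groups)


def row(t):
    # Intercept column, then n-1 dummies for categories 2..n (all 0 when t == 1).
    return [1] + [1 if c == t else 0 for c in range(2, n + 1)]


X = [row(t) for t in sorted(groups) for _ in groups[t]]
y = [v for t in sorted(groups) for v in groups[t]]

# Candidate coefficients: b0 = reference mean, b_c = mean of category c minus b0.
b0 = mean(groups[1])
b = [b0] + [mean(groups[c]) - b0 for c in range(2, n + 1)]

# Least squares requires X^T (y - X b) = 0; compute that vector column by column.
residuals = [yi - sum(xj * bj for xj, bj in zip(xi, b)) for xi, yi in zip(X, y)]
normal_eq = [sum(xi[j] * r for xi, r in zip(X, residuals)) for j in range(n)]

print(b)          # intercept, then differences from the reference mean
print(normal_eq)  # all entries should be zero
```

So under this coding, E(Y|T) for the reference category is b0, and for any other category c it is b0 + b_c.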

iii) what is b?

How does your lecturer use 'b' in other problems? Can 'b' be a matrix? Was there a subscript on 'b' that you left out?
