Regression Question help

1. Nov 4, 2011

AndreTheGiant

Hi there.

I am having some trouble understanding the full context of this question.

Suppose we have a categorical variable T E (1...n) and we observe k observations for Y when T = n. If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi

ii) What is the design matrix X

iii) What is b?

So what does it mean when it says there are k observations when T = n? A categorical variable is sort of like a dummy variable, right? As in using numbers to represent qualitative measurements such as hair colour. So does T take only one value from 1 to n, or multiple values?

So if T = n, Y would be represented by B1X1 + ... + BnXn in terms of dummy variables?

How can there be only one b?

Thanks.

2. Nov 4, 2011

Stephen Tashi

I'm having trouble understanding it too. Did this come from a textbook or a course? If so, what was the subject of the chapter?

Did you mean to write "$T \in \{1,2,\dots,n\}$"?

If you quoted that phrase accurately, I agree it is unclear. Could it have said
"we observe $k_n$ observations for $Y$ when $T = n$"?

Dummy variables for categorical data are often defined as variables that take only the value 0 or 1. So perhaps you are being asked to encode the category as a vector $(X_1, X_2, \dots, X_n)$ where $X_c = 1$ when $T = c$ and the rest of the $X_i$ are zero.
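A minimal Python sketch of that 0/1 encoding, just to make the idea concrete. The function name `one_hot` is an illustrative choice, not something from the problem statement.

```python
def one_hot(t, n):
    """Encode a category t in {1, ..., n} as the 0/1 vector (X_1, ..., X_n),
    with X_c = 1 when t == c and every other X_i = 0."""
    if not 1 <= t <= n:
        raise ValueError(f"t must be in 1..{n}, got {t}")
    return [1 if c == t else 0 for c in range(1, n + 1)]

# Example with n = 3 categories: each value of T gets exactly one 1
for t in (1, 2, 3):
    print(t, one_hot(t, 3))
```

Exactly one entry of each vector is 1, so the vector carries the same information as T itself.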

What do your text materials say about doing regression with such variables?

3. Nov 4, 2011

AndreTheGiant

The first one is correct; that is what I meant. The second one is also correct: I forgot to put the subscript n on the k.

As for the first part, I also made a mistake there. It asks me to write down E(Y|T) in terms of the dummy variables X1...Xi, not Y. I'm not sure whether encoding T that way is the right approach, although that is what I thought as well. But wouldn't that matrix just be the design matrix that the second part asks for? I thought I would be writing it as E(Y|T) = b0 + b1x1 + b2x2 + ... + bnxn?
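One thing worth checking with $E(Y|T) = b_0 + b_1X_1 + \dots + b_nX_n$: if all $n$ dummies are 0/1 indicators and there is also an intercept $b_0$, the dummy columns always add up to the column of ones. The design matrix then lacks full column rank, and the $b$'s are not uniquely determined. A small numpy sketch using made-up category labels (hypothetical data, for illustration only):

```python
import numpy as np

# Hypothetical category labels T for 6 observations, with n = 3 categories
T = np.array([1, 1, 2, 2, 3, 3])
n = 3

# One 0/1 dummy column per category: column c-1 is 1 when T == c
dummies = (T[:, None] == np.arange(1, n + 1)).astype(float)

# Design matrix with an intercept column (for b0) plus all n dummies
X_full = np.column_stack([np.ones(len(T)), dummies])

# The dummy columns sum to the intercept column, so X_full has
# n + 1 columns but rank only n
print(X_full.shape[1], np.linalg.matrix_rank(X_full))

# Dropping the intercept (or, alternatively, one dummy) restores full column rank
X_no_intercept = dummies
print(X_no_intercept.shape[1], np.linalg.matrix_rank(X_no_intercept))
```

So a model with $b_0$ usually keeps only $n-1$ dummies, while a model with all $n$ dummies usually drops $b_0$. Which convention your course uses is something to check in your notes.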

4. Nov 5, 2011

Stephen Tashi

You didn't say what the course was. Are you studying ANCOVA?

I can't give you much help on ANCOVA because I haven't looked at such material since the 1980s, and I think I only took one course that would have included it. (Of course, I can look things up on the web and refresh or educate my mind, but if you are taking a course in this material, you should have been instructed in its fundamentals.)

We've established that there are $k_n$ observations when $T = n$.
So my guess is that an example of the data is something like this:

$T = 1, k_1 = 4,$ observed Y values: (2.4, 3.2, 9.8, 3.2)
$T = 2, k_2 = 3,$ observed Y values: (3.3, 3.2, 1.7)
$T = 3, k_3 = 6,$ observed Y values: (2.6, 3.3, 2.0, 3.4, 2.2, 9.0)
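For parts ii) and iii), one common setup is the "cell means" model with no intercept and one 0/1 dummy per level, so that $E(Y|T=c) = b_c$. That choice is an assumption here, since we don't know which parameterization the lecturer uses. Stacking the example data above gives a $13 \times 3$ design matrix $X$, and the least-squares estimate $b = (X^TX)^{-1}X^TY$ comes out as the vector of group means. A numpy sketch:

```python
import numpy as np

# Example data from the post above: observed Y values for each level of T
groups = {
    1: [2.4, 3.2, 9.8, 3.2],
    2: [3.3, 3.2, 1.7],
    3: [2.6, 3.3, 2.0, 3.4, 2.2, 9.0],
}
n = len(groups)

# Stack the observations into a label vector T and a response vector Y
T = np.concatenate([[t] * len(ys) for t, ys in groups.items()])
Y = np.concatenate(list(groups.values()))

# Design matrix: one 0/1 column per level (no intercept); X[i, c-1] = 1 iff T[i] == c
X = (T[:, None] == np.arange(1, n + 1)).astype(float)

# Least-squares estimate of b (equivalent to solving X^T X b = X^T Y)
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

# With this coding, each b_c equals the sample mean of Y in group c
for c in range(1, n + 1):
    print(c, round(b[c - 1], 4), round(np.mean(groups[c]), 4))
```

Because the columns of $X$ never overlap, $X^TX$ is diagonal with entries $k_1, k_2, k_3$. That is why each $b_c$ reduces to a group average.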

I think a categorical variable and a dummy variable are different concepts. Don't your materials define them? Are you trying to work the problem without referring to your book or lecture notes?

The problem says T (which is a variable) takes values only from 1 to n, but in each different situation it may take a different one of those values.

Sometimes dummy variables X1, X2, ... are only allowed to take the values 0 or 1.
So with n = 3, when T = 2 we have X1 = 0, X2 = 1, X3 = 0. You'll have to see what your course materials say about this.

5. Nov 5, 2011

AndreTheGiant

This is a one-way ANOVA problem. ANCOVA isn't covered just yet; it's a regression analysis course. I have my notes, but I'm still confused, and I can't really find anything on the web that explains it in the same notation or way my instructor does. I'm really trying to understand it.
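Since this is a one-way ANOVA problem in a regression course, it may help to see that the two views give the same F test. Regressing Y on an intercept plus dummy variables reproduces the between-group and within-group sums of squares. The sketch below reuses the illustrative numbers from earlier in the thread and takes level 1 as the reference level, which is just one possible choice.

```python
import numpy as np

# Illustrative data from earlier in the thread (Y values for T = 1, 2, 3)
groups = [
    [2.4, 3.2, 9.8, 3.2],
    [3.3, 3.2, 1.7],
    [2.6, 3.3, 2.0, 3.4, 2.2, 9.0],
]
Y = np.concatenate(groups)
N, n = len(Y), len(groups)

# Classical one-way ANOVA: between-group and within-group sums of squares
grand_mean = Y.mean()
ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
F_anova = (ss_between / (n - 1)) / (ss_within / (N - n))

# Same test as a regression: intercept plus n-1 dummies (level 1 as reference)
T = np.concatenate([[c] * len(g) for c, g in enumerate(groups, start=1)])
X = np.column_stack([np.ones(N)] + [(T == c).astype(float) for c in range(2, n + 1)])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ b                                 # fitted values are the group means
ss_reg = ((fitted - grand_mean) ** 2).sum()    # regression (model) sum of squares
ss_res = ((Y - fitted) ** 2).sum()             # residual sum of squares
F_reg = (ss_reg / (n - 1)) / (ss_res / (N - n))

print(F_anova, F_reg)  # the two F statistics agree
```

The regression sum of squares matches the between-group sum of squares and the residual sum of squares matches the within-group one, so the F statistics coincide.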

6. Nov 5, 2011

Stephen Tashi

You could post questions about statements made by your lecturer using his own notation if you don't understand the lectures.

Searching the web for a few minutes, I found this link:

http://www.biomedware.com/files/documentation/spacestat/Statistics/Regression/Categorical_Data_in_Regression_Analyses.htm [Broken]

The section "Reference cell vs. Effect cell parameterization" suggests that your problem could expect you to encode the possible values of T using n-1 dummy variables and let the case where all the dummy variables are 0 represent the "reference" case. It also explains the relationship between the coefficients $b_i$ in the regression and the various group means.
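A minimal numpy sketch of the reference-cell idea, assuming level $T = 1$ is chosen as the reference (the choice of reference is arbitrary). The intercept estimates the reference group's mean, and each remaining coefficient estimates that group's mean minus the reference mean. The numbers are the illustrative ones from earlier in the thread.

```python
import numpy as np

# Illustrative data (the example values from earlier in the thread)
groups = {
    1: [2.4, 3.2, 9.8, 3.2],
    2: [3.3, 3.2, 1.7],
    3: [2.6, 3.3, 2.0, 3.4, 2.2, 9.0],
}
n = len(groups)
T = np.concatenate([[t] * len(ys) for t, ys in groups.items()])
Y = np.concatenate(list(groups.values()))

# Reference-cell coding: intercept plus n-1 dummies for T = 2, ..., n.
# An observation with T = 1 (the reference) has all dummies equal to 0.
X = np.column_stack([np.ones(len(Y))] + [(T == c).astype(float) for c in range(2, n + 1)])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

means = {t: np.mean(ys) for t, ys in groups.items()}
# b[0] matches the reference-group mean
print(b[0], means[1])
# b[c-1] (the coefficient on the dummy for level c) matches mean_c - mean_1
for c in range(2, n + 1):
    print(c, b[c - 1], means[c] - means[1])
```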

How does your lecturer use 'b' in other problems? Can 'b' be a matrix? Was there a subscript on 'b' that you left out?

Last edited by a moderator: May 5, 2017