Regression Question help

  • #1

Main Question or Discussion Point

Hi there.

I am having some trouble understanding the full context of this question.

Suppose we have a categorical variable T E (1...n) and we observe k observations for Y when T = n. If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi

ii) What is the design matrix X

iii) what is b?


So what does it mean when it says there are k observations when T = n? A categorical variable is sort of like a dummy variable right? As in using numbers to represent qualitative measurements such as hair colour etc. So does T only take one value from 1 to n or multiple values?

So if T = n, Y would be represented by B1X1 + ... + BnXn in terms of dummy variables?

How can there be only one b?

Thanks.
 

Answers and Replies

  • #2
Stephen Tashi
Hi there.

I am having some trouble understanding the full context of this question.
I am also. Did this come from a text book or course? If so, what was the subject of the chapter?

Suppose we have a categorical variable T E (1...n)
Did you mean to write "[itex] T \in \{1, 2, \dots, n\} [/itex]"?

and we observe k observations for Y when T = n.
If you quoted that phrase accurately, I agree it is unclear. Could it have said
"we observe [itex] k_n [/itex] observations for [itex] Y [/itex] when [itex]T = n [/itex]"?


If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi
Dummy variables for categorical data are often defined as variables that take only the value 0 or 1. So perhaps you are being asked to encode the category as a vector [itex] (X_1, X_2, \dots, X_n) [/itex] where [itex] X_c = 1 [/itex] when [itex] T = c [/itex] and the rest of the [itex] X_i [/itex] are zero.

What do your text materials say about doing regression with such variables?
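
A minimal sketch of the 0/1 encoding described above, assuming the categories are the integers 1 to n. The function name `one_hot` is made up for illustration and does not come from the problem statement.

```python
def one_hot(t, n):
    """Return the dummy vector (X_1, ..., X_n) for category t in {1, ..., n}."""
    if not 1 <= t <= n:
        raise ValueError(f"category {t} is outside 1..{n}")
    # X_c = 1 exactly when T = c; every other entry is 0.
    return [1 if c == t else 0 for c in range(1, n + 1)]


print(one_hot(2, 3))  # [0, 1, 0]
```

Each observation gets exactly one 1 in its vector, in the position of the category it belongs to.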
 
  • #3
It is a homework question. About your points:

The first one is correct, that is what I meant. The second one is also correct; I forgot to put the subscript n on the k.

As for the first part, I also made a mistake there. It asks to write down E(Y|T), not Y, in terms of the dummy variables X1...Xi. I am not sure this is the right way to approach it, because that is what I thought as well, but wouldn't that matrix just be the design matrix that the second part asks for? I thought I would write it as E(Y|T) = b0 + b1x1 + b2x2 + ... + bnxn?
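
For reference, one common way to write the conditional mean with a full set of n dummy variables is sketched below. It assumes the 0/1 coding in which [itex] X_c = 1 [/itex] exactly when [itex] T = c [/itex], and it leaves out the intercept; whether the course uses this form or a different parameterization is not known from the thread.

```latex
E(Y \mid T) = b_1 X_1 + b_2 X_2 + \dots + b_n X_n,
\qquad
X_c = \begin{cases} 1 & \text{if } T = c \\ 0 & \text{otherwise} \end{cases}
\quad\Longrightarrow\quad
E(Y \mid T = c) = b_c .
```

If an intercept [itex] b_0 [/itex] is kept together with all n dummies, the intercept column of the design matrix equals the sum of the dummy columns, so the coefficients are no longer uniquely determined. That is why a model with an intercept usually drops one dummy.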
 
  • #4
Stephen Tashi
It is a homework question
You didn't say what the course was. Are you studying ANCOVA?

I can't give you much help on ANCOVA because I haven't looked at such material since the 1980s, and I think I only took one course that would have included it. (Of course, I can look things up on the web and refresh or educate my mind, but if you are taking a course in this material, you should have been instructed in the fundamentals of it.)

So what does it mean when it says there are k observations when T = n?
We've established that there are [itex] k_n [/itex] observations when [itex] T = n [/itex].
So my guess is that an example of the data is something like this:

[itex] T = 1, k_1 = 4, [/itex] observed Y values: (2.4, 3.2, 9.8, 3.2)
[itex] T = 2, k_2 = 3, [/itex] observed Y values: (3.3, 3.2, 1.7)
[itex] T = 3, k_3 = 6, [/itex] observed Y values: (2.6, 3.3, 2.0, 3.4, 2.2, 9.0)
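
As a rough illustration of what a design matrix could look like for data of this shape, the sketch below stacks the Y values listed above into one vector and builds one 0/1 column per category. It assumes a model with no intercept column and uses NumPy; the variable names are made up for illustration, and the course may define X differently.

```python
import numpy as np

# Y values as listed in the example above, grouped by the value of T.
groups = {
    1: [2.4, 3.2, 9.8, 3.2],
    2: [3.3, 3.2, 1.7],
    3: [2.6, 3.3, 2.0, 3.4, 2.2, 9.0],
}
n = len(groups)

# One entry per observation: its category t and its response y.
t = np.array([c for c, ys in groups.items() for _ in ys])
y = np.array([v for ys in groups.values() for v in ys])

# Row i of X is the dummy vector for observation i (a single 1 in column T - 1).
X = (t[:, None] == np.arange(1, n + 1)[None, :]).astype(float)

# Least-squares estimate of b in E(Y|T) = X b, with no intercept column.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([np.mean(ys) for ys in groups.values()])
print(X.shape)                      # (13, 3)
print(np.allclose(b, group_means))  # True
```

Under this coding, b is a vector with one entry per category, and its least-squares estimate is the vector of group means.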

A categorical variable is sort of like a dummy variable right?
I think they are different concepts. Don't your materials define them? Are you trying to work the problem without referring to your book or lecture notes?

As in using numbers to represent qualitative measurements such as hair colour etc. So does T only take one value from 1 to n or multiple values?
The problem says T (which is a variable) takes values only from 1 to n, but in each different situation it may take a different one of those values.

Sometimes dummy variables X1,X2,...are only allowed to take the values of 0 or 1.
So when T = 2 (taking n = 3), we have X1 = 0, X2 = 1, X3 = 0. You'll have to see what your course materials say about this.
 
  • #5
This is a one-way ANOVA problem. ANCOVA isn't covered just yet. It's a regression analysis course. I have my notes, but I'm still confused, and I can't really find anything on the web that explains it in the same notation or in the same way my instructor does. I'm really trying to understand it.
 
  • #6
Stephen Tashi
You could post questions about statements made by your lecturer using his own notation if you don't understand the lectures.

Searching the web for a few minutes, I found this link:

http://www.biomedware.com/files/documentation/spacestat/Statistics/Regression/Categorical_Data_in_Regression_Analyses.htm [Broken]

The section "Reference cell vs. Effect cell parameterization" suggests that your problem could expect you to encode the possible values of T using n-1 dummy variables and let the case when all dummy variables are 0 represent the "reference" case. It also explains the relationship between the coefficients [itex] b_i [/itex] in the regression and various means.
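
A small sketch of the reference-case idea, under stated assumptions: the data are made up for illustration, T = 1 is arbitrarily chosen as the reference category, and the model has an intercept plus n-1 dummies. The linked page may use a different convention for which category is the reference.

```python
import numpy as np

# Made-up observations for n = 3 categories (illustrative values only).
groups = {1: [2.0, 4.0], 2: [5.0, 7.0, 9.0], 3: [1.0, 3.0]}
levels = sorted(groups)

t = np.array([c for c, ys in groups.items() for _ in ys])
y = np.array([v for ys in groups.values() for v in ys])

# Column 0 is the intercept; the remaining columns are dummies for T = 2, ..., n.
# The reference category T = 1 is the row pattern where every dummy is 0.
X = np.column_stack(
    [np.ones(len(y))] + [(t == c).astype(float) for c in levels[1:]]
)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
means = {c: float(np.mean(ys)) for c, ys in groups.items()}

# Intercept estimates the reference mean; each b_i estimates a difference from it.
print(np.isclose(b[0], means[1]))             # True
print(np.isclose(b[1], means[2] - means[1]))  # True
print(np.isclose(b[2], means[3] - means[1]))  # True
```

With this coding the fitted coefficients are the reference group mean followed by the differences of the other group means from it.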


iii) what is b?
How does your lecturer use 'b' in other problems? Can 'b' be a matrix? Was there a subscript on 'b' that you left out?
 