Becoming Familiar with Regression Notation and Terminology

  • Thread starter AndreTheGiant
In summary: The thread asks how to interpret a homework problem about regression on a categorical variable T taking values in {1, ..., n}, with k_n observations of Y when T = n. The problem asks for E(Y|T) written in terms of dummy variables, for the design matrix X, and for b. The replies clarify the notation and suggest encoding T with 0/1 dummy variables, possibly with one category treated as a reference.
  • #1
AndreTheGiant
Hi there.

I am having some trouble understanding the full context of this question.

Suppose we have a categorical variable T E (1...n) and we observe k observations for Y when T = n. If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi

ii) What is the design matrix X

iii) what is b?


So what does it mean when it says there are k observations when T = n? A categorical variable is sort of like a dummy variable right? As in using numbers to represent qualitative measurements such as hair colour etc. So does T only take one value from 1 to n or multiple values?

So if T = n, Y would be represented by B1X1 + ... + BnXn in terms of dummy variables?

how can there only be one b?

Thanks.
 
  • #2
AndreTheGiant said:
Hi there.

I am having some trouble understanding the full context of this question.

I am also. Did this come from a textbook or course? If so, what was the subject of the chapter?

Suppose we have a categorical variable T E (1...n)

Did you mean to write "[itex] T \in \{1, 2, \ldots, n\} [/itex]"?

and we observe k observations for Y when T = n.

If you quoted that phrase accurately, I agree it is unclear. Could it have said
"we observe [itex] k_n [/itex] observations for [itex] Y [/itex] when [itex]T = n [/itex]"?


If a regression model holds:

i) Write down Y in terms of dummy variables X1...Xi

Dummy variables for categorical data are often defined as variables that take only the value 0 or 1. So perhaps you are being asked to encode the category as a vector [itex] (X_1, X_2, \ldots, X_n) [/itex] where [itex] X_c = 1 [/itex] when [itex] T = c [/itex] and the rest of the [itex] X_i [/itex] are zero.
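The encoding described above can be sketched in a few lines of Python with NumPy. This is only an illustration of the 0/1 convention; the function name `dummy_encode` is made up here, and the course may define its dummy variables differently.

```python
import numpy as np

def dummy_encode(t, n):
    """Return the 0/1 vector (X_1, ..., X_n) with X_c = 1 exactly when T = c."""
    if not 1 <= t <= n:
        raise ValueError("t must be one of 1, ..., n")
    x = np.zeros(n, dtype=int)
    x[t - 1] = 1  # categories are 1-based, array indices are 0-based
    return x

print(dummy_encode(2, 3))  # [0 1 0]
```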

What do your text materials say about doing regression with such variables?
 
  • #3
It is a homework question. As for your points:

The first one is correct; that is what I meant. The second one is also correct; I forgot to put the subscript n on the k.

As for the first part, I also made a mistake there. It asks to write down E(Y|T), not Y, in terms of the dummy variables X1...Xi. I am not sure if that is the right way to approach it, because that is what I thought as well, but wouldn't that matrix just be the design matrix that the second part asks for? I thought I would write it as E(Y|T) = b0 + b1x1 + b2x2 + ... + bnxn?
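One way to read that expression, assuming 0/1 dummies with x_c = 1 exactly when T = c (an assumption; the course may code them differently), is that conditioning on T = c leaves only a single dummy term:

```latex
E(Y \mid T = c) = b_0 + b_1 \cdot 0 + \cdots + b_c \cdot 1 + \cdots + b_n \cdot 0 = b_0 + b_c,
\qquad c \in \{1, 2, \ldots, n\}
```

With an intercept and all n dummies there are n + 1 coefficients describing only n category means, so some constraint (for example dropping b_0, or dropping one dummy) is usually imposed.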
 
  • #4
AndreTheGiant said:
It is a homework question
You didn't say what the course was. Are you studying ANCOVA?

I can't give you much help on ANCOVA because I haven't looked at such material since the 1980s, and I think I only took one course that would have included it. (Of course, I can look things up on the web and refresh or educate my mind, but if you are taking a course in this material, you should have been instructed in its fundamentals.)

So what does it mean when it says there are k observations when T = n?
We've established that there are [itex] k_n [/itex] observations when [itex] T = n [/itex].
So my guess is that an example of the data is something like this:

[itex] T = 1, k_1 = 4, [/itex] observed Y values: (2.4, 3.2, 9.8, 3.2)
[itex] T = 2, k_2 = 3, [/itex] observed Y values: (3.3, 3.2, 1.7)
[itex] T = 3, k_3 = 6, [/itex] observed Y values: (2.6, 3.3, 2.0, 3.4, 2.2, 9.0)
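As a rough sketch of how data like this could turn into a design matrix, the Python snippet below stacks the Y values listed above, builds one 0/1 column per category, and fits the coefficients by least squares. It assumes a model with no intercept, E(Y|T) = b_1 X_1 + ... + b_n X_n, which is only one possible coding and may not be the one the course uses. Under that assumption b is a single vector holding n coefficients, and each coefficient comes out equal to the mean of its group.

```python
import numpy as np

# Y values as listed in the example above, keyed by category T
groups = {
    1: [2.4, 3.2, 9.8, 3.2],
    2: [3.3, 3.2, 1.7],
    3: [2.6, 3.3, 2.0, 3.4, 2.2, 9.0],
}
n = len(groups)

# Stack all observations; t[i] is the category of observation y[i]
t = np.concatenate([[c] * len(ys) for c, ys in groups.items()])
y = np.concatenate([ys for ys in groups.values()])

# Design matrix: row i has a 1 in the column for category t[i], 0 elsewhere
X = (t[:, None] == np.arange(1, n + 1)).astype(float)

# Least-squares estimate of the coefficient vector b (no intercept assumed)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([np.mean(ys) for ys in groups.values()])
print(X.shape)                       # (13, 3)
print(np.allclose(b, group_means))   # True
```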

A categorical variable is sort of like a dummy variable right?
I think they are different concepts. Don't your materials define them? Are you trying to work the problem without referring to your book or lecture notes?

As in using numbers to represent qualitative measurements such as hair colour etc. So does T only take one value from 1 to n or multiple values?

The problem says T (which is a variable) takes values only from 1 to n, but in each different situation it may take a different one of those values.

Sometimes dummy variables X1,X2,...are only allowed to take the values of 0 or 1.
So when T = 2, we have X1 = 0, X2 = 1, X3 = 0. You'll have to see what your course materials say about this.
 
  • #5
This is a one-way ANOVA problem. ANCOVA isn't covered just yet. It's a regression analysis course. I have my notes, but I am still confused, and I can't really find anything on the web that explains it in the same notation or way my instructor does. I'm really trying to understand it.
 
  • #6
You could post questions about statements made by your lecturer using his own notation if you don't understand the lectures.

Searching the web for a few minutes, I found this link:

http://www.biomedware.com/files/documentation/spacestat/Statistics/Regression/Categorical_Data_in_Regression_Analyses.htm

The section "Reference cell vs. Effect cell parameterization" suggests that your problem could expect you to encode the possible values of T by using n-1 dummy variables and let the case when all dummy variables are 0 represent the "reference" case. It also explains the relationship between the coefficients [itex] b_i [/itex] in the regression and various means.
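A small sketch of the reference-cell idea, in Python with NumPy: an intercept column plus n-1 dummies, with category 1 arbitrarily chosen as the reference. The data here are hypothetical toy numbers, not taken from the problem. With this coding the fitted intercept equals the reference group's mean and each remaining coefficient equals that group's mean minus the reference mean.

```python
import numpy as np

# Hypothetical toy data (not from the thread): category labels and responses
t = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1.0, 3.0, 4.0, 6.0, 10.0, 12.0])
n = 3

# Intercept column plus dummies for categories 2..n; category 1 is the reference
columns = [np.ones(len(t))] + [(t == c).astype(float) for c in range(2, n + 1)]
X = np.column_stack(columns)

b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Group means for categories 1..n
means = np.array([y[t == c].mean() for c in range(1, n + 1)])

# Intercept = reference mean; other coefficients = differences from it
expected = np.array([means[0], means[1] - means[0], means[2] - means[0]])
print(np.allclose(b, expected))  # True
```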


iii) what is b?

How does your lecturer use 'b' in other problems? Can 'b' be a matrix? Was there a subscript on 'b' that you left out?
 

Related to Becoming Familiar with Regression Notation and Terminology

1. What is regression analysis?

Regression analysis is a statistical method used to examine the relationship between a dependent variable and one or more independent variables. It is commonly used to predict the value of the dependent variable based on the values of the independent variables.

2. What types of regression models are there?

There are several types of regression models, including linear regression, logistic regression, polynomial regression, and multivariate regression. The type of regression model used depends on the type of data and the research question being addressed.

3. How is the quality of a regression model assessed?

The quality of a regression model is typically assessed by examining the goodness of fit, or how well the model fits the data. This can be determined by looking at the coefficient of determination (R-squared), mean squared error, and residual plots.

4. What is the difference between correlation and regression?

While correlation and regression are related, they serve different purposes. Correlation measures the strength and direction of the relationship between two variables, while regression is used to predict the value of a dependent variable based on the values of one or more independent variables.

5. What is multicollinearity in regression?

Multicollinearity occurs when there is a high correlation between two or more independent variables in a regression model. This can cause issues with the interpretation of the coefficients and can lead to inaccurate predictions. It is important to check for multicollinearity and address it before finalizing a regression model.
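An extreme case of this arises with dummy variables: if a design matrix contains an intercept column and a 0/1 column for every category, the dummy columns sum to the intercept column, so the columns are exactly linearly dependent. The sketch below, in Python with NumPy on hypothetical labels, checks this through the matrix rank; dropping one dummy restores full column rank.

```python
import numpy as np

t = np.array([1, 1, 2, 2, 3, 3])  # hypothetical category labels
n = 3

# One 0/1 column per category
dummies = (t[:, None] == np.arange(1, n + 1)).astype(float)

# Intercept plus all n dummies: 4 columns, but they are linearly dependent
X_full = np.column_stack([np.ones(len(t)), dummies])

# Same matrix with the dummy for category 1 dropped
X_ref = X_full[:, [0, 2, 3]]

print(np.linalg.matrix_rank(X_full))  # 3, although X_full has 4 columns
print(np.linalg.matrix_rank(X_ref))   # 3, full column rank
```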
