# Multiple least squares regression

1. Apr 11, 2013

### zzmanzz

1. The problem statement, all variables and given/known data

Design a regression model for the dataset below:

y, trial, x1, x2, x3

0.08536, 1, -1, -1, -1.00000
0.09026, 2, -1, -1, -1.00000
0.10188, 1, -1, -1, -0.33333
0.09301, 2, -1, -1, -0.33333
0.10362, 1, -1, -1, 0.33333
0.09920, 2, -1, -1, 0.33333
0.11033, 1, -1, -1, 1.00000
0.10744, 2, -1, -1, 1.00000
0.10172, 1, -1, 0, -1.00000
0.09360, 2, -1, 0, -1.00000
0.10800, 1, -1, 0, -0.33333
0.11685, 2, -1, 0, -0.33333
0.11002, 1, -1, 0, 0.33333
0.11221, 2, -1, 0, 0.33333
0.11533, 1, -1, 0, 1.00000
0.12328, 2, -1, 0, 1.00000
0.21908, 1, -1, 1, -1.00000
0.19675, 2, -1, 1, -1.00000
0.22744, 1, -1, 1, -0.33333
0.21138, 2, -1, 1, -0.33333
0.28118, 1, -1, 1, 0.33333
0.26413, 2, -1, 1, 0.33333
0.32416, 1, -1, 1, 1.00000
0.30590, 2, -1, 1, 1.00000
0.32390, 1, 1, -1, -1.00000
0.34938, 2, 1, -1, -1.00000
0.13669, 1, 1, -1, -0.33333
0.12953, 2, 1, -1, -0.33333
0.07987, 1, 1, -1, 0.33333
0.07884, 2, 1, -1, 0.33333
0.05959, 1, 1, -1, 1.00000
0.06172, 2, 1, -1, 1.00000
0.21624, 1, 1, 0, -1.00000
0.21925, 2, 1, 0, -1.00000
0.11777, 1, 1, 0, -0.33333
0.11127, 2, 1, 0, -0.33333
0.07338, 1, 1, 0, 0.33333
0.07354, 2, 1, 0, 0.33333
0.05601, 1, 1, 0, 1.00000
0.05622, 2, 1, 0, 1.00000
0.69966, 1, 1, 1, -1.00000
1.58131, 2, 1, 1, -1.00000
0.18522, 1, 1, 1, -0.33333
0.17043, 2, 1, 1, -0.33333
0.09530, 1, 1, 1, 0.33333
0.10060, 2, 1, 1, 0.33333
0.06655, 1, 1, 1, 1.00000
0.06814, 2, 1, 1, 1.00000

2. Relevant equations

I loaded the dataset and calculated

c = (X'*X)^(-1) * X' * y

where

X = [ones X1 X2 X3]

is the 48×4 design matrix, y is the 48×1 column vector of responses, and I solve for the column vector c = [c_0 c_1 c_2 c_3]'.
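The formula above can be sketched in NumPy. This is a minimal sketch using a synthetic stand-in for y (generated from known coefficients so the recovery can be checked); in practice the 48 rows above would be loaded instead. `np.linalg.lstsq` solves the same least-squares problem as the normal equations, but more stably than explicitly inverting X'X:

```python
import numpy as np

# Synthetic stand-in for the dataset above: x-columns drawn from the same
# levels, y generated from known coefficients so the fit can be checked.
rng = np.random.default_rng(0)
n = 48
x1 = rng.choice([-1.0, 1.0], size=n)
x2 = rng.choice([-1.0, 0.0, 1.0], size=n)
x3 = rng.choice([-1.0, -1/3, 1/3, 1.0], size=n)
true_c = np.array([0.15, 0.02, 0.05, 0.01])

X = np.column_stack([np.ones(n), x1, x2, x3])  # 48x4 design matrix
y = X @ true_c                                 # 48x1 response vector

# c = (X'X)^{-1} X'y; lstsq computes the same solution without forming
# the explicit inverse.
c, *_ = np.linalg.lstsq(X, y, rcond=None)
print(c)
```

With noise-free y the fitted c recovers the generating coefficients exactly (up to floating point); with real data the residuals tell you how well the linear form holds.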

3. The attempt at a solution

I got the regression coefficients, but the predictions from my model are terrible. Am I doing something wrong?

2. Apr 13, 2013

### BruceW

Your method looks correct to me, and it is not surprising that the predictions are not very good. Keep in mind that even though the method is correct, the model itself may still be terrible at making predictions. In this case there are 3 'dimensions', and the input variables take on only a few different values each. Maybe you can try plotting the data, one dimension at a time, to see intuitively whether it looks linear or not.

edit: by 3 'dimensions' I mean the 3 input variables; for example, (temperature, size, colour) might be the three 'dimensions', i.e. the 3 input variables that correspond to a particular value of y. I mention this because I am not sure how widely the word 'dimensions' is used in this context.

3. Apr 13, 2013

### Ray Vickson

When the variables have such limited ranges (values like -1, 0, 1, etc.) it starts to look like an experimental-design problem for a *quadratic* fit. I suggest you re-run the model with added columns $x_2^2, x_3^2, x_1 x_2, x_1 x_3, x_2 x_3.$ That will give you a total of 1 + 3 + 2 + 3 = 9 terms in your expression for y. Since you already have the x-values, you can (depending on the software you use) calculate those extra columns and append them to the data set.

Note: the data set has only the two values -1 and +1 for $x_1$, so it does not distinguish between 1 and $x_1^2$; that is why $x_1^2$ is omitted.
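A sketch of this augmentation in NumPy. The x-columns are reconstructed here to match the row order of the dataset above (x1 in blocks of 24, x2 in blocks of 8, x3 in duplicated pairs for the two trials) — check the layout against your own file:

```python
import numpy as np

# Reconstruct the three input columns in the row order of the dataset:
# x1 is -1 for the first 24 rows and +1 for the last 24; within each x1
# block, x2 steps through -1, 0, 1 in blocks of 8; x3 cycles through its
# four levels in duplicated pairs (two trials per setting).
x1 = np.repeat([-1.0, 1.0], 24)
x2 = np.tile(np.repeat([-1.0, 0.0, 1.0], 8), 2)
x3 = np.tile(np.repeat([-1.0, -1/3, 1/3, 1.0], 2), 6)

# Augmented design matrix for the quadratic fit: 1 + 3 + 2 + 3 = 9 columns.
X = np.column_stack([
    np.ones_like(x1),            # intercept
    x1, x2, x3,                  # linear terms
    x2**2, x3**2,                # squares (x1^2 is identically 1, so omitted)
    x1 * x2, x1 * x3, x2 * x3,   # pairwise interactions
])
print(X.shape)
```

The same `lstsq` call as before then gives 9 coefficients instead of 4.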

Last edited: Apr 13, 2013

4. Apr 13, 2013

### BruceW

mm, it depends on what kind of behaviour we believe the underlying system actually has. A quadratic fit might give predictions no better than a linear fit (maybe even worse). Trying a quadratic fit is a good way to extend the homework, though.

(@zzmanzz) Also, when you say the predictions are 'terrible': they shouldn't be ridiculously far off. How bad are the predictions compared to the range of the y data?
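One hypothetical way to quantify 'terrible' along these lines is to compare the typical residual against the range of the observed y. The `y` and `yhat` arrays below are illustrative stand-ins, not the actual data or fit:

```python
import numpy as np

# Illustrative stand-ins for observed values and model predictions.
y = np.array([0.10, 0.12, 0.30, 0.70])
yhat = np.array([0.15, 0.14, 0.25, 0.50])

residuals = y - yhat
rmse = np.sqrt(np.mean(residuals ** 2))  # typical size of an error
spread = y.max() - y.min()               # range of the observed data

# A ratio near 0 means good predictions; a ratio near 1 means the errors
# are as large as the data's whole range.
print(rmse / spread)
```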

5. Apr 14, 2013

### Ray Vickson

The best quadratic fit cannot be worse than the best linear fit: if the best quadratic happens to have zero coefficients for the squared and product terms, it reduces to the linear fit.

Anyway, no matter what the exact form is, when the x-values are this limited one cannot tell the difference between a more general model and a quadratic. The variable $x_1$ takes only the two values +1 and -1, so any $f(x_1)$ is indistinguishable from a linear function. The variable $x_2$ takes only the three values -1, 0 and +1, so any $f(x_2)$ is indistinguishable from a quadratic. The variable $x_3$ takes 4 values, so one could go up to a cubic, but any other function $f(x_3)$ would give the same results.
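The two-point case can be written out explicitly: for any function $f$ defined on $x_1 \in \{-1, +1\}$,

$$f(x_1) = \frac{f(1) + f(-1)}{2} + \frac{f(1) - f(-1)}{2}\, x_1,$$

which is exact at both points (substitute $x_1 = \pm 1$ to check), so no nonlinear term in $x_1$ can change the fit.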

Where we have some wriggle room is in the "interaction" terms: we could include terms like $x_1 x_2^2, x_1 x_3^2, x_2 x_3^2, x_2^2 x_3, x_1 x_2 x_3,$ etc. Including them would improve the fit, but whether the coefficients would be statistically significant is another matter.

6. Apr 14, 2013

### BruceW

Yeah, I wasn't clear about what I meant. I mean that the prediction accuracy of the best quadratic fit can be worse than that of the best linear fit (which is the most meaningful way to test a model, I think). For example, if the underlying data is y = mx + c with some noise added, then the linear fit will give closer predictions, on average.

Ah, yeah, that is a good point. As long as we treat the dimensions independently, and if the data in a certain dimension only take on n distinct values, then a polynomial fit of degree n - 1 (in that dimension) will give the same predictions as a polynomial of any higher order (assuming the data values we have already seen are the only possible values; I don't know where his data is coming from, but it does look that way).

So in this case, using a higher-order polynomial fit never makes the model worse: it will always fit at least as well as a lower-order polynomial (under the conditions in my last paragraph). I guess this is a consequence of the fact that, in this situation, a discrete distribution should really be used. But the problem was to use a regression model, and AFAIK that implies a continuous model, so I guess he should not use a discrete distribution.
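The degree-(n-1) point above can be checked numerically: with only four distinct x-values (as $x_3$ has in the dataset), a cubic interpolates *any* function exactly at those points, so higher-order terms cannot change the predictions there. A small sketch, using an exponential as an arbitrary nonlinear stand-in:

```python
import numpy as np

# Four distinct x-values, as x3 has in the dataset.
x = np.array([-1.0, -1/3, 1/3, 1.0])
y = np.exp(x)  # an arbitrary nonlinear function of x

# A cubic (degree 4 - 1 = 3) through 4 points interpolates them exactly.
coeffs = np.polyfit(x, y, deg=3)
max_err = np.max(np.abs(np.polyval(coeffs, x) - y))
print(max_err)  # essentially zero (floating-point noise)
```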