A Linear Regression with Non Linear Basis Functions

joshthekid
So I am currently learning some regression techniques for my research and have been reading a text that describes linear regression in terms of basis functions. I have linear basis functions down and know exactly how to get there, because I saw this a lot in my undergrad. Basically, in matrix notation,
$$y = w^T x$$
you then define your loss function as
$$\frac{1}{n}\sum_{i=1}^{n}\left(w^T x_i - y_i\right)^2$$
then you take the partial derivatives with respect to ##w##, set them equal to zero, and solve.
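The normal-equations recipe above can be sketched in a few lines of NumPy (made-up data, noiseless for clarity; not from the thread):

```python
import numpy as np

# Hypothetical data: n points whose targets follow a known linear rule
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))       # n = 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                      # noiseless, so the fit is exact

# Setting the gradient of the squared loss to zero gives the
# normal equations: (X^T X) w = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# With noiseless data this recovers the true weights
print(np.allclose(w_hat, true_w))
```

In practice `np.linalg.lstsq` is preferred over forming ##X^T X## explicitly, since it is numerically more stable, but the solve above mirrors the derivation directly.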

So now I want to use non-linear basis functions. Let's say I want to use ##M## Gaussian basis functions ##\phi_i##. The procedure is the same, but I am not sure exactly how to construct the model. Let's say I have ##L## features; is the model equation of the form

$$y_n = \sum_{i=1}^{M} w_i \sum_{j=1}^{L} \phi_i(x_j)$$

in other words, I have created a linear combination of ##M## new features, ##\phi(x)##, which are constructed from all ##L## of the previous features for each data point ##n##:
$$y_n = w_0 + w_1\left(\phi_1(x_1) + \phi_1(x_2) + \dots + \phi_1(x_L)\right) + \dots + w_M\left(\phi_M(x_1) + \phi_M(x_2) + \dots + \phi_M(x_L)\right)$$

where the ##x_i## are features / variables of my model and not data values. I hope this makes sense. Thanks in advance.
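The feature construction described above can be sketched as follows (a minimal example with made-up data, assuming Gaussian ##\phi_i## with hypothetical centers; not from the thread):

```python
import numpy as np

# Hypothetical setup: n data points with L raw features each,
# mapped to M new features, where new feature i sums phi_i over all L inputs
rng = np.random.default_rng(2)
n, L, M = 5, 4, 3
X_raw = rng.normal(size=(n, L))
centers = np.linspace(-1.0, 1.0, M)   # one Gaussian center per new feature

def phi(x, c, width=1.0):
    """Gaussian basis function centered at c."""
    return np.exp(-(x - c) ** 2 / (2 * width ** 2))

# New feature i for data point j is the sum of phi_i over the L raw features
Phi = np.array([[phi(X_raw[j], centers[i]).sum() for i in range(M)]
                for j in range(n)])
print(Phi.shape)  # (5, 3): M new features per data point
```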
 
The parameters you wish to estimate are the ##w_i## and the values ##(x_1,...,x_L)## are known for each data point?
 
micromass said:
The parameters you wish to estimate are the ##w_i## and the values ##(x_1,...,x_L)## are known for each data point?

That is correct.
 
Then you have a standard linear regression. Linear refers to the coefficients and not the functions used. Thus your loss function is again

$$L = \sum_{i=1}^n \left(y_i - w_0 - w_1\sum_k \phi_1(x_k) - w_2\sum_k \phi_2(x_k) - \dots - w_N \sum_k \phi_N(x_k)\right)^2$$

and you minimize this by taking partial derivatives and setting them equal to ##0##. In matrix notation, you let ##Y## be the column matrix with entries the ##y_i## and you let ##X## be the design matrix whose ##i##th row is
$$\left(1~~\sum_k \phi_1(x_k)~~ \dots~~\sum_k \phi_N(x_k)\right)$$
The coefficients are then ##W = (X^TX)^{-1} X^T Y##.
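This recipe can be sketched concretely (not from the thread; a minimal 1-D example with made-up data, where each point has a single feature so the inner sum over ##k## has one term):

```python
import numpy as np

def gaussian_basis(x, centers, width=1.0):
    """Evaluate one Gaussian basis function per center at each input in x."""
    # x: (n,) scalar inputs; centers: (M,) Gaussian means; returns (n, M)
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

# Toy data: fit y = sin(x) on [0, 2*pi] with M = 8 Gaussians plus a bias
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
centers = np.linspace(0, 2 * np.pi, 8)

# Design matrix: i-th row is (1, phi_1(x_i), ..., phi_M(x_i))
Phi = gaussian_basis(x, centers)
Phi = np.column_stack([np.ones(len(x)), Phi])

# Normal equations: W = (Phi^T Phi)^{-1} Phi^T Y, via a linear solve
W = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_hat = Phi @ W
print(np.max(np.abs(y_hat - y)))  # residual on the training points
```

Note the regression is still linear in the weights ##W## even though each column of the design matrix is a non-linear function of the inputs, which is exactly the point made above.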
 
micromass said:
Then you have a standard linear regression. Linear refers to the coefficients and not the functions used. Thus your loss function is again

$$L = \sum_{i=1}^n \left(y_i - w_0 - w_1\sum_k \phi_1(x_k) - w_2\sum_k \phi_2(x_k) - \dots - w_N \sum_k \phi_N(x_k)\right)^2$$

and you minimize this by taking partial derivatives and setting them equal to ##0##. In matrix notation, you let ##Y## be the column matrix with entries the ##y_i## and you let ##X## be the design matrix whose ##i##th row is
$$\left(1~~\sum_k \phi_1(x_k)~~ \dots~~\sum_k \phi_N(x_k)\right)$$
The coefficients are then ##W = (X^TX)^{-1} X^T Y##.
Great, thanks. This is what I thought it meant, but the way you wrote it makes it a lot clearer than the text I am using, which has all formulas in matrix notation, making it hard to tell whether they are talking about a single random variable or a vector of random variables.
 