A Linear Regression with Non Linear Basis Functions

joshthekid
So I am currently learning some regression techniques for my research and have been reading a text that describes linear regression in terms of basis functions. I have linear basis functions down and know exactly how to get there, because I saw this a lot in my undergrad. Basically, in matrix notation,
$$y = w^T x$$
you then define your loss function as
$$\frac{1}{n}\sum_{i=1}^{n}\left(w^T x_i - y_i\right)^2$$
then you take the partial derivatives with respect to ##w##, set them equal to zero, and solve.
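The normal-equations recipe above can be sketched in a few lines of NumPy (made-up data, noiseless for clarity; not from the thread):

```python
import numpy as np

# Hypothetical data: n points whose targets follow a known linear rule
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))       # n = 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                      # noiseless, so the fit is exact

# Setting the gradient of the squared loss to zero gives the
# normal equations: (X^T X) w = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# With noiseless data this recovers the true weights
print(np.allclose(w_hat, true_w))
```

In practice `np.linalg.lstsq` is preferred over forming ##X^T X## explicitly, since it is numerically more stable, but the solve above mirrors the derivation directly.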

So now I want to use non-linear basis functions. Let's say I want to use ##M## Gaussian basis functions ##\phi_i##. The procedure is the same, but I am not sure exactly how to construct the model. Let's say I have ##L## features; is the model equation of the form

$$y_n = \sum_{i=1}^{M} w_i \sum_{j=1}^{L} \phi_i(x_j)$$

in other words, I have created a linear combination of ##M## new features, ##\phi(x)##, which are constructed from all ##L## of the previous features for each data point ##n##:
$$y_n = w_0 + w_1\left(\phi_1(x_1) + \phi_1(x_2) + \dots + \phi_1(x_L)\right) + \dots + w_M\left(\phi_M(x_1) + \phi_M(x_2) + \dots + \phi_M(x_L)\right)$$

where the ##x_i## are features / variables of my model and not data values. I hope this makes sense. Thanks in advance.
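The feature construction described above can be sketched as follows (a minimal example with made-up data, assuming Gaussian ##\phi_i## with hypothetical centers; not from the thread):

```python
import numpy as np

# Hypothetical setup: n data points with L raw features each,
# mapped to M new features, where new feature i sums phi_i over all L inputs
rng = np.random.default_rng(2)
n, L, M = 5, 4, 3
X_raw = rng.normal(size=(n, L))
centers = np.linspace(-1.0, 1.0, M)   # one Gaussian center per new feature

def phi(x, c, width=1.0):
    """Gaussian basis function centered at c."""
    return np.exp(-(x - c) ** 2 / (2 * width ** 2))

# New feature i for data point j is the sum of phi_i over the L raw features
Phi = np.array([[phi(X_raw[j], centers[i]).sum() for i in range(M)]
                for j in range(n)])
print(Phi.shape)  # (5, 3): M new features per data point
```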
 
The parameters you wish to estimate are the ##w_i## and the values ##(x_1,...,x_L)## are known for each data point?
 
micromass said:
The parameters you wish to estimate are the ##w_i## and the values ##(x_1,...,x_L)## are known for each data point?

That is correct.
 
Then you have a standard linear regression. Linear refers to the coefficients and not the functions used. Thus your loss function is again

$$L = \sum_{i=1}^n \left(y_i - w_0 - w_1\sum_k \phi_1(x_k) - w_2\sum_k \phi_2(x_k) - \dots - w_N \sum_k \phi_N(x_k)\right)^2$$

and you minimize this by taking partial derivatives and setting them equal to ##0##. In matrix notation, you let ##Y## be the column matrix with entries the ##y_i## and you let ##X## be the design matrix whose ##i##th row is
$$\left(1~~\sum_k \phi_1(x_k)~~ \dots~~\sum_k \phi_N(x_k)\right)$$
The coefficients are then ##W = (X^TX)^{-1} X^T Y##.
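This recipe can be sketched concretely (not from the thread; a minimal 1-D example with made-up data, where each point has a single feature so the inner sum over ##k## has one term):

```python
import numpy as np

def gaussian_basis(x, centers, width=1.0):
    """Evaluate one Gaussian basis function per center at each input in x."""
    # x: (n,) scalar inputs; centers: (M,) Gaussian means; returns (n, M)
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

# Toy data: fit y = sin(x) on [0, 2*pi] with M = 8 Gaussians plus a bias
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
centers = np.linspace(0, 2 * np.pi, 8)

# Design matrix: i-th row is (1, phi_1(x_i), ..., phi_M(x_i))
Phi = gaussian_basis(x, centers)
Phi = np.column_stack([np.ones(len(x)), Phi])

# Normal equations: W = (Phi^T Phi)^{-1} Phi^T Y, via a linear solve
W = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_hat = Phi @ W
print(np.max(np.abs(y_hat - y)))  # residual on the training points
```

Note the regression is still linear in the weights ##W## even though each column of the design matrix is a non-linear function of the inputs, which is exactly the point made above.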
 
micromass said:
Then you have a standard linear regression. Linear refers to the coefficients and not the functions used. Thus your loss function is again

$$L = \sum_{i=1}^n \left(y_i - w_0 - w_1\sum_k \phi_1(x_k) - w_2\sum_k \phi_2(x_k) - \dots - w_N \sum_k \phi_N(x_k)\right)^2$$

and you minimize this by taking partial derivatives and setting them equal to ##0##. In matrix notation, you let ##Y## be the column matrix with entries the ##y_i## and you let ##X## be the design matrix whose ##i##th row is
$$\left(1~~\sum_k \phi_1(x_k)~~ \dots~~\sum_k \phi_N(x_k)\right)$$
The coefficients are then ##W = (X^TX)^{-1} X^T Y##.
Great, thanks. This is what I thought it meant, but the way you wrote it makes it a lot clearer than the text I am using, which has all formulas in matrix notation, making it hard to tell whether they are talking about a single random variable or a vector of random variables.
 