Minimization and least squares/ridge regression

In summary, minimization and least squares/ridge regression are used to find the best-fitting line or curve for a given set of data points by minimizing the sum of squared errors. The difference between least squares and ridge regression is that ridge regression adds a penalty term to reduce the effects of multicollinearity; the size of the penalty is typically chosen by cross-validation. The main assumptions of least squares/ridge regression are linearity, normality, and equal variance of the errors. Ridge regression shrinks the coefficients of unimportant variables, but because it does not set them exactly to zero it does not perform feature selection in the strict sense; it tends to work well when there are many predictors, each with a small effect.
  • #1
monkaez

Homework Statement



f(x; a) = a_0 + (a_1, a_2, a_3, ..., a_d) · x

min_a (Xa − Y)^T Σ^(-1) (Xa − Y), where Σ is the covariance matrix of the errors

a = (a_0, a_1, a_2, a_3, a_4, ..., a_d)^T

Homework Equations



Y = (y_1, y_2, y_3, ..., y_k)^T

X = design matrix

The Attempt at a Solution



To minimize, write

(X(a + δa) − Y)^T Σ^(-1) (X(a + δa) − Y)

= (Xa − Y)^T Σ^(-1) (Xa − Y) + (δa)^T X^T Σ^(-1) (Xa − Y) + (Xa − Y)^T Σ^(-1) X (δa) + O((δa)^T (δa))

= (Xa − Y)^T Σ^(-1) (Xa − Y) + 2 (δa)^T X^T Σ^(-1) (Xa − Y) + O((δa)^T (δa))

a = (X^T Σ^(-1) X)^(-1) X^T Σ^(-1) Y

This is directly out of my professor's notes, and I have no clue how it proves that the resulting a is always the minimum.

The solution for a above reduces to the standard least squares solution when Σ = σ² I:

a = (X^T X)^(-1) X^T Y
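For concreteness, here is a small numerical sketch of the two formulas (plain numpy, with made-up data, not from the notes):

```python
import numpy as np

# Made-up example data (not from the notes): k = 50 points, d = 2 features.
rng = np.random.default_rng(0)
k, d = 50, 2
X = np.column_stack([np.ones(k), rng.normal(size=(k, d))])   # design matrix with intercept column
a_true = np.array([1.0, 2.0, -0.5])
sigma2 = np.full(k, 0.3)                                      # per-point error variances
Y = X @ a_true + rng.normal(scale=np.sqrt(sigma2))

Sigma_inv = np.diag(1.0 / sigma2)                             # Sigma^(-1), diagonal here

# Weighted solution: a = (X^T Sigma^(-1) X)^(-1) X^T Sigma^(-1) Y
a_weighted = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

# Ordinary least squares: a = (X^T X)^(-1) X^T Y
a_ols = np.linalg.solve(X.T @ X, X.T @ Y)

print(a_weighted)
print(a_ols)                                  # identical here, since Sigma = sigma^2 * I
print(np.linalg.lstsq(X, Y, rcond=None)[0])   # numpy's least squares agrees with the closed form
```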

I guess my main issue is understanding how the minimization process works and how he drops the terms in the O notation. Any input is greatly appreciated. (I tried using LaTeX, but the code doesn't render for me; I need to read more about it.)
 
  • #2


To understand why this solution is the minimum, we need to look at the properties of the least squares problem. The goal of the least squares problem is to find the values of the parameters (a) that minimize the sum of the squared errors between the predicted values (Xa) and the actual values (Y). In other words, we want to find the values of a that make the predicted values as close to the actual values as possible.

In the attempt at a solution, the error term (Xa − Y) appears on both sides of the inverse covariance matrix Σ^(-1). This quadratic form is the weighted squared error: weighting by Σ^(-1) gives more influence to data points whose errors have smaller variances, so the fit is not dominated by the noisiest points or by outliers.

This weighted quadratic form plays the role of the sum of squared errors between the predicted and actual values; when Σ = σ² I it reduces exactly to the ordinary sum of squared errors. This is the objective function we want to minimize.

Now, let's look at the terms in the O notation. In general, O((δa)^T (δa)) stands for terms of second order in δa, which are negligible compared to the first-order terms as δa becomes small. Here you can say something stronger: because the objective is quadratic in a, the expansion is exact, and the "higher-order" term is exactly (X δa)^T Σ^(-1) (X δa), which is never negative since Σ^(-1) is positive definite. So once the first-order term is made to vanish, no perturbation δa can decrease the objective.

Finally, the solution a = (X^T Σ^(-1) X)^(-1) X^T Σ^(-1) Y comes from setting the first-order term to zero, i.e. requiring X^T Σ^(-1) (Xa − Y) = 0. These are the (weighted) normal equations, the condition that the gradient of the objective vanishes. A vanishing gradient is only a necessary condition in general, but here the objective is a convex quadratic (its Hessian, 2 X^T Σ^(-1) X, is positive semidefinite), so this stationary point is in fact a global minimum.

In summary, the solution for a makes the gradient of the weighted sum of squared errors vanish, and because the remaining quadratic term can never be negative, every other choice of a gives an objective value at least as large. That is why it is the minimum.
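If it helps to see this numerically, here is a small sketch (random, made-up data) showing that perturbing the closed-form solution never decreases the weighted objective:

```python
import numpy as np

rng = np.random.default_rng(1)
k, p = 40, 3
X = np.column_stack([np.ones(k), rng.normal(size=(k, p))])
Sigma_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, size=k))   # inverse of a diagonal, positive-definite covariance
Y = X @ np.array([0.5, 1.0, -1.0, 2.0]) + rng.normal(size=k)

def objective(a):
    r = X @ a - Y
    return r @ Sigma_inv @ r                 # (Xa - Y)^T Sigma^(-1) (Xa - Y)

a_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

# The linear term vanishes at a_hat, and the leftover term (X delta)^T Sigma^(-1) (X delta)
# is never negative, so every perturbation makes the objective at least as large.
for _ in range(5):
    delta = rng.normal(scale=0.1, size=p + 1)
    print(objective(a_hat + delta) >= objective(a_hat))    # prints True every time
```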
 

1. What is the purpose of minimization and least squares/ridge regression?

The purpose of minimization and least squares/ridge regression is to find the best fitting line or curve for a given set of data points. This is done by minimizing the sum of squared errors between the predicted values and the actual values.

2. What is the difference between least squares and ridge regression?

Least squares regression aims to minimize the sum of squared errors, while ridge regression also includes a penalty term to reduce the effects of multicollinearity in the data. This means that ridge regression often provides more stable and accurate predictions for highly correlated variables.
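As a rough sketch of that difference (plain numpy, with a made-up penalty value λ = 1), ridge only changes the closed-form solution by adding λI before inverting:

```python
import numpy as np

def ols(X, Y):
    # Ordinary least squares: (X^T X)^(-1) X^T Y
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ridge(X, Y, lam):
    # Ridge regression: (X^T X + lam * I)^(-1) X^T Y; the penalty shrinks the coefficients
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Two nearly collinear predictors: OLS splits the effect between them erratically,
# while ridge gives stable coefficients of similar size.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100)])
Y = x1 + rng.normal(scale=0.1, size=100)
print(ols(X, Y))
print(ridge(X, Y, 1.0))
```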

3. How is the optimal value for the penalty term determined in ridge regression?

The optimal value for the penalty term in ridge regression is usually determined through a process called cross-validation. This involves splitting the data into training and testing sets and trying different values for the penalty term until the best fit is achieved.
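A minimal sketch of that idea in plain numpy (the grid of candidate penalties here is an assumption; library routines such as scikit-learn's RidgeCV automate the same search):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
Y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=60)

def ridge_fit(X_tr, Y_tr, lam):
    p = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ Y_tr)

def cv_error(lam, n_folds=5):
    # Average squared prediction error on the held-out fold, for one penalty value.
    folds = np.array_split(np.arange(len(Y)), n_folds)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(Y)), test)
        a = ridge_fit(X[train], Y[train], lam)
        errs.append(np.mean((X[test] @ a - Y[test]) ** 2))
    return np.mean(errs)

lams = np.logspace(-3, 3, 13)          # assumed grid of candidate penalties
best_lam = min(lams, key=cv_error)
print(best_lam)
```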

4. What are the assumptions of least squares/ridge regression?

The main assumptions of least squares/ridge regression are that the relationship between the independent and dependent variables is linear, the errors are normally distributed, and the errors have equal variance. These assumptions should be checked before using these techniques and if they are violated, alternative methods may be more appropriate.

5. Can ridge regression be used for feature selection?

Ridge regression shrinks the coefficients of variables that contribute little to the prediction toward zero, which reduces overfitting and can improve the model's performance. However, unlike the lasso, it does not set coefficients exactly to zero, so it does not perform feature selection in the strict sense. Ridge regression actually tends to do well when there are many predictors that each have a small effect; if a sparse model is needed, a selection method such as the lasso is more appropriate.
