Optimizing Linear Regression Cost Function

In summary, the thread discusses optimizing a machine-learning cost function of the form $J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$, where $\theta$ is an $n$ by $k$ matrix and each $x^{(i)}$ is an $n$ by 1 vector. The steps for differentiating the function are discussed, including the use of the product rule. There was initially confusion about the notation used for the inner summation, but it was later clarified that it sums the squared elements of a column vector rather than denoting a sum of column vectors.
  • #1
Dethrone
I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.)
$$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$
where $\theta$ is an $n$ by $k$ matrix and each $x^{(i)}$ is an $n$ by 1 vector.

$$=\sum_{i=1}^n \left((\theta^Tx^{(i)}-y^{(i)})_1^2+...+(\theta^Tx^{(i)}-y^{(i)})_k^2 \right)$$

Differentiating,
$$\frac{\partial J}{\partial\theta_{pq}}=2\sum_{i=1}^n \left( (\theta^Tx^{(i)}-y^{(i)})_1\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})_1+...+(\theta^Tx^{(i)}-y^{(i)})_k\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})_k\right)$$
But, if we look at the first term, $(\theta^Tx^{(i)}-y^{(i)})_1$ is a $k$ by 1 vector and $\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})$ is also a $k$ by 1 vector, so we can't multiply them together... (maybe unless we use tensors...). Where did I make a mistake?
 
  • #2
Never mind. It turns out I misunderstood my prof's notation.
$\left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$ is apparently summing up the elements of the column vector, not a sum of column vectors. Not sure if this is standard notation, but it wasn't apparent to me.
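In other words, for any column vector $v$ the inner sum is just its squared Euclidean norm,
$$\sum_{j=1}^{k} v_j^2 = v^Tv = \|v\|^2,$$
so each term of the outer sum is the scalar $\|\theta^Tx^{(i)}-y^{(i)}\|^2$.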

But, even if I interpret it the way I did in the original post, where is the mistake I made? I'm curious.
 
  • #3
Hey Rido12! ;)

When we write $x^2$, aren't we multiplying two $k$ by 1 vectors as well?
However, what is meant is $x^Tx$.
When we differentiate, we should apply the product rule to it.

As for summing over $j$, we're really summing independent measurements.
They can indeed be organized as $k$ by 1 columns in a large matrix.
Still, that's separate from the 'mistake' you mentioned.
 
  • #4
I like Serena said:
Hey Rido12! ;)

When we write $x^2$, aren't we multiplying two $k$ by 1 vectors as well?
However, what is meant is $x^Tx$.
When we differentiate, we should apply the product rule to it.

As for summing over $j$, we're really summing independent measurements.
They can indeed be organized as $k$ by 1 columns in a large matrix.
Still, that's separate from the 'mistake' you mentioned.

Hi I like Serena!

Thanks! Can you clarify which terms you were referring to that required the product rule?

EDIT:

Oh, are you talking about $(\theta^Tx^{(i)}-y^{(i)})_1^2=(\theta^Tx^{(i)}-y^{(i)})_1^T (\theta^Tx^{(i)}-y^{(i)})_1$, and then applying the product rule from there? Now I feel rather silly for forgetting that.
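For the record, carrying the product rule through under the clarified notation (where each $(\theta^Tx^{(i)}-y^{(i)})_j$ is a scalar): since $(\theta^Tx^{(i)})_j=\sum_{p}\theta_{pj}\,x_p^{(i)}$, we get $\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})_j = x_p^{(i)}$ when $j=q$ and $0$ otherwise, so only the $j=q$ term survives:
$$\frac{\partial J}{\partial\theta_{pq}}=2\sum_{i=1}^n (\theta^Tx^{(i)}-y^{(i)})_q\, x_p^{(i)}.$$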
 

1. What is a cost function in linear regression?

The cost function in linear regression is a mathematical representation of the error between the predicted values and the actual values of a dataset. It measures how well the line of best fit fits the data points and is typically represented as the sum of squared errors.
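
As a minimal sketch of this definition (assuming a single-feature model $\hat y = \theta_0 + \theta_1 x$ and NumPy arrays of equal length; the data and names here are illustrative):

```python
import numpy as np

def sse_cost(theta0, theta1, x, y):
    """Sum of squared errors for the line y_hat = theta0 + theta1 * x."""
    predictions = theta0 + theta1 * x        # predicted value for each data point
    return np.sum((predictions - y) ** 2)    # sum of squared errors

# Illustrative data and parameters
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(sse_cost(0.0, 2.0, x, y))
```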

2. Why is it important to optimize the cost function in linear regression?

Optimizing the cost function in linear regression is important because it allows us to find the line of best fit that minimizes the error between the predicted values and the actual values. This results in a more accurate and reliable model for making predictions.

3. How do you optimize the cost function in linear regression?

The most common method for optimizing the cost function in linear regression is gradient descent, which iteratively adjusts the parameters of the regression line until the cost function is minimized. Other methods include closed-form solutions such as the normal equation.
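
For illustration, here is a minimal gradient-descent sketch for the single-feature case, with the closed-form fit (via `np.polyfit`) shown for comparison; the learning rate, iteration count, and data are arbitrary choices, not recommendations:

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit y ~ theta0 + theta1 * x by minimizing the squared error."""
    theta0, theta1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        residuals = theta0 + theta1 * x - y               # prediction error per point
        theta0 -= lr * 2.0 * np.sum(residuals) / n        # step along -d(cost)/d(theta0), scaled by 1/n
        theta1 -= lr * 2.0 * np.sum(residuals * x) / n    # step along -d(cost)/d(theta1), scaled by 1/n
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(gradient_descent(x, y))

# Closed-form comparison: np.polyfit returns [slope, intercept] for degree 1
theta1_cf, theta0_cf = np.polyfit(x, y, 1)
print(theta0_cf, theta1_cf)
```

Dividing the gradient by the number of points only rescales the step size; the minimizer is the same as for the plain sum of squared errors.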

4. What are the drawbacks of optimizing the cost function in linear regression?

One drawback of optimizing the cost function in linear regression is that it can be computationally expensive, especially for large datasets. Additionally, if the data is not well-suited for linear regression, optimizing the cost function may not result in an accurate model.

5. Can the cost function be used to evaluate the performance of a linear regression model?

Yes, the cost function can be used to evaluate the performance of a linear regression model. A lower cost function value indicates a better fit of the regression line to the data, while a higher value indicates a poorer fit. However, it is important to also consider other metrics, such as R-squared, to fully evaluate the performance of a linear regression model.
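
As an illustration (the data here is made up), both the cost-function value and R-squared can be computed directly from the residuals:

```python
import numpy as np

def evaluate_fit(y_true, y_pred):
    """Return (sum of squared errors, R^2) for a set of predictions."""
    residuals = y_true - y_pred
    sse = np.sum(residuals ** 2)                      # the cost-function value
    sst = np.sum((y_true - np.mean(y_true)) ** 2)     # total variation around the mean
    return sse, 1.0 - sse / sst                       # R^2 = 1 - SSE/SST

y_true = np.array([2.1, 3.9, 6.2, 8.1])
y_pred = np.array([2.0, 4.0, 6.0, 8.0])
print(evaluate_fit(y_true, y_pred))
```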
