Optimizing Linear Regression Cost Function

In summary, the thread discusses optimizing a machine-learning cost function of the form $J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$, where $\theta$ is an $n$ by $k$ matrix and each $x^{(i)}$ is an $n$ by 1 vector. The steps for differentiating the function are discussed, including the use of the product rule. There was initially confusion about the notation used for the inner summation, but it was later clarified that it sums the squared elements of a column vector rather than denoting a sum of column vectors.
  • #1
Dethrone
I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.)
$$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$
where $\theta$ is an $n$ by $k$ matrix and each $x^{(i)}$ is an $n$ by 1 vector.

$$=\sum_{i=1}^n \left((\theta^Tx^{(i)}-y^{(i)})_1^2+...+(\theta^Tx^{(i)}-y^{(i)})_k^2 \right)$$

Differentiating,
$$\frac{\partial J}{\partial\theta_{pq}}=2\sum_{i=1}^n \left( (\theta^Tx^{(i)}-y^{(i)})_1\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})_1+...+(\theta^Tx^{(i)}-y^{(i)})_k\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})_k\right)$$
But, if we look at the first term, $(\theta^Tx^{(i)}-y^{(i)})_1$ is a $k$ by 1 vector and $\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})$ is also a $k$ by 1 vector, so we can't multiply them together... (maybe unless we use tensors...). Where did I make a mistake?
 
  • #2
Never mind. It turns out I misunderstood my prof's notation.
$\left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$ is apparently summing up the elements of the column vector, not a sum of column vectors. Not sure if this is standard notation, but it wasn't apparent to me.
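In other words, for any column vector $v$ the inner sum is just its squared Euclidean norm,
$$\sum_{j=1}^{k} v_j^2 = v^Tv = \|v\|^2,$$
so each term of the outer sum is the scalar $\|\theta^Tx^{(i)}-y^{(i)}\|^2$.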

But, even if I interpret it the way I did in the original post, where is the mistake I made? I'm curious.
 
  • #3
Hey Rido12! ;)

When we write $x^2$, aren't we multiplying two $k$ by 1 vectors as well?
However, what is meant is $x^Tx$.
When we differentiate, we should apply the product rule to it.

As for summing over $j$, we're really summing independent measurements.
They can indeed be organized as $k$ by 1 columns in a large matrix.
Still, that's separate from the 'mistake' you mentioned.
 
  • #4
I like Serena said:
Hey Rido12! ;)

When we write $x^2$, aren't we multiplying two $k$ by 1 vectors as well?
However, what is meant is $x^Tx$.
When we differentiate, we should apply the product rule to it.

As for summing over $j$, we're really summing independent measurements.
They can indeed be organized as $k$ by 1 columns in a large matrix.
Still, that's separate from the 'mistake' you mentioned.

Hi I like Serena!

Thanks! Can you clarify which terms you were referring to that required the product rule?

EDIT:

Oh, are you talking about $(\theta^Tx^{(i)}-y^{(i)})_1^2=(\theta^Tx^{(i)}-y^{(i)})_1^T (\theta^Tx^{(i)}-y^{(i)})_1$, and then applying the product rule from there? Now I feel rather silly for forgetting that.
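For the record, carrying the product rule through under the clarified notation (where each $(\theta^Tx^{(i)}-y^{(i)})_j$ is a scalar): since $(\theta^Tx^{(i)})_j=\sum_{p}\theta_{pj}\,x_p^{(i)}$, we get $\frac{\partial}{\partial\theta_{pq}}(\theta^Tx^{(i)})_j = x_p^{(i)}$ when $j=q$ and $0$ otherwise, so only the $j=q$ term survives:
$$\frac{\partial J}{\partial\theta_{pq}}=2\sum_{i=1}^n (\theta^Tx^{(i)}-y^{(i)})_q\, x_p^{(i)}.$$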
 

1. What is a cost function in linear regression?

The cost function in linear regression is a mathematical representation of the error between the predicted values and the actual values of a dataset. It measures how well the line of best fit fits the data points and is typically represented as the sum of squared errors.
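
As a minimal sketch of this definition (assuming a single-feature model $\hat y = \theta_0 + \theta_1 x$ and NumPy arrays of equal length; the data and names here are illustrative):

```python
import numpy as np

def sse_cost(theta0, theta1, x, y):
    """Sum of squared errors for the line y_hat = theta0 + theta1 * x."""
    predictions = theta0 + theta1 * x        # predicted value for each data point
    return np.sum((predictions - y) ** 2)    # sum of squared errors

# Illustrative data and parameters
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(sse_cost(0.0, 2.0, x, y))
```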

2. Why is it important to optimize the cost function in linear regression?

Optimizing the cost function in linear regression is important because it allows us to find the line of best fit that minimizes the error between the predicted values and the actual values. This results in a more accurate and reliable model for making predictions.

3. How do you optimize the cost function in linear regression?

The most common method for optimizing the cost function in linear regression is gradient descent, which iteratively adjusts the parameters of the regression line until the cost function is minimized. Other methods include closed-form solutions such as the normal equation.
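
For illustration, here is a minimal gradient-descent sketch for the single-feature case, with the closed-form fit (via `np.polyfit`) shown for comparison; the learning rate, iteration count, and data are arbitrary choices, not recommendations:

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit y ~ theta0 + theta1 * x by minimizing the squared error."""
    theta0, theta1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        residuals = theta0 + theta1 * x - y               # prediction error per point
        theta0 -= lr * 2.0 * np.sum(residuals) / n        # step along -d(cost)/d(theta0), scaled by 1/n
        theta1 -= lr * 2.0 * np.sum(residuals * x) / n    # step along -d(cost)/d(theta1), scaled by 1/n
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(gradient_descent(x, y))

# Closed-form comparison: np.polyfit returns [slope, intercept] for degree 1
theta1_cf, theta0_cf = np.polyfit(x, y, 1)
print(theta0_cf, theta1_cf)
```

Dividing the gradient by the number of points only rescales the step size; the minimizer is the same as for the plain sum of squared errors.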

4. What are the drawbacks of optimizing the cost function in linear regression?

One drawback of optimizing the cost function in linear regression is that it can be computationally expensive, especially for large datasets. Additionally, if the data is not well-suited for linear regression, optimizing the cost function may not result in an accurate model.

5. Can the cost function be used to evaluate the performance of a linear regression model?

Yes, the cost function can be used to evaluate the performance of a linear regression model. A lower cost function value indicates a better fit of the regression line to the data, while a higher value indicates a poorer fit. However, it is important to also consider other metrics, such as R-squared, to fully evaluate the performance of a linear regression model.
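
As an illustration (the data here is made up), both the cost-function value and R-squared can be computed directly from the residuals:

```python
import numpy as np

def evaluate_fit(y_true, y_pred):
    """Return (sum of squared errors, R^2) for a set of predictions."""
    residuals = y_true - y_pred
    sse = np.sum(residuals ** 2)                      # the cost-function value
    sst = np.sum((y_true - np.mean(y_true)) ** 2)     # total variation around the mean
    return sse, 1.0 - sse / sst                       # R^2 = 1 - SSE/SST

y_true = np.array([2.1, 3.9, 6.2, 8.1])
y_pred = np.array([2.0, 4.0, 6.0, 8.0])
print(evaluate_fit(y_true, y_pred))
```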
