MHB Optimizing Linear Regression Cost Function

Dethrone
I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.)
$$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$
where $\theta$ is an $n$ by $k$ matrix and each $x^{(i)}$ is an $n$ by 1 vector, so $\theta^Tx^{(i)}$ and $y^{(i)}$ are $k$ by 1 vectors.

$$=\sum_{i=1}^n \left((\theta^Tx^{(i)}-y^{(i)})_1^2+\cdots+(\theta^Tx^{(i)}-y^{(i)})_k^2 \right)$$

Differentiating,
$$\frac{\partial J}{\partial \theta_{pq}}=2\sum_{i=1}^n \left( (\theta^Tx^{(i)}-y^{(i)})_1\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})_1+\cdots+(\theta^Tx^{(i)}-y^{(i)})_k\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})_k\right)$$
But if we look at the first term, $(\theta^Tx^{(i)}-y^{(i)})_1$ is a $k$ by 1 vector and $\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})$ is also a $k$ by 1 vector, so we can't multiply them together (maybe unless we use tensors...). Where did I make a mistake?
 
Never mind. It turns out I misunderstood my prof's notation.
$\left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$ apparently sums the entries of the column vector $\theta^Tx^{(i)}-y^{(i)}$; it is not a sum of column vectors. I'm not sure whether this is standard notation, but it wasn't apparent to me.
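For concreteness, here is a minimal NumPy sketch of the cost under that element-sum reading. The shapes and names are my own assumptions (in particular I use m for the number of examples, since the post already uses n for the dimension of $x$):

```python
import numpy as np

def cost(Theta, X, Y):
    """J(theta) = sum_i sum_j (Theta^T x^(i) - y^(i))_j^2.

    Assumed shapes: Theta is (n, k), X is (m, n) with rows x^(i),
    Y is (m, k) with rows y^(i).
    """
    R = X @ Theta - Y      # row i of R is (Theta^T x^(i) - y^(i))^T, shape (m, k)
    return np.sum(R ** 2)  # sum of squared entries: over examples i and components j
```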

But even if I interpret it the way I did in the original post, where is the mistake I made? I'm curious.
 
Hey Rido12! ;)

When we write $x^2$, aren't we multiplying two $k$ by 1 vectors as well?
However, what is meant is $x^Tx$.
When we differentiate it, we should apply the product rule.

As for summing over $j$, we're really summing independent measurements.
They can indeed be organized as $k$ by 1 columns in a large matrix.
Still, that's separate from the 'mistake' you mentioned.
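
In symbols, a short worked step (writing $v^{(i)}=\theta^Tx^{(i)}-y^{(i)}$ as shorthand of my own):
$$\sum_{j=1}^{k}\left(v^{(i)}\right)_j^2=\left(v^{(i)}\right)^Tv^{(i)},\qquad \frac{\partial}{\partial \theta_{pq}}\left(\left(v^{(i)}\right)^Tv^{(i)}\right)=2\left(v^{(i)}\right)^T\frac{\partial v^{(i)}}{\partial \theta_{pq}},$$
which is a 1 by $k$ row times a $k$ by 1 column, hence a scalar. And since $\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})_j=x_p^{(i)}\delta_{jq}$, only the $j=q$ term survives, giving
$$\frac{\partial J}{\partial \theta_{pq}}=2\sum_{i=1}^{n}\left(\theta^Tx^{(i)}-y^{(i)}\right)_q x_p^{(i)}.$$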
 
I like Serena said:

When we write $x^2$, aren't we multiplying two $k$ by 1 vectors as well?
However, what is meant is $x^Tx$.
When we differentiate it, we should apply the product rule.

Hi I like Serena!

Thanks! Can you clarify which terms you were referring to that required the product rule?

EDIT:

Oh, are you talking about $(\theta^Tx^{(i)}-y^{(i)})_1^2=(\theta^Tx^{(i)}-y^{(i)})_1^T (\theta^Tx^{(i)}-y^{(i)})_1$, and then applying the product rule from there? Now I feel rather silly for forgetting that.
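
For what it's worth, here is a minimal sketch (shapes and names are my own assumptions) that checks the resulting gradient, $\frac{\partial J}{\partial \theta_{pq}}=2\sum_i x_p^{(i)}(\theta^Tx^{(i)}-y^{(i)})_q$, or $2X^T(X\theta-Y)$ in matrix form, against finite differences:

```python
import numpy as np

def cost(Theta, X, Y):
    # J(theta) = sum of squared entries of X Theta - Y (as in the earlier sketch)
    return np.sum((X @ Theta - Y) ** 2)

def grad(Theta, X, Y):
    # dJ/dtheta_pq = 2 sum_i x_p^(i) (Theta^T x^(i) - y^(i))_q,
    # i.e. 2 X^T (X Theta - Y) in matrix form, shape (n, k)
    return 2.0 * X.T @ (X @ Theta - Y)

def numeric_grad(Theta, X, Y, eps=1e-6):
    # central finite differences, one entry of Theta at a time
    G = np.zeros_like(Theta)
    for p in range(Theta.shape[0]):
        for q in range(Theta.shape[1]):
            E = np.zeros_like(Theta)
            E[p, q] = eps
            G[p, q] = (cost(Theta + E, X, Y) - cost(Theta - E, X, Y)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
Theta = rng.standard_normal((3, 2))  # n = 3, k = 2
X = rng.standard_normal((5, 3))      # m = 5 examples as rows
Y = rng.standard_normal((5, 2))
print(np.max(np.abs(grad(Theta, X, Y) - numeric_grad(Theta, X, Y))))  # should be tiny
```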
 