Derivation of least squares containing vectors

1. Feb 11, 2012

nobben

Hi,

I'm trying to derive a result from this paper:

http://www.research.yahoo.net/files/HuKorenVolinsky-ICDM08.pdf

given cost function

$$\min_{x_* , y_* } \sum_{u, i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( \sum_u ||x_u||^2 + \sum_i ||y_||^2 \right)$$

Where both xu and yi are vectors in ℝk.

I want to find the minimum by using alternating least squares. Therefore I fix y and find the derivative with respect to xu.

cui, pui and λ are constants.

They derive the following:

$$x_u = \left( Y^TC^uY + \lambda I \right) ^{-1} Y^T C^u p \left( u \right)$$

I'm unsure how reproduce this result...

I don't know if I should try to derive with respect to the whole vector xu or against one entry k (xuk) in the vector and then try to map this to a function for the whole vector?

Any pointers are very much appreciated.

2. Feb 12, 2012

nobben

I think I solved it. Including solution below if anyone is interested!

First we define a variable S_u below which is the cost function from Equation \eqref{eq:implicit-cost-function} but where we have fixed u so that we are calculating the predicted errors for one specific user.

\label{eq:cost-u}
S_u = \sum_{i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( || x_u ||^2 + \sum_i || y_i||^2 \right)

We now find the derivative of S_u with respect to x_u.
To help clarify the process we introduce a variable \epsilon_{i} below which is the prediction-error of item i with respect to user u.

\label{eq:pred-error}
\epsilon_{i} = p_{i} - \sum_{j=1}^{n} x_{uj} y_{ij}

We also calculate the derivative of \epsilon_{i} with respect to the j entries in the user factor x_uj.

\label{eq:error-deriv}
\frac{d \epsilon_{i}}{d x_{uj}} = - y_{ij}

We now rewrite using the \epsilon_i variable.
There are m users, n items and f is the length of the factor vectors.

S_u = \sum_{i=1}^{n} \left( c_i \epsilon_i^2 + \lambda \sum_{k=1}^f y_{ik} \right) + \lambda \sum_{p=1}^f x_{up}^2

Then we find the derivative.

\forall[1 \leq j \leq f] : \frac{d S_u}{x_{uj}} = 2 \sum_{i=1}^n \left( c_i \epsilon_i \frac{d \epsilon_i}{d x_{uj}} \right) + 2 \lambda x_{uj}

Replace \epsilon_i and $$\frac{d \epsilon_i}{d x_{uj}}$$.

\forall[1 \leq j \leq f] : \frac{d S_u}{x_{uj}} = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{n} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}

Set the derivative to 0.

\forall[1 \leq j \leq f] : 0 = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{n} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}

Rearrange.

\forall[1 \leq j \leq f] : \sum_{i=1}^n \sum_{k=1}^{n} y_{ij} c_i y_{ik} x_{uk} + 2 \lambda x_{uj} = \sum_{i=1}^n c_i p_{i} y_{ij}

Rewrite using matrix notation.
\begin{eqnarray}
x_u (Y^TC^uY + \lambda I) & = & Y^T C^u p(u) \\
\label{eq:xu-diff}
x_u & = & (Y^TC^uY + \lambda I)^{-1}Y^T C^u p(u)
\end{eqnarray}
Y is the $n \times f$ matrix containing the user factors.
$C^u$ is the diagonal matrix containing the confidence values where $C^u_{ii} = c_{ui}$.
The vector $p(u)$ contains all the preference values $p_{ui}$ by user $u$.