Derivation of least squares containing vectors

nobben
Hi,

I'm trying to derive a result from this paper:

http://www.research.yahoo.net/files/HuKorenVolinsky-ICDM08.pdf

given the cost function

\min_{x_* , y_* } \sum_{u, i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 \right)

where both x_u and y_i are vectors in ℝ^k.

I want to find the minimum by using alternating least squares. Therefore I fix y and find the derivative with respect to x_u.

c_{ui}, p_{ui} and λ are constants.

They derive the following:

x_u = \left( Y^TC^uY + \lambda I \right) ^{-1} Y^T C^u p \left( u \right)

I'm unsure how to reproduce this result.

I don't know whether I should differentiate with respect to the whole vector x_u, or with respect to a single entry x_{uk} and then try to map the result back to an expression for the whole vector.

Any pointers are very much appreciated.
 
I think I solved it. Including solution below if anyone is interested!

First we define a quantity S_u below, which is the cost function above but with u fixed, so that we are calculating the prediction errors for one specific user.

\begin{equation}
\label{eq:cost-u}
S_u = \sum_{i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( || x_u ||^2 + \sum_i || y_i||^2 \right)
\end{equation}

We now find the derivative of S_u with respect to x_u.
To help clarify the process we introduce a variable \epsilon_{i} below, which is the prediction error for item i with respect to user u.
\begin{equation}
\label{eq:pred-error}
\epsilon_{i} = p_{i} - \sum_{j=1}^{f} x_{uj} y_{ij}
\end{equation}
We also calculate the derivative of \epsilon_{i} with respect to the j-th entry x_{uj} of the user factor.
\begin{equation}
\label{eq:error-deriv}
\frac{d \epsilon_{i}}{d x_{uj}} = - y_{ij}
\end{equation}
We now rewrite using the \epsilon_i variable.
There are m users and n items, and f is the length of the factor vectors.
\begin{equation}
S_u = \sum_{i=1}^{n} \left( c_i \epsilon_i^2 + \lambda \sum_{k=1}^f y_{ik}^2 \right) + \lambda \sum_{p=1}^f x_{up}^2
\end{equation}
Then we find the derivative.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \epsilon_i \frac{d \epsilon_i}{d x_{uj}} \right) + 2 \lambda x_{uj}
\end{equation}
Replace \epsilon_i and \frac{d \epsilon_i}{d x_{uj}}.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
Set the derivative to 0.
\begin{equation}
\forall[1 \leq j \leq f] : 0 = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
Divide by 2 and rearrange.
\begin{equation}
\forall[1 \leq j \leq f] : \sum_{i=1}^n \sum_{k=1}^{f} y_{ij} c_i y_{ik} x_{uk} + \lambda x_{uj} = \sum_{i=1}^n c_i p_{i} y_{ij}
\end{equation}
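To make the step to matrix form explicit: with Y, C^u and p(u) as defined below the result, the sums over i are just entries of matrix products,
\begin{equation*}
\sum_{i=1}^n y_{ij} \, c_i \, y_{ik} = \left( Y^T C^u Y \right)_{jk} ,
\qquad
\sum_{i=1}^n c_i \, p_{i} \, y_{ij} = \left( Y^T C^u p(u) \right)_{j} ,
\end{equation*}
so the equation above is exactly the j-th component of $(Y^T C^u Y + \lambda I) x_u = Y^T C^u p(u)$.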
Rewrite using matrix notation.
\begin{eqnarray}
(Y^TC^uY + \lambda I) \, x_u & = & Y^T C^u p(u) \\
\label{eq:xu-diff}
x_u & = & (Y^TC^uY + \lambda I)^{-1}Y^T C^u p(u)
\end{eqnarray}
Y is the $n \times f$ matrix containing the item factors, with row i equal to $y_i^T$.
$C^u$ is the diagonal matrix containing the confidence values where $C^u_{ii} = c_{ui}$.
The vector $p(u)$ contains all the preference values $p_{ui}$ by user $u$.
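For anyone who wants to sanity-check the result numerically, here is a minimal NumPy sketch of the closed-form update. The function and variable names are my own, and I solve the linear system instead of forming the inverse explicitly:

import numpy as np

def update_user_factor(Y, c_u, p_u, lam):
    # Y   : (n, f) item-factor matrix, row i is y_i
    # c_u : (n,) confidence values c_ui for this user
    # p_u : (n,) preference values p_ui for this user
    # lam : regularisation constant lambda
    n, f = Y.shape
    Cu = np.diag(c_u)                    # C^u as an explicit diagonal matrix
    A = Y.T @ Cu @ Y + lam * np.eye(f)   # Y^T C^u Y + lambda I   (f x f)
    b = Y.T @ Cu @ p_u                   # Y^T C^u p(u)           (length f)
    return np.linalg.solve(A, b)         # x_u such that A x_u = b

# tiny example with made-up data
Y = np.random.rand(5, 3)                       # 5 items, f = 3
c_u = 1.0 + 40.0 * np.random.rand(5)           # confidences
p_u = (np.random.rand(5) > 0.5).astype(float)  # binary preferences
x_u = update_user_factor(Y, c_u, p_u, lam=0.1)
print(x_u)                                     # length-3 user factor

The update for y_i with the user factors fixed is the symmetric expression with the roles of X and Y swapped, which is what the alternating scheme iterates between.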
 