Derivation of least squares containing vectors

SUMMARY

The discussion focuses on deriving the least squares solution for a cost function involving user and item vectors in the context of collaborative filtering. The cost function is defined as min_{x_*, y_*} ∑_{u,i} c_{ui} (p_{ui} - x^T_u y_i)^2 + λ(∑_u ||x_u||^2 + ∑_i ||y_i||^2). The user successfully derives the formula x_u = (Y^T C^u Y + λI)^{-1} Y^T C^u p(u) using alternating least squares by fixing the item vector and calculating the derivative with respect to the user vector. The discussion emphasizes the importance of understanding matrix notation and the role of confidence values in the derivation process.

PREREQUISITES
  • Understanding of alternating least squares (ALS) algorithm
  • Familiarity with matrix calculus and derivatives
  • Knowledge of collaborative filtering techniques
  • Proficiency in linear algebra, specifically vector and matrix operations
NEXT STEPS
  • Study the implementation of the Alternating Least Squares algorithm in libraries like Apache Spark MLlib
  • Learn about matrix factorization techniques in collaborative filtering
  • Explore the role of regularization parameters in machine learning models
  • Investigate the application of confidence matrices in recommender systems
USEFUL FOR

Data scientists, machine learning engineers, and researchers focusing on collaborative filtering and recommendation systems will benefit from this discussion, particularly those interested in the mathematical foundations of least squares optimization.

nobben
Hi,

I'm trying to derive a result from this paper:

http://www.research.yahoo.net/files/HuKorenVolinsky-ICDM08.pdf

Given the cost function

\min_{x_* , y_* } \sum_{u, i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 \right)

where both x_u and y_i are vectors in ℝ^k.

I want to find the minimum by using alternating least squares. Therefore I fix the y_i and find the derivative with respect to x_u.

c_{ui}, p_{ui} and λ are constants.

They derive the following:

x_u = \left( Y^TC^uY + \lambda I \right) ^{-1} Y^T C^u p \left( u \right)
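For concreteness, evaluating this expression is straightforward once Y, C^u and p(u) have been formed; a minimal NumPy sketch, with all sizes and data purely illustrative:

```python
import numpy as np

# illustrative sizes: n items, factor dimension f, regularization lambda
n, f, lam = 5, 3, 0.1
rng = np.random.default_rng(0)

Y = rng.normal(size=(n, f))                  # item-factor matrix, one row per item
Cu = np.diag(rng.uniform(1, 5, n))           # C^u: diagonal confidence matrix
p_u = (rng.random(n) > 0.5).astype(float)    # preference vector p(u)

# x_u = (Y^T C^u Y + lambda I)^{-1} Y^T C^u p(u); solve() avoids the explicit inverse
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(f), Y.T @ Cu @ p_u)
```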

I'm unsure how to reproduce this result...

I don't know whether I should differentiate with respect to the whole vector x_u, or with respect to a single entry x_{uk} and then map the result back to an expression for the whole vector.

Any pointers are very much appreciated.
 
I think I solved it. I'm including the solution below in case anyone is interested!




First we define a variable S_u below, which is the cost function above but with u fixed, so that we are calculating the prediction errors for one specific user.

\begin{equation}
\label{eq:cost-u}
S_u = \sum_{i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( || x_u ||^2 + \sum_i || y_i||^2 \right)
\end{equation}
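As a sanity check on the steps that follow, S_u can be evaluated directly on toy data; a minimal NumPy sketch (names and sizes are illustrative assumptions):

```python
import numpy as np

def S_u(x_u, Y, c_u, p_u, lam):
    """Cost for one fixed user u, matching the equation above:
    sum_i c_ui (p_ui - x_u . y_i)^2 + lam * (||x_u||^2 + sum_i ||y_i||^2)."""
    eps = p_u - Y @ x_u                       # per-item prediction errors
    reg = lam * (x_u @ x_u + np.sum(Y * Y))   # lambda * (||x_u||^2 + sum_i ||y_i||^2)
    return np.sum(c_u * eps ** 2) + reg

# toy usage with illustrative sizes
rng = np.random.default_rng(0)
n, f = 5, 3
print(S_u(rng.normal(size=f), rng.normal(size=(n, f)),
          np.ones(n), np.zeros(n), lam=0.1))
```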

We now find the derivative of S_u with respect to x_u.
To help clarify the process we introduce a variable \epsilon_{i} below, which is the prediction error of item i with respect to user u. Since u is now fixed, we drop it from the subscripts and write p_i for p_{ui} and c_i for c_{ui}.
\begin{equation}
\label{eq:pred-error}
\epsilon_{i} = p_{i} - \sum_{j=1}^{f} x_{uj} y_{ij}
\end{equation}
We also calculate the derivative of \epsilon_{i} with respect to the j-th entry x_{uj} of the user factor.
\begin{equation}
\label{eq:error-deriv}
\frac{d \epsilon_{i}}{d x_{uj}} = - y_{ij}
\end{equation}
We now rewrite S_u using the \epsilon_i variable.
There are m users and n items, and f is the length of the factor vectors.
\begin{equation}
S_u = \sum_{i=1}^{n} \left( c_i \epsilon_i^2 + \lambda \sum_{k=1}^f y_{ik}^2 \right) + \lambda \sum_{p=1}^f x_{up}^2
\end{equation}
Then we find the derivative.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \epsilon_i \frac{d \epsilon_i}{d x_{uj}} \right) + 2 \lambda x_{uj}
\end{equation}
Replace \epsilon_i and \frac{d \epsilon_i}{d x_{uj}}.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
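This gradient can be checked against central finite differences on toy data; a minimal self-contained sketch (all names are illustrative; the \lambda \sum_i ||y_i||^2 term is constant in x_u and is omitted, since it does not affect the derivative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, f, lam = 6, 3, 0.1
Y = rng.normal(size=(n, f))                   # item factors
c_u = rng.uniform(1, 5, n)                    # confidences c_i
p_u = (rng.random(n) > 0.5).astype(float)     # preferences p_i
x = rng.normal(size=f)                        # some user factor x_u

# S_u as a function of x_u; the y-regularization term is dropped (constant in x_u)
S = lambda x: np.sum(c_u * (p_u - Y @ x) ** 2) + lam * (x @ x)

# analytic gradient from the equation above: -2 sum_i c_i eps_i y_ij + 2 lam x_uj
grad = -2 * Y.T @ (c_u * (p_u - Y @ x)) + 2 * lam * x

# numerical gradient via central finite differences
h, num = 1e-6, np.empty(f)
for j in range(f):
    e = np.zeros(f); e[j] = h
    num[j] = (S(x + e) - S(x - e)) / (2 * h)

assert np.allclose(grad, num, atol=1e-4)
```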
Set the derivative to 0.
\begin{equation}
\forall[1 \leq j \leq f] : 0 = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
Divide by 2 and rearrange.
\begin{equation}
\forall[1 \leq j \leq f] : \sum_{i=1}^n \sum_{k=1}^{f} y_{ij} c_i y_{ik} x_{uk} + \lambda x_{uj} = \sum_{i=1}^n c_i p_{i} y_{ij}
\end{equation}
Rewrite using matrix notation.
\begin{eqnarray}
(Y^TC^uY + \lambda I) \, x_u & = & Y^T C^u p(u) \\
\label{eq:xu-diff}
x_u & = & (Y^TC^uY + \lambda I)^{-1}Y^T C^u p(u)
\end{eqnarray}
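Numerically, the matrix form can be verified by solving the linear system and confirming that the gradient vanishes at the solution; a minimal NumPy sketch, with Y, C^u and p(u) as defined just below and sizes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, f, lam = 8, 4, 0.05
Y = rng.normal(size=(n, f))                   # item-factor matrix
Cu = np.diag(rng.uniform(1, 5, n))            # diagonal confidence matrix C^u
p_u = (rng.random(n) > 0.5).astype(float)     # preference vector p(u)

# x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u), via a linear solve rather than
# an explicit matrix inverse
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(f), Y.T @ Cu @ p_u)

# at the minimizer the gradient -2 Y^T C^u (p(u) - Y x_u) + 2 lam x_u is zero
grad = -2 * Y.T @ Cu @ (p_u - Y @ x_u) + 2 * lam * x_u
assert np.allclose(grad, 0)
```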
Y is the $n \times f$ matrix whose rows are the item factors $y_i^T$.
$C^u$ is the diagonal matrix containing the confidence values where $C^u_{ii} = c_{ui}$.
The vector $p(u)$ contains all the preference values $p_{ui}$ by user $u$.
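The update for the item factors is symmetric (fix X, solve for each y_i), and a full ALS sweep simply alternates the two. Below is a dense, purely illustrative sketch; the paper also describes computing Y^T C^u Y as Y^T Y + Y^T (C^u - I) Y for speed, which this sketch does not use.

```python
import numpy as np

def als_sweep(X, Y, C, P, lam):
    """One alternating pass: update every user factor with Y fixed, then every
    item factor with X fixed. C and P are m x n confidence/preference matrices."""
    m, n = P.shape
    f = X.shape[1]
    reg = lam * np.eye(f)
    for u in range(m):        # x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u)
        Cu = np.diag(C[u])
        X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg, Y.T @ Cu @ P[u])
    for i in range(n):        # symmetric update for y_i with the users fixed
        Ci = np.diag(C[:, i])
        Y[i] = np.linalg.solve(X.T @ Ci @ X + reg, X.T @ Ci @ P[:, i])
    return X, Y

# toy usage: a few sweeps on random data
rng = np.random.default_rng(3)
m, n, f = 10, 8, 3
C = 1 + 4 * rng.random((m, n))                 # confidences c_ui
P = (rng.random((m, n)) > 0.5).astype(float)   # binary preferences p_ui
X, Y = rng.normal(size=(m, f)), rng.normal(size=(n, f))
for _ in range(10):
    X, Y = als_sweep(X, Y, C, P, lam=0.1)
```

Each solve is only $f \times f$, so the per-user cost is dominated by forming Y^T C^u Y, which is exactly where the paper's speed-up applies.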
 
