Derivation of least squares containing vectors

SUMMARY

The discussion focuses on deriving the least squares solution for a cost function involving user and item vectors in the context of collaborative filtering. The cost function is defined as min_{x_*, y_*} ∑_{u,i} c_{ui} (p_{ui} - x^T_u y_i)^2 + λ(∑_u ||x_u||^2 + ∑_i ||y_i||^2). The user successfully derives the formula x_u = (Y^T C^u Y + λI)^{-1} Y^T C^u p(u) using alternating least squares by fixing the item vector and calculating the derivative with respect to the user vector. The discussion emphasizes the importance of understanding matrix notation and the role of confidence values in the derivation process.

PREREQUISITES
  • Understanding of alternating least squares (ALS) algorithm
  • Familiarity with matrix calculus and derivatives
  • Knowledge of collaborative filtering techniques
  • Proficiency in linear algebra, specifically vector and matrix operations
NEXT STEPS
  • Study the implementation of the Alternating Least Squares algorithm in libraries like Apache Spark MLlib
  • Learn about matrix factorization techniques in collaborative filtering
  • Explore the role of regularization parameters in machine learning models
  • Investigate the application of confidence matrices in recommender systems
USEFUL FOR

Data scientists, machine learning engineers, and researchers focusing on collaborative filtering and recommendation systems will benefit from this discussion, particularly those interested in the mathematical foundations of least squares optimization.

nobben
Hi,

I'm trying to derive a result from this paper:

http://www.research.yahoo.net/files/HuKorenVolinsky-ICDM08.pdf

Given the cost function

\min_{x_* , y_* } \sum_{u, i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 \right)

where both x_u and y_i are vectors in ℝ^k.

I want to find the minimum by using alternating least squares. Therefore I fix the y_i and find the derivative with respect to x_u.

c_{ui}, p_{ui} and λ are constants.

They derive the following:

x_u = \left( Y^TC^uY + \lambda I \right) ^{-1} Y^T C^u p \left( u \right)
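For concreteness, evaluating this expression is straightforward once Y, C^u and p(u) have been formed; a minimal NumPy sketch, with all sizes and data purely illustrative:

```python
import numpy as np

# illustrative sizes: n items, factor dimension f, regularization lambda
n, f, lam = 5, 3, 0.1
rng = np.random.default_rng(0)

Y = rng.normal(size=(n, f))                  # item-factor matrix, one row per item
Cu = np.diag(rng.uniform(1, 5, n))           # C^u: diagonal confidence matrix
p_u = (rng.random(n) > 0.5).astype(float)    # preference vector p(u)

# x_u = (Y^T C^u Y + lambda I)^{-1} Y^T C^u p(u); solve() avoids the explicit inverse
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(f), Y.T @ Cu @ p_u)
```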

I'm unsure how to reproduce this result...

I don't know whether I should differentiate with respect to the whole vector x_u, or with respect to a single entry x_{uk} and then map the result back to an expression for the whole vector.

Any pointers are very much appreciated.
 
I think I solved it. I'm including the solution below in case anyone is interested!




First we define a variable S_u below, which is the cost function above but with u fixed, so that we are calculating the prediction errors for one specific user.

\begin{equation}
\label{eq:cost-u}
S_u = \sum_{i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( || x_u ||^2 + \sum_i || y_i||^2 \right)
\end{equation}
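As a sanity check on the steps that follow, S_u can be evaluated directly on toy data; a minimal NumPy sketch (names and sizes are illustrative assumptions):

```python
import numpy as np

def S_u(x_u, Y, c_u, p_u, lam):
    """Cost for one fixed user u, matching the equation above:
    sum_i c_ui (p_ui - x_u . y_i)^2 + lam * (||x_u||^2 + sum_i ||y_i||^2)."""
    eps = p_u - Y @ x_u                       # per-item prediction errors
    reg = lam * (x_u @ x_u + np.sum(Y * Y))   # lambda * (||x_u||^2 + sum_i ||y_i||^2)
    return np.sum(c_u * eps ** 2) + reg

# toy usage with illustrative sizes
rng = np.random.default_rng(0)
n, f = 5, 3
print(S_u(rng.normal(size=f), rng.normal(size=(n, f)),
          np.ones(n), np.zeros(n), lam=0.1))
```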

We now find the derivative of S_u with respect to x_u.
To help clarify the process we introduce a variable \epsilon_{i} below, which is the prediction error of item i with respect to user u. Since u is now fixed, we drop it from the subscripts and write p_i for p_{ui} and c_i for c_{ui}.
\begin{equation}
\label{eq:pred-error}
\epsilon_{i} = p_{i} - \sum_{j=1}^{f} x_{uj} y_{ij}
\end{equation}
We also calculate the derivative of \epsilon_{i} with respect to the j-th entry x_{uj} of the user factor.
\begin{equation}
\label{eq:error-deriv}
\frac{d \epsilon_{i}}{d x_{uj}} = - y_{ij}
\end{equation}
We now rewrite S_u using the \epsilon_i variable.
There are m users and n items, and f is the length of the factor vectors.
\begin{equation}
S_u = \sum_{i=1}^{n} \left( c_i \epsilon_i^2 + \lambda \sum_{k=1}^f y_{ik}^2 \right) + \lambda \sum_{p=1}^f x_{up}^2
\end{equation}
Then we find the derivative.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \epsilon_i \frac{d \epsilon_i}{d x_{uj}} \right) + 2 \lambda x_{uj}
\end{equation}
Replace \epsilon_i and \frac{d \epsilon_i}{d x_{uj}}.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
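This gradient can be checked against central finite differences on toy data; a minimal self-contained sketch (all names are illustrative; the \lambda \sum_i ||y_i||^2 term is constant in x_u and is omitted, since it does not affect the derivative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, f, lam = 6, 3, 0.1
Y = rng.normal(size=(n, f))                   # item factors
c_u = rng.uniform(1, 5, n)                    # confidences c_i
p_u = (rng.random(n) > 0.5).astype(float)     # preferences p_i
x = rng.normal(size=f)                        # some user factor x_u

# S_u as a function of x_u; the y-regularization term is dropped (constant in x_u)
S = lambda x: np.sum(c_u * (p_u - Y @ x) ** 2) + lam * (x @ x)

# analytic gradient from the equation above: -2 sum_i c_i eps_i y_ij + 2 lam x_uj
grad = -2 * Y.T @ (c_u * (p_u - Y @ x)) + 2 * lam * x

# numerical gradient via central finite differences
h, num = 1e-6, np.empty(f)
for j in range(f):
    e = np.zeros(f); e[j] = h
    num[j] = (S(x + e) - S(x - e)) / (2 * h)

assert np.allclose(grad, num, atol=1e-4)
```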
Set the derivative to 0.
\begin{equation}
\forall[1 \leq j \leq f] : 0 = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
Divide by 2 and rearrange.
\begin{equation}
\forall[1 \leq j \leq f] : \sum_{i=1}^n \sum_{k=1}^{f} y_{ij} c_i y_{ik} x_{uk} + \lambda x_{uj} = \sum_{i=1}^n c_i p_{i} y_{ij}
\end{equation}
Rewrite using matrix notation.
\begin{eqnarray}
(Y^TC^uY + \lambda I) \, x_u & = & Y^T C^u p(u) \\
\label{eq:xu-diff}
x_u & = & (Y^TC^uY + \lambda I)^{-1}Y^T C^u p(u)
\end{eqnarray}
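Numerically, the matrix form can be verified by solving the linear system and confirming that the gradient vanishes at the solution; a minimal NumPy sketch, with Y, C^u and p(u) as defined just below and sizes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, f, lam = 8, 4, 0.05
Y = rng.normal(size=(n, f))                   # item-factor matrix
Cu = np.diag(rng.uniform(1, 5, n))            # diagonal confidence matrix C^u
p_u = (rng.random(n) > 0.5).astype(float)     # preference vector p(u)

# x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u), via a linear solve rather than
# an explicit matrix inverse
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(f), Y.T @ Cu @ p_u)

# at the minimizer the gradient -2 Y^T C^u (p(u) - Y x_u) + 2 lam x_u is zero
grad = -2 * Y.T @ Cu @ (p_u - Y @ x_u) + 2 * lam * x_u
assert np.allclose(grad, 0)
```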
Y is the $n \times f$ matrix whose rows are the item factors $y_i^T$.
$C^u$ is the diagonal matrix containing the confidence values where $C^u_{ii} = c_{ui}$.
The vector $p(u)$ contains all the preference values $p_{ui}$ by user $u$.
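The update for the item factors is symmetric (fix X, solve for each y_i), and a full ALS sweep simply alternates the two. Below is a dense, purely illustrative sketch; the paper also describes computing Y^T C^u Y as Y^T Y + Y^T (C^u - I) Y for speed, which this sketch does not use.

```python
import numpy as np

def als_sweep(X, Y, C, P, lam):
    """One alternating pass: update every user factor with Y fixed, then every
    item factor with X fixed. C and P are m x n confidence/preference matrices."""
    m, n = P.shape
    f = X.shape[1]
    reg = lam * np.eye(f)
    for u in range(m):        # x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u)
        Cu = np.diag(C[u])
        X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg, Y.T @ Cu @ P[u])
    for i in range(n):        # symmetric update for y_i with the users fixed
        Ci = np.diag(C[:, i])
        Y[i] = np.linalg.solve(X.T @ Ci @ X + reg, X.T @ Ci @ P[:, i])
    return X, Y

# toy usage: a few sweeps on random data
rng = np.random.default_rng(3)
m, n, f = 10, 8, 3
C = 1 + 4 * rng.random((m, n))                 # confidences c_ui
P = (rng.random((m, n)) > 0.5).astype(float)   # binary preferences p_ui
X, Y = rng.normal(size=(m, f)), rng.normal(size=(n, f))
for _ in range(10):
    X, Y = als_sweep(X, Y, C, P, lam=0.1)
```

Each solve is only $f \times f$, so the per-user cost is dominated by forming Y^T C^u Y, which is exactly where the paper's speed-up applies.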
 
