Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Derivation of least squares containing vectors

  1. Feb 11, 2012 #1

    I'm trying to derive a result from this paper:


    given cost function

    \min_{x_* , y_* } \sum_{u, i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( \sum_u ||x_u||^2 + \sum_i ||y_||^2 \right)

    Where both xu and yi are vectors in ℝk.

    I want to find the minimum by using alternating least squares. Therefore I fix y and find the derivative with respect to xu.

    cui, pui and λ are constants.

    They derive the following:

    [tex]x_u = \left( Y^TC^uY + \lambda I \right) ^{-1} Y^T C^u p \left( u \right)[/tex]

    I'm unsure how reproduce this result...

    I don't know if I should try to derive with respect to the whole vector xu or against one entry k (xuk) in the vector and then try to map this to a function for the whole vector?

    Any pointers are very much appreciated.
  2. jcsd
  3. Feb 12, 2012 #2
    I think I solved it. Including solution below if anyone is interested!

    First we define a variable S_u below which is the cost function from Equation \eqref{eq:implicit-cost-function} but where we have fixed u so that we are calculating the predicted errors for one specific user.

    S_u = \sum_{i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( || x_u ||^2 + \sum_i || y_i||^2 \right)

    We now find the derivative of S_u with respect to x_u.
    To help clarify the process we introduce a variable \epsilon_{i} below which is the prediction-error of item i with respect to user u.
    \epsilon_{i} = p_{i} - \sum_{j=1}^{n} x_{uj} y_{ij}
    We also calculate the derivative of \epsilon_{i} with respect to the j entries in the user factor x_uj.
    \frac{d \epsilon_{i}}{d x_{uj}} = - y_{ij}
    We now rewrite using the \epsilon_i variable.
    There are m users, n items and f is the length of the factor vectors.
    S_u = \sum_{i=1}^{n} \left( c_i \epsilon_i^2 + \lambda \sum_{k=1}^f y_{ik} \right) + \lambda \sum_{p=1}^f x_{up}^2
    Then we find the derivative.
    \forall[1 \leq j \leq f] : \frac{d S_u}{x_{uj}} = 2 \sum_{i=1}^n \left( c_i \epsilon_i \frac{d \epsilon_i}{d x_{uj}} \right) + 2 \lambda x_{uj}
    Replace \epsilon_i and [tex]\frac{d \epsilon_i}{d x_{uj}}[/tex].
    \forall[1 \leq j \leq f] : \frac{d S_u}{x_{uj}} = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{n} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
    Set the derivative to 0.
    \forall[1 \leq j \leq f] : 0 = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{n} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
    \forall[1 \leq j \leq f] : \sum_{i=1}^n \sum_{k=1}^{n} y_{ij} c_i y_{ik} x_{uk} + 2 \lambda x_{uj} = \sum_{i=1}^n c_i p_{i} y_{ij}
    Rewrite using matrix notation.
    x_u (Y^TC^uY + \lambda I) & = & Y^T C^u p(u) \\
    x_u & = & (Y^TC^uY + \lambda I)^{-1}Y^T C^u p(u)
    Y is the $n \times f$ matrix containing the user factors.
    $C^u$ is the diagonal matrix containing the confidence values where $C^u_{ii} = c_{ui}$.
    The vector $p(u)$ contains all the preference values $p_{ui}$ by user $u$.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook