
Sparsity of Support vector machines over an RKHS

  Jan 18, 2017 #1
    I'm trying to solve the following problem from the book 'Learning with Kernels' and would really appreciate a little help.

    Background information

    - Let $\{(x_{1},y_{1}),...,(x_{N},y_{N})\}$ be a dataset, $L$ a loss function, and $H(k)$ a reproducing kernel Hilbert space with kernel $k$. The representer theorem states that a minimizer $f \in H(k)$ of the optimization problem
    \begin{equation}
    \operatorname*{argmin}_{f \in H(k)} \sum_{i=1}^{N}L(y_{i},f(x_{i})) + \Vert f \Vert_{H(k)}
    \end{equation}
    can be represented as
    \begin{equation}
    f(\cdot) = \sum_{i=1}^{N}\alpha_{i}k(x_{i},\cdot)
    \end{equation}
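
    As a sanity check on the statement above (not part of the exercise, and assuming the common variant with squared loss and regularizer $\lambda\Vert f \Vert_{H(k)}^{2}$): plugging the expansion back in reduces the problem to a finite-dimensional one in $\alpha = (\alpha_{1},...,\alpha_{N})$,
    \begin{equation}
    \operatorname*{argmin}_{\alpha \in \mathbb{R}^{N}} \Vert y - K\alpha \Vert_{2}^{2} + \lambda\,\alpha^{\top}K\alpha, \qquad K_{ij} = k(x_{i},x_{j}),
    \end{equation}
    which is solved by $\alpha = (K + \lambda I)^{-1}y$ whenever $K + \lambda I$ is invertible.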


    Problem statement

    - Show that a sufficient condition for the coefficient $\alpha_{i}$ in the kernel expansion to vanish is that both the left-hand and the right-hand derivative of the corresponding loss term $L(y_{i},f(x_{i}))$ with respect to $f(x_{i})$ vanish. Hint: use the proof strategy of the representer theorem (see https://en.wikipedia.org/wiki/Representer_theorem).



    My sparse thoughts

    - I have been pondering this for half a day, but I don't seem to get anywhere. I don't exactly know how to make sense of the derivative of $L$. Since we minimize over $H(k)$ (a set of functions), I don't see why it is relevant to take the derivative with respect to $f(x_{i})$ (which is a real number). Furthermore, this derivative approach seemingly has nothing to do with the proof strategy of the representer theorem: in that proof, $H(k)$ is decomposed into the span of the $k(x_{i},\cdot)$ and its orthogonal complement, and it is then shown that the component of $f \in H(k)$ lying in the orthogonal complement vanishes when evaluated at the $x_{i}$'s, and thus does not contribute to the loss term of the minimization problem.
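
    Concretely, the decomposition I mean (my paraphrase of the Wikipedia proof, so the notation may differ from the book's) is
    \begin{equation}
    f = \sum_{i=1}^{N}\alpha_{i}k(x_{i},\cdot) + f_{\perp}, \qquad \langle f_{\perp}, k(x_{i},\cdot) \rangle_{H(k)} = 0 \quad \text{for all } i,
    \end{equation}
    so that by the reproducing property $f(x_{j}) = \langle f, k(x_{j},\cdot) \rangle_{H(k)} = \sum_{i=1}^{N}\alpha_{i}k(x_{i},x_{j})$: the orthogonal part $f_{\perp}$ leaves every loss term unchanged, while it can only increase $\Vert f \Vert_{H(k)}$.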

    P.S. I'm hoping to use this result to show that the solution of the SVM optimization problem depends only on the 'support vectors'.
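
    For concreteness, the loss I have in mind (assuming the standard soft-margin hinge loss, which the exercise does not state explicitly) is
    \begin{equation}
    L(y_{i},f(x_{i})) = \max\{0,\, 1 - y_{i}f(x_{i})\},
    \end{equation}
    for which both one-sided derivatives with respect to $f(x_{i})$ are zero whenever $y_{i}f(x_{i}) > 1$, i.e. for points strictly outside the margin; the claimed result would then give $\alpha_{i} = 0$ for exactly those points.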

    Any help would be greatly appreciated.
     



