# Sparsity of support vector machines over an RKHS

Jan 18, 2017

### eipiplusone

I'm trying to solve the following problem from the book 'Learning with Kernels', and would really appreciate a little help.

Background information

- Let $\{(x_{1},y_{1}),\ldots,(x_{N},y_{N})\}$ be a dataset, $L$ a loss function and $H(k)$ a reproducing kernel Hilbert space with kernel $k$. The representer theorem states that the minimizer $f \in H(k)$ of the following optimization problem

$$\operatorname*{argmin}_{f \in H(k)} \sum_{i=1}^{N}L(y_{i},f(x_{i})) + \Vert f \Vert_{H(k)}$$

can be represented as

$$f(\cdot) = \sum_{i=1}^{N}\alpha_{i}k(x_{i},\cdot)$$

Problem statement

- Show that a sufficient condition for a coefficient $\alpha_{i}$ of the kernel expansion to vanish is that, for the corresponding loss term $L(y_{i},f(x_{i}))$, both the left-hand and the right-hand derivative with respect to $f(x_{i})$ vanish. Hint: use the proof strategy of the representer theorem (see https://en.wikipedia.org/wiki/Representer_theorem).
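For reference, here is the kind of stationarity computation the hint seems to point at — my own sketch, assuming the regularizer is a squared norm with weight $\lambda$ (the usual representer-theorem setting), not something from the book:

```latex
% With objective J(f) = \sum_i L(y_i, f(x_i)) + \lambda \Vert f \Vert_{H(k)}^2,
% the first-order (subgradient) optimality condition in H(k) reads
0 \in \sum_{i=1}^{N} \partial L(y_i, f(x_i))\, k(x_i,\cdot) + 2\lambda f ,
% where \partial L denotes the subdifferential in the second argument.
% This forces the expansion f(\cdot) = \sum_i \alpha_i k(x_i,\cdot) with
\alpha_i = -\tfrac{1}{2\lambda}\, g_i , \qquad g_i \in \partial L(y_i, f(x_i)) .
% If both one-sided derivatives of L at f(x_i) vanish, then
% 0 \in \partial L(y_i, f(x_i)), so \alpha_i = 0 is an admissible choice.
```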

My sparse thoughts

- I have been pondering this for half a day, but don't seem to get anywhere. I don't exactly know how to make sense of the derivative of $L$. Since we want to minimize over $H(k)$ (a set of functions), I don't see how it is relevant to take the derivative (and thus minimize) with respect to $f(x_{i})$ (which is a real number). Furthermore, this derivative approach seemingly has nothing to do with the proof strategy of the representer theorem; in that proof, $H(k)$ is decomposed into the span of the $k(x_{i},\cdot)$ and its orthogonal complement. It is then shown that the component of $f \in H(k)$ that lies in the orthogonal complement vanishes when evaluated at the $x_{i}$'s, and thus does not contribute to the loss-function term of the minimization problem.

PS: I'm hoping to be able to use this result to show that the solution to the SVM optimization problem depends only on the 'support vectors'.
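In case a numerical check helps anyone: here is a small toy sketch of my own (not from the book) that runs projected gradient ascent on the dual of a bias-free linear SVM, assuming numpy. For the hinge loss, points strictly outside the margin have $y_{i}f(x_{i}) > 1$, so the loss is flat (both one-sided derivatives are zero) there, and the corresponding dual coefficients $\alpha_{i}$ come out exactly zero:

```python
import numpy as np

# Toy linearly separable data; the points (2,2) and (-2,-2) lie closest
# to the decision boundary and should be the only support vectors.
X = np.array([[2., 2.], [3., 3.], [2., 3.],
              [-2., -2.], [-3., -3.], [-2., -3.]])
y = np.array([1., 1., 1., -1., -1., -1.])

K = X @ X.T                               # linear kernel Gram matrix
Q = (y[:, None] * y[None, :]) * K         # Q_ij = y_i y_j k(x_i, x_j)

C = 10.0                                  # box constraint (large => nearly hard margin)
eta = 0.005                               # step size for projected gradient ascent
alpha = np.zeros(len(y))

# Projected gradient ascent on the SVM dual:
#   max  sum(alpha) - 0.5 * alpha^T Q alpha   s.t.  0 <= alpha_i <= C
# (bias-free SVM, so there is no equality constraint on alpha).
for _ in range(20000):
    grad = 1.0 - Q @ alpha
    alpha = np.clip(alpha + eta * grad, 0.0, C)

# f(x_i) = sum_j alpha_j y_j k(x_j, x_i); points with y_i f(x_i) > 1
# sit strictly outside the margin and get alpha_i = 0.
f = K @ (alpha * y)
for i in range(len(y)):
    print(f"x={X[i]}, y*f(x)={y[i] * f[i]:.3f}, alpha={alpha[i]:.4f}")
```

The clipping step is what produces exact zeros: once a point's margin exceeds 1, its dual gradient turns negative and the projection pins $\alpha_{i}$ to the boundary of the box.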

Any help would be greatly appreciated.