I'm trying to solve the following problem from the book *Learning with Kernels*, and would really appreciate a little help.


**Background information**- Let $\{(x_{1},y_{1}),\dots,(x_{N},y_{N})\}$ be a dataset, $L$ a loss function, and $H(k)$ a reproducing kernel Hilbert space with kernel $k$. The representer theorem states that a minimizer $f \in H(k)$ of the following optimization problem

\begin{equation}

\operatorname*{argmin}_{f \in H(k)} \sum_{i=1}^{N}L(y_{i},f(x_{i})) + \Vert f \Vert_{H(k)}

\end{equation}

can be represented as

\begin{equation}

f(\cdot) = \sum_{i=1}^{N}\alpha_{i}k(x_{i},\cdot)

\end{equation}

**Problem statement**- Show that a sufficient condition for the coefficient $\alpha_{i}$ of the kernel expansion to vanish is that, for the corresponding loss term $L(y_{i},f(x_{i}))$, both the left-hand and the right-hand derivative with respect to $f(x_{i})$ vanish. Hint: use the proof strategy of the representer theorem (see https://en.wikipedia.org/wiki/Representer_theorem).
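To convince myself of what the condition even means, I wrote a quick numerical check of the one-sided derivatives, using the hinge loss as a concrete stand-in for the general $L$ of the exercise (the choice of hinge loss is mine, not the book's):

```python
# Numerical one-sided derivatives of a loss L(y, t) in its second argument t.
# The hinge loss L(y, t) = max(0, 1 - y*t) is used as a concrete example.

def hinge(y, t):
    return max(0.0, 1.0 - y * t)

def one_sided_derivatives(L, y, t, h=1e-6):
    """Return (left derivative, right derivative) of t -> L(y, t) at t."""
    left = (L(y, t) - L(y, t - h)) / h
    right = (L(y, t + h) - L(y, t)) / h
    return left, right

# Inside the margin (y*t < 1): both one-sided derivatives equal -y, nonzero.
print(one_sided_derivatives(hinge, 1.0, 0.5))  # ≈ (-1.0, -1.0)
# Strictly outside the margin (y*t > 1): both vanish.
print(one_sided_derivatives(hinge, 1.0, 2.0))  # ≈ (0.0, 0.0)
# At the kink (y*t = 1): left derivative ≈ -1, right derivative ≈ 0.
print(one_sided_derivatives(hinge, 1.0, 1.0))
```

So the condition in the exercise does pick out the points where the loss is locally flat around the current prediction.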

**My sparse thoughts**- I have been pondering this for half a day, but don't seem to get anywhere. I don't exactly know how to make sense of the derivative of $L$. Since we want to minimize over $H(k)$ (a set of functions), I don't see how it is relevant to take the derivative (and thus minimize) with respect to $f(x_{i})$ (which is a real number). Furthermore, this derivative approach seemingly has nothing to do with the proof strategy of the representer theorem: in that proof, $H(k)$ is decomposed into the span of the $k(x_{i},\cdot)$ and its orthogonal complement. It is then shown that the component of $f \in H(k)$ lying in the orthogonal complement vanishes when evaluated at the $x_{i}$'s, and thus does not contribute to the loss-function term of the minimization problem.

P.S. I am hoping to be able to use this result to show that the solution of the SVM optimization problem depends only on the 'support vectors'.
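To make that SVM connection concrete (this is my own sketch, assuming the claimed result holds; it is not from the book): for the hinge loss $L(y,t) = \max(0, 1 - yt)$, both one-sided derivatives with respect to $t$ vanish exactly when $yt > 1$, i.e.

\begin{equation}
\partial_{t}^{-}L(y_{i},f(x_{i})) = \partial_{t}^{+}L(y_{i},f(x_{i})) = 0 \quad \text{whenever } y_{i}f(x_{i}) > 1,
\end{equation}

so the result would give $\alpha_{i}=0$ for every point strictly outside the margin, leaving only the support vectors ($y_{i}f(x_{i}) \le 1$) in the kernel expansion.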

Any help would be greatly appreciated.