- #1

TackyBoar

- 1

- 0

*Iterative methods for optimization*by C. Kelley (PDF) and I'm struggling to understand proof of

Lemma 6.2.1.Let \(\displaystyle S\) be a simplex. Let \(\displaystyle \nabla f \) be Lipschitz continuous in a neighborhood of \(\displaystyle S\) with Lipschitz constant \(\displaystyle 2K_f\). Then there is \(\displaystyle K>0\), depending only on \(\displaystyle K_f\) such that

\(\displaystyle \left\lVert \nabla f(x_1) - D(f : S) \right\rVert \leq K \kappa(V) \sigma_+(S).\)

Notes on notation: \(\displaystyle S \) is a simplex with vertices \(\displaystyle x_1 \) to \(\displaystyle x_{N+1} \) (order matters), some edges \(\displaystyle v_j = x_j - x_1 \) that make matrix \(\displaystyle V = (v_1, \dots, v_n) \) and \(\displaystyle \sigma_+(S) = \max_j \lVert x_1 - x_j \rVert_2 \). Simplex "gradient" is \(\displaystyle D(f : S) = V^{-T} (f(x_2) - f(x_1), \dots, f(x_{N+1}) - f(x_1)). \)

Proof starts with the following statement:

Our smoothness assumptions on \(\displaystyle f\) and Taylor’s theorem imply that for all \(\displaystyle 2 \leq j \leq N+1 \)

\(\displaystyle \left|f\left(x_{1}\right)-f\left(x_{j}\right)+v_{j}^{T} \nabla f\left(x_{1}\right)\right| \leq K_{f}\left\|v_{j}\right\|^{2}\)

**Why is it true?**

Using Taylor expansion about \(\displaystyle x_1 \) gives

\(\displaystyle f(x_j) = f(x_1) + \nabla f(x_1)(x_j - x_1)^T + R(x_j - x_1) = f(x_1) + \nabla f(x_1)v_j^T + R(v_j) \)

and using Mean Value Theorem on \(\displaystyle R(v_i) \) leads to

\(\displaystyle

|R(v_j)|

= |R(v_j)| - |R(0)|

\leq \lVert v_j \rVert \cdot \sup_{\theta \in [0, 1]} \lVert \nabla R(\theta v_j) \rVert

\leq \lVert v_j \rVert \cdot K_f \lVert v_j \rVert.

\)

It seems that the constant could be just \(\displaystyle 1 \cdot K_f \). Is there a reasoning error in what I wrote? Later in the text I spotted a similar Lemma with a different constant:

Lemma 6.2.5.Let \(\displaystyle S\) be a nonsingular simplex and let \(\displaystyle \nabla^2 f \) be Lipschitz continuous in a neighborhood of \(\displaystyle S \cup R(S) \) with Lipschitz constant \(\displaystyle 3K_C \). Then there is \(\displaystyle K>0 \) such that

\(\displaystyle \left\lVert \nabla f(x_1) - D_C(f : S) \right\rVert \leq K \kappa(V) \sigma_+(S)^2. \)

It uses similar concepts and I think the constant may mean something here.