# Iterative Minimization lemma proof

1. Dec 14, 2014

### rayge

1. The problem statement, all variables and given/known data
$f(x)$ is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.) We choose the next $x^k$ through the relation $x^k = x^{k-1} + \alpha_{k}d^k$. We assume $d^k$ is a descent direction. That is, for small positive $\alpha$, $f(x^{k-1} + \alpha d^k)$ is decreasing.

Here's the lemma we want to prove:
When $x^k$ is constructed using the optimal $\alpha$, we have $\nabla f(x^k) \cdot d^k = 0$

3. The attempt at a solution
It's suggested in the book that we should differentiate the function $f(x^{k-1} + \alpha d^k)$ with respect to $\alpha$. My problem is that I don't know how to differentiate a function that isn't defined. My first guess went something like this, but I don't see how I'm any closer to a solution.

$$\frac{\partial}{\partial \alpha} f(x^{k-1} + \alpha d^k) = \frac{\partial}{\partial \alpha} f(x^k)$$
$$f'(x^{k-1} + \alpha d^k)d^k = f'(x^k)(0)$$
$$f'(x^{k-1} + \alpha d^k)d^k = 0$$

I'm really not looking for an answer, but if someone could point me to where I could learn about taking differentials of undefined functions that would be helpful. I'm guessing that somehow I can extract a gradient out of this, and a dot product, but I'm feeling pretty confused.

2. Dec 14, 2014

### Stephen Tashi

If you're expected to differentiate $f$ it would have to be differentiable, hence continuous.

You didn't say what $x^k$ is. I assume it is a vector. For example, $x^k = ( {x^k}_1, {x^k}_2 )$

Visualize all the arguments of $f$ . For example $f( {x^{k-1}}_1 + \alpha {d^k}_1, {x^{k-1}}_2 + \alpha {d^k}_2)$

Things might look more familiar In different notation. If we have a real valued function $f(x,y)$ of two real variables $x,y$ and $X(\alpha), \ Y(\alpha)$ are real valued functions of $\alpha$ then
$\frac{df}{d\alpha} = \frac{\partial df}{dx}\frac{dX}{d\alpha} + \frac{\partial df}{dy} \frac{dY}{d\alpha}$
$= \nabla f \cdot (\frac{dX}{d\alpha},\frac{dY}{d\alpha})$

3. Dec 14, 2014

### Ray Vickson

Taking derivatives does not make sense unless $f$ is differentiable, so automatically continuous---in fact, even smoother than continuous. Maybe the book does not say that, but it is part of the reader's assumed background. Furthermore (just to be picky), expressions like $\nabla f(x^k) \cdot d^k$ depend on even more smoothness: $f$ must be continuously differentiable (not just plain differentiable). If the book does not make this clear, that is a flaw; it should definitely cover these points somewhere.

BTW: the stated result is false if $f$ is not continuously differentiable; there are some easy counterexamples.

Last edited: Dec 14, 2014
4. Dec 14, 2014

### rayge

Actually it is made clear in the beginning on the chapter that we're dealing with quadratic functions of the form $f(x) = \frac{1}{2}\|Ax - b\|^2_2$. My fault for not reading closely enough.

5. Dec 14, 2014

### Ray Vickson

OK, but the result is actually true for any continuously-differentiable $f$.

6. Dec 14, 2014

### rayge

Good to know, thanks. I applied the chain rule as suggested, and got:
$$\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k$$
From here, it seems reasonable that $\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0$ is our optimal value but I don't know how to prove that from our assumptions. For small $\alpha$, $f(x^{k-1} + \alpha d^k)$ is decreasing. For large enough $\alpha$, I think $f(x^{k-1} + \alpha d^k)$ is increasing (i.e. we pass by the optimal point). I just don't have a solid grasp on how the function works, and assuming it's increasing for large $\alpha$, why that is the case.

Thanks again.

7. Dec 14, 2014

### Ray Vickson

You are searching for a minimum along a line, so you are trying to minimize a univariate function $\phi(\alpha)$, which just happens to be of the form
$$\phi(\alpha) = f(x^{k-1} + \alpha d^k)$$
for some vector constants $x^{k-1}, d^k$. How do you minimize functions of one variable?

8. Dec 15, 2014

### Ray Vickson

I see that my previous response did not deal with all of your question. I cannot supply too many details without violating PF helping policies, but I can give a hint. In your specific case you have $f(\vec{x}) = || A \vec{x} - \vec{b}||^2$. So letting $\vec{y} = A \vec{x}^k - \vec{b}$ and $\vec{p} = A \vec{d}^k$, we have that $\phi(\alpha) = f(\vec{x}^k + \vec{d}^k \alpha)$ has the form $\phi(\alpha) = ||\vec{y} + \alpha \vec{p}||^2 = a + b \alpha + c \alpha^2$. How do you find $a, b, c$? Given that $\vec{d}^k$ is a descent direction, what does this say about the values or signs of $a,b,c$? What do these signs say about the behavior of $\phi(\alpha)$?