Iterative Minimization lemma proof

rayge · Dec 14, 2014

Homework Statement

[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.) We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex]. We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution

It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined. My first guess went something like this, but I don't see how I'm any closer to a solution.

[tex]\frac{\partial}{\partial \alpha} f(x^{k-1} + \alpha d^k) = \frac{\partial}{\partial \alpha} f(x^k)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = f'(x^k)(0)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = 0[/tex]

I'm really not looking for an answer, but if someone could point me to where I could learn about taking differentials of undefined functions that would be helpful. I'm guessing that somehow I can extract a gradient out of this, and a dot product, but I'm feeling pretty confused.

Stephen Tashi · Dec 14, 2014

rayge said:

Homework Statement

[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.)

If you're expected to differentiate [itex]f[/itex] it would have to be differentiable, hence continuous.

We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex].

You didn't say what [itex]x^k[/itex] is. I assume it is a vector. For example, [itex]x^k = ( {x^k}_1, {x^k}_2 )[/itex]

We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution

It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined.

Visualize all the arguments of [itex]f[/itex] . For example [itex]f( {x^{k-1}}_1 + \alpha {d^k}_1, {x^{k-1}}_2 + \alpha {d^k}_2)[/itex]

Things might look more familiar In different notation. If we have a real valued function [itex]f(x,y)[/itex] of two real variables [itex]x,y[/itex] and [itex]X(\alpha), \ Y(\alpha)[/itex] are real valued functions of [itex]\alpha[/itex] then
[itex]\frac{df}{d\alpha} = \frac{\partial df}{dx}\frac{dX}{d\alpha} + \frac{\partial df}{dy} \frac{dY}{d\alpha}[/itex]
[itex]= \nabla f \cdot (\frac{dX}{d\alpha},\frac{dY}{d\alpha})[/itex]

Ray Vickson · Dec 14, 2014

rayge said:

Homework Statement

[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.) We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex]. We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution

It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined. My first guess went something like this, but I don't see how I'm any closer to a solution.

[tex]\frac{\partial}{\partial \alpha} f(x^{k-1} + \alpha d^k) = \frac{\partial}{\partial \alpha} f(x^k)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = f'(x^k)(0)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = 0[/tex]

I'm really not looking for an answer, but if someone could point me to where I could learn about taking differentials of undefined functions that would be helpful. I'm guessing that somehow I can extract a gradient out of this, and a dot product, but I'm feeling pretty confused.

Taking derivatives does not make sense unless ##f## is differentiable, so automatically continuous---in fact, even smoother than continuous. Maybe the book does not say that, but it is part of the reader's assumed background. Furthermore (just to be picky), expressions like ##\nabla f(x^k) \cdot d^k## depend on even more smoothness: ##f## must be continuously differentiable (not just plain differentiable). If the book does not make this clear, that is a flaw; it should definitely cover these points somewhere.

BTW: the stated result is false if ##f## is not continuously differentiable; there are some easy counterexamples.

rayge · Dec 14, 2014

Actually it is made clear in the beginning on the chapter that we're dealing with quadratic functions of the form [itex]f(x) = \frac{1}{2}\|Ax - b\|^2_2[/itex]. My fault for not reading closely enough.

Ray Vickson · Dec 14, 2014

rayge said:

Actually it is made clear in the beginning on the chapter that we're dealing with quadratic functions of the form [itex]f(x) = \frac{1}{2}\|Ax - b\|^2_2[/itex]. My fault for not reading closely enough.

OK, but the result is actually true for any continuously-differentiable ##f##.

rayge · Dec 14, 2014

Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k[/tex]
From here, it seems reasonable that [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex] is our optimal value but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e. we pass by the optimal point). I just don't have a solid grasp on how the function works, and assuming it's increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.

Ray Vickson · Dec 14, 2014

rayge said:

Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k[/tex]
From here, it seems reasonable that [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex] is our optimal value but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e. we pass by the optimal point). I just don't have a solid grasp on how the function works, and assuming it's increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.

You are searching for a minimum along a line, so you are trying to minimize a univariate function ##\phi(\alpha)##, which just happens to be of the form
[tex]\phi(\alpha) = f(x^{k-1} + \alpha d^k)[/tex]
for some vector constants ##x^{k-1}, d^k##. How do you minimize functions of one variable?

Ray Vickson · Dec 15, 2014

rayge said:

Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k[/tex]
From here, it seems reasonable that [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex] is our optimal value but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e. we pass by the optimal point). I just don't have a solid grasp on how the function works, and assuming it's increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.

I see that my previous response did not deal with all of your question. I cannot supply too many details without violating PF helping policies, but I can give a hint. In your specific case you have ##f(\vec{x}) = || A \vec{x} - \vec{b}||^2##. So letting ##\vec{y} = A \vec{x}^k - \vec{b}## and ##\vec{p} = A \vec{d}^k##, we have that ##\phi(\alpha) = f(\vec{x}^k + \vec{d}^k \alpha)## has the form ##\phi(\alpha) = ||\vec{y} + \alpha \vec{p}||^2 = a + b \alpha + c \alpha^2##. How do you find ##a, b, c##? Given that ##\vec{d}^k## is a descent direction, what does this say about the values or signs of ##a,b,c##? What do these signs say about the behavior of ##\phi(\alpha)##?

Iterative Minimization lemma proof

Homework Statement

The Attempt at a Solution

Homework Statement

The Attempt at a Solution

Homework Statement

The Attempt at a Solution

Hi! Can someone explain about Differential Equations?

Deriving spatial derivatives

Is this the correct general solution of the given PDE?

Investigating the real roots of a cubic equation

What does "compute Aut(G)" mean?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Iterative Minimization lemma proof

Homework Statement

The Attempt at a Solution

Homework Statement

The Attempt at a Solution

Homework Statement

The Attempt at a Solution

Similar threads