Iterative Minimization lemma proof

In summary, we are attempting to minimize a real-valued function f(x) with no other conditions on it. We choose the next iterate through the relation x^k = x^{k-1} + \alpha_k d^k and assume d^k is a descent direction, i.e. that for small positive \alpha, f(x^{k-1} + \alpha d^k) is decreasing. The lemma we want to prove is that when x^k is constructed using the optimal \alpha, we have \nabla f(x^k) \cdot d^k = 0. To prove this, we differentiate the function f(x^{k-1} + \alpha d^k) with respect to \alpha.
  • #1

Homework Statement


[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.) We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex]. We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution


It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined. My first guess went something like this, but I don't see how I'm any closer to a solution.

[tex]\frac{\partial}{\partial \alpha} f(x^{k-1} + \alpha d^k) = \frac{\partial}{\partial \alpha} f(x^k) [/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = f'(x^k)(0)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = 0[/tex]

I'm really not looking for an answer, but if someone could point me to where I could learn about taking differentials of undefined functions that would be helpful. I'm guessing that somehow I can extract a gradient out of this, and a dot product, but I'm feeling pretty confused.
 
  • #2
rayge said:

Homework Statement


[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.)
If you're expected to differentiate [itex]f[/itex], it would have to be differentiable, hence continuous.

We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex].

You didn't say what [itex] x^k [/itex] is. I assume it is a vector; for example, [itex] x^k = ( {x^k}_1, {x^k}_2 ) [/itex].

We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution


It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined.

Visualize all the arguments of [itex]f[/itex]. For example, [itex] f( {x^{k-1}}_1 + \alpha {d^k}_1,\ {x^{k-1}}_2 + \alpha {d^k}_2) [/itex].

Things might look more familiar in different notation. If we have a real-valued function [itex] f(x,y) [/itex] of two real variables [itex] x,y [/itex], and [itex] X(\alpha), \ Y(\alpha) [/itex] are real-valued functions of [itex] \alpha [/itex], then
[itex] \frac{df}{d\alpha} = \frac{\partial f}{\partial x}\frac{dX}{d\alpha} + \frac{\partial f}{\partial y} \frac{dY}{d\alpha} [/itex]
[itex] = \nabla f \cdot \left(\frac{dX}{d\alpha},\frac{dY}{d\alpha}\right) [/itex]
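If it helps to see this concretely, here is a quick throwaway check (the function, its gradient, and the numbers below are examples I made up, not anything from your book) that the [itex]\alpha[/itex]-derivative of [itex]f(x^{k-1} + \alpha d^k)[/itex] agrees with the chain-rule expression:

[code]
import numpy as np

# A made-up smooth function of two variables, with its gradient worked out by hand:
# f(x) = sin(x_1) + x_1 * x_2^2
def f(x):
    return np.sin(x[0]) + x[0] * x[1]**2

def grad_f(x):
    return np.array([np.cos(x[0]) + x[1]**2, 2.0 * x[0] * x[1]])

x_prev = np.array([0.3, -1.2])   # plays the role of x^{k-1}
d      = np.array([1.0,  0.5])   # plays the role of d^k
alpha  = 0.7

# phi(alpha) = f(x^{k-1} + alpha * d^k), differentiated two ways
phi = lambda a: f(x_prev + a * d)
h = 1e-6
finite_diff = (phi(alpha + h) - phi(alpha - h)) / (2 * h)   # numerical derivative
chain_rule  = grad_f(x_prev + alpha * d) @ d                # gradient dotted with d^k

print(finite_diff, chain_rule)   # the two values agree to about 1e-9
[/code]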
 
  • #3
rayge said:

[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.)

Taking derivatives does not make sense unless ##f## is differentiable, so automatically continuous---in fact, even smoother than continuous. Maybe the book does not say that, but it is part of the reader's assumed background. Furthermore (just to be picky), expressions like ##\nabla f(x^k) \cdot d^k## depend on even more smoothness: ##f## must be continuously differentiable (not just plain differentiable). If the book does not make this clear, that is a flaw; it should definitely cover these points somewhere.

BTW: the stated result is false if ##f## is not continuously differentiable; there are some easy counterexamples.
 
  • #4
Actually it is made clear in the beginning on the chapter that we're dealing with quadratic functions of the form [itex]f(x) = \frac{1}{2}\|Ax - b\|^2_2[/itex]. My fault for not reading closely enough.
 
  • #5
rayge said:
Actually it is made clear in the beginning on the chapter that we're dealing with quadratic functions of the form [itex]f(x) = \frac{1}{2}\|Ax - b\|^2_2[/itex]. My fault for not reading closely enough.

OK, but the result is actually true for any continuously-differentiable ##f##.
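As an aside (the thread doesn't derive this, but it is just the chain rule applied to the squared norm), for that particular ##f## the gradient is
[tex]\nabla f(x) = A^T(Ax - b),[/tex]
so the lemma's conclusion ##\nabla f(x^k) \cdot d^k = 0## can equivalently be written as ##(Ax^k - b)\cdot(A d^k) = 0##.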
 
  • #6
Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f(x^{k-1} + \alpha d^k) \cdot d^k[/tex]
From here, it seems reasonable that the optimal [itex]\alpha[/itex] is the one where [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex], but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e. we pass by the optimal point). I just don't have a solid grasp on how the function works, or, assuming it is increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.
 
  • #7
rayge said:

From here, it seems reasonable that the optimal [itex]\alpha[/itex] is the one where [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex], but I don't know how to prove that from our assumptions.

You are searching for a minimum along a line, so you are trying to minimize a univariate function ##\phi(\alpha)##, which just happens to be of the form
[tex] \phi(\alpha) = f(x^{k-1} + \alpha d^k) [/tex]
for some vector constants ##x^{k-1}, d^k##. How do you minimize functions of one variable?
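For what it is worth, here is a small numerical sketch of exactly this one-variable minimization (my own illustration, not from any book; the random ##A##, ##b## and the steepest-descent choice of ##d^k## are just placeholders), together with a check of the lemma's conclusion:

[code]
import numpy as np
from scipy.optimize import minimize_scalar

# The quadratic objective from post #4: f(x) = 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

f      = lambda x: 0.5 * np.linalg.norm(A @ x - b)**2
grad_f = lambda x: A.T @ (A @ x - b)

x_prev = np.zeros(3)        # the current iterate x^{k-1}
d      = -grad_f(x_prev)    # one valid descent direction d^k (steepest descent)

# phi is an ordinary function of one real variable; minimize it as such
phi = lambda alpha: f(x_prev + alpha * d)
alpha_opt = minimize_scalar(phi).x

x_next = x_prev + alpha_opt * d
print(grad_f(x_next) @ d)   # ~0: the lemma's claim, grad f(x^k) . d^k = 0
[/code]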
 
  • #8
rayge said:

I just don't have a solid grasp on how the function works, or, assuming it is increasing for large [itex]\alpha[/itex], why that is the case.
I see that my previous response did not deal with all of your question. I cannot supply too many details without violating PF helping policies, but I can give a hint. In your specific case you have ##f(\vec{x}) = \frac{1}{2}\| A \vec{x} - \vec{b}\|^2##. So letting ##\vec{y} = A \vec{x}^{k-1} - \vec{b}## and ##\vec{p} = A \vec{d}^k##, we have that ##\phi(\alpha) = f(\vec{x}^{k-1} + \alpha \vec{d}^k)## has the form ##\phi(\alpha) = \frac{1}{2}\|\vec{y} + \alpha \vec{p}\|^2 = a + b \alpha + c \alpha^2##. How do you find ##a, b, c##? Given that ##\vec{d}^k## is a descent direction, what does this say about the values or signs of ##a,b,c##? What do these signs say about the behavior of ##\phi(\alpha)##?
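For reference, the coefficients come from expanding the squared norm; this is standard algebra filling in the step the hint leaves as an exercise, nothing beyond it:
[tex]\phi(\alpha) = \tfrac{1}{2}\|\vec{y} + \alpha \vec{p}\|^2 = \tfrac{1}{2}\|\vec{y}\|^2 + \alpha\, \vec{y}\cdot\vec{p} + \tfrac{1}{2}\alpha^2 \|\vec{p}\|^2,[/tex]
so ##a = \tfrac{1}{2}\|\vec{y}\|^2##, ##b = \vec{y}\cdot\vec{p}##, and ##c = \tfrac{1}{2}\|\vec{p}\|^2 \ge 0##. Since ##b = \phi'(0)##, the descent-direction assumption forces ##b < 0##, and ##c > 0## whenever ##\vec{p} = A\vec{d}^k \ne 0##, so ##\phi## is an upward-opening parabola: it decreases at first, reaches its minimum at ##\alpha = -b/(2c)##, and increases thereafter, which is exactly the behavior asked about in post #6.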
 
