Iterative Minimization lemma proof

In summary, we are attempting to minimize a real-valued function f(x) with no other conditions on it. We use the relation x^k = x^{k-1} + \alpha_{k}d^k to choose the next x^k and assume d^k is a descent direction, i.e. that for small positive \alpha, f(x^{k-1} + \alpha d^k) is decreasing. The lemma we want to prove is that when x^k is constructed using the optimal \alpha, we have \nabla f(x^k) \cdot d^k = 0. To prove this, we differentiate the function f(x^{k-1} + \alpha d^k) with respect to \alpha.
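
In outline (assuming f is continuously differentiable and the optimal step size \alpha_k > 0 is an interior minimizer along the line), the chain rule gives
[tex]\frac{d}{d\alpha} f(x^{k-1} + \alpha d^k) = \nabla f(x^{k-1} + \alpha d^k) \cdot d^k,[/tex]
and this derivative must vanish at the optimal \alpha_k, which is exactly the claimed identity \nabla f(x^k) \cdot d^k = 0. The details are worked out in the replies below.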
  • #1
rayge

Homework Statement


[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.) We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex]. We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution


It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined. My first guess went something like this, but I don't see how I'm any closer to a solution.

[tex]\frac{\partial}{\partial \alpha} f(x^{k-1} + \alpha d^k) = \frac{\partial}{\partial \alpha} f(x^k) [/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = f'(x^k)(0)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = 0[/tex]

I'm really not looking for an answer, but if someone could point me to where I could learn about taking differentials of undefined functions that would be helpful. I'm guessing that somehow I can extract a gradient out of this, and a dot product, but I'm feeling pretty confused.
 
  • #2
rayge said:

Homework Statement


[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.)
If you're expected to differentiate [itex] f [/itex], it would have to be differentiable, hence continuous.

We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex].

You didn't say what [itex] x^k [/itex] is. I assume it is a vector, for example [itex] x^k = ( {x^k}_1, {x^k}_2 ) [/itex].

We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution


It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined.

Visualize all the arguments of [itex] f [/itex]. For example, [itex] f( {x^{k-1}}_1 + \alpha {d^k}_1, {x^{k-1}}_2 + \alpha {d^k}_2) [/itex].

Things might look more familiar in different notation. If we have a real-valued function [itex] f(x,y) [/itex] of two real variables [itex] x,y [/itex], and [itex] X(\alpha), \ Y(\alpha) [/itex] are real-valued functions of [itex] \alpha [/itex], then
[itex] \frac{df}{d\alpha} = \frac{\partial f}{\partial x}\frac{dX}{d\alpha} + \frac{\partial f}{\partial y} \frac{dY}{d\alpha} [/itex]
[itex] = \nabla f \cdot (\frac{dX}{d\alpha},\frac{dY}{d\alpha}) [/itex]
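
In the thread's notation, the same computation in [itex] n [/itex] variables reads (the gradient being evaluated at the moving point, and assuming [itex] f [/itex] is differentiable):
[tex]\frac{d}{d\alpha} f(x^{k-1} + \alpha d^k) = \sum_i \frac{\partial f}{\partial x_i}(x^{k-1} + \alpha d^k)\, {d^k}_i = \nabla f(x^{k-1} + \alpha d^k) \cdot d^k[/tex]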
 
  • #3
rayge said:

Homework Statement


[itex]f(x)[/itex] is the function we want to minimize. Beyond being real-valued, there are no other conditions on it. (I'm surprised it's not at least continuous, but the book doesn't say that's a condition.) We choose the next [itex]x^k[/itex] through the relation [itex]x^k = x^{k-1} + \alpha_{k}d^k[/itex]. We assume [itex]d^k[/itex] is a descent direction. That is, for small positive [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing.

Here's the lemma we want to prove:
When [itex]x^k[/itex] is constructed using the optimal [itex]\alpha[/itex], we have [itex]\nabla f(x^k) \cdot d^k = 0[/itex]

The Attempt at a Solution


It's suggested in the book that we should differentiate the function [itex]f(x^{k-1} + \alpha d^k)[/itex] with respect to [itex]\alpha[/itex]. My problem is that I don't know how to differentiate a function that isn't defined. My first guess went something like this, but I don't see how I'm any closer to a solution.

[tex]\frac{\partial}{\partial \alpha} f(x^{k-1} + \alpha d^k) = \frac{\partial}{\partial \alpha} f(x^k) [/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = f'(x^k)(0)[/tex]
[tex]f'(x^{k-1} + \alpha d^k)d^k = 0[/tex]

I'm really not looking for an answer, but if someone could point me to where I could learn about taking differentials of undefined functions that would be helpful. I'm guessing that somehow I can extract a gradient out of this, and a dot product, but I'm feeling pretty confused.

Taking derivatives does not make sense unless ##f## is differentiable, so automatically continuous---in fact, even smoother than continuous. Maybe the book does not say that, but it is part of the reader's assumed background. Furthermore (just to be picky), expressions like ##\nabla f(x^k) \cdot d^k## depend on even more smoothness: ##f## must be continuously differentiable (not just plain differentiable). If the book does not make this clear, that is a flaw; it should definitely cover these points somewhere.

BTW: the stated result is false if ##f## is not continuously differentiable; there are some easy counterexamples.
 
  • #4
Actually, it is made clear at the beginning of the chapter that we're dealing with quadratic functions of the form [itex]f(x) = \frac{1}{2}\|Ax - b\|^2_2[/itex]. My fault for not reading closely enough.
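
As a side check on that specific [itex]f[/itex], writing out the squared norm gives a closed form for the gradient:
[tex]f(x) = \tfrac{1}{2}\|Ax - b\|^2_2 = \tfrac{1}{2}(Ax - b)^T(Ax - b), \qquad \nabla f(x) = A^T(Ax - b)[/tex]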
 
  • #5
rayge said:
Actually it is made clear in the beginning on the chapter that we're dealing with quadratic functions of the form [itex]f(x) = \frac{1}{2}\|Ax - b\|^2_2[/itex]. My fault for not reading closely enough.

OK, but the result is actually true for any continuously-differentiable ##f##.
 
  • #6
Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k[/tex]
From here, it seems reasonable that the optimal [itex]\alpha[/itex] is the one where [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex], but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e., we pass the optimal point). I just don't have a solid grasp of how the function behaves, and assuming it is increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.
 
  • #7
rayge said:
Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k[/tex]
From here, it seems reasonable that the optimal [itex]\alpha[/itex] is the one where [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex], but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e., we pass the optimal point). I just don't have a solid grasp of how the function behaves, and assuming it is increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.

You are searching for a minimum along a line, so you are trying to minimize a univariate function ##\phi(\alpha)##, which just happens to be of the form
[tex] \phi(\alpha) = f(x^{k-1} + \alpha d^k) [/tex]
for some vector constants ##x^{k-1}, d^k##. How do you minimize functions of one variable?
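
To make the connection explicit (a sketch, assuming [itex]\phi[/itex] is differentiable and the optimal [itex]\alpha_k > 0[/itex] is an interior minimizer): the usual first-order condition for a one-variable minimum, combined with the chain rule, is exactly the lemma:
[tex]0 = \phi'(\alpha_k) = \nabla f(x^{k-1} + \alpha_k d^k) \cdot d^k = \nabla f(x^k) \cdot d^k[/tex]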
 
  • #8
rayge said:
Good to know, thanks. I applied the chain rule as suggested, and got:
[tex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = \nabla f \cdot d^k[/tex]
From here, it seems reasonable that the optimal [itex]\alpha[/itex] is the one where [itex]\frac{d f(x^{k-1} + \alpha d^k)}{d\alpha} = 0[/itex], but I don't know how to prove that from our assumptions. For small [itex]\alpha[/itex], [itex]f(x^{k-1} + \alpha d^k)[/itex] is decreasing. For large enough [itex]\alpha[/itex], I think [itex]f(x^{k-1} + \alpha d^k)[/itex] is increasing (i.e., we pass the optimal point). I just don't have a solid grasp of how the function behaves, and assuming it is increasing for large [itex]\alpha[/itex], why that is the case.

Thanks again.
I see that my previous response did not address all of your question. I cannot supply too many details without violating PF helping policies, but I can give a hint. In your specific case you have ##f(\vec{x}) = \frac{1}{2}|| A \vec{x} - \vec{b}||^2##. So letting ##\vec{y} = A \vec{x}^{k-1} - \vec{b}## and ##\vec{p} = A \vec{d}^k##, we have that ##\phi(\alpha) = f(\vec{x}^{k-1} + \alpha \vec{d}^k)## has the form ##\phi(\alpha) = \frac{1}{2}||\vec{y} + \alpha \vec{p}||^2 = a + b \alpha + c \alpha^2##. How do you find ##a, b, c##? Given that ##\vec{d}^k## is a descent direction, what does this say about the values or signs of ##a, b, c##? What do these signs say about the behavior of ##\phi(\alpha)##?
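
A minimal numerical sketch of this hint (NumPy, with made-up illustrative data; the steepest-descent direction is used here just as one convenient descent direction): it performs the exact line search for this quadratic and checks the lemma.
[code]
import numpy as np

# Sketch only: f(x) = 0.5 * ||A x - b||^2 with an exact line search.
# A, b, x_prev are made-up illustrative data; x_prev plays the role of x^{k-1}.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x_prev = rng.standard_normal(3)

def grad_f(x):
    # Gradient of 0.5 * ||A x - b||^2 is A^T (A x - b).
    return A.T @ (A @ x - b)

d = -grad_f(x_prev)               # steepest descent: one valid descent direction here

# phi(alpha) = 0.5 * ||y + alpha p||^2 with y = A x_prev - b and p = A d
# is a quadratic in alpha; its derivative y.p + alpha ||p||^2 vanishes at:
y = A @ x_prev - b
p = A @ d
alpha_star = -(y @ p) / (p @ p)

x_next = x_prev + alpha_star * d  # x^k built with the optimal alpha
print(grad_f(x_next) @ d)         # ~ 0 (up to rounding), as the lemma predicts
[/code]
The printed value is the dot product [itex]\nabla f(x^k) \cdot d^k[/itex], which should be zero up to floating-point error for any descent direction, not just steepest descent.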
 

1. What is the Iterative Minimization lemma?

The Iterative Minimization lemma is a statement about line-search methods in optimization. When the next iterate is chosen as x^k = x^{k-1} + \alpha_k d^k, with d^k a descent direction and \alpha_k the step size that exactly minimizes f along that direction, the gradient at the new point is orthogonal to the search direction: \nabla f(x^k) \cdot d^k = 0.

2. How is the Iterative Minimization lemma used in scientific research?

The Iterative Minimization lemma is commonly used in various fields of science such as computer science, engineering, and economics to solve optimization problems. It provides a systematic approach to finding the optimal solution to a problem.

3. What is the proof of the Iterative Minimization lemma?

The proof differentiates \phi(\alpha) = f(x^{k-1} + \alpha d^k) with respect to \alpha using the chain rule, giving \phi'(\alpha) = \nabla f(x^{k-1} + \alpha d^k) \cdot d^k. At the optimal step size this one-variable derivative must vanish, which is exactly the identity \nabla f(x^k) \cdot d^k = 0. The argument assumes that f is continuously differentiable and that the optimal \alpha_k is an interior minimizer along the line.

4. What are the limitations of the Iterative Minimization lemma?

The Iterative Minimization lemma may not always provide the optimal solution to a problem, as it relies on certain assumptions about the function being minimized. It may also converge slowly or not converge at all in some cases.

5. Are there any real-world applications of the Iterative Minimization lemma?

Yes, the Iterative Minimization lemma has many practical applications in fields such as machine learning, signal processing, and operations research. It is used to optimize various processes and systems, such as in image reconstruction, data compression, and resource allocation.
