Show that grad f(x) = 0 when x is a local minimum of f

JulienB

Homework Statement



Hi everybody! I'm preparing a maths exam and I am unsure how to answer this question:

What is a local extremum of a function ##f: \mathbb{R}^2 \to \mathbb{R}##? Explain why ##\nabla f(x) = 0## when ##x## is a local minimum of ##f## and ##f## is continuous and differentiable.

Homework Equations



In the chapter about local extrema, the script of my teacher says:

"##f: M \to \mathbb{R}## (##M \subseteq \mathbb{R}^n## is an open set) is the function for which the local extrema will be searched, ##g: M \to \mathbb{R}^m## describes the additional conditions, and ##m \in {0, 1, ... , n}## is the number of additional conditions.
A point ##x_0 \in M## with ##g(x_0) = 0## is called local minimum of ##f## under the additional condition ##g(x) = 0## if ##\exists \epsilon > 0##, so that ##\forall x \in M## with ##0 < || x - x_0 || < \epsilon## the statement ##f(x) \geq f(x_0)## is valid."

The Attempt at a Solution



I don't really get the point of the question about ##\nabla f(x) = 0##. I would say that since ##f'(x) = \nabla f(x)##, ##\nabla f(x)## describes the slope of the function at point ##x##. If the slope is ##0##, then there is a local extremum at point ##x##. So if ##f## has a local minimum (or local maximum) at point ##x##, ##\nabla f(x) = 0##. Would you say that is enough? It is always nicer to answer such questions with a mathematical description rather than with words... Do you guys have a suggestion?

The script of my teacher confuses me. I don't understand what those additional conditions are. Is ##g## equivalent to ##f'##? That would be weird, but it would explain why ##g(x) = 0## indicates a local extremum for ##f##. If not, can someone briefly explain to me what that is?

There is also something about matrices (apparently the Hessian matrix especially). I tried to manipulate it, but without knowing the function it didn't take me anywhere. Is it relevant for this question?

Thanks a lot in advance for your answers, and sorry I posted so many questions. I would really appreciate your help. :) Julien.
 
JulienB said:
If the slope is 0, then there is a local extremum at point ##x##.
If there is a local extremum at ##x##, then ##\nabla f(x) = 0##. (So by solving ##\nabla f(x) = 0## you find the candidates for being a local extremum.) The converse is not true: If ##\nabla f(x) = 0## then ##x## could also be a saddle point (= an "inflection point" in case ##n = 1##.) One way to be sure is to look at the eigenvalues of the second derivative (the Hessian) of ##f## at ##x##, if it exists.
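For a concrete illustration of that last point (a standard example, not part of your exercise): take ##f(x,y) = x^2 - y^2##. Then ##\nabla f(x,y) = (2x, -2y)##, so ##\nabla f(0,0) = 0##, but the Hessian at the origin,
$$
\begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix},
$$
has eigenvalues ##2## and ##-2## of opposite sign, so ##(0,0)## is a saddle point and not a local extremum.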

JulienB said:
So if ##f## has a local minimum (or local maximum) at point ##x##, ##\nabla f(x) = 0.##

Exactly.

JulienB said:
Would you say that is enough? It is always nicer to answer such questions with a mathematical description rather than with words... Do you guys have a suggestion?

It depends. If this is indeed a mathematics exam, I would expect him to want a rigorous proof. So, assume that ##x \in M## is a local extremum and prove that this implies that ##\nabla f(x) = 0##. As it stands, you have not proven that yet.

JulienB said:
The script of my teacher confuses me. I don't understand what those additional conditions are. Is ##g## equivalent to ##f'##? That would be weird, but it would explain why ##g(x) = 0## indicates a local extremum for ##f##. If not, can someone briefly explain to me what that is?

Your teacher is likely talking about constrained optimization (with equality constraints). What you have been talking about so far is unconstrained optimization. In the latter case there are no side conditions, which corresponds to the case ##m = 0## in your teacher's definition.
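As a small example of what one such side condition (##m = 1##) looks like, not taken from your teacher's script: minimize ##f(x,y) = x^2 + y^2## subject to ##g(x,y) = x + y - 1 = 0##. The constrained minimum is the point of the line ##x + y = 1## closest to the origin,
$$
x_0 = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad f(x_0) = \tfrac{1}{2}, \qquad \nabla f(x_0) = (1, 1) \neq 0,
$$
so at a constrained minimum the gradient of ##f## need not vanish; the criterion ##\nabla f(x) = 0## is the one for the unconstrained case ##m = 0## that your question is about.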

If you want to learn how to do rigorous proofs, I would advise you to first try to tackle the unconstrained case.
 
Hi @Krylov, and first of all thank you for your answer. The part about constrained optimization has helped me a bit, especially to understand that this case is unconstrained. Here is what I wrote after reading your post:

##f:\mathbb{R}^2 \to \mathbb{R}##, ##x \in \mathbb{R}^2##
##x## local minimum of ##f##, ##f## continuous and differentiable ##\implies## ##f'(x) = ( \partial_1 f(x)\ \partial_2 f(x) ) = (0\ 0)##
##\nabla f(x) = ( \partial_1 f(x), \partial_2 f(x) ) = (0, 0) = \vec{0}##

Is that a more mathematically rigorous proof then? Thanks a lot. Julien.
 
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers. Julien.
 
JulienB said:
Hi @Krylov and first thank you for your answer. The part about constrained optimization has helped me a bit, especially to understand that this case is unconstrained.
:)

Let me also point out that I find your teacher's definition

JulienB said:
A point ##x_0 \in M## with ##g(x_0) = 0## is called a local minimum of ##f## under the additional condition ##g(x) = 0## if ##\exists \epsilon > 0## such that ##\forall x \in M## with ##0 < \| x - x_0 \| < \epsilon## the statement ##f(x) \geq f(x_0)## holds.

only makes sense when ##m = 0## (i.e. there are no constraints). Otherwise it does not make sense, because the defining part of the definition does not mention ##g## anywhere.

JulienB said:
Here is what I wrote after reading your post:

##f:\mathbb{R}^2 \to \mathbb{R}##, ##x \in \mathbb{R}^2##
##x## local minimum of ##f##, ##f## continuous and differentiable ##\implies## ##f'(x) = ( \partial_1 f(x)\ \partial_2 f(x) ) = (0\ 0)##
##\nabla f(x) = ( \partial_1 f(x), \partial_2 f(x) ) = (0, 0) = \vec{0}##

Is that a more mathematically rigorous proof then?

I'm afraid I would have to say "no". Surely I am not the "gold standard", but the way I would write it is something like this:

Let ##f : M \subseteq \mathbb{R}^2 \to \mathbb{R}## be differentiable on the open set ##M##. (You could assume ##M = \mathbb{R}^2## for simplicity. This does not change much.) If ##f## has a local minimum at ##x_0 \in M##, then ##\nabla f(x_0) = 0##.

Proof: By definition, there exists ##\epsilon > 0## such that for all ##x \in M## with ##\|x - x_0\| < \epsilon## the statement ##f(x) \ge f(x_0)## holds. (This is just your teacher's definition.) We need to prove that ##\nabla f(x_0) = 0##. I will give you the start of the proof (which works equally well in spaces of higher dimension), so you can try to proceed.

Suppose that ##\nabla f(x_0) \neq 0##. Then there exists some ##h \in \mathbb{R}^2## such that ##\nabla f(x_0) \cdot h \neq 0##. By replacing ##h## by ##-h##, if needed, we may assume that ##\nabla f(x_0) \cdot h < 0##. Now, by the definition of the differentiability of ##f## at ##x_0##,
$$
f(x) = f(x_0) + \nabla f(x_0) \cdot(x - x_0) + R(\|x - x_0\|)
$$
for all ##x## sufficiently close to ##x_0##. (Here the remainder term ##R(\|x - x_0\|) = o(\|x - x_0\|)## as ##x \to x_0##.) So, in particular, for ##x = x_0 + t h## (with ##t > 0## small enough such that ##\|x - x_0\| < \epsilon##), we have
$$
0 \le f(x_0 + th) - f(x_0) = t \left[\nabla f(x_0) \cdot h + \frac{1}{t}R(t\|h\|)\right]
$$
Now, how do you obtain a contradiction?

It really depends on what your teacher is after. If he wants an entirely rigorous mathematical proof, I would give him something like the above. If he is fine with something more hand-waving, then you could get away with less.
 
JulienB said:
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.

Julien.

I would define "local extremum" following the definition of your teacher ("There exists ##\epsilon > 0## such that ##f(x) \ge f(x_0)## (min.) or ##f(x) \le f(x_0)## (max.) for all ##x \in M## with ##\|x - x_0\| < \epsilon##.")

If ##f## happens to be differentiable at ##x_0##, there is a necessary condition for an extremum (indeed, ##\nabla f(x_0) = 0##). If in addition ##\nabla f## is also differentiable close to ##x_0## and its derivative is continuous, then you can use the definiteness of the Hessian as a sufficient condition. When ##n = 2## this definiteness can be verified by inspecting the sign of ##\partial_1^2 f(\vec{x})## together with the sign of the determinant of the Hessian, as you do when you write ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##. The inequality needs to be strict, though.
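As a small worked instance of that test (my own example, not from the exercise): take ##f(x,y) = x^2 + xy + y^2##. Then ##\nabla f(0,0) = 0## and the Hessian is
$$
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
$$
with ##\partial_1^2 f = 2 > 0## and determinant ##2 \cdot 2 - 1^2 = 3 > 0##, so the Hessian is positive definite and ##(0,0)## is a strict local minimum. With ##\geq## instead of ##>## the test would also let through degenerate (semi-definite) cases in which no conclusion can be drawn.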
 
JulienB said:
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.Julien.

There is a bit of a mismatch between second-order necessary and sufficient conditions for a minimum. Assuming the function is ##C^2## in a neighbourhood of ##p_0 =(x_0,y_0)##, we first of all need ##f_x(p_0) = f_y(p_0) = 0## (where ##f_x, f_y## are the partial derivatives). That is a first-order necessary condition.

For second-order we have:

(1) Necessary (not sufficient) conditions for a minimum at ##p_0## are ##f_{xx}, f_{yy} \geq 0## and (as you say) ##f_{xx} f_{yy} \geq f_{xy}^2## (all evaluated at ##p_0##). Basically, the Hessian matrix must be positive semi-definite. (Note that besides the second "mixed" condition you gave, we also need the individual conditions on ##f_{xx}## and ##f_{yy}##, because without them we could have ##f_{xx} < 0, f_{yy} < 0## but ##f_{xx} f_{yy} \geq f_{xy}^2##, and that would correspond to a local maximum, not a minimum.)
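A quick illustration of that parenthetical remark: for ##f(x,y) = -x^2 - y^2## we have, at the origin,
$$
f_{xx} = f_{yy} = -2 < 0, \qquad f_{xy} = 0, \qquad f_{xx} f_{yy} = 4 \geq 0 = f_{xy}^2,
$$
so the determinant condition alone is satisfied, yet ##(0,0)## is a strict local maximum, not a minimum.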

To see that these are not sufficient, look at the function ##f(x,y) = (y-x^2)(y - 2x^2)##. Above the upper parabola ##U: y = 2 x^2## both factors of ##f## are > 0, so ##f > 0##. Below the lower parabola ##L: y = x^2## both factors are < 0, so ##f > 0## again. However, between the two parabolas (that is, where ##x^2 < y < 2 x^2##) the two factors have opposite signs, so ##f < 0##. Thus, the origin ##O = (0,0)## is neither a local max nor a local min. However, we have ##f_x =0, f_y=0## at ##O##, and the Hessian matrix is
$$H(x,y) \equiv \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} = \begin{pmatrix} 24 x^2 - 6y & -6x \\ -6x & 2 \end{pmatrix}, $$
so
$$H(O) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}. $$
All the second-order necessary conditions hold at ##O##.

(2) Sufficient (not necessary) conditions for a strict local minimum at ##p_0## are that ##f_{xx} > 0, f_{yy} > 0## and ##f_{xx} f_{yy} > f_{xy}^2##. This states that the Hessian matrix is positive definite.

To see that these are not necessary, look at ##f(x,y) = x^4 + y^4##. The origin ##O = (0,0)## is a strict global minimum of ##f##, but the sufficient conditions fail to hold at ##O##.
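To spell that out: at the origin all second partial derivatives of ##x^4 + y^4## vanish,
$$
f_{xx}(O) = f_{yy}(O) = f_{xy}(O) = 0,
$$
so the Hessian at ##O## is the zero matrix and the strict inequalities in (2) fail; nevertheless ##f(x,y) = x^4 + y^4 > 0 = f(O)## for all ##(x,y) \neq (0,0)##, so ##O## is a strict global minimum.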

By the way: the function ##f## in (1) above was something I heard about in a seminar by Adi Ben-Israel many years ago. It has the intriguing property that if you go a short distance away from ##(0,0)## along ANY line, the function goes up strictly (that is, ##f(p) > f(0,0)## for all nonzero ##p## a short distance along the line). Nevertheless, ##(0,0)## is NOT a local minimum.
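A quick way to verify that property (just spelling out the claim): along the line ##y = mx## with ##m \neq 0##,
$$
g(t) := f(t, mt) = (mt - t^2)(mt - 2t^2) = m^2 t^2 - 3m t^3 + 2t^4,
$$
so ##g(0) = 0##, ##g'(0) = 0## and ##g''(0) = 2m^2 > 0##, i.e. ##t = 0## is a strict local minimum along that line; along the ##x##-axis ##f(t,0) = 2t^4 > 0## and along the ##y##-axis ##f(0,y) = y^2 > 0## for nonzero arguments. But along the parabola ##y = \tfrac{3}{2}x^2##, which lies between ##L## and ##U##,
$$
f\!\left(x, \tfrac{3}{2}x^2\right) = \left(\tfrac{1}{2}x^2\right)\left(-\tfrac{1}{2}x^2\right) = -\tfrac{1}{4}x^4 < 0 \qquad (x \neq 0),
$$
so ##(0,0)## is indeed not a local minimum.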
 
Mentor note: Helpers are reminded not to provide complete solutions or to do significant portions of the work towards such a solution. Instead, the questioner should be encouraged towards the solution via hints and suggestions or the pointing out of errors or misconceptions.
 
Thank you everybody for your answers, this thread has helped me a lot! I'm back to my books :) Julien.
 
