Show that grad f(x) = 0 when x is a local minimum of f

JulienB

Homework Statement



Hi everybody! I'm preparing a maths exam and I am unsure how to answer this question:

What is a local extremum of a function ##f: \mathbb{R}^2 \to \mathbb{R}##? Explain why ##\nabla f(x) = 0## when ##x## is a local minimum of ##f## and ##f## is continuous and differentiable.

Homework Equations



In the chapter about local extrema, the script of my teacher says:

"##f: M \to \mathbb{R}## (##M \subseteq \mathbb{R}^n## is an open set) is the function for which the local extrema will be searched, ##g: M \to \mathbb{R}^m## describes the additional conditions, and ##m \in {0, 1, ... , n}## is the number of additional conditions.
A point ##x_0 \in M## with ##g(x_0) = 0## is called local minimum of ##f## under the additional condition ##g(x) = 0## if ##\exists \epsilon > 0##, so that ##\forall x \in M## with ##0 < || x - x_0 || < \epsilon## the statement ##f(x) \geq f(x_0)## is valid."

The Attempt at a Solution



I don't really get the point of the question about ##\nabla f(x) = 0##. I would say that since ##f'(x) = \nabla f(x)##, ##\nabla f(x)## describes the slope of the function at point ##x##. If the slope is ##0##, then there is a local extremum at point ##x##. So if ##f## has a local minimum (or local maximum) at point ##x##, ##\nabla f(x) = 0##. Would you say that is enough? It is always nicer to answer such questions with a mathematical description rather than with words... Do you guys have a suggestion?

The script of my teacher confuses me. I don't understand what those additional conditions are. Is ##g## equivalent to ##f'##? That would be weird, but it would explain why ##g(x) = 0## indicates a local extremum for ##f##. If not, can someone briefly explain to me what that is?

There is also something about matrices (apparently the Hessian matrix especially). I tried to manipulate it, but without knowing the function it didn't take me anywhere. Is it relevant for this question?

Thanks a lot in advance for your answers, and sorry I posted so many questions. I would really appreciate your help. :) Julien.
 
JulienB said:
If the slope is 0, then there is a local extremum at point ##x##.
If there is a local extremum at ##x##, then ##\nabla f(x) = 0##. (So by solving ##\nabla f(x) = 0## you find the candidates for being a local extremum.) The converse is not true: If ##\nabla f(x) = 0## then ##x## could also be a saddle point (= an "inflection point" in case ##n = 1##.) One way to be sure is to look at the eigenvalues of the second derivative (the Hessian) of ##f## at ##x##, if it exists.
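For a concrete illustration of that last point (a standard example, not part of your exercise): take ##f(x,y) = x^2 - y^2##. Then ##\nabla f(x,y) = (2x, -2y)##, so ##\nabla f(0,0) = 0##, but the Hessian at the origin,
$$
\begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix},
$$
has eigenvalues ##2## and ##-2## of opposite sign, so ##(0,0)## is a saddle point and not a local extremum.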

JulienB said:
So if ##f## has a local minimum (or local maximum) at point ##x##, ##\nabla f(x) = 0.##

Exactly.

JulienB said:
Would you say that is enough? It is always nicer to answer such questions with a mathematical description rather than with words... Do you guys have a suggestion?

It depends. If this is indeed a mathematics exam, I would expect him to want a rigorous proof. So, assume that ##x \in M## is a local extremum and prove that this implies that ##\nabla f(x) = 0##. As it stands, you have not proven that yet.

JulienB said:
The script of my teacher confuses me. I don't understand what those additional conditions are. Is ##g## equivalent to ##f'##? That would be weird, but it would explain why ##g(x) = 0## indicates a local extremum for ##f##. If not, can someone briefly explain to me what that is?

Your teacher is likely talking about constrained optimization (with equality constraints). What you have been talking about so far is unconstrained optimization. In the latter case there are no side conditions, which corresponds to the case ##m = 0## in your teacher's definition.
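As a small example of what one such side condition (##m = 1##) looks like, not taken from your teacher's script: minimize ##f(x,y) = x^2 + y^2## subject to ##g(x,y) = x + y - 1 = 0##. The constrained minimum is the point of the line ##x + y = 1## closest to the origin,
$$
x_0 = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad f(x_0) = \tfrac{1}{2}, \qquad \nabla f(x_0) = (1, 1) \neq 0,
$$
so at a constrained minimum the gradient of ##f## need not vanish; the criterion ##\nabla f(x) = 0## is the one for the unconstrained case ##m = 0## that your question is about.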

If you want to learn how to do rigorous proofs, I would advise you to first try to tackle the unconstrained case.
 
Hi @Krylov, and first of all thank you for your answer. The part about constrained optimization has helped me a bit, especially to understand that this case is unconstrained. Here is what I wrote after reading your post:

##f:\mathbb{R}^2 \to \mathbb{R}##, ##x \in \mathbb{R}^2##
##x## local minimum of ##f##, ##f## continuous and differentiable ##\implies## ##f'(x) = ( \partial_1 f(x)\ \partial_2 f(x) ) = (0\ 0)##
##\nabla f(x) = ( \partial_1 f(x), \partial_2 f(x) ) = (0, 0) = \vec{0}##

Is that a more mathematically rigorous proof then? Thanks a lot. Julien.
 
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers. Julien.
 
JulienB said:
Hi @Krylov and first thank you for your answer. The part about constrained optimization has helped me a bit, especially to understand that this case is unconstrained.
:)

Let me also point out that I find your teacher's definition

JulienB said:
A point ##x_0 \in M## with ##g(x_0) = 0## is called a local minimum of ##f## under the additional condition ##g(x) = 0## if ##\exists \epsilon > 0## such that ##\forall x \in M## with ##0 < \| x - x_0 \| < \epsilon## the statement ##f(x) \geq f(x_0)## holds.

only makes sense when ##m = 0## (i.e. there are no constraints). Otherwise it does not make sense, because the defining part of the definition does not mention ##g## anywhere.

JulienB said:
Here is what I wrote after reading your post:

##f:\mathbb{R}^2 \to \mathbb{R}##, ##x \in \mathbb{R}^2##
##x## local minimum of ##f##, ##f## continuous and differentiable ##\implies## ##f'(x) = ( \partial_1 f(x)\ \partial_2 f(x) ) = (0\ 0)##
##\nabla f(x) = ( \partial_1 f(x), \partial_2 f(x) ) = (0, 0) = \vec{0}##

Is that a more mathematically rigorous proof then?

I'm afraid I would have to say "no". Surely I am not the "gold standard", but the way I would write it is something like this:

Let ##f : M \subseteq \mathbb{R}^2 \to \mathbb{R}## be differentiable on the open set ##M##. (You could assume ##M = \mathbb{R}^2## for simplicity. This does not change much.) If ##f## has a local minimum at ##x_0 \in M##, then ##\nabla f(x_0) = 0##.

Proof: By definition, there exists ##\epsilon > 0## such that for all ##x \in M## with ##\|x - x_0\| < \epsilon## the statement ##f(x) \ge f(x_0)## holds. (This is just your teacher's definition.) We need to prove that ##\nabla f(x_0) = 0##. I will give you the start of the proof (which works equally well in spaces of higher dimension), so you can try to proceed.

Suppose that ##\nabla f(x_0) \neq 0##. Then there exists some ##h \in \mathbb{R}^2## such that ##\nabla f(x_0) \cdot h \neq 0##. By replacing ##h## by ##-h##, if needed, we may assume that ##\nabla f(x_0) \cdot h < 0##. Now, by the definition of the differentiability of ##f## at ##x_0##,
$$
f(x) = f(x_0) + \nabla f(x_0) \cdot(x - x_0) + R(\|x - x_0\|)
$$
for all ##x## sufficiently close to ##x_0##. (Here the remainder term ##R(\|x - x_0\|) = o(\|x - x_0\|)## as ##x \to x_0##.) So, in particular, for ##x = x_0 + t h## (with ##t > 0## small enough such that ##\|x - x_0\| < \epsilon##), we have
$$
0 \le f(x_0 + th) - f(x_0) = t \left[\nabla f(x_0) \cdot h + \frac{1}{t}R(t\|h\|)\right]
$$
Now, how do you obtain a contradiction?

It really depends on what your teacher is after. If he wants an entirely rigorous mathematical proof, I would give him something like the above. If he is fine with something more hand-waving, then you could get away with less.
 
JulienB said:
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.

Julien.

I would define "local extremum" following the definition of your teacher ("There exists ##\epsilon > 0## such that ##f(x) \ge f(x_0)## (min.) or ##f(x) \le f(x_0)## (max.) for all ##x \in M## with ##\|x - x_0\| < \epsilon##.")

If ##f## happens to be differentiable at ##x_0##, there is a necessary condition for an extremum (indeed, ##\nabla f(x_0) = 0##). If in addition ##\nabla f## is also differentiable close to ##x_0## and its derivative is continuous, then you can use the definiteness of the Hessian as a sufficient condition. When ##n = 2## this definiteness can be verified by inspecting the sign of ##\partial_1^2 f(\vec{x})## together with the sign of the determinant of the Hessian, as you do when you write ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##. The inequality needs to be strict, though.
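As a small worked instance of that test (my own example, not from the exercise): take ##f(x,y) = x^2 + xy + y^2##. Then ##\nabla f(0,0) = 0## and the Hessian is
$$
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
$$
with ##\partial_1^2 f = 2 > 0## and determinant ##2 \cdot 2 - 1^2 = 3 > 0##, so the Hessian is positive definite and ##(0,0)## is a strict local minimum. With ##\geq## instead of ##>## the test would also let through degenerate (semi-definite) cases in which no conclusion can be drawn.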
 
JulienB said:
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.Julien.

There is a bit of a mismatch between second-order necessary and sufficient conditions for a minimum. Assuming the function is ##C^2## in a neighbourhood of ##p_0 =(x_0,y_0)##, we first of all need ##f_x(p_0) = f_y(p_0) = 0## (where ##f_x, f_y## are the partial derivatives). That is a first-order necessary condition.

For second-order we have:

(1) Necessary (not sufficient) conditions for a minimum at ##p_0## are ##f_{xx}, f_{yy} \geq 0## and (as you say) ##f_{xx} f_{yy} \geq f_{xy}^2## (all evaluated at ##p_0##). Basically, the Hessian matrix must be positive semi-definite. (Note that besides the second "mixed" condition you gave, we also need the individual conditions on ##f_{xx}## and ##f_{yy}##, because without them we could have ##f_{xx} < 0, f_{yy} < 0## but ##f_{xx} f_{yy} \geq f_{xy}^2##, and that would correspond to a local maximum, not a minimum.)
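A quick illustration of that parenthetical remark: for ##f(x,y) = -x^2 - y^2## we have, at the origin,
$$
f_{xx} = f_{yy} = -2 < 0, \qquad f_{xy} = 0, \qquad f_{xx} f_{yy} = 4 \geq 0 = f_{xy}^2,
$$
so the determinant condition alone is satisfied, yet ##(0,0)## is a strict local maximum, not a minimum.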

To see that these are not sufficient, look at the function ##f(x,y) = (y-x^2)(y - 2x^2)##. Above the upper parabola ##U: y = 2 x^2## both factors of ##f## are > 0, so ##f > 0##. Below the lower parabola ##L: y = x^2## both factors are < 0, so ##f > 0## again. However, between the two parabolas (that is, where ##x^2 < y < 2 x^2##) the two factors have opposite signs, so ##f < 0##. Thus, the origin ##O = (0,0)## is neither a local max nor a local min. However, we have ##f_x =0, f_y=0## at ##O##, and the Hessian matrix is
$$H(x,y) \equiv \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} = \begin{pmatrix} 24 x^2 - 6y & -6x \\ -6x & 2 \end{pmatrix}, $$
so
$$H(O) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}. $$
All the second-order necessary conditions hold at ##O##.

(2) Sufficient (not necessary) conditions for a strict local minimum at ##p_0## are that ##f_{xx} > 0, f_{yy} > 0## and ##f_{xx} f_{yy} > f_{xy}^2##. This states that the Hessian matrix is positive definite.

To see that these are not necessary, look at ##f(x,y) = x^4 + y^4##. The origin ##O = (0,0)## is a strict global minimum of ##f##, but the sufficient conditions fail to hold at ##O##.
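To spell that out: at the origin all second partial derivatives of ##x^4 + y^4## vanish,
$$
f_{xx}(O) = f_{yy}(O) = f_{xy}(O) = 0,
$$
so the Hessian at ##O## is the zero matrix and the strict inequalities in (2) fail; nevertheless ##f(x,y) = x^4 + y^4 > 0 = f(O)## for all ##(x,y) \neq (0,0)##, so ##O## is a strict global minimum.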

By the way: the function ##f## in (1) above was something I heard about in a seminar by Adi Ben-Israel many years ago. It has the intriguing property that if you go a short distance away from ##(0,0)## along ANY line, the function goes up strictly (that is, ##f(p) > f(0,0)## for all nonzero ##p## a short distance along the line). Nevertheless, ##(0,0)## is NOT a local minimum.
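A quick way to verify that property (just spelling out the claim): along the line ##y = mx## with ##m \neq 0##,
$$
g(t) := f(t, mt) = (mt - t^2)(mt - 2t^2) = m^2 t^2 - 3m t^3 + 2t^4,
$$
so ##g(0) = 0##, ##g'(0) = 0## and ##g''(0) = 2m^2 > 0##, i.e. ##t = 0## is a strict local minimum along that line; along the ##x##-axis ##f(t,0) = 2t^4 > 0## and along the ##y##-axis ##f(0,y) = y^2 > 0## for nonzero arguments. But along the parabola ##y = \tfrac{3}{2}x^2##, which lies between ##L## and ##U##,
$$
f\!\left(x, \tfrac{3}{2}x^2\right) = \left(\tfrac{1}{2}x^2\right)\left(-\tfrac{1}{2}x^2\right) = -\tfrac{1}{4}x^4 < 0 \qquad (x \neq 0),
$$
so ##(0,0)## is indeed not a local minimum.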
 
Mentor note: Helpers are reminded not to provide complete solutions or to do significant portions of the work towards such a solution. Instead, the questioner should be encouraged towards the solution via hints and suggestions or the pointing out of errors or misconceptions.
 
Thank you everybody for your answers, this thread has helped me a lot! I'm back to my books :) Julien.
 
