Local min no other zeros of gradient

jostpuur · Apr 23, 2014

Assume that [itex]f:\mathbb{R}^N\to\mathbb{R}[/itex] is a differentiable function and that [itex]x_0\in\mathbb{R}^N[/itex] is a local minimum of [itex]f[/itex]. Also assume that [itex]N\geq 2[/itex] and that the gradient of [itex]f[/itex] has no zeros other than [itex]x_0[/itex]. In other words,

[tex]
\nabla f(x)=0\quad\implies\quad x=x_0
[/tex]

Is [itex]x_0[/itex] then a global minimum?

jbunniii · Apr 23, 2014

How about if we put
$$f(x,y) = (1 + 2x^2 - x^4)(1+y^2)$$
Then
$$\nabla f(x,y) = (4(x - x^3)(1+y^2), (1 + 2x^2 - x^4)(2y))$$
Therefore ##\nabla f(x,y) = 0## if and only if ##(x,y) = (0,0)##. There is a local minimum at ##(0,0)##. But there is no global minimum, as can be seen by setting ##y=0## and letting ##x \rightarrow \infty##. There is also no global maximum, as can be seen by setting ##x=0## and letting ##y \rightarrow \infty##.

jostpuur · Apr 23, 2014

In the attempt the points [itex](x,y)=(\pm 1,0)[/itex] are saddle points where the gradient equals zero.
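A quick way to confirm this is to evaluate the gradient formula from the previous post at the points in question. The sketch below assumes the hand-computed second derivatives in `hessian_det` are correct; the helper names `grad_f` and `hessian_det` are illustrative only and not part of the thread.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = (1 + 2x^2 - x^4)(1 + y^2), as written in the thread."""
    return (4.0 * (x - x**3) * (1.0 + y**2),
            (1.0 + 2.0 * x**2 - x**4) * 2.0 * y)


def hessian_det(x, y):
    """Determinant of the Hessian of f (second derivatives computed by hand)."""
    f_xx = (4.0 - 12.0 * x**2) * (1.0 + y**2)
    f_yy = 2.0 * (1.0 + 2.0 * x**2 - x**4)
    f_xy = 8.0 * y * (x - x**3)
    return f_xx * f_yy - f_xy**2


for point in [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]:
    # A zero gradient with a negative Hessian determinant indicates a saddle point.
    print(point, grad_f(*point), hessian_det(*point))
```

The gradient vanishes at all three points, and the sign of the determinant separates the minimum at the origin from the two saddle points.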

jbunniii · Apr 23, 2014

jostpuur said:

In the attempt the points [itex](x,y)=(\pm 1,0)[/itex] are saddle points where the gradient equals zero.

Oops, you're right.

jostpuur · Apr 23, 2014

I think I just managed to draw a picture of a counter example. We examine the two dimensional case and define [itex]f(x,y)=y^2-x^2[/itex] for [itex]y\geq 1[/itex], [itex]f(x,y)=y^2+x^2[/itex] for [itex]y\leq 1-\frac{1}{2}x^2[/itex], and then "invent something" to fill the gap [itex]1-\frac{1}{2}x^2 < y < 1[/itex].

First I tried to prove the "yes" answer, but after encountering some difficulties, I eventually started considering the possibility of a counter example. At first I thought that a counter example would need to be very pathological, and didn't put much effort into it. Now I wouldn't be so pessimistic, although it still seems tricky to get the formulas working.

mfb · Apr 23, 2014

I don't see how this is supposed to become a counterexample. Where is your local minimum? (x=0, y=1) (the only special point in your function) is not a minimum, as every (x=epsilon, y=1) leads to a smaller function value.

I don't think such a function exists, but I don't have a mathematical proof.

jostpuur · Apr 24, 2014

The origin (x,y)=(0,0) would be the local minimum, and the only place where the gradient is zero. (x,y)=(0,1) would be the clever point where the function turns to reach locally smaller values without being a saddle point.

mfb · Apr 24, 2014

Ah okay. Nice idea.

For "invent something", I suggest:
$$f(x,y)=y^2 - \cos\left(2\pi \frac{1-y}{x^2}\right) x^2$$
The argument in the cos runs between 0 and π, the cos is 1 for the upper border and -1 for the lower border.

For y=1, this reduces to y^2-x^2 and gives the right upper border.
For y=1-x^2/2, this reduces to y^2+x^2 and gives the right lower border.

The y-derivative:
$$\partial_y f(x,y) = 2y - 2 \pi \sin\left(2\pi \frac{1-y}{x^2}\right)$$
This simply gives 2y for y=1 and for y=1-x^2/2, which matches the y-derivative of both outer pieces, and therefore the borders fit.

The x-derivative:
$$\partial_x f(x,y) = -\frac{4 \pi (1-y)}{x} \sin\left(2\pi \frac{1-y}{x^2}\right) - 2x \cos\left(2\pi \frac{1-y}{x^2}\right)$$

It fits at the border, as the sin is zero there and the cos is +1 at the upper border and -1 at the lower border.

By plotting it, it does not seem to have critical points in the relevant range, but I don't see a proof yet as setting the derivatives to zero gives problematic equations.

jostpuur · Apr 25, 2014

It is very peculiar that your attempt has the right gradients at the borders. It seems that the cosine term was written only with the intent of getting +1 on [itex]y=1[/itex], and -1 on [itex]y=1-\frac{1}{2}x^2[/itex]. Do the gradients fit due to luck or was there something deliberate about the attempt?

By the way I'm not sure if the factor [itex]\frac{1}{2}[/itex] was relevant in the definition of the border. When drawing the original picture I thought it would be natural to have the ball [itex]x^2+y^2\leq 1[/itex] under the parabola, but it doesn't seem very important anymore.

My further observations were these (although there is some chance of mistakes):

Writing [itex]\theta = 2\pi\frac{1-y}{x^2}[/itex], the x-derivative becomes [itex]\partial_x f = -2x(\theta\sin(\theta)+\cos(\theta))[/itex]. It follows that in the gap the condition [itex]\partial_x f=0[/itex] defines a curve

[tex]
1- y = \frac{\theta_0x^2}{2\pi}
[/tex]

where [itex]\theta_0[/itex] is a constant defined by conditions [itex]\theta_0\tan(\theta_0)=-1[/itex] and [itex]0\leq\theta_0\leq\pi[/itex] (and [itex]\theta_0\neq\frac{\pi}{2}[/itex]). The constant is roughly [itex]\theta_0\approx 2.7984[/itex]. Then it suffices to prove that [itex]\partial_y f\neq 0[/itex] on the same curve.

[tex]
\partial_y f = 2y- 2\pi\sin(\theta_0)
[/tex]

holds on the curve, and since [itex]y<1[/itex] there, [itex]\pi\sin(\theta_0)\approx 1.057[/itex] implies that [itex]\partial_y f < 0[/itex].
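The claim can be sanity-checked numerically. The sketch below is only an illustration: it uses central differences of the function itself rather than any hand-derived derivative formula, and the helper names (`grad`, `y_on_curve`, `theta_with_zero_fx`), the step size, and the sample values of x are all assumptions made for this example.

```python
import math


def f(x, y):
    """Piecewise function from the thread, with the cosine interpolation in the gap."""
    if y >= 1.0:
        return y * y - x * x
    if y <= 1.0 - 0.5 * x * x:
        return y * y + x * x
    return y * y - math.cos(2.0 * math.pi * (1.0 - y) / (x * x)) * x * x


def grad(x, y, h=1e-6):
    """Central-difference gradient (purely numerical)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return fx, fy


def y_on_curve(x, theta):
    """Height y in the gap at which the cosine argument equals theta."""
    return 1.0 - theta * x * x / (2.0 * math.pi)


def theta_with_zero_fx(x, lo=0.05, hi=math.pi - 0.05, steps=50):
    """Bisection in theta for the sign change of the numerical x-derivative."""
    positive_at_lo = grad(x, y_on_curve(x, lo))[0] > 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (grad(x, y_on_curve(x, mid))[0] > 0) == positive_at_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


results = []
for x in (0.5, 1.0, 2.0):
    theta = theta_with_zero_fx(x)
    fx, fy = grad(x, y_on_curve(x, theta))
    results.append((x, theta, fx, fy))
    print(f"x={x}: theta={theta:.4f}, df/dx={fx:.2e}, df/dy={fy:.4f}")
```

For each sampled x the x-derivative vanishes at the same value of the cosine argument, and the y-derivative there is negative, so no critical point appears on that curve.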

mfb · Apr 26, 2014

jostpuur said:

It is very peculiar that your attempt has the right gradients at the borders.

This was part of the construction. I looked for a smooth transition between y^2+x^2 and y^2-x^2 via a prefactor for the x^2. This prefactor has to go from -1 to +1 and its derivative has to be zero at the borders, so the cos was a natural choice. Maybe a 3rd-order polynomial would be easier to analyze.

jk22 · Apr 26, 2014

The global minimum could also be at the boundary, or in the limit as x tends to infinity. However, I suppose that if the derivative does not tend towards 0 there, then the minimum is -infinity.

daverusin · May 5, 2014

I'm afraid the proposed solution does not really work: this function is not actually differentiable. I tried to figure out how on Earth it could be working, by sketching the level curves. I reorganized the function a bit, putting the questionable point at the origin and trying to simplify: for nonnegative X let g(X,y) = (y-1)^2 + h(X,y)*X where h(X,y) = +1 when y>X/2, -1 when y<0, and -cos(2pi y/X) otherwise. (The goal eventually was to let X=x^2, which will basically glue the picture of level curves of g to their mirror images to give the level curves of f, slightly squished.)

The problematic point is now at the origin, and you know it's problematic because the level curve g=1 passes through it and has three components: parts of two different parabolas above and below the X-axis, and then also a curve that sort of snakes to the northeast from the origin. Specifically it leaves the origin following its tangent line there, y=mX with m approximately 0.39492 (the solution to cos(2pi t)=-2t). You know there has to be such a curve, intermediate between all the simpler level curves for positive and negative values of the function g. Anyway, in the neighborhood of a point where g is differentiable and g' is nonzero, the level curves should all be approximately parallel curves; there cannot be three branches emanating from one point.

So, is the origin a critical point of g? It's easy enough to check (from the limit definition) that dg/dX = -1 and dg/dy=-2 there, so this is clearly not a point where g' is zero. But just because both partials exist does not make g differentiable! Differentiability is the property of being approximated by a linear function, which in this case would have to be L(X,y) = -X-2y (up to the constant g(0,0)=1). In particular, the only direction one ought to be able to leave the origin while staying on a level curve is in the direction of <2,-1>. Indeed, that matches the level curve in the bottom of the picture (where g(X,y) = (y-1)^2-X) but not the other two curves.

So this turns out to be a very interesting example ("grad g exists does not imply g differentiable") but not an example of the proposed phenomenon. I didn't give it much thought but I think having only one local min really does force that point to be a global min. But one does have to be careful; for example one can have "two mountains without a valley" : f(x,y)=1 - (x^2-1)^2 - (x^2y-x-1)^2 .
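The "two mountains without a valley" function can be checked directly. In the sketch below the peak locations (1,2) and (-1,0) are obtained by solving x^2-1=0 and x^2y-x-1=0 by hand; they are not stated in the post, and the helper name `grad` is illustrative.

```python
def f(x, y):
    """The 'two mountains without a valley' example from the post."""
    return 1.0 - (x**2 - 1.0)**2 - (x**2 * y - x - 1.0)**2


def grad(x, y):
    """Gradient of f, computed by hand with a = x^2 - 1 and b = x^2 y - x - 1."""
    a = x**2 - 1.0
    b = x**2 * y - x - 1.0
    return (-4.0 * x * a - 2.0 * b * (2.0 * x * y - 1.0), -2.0 * b * x**2)


peaks = [(1.0, 2.0), (-1.0, 0.0)]
for p in peaks:
    print("peak", p, "f =", f(*p), "grad =", grad(*p))

# Values along the straight segment joining the two peaks.
segment = [f(-1.0 + 2.0 * t, 2.0 * t) for t in (0.25, 0.5, 0.75)]
print("values between the peaks:", segment)
```

Both squares vanish at the two peaks, so f attains its maximum value 1 there, while the values along the segment between them are strictly smaller.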

dave

mfb · May 6, 2014

Hmm... I think you can avoid issues with differentiability if you take X=x^4. That suppresses x-dependencies at the critical point without changing other parts. We would have to verify that 1-2y approximates g in the way the derivative requires it (I would expect it does).

daverusin · May 6, 2014

Here is a smooth function with only one critical point (at the origin), which is a local minimum but not a global minimum: f(x,y) = x^2 + y^2(1-x)^3. If you start filling the graph with water, the origin becomes the bottom of a roughly-triangular lake lying to the west of the line x=1. By the time the water has filled the graph to the height of 1, the lake includes the whole line x=1, stretching north and south to infinity. To the west the hills rise as would the sides of a bowl. A very narrow ridge rises to the east, too, but the ridge is surrounded to the north and south by water (forming two seas that asymptotically approach the x-axis far to the east). If you walk away from the origin NE or SE you will eventually enter the seas and then follow the seabed well below a depth of zero.
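The properties claimed for this function are easy to check by hand, and the sketch below simply evaluates them. The gradient formula is derived here, not quoted from the post, and the sample points are arbitrary choices for illustration.

```python
def f(x, y):
    """daverusin's example: f(x, y) = x^2 + y^2 (1 - x)^3."""
    return x**2 + y**2 * (1.0 - x)**3


def grad(x, y):
    """Gradient of f: (2x - 3 y^2 (1 - x)^2, 2 y (1 - x)^3)."""
    return (2.0 * x - 3.0 * y**2 * (1.0 - x)**2, 2.0 * y * (1.0 - x)**3)


# The y-derivative vanishes only when y = 0 or x = 1.
# On y = 0 the x-derivative is 2x, which is zero only at x = 0.
# On x = 1 the x-derivative equals 2, so no critical point lies on that line.
print("gradient at the origin:", grad(0.0, 0.0))
print("gradient on the line x = 1:", [grad(1.0, y) for y in (-5.0, 0.0, 5.0)])

# Near the origin f is positive away from (0, 0), consistent with a local minimum.
ring = [f(0.1 * cx, 0.1 * cy) for cx, cy in [(1, 0), (0, 1), (-1, 0), (0, -1)]]
print("values on a small ring around the origin:", ring)

# East of x = 1 the factor (1 - x)^3 is negative, so f drops below f(0, 0) = 0.
print("f(2, 3) =", f(2.0, 3.0))
```

Because (1-x)^3 is negative for x > 1, letting y grow at any fixed x > 1 drives f to minus infinity, so the local minimum at the origin is not global.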

FWIW, the non-compactness of the domain appears to be kind of critical. This is the basis of the part of topology called "surgery" (and cobordism, I guess). Given a smooth function f : M --> R defined on a compact manifold, one can recover the topology of M by taping together the level curves (or more precisely the inverse images of intervals in R that contain none of the critical values). There is a fairly elementary treatment of these ideas in Chapter 5 of Andrew Wallace's book "Differential Topology".

jostpuur · May 6, 2014

It is true that our example had a gap: we forgot to prove its differentiability at [itex](x,y)=(0,1)[/itex]. If it is differentiable there, the gradient must be

[tex]
\nabla f(0,1) = (0,2)
[/tex]

This means that the function is differentiable at this point if

[tex]
f(x,y) = 1 + 2(y-1) + o(\|(x,y-1)\|)
[/tex]

holds when [itex](x,y)\to (0,1)[/itex]. This obviously holds for [itex]1\leq y[/itex] and [itex]y\leq 1-\frac{1}{2}x^2[/itex], so the remaining task is to prove that

[tex]
y^2 - \cos\Big(2\pi \frac{1-y}{x^2}\Big)x^2= -1 + 2y + o(\|(x,y-1)\|)
[/tex]

holds for [itex]1-\frac{1}{2}x^2<y<1[/itex]. The claim is equivalent to the claim

[tex]
(y-1)^2 - \cos\Big(2\pi\frac{1-y}{x^2}\Big)x^2 = o(\|(x,y-1)\|)
[/tex]

Triangle inequality implies

[tex]
\Big|(y-1)^2 - \cos\Big(2\pi\frac{1-y}{x^2}\Big)x^2\Big| \leq (y-1)^2 + x^2
[/tex]

The right-hand side equals [itex]\|(x,y-1)\|^2[/itex], so the claim follows, and in fact the error term can be written in the form [itex]O(\|(x,y-1)\|^2)[/itex].

The explanation by daverusin was confusing to me, but to me it seems clear now that our function is differentiable everywhere.
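The bound used in this argument can also be spot-checked numerically. The following is a minimal sketch, assuming a random sample of points near (0,1); the sampling box, the seed, and the small tolerance for floating-point rounding are arbitrary choices for illustration.

```python
import math
import random


def f(x, y):
    """Piecewise function from the thread, with the cosine interpolation in the gap."""
    if y >= 1.0:
        return y * y - x * x
    if y <= 1.0 - 0.5 * x * x:
        return y * y + x * x
    return y * y - math.cos(2.0 * math.pi * (1.0 - y) / (x * x)) * x * x


def linear_approx(x, y):
    """Candidate linearization at (0, 1): f(0, 1) + 2 (y - 1)."""
    return 1.0 + 2.0 * (y - 1.0)


random.seed(0)
worst_ratio = 0.0
for _ in range(100_000):
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(0.5, 1.5)
    dist_sq = x * x + (y - 1.0) ** 2
    if dist_sq == 0.0:
        continue
    error = abs(f(x, y) - linear_approx(x, y))
    worst_ratio = max(worst_ratio, error / dist_sq)

# The triangle-inequality bound says error <= dist_sq, so the ratio should not exceed 1.
print("largest observed error / distance^2:", worst_ratio)
```

A ratio bounded by 1 means the error is at most the squared distance to (0,1), which is the quadratic bound used in the differentiability argument.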

jostpuur · May 6, 2014

daverusin said:

Here is a smooth function with only one critical point (at the origin), which is a local minimum but not a global minimum: f(x,y) = x^2 + y^2(1-x)^3.

Well that was surprising with its simplicity...

mfb · May 6, 2014

It uses the same concept - shown with lines of equal height, two lines touch each other. That allows the transition between the local minimum and the unbounded region.
It is nice to have such a simple example.

daverusin · May 6, 2014

I stand corrected: jostpuur's example is indeed differentiable at the "nasty" spot; indeed, the simple inequality || f(x,y) - L(x,y) || <= ||v||^2 holds for all displacements v from the point (0,1). The multiple branches of the level curve through that point all leave the point in the same direction (east-west). The non-differentiability in my function g is smoothed out by the substitution X=x^2.

So I had another look at Wallace's book, which sure did make it seem like level curves could not bifurcate as this one does. I see now that Wallace assumes all functions are C^\infty, which imposes a much more regular behaviour than jostpuur's example, which is differentiable (has a linear approximation) but does not even have a quadratic approximation with ||f(x,y) - Q(x,y) || = o(||v||^2), which Wallace requires to get the pretty results.

PS -- I write calculus contests and am always on the lookout for surprising functions like these so feel free to send me examples!

jostpuur · May 6, 2014

There is at least one thing different in these examples (besides the simplicity issue). My remark concerning the [itex]\partial_y f[/itex] in post #9 proves that the first example is not continuously differentiable at [itex](x,y)=(0,1)[/itex]. So in the light of the first example only, one might still have suspected that a continuously differentiable counter example couldn't exist, but the second example now rules that out.
