Proof of the method of Lagrange multipliers

ehrenfest · Jun 29, 2008

I have used this method quite a lot but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics" which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result or a website that does?

slider142 · Jun 29, 2008

Spivak's "Calculus on Manifolds" has the proof as an exercise. However, the tools developed in that chapter prepare you to give it rigorously. An elegant proof is also given in Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms".

HallsofIvy · Jun 29, 2008

A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value for f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e. until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find max or min. It worked, but was horrendously slow!)
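
A minimal sketch of that "follow the gradient" loop is below. It is only an illustration, not the program mentioned above: the test function (with its maximum at (1, -2, 3)), the step size, and the stopping tolerance are all assumed values chosen so the loop converges.

```python
# Illustrative (assumed) objective: f(x, y, z) = -(x-1)^2 - (y+2)^2 - (z-3)^2,
# whose only maximum is at (1, -2, 3).
def grad_f(x, y, z):
    return (-2.0 * (x - 1.0), -2.0 * (y + 2.0), -2.0 * (z - 3.0))

def gradient_ascent(grad, start, step=0.1, tol=1e-10, max_iter=10000):
    """Follow the gradient in short steps until it is (numerically) the 0 vector."""
    p = list(start)
    for _ in range(max_iter):
        g = grad(*p)
        if sum(c * c for c in g) ** 0.5 < tol:
            break  # no "direction" left to follow
        p = [pi + step * gi for pi, gi in zip(p, g)]
    return tuple(p)

peak = gradient_ascent(grad_f, (0.0, 0.0, 0.0))
print(peak)
```

For a minimum, the same loop would step against the gradient instead of along it.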

Now, suppose you want to find a maximum (or minimum) value for f(x,y,z) but must remain on the surface defined by g(x,y,z)= constant. Now you cannot, in general, "follow the direction of the gradient" because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface (that is, onto its tangent plane at the current point) and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z)= constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point. So our condition for a maximum (or minimum) value of f on the surface g(x,y,z)= constant is that the two gradients are parallel: that [itex]\nabla f= \lambda \nabla g[/itex] for some constant [itex]\lambda[/itex].
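
The constrained version can be sketched the same way. The example below is an assumption-laden illustration: f(x,y,z) = x + 2y + 2z and the unit sphere x² + y² + z² = 1 are made-up choices, and the "pull back onto the sphere" step after each move is an extra ingredient not described above, added because a finite step along the tangent plane leaves the surface slightly. Once the projected gradient vanishes, the code checks that grad f is a multiple of grad g.

```python
import math

def grad_f(p):
    # Assumed objective f(x, y, z) = x + 2y + 2z (constant gradient).
    return (1.0, 2.0, 2.0)

def grad_g(p):
    # Assumed constraint g(x, y, z) = x^2 + y^2 + z^2 = 1 (unit sphere).
    return tuple(2.0 * c for c in p)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_onto_tangent_plane(v, normal):
    # Remove the component of v along the surface normal.
    scale = dot(v, normal) / dot(normal, normal)
    return tuple(vi - scale * ni for vi, ni in zip(v, normal))

def constrained_ascent(start, step=0.1, tol=1e-10, max_iter=10000):
    p = tuple(start)
    for _ in range(max_iter):
        t = project_onto_tangent_plane(grad_f(p), grad_g(p))
        if math.sqrt(dot(t, t)) < tol:
            break  # projected gradient is 0: grad f is perpendicular to the surface
        moved = tuple(pi + step * ti for pi, ti in zip(p, t))
        r = math.sqrt(dot(moved, moved))
        p = tuple(c / r for c in moved)  # pull the point back onto the unit sphere
    return p

p_star = constrained_ascent((1.0, 0.0, 0.0))
gf, gg = grad_f(p_star), grad_g(p_star)
lam = dot(gf, gg) / dot(gg, gg)                       # best-fit multiplier
residual = tuple(a - lam * b for a, b in zip(gf, gg))  # grad f - lambda * grad g
print(p_star, lam, residual)
```

For this particular example the loop should settle near (1/3, 2/3, 2/3) with a multiplier close to 3/2 and a residual that is zero up to rounding.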

ehrenfest · Jun 29, 2008

It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin) while the nonrigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.

ehrenfest · Jul 8, 2008

slider142 said:

Spivak's "Calculus on Manifolds" has the proof as an exercise.

I finally managed to get a copy of that book. What exercise is it exactly?

HallsofIvy · Jul 9, 2008

ehrenfest said:

It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin) while the nonrigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.

The devil is in the details!

tim_lou · Jul 9, 2008

The rigorous proof that I understand uses the implicit function theorem. Basically, at a critical point of a smooth function f, the differential df is degenerate (it fails to have full rank).
In the case where the range of f is ℝ,

df = ∇f is degenerate iff ∇f is the zero vector.

When one has constraints,
i.e. maximizing f: ℝ³ → ℝ, f(x,y,z), under the constraint g(x,y,z)=0,
if (∂z)g ≠ 0, we can solve locally for z = z(x,y).

(If (∂z)g is zero, use (∂x)g or (∂y)g instead. The case ∇g = 0 has to be handled separately.)

Notice that g(x,y,z(x,y))=0 identically, so differentiating with respect to x gives
[tex]\frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}=0[/tex]

and similarly for y.

So we may transform the problem into one that doesn't involve a constraint,
i.e., maximize k(x,y) = f(x,y,z(x,y)), where k is a function from ℝ² → ℝ, and then one can set

(∂x) k = 0
(∂y) k = 0

or

[tex]\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}=0[/tex]

and

[tex]\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}=0[/tex]

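Then we can rewrite these equations by solving for the derivatives of z in terms of the derivatives of g. From the differentiated constraint,
[tex]\frac{\partial z}{\partial x} = -\frac{\partial g/\partial x}{\partial g/\partial z}, \qquad \frac{\partial z}{\partial y} = -\frac{\partial g/\partial y}{\partial g/\partial z}[/tex]
Notice that if one puts
[tex]\lambda = \frac{\partial f/\partial z}{\partial g/\partial z}[/tex]
the two equations become ∂f/∂x = λ ∂g/∂x and ∂f/∂y = λ ∂g/∂y, while ∂f/∂z = λ ∂g/∂z holds by the definition of λ, so it reduces to the Lagrange multiplier equation ∇f = λ∇g.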

(Of course, there are details, such as the implicit function theorem only holding in small neighborhoods, but these can be worked out.)

For me, the intuition is that applying the constraint somehow modifies the derivative of f,
i.e.
[tex]\textrm{D}f= \nabla f - \lambda \nabla g[/tex]
and that one then proceeds with the usual procedure and solves Df=0.
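
Here is a small numerical sketch of the reduction above, under assumed choices: f(x,y,z) = x + 2y + 2z, the constraint g = x² + y² + z² − 1 = 0, and the upper hemisphere z(x,y) = sqrt(1 − x² − y²) as the local solution for z. It maximizes k(x,y) = f(x,y,z(x,y)) with no constraint, using ∂z/∂x = −(∂g/∂x)/(∂g/∂z) for the chain rule, and then checks that λ = (∂f/∂z)/(∂g/∂z) gives ∇f = λ∇g. The step size and tolerance are illustrative.

```python
import math

def grad_f(x, y, z):
    # Assumed objective f = x + 2y + 2z.
    return (1.0, 2.0, 2.0)

def grad_g(x, y, z):
    # Assumed constraint g = x^2 + y^2 + z^2 - 1 = 0.
    return (2.0 * x, 2.0 * y, 2.0 * z)

def z_of(x, y):
    # Local solution of g = 0 for z (upper hemisphere, where dg/dz != 0).
    return math.sqrt(1.0 - x * x - y * y)

def grad_k(x, y):
    # Chain rule for k(x, y) = f(x, y, z(x, y)) with dz/dx = -(dg/dx)/(dg/dz).
    z = z_of(x, y)
    fx, fy, fz = grad_f(x, y, z)
    gx, gy, gz = grad_g(x, y, z)
    zx, zy = -gx / gz, -gy / gz
    return (fx + fz * zx, fy + fz * zy)

# Unconstrained gradient ascent on k in the two free variables.
x, y = 0.0, 0.0
for _ in range(10000):
    kx, ky = grad_k(x, y)
    if math.hypot(kx, ky) < 1e-10:
        break
    x, y = x + 0.05 * kx, y + 0.05 * ky

z = z_of(x, y)
gf, gg = grad_f(x, y, z), grad_g(x, y, z)
lam = gf[2] / gg[2]                                    # lambda = f_z / g_z
residual = tuple(a - lam * b for a, b in zip(gf, gg))  # grad f - lambda * grad g
print((x, y, z), lam, residual)
```

The first two components of the residual vanish only because (∂x)k = (∂y)k = 0 at the point found; the third vanishes by the definition of λ.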

arildno · Jul 9, 2008

Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
[tex]F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)[/tex]
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the extrema of F, i.e., where its gradient is 0:
[tex]0=\frac{\partial{F}}{\partial{x}},0=\frac{\partial{F}}{\partial{y}},0=\frac{\partial{F}}{\partial{z}},0=\frac{\partial{F}}{\partial\lambda}[/tex]
The first three equations reduce to:
[tex]\nabla{f}+\lambda\nabla{g}=0[/tex]
whereas the last equation reduces to... lo and behold: g(x,y,z)=0.

Thus, F's extrema are constrained to the region where F is identical with f, i.e., F's (standard) extrema coincide with the constraint-extrema of f.
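
As a concrete sketch of this recipe, take the assumed example f = x² + y² + z² with g = x + y + z − 1 (chosen only because the four equations are then linear). Setting the gradient of F = f + λg to zero gives a 4×4 linear system, solved here with a small hand-written Gaussian elimination. The last lines probe F near the solution: in this example the point where the gradient of F vanishes behaves like a saddle of F in the four variables (x,y,z,λ), even though it is the constrained minimum of f, so "extrema of F" is best read as "points where the gradient of F is 0".

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol

def F(x, y, z, lam):
    # F = f + lambda * g with the assumed f and g described above.
    return x * x + y * y + z * z + lam * (x + y + z - 1.0)

# dF/dx = 2x + lam, dF/dy = 2y + lam, dF/dz = 2z + lam, dF/dlam = x + y + z - 1.
A = [[2.0, 0.0, 0.0, 1.0],
     [0.0, 2.0, 0.0, 1.0],
     [0.0, 0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0, 1.0]
x, y, z, lam = solve_linear(A, b)

eps = 1e-3
base = F(x, y, z, lam)
up = F(x + eps, y - eps, z, lam)                          # move within the plane g = 0
down = F(x + eps, y + eps, z + eps, lam - 2.0 * eps)      # mixed move in all four variables
print((x, y, z, lam), up - base, down - base)
```

Here the solution is x = y = z = 1/3 with λ = −2/3, and F goes up in one direction and down in another around it.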

Count Iblis · Jul 12, 2008

This is indeed the standard proof as is usually explained to physicists. This can be generalized in a straightforward way to the variational calculus case. You have to wonder why math students are always tortured by their profs who give horrible non-intuitive proofs.

Another example:

Proof of the formula for [itex]\nabla^2[/itex] in spherical coordinates. It can be derived in just a few lines, but most first-year math students are presented with the "straightforward" derivation, i.e. simply expressing all the partial derivatives in terms of r, theta and phi.

slider142 · Jul 12, 2008

ehrenfest said:

I finally managed to get a copy of that book. What exercise is it exactly?

I'm away from home at the moment, but I'll get back to you tomorrow.

slider142 · Jul 13, 2008

ehrenfest said:

I finally managed to get a copy of that book. What exercise is it exactly?

It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.

matt grime · Jul 13, 2008

Just think of inflating a balloon - isn't that the standard method of explanation?

ehrenfest · Jul 13, 2008

slider142 said:

It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.

Interesting! I've never seen the theorem written in the language of differential forms, but I guess it is natural. I was looking for the exercise in the Implicit Functions section. I'll post that exercise and the solution here once I get that far through Spivak.

mathwonk · Jul 13, 2008

the derivative of a function in a given direction is found by dotting the gradient with the direction vector. hence the maximum of that function on a given surface occurs at a point where the directional derivative is zero in every direction in that surface, i.e. the gradient dots to zero with every vector tangent to a curve in the surface. this means the gradient is perpendicular to the surface, or equivalently parallel to the gradient vector of the function defining the surface as a level surface. so to maximize f on g=0, try setting grad f parallel to grad g at points of g=0.
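
a quick numerical check of this picture, under assumed choices (f = x + 2y + 2z on the unit sphere, whose maximum there is at (1/3, 2/3, 2/3); the tangent vectors and the comparison point (1, 0, 0) are picked by hand): the rate of change of f along curves lying in the sphere is estimated by a central difference and compared with the dot product of grad f with the tangent vector.

```python
import math

def f(p):
    # Assumed objective f = x + 2y + 2z.
    return p[0] + 2.0 * p[1] + 2.0 * p[2]

grad_f = (1.0, 2.0, 2.0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def derivative_along_surface(p, t, h=1e-5):
    """Central-difference rate of change of f along a curve in the unit sphere
    that passes through p with velocity t (t must be tangent at p)."""
    def curve(s):
        q = tuple(pi + s * ti for pi, ti in zip(p, t))
        r = math.sqrt(dot(q, q))
        return tuple(c / r for c in q)  # stays on the sphere for every s
    return (f(curve(h)) - f(curve(-h))) / (2.0 * h)

# At the constrained maximum: every tangent direction gives zero rate of change.
p_max = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
tangents_at_max = [(2.0, -1.0, 0.0), (2.0, 0.0, -1.0)]  # both perpendicular to p_max
rates_at_max = [derivative_along_surface(p_max, t) for t in tangents_at_max]
dots_at_max = [dot(grad_f, t) for t in tangents_at_max]

# At another point of the sphere, some tangent direction still increases f.
p_other = (1.0, 0.0, 0.0)
rate_other = derivative_along_surface(p_other, (0.0, 1.0, 0.0))
print(rates_at_max, dots_at_max, rate_other)
```

at the maximum both rates come out as zero up to rounding, while at (1, 0, 0) the rate along (0, 1, 0) is close to 2, so f can still be increased by moving within the surface.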
