Proof of the method of Lagrange multipliers

SUMMARY

The discussion centers on the proof of the method of Lagrange multipliers, with participants seeking rigorous resources. Key texts mentioned include Shifrin's "Multivariable Mathematics," which relies on the implicit function theorem, and Spivak's "Calculus on Manifolds," which presents the proof as an exercise. Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms" is also noted for its elegant proof. The participants emphasize the geometric intuition behind the method, highlighting the necessity of understanding gradients and constraints in optimization problems.

PREREQUISITES
  • Understanding of gradients and their geometric interpretation
  • Familiarity with the implicit function theorem
  • Basic knowledge of multivariable calculus
  • Experience with optimization techniques in mathematics
NEXT STEPS
  • Study Spivak's "Calculus on Manifolds" to complete exercise 5-16 on Lagrange multipliers
  • Explore Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms" for an elegant proof
  • Research the applications of the implicit function theorem in optimization
  • Investigate alternative proofs of the Lagrange multipliers method that avoid the implicit function theorem
USEFUL FOR

Mathematicians, students of multivariable calculus, and anyone interested in optimization techniques, particularly those studying Lagrange multipliers and their proofs.

ehrenfest
I have used this method quite a lot, but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics", which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result, or a website that does?
 
ehrenfest said:
I have used this method quite a lot, but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics", which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result, or a website that does?

Spivak's "Calculus on Manifolds" has the proof as an exercise. However, the tools developed in that chapter rigorously prepare you to supply it. An elegant proof is also given in Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms".
 
A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value of f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e., until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find max or min. It worked, but was horrendously slow!)
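For concreteness, here is a minimal Python sketch of that search; the example objective, step size, and stopping tolerance are arbitrary choices of mine, not from the post:

[CODE]
import numpy as np

# Example objective: f(x, y, z) = 2x - x^2 - y^2 - z^2, with its maximum at (1, 0, 0).
def grad_f(p):
    x, y, z = p
    return np.array([2 - 2*x, -2*y, -2*z])

p = np.zeros(3)                              # starting guess
step = 0.1                                   # fixed small step length (arbitrary)
while np.linalg.norm(grad_f(p)) > 1e-8:      # stop when there is no "direction" left
    p = p + step * grad_f(p)                 # move a short distance uphill, then recompute
print(p)                                     # ~ [1. 0. 0.]
[/CODE]

As the post says, a fixed small step makes this slow: the error only shrinks by a constant factor per iteration.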

Now, suppose you want to find a maximum (or minimum) value of f(x,y,z) but must remain on the surface defined by g(x,y,z) = constant. Now you cannot, in general, "follow the direction of the gradient", because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.
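A sketch of that constrained version, under the same caveats (the objective and the unit-sphere constraint are my own example; renormalizing is one simple way to stay on this particular surface):

[CODE]
import numpy as np

grad_f = np.array([1.0, 2.0, 3.0])   # gradient of the example f = x + 2y + 3z (constant)

p = np.array([1.0, 0.0, 0.0])        # start on the sphere g = x^2 + y^2 + z^2 - 1 = 0
for _ in range(1000):
    n = p / np.linalg.norm(p)                 # unit normal at p (direction of grad g)
    tangent = grad_f - np.dot(grad_f, n) * n  # projection of grad f onto the tangent plane
    if np.linalg.norm(tangent) < 1e-10:       # grad f is now perpendicular to the surface
        break
    p = p + 0.05 * tangent                    # follow the projected gradient a short distance
    p = p / np.linalg.norm(p)                 # pull the point back onto the sphere
print(p)                                      # ~ (1, 2, 3)/sqrt(14)
[/CODE]

At the final point the projected gradient vanishes, i.e., grad f is a multiple of grad g, which is exactly the condition stated next.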

Since g(x,y,z) = constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point. So our condition for a maximum (or minimum) value of f on the surface g(x,y,z) = constant is that the two gradients are parallel: \nabla f= \lambda \nabla g for some constant \lambda.
 
HallsofIvy said:
A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value of f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e., until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find max or min. It worked, but was horrendously slow!)

Now, suppose you want to find a maximum (or minimum) value of f(x,y,z) but must remain on the surface defined by g(x,y,z) = constant. Now you cannot, in general, "follow the direction of the gradient", because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z) = constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point. So our condition for a maximum (or minimum) value of f on the surface g(x,y,z) = constant is that the two gradients are parallel: \nabla f= \lambda \nabla g for some constant \lambda.

It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin), while the nonrigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.
 
slider142 said:
Spivak's "Calculus on Manifolds" has the proof as an exercise.

I finally managed to get a copy of that book. What exercise is it exactly?
 
ehrenfest said:
It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin), while the nonrigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.

The devil is in the details!
 
The rigorous proof that I understand involves the implicit function theorem. Basically, when a smooth function is at a critical point, its differential df is degenerate. In the case where the range of f is ℝ, df is represented by ∇f, so

df is degenerate iff ∇f is the 0 vector.

When one has constraints,
i.e., maximizing f: ℝ³ → ℝ, f(x,y,z), under the constraint g(x,y,z) = 0:
if ∂g/∂z ≠ 0, we can solve for z = z(x,y).

(If ∂g/∂z is zero, use ∂g/∂x or ∂g/∂y instead; in the case ∇g = 0, things have to be worked out separately.)

Notice that g(x, y, z(x,y)) = 0, so
\frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}=0

and similarly for y.

So we may transform the problem into one that doesn't involve a constraint,
i.e., maximize k(x,y) = f(x, y, z(x,y)), where k is a function from ℝ² → ℝ, and then one can solve

\frac{\partial k}{\partial x} = 0
\frac{\partial k}{\partial y} = 0

or

\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}=0

and

\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}=0

Then we can transform these equations by expressing the partial derivatives of z in terms of those of g, using the two relations from g above. Notice that if one puts
\lambda = \frac{\partial f/\partial z}{\partial g/\partial z}
it reduces to the Lagrange multiplier equation.

(Of course, there are details, such as the implicit function theorem only working in small neighborhoods, but they can be worked out.)

For me, the intuition is that imposing the constraint modifies the derivative of f, i.e.,

\textrm{D}f= \nabla f - \lambda \nabla g

and one then proceeds with the usual procedure and solves Df = 0.
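Here is a small symbolic check of this reduction with sympy; the particular f and g below are my own illustrative choices:

[CODE]
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)

f = x + 2*y + 3*z              # example objective
g = x**2 + y**2 + z**2 - 1     # example constraint g = 0 (the unit sphere)

# Solve g = 0 for z locally (upper hemisphere), as the implicit function
# theorem permits wherever dg/dz != 0:
z_xy = sp.sqrt(1 - x**2 - y**2)
k = f.subs(z, z_xy)            # k(x, y) = f(x, y, z(x, y))

# Unconstrained critical point of k:
sol = sp.solve([sp.diff(k, x), sp.diff(k, y)], [x, y], dict=True)[0]
z0 = z_xy.subs(sol)

# lambda = (df/dz)/(dg/dz) at that point ...
lam = (sp.diff(f, z) / sp.diff(g, z)).subs({**sol, z: z0})

# ... and grad f = lambda * grad g holds there:
grad_f = [sp.diff(f, v) for v in (x, y, z)]
grad_g = [sp.diff(g, v).subs({**sol, z: z0}) for v in (x, y, z)]
print([sp.simplify(a - lam*b) for a, b in zip(grad_f, grad_g)])   # [0, 0, 0]
[/CODE]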
 
Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the critical points of F, i.e., the points where its gradient is 0:
0=\frac{\partial{F}}{\partial{x}},0=\frac{\partial{F}}{\partial{y}},0=\frac{\partial{F}}{\partial{z}},0=\frac{\partial{F}}{\partial\lambda}
The first three equations reduce to:
\nabla{f}+\lambda\nabla{g}=0
whereas the last equation reduces to... lo and behold: g(x,y,z)=0.

Thus, the critical points of F are confined to the region where F is identical with f, i.e., F's (unconstrained) critical points coincide with the constrained extrema of f.
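A quick sympy check of this recipe on a concrete problem (f and g are my own example, not from the post):

[CODE]
import sympy as sp

x, y, z, lam = sp.symbols('x y z lambda', real=True)

f = x*y*z              # example objective
g = x + y + z - 3      # example constraint g = 0
F = f + lam*g

# Set all four partials of F to zero; the dF/dlambda equation is exactly g = 0.
eqs = [sp.diff(F, v) for v in (x, y, z, lam)]
for sol in sp.solve(eqs, [x, y, z, lam], dict=True):
    print(sol)         # includes x = y = z = 1 with lambda = -1 (max of xyz on the plane)
[/CODE]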
 
arildno said:
Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the critical points of F, i.e., the points where its gradient is 0:
0=\frac{\partial{F}}{\partial{x}},0=\frac{\partial{F}}{\partial{y}},0=\frac{\partial{F}}{\partial{z}},0=\frac{\partial{F}}{\partial\lambda}
The first three equations reduce to:
\nabla{f}+\lambda\nabla{g}=0
whereas the last equation reduces to... lo and behold: g(x,y,z)=0.

Thus, the critical points of F are confined to the region where F is identical with f, i.e., F's (unconstrained) critical points coincide with the constrained extrema of f.

This is indeed the standard proof as it is usually explained to physicists. It can be generalized in a straightforward way to the variational-calculus case. You have to wonder why math students are always tortured by their profs, who give horrible non-intuitive proofs.

Another example:

The proof of the formula for \nabla^2 in spherical coordinates: it can be derived in just a few lines, yet most first-year math students are presented with the "straightforward" derivation, i.e., simply expressing all the partial derivatives in terms of r, \theta, and \phi.
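In the same spirit, a short sympy check that verifies the spherical formula on a test function (the test function is an arbitrary choice of mine):

[CODE]
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x = r*sp.sin(th)*sp.cos(ph)
y = r*sp.sin(th)*sp.sin(ph)
z = r*sp.cos(th)

X, Y, Z = sp.symbols('x y z')
f = X**2*Y + Z**3                                  # arbitrary test function
lap_cart = sum(sp.diff(f, v, 2) for v in (X, Y, Z)).subs({X: x, Y: y, Z: z})

u = f.subs({X: x, Y: y, Z: z})                     # the same function in spherical coordinates
lap_sph = (sp.diff(r**2 * sp.diff(u, r), r) / r**2
           + sp.diff(sp.sin(th) * sp.diff(u, th), th) / (r**2 * sp.sin(th))
           + sp.diff(u, ph, 2) / (r**2 * sp.sin(th)**2))

print(sp.simplify(lap_cart - lap_sph))             # 0: the two Laplacians agree
[/CODE]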
 
ehrenfest said:
I finally managed to get a copy of that book. What exercise is it exactly?

I'm away from home at the moment, but I'll get back to you tomorrow.
 
ehrenfest said:
I finally managed to get a copy of that book. What exercise is it exactly?

It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.
 
Just think of inflating a balloon - isn't that the standard method of explanation?
 
slider142 said:
It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.

Interesting! I've never seen the theorem written in the language of differential forms, but I guess it is natural. I was looking for the exercise in the Implicit Functions section. I'll post that exercise and the solution here once I get that far through Spivak.
 
The derivative of a function in a given direction is computed by dotting the gradient with the direction vector. Hence the maximum of that function on a given surface occurs at a point where the directional derivative is zero in every direction in that surface, i.e., the gradient dots to zero with every vector tangent to a curve in the surface. This means the gradient is perpendicular to the surface, or equivalently parallel to the gradient vector of the function defining the surface as a level surface. So to maximize f on g = 0, try setting grad f parallel to grad g at points of g = 0.
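As a numerical illustration of this (my own example problem, using scipy's SLSQP constrained optimizer):

[CODE]
import numpy as np
from scipy.optimize import minimize

f = lambda p: -(p[0] + 2*p[1] + 3*p[2])        # maximize x + 2y + 3z (minimize its negative)
g = lambda p: p[0]**2 + p[1]**2 + p[2]**2 - 1  # constraint: the unit sphere g = 0

res = minimize(f, [1.0, 0.0, 0.0], method='SLSQP',
               constraints=[{'type': 'eq', 'fun': g}])

grad_f = np.array([1.0, 2.0, 3.0])             # gradient of x + 2y + 3z
grad_g = 2 * np.array(res.x)                   # gradient of the sphere constraint at the optimum
print(np.cross(grad_f, grad_g))                # ~ (0, 0, 0): the gradients are parallel
[/CODE]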
 
