Proof of the method of Lagrange multipliers


Discussion Overview

The discussion revolves around the proof of the method of Lagrange multipliers, focusing on its rigorous derivation and various interpretations. Participants share their experiences with different texts and proofs, exploring both geometric intuitions and formal approaches.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants express difficulty in understanding the proof of the method of Lagrange multipliers, noting that existing texts like Shifrin's and Stewart's provide either confusing or non-rigorous explanations.
  • One participant mentions that Spivak's "Calculus on Manifolds" includes the proof as an exercise, prompting inquiries about the specific exercise.
  • Another participant describes a non-rigorous method of finding extrema by following the gradient and projecting it onto a constraint surface, highlighting the geometric intuition behind the method.
  • A later reply discusses the reliance of rigorous proofs on the implicit function theorem, contrasting it with the geometric approach that does not invoke this theorem.
  • One participant outlines a proof involving the implicit function theorem, detailing how to transform the problem of maximizing a function under constraints into a form that does not involve constraints.
  • Another participant presents an alternative proof using a four-variable function that incorporates the constraint, leading to the same conditions for extrema without relying on the implicit function theorem.

Areas of Agreement / Disagreement

Participants do not reach a consensus on a single proof or method, as multiple approaches and interpretations are presented, with some preferring geometric intuition and others favoring rigorous mathematical proofs.

Contextual Notes

Participants note that the rigorous proof may depend on the implicit function theorem and that there are details regarding the applicability of these methods in specific contexts, such as the behavior of functions in small neighborhoods.

ehrenfest
I have used this method quite a lot but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics" which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result or a website that does?
 
ehrenfest said:
I have used this method quite a lot but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics" which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result or a website that does?

Spivak's "Calculus on Manifolds" has the proof as an exercise. However, the tools developed in the chapter prepare you rigorously to provide it. An elegant proof is also given in Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms".
 
A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value of f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e. until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find a max or min. It worked, but was horrendously slow!)

Now, suppose you want to find a maximum (or minimum) value of f(x,y,z) but must remain on the surface defined by g(x,y,z)= constant. Now you cannot, in general, "follow the direction of the gradient" because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z)= constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point: our condition for a maximum (or minimum) value of f on the surface g(x,y,z)= constant is that the two gradients are parallel: that [itex]\nabla f= \lambda \nabla g[/itex] for some constant [itex]\lambda[/itex].
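The "follow the projected gradient" procedure described above can be turned into a tiny numerical experiment. The sketch below is hypothetical and not from the thread: it maximizes f(x,y,z) = x + y + z on the unit sphere g(x,y,z) = x^2 + y^2 + z^2 - 1 = 0 by repeatedly stepping along the projection of grad f onto the tangent plane and renormalizing back onto the sphere (the function names, step size, and tolerance are all arbitrary choices):

```python
import math

def grad_f(p):            # f(x,y,z) = x + y + z
    return (1.0, 1.0, 1.0)

def grad_g(p):            # g(x,y,z) = x^2 + y^2 + z^2 - 1
    return tuple(2.0 * c for c in p)

def projected_ascent(p, step=0.1, tol=1e-10, max_iter=10000):
    """Follow the component of grad f tangent to the surface g = 0."""
    for _ in range(max_iter):
        gf, gg = grad_f(p), grad_g(p)
        n2 = sum(c * c for c in gg)
        coef = sum(a * b for a, b in zip(gf, gg)) / n2
        # Tangential part of grad f (projection onto the surface):
        proj = tuple(a - coef * b for a, b in zip(gf, gg))
        if sum(c * c for c in proj) < tol:   # grad f is now perpendicular
            return p                         # to the surface: stop here
        p = tuple(a + step * b for a, b in zip(p, proj))
        r = math.sqrt(sum(c * c for c in p))
        p = tuple(c / r for c in p)          # renormalize back onto sphere
    return p

p = projected_ascent((1.0, 0.0, 0.0))
```

As the post predicts, the iteration stalls exactly where the tangential part of grad f vanishes, i.e. where grad f is parallel to grad g, here at x = y = z = 1/sqrt(3).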
 
HallsofIvy said:
A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value of f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e. until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find a max or min. It worked, but was horrendously slow!)

Now, suppose you want to find a maximum (or minimum) value of f(x,y,z) but must remain on the surface defined by g(x,y,z)= constant. Now you cannot, in general, "follow the direction of the gradient" because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z)= constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point: our condition for a maximum (or minimum) value of f on the surface g(x,y,z)= constant is that the two gradients are parallel: that [itex]\nabla f= \lambda \nabla g[/itex] for some constant [itex]\lambda[/itex].

It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin), while the non-rigorous geometric proof you just gave has no hint of the implicit function theorem.
 
slider142 said:
Spivak's "Calculus on Manifolds" has the proof as an exercise.

I finally managed to get a copy of that book. What exercise is it exactly?
 
ehrenfest said:
It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin), while the non-rigorous geometric proof you just gave has no hint of the implicit function theorem.

The devil is in the details!
 
The rigorous proof that I understand involves the implicit function theorem. Basically, when a smooth function is at a critical point, df is degenerate.
In the case where the range of f is ℝ,

df = ∇f is degenerate iff ∇f is the 0 vector.

When one has constraints,
i.e. maximizing f: ℝ3 -> ℝ, f(x,y,z) under the constraint g(x,y,z)=0,
if (∂z)g ≠ 0, we can solve for z = z(x,y).

(If (∂z)g is zero, choose (∂x)g or another partial instead; in the case ∇g = 0, things must be worked out separately.)

Notice g(x,y,z(x,y)) = 0, so
[tex]\frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}=0[/tex]

and similarly for y,

so that we may transform the problem into one that doesn't involve a constraint,
i.e., maximize k(x,y) = f(x,y, z(x,y)), where k is a function from ℝ2 -> ℝ, and then one can set

(∂x) k = 0
(∂y) k = 0

or

[tex]\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}=0[/tex]

and

[tex]\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}=0[/tex]

Then we can transform these equations by solving the partials of z in terms of the partials of g.
Notice that if one puts
[tex]\lambda = \frac{\partial f/\partial z}{\partial g/\partial z}[/tex]
it reduces to the Lagrange multiplier equation.

(Of course, there are details, such as the implicit function theorem only holding in small neighborhoods... but the details can be worked out.)

For me, I think of it intuitively as the constraint somehow modifying the derivative of f,
i.e.
[tex]\textrm{D}f= \nabla f - \lambda \nabla g[/tex]
and then one proceeds with the usual procedure and solves Df = 0.
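As a concrete, hypothetical check of the steps above (none of these specific choices appear in the thread): take f(x,y,z) = x + y + z and g(x,y,z) = x^2 + y^2 + z^2 - 1. Since ∂g/∂z = 2z ≠ 0 near the maximizer, z = z(x,y) can be solved explicitly, and both the chain-rule equations and the formula λ = (∂f/∂z)/(∂g/∂z) can be verified numerically:

```python
import math

# Hypothetical worked example: maximize f(x,y,z) = x + y + z
# under the constraint g(x,y,z) = x^2 + y^2 + z^2 - 1 = 0.
# Since dg/dz = 2z != 0 near the maximizer, solve z = z(x,y):
def z_of(x, y):
    return math.sqrt(1.0 - x*x - y*y)

def k(x, y):                        # k(x,y) = f(x, y, z(x,y))
    return x + y + z_of(x, y)

def partial(fn, x, y, i, h=1e-6):   # central finite difference
    dx, dy = (h, 0.0) if i == 0 else (0.0, h)
    return (fn(x + dx, y + dy) - fn(x - dx, y - dy)) / (2.0 * h)

# The unconstrained stationary point of k is x = y = 1/sqrt(3):
x = y = 1.0 / math.sqrt(3.0)
z = z_of(x, y)

# Both chain-rule equations dk/dx = dk/dy = 0 hold there:
assert abs(partial(k, x, y, 0)) < 1e-6
assert abs(partial(k, x, y, 1)) < 1e-6

# lambda = (df/dz)/(dg/dz) turns the equations into grad f = lambda*grad g:
lam = 1.0 / (2.0 * z)
assert abs(1.0 - lam * 2.0 * x) < 1e-9   # x-component of grad f = lam*grad g
assert abs(1.0 - lam * 2.0 * y) < 1e-9   # y-component
assert abs(1.0 - lam * 2.0 * z) < 1e-9   # z-component
```

Here λ works out to sqrt(3)/2, and the three components of ∇f = λ∇g all check out at the constrained maximizer x = y = z = 1/sqrt(3).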
 
Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
[tex]F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)[/tex]
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the stationary points of F, i.e., where its gradient is 0:
[tex]0=\frac{\partial{F}}{\partial{x}},0=\frac{\partial{F}}{\partial{y}},0=\frac{\partial{F}}{\partial{z}},0=\frac{\partial{F}}{\partial\lambda}[/tex]
The first three equations reduce to:
[tex]\nabla{f}+\lambda\nabla{g}=0[/tex]
whereas the last equation reduces to... lo and behold: g(x,y,z)=0.

Thus, F's stationary points are confined to the region where F is identical with f, i.e., F's (unconstrained) stationary points coincide with the constrained extrema of f..:smile:
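This four-variable construction can be checked on the same hypothetical example used above (f = x + y + z on the unit sphere; the candidate point and multiplier below are computed by hand, not taken from the thread). Note that with F = f + λg the multiplier satisfies ∇f + λ∇g = 0, so its sign is opposite to the λ in ∇f = λ∇g:

```python
import math

# Hypothetical example: f = x + y + z, g = x^2 + y^2 + z^2 - 1.
def F(x, y, z, lam):
    return (x + y + z) + lam * (x*x + y*y + z*z - 1.0)

def grad_F(p, h=1e-6):
    """Central-difference gradient of F in all four variables."""
    out = []
    for i in range(4):
        q_plus = list(p);  q_plus[i]  += h
        q_minus = list(p); q_minus[i] -= h
        out.append((F(*q_plus) - F(*q_minus)) / (2.0 * h))
    return out

# Candidate stationary point: x = y = z = 1/sqrt(3), lam = -sqrt(3)/2
s = 1.0 / math.sqrt(3.0)
p = (s, s, s, -math.sqrt(3.0) / 2.0)

# All four partial derivatives of F vanish here:
assert all(abs(c) < 1e-6 for c in grad_F(p))
```

All four partials of F vanish at the candidate point, and the ∂F/∂λ = 0 equation is exactly the constraint g = 0, as the post observes.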
 
arildno said:
Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
[tex]F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)[/tex]
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the stationary points of F, i.e., where its gradient is 0:
[tex]0=\frac{\partial{F}}{\partial{x}},0=\frac{\partial{F}}{\partial{y}},0=\frac{\partial{F}}{\partial{z}},0=\frac{\partial{F}}{\partial\lambda}[/tex]
The first three equations reduce to:
[tex]\nabla{f}+\lambda\nabla{g}=0[/tex]
whereas the last equation reduces to... lo and behold: g(x,y,z)=0.

Thus, F's stationary points are confined to the region where F is identical with f, i.e., F's (unconstrained) stationary points coincide with the constrained extrema of f..:smile:

This is indeed the standard argument as usually explained to physicists. It can be generalized in a straightforward way to the variational-calculus case. You have to wonder why math students are always tortured by their profs, who give horrible non-intuitive proofs.

Another example:

Proof of the formula for nabla^2 in spherical coordinates: it can be derived in just a few lines, but most first-year math students are presented with the "straightforward" derivation, i.e. simply expressing all the partial derivatives in terms of r, theta, and phi.
 
  • #10
ehrenfest said:
I finally managed to get a copy of that book. What exercise is it exactly?

I'm away from home at the moment, but I'll get back to you tomorrow.
 
  • #11
ehrenfest said:
I finally managed to get a copy of that book. What exercise is it exactly?

It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.
 
  • #12
Just think of inflating a balloon - isn't that the standard method of explanation?
 
  • #13
slider142 said:
It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.

Interesting! I've never seen the theorem written in the language of differential forms, but I guess it is natural. I was looking for the exercise in the Implicit Functions section. I'll post that exercise and its solution here once I get that far through Spivak.
 
  • #14
The derivative of a function in a given direction is obtained by dotting the gradient with the direction vector. Hence the maximum of that function on a given surface occurs at a point where the directional derivative is zero in every direction in that surface, i.e. the gradient dots to zero with every vector tangent to a curve in the surface. This means the gradient is perpendicular to the surface, or equivalently parallel to the gradient vector of the function defining the surface as a level surface. So to maximize f on g = 0, try setting grad f parallel to grad g at points where g = 0.
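This tangent-vector argument can also be illustrated numerically (a hypothetical example, not from the post): on the unit sphere g = x^2 + y^2 + z^2 - 1 = 0, pick any curve through the constrained maximizer of f = x + y + z and check that the derivative of f along the curve vanishes there:

```python
import math

# The constrained maximizer of f = x + y + z on the unit sphere:
s = 1.0 / math.sqrt(3.0)

def curve(t):
    # A great circle on the sphere passing through (s,s,s) at t = 0.
    # Its tangent direction v = (1,-1,0)/sqrt(2) is an arbitrary choice
    # perpendicular to the point, so the curve stays on the sphere.
    v = (1.0/math.sqrt(2), -1.0/math.sqrt(2), 0.0)
    p = (s, s, s)
    return tuple(pi*math.cos(t) + vi*math.sin(t) for pi, vi in zip(p, v))

def f(p):
    return sum(p)

# (f o curve)'(0) by central difference: the directional derivative of f
# along a tangent direction of the surface, at the maximizer.
h = 1e-6
deriv = (f(curve(h)) - f(curve(-h))) / (2.0 * h)
assert abs(deriv) < 1e-6
```

The directional derivative along the surface vanishes at the maximizer, which is exactly the "gradient dots to zero with every tangent vector" condition above.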
 
