Proof of the method of Lagrange multipliers

In summary, the conversation discusses how to find a maximum or minimum value of a function on a surface defined by a constraint. Spivak's "Calculus on Manifolds" leaves the rigorous proof as an exercise, and a rigorous proof using the implicit function theorem is sketched in the thread. A simpler, non-rigorous argument based on geometric intuition is also given. The conversation also mentions other books that provide a proof, and shows how the Lagrange multiplier equation is used to solve for the maximum or minimum value.
  • #1
ehrenfest
I have used this method quite a lot but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics" which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result or a website that does?
 
  • #2
ehrenfest said:
I have used this method quite a lot but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics" which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result or a website that does?

Spivak's "Calculus on Manifolds" has the proof as an exercise. However, the tools developed in that chapter fully prepare you to give a rigorous proof. An elegant proof is also given in Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms".
 
  • #3
A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value for f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e., until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find max or min. It worked, but was horrendously slow!)

Now, suppose you want to find a maximum (or minimum) value for f(x,y,z) but must remain on the surface defined by g(x,y,z)= constant. Now you cannot, in general, "follow the direction of the gradient" because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z)= constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point. So our condition for a maximum (or minimum) value of f on the surface g(x,y,z)= constant is that the two gradients are parallel: that [itex]\nabla f= \lambda \nabla g[/itex] for some constant [itex]\lambda[/itex].
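
The "follow the projected gradient" idea can be tried out numerically. What follows is only a minimal sketch under assumptions that are not in the post: the objective f(x,y,z) = x + 2y + 3z, the unit sphere x² + y² + z² = 1 as the constraint surface, a fixed step size, and a renormalization step to pull each iterate back onto the sphere are all hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grad_f(p):
    # Hypothetical objective f(x, y, z) = x + 2y + 3z, so grad f is constant.
    return (1.0, 2.0, 3.0)

def grad_g(p):
    # Hypothetical constraint g(x, y, z) = x^2 + y^2 + z^2 = 1 (unit sphere).
    return tuple(2.0 * c for c in p)

def project_onto_tangent_plane(v, n):
    # Remove the component of v along the surface normal n.
    scale = dot(v, n) / dot(n, n)
    return tuple(vi - scale * ni for vi, ni in zip(v, n))

def projected_gradient_ascent(p, step=0.05, tol=1e-10, max_iter=100000):
    for _ in range(max_iter):
        t = project_onto_tangent_plane(grad_f(p), grad_g(p))
        if math.sqrt(dot(t, t)) < tol:
            break  # projection is (numerically) the 0 vector: stop
        q = tuple(pi + step * ti for pi, ti in zip(p, t))
        norm = math.sqrt(dot(q, q))
        p = tuple(qi / norm for qi in q)  # pull the point back onto the sphere
    return p

p = projected_gradient_ascent((1.0, 0.0, 0.0))

# At the stopping point grad f should be a multiple of grad g.
lam = dot(grad_f(p), grad_g(p)) / dot(grad_g(p), grad_g(p))
residual = tuple(a - lam * b for a, b in zip(grad_f(p), grad_g(p)))
print(p, lam, residual)
```

For this particular example the iteration should settle near (1, 2, 3)/√14, where the residual of ∇f − λ∇g is essentially zero. The renormalization is specific to the sphere; a general surface would need some other way of returning to g = constant after each step.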
 
  • #4
HallsofIvy said:
A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value for f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, then recalculate the gradient. Keep doing that until there is no "direction" to follow, i.e., until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find max or min. It worked, but was horrendously slow!)

Now, suppose you want to find a maximum (or minimum) value for f(x,y,z) but must remain on the surface defined by g(x,y,z)= constant. Now you cannot, in general, "follow the direction of the gradient" because that would lead you off the surface. But you can "come as close as possible": take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector, which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z)= constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point. So our condition for a maximum (or minimum) value of f on the surface g(x,y,z)= constant is that the two gradients are parallel: that [itex]\nabla f= \lambda \nabla g[/itex] for some constant [itex]\lambda[/itex].

It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin), while the non-rigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.
 
  • #5
slider142 said:
Spivak's "Calculus on Manifolds" has the proof as an exercise.

I finally managed to get a copy of that book. What exercise is it exactly?
 
  • #6
ehrenfest said:
It's interesting, I think, that the rigorous proof relies on the implicit function theorem (at least in Shifrin), while the non-rigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.

The devil is in the details!
 
  • #7
The rigorous proof that I understand involves the implicit function theorem. Basically, when a smooth function is at a critical point, df is degenerate.
In the case where the range of f is ℝ,

df = ∇f is degenerate iff ∇f is the zero vector.

When one has constraints,
i.e., maximizing f: ℝ³ → ℝ, f(x,y,z), under the constraint g(x,y,z)=0,
if ∂g/∂z ≠ 0, we can solve for z = z(x,y).

(If ∂g/∂z is zero, choose ∂g/∂x or ∂g/∂y instead. In the case ∇g = 0, you have to work things out separately.)

Notice that g(x,y,z(x,y))=0, so by the chain rule
[tex]\frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}=0[/tex]

similarly for y.

so that we may transform the problem into one that doesn't involve a constraint,
i.e., maximize k(x,y) = f(x,y,z(x,y)), where k is a function from ℝ² → ℝ, and then one can set

∂k/∂x = 0
∂k/∂y = 0

or

[tex]\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}=0[/tex]

and

[tex]\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}=0[/tex]

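Then we can transform these equations by solving for ∂z/∂x and ∂z/∂y in terms of the partial derivatives of g, e.g. [itex]\frac{\partial z}{\partial x} = -\frac{\partial g/\partial x}{\partial g/\partial z}[/itex].
Notice that if one puts
[tex]\lambda = \frac{\partial f/\partial z}{\partial g/\partial z}[/tex]
they reduce to the Lagrange multiplier equations [itex]\nabla f = \lambda \nabla g[/itex].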

(Of course, there are details, such as the implicit function theorem only holding on small neighborhoods, but those details can be worked out.)

For me, the intuition is that applying the constraint somehow modifies the derivative of f,
i.e.,
[tex]\textrm{D}f= \nabla f - \lambda \nabla g[/tex]
and that one then proceeds with the usual procedure and solves Df=0.
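
To see the reduction at work on a concrete case, here is a small numerical sketch. Everything specific in it is an assumption made for illustration and is not from the post: the objective f = xyz, the constraint g = x + y + z − 3 = 0 (so that z(x,y) = 3 − x − y explicitly, with ∂g/∂z = 1 ≠ 0), the candidate point (1, 1, 1), and the finite-difference helper `partial`.

```python
def f(x, y, z):
    # Hypothetical objective.
    return x * y * z

def g(x, y, z):
    # Hypothetical constraint g = 0; here dg/dz = 1, so z can be solved for.
    return x + y + z - 3.0

def z_of(x, y):
    # Explicit solution of g(x, y, z) = 0 for z.
    return 3.0 - x - y

def k(x, y):
    # Unconstrained reduced function k(x, y) = f(x, y, z(x, y)).
    return f(x, y, z_of(x, y))

def partial(func, point, i, h=1e-6):
    # Central-difference estimate of the i-th partial derivative.
    up, dn = list(point), list(point)
    up[i] += h
    dn[i] -= h
    return (func(*up) - func(*dn)) / (2.0 * h)

x0, y0 = 1.0, 1.0
z0 = z_of(x0, y0)

# Critical-point conditions for the reduced problem.
k_x = partial(k, (x0, y0), 0)
k_y = partial(k, (x0, y0), 1)

# Gradients of f and g at the corresponding point on the surface.
grad_f = [partial(f, (x0, y0, z0), i) for i in range(3)]
grad_g = [partial(g, (x0, y0, z0), i) for i in range(3)]

# Multiplier defined as in the post: (df/dz) / (dg/dz).
lam = grad_f[2] / grad_g[2]
residual = [df - lam * dg for df, dg in zip(grad_f, grad_g)]
print(k_x, k_y, lam, residual)
```

In this example both partial derivatives of k vanish at (1, 1) and, with λ defined as the ratio of the z-derivatives, all three components of ∇f − λ∇g come out as zero up to finite-difference error, which is the Lagrange multiplier equation.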
 
  • #8
Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
[tex]F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)[/tex]
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the stationary points of F, i.e., where its gradient is 0:
[tex]0=\frac{\partial{F}}{\partial{x}},\quad 0=\frac{\partial{F}}{\partial{y}},\quad 0=\frac{\partial{F}}{\partial{z}},\quad 0=\frac{\partial{F}}{\partial\lambda}[/tex]
The first three equations reduce to:
[tex]\nabla{f}+\lambda\nabla{g}=0[/tex]
whereas the last equation reduces to, lo and behold, g(x,y,z)=0.

Thus, F's stationary points all lie in the region where F is identical with f, and the constrained extrema of f are found among them. :smile:
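
As a concrete check of this recipe, here is a minimal sketch on a made-up example: f = −(x² + y² + z²) with the constraint g = x + y + z − 3 = 0. These functions, the helper `solve_linear`, and the test perturbations are all hypothetical choices. For this f and g the four equations ∇F = 0 happen to be linear, so a small Gaussian elimination is enough.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def F(x, y, z, lam):
    # F = f + lam * g with the hypothetical f and g described above.
    return -(x * x + y * y + z * z) + lam * (x + y + z - 3.0)

# grad F = 0 written as a linear system in (x, y, z, lam):
#   dF/dx   = -2x + lam       = 0
#   dF/dy   = -2y + lam       = 0
#   dF/dz   = -2z + lam       = 0
#   dF/dlam = x + y + z - 3   = 0
a = [[-2.0, 0.0, 0.0, 1.0],
     [0.0, -2.0, 0.0, 1.0],
     [0.0, 0.0, -2.0, 1.0],
     [1.0, 1.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0, 3.0]
x, y, z, lam = solve_linear(a, b)

# Compare F at the stationary point with two nearby points.
F_star = F(x, y, z, lam)
F_lower = F(x + 0.1, y + 0.1, z + 0.1, lam)
F_higher = F(x + 0.1, y + 0.1, z + 0.1, lam + 0.2)
print((x, y, z, lam), F_star, F_lower, F_higher)
```

The solution is (x, y, z) = (1, 1, 1), which is the constrained maximum of this f. The last three lines illustrate a point worth keeping in mind: in the four variables (x, y, z, λ) the stationary point of F is a saddle rather than a maximum or minimum, since F can be made both smaller and larger nearby. The multiplier equations locate candidates for the constrained extrema of f; they do not make F itself extremal.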
 
  • #9
arildno said:
Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
[tex]F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)[/tex]
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the stationary points of F, i.e., where its gradient is 0:
[tex]0=\frac{\partial{F}}{\partial{x}},\quad 0=\frac{\partial{F}}{\partial{y}},\quad 0=\frac{\partial{F}}{\partial{z}},\quad 0=\frac{\partial{F}}{\partial\lambda}[/tex]
The first three equations reduce to:
[tex]\nabla{f}+\lambda\nabla{g}=0[/tex]
whereas the last equation reduces to, lo and behold, g(x,y,z)=0.

Thus, F's stationary points all lie in the region where F is identical with f, and the constrained extrema of f are found among them. :smile:

This is indeed the standard proof as is usually explained to physicists. This can be generalized in a straightforward way to the variational calculus case. You have to wonder why math students are always tortured by their profs who give horrible non-intuitive proofs.

Another example:

Proof of the formula for [itex]\nabla^2[/itex] in spherical coordinates. It can be derived in just a few lines, but most first-year math students are presented with the "straightforward" derivation, i.e., simply expressing all the partial derivatives in terms of r, theta and phi.
 
  • #10
ehrenfest said:
I finally managed to get a copy of that book. What exercise is it exactly?

I'm away from home at the moment, but I'll get back to you tomorrow.
 
  • #11
ehrenfest said:
I finally managed to get a copy of that book. What exercise is it exactly?

It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.
 
  • #12
Just think of inflating a balloon - isn't that the standard method of explanation?
 
  • #13
slider142 said:
It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.

Interesting! I've never seen the theorem written in the language of differential forms, but I guess it is natural. I was looking for the exercise in the Implicit Functions section. I'll post that exercise and the solution here once I get that far through Spivak.
 
  • #14
The derivative of a function in a given direction is found by dotting the gradient with the direction vector. Hence the maximum of that function on a given surface occurs at a point where the directional derivative is zero in every direction in that surface, i.e., the gradient dots to zero with every vector tangent to a curve in the surface. This means the gradient is perpendicular to the surface, or equivalently parallel to the gradient vector of the function defining the surface as a level surface. So to maximize f on g=0, try setting grad f parallel to grad g at points of g=0.
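
The tangent-direction criterion is easy to test numerically. The sketch below assumes a hypothetical example that is not from the thread: f = x + 2y + 3z on the unit sphere, whose constrained maximum is at (1, 2, 3)/√14. The helper `tangent_basis` is likewise just an illustrative construction of two vectors perpendicular to the surface normal.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Hypothetical objective f = x + 2y + 3z has a constant gradient.
grad_f = (1.0, 2.0, 3.0)

def grad_g(p):
    # Hypothetical constraint surface: g = x^2 + y^2 + z^2 - 1 = 0.
    return tuple(2.0 * c for c in p)

def tangent_basis(n):
    # Two independent vectors perpendicular to the normal n.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < abs(n[1]) else (0.0, 1.0, 0.0)
    t1 = cross(n, helper)
    t2 = cross(n, t1)
    return t1, t2

# At the constrained maximum, every tangential directional derivative vanishes.
s = math.sqrt(14.0)
p_max = (1.0 / s, 2.0 / s, 3.0 / s)
t1, t2 = tangent_basis(grad_g(p_max))
d1, d2 = dot(grad_f, t1), dot(grad_f, t2)

# Equivalent statement: grad f is parallel to grad g (zero cross product).
parallel_check = cross(grad_f, grad_g(p_max))

# At some other point of the sphere, f still changes along the surface.
p_other = (1.0, 0.0, 0.0)
u1, u2 = tangent_basis(grad_g(p_other))
e1, e2 = dot(grad_f, u1), dot(grad_f, u2)
print(d1, d2, parallel_check, e1, e2)
```

At the maximum both dot products are zero up to rounding, while at the point (1, 0, 0) they are not, so f can still be increased there by moving within the surface.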
 

1. What is the method of Lagrange multipliers used for?

The method of Lagrange multipliers is a mathematical technique used to optimize a function subject to one or more constraints. It is commonly used in calculus, optimization, and economics.

2. How does the method of Lagrange multipliers work?

The method finds candidate points by setting the gradient of the function equal to the gradient of the constraint multiplied by a constant, known as the Lagrange multiplier, and solving that equation together with the constraint itself. A constrained maximum or minimum that occurs at a point where the constraint gradient is nonzero must be among these candidates.

3. What are the key assumptions of the method of Lagrange multipliers?

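The main assumptions of the method are that the function and the constraints are continuously differentiable, and that the constraints are independent of each other, meaning that their gradients are linearly independent at the point in question (for a single constraint, that its gradient is nonzero there).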

4. Can the method of Lagrange multipliers be used for nonlinear functions?

Yes, the method can be used for both linear and nonlinear functions. However, for nonlinear functions, the solution may be more complex and require additional steps to solve.

5. Are there any limitations to the method of Lagrange multipliers?

One limitation of the method is that it only produces candidate points: a solution of the equations may be a local rather than a global optimum, or not an optimum at all, so the candidates still have to be compared or tested. It also relies on the assumption that the constraints are independent, which may not always be the case in practical applications.
