# Proof of the method of lagrange multipliers

1. Jun 29, 2008

### ehrenfest

I have used this method quite a lot but I have never completely understood the proof. The only book I have that provides a proof is Shifrin's "Multivariable Mathematics" which I find kind of confusing. Stewart's "proof" is more or less just geometric intuition. Does anyone know of a book that provides a rigorous proof of this result or a website that does?

2. Jun 29, 2008

### slider142

Spivak's "Calculus on Manifolds" has the proof as an exercise. However, you are rigorously prepared to provide it by the tools in the chapter. An elegant proof is also given in Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms".

3. Jun 29, 2008

### HallsofIvy

A simple way of thinking about it, though not a rigorous proof, is this: suppose you were asked to find a maximum (or minimum) value for f(x,y,z). One method would be to calculate the gradient of f, which always points in the direction of fastest increase, follow that direction a short distance, the recalculate the gradient.Keep doing that until the is no "direction" to follow- i.e. until the gradient is the 0 vector. (I once wrote a computer program that used that idea to find max or min. It worked, but was horrendously slow!)

Now, suppose you want to find a maximum (or minimum) value for f(x,y,z) but must remain on the surface defined by g(x,y,z)= constant. Now you cannot, in general "follow the direction of the gradient" because that would lead you off the surface. But you can "come as close as possible"- take the projection of the gradient vector onto the surface and follow that. Repeat that as long as you can: not until the gradient vector is 0, but until its projection onto the surface is the 0 vector which is the same as saying grad f is perpendicular to the surface.

Since g(x,y,z)= constant is a "level surface" for g, it is easy to see that grad g is perpendicular to the surface at every point: our condition for a maximum (or minimum) value of f on the surface g(x,y,z)= constant is that the two gradients are parallel: that $\nabla f= \lambda \nabla g$ for some constant [/itex]\lambda[/itex].

4. Jun 29, 2008

### ehrenfest

Its interesting I think that the rigorous proof relies on the implicit function theorem (at least in Shifrin) while the nonrigorous, geometric proof you just gave does not really have any hint of the implicit function theorem.

5. Jul 8, 2008

### ehrenfest

I finally managed to get a copy of that book. What exercise is it exactly?

6. Jul 9, 2008

### HallsofIvy

The devil is in the details!

7. Jul 9, 2008

### tim_lou

The rigorous proof that I understand involves implicit function theorem. Basically, when a smooth function is at a critical point, you get df is degenerate.
in the case the range of f is ℝ

df=∇f is degenerate iff ∇f=0 vector.

when one has constrains,
i.e. maximizing f:ℝ3 -> ℝ, f(x,y,z) under the constrain g(x,y,z)=0,
if (∂z)g≠0, we can solve for z=z(x,y)

(if (∂z)g is zero, choose (∂x)g or other stuffs... in the case ∇g=0, you can figure things out seperatedly)

notice g(x,y,z(x,y))=0
$$\frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}=0$$

similarly for y.

so that we may transform the problem into one that doesn't involve constrain,
i.e., maximize k(x,y)= f(x,y, z(x,y)), k is a function from ℝ2 -> ℝ, and then one can do

(∂x) k = 0
(∂y) k = 0

or

$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}=0$$

and

$$\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}=0$$

then we can transform this equation by solving ∂z in terms of ∂g
notice if one puts
$$\lambda = \frac{\partial f/\partial z}{\partial g/\partial z}$$
it reduces to the Lagrange multiplier equation.

(of course, there are details like implicit function only works for small neighborhoods and what not... but the details can be worked out)

for me, I think intuitively that by applying constrain, the derivative of f is somehow modified.
I.e.
$$\textrm{D}f= \nabla f - \lambda \nabla g$$
and that one proceeds with the usual procedure and solve Df=0.

Last edited: Jul 9, 2008
8. Jul 9, 2008

### arildno

Here's a sketch of a slightly different proof, avoiding the implicit function theorem and projections:

You are to maximize f(x,y,z) under the condition g(x,y,z)=0.

Now, consider instead the following four-variable function:
$$F(x,y,z,\lambda)=f(x,y,z)+\lambda{g}(x,y,z)$$
Note that in the REGION g(x,y,z)=0, F is identical with f!

Now, we are to find the extrema of F, i.e, where its gradient is 0:
$$0=\frac{\partial{F}}{\partial{x}},0=\frac{\partial{F}}{\partial{y}},0=\frac{\partial{F}}{\partial{z}},0=\frac{\partial{F}}{\partial\lambda}$$
The first three equations reduce to:
$$\nabla{f}+\lambda\nabla{g}=0$$
whereas the last equation reduces to..lo and behold: g(x,y,z)=0.

Thus, F's extrema are constrained to the region where F is identical with f, i.e, F's (standard) extrema coincide with the constraint-extrema of f..

Last edited: Jul 9, 2008
9. Jul 12, 2008

### Count Iblis

This is indeed the standard proof as is usually explained to physicists. This can be generalized in a straightforward way to the variational calculus case. You have to wonder why math students are always tortured by their profs who give horrible non-intuitive proofs.

Another example:

Proof of the formula for nabla^2 in spherical cordinates. Can be derived in just a few lines, but most math students in first year are presented the "straightforward" derivation, i.e. simply express all the partial derivatives in terms of r , theta and phi.

10. Jul 12, 2008

### slider142

I'm away from home at the moment, but I'll get back to you tomorrow.

11. Jul 13, 2008

### slider142

It is exercise 5-16. By the time you reach this exercise, the machinery is developed enough to give a simple proof in 3 or 4 lines.

12. Jul 13, 2008

### matt grime

Just think of inflating a balloon - isn't that the standard method of explanation?

13. Jul 13, 2008

### ehrenfest

Interesting! I've never seen the theorem written in the language of differential forms but I guess it is natural. I was looking for the exercise in the Implicit Functions section. I'll post the that Exercise and the solution here once I get that far through Spivak.

14. Jul 13, 2008

### mathwonk

the derivative of a function in a given direction is done by dotting the gradient with the direction vector. hence the maximum of that function on a given surface occurs at a point where the directional derivative is zero in every direction in that surface, i.e. the gradient dots to zero with every vector tangent to a curve in the surface. this means the gradient is perpendicular to the surface, or equivalently parallel to the gradient vector of the function defining the surface as a level surface. so to maximize f on g=0, try setting gradf parallel to grad g at points of g=0.