Condition for f(x,y,z) = f(x,y,z(x,y)) being extremized

• PhDeezNutz
In summary, the Lagrange multiplier method defines a new function ##\mathcal{L} = f - \lambda g## that combines the function ##f## to be extremized with the constraint function ##g##, and sets the gradient of ##\mathcal{L}## equal to zero. Each component of ##\nabla \mathcal{L} = 0## gives one equation, ##\partial f/\partial x_i = \lambda\, \partial g/\partial x_i##. Together with the constraint, these equations can be solved for the stationary points and for ##\lambda##. The method has a simple geometric interpretation, and the same idea carries over to the calculus of variations with constraints.
PhDeezNutz
Homework Statement
Given ##f(x,y,z) = f(x,y,z(x,y))##, what are the conditions on the partial derivatives of ##f## for ##f## to be extremized?
Relevant Equations
Apparently they are

##\left(\frac{\partial f}{\partial x} \right)_y =\left(\frac{\partial f}{\partial y} \right)_x = \left(\frac{\partial f}{\partial z} \right)_x = 0##

where the subscripts denote the variables held constant.
As far as I know, when a function is extremized, its partial derivatives are all equal to 0 (provided there is no constraint):

##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##
Let's start off with the chain rule:
##\left(\frac{\partial f}{\partial x} \right)_{y} = \left(\frac{\partial f}{\partial x}\right)_{yz} + \left( \frac{\partial f}{\partial z} \right)_{xy} \left( \frac{\partial z}{\partial x} \right)_{y}##

I really don't know where to go from here. I'm not even sure my understanding of implicit functions is up to par to even begin addressing this question.
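The chain-rule identity above can be sanity-checked numerically. The sketch below is a minimal Python example under assumed, hypothetical choices ##f(x,y,z) = x^2 z + y z^3## and ##z(x,y) = xy + 1## (neither is from the thread). It compares a central-difference derivative of ##f(x,y,z(x,y))## at fixed ##y## with the right-hand side of the identity.

```python
# Hypothetical example functions (not from the thread), chosen only to test the identity
# (df/dx)_y = (df/dx)_{yz} + (df/dz)_{xy} (dz/dx)_y.
def f(x, y, z):
    return x**2 * z + y * z**3

def z_of(x, y):
    # an explicit z(x, y), standing in for the implicit function
    return x * y + 1.0

def F(x, y):
    # f restricted to the surface: F(x, y) = f(x, y, z(x, y))
    return f(x, y, z_of(x, y))

def d(fun, args, i, h=1e-6):
    """Central-difference partial derivative of fun w.r.t. argument i, others held fixed."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (fun(*up) - fun(*dn)) / (2 * h)

x, y = 0.7, -0.4
z = z_of(x, y)
lhs = d(F, (x, y), 0)                                               # (df/dx)_y on the surface
rhs = d(f, (x, y, z), 0) + d(f, (x, y, z), 2) * d(z_of, (x, y), 0)  # chain-rule right-hand side
print(lhs, rhs)
```

The two printed numbers agree to within the finite-difference error.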

PhDeezNutz said:
... then what are the conditions involving the partial derivatives of ##f## ...
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.

fresh_42 said:
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

Orodruin said:
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

I understand everything up until the line right after (22.63).

Why are Arfken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?

And presumably the other 2 equations in (22.66) come from setting

##\left( \frac{\partial f}{\partial y}\right)_z = 0##

##\left( \frac{\partial f}{\partial z}\right)_x = 0##

after cyclically permuting. (The book says the second equation comes from swapping ##x## and ##y##, but I don't see it; that also doesn't explain the third equation.)

Why these specific derivatives?

Given ##f(x,y,z(x,y))##

Hopefully I was able to clarify my question. Any help is appreciated because I am lost. Thanks in advance.

I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is define a new function ##\mathcal{L}=f-\lambda g##, which has no constraint. Thus we can, as usual, solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.
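As a minimal sketch of ##\nabla \mathcal{L} = 0## in action, assume the hypothetical problem of extremizing ##f = x^2+y^2+z^2## subject to ##g = x+y+z = 1##. Here the Lagrangian is written as ##\mathcal{L} = f - \lambda (g - 1)## so that ##\partial \mathcal{L}/\partial \lambda = 0## reproduces the constraint; the constant does not change the ##x##, ##y##, ##z## components. Solving ##2x = 2y = 2z = \lambda## with ##x+y+z=1## by hand gives ##x=y=z=1/3## and ##\lambda = 2/3##, and the code checks with exact fractions that every component of ##\nabla \mathcal{L}## vanishes there.

```python
from fractions import Fraction

# Hypothetical problem (not from the thread): extremize f = x^2 + y^2 + z^2
# subject to g = x + y + z = 1, using L = f - lam * (g - 1).
def grad_L(x, y, z, lam):
    """Components (dL/dx, dL/dy, dL/dz, dL/dlam)."""
    grad_f = (2 * x, 2 * y, 2 * z)
    grad_g = (1, 1, 1)
    spatial = tuple(a - lam * b for a, b in zip(grad_f, grad_g))
    return spatial + (-(x + y + z - 1),)

# From 2x = 2y = 2z = lam and x + y + z = 1: x = y = z = 1/3 and lam = 2/3.
third = Fraction(1, 3)
stationary = (third, third, third, Fraction(2, 3))
print(grad_L(*stationary))   # every component is Fraction(0, 1)
```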

fresh_42 said:
I find the usage of ##f## a bit confusing.

I know how to use the Lagrange multiplier method. I'm trying to prove it.

Why is the first coordinate ##\left(\frac{\partial f}{\partial x}\right)_y##? Why is the second coordinate ##\left(\frac{\partial f}{\partial y}\right)_z##? Why is the third coordinate ##\left(\frac{\partial f}{\partial z}\right)_x##? I understand the chain rule helps us confirm these expressions.

I want to know why we set them equal to 0. I know that once we do, we get the system of equations ##\nabla f = \lambda \nabla g##.

It is much easier to think geometrically about the Lagrange multiplier method. Suppose the constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$df = d\vec x \cdot \nabla f.$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, requiring ##df = 0## for displacements that lie in the constraint surface gives
$$df = d\vec x \cdot \nabla f = 0$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.
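The geometric statement can be checked on a hypothetical example (assumed here, not from the thread): ##f = x^2+y^2+z^2## on the plane ##g = x+y+z = 1##. The vectors ##(1,-1,0)## and ##(0,1,-1)## span the tangent plane, so ##d\vec x \cdot \nabla f## along them measures the tangential part of ##\nabla f##. A minimal sketch:

```python
from fractions import Fraction

# Hypothetical example (not from the thread): f = x^2 + y^2 + z^2 on the plane g = x + y + z = 1.
def grad_f(p):
    return tuple(2 * c for c in p)

normal = (1, 1, 1)                    # grad g, normal to the constraint plane
tangents = [(1, -1, 0), (0, 1, -1)]   # each satisfies dot(t, normal) == 0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tangential_df(p):
    """df = dx . grad f for displacements dx along the tangent vectors."""
    return [dot(t, grad_f(p)) for t in tangents]

extremum = (Fraction(1, 3),) * 3
other = (Fraction(1), Fraction(0), Fraction(0))   # also on the plane, but not stationary

lam = grad_f(extremum)[0] / normal[0]             # candidate multiplier
print(tangential_df(extremum))                    # tangential part of grad f vanishes
print(tangential_df(other))                       # nonzero: f still changes along the surface
print(grad_f(extremum) == tuple(lam * n for n in normal))   # grad f parallel to grad g
```

At the stationary point both tangential components vanish and ##\nabla f = \tfrac{2}{3}\nabla g##; at ##(1,0,0)##, which is also on the plane, they do not.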

Orodruin said:
It is much easier to think geometrically about the Lagrange multiplier method.

The reason I want to prove it in the manner shown above is that I eventually want to use the same approach in the calculus of variations, for a functional of the coordinates with constraints on those coordinates.

Here ##\lambda_j## is a function of ##t## instead of a mere scalar factor. In the calculus-of-variations case there is also no clear geometric meaning, and the derivative definitions are more complicated.

Edit: the constraints are of the form ##g_j(q_i(t)) = C##.

PhDeezNutz said:
And in the calculus of variation case there is no clear geometric meaning.
What makes you think that? The derivation is completely analogous.

Orodruin said:
What makes you think that? The derivation is completely analogous.

Maybe I misspoke, but I would think to set the functional derivatives equal to each other up to a scalar multiple (i.e. the Euler-Lagrange equations), which would work except for the fact that ##\lambda## is a function of ##t##. I want to do the analysis that reveals ##\lambda## is actually a function of ##t##.

Edit: I don’t understand how derivatives can be parallel to each other but differ by a multiplicative function of ##t##.

PhDeezNutz said:
Why are Arfken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

PhDeezNutz
vela said:
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

Let me see if I understand

##\left( \frac{\partial f}{\partial x} \right)_y = 0##: ##z## is a function of ##x## and ##y##, so only ##x## and ##y## can vary independently; ##x## is varying, so ##y## is held constant.

##\left(\frac{\partial f}{\partial y} \right)_x = 0##: ##z## is a function of ##x## and ##y##, so only ##x## and ##y## can vary independently; ##y## is varying, so ##x## is held constant.

I don't see how ##\left(\frac{\partial f}{\partial z}\right)_x = 0## is consistent with these choices, and therefore I don't see how we get the third equation in (22.64).

Either way, your post definitely helped, even if I don't totally understand it.

You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.
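This point can be illustrated with a sketch under an assumed constraint, the unit sphere ##g = x^2+y^2+z^2 = 1## (hypothetical, chosen because each variable can be solved for explicitly). Each variable takes a turn as the dependent one. The numerically computed implicit derivatives match ##-g_x/g_z##, ##-g_y/g_x## and ##-g_z/g_y##, and their product reproduces the standard cyclic (triple product) relation ##\left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x = -1##.

```python
import math

# Hypothetical constraint (not from the thread): g(x, y, z) = x^2 + y^2 + z^2 = 1, first octant.
def g_grad(x, y, z):
    return (2 * x, 2 * y, 2 * z)

# Solve g = 1 for each variable in turn (positive branch), so each can play the dependent role.
def z_of(x, y): return math.sqrt(1 - x**2 - y**2)
def x_of(y, z): return math.sqrt(1 - y**2 - z**2)
def y_of(z, x): return math.sqrt(1 - z**2 - x**2)

def d(fun, a, b, h=1e-6):
    """Central difference of fun(a, b) w.r.t. its first argument, second argument held fixed."""
    return (fun(a + h, b) - fun(a - h, b)) / (2 * h)

x, y = 0.3, 0.5
z = z_of(x, y)
gx, gy, gz = g_grad(x, y, z)

dz_dx_y = d(z_of, x, y)   # (dz/dx)_y
dx_dy_z = d(x_of, y, z)   # (dx/dy)_z
dy_dz_x = d(y_of, z, x)   # (dy/dz)_x

print(dz_dx_y, -gx / gz)
print(dx_dy_z, -gy / gx)
print(dy_dz_x, -gz / gy)
print(dz_dx_y * dx_dy_z * dy_dz_x)   # triple product rule: approximately -1
```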

PhDeezNutz
vela said:
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.

I think you're right.

With your help I think I've got a grasp on it. Would you mind giving my work a quick look through?

Lemma 1 (Implicit Function Theorem)

Let ##g : R^n \rightarrow R## and let ##L_c(g)## be a level set of ##g##, i.e.

##L_c(g) = \left\{ \vec{x} \in R^n | g(\vec{x}) = c \right\}##

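For a point ##\vec{a} \in L_c(g)## and a nearby point ##\vec{x} \in L_c(g)##, with the partial derivatives evaluated at ##\vec{a}##, it follows to first order that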

## dg = \sum \limits_{i = 1}^{n} \left(\frac{\partial g}{\partial x_i} \right) \left(x_i - a_i \right) = 0##

implies

## \left( \frac{\partial g}{\partial x_n}\right) \left( x_n - a_n \right) = - \sum \limits_{i=1}^{n-1} \left( \frac{\partial g}{\partial x_i}\right) \left( x_i - a_i \right)##

which, provided ##\frac{\partial g}{\partial x_n} \neq 0##, gives us

##x_n = a_n - \sum \limits_{i=1}^{n-1} \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} \left( x_i - a_i \right)##

On the other hand, linearizing ##x_n## as a function of ##x_1, \dots, x_{n-1}## about ##\vec{a}## gives

##x_n = a_n + \sum \limits_{i=1}^{n-1} \left(\frac{\partial x_n}{\partial x_i}\right) \left( x_i - a_i \right) ##

Therefore

##\left(\frac{\partial x_n}{\partial x_i}\right) = - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} ##
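Lemma 1's formula can be checked numerically for a specific level set. The sketch below assumes a hypothetical ##g(\vec x) = \sum_i x_i^2## on ##R^4## with ##c = 2##, solves explicitly for ##x_4 > 0##, and compares a central-difference ##\partial x_n/\partial x_i## with ##-(\partial g/\partial x_i)/(\partial g/\partial x_n)##.

```python
import math

# Hypothetical level set in R^4 (not from the thread): g(x) = sum of x_i^2 = c.
c = 2.0

def grad_g(x):
    """Gradient of g(x) = sum(x_i^2)."""
    return [2 * xi for xi in x]

def x_n(others):
    """x_n as a function of x_1..x_{n-1} on the level set (positive branch)."""
    return math.sqrt(c - sum(xi**2 for xi in others))

a_others = [0.4, -0.3, 0.6]
a = a_others + [x_n(a_others)]      # a point a on L_c(g)
g = grad_g(a)
h = 1e-6

results = []
for i in range(len(a_others)):
    up, dn = list(a_others), list(a_others)
    up[i] += h
    dn[i] -= h
    numeric = (x_n(up) - x_n(dn)) / (2 * h)   # dx_n/dx_i along the level set
    formula = -g[i] / g[-1]                   # Lemma 1: -(dg/dx_i)/(dg/dx_n)
    results.append((numeric, formula))
    print(f"i={i + 1}: numeric={numeric:.8f}, formula={formula:.8f}")
```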

Lemma 2 (We are going to need this later to show that the Lagrange Multiplier is indeed a constant)

If ##p(x_i) - \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = 0##, then ##\lambda_j \left( x_n \right) = C_j##, where ##C_j## is a constant.

Proof:

##p \left( x_i \right) = \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = \sum \limits_{j=1}^{m} \Lambda_j \left( x_i \right)##

##\Lambda_j \left( x_i \right) = \lambda_j \left( x_n \right) s_j \left(x_i \right)## implies that ##\lambda_j \left( x_n \right)## is either a function of ##x_i## or a constant. It can't be a function of ##x_i##, so it must be a constant.

So we have

##\lambda_j \left( x_n \right) = \text{constant}##

Proof of Lagrange Multiplier Method for Real-Valued Functions (Multiple Constraints):
Let ##f: R^n \rightarrow R## and ##g_j: R^n \rightarrow R##. We want to extremize ##f \left( \vec{x} \right)## subject to the constraints ##g_j \left( \vec{x} \right) = C_j##, with ##m## such constraints.

##dg_j = 0##, so by the Implicit Function Theorem (Lemma 1) ##\left( \frac{\partial x_n}{\partial x_i}\right) = - \sum\limits_{j=1}^{m} \frac{\left( \frac{\partial g_j}{\partial x_i }\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)}## (This part I'm unsure about, but it seems necessary to get the result we want.)

If ##f## is extremized, then ##\left( \frac{\partial f}{\partial x_i} \right)_{total} = 0##

## \left( \frac{\partial f}{\partial x_i} \right)_{total} = \left( \frac{\partial f}{\partial x_i} \right) + \left( \frac{\partial f}{\partial x_n}\right) \left( \frac{\partial x_n}{\partial x_i}\right) = 0##

##\left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \frac{\left( \frac{\partial f}{\partial x_n}\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)} \left(\frac{\partial g_j}{\partial x_i} \right) = \left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) \left(\frac{\partial g_j}{\partial x_i} \right) = 0 ##

where ##\lambda_j(x_n) = \left( \frac{\partial f}{\partial x_n}\right) \Big/ \left( \frac{\partial g_j}{\partial x_n}\right)##. This implies ##\lambda_j(x_n) = \text{constant}## by Lemma 2.

Therefore we have a system of equations

## \nabla f - \sum\limits_{j=1}^{m} \lambda_j \nabla g_j = 0## Q.E.D.

Hopefully I didn't do anything egregiously wrong.
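For a single constraint (##m = 1##) the argument can be checked numerically; the sketch below does not test the multi-constraint step flagged as uncertain above. It assumes the hypothetical problem of extremizing ##f = x+y+z## on ##g = x^2+y^2+z^2 = 3##, whose maximum is at ##(1,1,1)##. It eliminates ##z = x_n## through the constraint, checks that ##\left(\partial f/\partial x_i\right)_{total}## vanishes there, and checks ##\nabla f - \lambda \nabla g = 0## with ##\lambda = (\partial f/\partial x_n)/(\partial g/\partial x_n)##.

```python
import math

# Hypothetical single-constraint example (m = 1, n = 3), not from the thread:
# extremize f = x + y + z subject to g = x^2 + y^2 + z^2 = 3. The maximum is at (1, 1, 1).
def f(x, y, z):
    return x + y + z

def grad_f(x, y, z):
    return (1.0, 1.0, 1.0)

def grad_g(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def z_of(x, y):
    # x_n = z eliminated through the constraint (positive branch)
    return math.sqrt(3 - x**2 - y**2)

def total_partial(i, x, y, h=1e-6):
    """(df/dx_i)_total: central difference of f(x, y, z(x, y)) w.r.t. x (i=0) or y (i=1)."""
    def F(u, v):
        return f(u, v, z_of(u, v))
    if i == 0:
        return (F(x + h, y) - F(x - h, y)) / (2 * h)
    return (F(x, y + h) - F(x, y - h)) / (2 * h)

p = (1.0, 1.0, z_of(1.0, 1.0))
fg, gg = grad_f(*p), grad_g(*p)
lam = fg[2] / gg[2]                        # lambda = (df/dx_n) / (dg/dx_n)
residual = [a - lam * b for a, b in zip(fg, gg)]

print([total_partial(i, 1.0, 1.0) for i in (0, 1)])   # both close to 0 at the extremum
print(lam, residual)                                   # residual of grad f - lam * grad g
```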

Can anyone confirm, deny, or motivate a proof for the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n | g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n | h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n | g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n | j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##

PhDeezNutz said:
Can anyone confirm, deny, or motivate a proof for the following statement about the intersection of level sets of different functions?

Consider ##n=2##. What happens when ##g\left(x,y\right)=x##, ##h\left(x,y\right)=y##, ##c=2##, and ##d=3##?
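Working through the suggested example with a short script makes the answer concrete. Any point of ##L_2(g) \cap L_3(h)## satisfies ##j = 2 + 3##, so the intersection is contained in ##L_5(j)##, but the converse fails: the intersection is the single point ##(2,3)##, while ##L_5(j)## is the whole line ##x + y = 5##. A minimal Python sketch with the functions from the reply:

```python
# The example suggested in the reply: n = 2, g(x, y) = x, h(x, y) = y, c = 2, d = 3.
def g(x, y): return x
def h(x, y): return y
def j(x, y): return g(x, y) + h(x, y)

c, d = 2, 3

def in_intersection(x, y):
    """Membership in L_c(g) and L_d(h) simultaneously."""
    return g(x, y) == c and h(x, y) == d

def in_level_set_of_j(x, y):
    """Membership in L_{c+d}(j)."""
    return j(x, y) == c + d

# Points on the line x + y = 5, i.e. all in L_5(j):
for p in [(2, 3), (0, 5), (5, 0), (-1, 6)]:
    print(p, in_level_set_of_j(*p), in_intersection(*p))
```

Only ##(2,3)## is in both sets, so in general ##L_c(g) \cap L_d(h) \subseteq L_{c+d}(j)## holds but equality does not.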

1. What is the condition for a function f(x,y,z) to be extremized?

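For a differentiable function f(x,y,z) with no constraints, a necessary condition for an extremum at an interior point is that its partial derivatives with respect to x, y, and z all equal zero at that point. With a constraint, as in the thread above, this is replaced by the Lagrange condition that the gradient of f be parallel to the gradient of the constraint function.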

2. How is the condition for extremization related to critical points?

Critical points are the points where all partial derivatives of a function vanish. Every extremum of a differentiable function at an interior point is a critical point, but the converse fails: a critical point can be a saddle point, as f(x,y) = x^2 - y^2 is at the origin.

3. Can a function have multiple points of extremum?

Yes. A function can have several local extrema, each of which can be a local maximum or a local minimum depending on the behavior of the function around it.

4. Does the condition for extremization apply to all types of functions?

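No. The zero-gradient condition applies to real-valued functions at interior points where they are differentiable. It does not detect extrema on a boundary or at points where the function is not differentiable, and under a constraint it is replaced by the Lagrange multiplier condition. Extrema are only defined for real-valued quantities, so a vector-valued function has to be reduced to one (for example, its magnitude) before the condition can be applied.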

5. How is the condition for extremization used in practical applications?

The condition for extremization is used in practical applications to find the optimal values of variables in a given system. This can be applied in various fields such as economics, engineering, and physics to maximize profits, minimize costs, or optimize system performance.
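The caveat in item 2 can be illustrated with the standard second-derivative test for a function of two variables. The sketch below is a minimal example, not taken from the thread: it classifies a critical point from the Hessian entries evaluated there (passed in as fxx, fyy, fxy), and applies it to x^2 + y^2, -(x^2 + y^2) and x^2 - y^2, all of which have a critical point at the origin.

```python
# Second-derivative test for a function of two variables (a standard test, not from the thread).
def classify(fxx, fyy, fxy):
    """Classify a critical point from the Hessian entries evaluated there."""
    det = fxx * fyy - fxy**2
    if det > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"

# Hessian entries at the origin for three functions with a critical point there:
print(classify(2, 2, 0))    # x^2 + y^2      -> local minimum
print(classify(-2, -2, 0))  # -(x^2 + y^2)   -> local maximum
print(classify(2, -2, 0))   # x^2 - y^2      -> saddle point
```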
