# Condition for f(x,y,z) = f(x,y,z(x,y)) being extremized

PhDeezNutz
Homework Statement:
Given ##f(x,y,z) = f(x,y,z(x,y))## then what are the conditions involving the partial derivatives of $f$
Relevant Equations:
Apparently they are

##\left(\frac{\partial f}{\partial x} \right)_y =\left(\frac{\partial f}{\partial y} \right)_x = \left(\frac{\partial f}{\partial z} \right)_z = 0##

where the subscript denotes the independent variable being held constant
As far as I know when a function is extremized its partial derivatives are all equal to 0 (provided we aren't dealing with a constraint)

##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##
Let's start off
##\left(\frac{\partial f}{\partial x} \right)_{y} = \left(\frac{\partial f}{\partial x}\right)_{yz} + \left( \frac{\partial f}{\partial z} \right)_{xy} \left( \frac{\partial z}{\partial x} \right)_{y}##

I really don't know where to go from here. I'm not even sure my understanding of implicit functions is up to par to even begin addressing this question.

Mentor
2022 Award
... then what are the conditions involving the partial derivatives of ##f## ...
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.

Staff Emeritus
Homework Helper
Gold Member
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

PhDeezNutz
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

I understand everything up until the line right after (22.63)

Why are Arken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?

And presumably the other 2 equations in (22.66) come from setting

##\left( \frac{\partial f}{\partial y}\right)_z = 0##

##\left( \frac{\partial f}{\partial z}\right)_ x= 0##

after cyclically permuting. (It says the second equation comes from swapping x and y but I don't see it, also that doesn't explain the third equation)

Why these specific derivatives?

Given ##f(x,y,z(x,y))##

Hopefully I was able to clarify my question. Any help is appreciated because I am lost. So thanks in advance.

Mentor
2022 Award
I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is to define a new function ##\mathcal{L}=f-\lambda g## such that we can minimize ##\mathcal{L}## now which has no constraint. Thus we can as usual solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.

PhDeezNutz
I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is to define a new function ##\mathcal{L}=f-\lambda g## such that we can minimize ##\mathcal{L}## now which has no constraint. Thus we can as usual solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.

I know how to use the Lagrangian multiplier method. I'm trying to prove it.

Why is the first coordinate ##\left(\frac{\partial f}{\partial x}\right)_y##? Why is the second coordinate ##\left(\frac{\partial f}{\partial y}\right)_z##? Why is the third coordinate ##\left(\frac{\partial f}{\partial z}\right)_x##? I understand the chain rule helps us confirm these expressions.

I want to know why we set them = 0? I know that once we do we get the system of equations ##\nabla f = \lambda \nabla g##.

Staff Emeritus
Homework Helper
Gold Member
It is much easier to think geometrically about the Lagrange multiplier method. If a constraint surface is given by ##g = c_0##. For any function ##f## it holds that the infinitesimal change ##df## of ##f## when making a displacement ##d\vec x## is given by
$$df = \vec d\vec x \cdot \nabla f.$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, in order for ##df = 0## when restricting displacements to lie in the constraint surface results in
$$df = d\vec x \cdot \nabla f = 0$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.

PhDeezNutz
It is much easier to think geometrically about the Lagrange multiplier method. If a constraint surface is given by ##g = c_0##. For any function ##f## it holds that the infinitesimal change ##df## of ##f## when making a displacement ##d\vec x## is given by
$$df = \vec d\vec x \cdot \nabla f.$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, in order for ##df = 0## when restricting displacements to lie in the constraint surface results in
$$df = d\vec x \cdot \nabla f = 0$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.

The reason I want to prove it in the manner shown above is because I eventually want to use the same approach for calculus of variations when a functional that depends on coordinates has constraints on coordinates.

Where ## \lambda_j ## is a function of t instead of a mere scalar factor. And in the calculus of variation case there is no clear geometric meaning. Furthermore the derivative definitions are more complicated.

Edit the constraints are of the form ##g_j(q_i(t)) = C##

Last edited:
Staff Emeritus
Homework Helper
Gold Member
And in the calculus of variation case there is no clear geometric meaning.
What makes you think that? The derivation is completely analogous.

PhDeezNutz
What makes you think that? The derivation is completely analogous.

Maybe I misspoke but I would think to set the functional derivatives equal to each other except for a scalar multiple (I.e. the Lagrange Euler equations) which would work except for the fact lambda is a function of t. I want to do the analysis that reveals lambda is an actual function of t.

Edit: I don’t understand how derivatives can be parallel to each other but differ by a multiplicative function of t.

Staff Emeritus
Homework Helper
Why are Arken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

PhDeezNutz
PhDeezNutz
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

Let me see if I understand

##\left( \frac{\partial f}{\partial x} \right)_y = 0## z is a function of x and y so only x and y are allowed to vary, x is varying so y is constant.

##\left(\frac{\partial f}{\partial y} \right)_x = 0## z is a function of x and y so only x and y are allowed to vary, y is varying so x is constant.

I don't see how ##\left(\frac{\partial f}{\partial z}\right)_x = 0## is consistent with these choices. And therefore I don't see how we get the third equation in 22.64.

Either way, your post definitely helped. Even If I don't totally understand it.

Last edited:
Staff Emeritus
Homework Helper
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.

PhDeezNutz
PhDeezNutz
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.

I think you're right.

With your help I think I've got a grasp on it. Would you mind giving my work a quick look through?

Lemma 1 (Implicit Function Theorem)

Let ##g : R^n \rightarrow R## and let ##L_c(g)## be a level set of ##g##, i.e.

##L_c(g) = \left\{ \vec{x} \epsilon R^n | g(\vec{x}) = c \right\}##

it follows that

## dg = \sum \limits_{i = 1}^{n} \left(\frac{\partial g}{\partial x_i} \right) \left(x_i - a_i \right) = 0##

implies

## \left( \frac{\partial g}{\partial x_n}\right) \left( x_n - a_n \right) = - \sum \limits_{i=1}^{n-1} \left( \frac{\partial g}{\partial x_i}\right) \left( x_i - a_i \right)##

which then gives us

##x_n = a_n + \sum \limits_{i=1}^{n-1} - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} \left( x_i - a_i \right)##

Linearizing ##x_n##

##x_n = a_n + \sum \limits_{i=1}^{n-1} \left(\frac{\partial x_n}{\partial x_i}\right) \left( x_i - a_i \right) ##

Therefore

##\left(\frac{\partial x_n}{\partial x_i}\right) = - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} ##

Lemma 2 (We are going to need this later to show that the the Lagrange Multiplier is indeed a constant)

If ##p(x_i) - \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = 0 \Rightarrow \lambda_j \left( x_n \right) = C_j## where ##C_j## is a constant.

Proof:

##p \left( x_i \right) = \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = \sum \limits_{j=1}^{m} \Lambda_j \left( x_i \right)##

##\Lambda_j \left( x_i \right) = \lambda_j \left( x_n \right) s_j \left(x_i \right)## implies ##\lambda_j \left( x_n \right)## is either a function of ##x_i## or a constant. It can't be a function ##x_i## so it must be a constant.

So we have

##\lambda_j \left( x_n \right) = constant##

Proof of Lagrange Multiplier Method for Real-Valued Functions:(Multiple Constraints)
Let ##f: R^n \rightarrow R## and ##g_j: R^n \rightarrow R##. Want to constrain ##f \left( \vec{x} \right)## by ##g_j \left( \vec{x} \right) = C_j##. Suppose there are ##m## such constraints.

##dg_j = 0## so by the Implicit Function Theorem ##\left( \frac{\partial x_n}{\partial x_i}\right) = - \sum\limits_{j=1}^{m} \frac{\left( \frac{\partial g_j}{\partial x_i }\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)}## (This part I'm unsure about but it seems necessary to get the result we want)

If ##f## is extremized then ##\left( \frac{\partial f}{\partial x_i} \right)_{total} = 0##

## \left( \frac{\partial f}{\partial x_i} \right)_{total} = \left( \frac{\partial f}{\partial x_i} \right) + \left( \frac{\partial f}{\partial x_n}\right) \left( \frac{\partial x_n}{\partial x_i}\right) = 0##

##\left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \frac{\left( \frac{\partial f}{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} \left(\frac{\partial g_j}{\partial x_i} \right) = \left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) \left(\frac{\partial g_j}{\partial x_i} \right) = 0 ##

implies ##\lambda_j(x_n) = constant## by Lemma 2

Therefore we have a system of equations

## \nabla f - \sum\limits_{j=1}^{m} \lambda_j \nabla g_j = 0## Q.E.D.

Hopefully I didn't do anything egregiously wrong.

PhDeezNutz
Can anyone confirm, deny, or motivate a proof for the following statement for the intersection of a level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \epsilon R^n | g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \epsilon R^n | h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it. Keep in mind I'm a physics student not a math student so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \epsilon R^n | g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \epsilon R^n | j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##

Staff Emeritus
Gold Member
Can anyone confirm, deny, or motivate a proof for the following statement for the intersection of a level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \epsilon R^n | g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \epsilon R^n | h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it. Keep in mind I'm a physics student not a math student so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \epsilon R^n | g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \epsilon R^n | j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##

Consider ##n=2##. What happens when ##g\left(x,y\right)=x##, ##h\left(x,y\right)=y##, ##c=2##, and ##d=3##?