Condition for f(x,y,z) = f(x,y,z(x,y)) being extremized

SUMMARY

The discussion centers on the conditions for extremizing a function f(x,y,z) with respect to its variables, particularly when z is a function of x and y. Participants clarify that the partial derivatives of f must equal zero under certain constraints, specifically using the notation of the implicit function theorem. The conversation also touches on the Lagrange multiplier method, emphasizing the necessity of understanding the relationship between the variables and the constraints imposed by functions g(x,y,z). Key references include the implicit function theorem and the Lagrange multiplier method.

PREREQUISITES
  • Understanding of partial derivatives and their notation
  • Familiarity with the implicit function theorem
  • Knowledge of the Lagrange multiplier method for optimization
  • Basic concepts of calculus of variations
NEXT STEPS
  • Study the implications of the implicit function theorem in multivariable calculus
  • Learn about the geometric interpretation of the Lagrange multiplier method
  • Explore the calculus of variations and its applications in constrained optimization
  • Investigate advanced topics in optimization, such as the use of functionals and their derivatives
USEFUL FOR

Mathematicians, physicists, and engineers involved in optimization problems, particularly those dealing with multivariable functions and constraints.

PhDeezNutz
Homework Statement
Given ##f(x,y,z) = f(x,y,z(x,y))##, then what are the conditions involving the partial derivatives of ##f##?
Relevant Equations
Apparently they are

##\left(\frac{\partial f}{\partial x} \right)_y =\left(\frac{\partial f}{\partial y} \right)_x = \left(\frac{\partial f}{\partial z} \right)_x = 0##

where the subscript denotes the independent variable being held constant
As far as I know, when a function is extremized, its partial derivatives are all equal to 0 (provided we aren't dealing with a constraint):

##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##
Let's start off with
##\left(\frac{\partial f}{\partial x} \right)_{y} = \left(\frac{\partial f}{\partial x}\right)_{yz} + \left( \frac{\partial f}{\partial z} \right)_{xy} \left( \frac{\partial z}{\partial x} \right)_{y}##

I really don't know where to go from here. I'm not even sure my understanding of implicit functions is up to par to even begin addressing this question.
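
For what it's worth, the chain-rule expansion above can be sanity-checked symbolically. Below is a minimal sympy sketch; the choices of ##f## and ##z(x,y)## are arbitrary and purely for illustration, not part of the problem.

```python
# Sanity check of (df/dx)_y = (df/dx)_{y,z} + (df/dz)_{x,y} * (dz/dx)_y
# using arbitrary example choices of f and z(x, y).
import sympy as sp

x, y = sp.symbols('x y')
zs = sp.Symbol('zs')                  # z treated as an independent symbol
z_expr = x**2 + sp.sin(y)             # an arbitrary z(x, y)

def f(x, y, z):
    return x*y + z**2 + sp.exp(x*z)   # an arbitrary f(x, y, z)

# Left-hand side: differentiate f(x, y, z(x, y)) with respect to x, holding y fixed.
lhs = sp.diff(f(x, y, z_expr), x)

# Right-hand side: (df/dx)_{y,z} + (df/dz)_{x,y} * (dz/dx)_y
rhs = (sp.diff(f(x, y, zs), x).subs(zs, z_expr)
       + sp.diff(f(x, y, zs), zs).subs(zs, z_expr) * sp.diff(z_expr, x))

print(sp.simplify(lhs - rhs))         # prints 0, confirming the identity
```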
 
PhDeezNutz said:
... then what are the conditions involving the partial derivatives of ##f## ...
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
 
fresh_42 said:
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.
 
fresh_42 said:
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
Orodruin said:
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

[Image: scanned excerpt from Arfken & Weber showing equations (22.63)–(22.66) on Lagrange multipliers]


I understand everything up until the line right after (22.63)

Why are Arfken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?

And presumably the other 2 equations in (22.66) come from setting

##\left( \frac{\partial f}{\partial y}\right)_z = 0##

##\left( \frac{\partial f}{\partial z}\right)_ x= 0##

after cyclically permuting. (The text says the second equation comes from swapping ##x## and ##y##, but I don't see it; that also doesn't explain the third equation.)

Why these specific derivatives?

Given ##f(x,y,z(x,y))##

Hopefully I was able to clarify my question. Any help is appreciated because I am lost. So thanks in advance.
 
I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is define a new function ##\mathcal{L}=f-\lambda g##, which has no constraint, so that we can now minimize ##\mathcal{L}##. Thus we can, as usual, solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.
 
fresh_42 said:
I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is define a new function ##\mathcal{L}=f-\lambda g##, which has no constraint, so that we can now minimize ##\mathcal{L}##. Thus we can, as usual, solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.

I know how to use the Lagrange multiplier method. I'm trying to prove it.

Why is the first coordinate ##\left(\frac{\partial f}{\partial x}\right)_y##? Why is the second coordinate ##\left(\frac{\partial f}{\partial y}\right)_z##? Why is the third coordinate ##\left(\frac{\partial f}{\partial z}\right)_x##? I understand the chain rule helps us confirm these expressions.

I want to know why we set them equal to 0. I know that once we do, we get the system of equations ##\nabla f = \lambda \nabla g##.
 
It is much easier to think geometrically about the Lagrange multiplier method. Suppose a constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$
df = d\vec x \cdot \nabla f.
$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, in order for ##df = 0## when displacements are restricted to lie in the constraint surface, we need
$$
df = d\vec x \cdot \nabla f = 0
$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$
df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0
$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.
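
As a concrete (and deliberately simple) numerical illustration of this picture, the sketch below checks that ##\nabla f## is parallel to ##\nabla g## at a constrained extremum. The function, constraint, and extremum are example choices of mine, not taken from the thread.

```python
# At a constrained extremum, grad f = lambda * grad g for some scalar lambda.
# Example: maximize f(x, y) = x + y on the unit circle g(x, y) = x**2 + y**2 = 1.
import numpy as np

def grad_f(p):                 # gradient of f(x, y) = x + y
    return np.array([1.0, 1.0])

def grad_g(p):                 # gradient of g(x, y) = x**2 + y**2
    return 2.0 * p

p_star = np.array([1.0, 1.0]) / np.sqrt(2.0)    # the constrained maximum

gf, gg = grad_f(p_star), grad_g(p_star)
lam = gf[0] / gg[0]            # read the candidate multiplier off the first component

print("lambda =", lam)                            # 1/sqrt(2) ~ 0.7071
print("grad f - lambda*grad g =", gf - lam * gg)  # [0, 0]: the gradients are parallel
```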
 
Orodruin said:
It is much easier to think geometrically about the Lagrange multiplier method. Suppose a constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$
df = d\vec x \cdot \nabla f.
$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, in order for ##df = 0## when displacements are restricted to lie in the constraint surface, we need
$$
df = d\vec x \cdot \nabla f = 0
$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$
df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0
$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.

The reason I want to prove it in the manner shown above is that I eventually want to use the same approach in the calculus of variations, when a functional that depends on coordinates has constraints on those coordinates.

[Image: textbook excerpt on constrained variational problems, where the multipliers ##\lambda_j## are functions of ##t##]


Here ##\lambda_j## is a function of ##t## instead of a mere scalar factor. And in the calculus of variations case there is no clear geometric meaning. Furthermore, the derivative definitions are more complicated.

Edit: the constraints are of the form ##g_j(q_i(t)) = C##.
 
PhDeezNutz said:
And in the calculus of variations case there is no clear geometric meaning.
What makes you think that? The derivation is completely analogous.
 
  • #10
Orodruin said:
What makes you think that? The derivation is completely analogous.

Maybe I misspoke, but I would think to set the functional derivatives equal to each other up to a scalar multiple (i.e. the Euler-Lagrange equations), which would work except for the fact that ##\lambda## is a function of ##t##. I want to do the analysis that reveals ##\lambda## is actually a function of ##t##.

Edit: I don’t understand how derivatives can be parallel to each other but differ by a multiplicative function of t.
 
  • #11
PhDeezNutz said:
Why are Arken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.
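
A concrete way to see this, using an arbitrary constraint surface of my own choosing (the unit sphere, not anything from the thread): solving ##g = x^2 + y^2 + z^2 = 1## for ##z## and holding ##y## fixed, a change in ##x## forces a nonzero change in ##z##.

```python
# On the surface x**2 + y**2 + z**2 = 1, with y held fixed,
# a change in x generally requires a compensating change in z.
import sympy as sp

x, y = sp.symbols('x y')

# Upper-hemisphere branch of the constraint, solved for z = z(x, y):
z_of_xy = sp.sqrt(1 - x**2 - y**2)

# Holding y fixed, the change in z needed to stay on the surface:
dz_dx = sp.diff(z_of_xy, x)
print(dz_dx)   # -x/sqrt(1 - x**2 - y**2): nonzero except at x = 0
```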
 
  • #12
vela said:
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

Let me see if I understand.

##\left( \frac{\partial f}{\partial x} \right)_y = 0##: ##z## is a function of ##x## and ##y##, so only ##x## and ##y## are allowed to vary; here ##x## is varying, so ##y## is held constant.

##\left(\frac{\partial f}{\partial y} \right)_x = 0##: ##z## is a function of ##x## and ##y##, so only ##x## and ##y## are allowed to vary; here ##y## is varying, so ##x## is held constant.

I don't see how ##\left(\frac{\partial f}{\partial z}\right)_x = 0## is consistent with these choices, and therefore I don't see how we get the third equation in (22.64).

Either way, your post definitely helped, even if I don't totally understand it.
 
  • #13
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.
 
  • #14
vela said:
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.

I think you're right.

With your help I think I've got a grasp on it. Would you mind giving my work a quick look through?

Lemma 1 (Implicit Function Theorem)

Let ##g : R^n \rightarrow R## and let ##L_c(g)## be a level set of ##g##, i.e.

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

it follows that

## dg = \sum \limits_{i = 1}^{n} \left(\frac{\partial g}{\partial x_i} \right) \left(x_i - a_i \right) = 0##

implies

## \left( \frac{\partial g}{\partial x_n}\right) \left( x_n - a_n \right) = - \sum \limits_{i=1}^{n-1} \left( \frac{\partial g}{\partial x_i}\right) \left( x_i - a_i \right)##

which then gives us

##x_n = a_n + \sum \limits_{i=1}^{n-1} - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} \left( x_i - a_i \right)##

Linearizing ##x_n##

##x_n = a_n + \sum \limits_{i=1}^{n-1} \left(\frac{\partial x_n}{\partial x_i}\right) \left( x_i - a_i \right) ##

Therefore

##\left(\frac{\partial x_n}{\partial x_i}\right) = - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} ##
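
A quick symbolic sanity check of this formula, with an arbitrary ##g## of my own choosing (purely for illustration) and ##x_3## playing the role of ##x_n##:

```python
# Check (dx_n/dx_i) = -(dg/dx_i)/(dg/dx_n) against direct implicit differentiation,
# with an arbitrary example g(x1, x2, x3) and x3 in the role of x_n.
import sympy as sp

x1, x2, x3, dz = sp.symbols('x1 x2 x3 dz')
g = x1**2 * x3 + sp.sin(x2) * x3**3 + x1 * x2

# Formula from the lemma:
lemma = -sp.diff(g, x1) / sp.diff(g, x3)

# Direct route: differentiate g(x1, x2, x3(x1, x2)) = c with respect to x1
# and solve for dx3/dx1 (called dz below).
x3f = sp.Function('x3f')(x1, x2)
total = sp.diff(g.subs(x3, x3f), x1).subs(sp.Derivative(x3f, x1), dz)
dx3_dx1 = sp.solve(sp.Eq(total, 0), dz)[0]

print(sp.simplify(dx3_dx1 - lemma.subs(x3, x3f)))   # prints 0
```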

Lemma 2 (We are going to need this later to show that the Lagrange Multiplier is indeed a constant)

If ##p(x_i) - \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = 0 \Rightarrow \lambda_j \left( x_n \right) = C_j## where ##C_j## is a constant.

Proof:

##p \left( x_i \right) = \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = \sum \limits_{j=1}^{m} \Lambda_j \left( x_i \right)##

##\Lambda_j \left( x_i \right) = \lambda_j \left( x_n \right) s_j \left(x_i \right)## implies ##\lambda_j \left( x_n \right)## is either a function of ##x_i## or a constant. It can't be a function of ##x_i##, so it must be a constant.

So we have

##\lambda_j \left( x_n \right) = constant##

Proof of the Lagrange Multiplier Method for Real-Valued Functions (Multiple Constraints):
Let ##f: R^n \rightarrow R## and ##g_j: R^n \rightarrow R##. Want to constrain ##f \left( \vec{x} \right)## by ##g_j \left( \vec{x} \right) = C_j##. Suppose there are ##m## such constraints.

##dg_j = 0## so by the Implicit Function Theorem ##\left( \frac{\partial x_n}{\partial x_i}\right) = - \sum\limits_{j=1}^{m} \frac{\left( \frac{\partial g_j}{\partial x_i }\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)}## (This part I'm unsure about but it seems necessary to get the result we want)

If ##f## is extremized then ##\left( \frac{\partial f}{\partial x_i} \right)_{total} = 0##

## \left( \frac{\partial f}{\partial x_i} \right)_{total} = \left( \frac{\partial f}{\partial x_i} \right) + \left( \frac{\partial f}{\partial x_n}\right) \left( \frac{\partial x_n}{\partial x_i}\right) = 0##

##\left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \frac{\left( \frac{\partial f}{\partial x_n}\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)} \left(\frac{\partial g_j}{\partial x_i} \right) = \left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) \left(\frac{\partial g_j}{\partial x_i} \right) = 0 ##

implies ##\lambda_j(x_n) = constant## by Lemma 2

Therefore we have a system of equations

## \nabla f - \sum\limits_{j=1}^{m} \lambda_j \nabla g_j = 0## Q.E.D.

Hopefully I didn't do anything egregiously wrong.
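
As a numerical sanity check of that final system, here is a small sympy sketch that solves ##\nabla f = \sum_j \lambda_j \nabla g_j## together with two constraints; the function and constraints are example choices of mine, not from the thread.

```python
# Extremize f = x + y + z subject to g1 = x**2 + y**2 + z**2 - 1 = 0 and g2 = z = 0
# by solving grad f - l1*grad g1 - l2*grad g2 = 0 together with the constraints.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)

f  = x + y + z
g1 = x**2 + y**2 + z**2 - 1
g2 = z

def grad(expr):
    return sp.Matrix([sp.diff(expr, v) for v in (x, y, z)])

eqs = list(grad(f) - l1 * grad(g1) - l2 * grad(g2)) + [g1, g2]
sols = sp.solve(eqs, (x, y, z, l1, l2), dict=True)

for s in sols:
    print(s)   # the candidates (+-1/sqrt(2), +-1/sqrt(2), 0) with their multipliers
```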
 
  • #15
Can anyone confirm, deny, or motivate a proof of the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n \mid h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n \mid j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##
 
  • #16
PhDeezNutz said:
Can anyone confirm, deny, or motivate a proof of the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n \mid h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n \mid j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##
Consider ##n=2##. What happens when ##g\left(x,y\right)=x##, ##h\left(x,y\right)=y##, ##c=2##, and ##d=3##?
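
For what it's worth, here is a quick numerical look at vela's example; the specific test points are my own illustration. It suggests the intersection is contained in ##L_{c+d}(j)##, but the two sets need not be equal.

```python
# vela's example: g(x, y) = x, h(x, y) = y, c = 2, d = 3, so j = g + h and c + d = 5.
def g(x, y): return x
def h(x, y): return y
def j(x, y): return g(x, y) + h(x, y)

p1 = (2, 3)   # g = 2 and h = 3, so p1 lies in L_2(g) and L_3(h); j(p1) = 5 as expected
p2 = (1, 4)   # j(p2) = 5, so p2 lies in L_5(j), but g(p2) = 1 != 2 and h(p2) = 4 != 3

print(g(*p1), h(*p1), j(*p1))   # 2 3 5
print(g(*p2), h(*p2), j(*p2))   # 1 4 5  -> in L_{c+d}(j) but not in the intersection
```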
 
