Condition for f(x,y,z) = f(x,y,z(x,y)) being extremized

PhDeezNutz · Jun 28, 2019

As far as I know when a function is extremized its partial derivatives are all equal to 0 (provided we aren't dealing with a constraint)

##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##
Let's start off with the chain rule:
##\left(\frac{\partial f}{\partial x} \right)_{y} = \left(\frac{\partial f}{\partial x}\right)_{yz} + \left( \frac{\partial f}{\partial z} \right)_{xy} \left( \frac{\partial z}{\partial x} \right)_{y}##

I really don't know where to go from here. I'm not even sure my understanding of implicit functions is up to par to even begin addressing this question.

fresh_42 · Jun 28, 2019

PhDeezNutz said:

... then what are the conditions involving the partial derivatives of ##f## ...

What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.

Orodruin · Jun 28, 2019

fresh_42 said:

What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.

It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

PhDeezNutz · Jun 28, 2019

fresh_42 said:

What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.

Orodruin said:

It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

I understand everything up until the line right after (22.63)

Why are Arfken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?

And presumably the other 2 equations in (22.66) come from setting

##\left( \frac{\partial f}{\partial y}\right)_z = 0##

##\left( \frac{\partial f}{\partial z}\right)_x = 0##

after cyclically permuting. (It says the second equation comes from swapping x and y, but I don't see it. That also doesn't explain the third equation.)

Why these specific derivatives?

Given ##f(x,y,z(x,y))##

Hopefully I was able to clarify my question. Any help is appreciated because I am lost. So thanks in advance.

fresh_42 · Jun 28, 2019

I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is to define a new function ##\mathcal{L}=f-\lambda g## such that we can minimize ##\mathcal{L}## now which has no constraint. Thus we can as usual solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.

PhDeezNutz · Jun 28, 2019

fresh_42 said:

I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is to define a new function ##\mathcal{L}=f-\lambda g## such that we can minimize ##\mathcal{L}## now which has no constraint. Thus we can as usual solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.

I know how to use the Lagrangian multiplier method. I'm trying to prove it.

Why is the first coordinate ##\left(\frac{\partial f}{\partial x}\right)_y##? Why is the second coordinate ##\left(\frac{\partial f}{\partial y}\right)_z##? Why is the third coordinate ##\left(\frac{\partial f}{\partial z}\right)_x##? I understand the chain rule helps us confirm these expressions.

I want to know why we set them = 0? I know that once we do we get the system of equations ##\nabla f = \lambda \nabla g##.

Orodruin · Jun 28, 2019

It is much easier to think geometrically about the Lagrange multiplier method. Suppose the constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$
df = d\vec x \cdot \nabla f.
$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, requiring ##df = 0## for displacements restricted to the constraint surface results in
$$
df = d\vec x \cdot \nabla f = 0
$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$
df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0
$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.
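
As a numerical illustration of the argument above, here is a minimal sketch. It assumes a hypothetical example that does not appear in the thread: ##f = x + y## restricted to the unit circle ##g = x^2 + y^2 = 1##. It checks that, at the constrained maximum, ##\nabla f## has no component along the tangent of the constraint curve and that a single ##\lambda## satisfies ##\nabla f = \lambda \nabla g##.

```python
import math

# Hypothetical example (not from the thread):
# f = x + y restricted to the unit circle g = x^2 + y^2 = 1.
def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

# Constrained maximum of f on the unit circle.
x0 = y0 = 1.0 / math.sqrt(2.0)

gf = grad_f(x0, y0)
gg = grad_g(x0, y0)

# Unit tangent to the constraint curve: perpendicular to grad g.
norm = math.hypot(gg[0], gg[1])
tangent = (-gg[1] / norm, gg[0] / norm)

# df for a displacement along the constraint is tangent . grad f.
df_tangent = gf[0] * tangent[0] + gf[1] * tangent[1]

# Read lambda off the first component, then check both components.
lam = gf[0] / gg[0]
residual = (gf[0] - lam * gg[0], gf[1] - lam * gg[1])

print("df along tangent:", df_tangent)
print("lambda:", lam)
print("grad f - lambda grad g:", residual)
```

Away from the extremum the tangential component is nonzero and no single ##\lambda## makes both components of the residual vanish.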

PhDeezNutz · Jun 28, 2019

Orodruin said:

It is much easier to think geometrically about the Lagrange multiplier method. Suppose the constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$
df = d\vec x \cdot \nabla f.
$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, requiring ##df = 0## for displacements restricted to the constraint surface results in
$$
df = d\vec x \cdot \nabla f = 0
$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$
df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0
$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.

The reason I want to prove it in the manner shown above is because I eventually want to use the same approach for calculus of variations when a functional that depends on coordinates has constraints on coordinates.

There, ## \lambda_j ## is a function of t instead of a mere scalar factor. And in the calculus of variations case there is no clear geometric meaning. Furthermore, the derivative definitions are more complicated.

Edit: the constraints are of the form ##g_j(q_i(t)) = C##.

Orodruin · Jun 28, 2019

PhDeezNutz said:

And in the calculus of variations case there is no clear geometric meaning.

What makes you think that? The derivation is completely analogous.

PhDeezNutz · Jun 28, 2019

Orodruin said:

What makes you think that? The derivation is completely analogous.

Maybe I misspoke, but I would think to set the functional derivatives equal to each other up to a scalar multiple (i.e., the Euler-Lagrange equations), which would work except for the fact that lambda is a function of t. I want to do the analysis that reveals that lambda is actually a function of t.

Edit: I don’t understand how derivatives can be parallel to each other but differ by a multiplicative function of t.

vela · Jun 28, 2019

PhDeezNutz said:

Why are Arfken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?

I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.
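
To make this concrete, here is a minimal sketch that compares a finite-difference estimate of ##\left(\frac{\partial f}{\partial x}\right)_y## on the constraint surface with the chain-rule expression from the first post. The functions are hypothetical choices for illustration, not taken from the book: ##f = xz + y## and ##g = x^2 + y^2 + z^2 = 1## on the upper hemisphere, where ##\left(\frac{\partial z}{\partial x}\right)_y = -\frac{\partial g/\partial x}{\partial g/\partial z}##.

```python
import math

# Hypothetical example (not from the thread):
# f = x*z + y restricted to g = x^2 + y^2 + z^2 = 1, upper hemisphere.
def f(x, y, z):
    return x * z + y

def z_of(x, y):
    # Solve the constraint for z; valid while g_z = 2z is nonzero.
    return math.sqrt(1.0 - x * x - y * y)

def f_on_surface(x, y):
    return f(x, y, z_of(x, y))

x0, y0 = 0.3, 0.4
z0 = z_of(x0, y0)

# (df/dx)_y by central difference: y is held fixed, z follows the surface.
h = 1e-6
numeric = (f_on_surface(x0 + h, y0) - f_on_surface(x0 - h, y0)) / (2.0 * h)

# Chain rule: (df/dx)_{yz} + (df/dz)_{xy} * (dz/dx)_y,
# with (dz/dx)_y = -g_x / g_z from the constraint.
f_x, f_z = z0, x0
g_x, g_z = 2.0 * x0, 2.0 * z0
chain = f_x + f_z * (-g_x / g_z)

print("finite difference:", numeric)
print("chain rule:      ", chain)
```

The two numbers agree to within the finite-difference error, while ##\left(\frac{\partial f}{\partial x}\right)_{yz} = z## alone differs from both, since it ignores the change in ##z## forced by the constraint.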

PhDeezNutz · Jun 28, 2019

vela said:

I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

Let me see if I understand

##\left( \frac{\partial f}{\partial x} \right)_y = 0## z is a function of x and y so only x and y are allowed to vary, x is varying so y is constant.

##\left(\frac{\partial f}{\partial y} \right)_x = 0## z is a function of x and y so only x and y are allowed to vary, y is varying so x is constant.

I don't see how ##\left(\frac{\partial f}{\partial z}\right)_x = 0## is consistent with these choices. And therefore I don't see how we get the third equation in (22.64).

Either way, your post definitely helped, even if I don't totally understand it.

vela · Jun 29, 2019

You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.
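
A minimal sketch of this symmetry, assuming a hypothetical example that is not from the book: ##f = x + 2y + 2z## on the sphere ##g = x^2 + y^2 + z^2 = 1##, whose constrained maximum is at ##(1, 2, 2)/3##. Eliminating ##z##, ##x##, or ##y## in turn gives three constrained derivatives, and at the extremum all three vanish with the same ratio playing the role of ##\lambda##.

```python
# Hypothetical example (not from the thread):
# f = x + 2y + 2z on the sphere g = x^2 + y^2 + z^2 = 1.
# The constrained maximum sits at (1, 2, 2) / 3.
x, y, z = 1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0

f_x, f_y, f_z = 1.0, 2.0, 2.0
g_x, g_y, g_z = 2.0 * x, 2.0 * y, 2.0 * z

# Eliminate z = z(x, y): (df/dx)_y = f_x - (f_z / g_z) * g_x
elim_z = f_x - (f_z / g_z) * g_x
# Eliminate x = x(y, z): (df/dy)_z = f_y - (f_x / g_x) * g_y
elim_x = f_y - (f_x / g_x) * g_y
# Eliminate y = y(z, x): (df/dz)_x = f_z - (f_y / g_y) * g_z
elim_y = f_z - (f_y / g_y) * g_z

# Each ratio plays the role of lambda; they coincide at the extremum.
ratios = (f_x / g_x, f_y / g_y, f_z / g_z)

print("constrained derivatives:", elim_z, elim_x, elim_y)
print("candidate lambdas:", ratios)
```

Because the three ratios agree, it does not matter which variable is treated as the dependent one.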

PhDeezNutz · Jul 1, 2019

vela said:

You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.

I think you're right.

With your help I think I've got a grasp on it. Would you mind giving my work a quick look through?

Lemma 1 (Implicit Function Theorem)

Let ##g : R^n \rightarrow R## and let ##L_c(g)## be a level set of ##g##, i.e.

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

For ##\vec{x}## in ##L_c(g)## near a point ##\vec{a} \in L_c(g)##, it follows to first order that

## dg = \sum \limits_{i = 1}^{n} \left(\frac{\partial g}{\partial x_i} \right) \left(x_i - a_i \right) = 0##

implies

## \left( \frac{\partial g}{\partial x_n}\right) \left( x_n - a_n \right) = - \sum \limits_{i=1}^{n-1} \left( \frac{\partial g}{\partial x_i}\right) \left( x_i - a_i \right)##

which then gives us

##x_n = a_n + \sum \limits_{i=1}^{n-1} - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} \left( x_i - a_i \right)##

Linearizing ##x_n##

##x_n = a_n + \sum \limits_{i=1}^{n-1} \left(\frac{\partial x_n}{\partial x_i}\right) \left( x_i - a_i \right) ##

Therefore

##\left(\frac{\partial x_n}{\partial x_i}\right) = - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} ##

Lemma 2 (We are going to need this later to show that the Lagrange multiplier is indeed a constant)

If ##p(x_i) - \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = 0##, then ##\lambda_j \left( x_n \right) = C_j##, where ##C_j## is a constant.

Proof:

##p \left( x_i \right) = \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = \sum \limits_{j=1}^{m} \Lambda_j \left( x_i \right)##

##\Lambda_j \left( x_i \right) = \lambda_j \left( x_n \right) s_j \left(x_i \right)## implies ##\lambda_j \left( x_n \right)## is either a function of ##x_i## or a constant. It can't be a function of ##x_i##, so it must be a constant.

So we have

##\lambda_j \left( x_n \right) = constant##

Proof of Lagrange Multiplier Method for Real-Valued Functions (Multiple Constraints):
Let ##f: R^n \rightarrow R## and ##g_j: R^n \rightarrow R##. We want to constrain ##f \left( \vec{x} \right)## by ##g_j \left( \vec{x} \right) = C_j##. Suppose there are ##m## such constraints.

##dg_j = 0## so by the Implicit Function Theorem ##\left( \frac{\partial x_n}{\partial x_i}\right) = - \sum\limits_{j=1}^{m} \frac{\left( \frac{\partial g_j}{\partial x_i }\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)}## (This part I'm unsure about but it seems necessary to get the result we want)

If ##f## is extremized then ##\left( \frac{\partial f}{\partial x_i} \right)_{total} = 0##

## \left( \frac{\partial f}{\partial x_i} \right)_{total} = \left( \frac{\partial f}{\partial x_i} \right) + \left( \frac{\partial f}{\partial x_n}\right) \left( \frac{\partial x_n}{\partial x_i}\right) = 0##

##\left( \frac{\partial f}{\partial x_i} \right) - \sum \limits_{j=1}^{m} \frac{\left( \frac{\partial f}{\partial x_n}\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)} \left(\frac{\partial g_j}{\partial x_i} \right) = \left( \frac{\partial f}{\partial x_i} \right) - \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) \left(\frac{\partial g_j}{\partial x_i} \right) = 0 ##

implies ##\lambda_j(x_n) = constant## by Lemma 2

Therefore we have a system of equations

## \nabla f - \sum\limits_{j=1}^{m} \lambda_j \nabla g_j = 0## Q.E.D.

Hopefully I didn't do anything egregiously wrong.

PhDeezNutz · Jul 9, 2019

Can anyone confirm, deny, or motivate a proof for the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n \mid h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n \mid j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##

George Jones · Jul 9, 2019

PhDeezNutz said:

Can anyone confirm, deny, or motivate a proof for the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n \mid h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n \mid j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##

Consider ##n=2##. What happens when ##g\left(x,y\right)=x##, ##h\left(x,y\right)=y##, ##c=2##, and ##d=3##?
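
One way to explore this suggestion is the small sketch below, which tests a handful of points against both sets. The sample points and helper names are arbitrary choices for illustration.

```python
# The example suggested above: g(x, y) = x, h(x, y) = y, c = 2, d = 3.
def g(x, y):
    return x

def h(x, y):
    return y

def j(x, y):
    return g(x, y) + h(x, y)

c, d = 2, 3

def in_intersection(p):
    # Membership in L_c(g) intersected with L_d(h).
    return g(*p) == c and h(*p) == d

def in_level_set_j(p):
    # Membership in L_{c+d}(j).
    return j(*p) == c + d

# Arbitrary sample points, chosen for illustration.
points = [(2, 3), (0, 5), (5, 0), (1, 4), (1, 1)]
for p in points:
    print(p, in_intersection(p), in_level_set_j(p))
```

Only ##(2,3)## lies in the intersection, whereas every point on the line ##x + y = 5## lies in ##L_{5}(j)##. The intersection is contained in ##L_{(c+d)}(j)##, but the two sets are not equal in general.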
