Condition for f(x,y,z) = f(x,y,z(x,y)) being extremized

SUMMARY

The discussion centers on the conditions for extremizing a function f(x,y,z) with respect to its variables, particularly when z is a function of x and y. Participants clarify that the partial derivatives of f must equal zero under certain constraints, specifically using the notation of the implicit function theorem. The conversation also touches on the Lagrange multiplier method, emphasizing the necessity of understanding the relationship between the variables and the constraints imposed by functions g(x,y,z). Key references include the implicit function theorem and the Lagrange multiplier method.

PREREQUISITES
  • Understanding of partial derivatives and their notation
  • Familiarity with the implicit function theorem
  • Knowledge of the Lagrange multiplier method for optimization
  • Basic concepts of calculus of variations
NEXT STEPS
  • Study the implications of the implicit function theorem in multivariable calculus
  • Learn about the geometric interpretation of the Lagrange multiplier method
  • Explore the calculus of variations and its applications in constrained optimization
  • Investigate advanced topics in optimization, such as the use of functionals and their derivatives
USEFUL FOR

Mathematicians, physicists, and engineers involved in optimization problems, particularly those dealing with multivariable functions and constraints.

PhDeezNutz
Homework Statement
Given ##f(x,y,z) = f(x,y,z(x,y))##, then what are the conditions involving the partial derivatives of ##f##?
Relevant Equations
Apparently they are

##\left(\frac{\partial f}{\partial x} \right)_y =\left(\frac{\partial f}{\partial y} \right)_x = \left(\frac{\partial f}{\partial z} \right)_x = 0##

where the subscript denotes the independent variable being held constant
As far as I know, when a function is extremized, its partial derivatives are all equal to 0 (provided we aren't dealing with a constraint):

##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##
Let's start off with
##\left(\frac{\partial f}{\partial x} \right)_{y} = \left(\frac{\partial f}{\partial x}\right)_{yz} + \left( \frac{\partial f}{\partial z} \right)_{xy} \left( \frac{\partial z}{\partial x} \right)_{y}##

I really don't know where to go from here. I'm not even sure my understanding of implicit functions is up to par to even begin addressing this question.
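
For what it's worth, the chain-rule expansion above can be sanity-checked symbolically. Below is a minimal sympy sketch; the choices of ##f## and ##z(x,y)## are arbitrary and purely for illustration, not part of the problem.

```python
# Sanity check of (df/dx)_y = (df/dx)_{y,z} + (df/dz)_{x,y} * (dz/dx)_y
# using arbitrary example choices of f and z(x, y).
import sympy as sp

x, y = sp.symbols('x y')
zs = sp.Symbol('zs')                  # z treated as an independent symbol
z_expr = x**2 + sp.sin(y)             # an arbitrary z(x, y)

def f(x, y, z):
    return x*y + z**2 + sp.exp(x*z)   # an arbitrary f(x, y, z)

# Left-hand side: differentiate f(x, y, z(x, y)) with respect to x, holding y fixed.
lhs = sp.diff(f(x, y, z_expr), x)

# Right-hand side: (df/dx)_{y,z} + (df/dz)_{x,y} * (dz/dx)_y
rhs = (sp.diff(f(x, y, zs), x).subs(zs, z_expr)
       + sp.diff(f(x, y, zs), zs).subs(zs, z_expr) * sp.diff(z_expr, x))

print(sp.simplify(lhs - rhs))         # prints 0, confirming the identity
```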
 
PhDeezNutz said:
... then what are the conditions involving the partial derivatives of ##f## ...
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
 
fresh_42 said:
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.
 
fresh_42 said:
What do you mean by conditions? The implicit function theorem goes as:
https://en.wikipedia.org/wiki/Implicit_function_theorem
The notation of the subscript isn't necessary, because this is automatically implied by the use of partial ##\partial## derivatives.
Orodruin said:
It is not implied. It is a bastard notation often used in physics (mainly thermodynamics) when you have the choice to use any two out of three variables as your coordinates.

[Image: scanned excerpt from Arfken & Weber showing equations (22.63)–(22.66) on Lagrange multipliers]


I understand everything up until the line right after (22.63)

Why are Arfken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?

And presumably the other 2 equations in (22.66) come from setting

##\left( \frac{\partial f}{\partial y}\right)_z = 0##

##\left( \frac{\partial f}{\partial z}\right)_ x= 0##

after cyclically permuting. (The text says the second equation comes from swapping ##x## and ##y##, but I don't see it; that also doesn't explain the third equation.)

Why these specific derivatives?

Given ##f(x,y,z(x,y))##

Hopefully I was able to clarify my question. Any help is appreciated because I am lost. So thanks in advance.
 
I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is define a new function ##\mathcal{L}=f-\lambda g##, which has no constraint, so that we can now minimize ##\mathcal{L}##. Thus we can, as usual, solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.
 
fresh_42 said:
I find the usage of ##f## a bit confusing.

We have a function ##f## to be minimized and a constraint ##g##. What we actually do is define a new function ##\mathcal{L}=f-\lambda g##, which has no constraint, so that we can now minimize ##\mathcal{L}##. Thus we can, as usual, solve ##\nabla \mathcal{L} = \operatorname{grad}\mathcal{L}=0##.

It is better explained here (including examples):
https://en.wikipedia.org/wiki/Lagrange_multiplier

(22.64) is the first coordinate of ##\nabla \mathcal{L}=0##.

I know how to use the Lagrange multiplier method. I'm trying to prove it.

Why is the first coordinate ##\left(\frac{\partial f}{\partial x}\right)_y##? Why is the second coordinate ##\left(\frac{\partial f}{\partial y}\right)_z##? Why is the third coordinate ##\left(\frac{\partial f}{\partial z}\right)_x##? I understand the chain rule helps us confirm these expressions.

I want to know why we set them equal to 0. I know that once we do, we get the system of equations ##\nabla f = \lambda \nabla g##.
 
It is much easier to think geometrically about the Lagrange multiplier method. Suppose a constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$
df = d\vec x \cdot \nabla f.
$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, in order for ##df = 0## when displacements are restricted to lie in the constraint surface, we need
$$
df = d\vec x \cdot \nabla f = 0
$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$
df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0
$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.
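
As a concrete (and deliberately simple) numerical illustration of this picture, the sketch below checks that ##\nabla f## is parallel to ##\nabla g## at a constrained extremum. The function, constraint, and extremum are example choices of mine, not taken from the thread.

```python
# At a constrained extremum, grad f = lambda * grad g for some scalar lambda.
# Example: maximize f(x, y) = x + y on the unit circle g(x, y) = x**2 + y**2 = 1.
import numpy as np

def grad_f(p):                 # gradient of f(x, y) = x + y
    return np.array([1.0, 1.0])

def grad_g(p):                 # gradient of g(x, y) = x**2 + y**2
    return 2.0 * p

p_star = np.array([1.0, 1.0]) / np.sqrt(2.0)    # the constrained maximum

gf, gg = grad_f(p_star), grad_g(p_star)
lam = gf[0] / gg[0]            # read the candidate multiplier off the first component

print("lambda =", lam)                            # 1/sqrt(2) ~ 0.7071
print("grad f - lambda*grad g =", gf - lam * gg)  # [0, 0]: the gradients are parallel
```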
 
Orodruin said:
It is much easier to think geometrically about the Lagrange multiplier method. Suppose a constraint surface is given by ##g = c_0##. For any function ##f##, the infinitesimal change ##df## of ##f## under a displacement ##d\vec x## is given by
$$
df = d\vec x \cdot \nabla f.
$$
Restricting ##d\vec x## to the constraint surface means that ##d\vec x \cdot \vec n = 0##, where ##\vec n \propto \nabla g## is the surface normal. Thus, in order for ##df = 0## when displacements are restricted to lie in the constraint surface, we need
$$
df = d\vec x \cdot \nabla f = 0
$$
for all ##d\vec x## in the constraint surface, i.e., the projection of ##\nabla f## to the local tangent space of the surface is zero. This is true if ##\nabla f = \lambda \nabla g## for some ##\lambda##, since ##\nabla g## is normal to the surface. Therefore
$$
df = d\vec x \cdot (\nabla f - \lambda \nabla g) = 0
$$
and we can adjust ##\lambda## such that this is true not only for displacements in the constraint surface, but for all displacements.

The reason I want to prove it in the manner shown above is that I eventually want to use the same approach in the calculus of variations, when a functional that depends on coordinates has constraints on those coordinates.

[Image: textbook excerpt on constrained variational problems, where the multipliers ##\lambda_j## are functions of ##t##]


Here ##\lambda_j## is a function of ##t## instead of a mere scalar factor. And in the calculus of variations case there is no clear geometric meaning. Furthermore, the derivative definitions are more complicated.

Edit: the constraints are of the form ##g_j(q_i(t)) = C##.
 
PhDeezNutz said:
And in the calculus of variations case there is no clear geometric meaning.
What makes you think that? The derivation is completely analogous.
 
  • #10
Orodruin said:
What makes you think that? The derivation is completely analogous.

Maybe I misspoke, but I would think to set the functional derivatives equal to each other up to a scalar multiple (i.e. the Euler-Lagrange equations), which would work except for the fact that ##\lambda## is a function of ##t##. I want to do the analysis that reveals ##\lambda## is actually a function of ##t##.

Edit: I don’t understand how derivatives can be parallel to each other but differ by a multiplicative function of t.
 
  • #11
PhDeezNutz said:
Why are Arken and Weber setting

##\left( \frac{\partial f}{\partial x}\right)_y = 0## instead of ##\left(\frac{\partial f}{\partial x} \right)_{yz} = \left(\frac{\partial f}{\partial y}\right)_{xz} = \left(\frac{\partial f}{\partial z}\right)_{xy} =0##?
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.
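
A concrete way to see this, using an arbitrary constraint surface of my own choosing (the unit sphere, not anything from the thread): solving ##g = x^2 + y^2 + z^2 = 1## for ##z## and holding ##y## fixed, a change in ##x## forces a nonzero change in ##z##.

```python
# On the surface x**2 + y**2 + z**2 = 1, with y held fixed,
# a change in x generally requires a compensating change in z.
import sympy as sp

x, y = sp.symbols('x y')

# Upper-hemisphere branch of the constraint, solved for z = z(x, y):
z_of_xy = sp.sqrt(1 - x**2 - y**2)

# Holding y fixed, the change in z needed to stay on the surface:
dz_dx = sp.diff(z_of_xy, x)
print(dz_dx)   # -x/sqrt(1 - x**2 - y**2): nonzero except at x = 0
```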
 
  • #12
vela said:
I believe it's because the constraint removes one degree of freedom. You have the freedom to hold ##y## constant, but once you impose that condition, a change in ##x## generally requires a change in ##z## to remain on the surface ##g(x,y,z)=C##.

Let me see if I understand.

##\left( \frac{\partial f}{\partial x} \right)_y = 0##: ##z## is a function of ##x## and ##y##, so only ##x## and ##y## are allowed to vary; here ##x## is varying, so ##y## is held constant.

##\left(\frac{\partial f}{\partial y} \right)_x = 0##: ##z## is a function of ##x## and ##y##, so only ##x## and ##y## are allowed to vary; here ##y## is varying, so ##x## is held constant.

I don't see how ##\left(\frac{\partial f}{\partial z}\right)_x = 0## is consistent with these choices, and therefore I don't see how we get the third equation in (22.64).

Either way, your post definitely helped, even if I don't totally understand it.
 
  • #13
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.
 
  • #14
vela said:
You seem to be confusing yourself because you accorded ##z## special status. The implicit function theorem doesn't single out ##z## as being a function of the other variables. You could just as easily say ##x=x(y,z)##.

I think you're right.

With your help I think I've got a grasp on it. Would you mind giving my work a quick look through?

Lemma 1 (Implicit Function Theorem)

Let ##g : R^n \rightarrow R## and let ##L_c(g)## be a level set of ##g##, i.e.

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

it follows that

## dg = \sum \limits_{i = 1}^{n} \left(\frac{\partial g}{\partial x_i} \right) \left(x_i - a_i \right) = 0##

implies

## \left( \frac{\partial g}{\partial x_n}\right) \left( x_n - a_n \right) = - \sum \limits_{i=1}^{n-1} \left( \frac{\partial g}{\partial x_i}\right) \left( x_i - a_i \right)##

which then gives us

##x_n = a_n + \sum \limits_{i=1}^{n-1} - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} \left( x_i - a_i \right)##

Linearizing ##x_n##

##x_n = a_n + \sum \limits_{i=1}^{n-1} \left(\frac{\partial x_n}{\partial x_i}\right) \left( x_i - a_i \right) ##

Therefore

##\left(\frac{\partial x_n}{\partial x_i}\right) = - \frac{\left( \frac{\partial g }{\partial x_i}\right)}{\left( \frac{\partial g}{\partial x_n}\right)} ##
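
A quick symbolic sanity check of this formula, with an arbitrary ##g## of my own choosing (purely for illustration) and ##x_3## playing the role of ##x_n##:

```python
# Check (dx_n/dx_i) = -(dg/dx_i)/(dg/dx_n) against direct implicit differentiation,
# with an arbitrary example g(x1, x2, x3) and x3 in the role of x_n.
import sympy as sp

x1, x2, x3, dz = sp.symbols('x1 x2 x3 dz')
g = x1**2 * x3 + sp.sin(x2) * x3**3 + x1 * x2

# Formula from the lemma:
lemma = -sp.diff(g, x1) / sp.diff(g, x3)

# Direct route: differentiate g(x1, x2, x3(x1, x2)) = c with respect to x1
# and solve for dx3/dx1 (called dz below).
x3f = sp.Function('x3f')(x1, x2)
total = sp.diff(g.subs(x3, x3f), x1).subs(sp.Derivative(x3f, x1), dz)
dx3_dx1 = sp.solve(sp.Eq(total, 0), dz)[0]

print(sp.simplify(dx3_dx1 - lemma.subs(x3, x3f)))   # prints 0
```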

Lemma 2 (We are going to need this later to show that the Lagrange Multiplier is indeed a constant)

If ##p(x_i) - \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = 0 \Rightarrow \lambda_j \left( x_n \right) = C_j## where ##C_j## is a constant.

Proof:

##p \left( x_i \right) = \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) s_j \left(x_i \right) = \sum \limits_{j=1}^{m} \Lambda_j \left( x_i \right)##

##\Lambda_j \left( x_i \right) = \lambda_j \left( x_n \right) s_j \left(x_i \right)## implies ##\lambda_j \left( x_n \right)## is either a function of ##x_i## or a constant. It can't be a function of ##x_i##, so it must be a constant.

So we have

##\lambda_j \left( x_n \right) = constant##

Proof of the Lagrange Multiplier Method for Real-Valued Functions (Multiple Constraints):
Let ##f: R^n \rightarrow R## and ##g_j: R^n \rightarrow R##. Want to constrain ##f \left( \vec{x} \right)## by ##g_j \left( \vec{x} \right) = C_j##. Suppose there are ##m## such constraints.

##dg_j = 0## so by the Implicit Function Theorem ##\left( \frac{\partial x_n}{\partial x_i}\right) = - \sum\limits_{j=1}^{m} \frac{\left( \frac{\partial g_j}{\partial x_i }\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)}## (This part I'm unsure about but it seems necessary to get the result we want)

If ##f## is extremized then ##\left( \frac{\partial f}{\partial x_i} \right)_{total} = 0##

## \left( \frac{\partial f}{\partial x_i} \right)_{total} = \left( \frac{\partial f}{\partial x_i} \right) + \left( \frac{\partial f}{\partial x_n}\right) \left( \frac{\partial x_n}{\partial x_i}\right) = 0##

##\left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \frac{\left( \frac{\partial f}{\partial x_n}\right)}{\left( \frac{\partial g_j}{\partial x_n}\right)} \left(\frac{\partial g_j}{\partial x_i} \right) = \left( \frac{\partial f}{\partial x_i} \right)- \sum \limits_{j=1}^{m} \lambda_j \left( x_n \right) \left(\frac{\partial g_j}{\partial x_i} \right) = 0 ##

implies ##\lambda_j(x_n) = constant## by Lemma 2

Therefore we have a system of equations

## \nabla f - \sum\limits_{j=1}^{m} \lambda_j \nabla g_j = 0## Q.E.D.

Hopefully I didn't do anything egregiously wrong.
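
As a numerical sanity check of that final system, here is a small sympy sketch that solves ##\nabla f = \sum_j \lambda_j \nabla g_j## together with two constraints; the function and constraints are example choices of mine, not from the thread.

```python
# Extremize f = x + y + z subject to g1 = x**2 + y**2 + z**2 - 1 = 0 and g2 = z = 0
# by solving grad f - l1*grad g1 - l2*grad g2 = 0 together with the constraints.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)

f  = x + y + z
g1 = x**2 + y**2 + z**2 - 1
g2 = z

def grad(expr):
    return sp.Matrix([sp.diff(expr, v) for v in (x, y, z)])

eqs = list(grad(f) - l1 * grad(g1) - l2 * grad(g2)) + [g1, g2]
sols = sp.solve(eqs, (x, y, z, l1, l2), dict=True)

for s in sols:
    print(s)   # the candidates (+-1/sqrt(2), +-1/sqrt(2), 0) with their multipliers
```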
 
  • #15
Can anyone confirm, deny, or motivate a proof of the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n \mid h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n \mid j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##
 
  • #16
PhDeezNutz said:
Can anyone confirm, deny, or motivate a proof of the following statement about the intersection of level sets of different functions?

##g: R^n \rightarrow R##
##h: R^n \rightarrow R##
##j: R^n \rightarrow R##

and ##j(\vec{x}) = g(\vec{x}) + h(\vec{x})##

Define the following level sets

##L_c(g) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \right\}##

##L_d(h) = \left\{ \vec{x} \in R^n \mid h(\vec{x}) = d \right\}##

Is the following a true statement? How would I prove it? Keep in mind I'm a physics student, not a math student, so forgive me if I'm a little slow in understanding.

##L_c(g) \cap L_d(h) = \left\{ \vec{x} \in R^n \mid g(\vec{x}) = c \wedge h(\vec{x}) = d \right\} = \left\{ \vec{x} \in R^n \mid j(\vec{x}) = c + d \right\} = L_{(c+d)} (j)##
Consider ##n=2##. What happens when ##g\left(x,y\right)=x##, ##h\left(x,y\right)=y##, ##c=2##, and ##d=3##?
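
For what it's worth, here is a quick numerical look at vela's example; the specific test points are my own illustration. It suggests the intersection is contained in ##L_{c+d}(j)##, but the two sets need not be equal.

```python
# vela's example: g(x, y) = x, h(x, y) = y, c = 2, d = 3, so j = g + h and c + d = 5.
def g(x, y): return x
def h(x, y): return y
def j(x, y): return g(x, y) + h(x, y)

p1 = (2, 3)   # g = 2 and h = 3, so p1 lies in L_2(g) and L_3(h); j(p1) = 5 as expected
p2 = (1, 4)   # j(p2) = 5, so p2 lies in L_5(j), but g(p2) = 1 != 2 and h(p2) = 4 != 3

print(g(*p1), h(*p1), j(*p1))   # 2 3 5
print(g(*p2), h(*p2), j(*p2))   # 1 4 5  -> in L_{c+d}(j) but not in the intersection
```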
 
