What is the Coordinate-Free Formulation of the Hessian?

In summary: the matrix of second partial derivatives ##\partial_i \partial_k f## is not coordinate-independent away from critical points, so a connection is needed to make sense of second derivatives of a function. The covariant Hessian ##\nabla df## is the resulting coordinate-free generalisation: it is a covariant 2-tensor, it is symmetric when the connection is torsion-free, and its components reduce to ##\partial_i \partial_k f## in coordinates in which the connection coefficients vanish. The thread also discusses how the covariant derivative of a 1-form is motivated, both with and without a compatible Riemannian metric.
  • #1
ergospherical
In local coordinates, the hessian of the function ##f## at point ##p## is ##H = \partial_i \partial_k f dx^i \otimes dx^k##. A coordinate-free generalisation is ##H = \nabla df##, or explicitly ##H = \nabla_i (df)_k dx^i \otimes dx^k = \nabla_i \partial_k f dx^i \otimes dx^k##. How is this motivated?
 
  • #2
ergospherical said:
In local coordinates, the hessian of the function ##f## at point ##p## is ##H = \partial_i \partial_k f dx^i \otimes dx^k##. A coordinate-free generalisation is ##H = \nabla df##, or explicitly ##H = \nabla_i (df)_k dx^i \otimes dx^k = \nabla_i \partial_k f dx^i \otimes dx^k##. How is this motivated?
How is what motivated?
 
  • #3
I'm curious to know why ##H = \nabla df## is the correct generalisation of the usual local-coordinate expression.
 
  • #4
I am not sure if this qualifies, but from Lee's Introduction to Riemannian Manifolds: if we have a covariant derivative operator ##\nabla##, then ##\nabla f## is just the 1-form ##df##, because both have the same action on tangent vectors.

$$ \nabla f(X) = \nabla_X f = Xf = df(X) $$

The 2-tensor ##\nabla^2 f## is called the covariant Hessian of ##f##, and

$$\nabla^2 f = \nabla(df)$$

The last two formulas in the OP are just local-coordinate expressions of the above, which can be computed from the standard formula for the covariant derivative of a 1-form.

Lastly, its action on two tangent vectors is given by,
$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = Y(Xf)-(\nabla_Y X)f$$
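In components, writing ##\Gamma^m_{ik}## for the connection coefficients (not used explicitly above), the standard formula for the covariant derivative of a 1-form gives

$$\nabla^2 f = \nabla(df) = \left(\partial_i \partial_k f - \Gamma^m_{ik}\,\partial_m f\right) dx^i \otimes dx^k,$$

which reduces to the naive ##\partial_i \partial_k f\, dx^i \otimes dx^k## precisely when the connection coefficients vanish in the chosen coordinates.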
 
  • #5
jbergman said:
Lastly, its action on two tangent vectors is given by,
$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = Y(Xf)-(\nabla_Y X)f$$
Correction to the last line, it should be,

$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = X(Yf)-(\nabla_Y X)f$$
 
  • #6
jbergman said:
Correction to the last line, it should be,

$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = X(Yf)-(\nabla_Y X)f$$
Messed it up again... There must be a better way to remember the formula.
$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = X(Yf)-(\nabla_X Y)f$$
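One way to keep the signs straight is to read the formula as the product rule applied to ##Yf = df(Y)##:

$$X(Yf) = X\big(df(Y)\big) = (\nabla_X df)(Y) + df(\nabla_X Y), \qquad \text{so} \qquad \nabla^2_{X,Y}f = (\nabla_X df)(Y) = X(Yf) - (\nabla_X Y)f.$$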
 
  • #7
ergospherical said:
In local coordinates, the hessian of the function ##f## at point ##p## is ##H = \partial_i \partial_k f dx^i \otimes dx^k##. A coordinate-free generalisation is ##H = \nabla df##, or explicitly ##H = \nabla_i (df)_k dx^i \otimes dx^k = \nabla_i \partial_k f dx^i \otimes dx^k##. How is this motivated?
I may be off, and I apologize if I am, but my read on this question is that you've overlooked the fact that ##\nabla df## is not just a fancy coordinate-free generalization of the hessian; it is a coordinate-independent version of the hessian. I.e. sure, you can pick coordinates around a point and look at ##H = \partial_i \partial_k f dx^i \otimes dx^k##, but this quantity is meaningless because you can change the coordinates and have it turn into whatever you want. Try it yourself: how does the quantity ##\partial_i \partial_k f dx^i \otimes dx^k## transform when you look at it from another set of coordinates, say ##(y^i)##?

But the hessian was a pretty good ally... its index at the critical points of ##f## told us about their nature (Morse Lemma). So maybe the hessian is worth fighting for. And by that I mean finding a coordinate-independent entity which generalizes the Hessian and maybe shares some of its nice properties. We could try looking at ##d^2f##, since we know this will be coordinate-independent, but no luck: ##d^2=0## always. It turns out we need a connection to make sense of second derivatives. Not too surprising, since a connection is the apparatus whose raison d'être is to relate tangent vectors at different points, and taking a second derivative involves comparing derivatives defined at different points. So the natural candidate seems to be ##\nabla df##. This is a covariant 2-tensor, it is symmetric if ##\nabla## is symmetric and, in local coordinates, it is ##\partial_i \partial_k f\, dx^i \otimes dx^k## if ##\nabla## is flat.
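For the record, the transformation works out as follows: under a change of coordinates ##x^i = x^i(y)##, the chain rule gives

$$\frac{\partial^2 f}{\partial y^a \partial y^b} = \frac{\partial x^i}{\partial y^a}\frac{\partial x^k}{\partial y^b}\,\frac{\partial^2 f}{\partial x^i \partial x^k} + \frac{\partial^2 x^m}{\partial y^a \partial y^b}\,\frac{\partial f}{\partial x^m},$$

and the second, inhomogeneous term is exactly what stops ##\partial_i \partial_k f\, dx^i \otimes dx^k## from transforming as a tensor. (It vanishes at a critical point of ##f##, which is why the Hessian of Morse theory is well defined there without any connection.)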
 
  • #8
This thread got me thinking about how one might arrive conceptually at the definition of the covariant derivative of a 1 form given an affine connection on the tangent bundle. Here are some thoughts. I apologize in advance for any errors.

One can perhaps motivate the covariant derivative of a 1 form by first taking the case where the affine connection is compatible with a Riemannian metric.

If ##w## is parallel along a curve with tangent vector field ##X##, and the vector field ##Y## is also parallel, then one would want ##w(Y)## to be constant. So the derivative ##X⋅\langle w_{*},Y\rangle##, where ##w_{*}## is the metric dual vector field of ##w##, must equal zero, and metric compatibility implies that ##\langle ∇_{X}w_{*},Y\rangle## is also zero. So the dual of ##w## under the Riemannian metric must be parallel along the curve. This suggests that a good definition of the covariant derivative of ##w## is the metric dual of the covariant derivative of its metric dual vector field.

With this definition, metric compatibility ##X⋅\langle w_{*},Y\rangle = \langle ∇_{X}w_{*},Y\rangle + \langle w_{*},∇_{X}Y\rangle## yields the formula ##(∇_{X}w)(Y) = X⋅w(Y) - w(∇_{X}Y)##.

Since this formula does not involve a metric, it suggests a definition of the covariant derivative of a 1 form for any affine connection.

A similar line of thought might be to start with a set of coordinates for the tangent space at a point and ask how one might extend these coordinates along a curve so that the measured coordinates of a vector field depend only on changes in the vector field, not on changes in the coordinate 1 forms themselves. As with standard coordinates in Euclidean space, this can be done by setting the coordinates of the derivative of a vector field to be the derivatives of its coordinates. Formally, ##X⋅w(Y) = w(∇_{X}Y)## for each coordinate 1 form ##w##.

When ##w## is not parallel, ##X⋅w(Y) - w(∇_{X}Y)## is a 1 form in ##Y##, and this again suggests the formula ##(∇_{X}w)(Y) = X⋅w(Y) - w(∇_{X}Y)##.
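Indeed, the expression is ##C^{\infty}##-linear in ##Y##, so it is pointwise a 1 form: for a smooth function ##g##,

$$X⋅w(gY) - w(∇_{X}(gY)) = (X⋅g)\,w(Y) + g\,X⋅w(Y) - (X⋅g)\,w(Y) - g\,w(∇_{X}Y) = g\,\big(X⋅w(Y) - w(∇_{X}Y)\big).$$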

To justify this, it must be shown that ##∇_{X}w## satisfies the definition of a covariant derivative.

Definition

A covariant derivative at a point ##p## on the cotangent bundle of a manifold assigns to each tangent vector ##X_{p}## and each 1 form ##w## a cotangent vector ##∇_{X_{p}}w## at ##p##. This assignment is linear both in tangent vectors at ##p## and in 1 forms, and it satisfies the Leibniz rule ##∇_{X_{p}}(fw) = (X_{p}⋅f)\,w_{p} + f(p)\,∇_{X_{p}}w##. An affine connection is a covariant derivative at each point which is smooth, in the sense that the covariant derivative of a smooth 1 form with respect to a smooth vector field is again a smooth 1 form.

These properties are easily verified.

E.g. for the Leibniz rule, ##(∇_{X}(fw))(Y) = X⋅\big((fw)(Y)\big) - (fw)(∇_{X}Y) = (X⋅f)\,w(Y) + f\,\big(X⋅w(Y) - w(∇_{X}Y)\big) = \big((X⋅f)\,w + f\,∇_{X}w\big)(Y)##.
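Linearity over smooth functions in ##X## (which is what makes ##∇_{X}w## at ##p## depend only on ##X_{p}##) follows the same way: for a smooth function ##g##,

$$(∇_{gX}w)(Y) = g\,X⋅w(Y) - w(g\,∇_{X}Y) = g\,\big(X⋅w(Y) - w(∇_{X}Y)\big) = g\,(∇_{X}w)(Y).$$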

##∇_{X}w## also determines an affine connection on the cotangent bundle.
 
  • #9
lavinia said:
One can perhaps motivate the covariant derivative of a 1 form by first taking the case where the affine connection is compatible with a Riemannian metric.

[...]

This suggests that a good definition of the covariant derivative of ##w## is the metric dual of the covariant derivative of its metric dual vector field.
You only need the Leibniz rule and the action of the covariant derivative on scalars and tangent vectors. Metric compatibility does not enter anywhere, since the cotangent space is the dual of the tangent space: you do not need a metric for any of what you describe. Simply define
$$
(\nabla_Y w)(X) = Y[w(X)] - w(\nabla_Y X)
$$

quasar987 said:
This is a covariant 2-tensor, it is symmetric if ##\nabla## is symmetric and, in local coordinates, it is ##\partial_i \partial_k f\, dx^i \otimes dx^k## if ##\nabla## is flat.
This is not the case. You are thinking of the situation where the local coordinates are chosen such that the connection coefficients vanish. While such coordinates can always be found in a flat space, the coefficients do not vanish in every local coordinate system on a flat space. For example, consider polar coordinates on the two-dimensional Euclidean plane.
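To make the polar-coordinate example concrete, here is a small SymPy sketch (an editorial illustration; the test function ##f = r^2\cos\theta## is an arbitrary choice). It computes the Levi-Civita connection coefficients of the flat metric ##\mathrm{diag}(1, r^2)## and compares the naive matrix of second partials with the components ##\partial_i\partial_k f - \Gamma^m_{ik}\,\partial_m f## of the covariant Hessian:

Python:
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.diag(1, r**2)   # flat Euclidean metric written in polar coordinates
ginv = g.inv()

# Christoffel symbols Gamma^a_{bc} of the Levi-Civita connection of g
def Gamma(a, b, c):
    return sp.Rational(1, 2) * sum(
        ginv[a, d] * (sp.diff(g[d, b], x[c]) + sp.diff(g[d, c], x[b]) - sp.diff(g[b, c], x[d]))
        for d in range(2))

f = r**2 * sp.cos(th)  # arbitrary test function

naive = sp.Matrix(2, 2, lambda i, k: sp.diff(f, x[i], x[k]))
hess = sp.Matrix(2, 2, lambda i, k: sp.diff(f, x[i], x[k])
                 - sum(Gamma(m, i, k) * sp.diff(f, x[m]) for m in range(2)))

print(sp.simplify(naive - hess))   # nonzero, even though the space is flat
print(sp.simplify(hess - hess.T))  # zero matrix: the covariant Hessian is symmetric

The difference is nonzero because the connection coefficients (e.g. ##\Gamma^r_{\theta\theta} = -r##) do not vanish in polar coordinates even though the space is flat, while the covariant Hessian itself comes out symmetric, as expected for the Levi-Civita connection.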
 

1. What is a "coordinate-free Hessian"?

A coordinate-free Hessian is a concept used in differential geometry, multivariable calculus and optimization. It is the bilinear form of second derivatives of a function at a point, a covariant 2-tensor rather than a matrix tied to particular coordinates, and it describes the second-order behaviour of the function at that point without reference to any specific coordinate system.

2. How is a coordinate-free Hessian different from a regular Hessian?

The regular Hessian is the matrix of second-order partial derivatives computed in a specific coordinate system, and its entries change in a non-tensorial way under a change of coordinates (except at critical points). The coordinate-free Hessian ##\nabla df## uses a connection to correct for this, so it is independent of any particular coordinate system, making it applicable on manifolds and in arbitrary coordinates.

3. What are the advantages of using a coordinate-free Hessian?

One advantage of the coordinate-free Hessian is that statements made with it hold in every coordinate system, including on manifolds where no preferred coordinates exist. It also clarifies the structure of the object: it is a covariant 2-tensor, symmetric whenever the connection is torsion-free, which makes the underlying concepts easier to transfer to different settings.

4. How is a coordinate-free Hessian used in optimization?

In optimization, the Hessian at a critical point is used to determine whether the point is a minimum, maximum, or saddle point; this is the content of the Morse Lemma mentioned above. Second-order (Newton-type) methods also use the Hessian to choose search directions, going beyond the first-order information that gives the direction of steepest descent or ascent.
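As a minimal illustration (an editorial example, not taken from the thread), the second-derivative test classifies a critical point by the signs of the eigenvalues of the Hessian matrix; at a critical point the naive and covariant Hessians agree, so Cartesian coordinates suffice here:

Python:
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2                          # has a critical point at the origin
H = sp.hessian(f, (x, y))                # matrix of second partial derivatives
print(H.subs({x: 0, y: 0}).eigenvals())  # {2: 1, -2: 1}: mixed signs, so a saddle point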

5. Can a coordinate-free Hessian be used in any coordinate system?

Yes. Once a connection is chosen, the coordinate-free Hessian can be evaluated in any coordinate system; its components are then ##\partial_i \partial_k f - \Gamma^m_{ik}\,\partial_m f##, but the tensor itself does not depend on which coordinates are used. This makes it a powerful tool in mathematics and science, as it can be applied to a wide range of problems regardless of the coordinate system.
