Stationary points classification using definiteness of the Lagrangian

SUMMARY

This discussion focuses on classifying stationary points using the definiteness of the Hessian in the context of constrained optimization with Lagrange multipliers. The user successfully identified stationary points of the Lagrangian but was unsure whether to classify them as minima or saddle points. The Hessian at these points has one positive and one negative eigenvalue, yet the points are constrained minima: only directions orthogonal to the gradient of the constraint function matter, and along those directions the Hessian is positive.

PREREQUISITES
  • Understanding of Lagrange multipliers for constrained optimization
  • Knowledge of Hessian matrices and their definiteness
  • Familiarity with eigenvalues and eigenvectors
  • Concept of gradients and their geometric interpretation
NEXT STEPS
  • Study the method of Lagrange multipliers in depth
  • Learn about the geometric interpretation of eigenvalues in optimization
  • Explore the implications of the definiteness of Hessians in constrained optimization
  • Investigate the relationship between gradients and eigenvectors in optimization problems
USEFUL FOR

Students and professionals in mathematics, engineering, and economics who are involved in optimization problems, particularly those dealing with constrained optimization and Lagrange multipliers.

fatpotato
Homework Statement
Find the max/min/saddle points of ##f(x,y) = x^4 - y^4## subject to the constraint ##g(x,y) = x^2+2y^2 -1 =0##
Use the Lagrange multipliers method
Classify the stationary points (max/min/saddle) using the definiteness of the Hessian
Relevant Equations
Positive/Negative definite matrix
Hello,

I am using the Lagrange multipliers method to find the extrema of ##f(x,y)## subject to the constraint ##g(x,y) = 0##, an ellipse.

So far, I have successfully identified several triplets ##(x^*,y^*,\lambda^*)## such that each triplet is a stationary point for the Lagrangian: ##\nabla \mathscr{L} (x^*,y^*,\lambda^*) = 0##

Now, I want to classify my triplets as max/min/saddle points, using the positive/negative definiteness of the Hessian like I have been doing for unconstrained optimization, so I compute what I think is the Hessian of the Lagrangian:

$$H_{\mathscr{L}}(x,y,\lambda)= \begin{pmatrix} 12x^2 - 2\lambda & 0 \\ 0 & -12y^2 - 4\lambda \end{pmatrix}$$

Evaluating the Hessian for my first triplet ##(0,\pm \frac{\sqrt{2}}{2},-\frac{1}{2})## gives me:

$$H_{\mathscr{L}}(0,\pm \frac{\sqrt{2}}{2},-\frac{1}{2}) = \begin{pmatrix} 1 & 0 \\ 0 & - 4\end{pmatrix}$$

This matrix is diagonal, meaning that we immediately read its eigenvalues on the diagonal: ##\lambda_1 = 1 > 0## and ##\lambda_2 = -4 < 0##. A positive/negative definite matrix has only positive/negative eigenvalues, thus I conclude that this matrix is neither, due to its eigenvalues' opposite signs.
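As a sanity check on the numbers above, here is a minimal Python sketch that evaluates the gradient of the Lagrangian and the Hessian at the triplet. It assumes ##\mathscr{L} = f - \lambda g## and that the constraint is the ellipse ##x^2 + 2y^2 = 1##, which is the form consistent with the Hessian entries in this post. The function names are illustrative only.

```python
import math

def grad_lagrangian(x, y, lam):
    """Gradient of L = f - lam*g with f = x^4 - y^4.

    Assumption: g = x^2 + 2*y^2 - 1 (ellipse), chosen to match the
    Hessian entries 12x^2 - 2*lam and -12y^2 - 4*lam in the post.
    """
    dL_dx = 4 * x**3 - 2 * lam * x
    dL_dy = -4 * y**3 - 4 * lam * y
    dL_dlam = -(x**2 + 2 * y**2 - 1)
    return (dL_dx, dL_dy, dL_dlam)

def hessian_xy(x, y, lam):
    """Hessian of L with respect to (x, y) only, as written in the post."""
    return ((12 * x**2 - 2 * lam, 0.0),
            (0.0, -12 * y**2 - 4 * lam))

y_star = math.sqrt(2) / 2
for sign in (1, -1):
    triplet = (0.0, sign * y_star, -0.5)
    print("triplet:", triplet)
    print("  grad L :", grad_lagrangian(*triplet))  # all components ~ 0
    print("  Hessian:", hessian_xy(*triplet))       # ~ diag(1, -4)
```

Both signs of ##y^*## give the same diagonal matrix, because ##y## only enters the Hessian through ##y^2##.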

When I was studying unconstrained optimization, I learned that eigenvalues of opposite signs indicate a saddle point, so I would expect the points ##(0,\pm \frac{\sqrt{2}}{2})## to both be saddle points of my function ##f##. However, the solution to this problem states that these points are minima, using the following argument:

[Attached image: Lagrange_Mult_Sol.PNG]

The solution uses the fact that ##\nabla g(0,\pm \frac{\sqrt{2}}{2}) = (0,\pm 2\sqrt{2})##, so that ##w^T \nabla g = 0## if and only if ##w = (\alpha, 0)##, ##\alpha \in \mathbb{R}^{\ast}##.

I thought that it was enough to check for the definiteness of the Hessian, and now I am really confused...

Here are my questions:
  1. When is it enough to check the definiteness of the Hessian to classify stationary points?
  2. Why is there this additional step in constrained optimization?
  3. What am I missing?
Thank you for your time.

Edit: PF destroyed my LaTeX formatting.
 
You are constrained to look at the behaviour of ##f## restricted to the one-dimensional curve ##g(x,y) = x^2 + 2y^2 - 1 = 0## (an ellipse). If ##f## increases or decreases as you move off this curve, then that does not concern you.

To remain on the curve, your direction of travel must be orthogonal to ##\nabla g##. In this case, the eigenvector corresponding to the negative eigenvalue is parallel to ##\nabla g## and the eigenvector corresponding to the positive eigenvalue is orthogonal to it, so as far as you are concerned the negative eigenvalue is irrelevant: this critical point is a minimum.
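One way to see this numerically is to walk along the constraint curve and watch ##f##. The sketch below assumes the constraint is the ellipse ##x^2 + 2y^2 = 1## and uses the parametrization ##x = \cos t##, ##y = \sin t / \sqrt{2}##, which is my own choice for illustration and not something taken from the textbook solution.

```python
import math

def f_on_ellipse(t):
    """Evaluate f(x, y) = x^4 - y^4 at the point of the (assumed)
    ellipse x^2 + 2*y^2 = 1 with parameter t."""
    x = math.cos(t)
    y = math.sin(t) / math.sqrt(2)
    return x**4 - y**4

# t = pi/2 corresponds to the critical point (0, sqrt(2)/2).
t_star = math.pi / 2
f_star = f_on_ellipse(t_star)

# Sample f a little to either side of the critical point, staying on the curve.
nearby = [f_on_ellipse(t_star + dt) for dt in (-0.1, -0.01, 0.01, 0.1)]
is_local_min = all(value > f_star for value in nearby)

print("f at critical point:", f_star)
print("f at nearby points :", nearby)
print("constrained local minimum?", is_local_min)
```

Moving along the curve, ##f## only goes up from its value at the critical point, even though it would go down if you were allowed to step off the curve in the ##y## direction.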

In general, ##\nabla g## will not be an eigenvector of the Hessian ##H##. The text therefore defines a vector ##\mathbf{w}## orthogonal to ##\nabla g## and looks at
$$f(\mathbf{x} + \alpha\mathbf{w}) \approx f(\mathbf{x}) + \tfrac12\alpha^2 \mathbf{w}^T H \mathbf{w}$$
to determine whether a critical point of ##f## subject to this constraint is a minimum (##\mathbf{w}^T H\mathbf{w} > 0##) or a maximum (##\mathbf{w}^T H\mathbf{w} < 0##).
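For a problem in two variables with one constraint, this test can be sketched in a few lines of Python. The helper name `classify_constrained_2d` is hypothetical, and the code only covers the 2D case, where the direction orthogonal to ##\nabla g## is unique up to scaling. The value of ##\nabla g## used in the example assumes the ellipse ##x^2 + 2y^2 = 1##, so ##\nabla g = (2x, 4y)##.

```python
import math

def classify_constrained_2d(H, grad_g):
    """Classify a constrained critical point from the sign of w^T H w,
    where w is orthogonal to grad_g (2 variables, 1 constraint only)."""
    gx, gy = grad_g
    # Rotating grad_g by 90 degrees gives a vector with w . grad_g = 0.
    w = (-gy, gx)
    # Quadratic form w^T H w written out for a 2x2 matrix.
    Hw = (H[0][0] * w[0] + H[0][1] * w[1],
          H[1][0] * w[0] + H[1][1] * w[1])
    q = w[0] * Hw[0] + w[1] * Hw[1]
    if q > 0:
        return "minimum"
    if q < 0:
        return "maximum"
    return "inconclusive"

# Hessian from the thread at (0, sqrt(2)/2, -1/2).
H = ((1.0, 0.0), (0.0, -4.0))
# Assumed gradient of g = x^2 + 2*y^2 - 1 at (0, sqrt(2)/2).
grad_g = (0.0, 2 * math.sqrt(2))

result = classify_constrained_2d(H, grad_g)
print(result)
```

Only the sign of ##\mathbf{w}^T H \mathbf{w}## is used, so the length of ##\mathbf{w}## does not matter and no normalization is needed.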
 
Thank you for your answer. There is a new point I do not understand:
pasmith said:
In this case, the eigenvector corresponding to the negative eigenvalue is parallel to ##\nabla g## and the eigenvector corresponding to the positive eigenvalue is orthogonal to it, so as far as you are concerned the negative eigenvalue is irrelevant: this critical point is a minimum.
How do you deduce that the eigenvector corresponding to the negative eigenvalue is parallel to the gradient? Same question for the eigenvector corresponding to the positive eigenvalue?
 
fatpotato said:
Thank you for your answer. There is a new point I do not understand:

How do you deduce that the eigenvector corresponding to the negative eigenvalue is parallel to the gradient? Same question for the eigenvector corresponding to the positive eigenvalue?

By inspection. In this case the Hessian is diagonal, so we immediately see that the eigenvectors are ##(1,0)## with eigenvalue ##H_{11} = 1## and ##(0,1)## with eigenvalue ##H_{22} = -4##. ##\nabla g## is easily computed to be ##(2x,4y)## and at the critical point this is ##(0, \pm 2\sqrt{2})##, which is parallel to ##(0,1)##. The direction orthogonal to it is therefore ##(1,0)##.
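Written out for this critical point, as a worked check using the Hessian values quoted earlier in the thread, any tangent direction ##w = (\alpha, 0)^T## with ##\alpha \neq 0## gives:

$$w^T H w = \begin{pmatrix} \alpha & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -4 \end{pmatrix} \begin{pmatrix} \alpha \\ 0 \end{pmatrix} = \alpha^2 > 0$$

The entry ##-4## never enters the product, which is why the negative eigenvalue plays no role along the constraint.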
 
Yes of course, we look for the kernel of ##H - \lambda_i I##: with ##\lambda_1 = 1##, the matrix ##H - \lambda_1 I## maps the vector ##(1,0)^T## to ##(0,0)^T##, so ##(1,0)^T## is the associated eigenvector.

Sorry, I knew the concept, but could not deduce it myself. Now it is perfectly clear, thank you!
 
