Office_Shredder
For number 10, a short proof:
The Hessian matrix is symmetric, so its eigenvalues are real. There's a theorem that if all its eigenvalues are negative then you have a local maximum, if they are all positive then it's a local minimum, and if there's at least one of each sign then it's neither a minimum nor a maximum.
Since the dimension of the space is even, the determinant is the product of an even number of real eigenvalues. For that product to be negative, there must be at least one positive and one negative eigenvalue, and hence ##a## is neither a maximum nor a minimum.
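For intuition, here's a simple two-dimensional example of my own (not from the problem): take ##f(x,y)=x^2-y^2## at ##a=(0,0)##. Then
$$(Hf)(a) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \qquad \det\left((Hf)(a)\right) = -4 < 0,$$
and the eigenvalues ##2## and ##-2## have opposite signs, so the origin is a saddle point: ##f## increases along the ##x##-axis and decreases along the ##y##-axis.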
Here's a full proof with details, using Taylor polynomials in multiple dimensions. We write down the second-degree Taylor polynomial with its error term. ##(Hf)(a)## is the ##2n\times 2n## matrix of second-order partial derivatives at ##a##, and ##(Df)(a)## is a ##1\times 2n## row vector of the first-order partial derivatives. ##x## and ##a## are column vectors, and everything below is ordinary matrix multiplication. For any twice continuously differentiable function ##f## we have
$$f(x)= f(a)+\left((Df)(a)\right)(x-a) + \frac{1}{2}(x-a)^t\left((Hf)(a)\right)(x-a)+ (x-a)^t h(x)(x-a)$$
where ##h(x)## is a matrix with entries ##h_{ij}(x)## that are continuous and satisfy ##\lim_{x\to a}h_{ij}(x)=0##.
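As a concrete instance of the formula (the same toy example as above, mine and not from the problem): ##f(x,y)=x^2-y^2## equals its own second-degree Taylor polynomial at ##a=0##, so the error term vanishes identically:
$$f(x) = \frac{1}{2}\,x^t \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} x, \qquad h(x)\equiv 0.$$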
For the specific situation in #10, ##(Df)(a)=0## and the determinant of ##(Hf)(a)## is negative. Since ##(Hf)(a)## is a symmetric matrix, its eigenvalues are all real. Furthermore, since it's symmetric it is diagonalizable, with the diagonal entries being the eigenvalues. There are an even number of them, so for their product to be negative there must be both a positive and a negative eigenvalue.
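Here's a quick numerical sketch of that fact (my illustration, not part of the proof): a random symmetric matrix of even size with negative determinant always shows eigenvalues of both signs.

```python
import numpy as np

# Illustration only: a symmetric matrix of even size with negative
# determinant must have eigenvalues of both signs, since the
# determinant is the product of the (real) eigenvalues.
rng = np.random.default_rng(0)
n = 2  # matrix size is 2n x 2n = 4 x 4
while True:
    A = rng.standard_normal((2 * n, 2 * n))
    H = (A + A.T) / 2            # symmetrize
    if np.linalg.det(H) < 0:
        break
eigenvalues = np.linalg.eigvalsh(H)  # real, since H is symmetric
print("det(H) =", np.linalg.det(H))
print("eigenvalues:", eigenvalues)
assert eigenvalues.min() < 0 < eigenvalues.max()  # both signs present
```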
Suppose ##v## is a unit eigenvector with eigenvalue ##\lambda##, and consider ##f(a+\epsilon v)## for ##\epsilon >0##. Let's also write ##H=(Hf)(a)##. Then, using ##Hv=\lambda v## and ##v^t v=1##, we get
$$f(a+\epsilon v) = f(a) + \frac{\epsilon^2}{2} v^t H v + \epsilon^2 v^t h(a+\epsilon v) v = f(a) + \epsilon^2 \left(\frac{\lambda}{2}+v^t h(a+\epsilon v) v\right).$$
We will now show that if ##\epsilon## is small enough, then ##|v^t h(a+\epsilon v) v| < |\lambda|/4##. In general, for a matrix ##M## whose entries are all at most ##\kappa## in absolute value, and for any unit vector ##v##,
$$|v^t M v| = \left|\sum_{i,j} M_{ij} v_i v_j \right| \leq \sum_{i,j} |M_{ij} v_i v_j|.$$
We use the facts that ##|M_{ij}|\leq \kappa## and that ##|v_k|\leq 1## for all ##k## since ##v## is a unit vector (obviously this is a crude bound, and you can certainly do better, but we don't care) to get (remember ##M## is ##2n\times 2n##)
$$|v^t M v| \leq \sum_{i,j} \kappa = 4n^2 \kappa.$$
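A quick sanity check of that crude bound on random inputs (again my illustration, not part of the argument):

```python
import numpy as np

# Check |v^t M v| <= 4 n^2 kappa for a random 2n x 2n matrix M
# and a random unit vector v.
rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((2 * n, 2 * n))
kappa = np.abs(M).max()        # largest entry in absolute value
v = rng.standard_normal(2 * n)
v /= np.linalg.norm(v)         # normalize to a unit vector
lhs = abs(v @ M @ v)
print(lhs, "<=", 4 * n**2 * kappa)
assert lhs <= 4 * n**2 * kappa
```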
We know that the entries of the error term ##h(a+\epsilon v)## go to 0 as ##\epsilon## goes to 0, so we can pick ##\epsilon## small enough that all its entries are smaller in magnitude than ##\frac{|\lambda|}{16n^2}##; then by the bound above, ##|v^t h(a+\epsilon v) v| \leq |\lambda|/4##. Let's write ##g(\epsilon)= v^t h(a+\epsilon v) v## and restrict ourselves to ##\epsilon## small enough that ##|g(\epsilon)| \leq |\lambda|/4##.

So we have ##f(a+\epsilon v)= f(a)+\epsilon^2(\lambda/2 +g(\epsilon))##. If we pick ##v## to be an eigenvector with positive eigenvalue ##\lambda##, then ##f(a+\epsilon v) \geq f(a) + \epsilon^2 \lambda/4 > f(a)##. If we pick ##v## to be an eigenvector with a negative eigenvalue, which I'll write as ##-\lambda## for ##\lambda > 0##, then ##f(a+\epsilon v) \leq f(a) - \epsilon^2 \lambda/4 < f(a)##. So ##a## is neither a local maximum nor a local minimum.
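To tie it back to the toy saddle from the start (my example, not the problem's ##f##), here's the conclusion checked numerically: stepping from the critical point along the eigenvector with positive eigenvalue raises ##f##, and stepping along the one with negative eigenvalue lowers it.

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a critical point at a = (0, 0) with
# Hessian eigenvalues +2 and -2, so a is neither a max nor a min.
def f(p):
    return p[0] ** 2 - p[1] ** 2

a = np.array([0.0, 0.0])
eps = 1e-3
v_plus = np.array([1.0, 0.0])   # eigenvector for eigenvalue +2
v_minus = np.array([0.0, 1.0])  # eigenvector for eigenvalue -2
print(f(a + eps * v_plus) > f(a))    # True: f increases this way
print(f(a + eps * v_minus) < f(a))   # True: f decreases this way
```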