Show Hessian is positive definite

  • Thread starter: kingwinner
  • Tags: Hessian, Positive

SUMMARY

The discussion centers on proving the positive definiteness of the Hessian matrix associated with the function f(a) defined by the integral of the squared difference between a polynomial and a known function g(x) over the interval [0,1]. The Hessian matrix is explicitly given by M[j,k] = 2/(j+k+1) for j, k = 0, 1, ..., n. The participants establish that the Hessian is positive definite by demonstrating that the quadratic form v^T M v is greater than zero for all nonzero vectors v in R^{n+1}, thus confirming that f(a) is strictly convex.

PREREQUISITES
  • Understanding of Hessian matrices and their properties
  • Knowledge of convex functions and strict convexity
  • Familiarity with polynomial functions and integrals
  • Basic linear algebra concepts, particularly quadratic forms
NEXT STEPS
  • Study the properties of positive definite matrices in linear algebra
  • Learn about strict convexity and its implications in optimization
  • Explore the Gram-Schmidt orthogonalization process for polynomials
  • Investigate methods for proving positive definiteness using quadratic forms
USEFUL FOR

Mathematicians, students in advanced calculus or optimization courses, and anyone interested in the properties of convex functions and their applications in optimization problems.

kingwinner

Homework Statement


Consider the function
f(\vec{a}) = \int_0^1 \bigl[g(x) - (a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0)\bigr]^2 \, dx,
where \vec{a} = (a_0, a_1, \ldots, a_n) and g is some known function defined on [0,1].

From this, we can show that
\frac{\partial^2 f}{\partial a_j \partial a_k} = \int_0^1 2 x^{j+k} \, dx = \frac{2}{j+k+1}.
[Image optim1.JPG: the computation of these second partial derivatives.]

Thus, the Hessian of f at \vec{a} is the matrix [2/(j+k+1)], j = 0, 1, \ldots, n; k = 0, 1, \ldots, n.

Fact: This Hessian matrix is positive definite.

Now how can we prove that this (n+1) × (n+1) matrix is positive definite? (i.e., v^T (Hessian) v > 0 for all v ∈ R^{n+1}, v ≠ 0.)
When I multiply out v^T (Hessian) v, it just doesn't seem clear to me at all that it is > 0.

Homework Equations


N/A

The Attempt at a Solution


Shown above.

Any help is appreciated!
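As a quick numerical sanity check (not a proof), one can form M and attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. A minimal sketch, assuming NumPy, with n chosen arbitrarily:

    import numpy as np

    n = 5  # arbitrary degree for illustration
    # Hessian entries M[j, k] = 2/(j+k+1) for j, k = 0..n
    # (this is twice the (n+1) x (n+1) Hilbert matrix)
    M = np.array([[2.0 / (j + k + 1) for k in range(n + 1)]
                  for j in range(n + 1)])

    # np.linalg.cholesky succeeds iff M is positive definite
    # (it raises LinAlgError otherwise)
    np.linalg.cholesky(M)
    print("smallest eigenvalue:", np.linalg.eigvalsh(M).min())  # > 0

Since M is a scaled Hilbert matrix, it becomes severely ill-conditioned as n grows, so this numerical check is only reliable for small n; an actual proof is still needed.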
 
kingwinner said:

Homework Statement


Consider the function f(a)=
1
∫ [g(x)-(anxn+an-1xn-1+...+a0)]2 dx
0
where a=(a0,a1,...an) and g is some known function defined on [0,1].

From this, we can show that
optim1.JPG


Thus, the Hessian of f at a = [2/(j+k+1)] j=0,1,2,...n; k=0,1,2,...,n.

Fact: This Hessian matrix is positive definite.

Now how can we prove that this (n+1) x (n+1) matrix is positive definite? (i.e. vT (Hessian) v >0 for all v E Rn+1, v≠0.)
When I multiply out the vT (Hessian) v, it just doesn't seem clear to me at all that it is >0.

Homework Equations


N/A

The Attempt at a Solution


Shown above.

Any help is appreciated!

You can do it indirectly: for each fixed x, the function F_x(\vec{a}) = [g(x) - a_0 - a_1 x - \cdots - a_n x^n]^2 is a convex function of \vec{a}; you can either get this by looking at the Hessian of F_x in \vec{a}-space, or establish it directly from the definition of convexity, i.e., show that F_x(\lambda \vec{a} + (1-\lambda) \vec{b}) \le \lambda F_x(\vec{a}) + (1-\lambda) F_x(\vec{b}) for all 0 < \lambda < 1, with equality only when \langle p(x), \vec{a} - \vec{b} \rangle = 0, where p(x) = (1, x, \ldots, x^n). Convexity is preserved by integrating (again, easy to show from the above inequality), so f(\vec{a}) is convex; and for \vec{a} \neq \vec{b} the integrated inequality is strict, since equality would force the polynomial with coefficient vector \vec{a} - \vec{b} to vanish at every x in [0,1]. So f is strictly convex and quadratic in \vec{a}, and for a quadratic function strict convexity means exactly that the Hessian is positive definite.

RGV
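Spelled out, the pointwise inequality used above is just the algebraic identity behind the convexity of t \mapsto t^2 (here s and t stand for the two bracketed values):

\lambda s^2 + (1-\lambda) t^2 - \bigl(\lambda s + (1-\lambda) t\bigr)^2 = \lambda (1-\lambda)(s - t)^2 \ge 0, \quad 0 < \lambda < 1,

with equality iff s = t. Taking s = g(x) - \langle p(x), \vec{a} \rangle and t = g(x) - \langle p(x), \vec{b} \rangle, equality at a given x means exactly \langle p(x), \vec{a} - \vec{b} \rangle = 0.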
 
The problem is actually that I'm trying to demonstrate the converse direction in this example: I want to show directly that the Hessian matrix of f is positive definite, and hence conclude that f is strictly convex.

In this example, the Hessian of f at \vec{a} is the (n+1) × (n+1) matrix M given by
M_{jk} = 2/(j+k+1) for j = 0, 1, \ldots, n; k = 0, 1, \ldots, n.
I need to show v^T M v > 0 for all v ∈ R^{n+1}, v ≠ 0.

When I multiply out v^T M v, I get a sum of squares, but I also get a whole bunch of cross terms; how can I show that the total is > 0?

I'm also pretty confused in general about how to use the definition of positive definiteness (i.e., v^T A v > 0) to actually SHOW a matrix is positive definite. Unless A is diagonal, multiplying out v^T A v always gives a sum of squares plus many cross terms, and it is not at all obvious whether the result is > 0.

Thanks for helping.
 
kingwinner said:
I want to show directly that the Hessian M, with M_{jk} = 2/(j+k+1), is positive definite, and hence conclude that f is strictly convex. When I multiply out v^T M v, I get a sum of squares plus a whole bunch of cross terms; how can I show that it is > 0?

Here is a partial approach that gets at the essence of the problem.

Note that your f(\vec{a}) and the simpler function g(\vec{a}) = \int_0^1 [a_0 + a_1 x + \cdots + a_n x^n]^2 dx have the same \vec{a}-Hessian, so it is enough to look at g(\vec{a}). Now, let us replace the integration by a summation over N \ge n+1 distinct points:
g_N(\vec{a}) = \sum_{j=1}^N w_j [a_0 + a_1 x_j + \cdots + a_n x_j^n]^2, \; (\mbox{all } w_j > 0),
and note that each term in the summation is a convex function of the a_k (but not strictly convex). The jth term vanishes on the hyperplane \langle p_j, \vec{a} \rangle = 0, where p_j = (1, x_j, x_j^2, \ldots, x_j^n). No nonzero \vec{a} can lie on n+1 or more of these hyperplanes, because any n+1 of the vectors p_j are linearly independent: their determinant is a Vandermonde determinant, which is nonzero for distinct x_j. So with N \ge n+1 terms, the summation does not vanish for any nonzero \vec{a}. That makes g_N(\vec{a}) a positive-definite quadratic form (that is, a strictly convex function).

Now the clincher: the integrand q(x) of g(\vec{a}) is a polynomial of degree 2n in x, so there is a quadrature scheme that renders the approximation g_N(\vec{a}) *exact*; for example, a Gauss-Legendre formula of high enough order will have the form \sum_j w_j q(x_j) \;\mbox{ with all } w_j > 0 and will evaluate g(\vec{a}) exactly.

RGV
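To see this concretely, here is a minimal numerical sketch (assuming NumPy; leggauss gives the Gauss-Legendre nodes and weights on [-1,1], which we map to [0,1]). An (n+1)-point rule is exact for polynomials of degree up to 2n+1, so it reproduces every entry M_{jk} = \int_0^1 2 x^{j+k} dx exactly:

    import numpy as np
    from numpy.polynomial.legendre import leggauss

    n = 4                 # arbitrary degree for illustration
    N = n + 1             # an N-point Gauss rule is exact up to degree 2N - 1 >= 2n
    t, w = leggauss(N)    # nodes and weights on [-1, 1]
    x = 0.5 * (t + 1.0)   # map nodes to [0, 1]
    w = 0.5 * w           # rescale weights accordingly (all still > 0)

    # Reconstruct M[j, k] = int_0^1 2 x^(j+k) dx = 2/(j+k+1) by quadrature
    M_quad = np.array([[np.sum(w * 2.0 * x**(j + k)) for k in range(n + 1)]
                       for j in range(n + 1)])
    M_exact = np.array([[2.0 / (j + k + 1) for k in range(n + 1)]
                        for j in range(n + 1)])
    print(np.max(np.abs(M_quad - M_exact)))  # ~1e-16, i.e. exact up to roundoff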
 
Hmm... I think your method only works for summations, and sorry, I don't have a background in Gauss-Legendre quadrature. My book also doesn't assume an advanced math background, so I think this problem should be solvable without it... but thanks anyway...

If I write the entries of my Hessian matrix (call this matrix M) back in terms of an integral:

2/(j+k+1) = \int_0^1 2 x^j x^k \, dx,

is this going to help us prove positive definiteness of the Hessian M?

v^T M v = \sum_j \sum_k \left[ \int_0^1 2 x^j x^k \, dx \right] v_j v_k

Is there any way to show that this is > 0?

Thanks.

Thanks.
 
Actually, I think I've found a method to prove positive definiteness of the Hessian of f.

If I write the entries of my Hessian matrix (call this matrix M) back in terms of an integral:

2/(j+k+1) = \int_0^1 2 x^j x^k \, dx,

then

v^T M v = \sum_j \sum_k \left[ \int_0^1 2 x^j x^k \, dx \right] v_j v_k = 2 \int_0^1 (v_n x^n + \cdots + v_1 x + v_0)^2 \, dx > 0

for all nonzero v, since the integrand is a nonnegative continuous function that is not identically zero (a nonzero polynomial of degree at most n has at most n roots).

So this works!

But this idea of writing the expression back in terms of an integral looks very tricky to me. Why can't we prove positive definiteness directly when we write out the entries of the Hessian matrix as [2/(j+k+1)]? They should, in principle, carry the same information, right? Where is the loss of information occurring?
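As a sanity check on this identity (not part of the proof), one can compare the two sides numerically. A small sketch assuming NumPy, with an arbitrary random v; the integral of the squared polynomial is computed exactly by polynomial arithmetic:

    import numpy as np
    from numpy.polynomial import polynomial as P

    n = 6
    rng = np.random.default_rng(0)
    v = rng.standard_normal(n + 1)   # an arbitrary nonzero coefficient vector

    M = np.array([[2.0 / (j + k + 1) for k in range(n + 1)]
                  for j in range(n + 1)])
    quad_form = v @ M @ v            # v^T M v

    # Right-hand side: 2 * int_0^1 p(x)^2 dx with p(x) = v_0 + v_1 x + ... + v_n x^n
    # (square the coefficients, antidifferentiate, evaluate at 1; at 0 it is 0)
    sq = P.polymul(v, v)
    rhs = 2.0 * P.polyval(1.0, P.polyint(sq))
    print(quad_form, rhs)            # the two values agree to machine precision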
 
kingwinner said:
Writing the entries back as integrals gives v^T M v = 2 \int_0^1 (v_n x^n + \cdots + v_0)^2 dx > 0, so this works! But why can't we prove positive definiteness directly from the entries [2/(j+k+1)]? Where is the loss of information occurring?

Sometimes a result about one thing is obtained most easily (if at all) by examining a seemingly-different result or system or method, or whatever. That is common, and you get used to it after a while. This is such a case.

There is another way you could see the result, but it still involves integration: change to an orthonormal set of polynomials on [0,1]: u_i is a polynomial of degree i with <u_i, u_i> = 1 and <u_i, u_j> = 0 \mbox{ for } i \not= j. Here, <u,v> means <u,v> = \int_0^1 u(x) v(x) dx. We can obtain u_0, u_1, \ldots, u_n from p_0 = 1, p_1 = x, p_2 = x^2, \ldots, p_n = x^n using Gram-Schmidt orthogonalization. Therefore, in principle we can get the constant coefficients \{c_{ij}\} and \{e_{ij}\} giving u_i(x) = \sum_j c_{ij} p_j(x) and p_i(x) = \sum_j e_{ij} u_j(x). Your quadratic form is Q = (1/2) a^T H a = \int_0^1 q(x)^2 dx, where q(x) = \sum_i a_i p_i(x). Writing, instead, q(x) = \sum_j b_j u_j(x), we get Q = b_0^2 + b_1^2 + b_2^2 + \cdots + b_n^2. Since each b_i is a linear combination of the a_j, with known coefficients, we end up with Q as a sum of squares of linear functions of the a_j, as you wanted originally.

RGV
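A small sketch of this change of basis, assuming NumPy: polynomials are coefficient vectors in the monomial basis, the inner product is encoded by the Gram matrix H_{jk} = 1/(j+k+1) = M_{jk}/2, and plain Gram-Schmidt (numerically fragile for large n, but fine for illustration) produces the orthonormal basis:

    import numpy as np

    n = 4
    # Gram matrix of the monomial basis under <u, v> = int_0^1 u(x) v(x) dx:
    # <x^j, x^k> = 1/(j+k+1), i.e. H = M/2
    H = np.array([[1.0 / (j + k + 1) for k in range(n + 1)]
                  for j in range(n + 1)])

    def ip(u, v):
        # inner product of two polynomials given by coefficient vectors
        return u @ H @ v

    # Gram-Schmidt: column i of C holds the coefficients of u_i in the monomial basis
    C = np.eye(n + 1)
    for i in range(n + 1):
        for j in range(i):
            C[:, i] -= ip(C[:, j], C[:, i]) * C[:, j]
        C[:, i] /= np.sqrt(ip(C[:, i], C[:, i]))

    # In the u-basis the Gram matrix is the identity,
    # so Q = a^T H a becomes a plain sum of squares b_0^2 + ... + b_n^2
    print(np.allclose(C.T @ H @ C, np.eye(n + 1)))  # True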
 
I see. I also think that in lower dimensions, completing the square may help show positive definiteness; a small worked example is below.
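For instance, in the n = 1 case of this problem (a worked example, not from the original discussion), the Hessian is M = \begin{pmatrix} 2 & 1 \\ 1 & 2/3 \end{pmatrix} and completing the square gives

v^T M v = 2 v_0^2 + 2 v_0 v_1 + \tfrac{2}{3} v_1^2 = 2\left(v_0 + \tfrac{1}{2} v_1\right)^2 + \tfrac{1}{6} v_1^2 > 0 \quad \text{for } (v_0, v_1) \ne (0, 0).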

In this example, there is exactly one local minimum.
By positive definiteness of the Hessian of f at every \vec{a} ∈ R^{n+1}, f is strictly convex on R^{n+1}. This implies that every local minimum must be a global minimum, so I know the local minimum is a global minimum. But could there be more than one global minimum?

Claim: the global minimum point is unique.

How can uniqueness of the global minimum be proved/justified in this example?

Thanks.
 
kingwinner said:
f is strictly convex on R^{n+1}, so every local minimum is a global minimum. But could there be more than one global minimum? How can uniqueness of the global minimum be proved in this example?

The global minimum of a strictly convex function is unique: if \vec{a} \ne \vec{b} were both global minima with value m, strict convexity would give f\bigl(\tfrac{1}{2}\vec{a} + \tfrac{1}{2}\vec{b}\bigr) < \tfrac{1}{2} f(\vec{a}) + \tfrac{1}{2} f(\vec{b}) = m, contradicting that m is the minimum value.

RGV
 
