# Homework Help: Show Hessian is positive definite

1. Jan 26, 2012

### kingwinner

1. The problem statement, all variables and given/known data
Consider the function
$$f(\vec{a}) = \int_0^1 [g(x) - (a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0)]^2 \, dx$$
where $\vec{a} = (a_0, a_1, \ldots, a_n)$ and g is some known function defined on [0,1].

From this, differentiating twice under the integral sign (the g(x) term is killed by the second derivative), we can show that
$$\frac{\partial^2 f}{\partial a_j \partial a_k} = 2 \int_0^1 x^{j+k} \, dx = \frac{2}{j+k+1}.$$

Thus, the Hessian of f at $\vec{a}$ is $[2/(j+k+1)]$, $j = 0, 1, 2, \ldots, n$; $k = 0, 1, 2, \ldots, n$.
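These entries can be sanity-checked symbolically (a Python/SymPy sketch, not part of the problem statement; n = 3 is just an arbitrary small case):

```python
import sympy as sp

n = 3  # small test degree; any n works the same way
x = sp.symbols('x', real=True)
a = sp.symbols(f'a0:{n + 1}', real=True)      # a0, a1, ..., an
g = sp.Function('g')(x)
integrand = (g - sum(a[k] * x**k for k in range(n + 1)))**2

# Differentiating twice under the integral sign kills g, leaving 2 x^(j+k),
# so every Hessian entry integrates to 2/(j+k+1)
for j in range(n + 1):
    for k in range(n + 1):
        entry = sp.integrate(sp.diff(integrand, a[j], a[k]), (x, 0, 1))
        assert entry == sp.Rational(2, j + k + 1)
```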

Fact: This Hessian matrix is positive definite.

Now how can we prove that this (n+1) x (n+1) matrix is positive definite? (i.e., $v^T (\text{Hessian}) v > 0$ for all $v \in \mathbb{R}^{n+1}$, $v \neq 0$.)
When I multiply out $v^T (\text{Hessian}) v$, it just doesn't seem clear to me at all that it is > 0.

2. Relevant equations
N/A

3. The attempt at a solution
Shown above.

Any help is appreciated!

2. Jan 26, 2012

### Ray Vickson

You can do it indirectly: for each fixed x, the function $F_x(\vec{a}) = [g(x) - a_0 - a_1 x - \cdots - a_n x^n]^2$ is a convex function of $\vec{a}$; its Hessian in $\vec{a}$-space is the rank-one matrix $2 p(x) p(x)^T$ with $p(x) = (1, x, \ldots, x^n)$. Convexity is preserved by integrating, and the inequality $F_x(\lambda \vec{a} + (1-\lambda) \vec{b}) \le \lambda F_x(\vec{a}) + (1-\lambda) F_x(\vec{b})$ is strict for every x at which the two polynomials determined by $\vec{a} \not= \vec{b}$ differ, i.e., for all but finitely many x in [0,1] (two distinct polynomials of degree at most n agree at no more than n points). So the integrated inequality is strict for $0 < \lambda < 1$, which makes your $f(\vec{a})$ strictly convex and quadratic in $\vec{a}.$ Then, by a standard theorem, the Hessian of f is positive definite.

RGV

3. Jan 28, 2012

### kingwinner

(I've just found some typos and realized some formatting errors in my previous post (post #3), but for some reason I'm unable to edit it now (weird?!), so sorry, I have to re-post it. Moderator, if you see this message, please delete my post #3.)

Mod note: Done

But the problem is actually that I'm trying to demonstrate the converse direction in this example: I want to show that the Hessian matrix of f is positive definite directly, and hence conclude that f is strictly convex.

In this example, the Hessian of f at $\vec{a}$ is the (n+1) x (n+1) matrix M given by
$$M_{jk} = \frac{2}{j+k+1}, \quad j = 0, 1, 2, \ldots, n; \; k = 0, 1, 2, \ldots, n.$$
I need to show $v^T M v > 0$ for all $v \in \mathbb{R}^{n+1}$, $v \neq 0$.

When I multiply out $v^T M v$, I get some sum of squares, BUT then I also get a whole bunch of cross terms; how can I show that the result is > 0?

I'm also pretty confused generally about how to use the definition of positive definiteness (i.e., $v^T A v > 0$) to actually SHOW a matrix is positive definite. How can we actually prove those things? When I multiply out $v^T A v$, I always get a sum of squares plus many cross terms (unless the matrix A is diagonal), and it's not at all obvious whether the result is > 0 or not.
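Not a proof, of course, but here is a quick numerical sanity check (a Python/NumPy sketch; the helper name `hessian` is mine). `cholesky` raises an error unless the matrix is positive definite, and the smallest eigenvalue comes out positive for the first few n:

```python
import numpy as np

def hessian(n):
    """The (n+1) x (n+1) Hessian with entries M[j, k] = 2 / (j + k + 1)."""
    j = np.arange(n + 1)
    return 2.0 / (j[:, None] + j[None, :] + 1)

for n in range(1, 7):
    M = hessian(n)
    np.linalg.cholesky(M)                   # raises LinAlgError unless M is positive definite
    assert np.linalg.eigvalsh(M).min() > 0  # all eigenvalues strictly positive
```

(This matrix is twice the famous Hilbert matrix, which is notoriously ill-conditioned, so the numerical check is only trustworthy for small n.)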

Thanks for helping.

Last edited: Jan 28, 2012
4. Jan 28, 2012

### Ray Vickson

Here is a partial approach that gets at the essence of the problem.

Note that your f(a) and the simpler function $g(a) = \int_0^1 [a_0 + a_1 x + \cdots + a_n x^n]^2 dx$ have the same a-Hessian, so it is enough to look at g(a). Now, let us replace the integration by a summation of more than n+1 terms:
$$g_N(a) = \sum_{j=1}^N w_j [a_0 + a_1 x_j + \cdots + a_n x_j^n]^2, \; (\mbox{all }w_j > 0),$$
and note that each term in the summation is a convex function of the a_k (but not strictly convex). The jth term vanishes on a hyperplane $<p_j,a> = 0,$ where $p_j = (1, x_j, x_j^2, \ldots, x_j^n).$ If we have N > n+1 terms we cannot have all N terms vanishing for any nonzero a, because any n+1 of the vectors {p_j} are linearly independent: the determinant of any (n+1) of the p_j is a Vandermonde determinant, which is non-vanishing for distinct x_j. That means that there is no nonzero vector lying on more than n of the hyperplanes, and that means that the summation from 1 to N does not vanish for any nonzero vector a. That makes g_N(a) a positive-definite quadratic form (that is, a strictly convex function).
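Numerically, the construction looks like this (a sketch; the sample points, weights, and degree are arbitrary choices):

```python
import numpy as np

n, N = 3, 8                                # polynomial degree n, with N > n+1 sample points
x = np.linspace(0.1, 0.9, N)               # any distinct points in (0, 1) will do
w = np.full(N, 1.0 / N)                    # any positive weights will do
P = np.vander(x, n + 1, increasing=True)   # row j is p_j = (1, x_j, x_j^2, ..., x_j^n)

# The Vandermonde determinant of any n+1 rows is nonzero for distinct x_j ...
assert abs(np.linalg.det(P[: n + 1])) > 0

# ... so H_N = sum_j w_j p_j p_j^T is positive definite
H_N = P.T @ (w[:, None] * P)
assert np.linalg.eigvalsh(H_N).min() > 0
```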

Now the clincher: the integrand q(x) of g(a) is a polynomial of degree 2n in x, so there is a scheme that renders the approximation g_N(a) *exact*; for example, a Gauss-Legendre formula of high order will have the form $\sum w_j q(x_j) \;\mbox{ with all } w_j > 0$ and will evaluate g(a) exactly.
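For instance (a numerical sketch using NumPy's Gauss-Legendre nodes; taking m = n+1 points already suffices, since an m-point rule is exact up to degree 2m-1 >= 2n):

```python
import numpy as np

n = 4
m = n + 1                                  # m-point Gauss-Legendre rule, exact up to degree 2m - 1
t, w = np.polynomial.legendre.leggauss(m)  # nodes and weights on [-1, 1]
x = 0.5 * (t + 1.0)                        # map nodes to [0, 1]
w = 0.5 * w                                # rescale weights; they all stay positive
assert (w > 0).all()

# The quadrature reproduces every Hessian entry 2/(j+k+1) = int_0^1 2 x^(j+k) dx exactly
for j in range(n + 1):
    for k in range(n + 1):
        assert abs(np.sum(w * 2 * x ** (j + k)) - 2.0 / (j + k + 1)) < 1e-12
```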

RGV

Last edited: Jan 28, 2012
5. Jan 28, 2012

### kingwinner

hmm... I think your method only works for summations, and sorry, I don't have any background in Gauss-Legendre formulas. My book also doesn't assume an advanced math background, so I think this problem should be solvable without that... but thanks anyway...

If I write my Hessian matrix (call this matrix M) entries back in terms of an integral:
$$\frac{2}{j+k+1} = \int_0^1 2 x^j x^k \, dx,$$
is this going to help us prove positive definiteness of the Hessian M?

$$v^T M v = \sum_j \sum_k \left[ \int_0^1 2 x^j x^k \, dx \right] v_j v_k$$

Is there any way to show that this is > 0?

Thanks.

6. Jan 29, 2012

### kingwinner

Actually, I think I've found a method to prove positive definiteness of the Hessian of f.

If I write my Hessian matrix (call this matrix M) entries back in terms of an integral:
$$\frac{2}{j+k+1} = \int_0^1 2 x^j x^k \, dx,$$
then
$$v^T M v = \sum_j \sum_k \left[ \int_0^1 2 x^j x^k \, dx \right] v_j v_k = 2 \int_0^1 (v_n x^n + \cdots + v_1 x + v_0)^2 \, dx > 0$$
for all non-zero v.

So this works!

But this idea of writing the expression back in terms of an integral looks very tricky to me. How come we can't prove positive definiteness when we write out the entries of the Hessian matrix as [2/(j+k+1)]? They should, in principle, be the same, right? Where is the loss of information occurring?
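Here is a quick numerical check of this identity (a Python/NumPy sketch; the degree and the random test vectors are arbitrary choices of mine):

```python
import numpy as np

n = 5
j = np.arange(n + 1)
M = 2.0 / (j[:, None] + j[None, :] + 1)    # Hessian entries 2/(j+k+1)

rng = np.random.default_rng(0)
for _ in range(100):
    v = rng.standard_normal(n + 1)
    quad_form = v @ M @ v
    # 2 * int_0^1 (v0 + v1 x + ... + vn x^n)^2 dx, computed exactly via Polynomial
    p = np.polynomial.Polynomial(v) ** 2
    integral = 2.0 * (p.integ()(1.0) - p.integ()(0.0))
    assert abs(quad_form - integral) < 1e-9   # the two expressions agree
    assert quad_form > 0                      # and are strictly positive
```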

7. Jan 29, 2012

### Ray Vickson

Sometimes a result about one thing is obtained most easily (if at all) by examining a seemingly-different result or system or method, or whatever. That is common, and you get used to it after a while. This is such a case.

There is another way you could see the result, but it still involves integration: change to an orthonormal set of polynomials on [0,1]: $u_i$ is a polynomial of degree i and $<u_i,u_i> = 1, \;<u_i,u_j> = 0 \mbox{ for } i \not= j.$ Here, $<u,v>$ means $<u,v> = \int_0^1 u(x)v(x) dx.$ We can obtain $u_0, u_1, \ldots, u_n$ from $p_0 =1, p_1 = x, p_2 = x^2, \ldots, p_n = x^n$ using Gram-Schmidt orthogonalization. Therefore, in principle we can get the constant coefficients $\{c_{ij}\}$ and $\{e_{ij}\}$ giving $u_i(x) = \sum_j c_{ij} p_j(x)$ and $p_i(x) = \sum_j e_{ij} u_j(x)$. Your quadratic form is $Q = (1/2)a^T H a = \int_0^1 q(x)^2 dx,$ where $q(x) = \sum_i a_i p_i(x).$ Writing, instead, $q(x) = \sum_j b_j u_j(x),$ we get $Q = b_0^2 + b_1^2 + b_2^2 + \cdots + b_n^2.$ Since each b_i is a linear combination of the a_j, with known coefficients, we end up with Q as a sum of squares of linear functions of the a_j, as you wanted originally.
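A numerical sketch of this (using the fact, which is easy to verify, that Gram-Schmidt on the monomials amounts to a Cholesky factorization of the Gram matrix; the variable names are illustrative):

```python
import numpy as np

n = 4
i = np.arange(n + 1)
G = 1.0 / (i[:, None] + i[None, :] + 1)   # Gram matrix <x^i, x^j> on [0, 1]; the Hessian is H = 2G

# Gram-Schmidt on 1, x, ..., x^n corresponds to a Cholesky factorization G = L L^T:
# the rows of C = L^{-1} hold the coefficients c_ij of the orthonormal polynomials u_i
L = np.linalg.cholesky(G)
C = np.linalg.inv(L)
assert np.allclose(C @ G @ C.T, np.eye(n + 1))   # <u_i, u_j> = delta_ij

# For q = sum_i a_i x^i = sum_j b_j u_j, the change of coordinates is b = L^T a,
# and Q = int_0^1 q^2 dx = a^T G a becomes the pure sum of squares b_0^2 + ... + b_n^2
rng = np.random.default_rng(1)
a = rng.standard_normal(n + 1)
b = L.T @ a
assert abs(a @ G @ a - (b ** 2).sum()) < 1e-10
```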

RGV

8. Jan 29, 2012

### kingwinner

I see. I also think that in lower dimensions, completing the square may be helpful for showing positive definiteness.

In this example, there is exactly one local minimum.
By positive definiteness of the Hessian of f at every $\vec{a} \in \mathbb{R}^{n+1}$, f is strictly convex on $\mathbb{R}^{n+1}$. This implies that every local minimum is a global minimum, so the local minimum must be a global minimum. But could there be more than one global minimum?

Claim: the global minimum point is unique.

How can uniqueness of the global minimum be proved/justified in this example?

Thanks.

9. Jan 29, 2012

### Ray Vickson

The global minimum is unique for a strictly convex function. You can get an easy contradiction if you assume otherwise.
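Spelled out: if $\vec{a} \not= \vec{b}$ were two global minimizers with common value m, then strict convexity with $\lambda = 1/2$ would give
$$f\left(\tfrac{1}{2}\vec{a} + \tfrac{1}{2}\vec{b}\right) < \tfrac{1}{2} f(\vec{a}) + \tfrac{1}{2} f(\vec{b}) = m,$$
so the midpoint would do strictly better than the global minimum value, a contradiction.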

RGV