Show Hessian is positive definite

  • Thread starter: kingwinner
  • Tags: Hessian, Positive

SUMMARY

The discussion centers on proving the positive definiteness of the Hessian matrix associated with the function f(a) defined by the integral of the squared difference between a polynomial and a known function g(x) over the interval [0,1]. The Hessian matrix is explicitly given by M[j,k] = 2/(j+k+1) for j, k = 0, 1, ..., n. The participants establish that the Hessian is positive definite by demonstrating that the quadratic form v^T M v is greater than zero for all nonzero vectors v in R^{n+1}, thus confirming that f(a) is strictly convex.

PREREQUISITES
  • Understanding of Hessian matrices and their properties
  • Knowledge of convex functions and strict convexity
  • Familiarity with polynomial functions and integrals
  • Basic linear algebra concepts, particularly quadratic forms
NEXT STEPS
  • Study the properties of positive definite matrices in linear algebra
  • Learn about strict convexity and its implications in optimization
  • Explore the Gram-Schmidt orthogonalization process for polynomials
  • Investigate methods for proving positive definiteness using quadratic forms
USEFUL FOR

Mathematicians, students in advanced calculus or optimization courses, and anyone interested in the properties of convex functions and their applications in optimization problems.

kingwinner

Homework Statement


Consider the function
f(\vec{a}) = \int_0^1 \bigl[g(x) - (a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0)\bigr]^2 \, dx,
where \vec{a} = (a_0, a_1, \ldots, a_n) and g is some known function defined on [0,1].

From this, we can show that
\frac{\partial^2 f}{\partial a_j \partial a_k} = \int_0^1 2 x^{j+k} \, dx = \frac{2}{j+k+1}.
[Image optim1.JPG: the computation of these second partial derivatives.]

Thus, the Hessian of f at \vec{a} is the matrix [2/(j+k+1)], j = 0, 1, \ldots, n; k = 0, 1, \ldots, n.

Fact: This Hessian matrix is positive definite.

Now how can we prove that this (n+1) × (n+1) matrix is positive definite? (i.e., v^T (Hessian) v > 0 for all v ∈ R^{n+1}, v ≠ 0.)
When I multiply out v^T (Hessian) v, it just doesn't seem clear to me at all that it is > 0.

Homework Equations


N/A

The Attempt at a Solution


Shown above.

Any help is appreciated!
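As a quick numerical sanity check (not a proof), one can form M and attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. A minimal sketch, assuming NumPy, with n chosen arbitrarily:

    import numpy as np

    n = 5  # arbitrary degree for illustration
    # Hessian entries M[j, k] = 2/(j+k+1) for j, k = 0..n
    # (this is twice the (n+1) x (n+1) Hilbert matrix)
    M = np.array([[2.0 / (j + k + 1) for k in range(n + 1)]
                  for j in range(n + 1)])

    # np.linalg.cholesky succeeds iff M is positive definite
    # (it raises LinAlgError otherwise)
    np.linalg.cholesky(M)
    print("smallest eigenvalue:", np.linalg.eigvalsh(M).min())  # > 0

Since M is a scaled Hilbert matrix, it becomes severely ill-conditioned as n grows, so this numerical check is only reliable for small n; an actual proof is still needed.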
 
kingwinner said:

Homework Statement


Consider the function f(a)=
1
∫ [g(x)-(anxn+an-1xn-1+...+a0)]2 dx
0
where a=(a0,a1,...an) and g is some known function defined on [0,1].

From this, we can show that
optim1.JPG


Thus, the Hessian of f at a = [2/(j+k+1)] j=0,1,2,...n; k=0,1,2,...,n.

Fact: This Hessian matrix is positive definite.

Now how can we prove that this (n+1) x (n+1) matrix is positive definite? (i.e. vT (Hessian) v >0 for all v E Rn+1, v≠0.)
When I multiply out the vT (Hessian) v, it just doesn't seem clear to me at all that it is >0.

Homework Equations


N/A

The Attempt at a Solution


Shown above.

Any help is appreciated!

You can do it indirectly: for each fixed x, the function F_x(\vec{a}) = [g(x) - a_0 - a_1 x - \cdots - a_n x^n]^2 is a convex function of \vec{a}; you can either get this by looking at the Hessian of F_x in \vec{a}-space, or establish it directly from the definition of convexity, i.e., show that F_x(\lambda \vec{a} + (1-\lambda) \vec{b}) \le \lambda F_x(\vec{a}) + (1-\lambda) F_x(\vec{b}) for all 0 < \lambda < 1, with equality only when \langle p(x), \vec{a} - \vec{b} \rangle = 0, where p(x) = (1, x, \ldots, x^n). Convexity is preserved by integrating (again, easy to show from the above inequality), so f(\vec{a}) is convex; and for \vec{a} \neq \vec{b} the integrated inequality is strict, since equality would force the polynomial with coefficient vector \vec{a} - \vec{b} to vanish at every x in [0,1]. So f is strictly convex and quadratic in \vec{a}, and for a quadratic function strict convexity means exactly that the Hessian is positive definite.

RGV
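Spelled out, the pointwise inequality used above is just the algebraic identity behind the convexity of t \mapsto t^2 (here s and t stand for the two bracketed values):

\lambda s^2 + (1-\lambda) t^2 - \bigl(\lambda s + (1-\lambda) t\bigr)^2 = \lambda (1-\lambda)(s - t)^2 \ge 0, \quad 0 < \lambda < 1,

with equality iff s = t. Taking s = g(x) - \langle p(x), \vec{a} \rangle and t = g(x) - \langle p(x), \vec{b} \rangle, equality at a given x means exactly \langle p(x), \vec{a} - \vec{b} \rangle = 0.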
 
The problem is actually that I'm trying to demonstrate the converse direction in this example: I want to show directly that the Hessian matrix of f is positive definite, and hence conclude that f is strictly convex.

In this example, the Hessian of f at \vec{a} is the (n+1) × (n+1) matrix M given by
M_{jk} = 2/(j+k+1) for j = 0, 1, \ldots, n; k = 0, 1, \ldots, n.
I need to show v^T M v > 0 for all v ∈ R^{n+1}, v ≠ 0.

When I multiply out v^T M v, I get a sum of squares, but I also get a whole bunch of cross terms; how can I show that the total is > 0?

I'm also pretty confused in general about how to use the definition of positive definiteness (i.e., v^T A v > 0) to actually SHOW a matrix is positive definite. Unless A is diagonal, multiplying out v^T A v always gives a sum of squares plus many cross terms, and it is not at all obvious whether the result is > 0.

Thanks for helping.
 
kingwinner said:
I want to show directly that the Hessian M, with M_{jk} = 2/(j+k+1), is positive definite, and hence conclude that f is strictly convex. When I multiply out v^T M v, I get a sum of squares plus a whole bunch of cross terms; how can I show that it is > 0?

Here is a partial approach that gets at the essence of the problem.

Note that your f(\vec{a}) and the simpler function g(\vec{a}) = \int_0^1 [a_0 + a_1 x + \cdots + a_n x^n]^2 dx have the same \vec{a}-Hessian, so it is enough to look at g(\vec{a}). Now, let us replace the integration by a summation over N \ge n+1 distinct points:
g_N(\vec{a}) = \sum_{j=1}^N w_j [a_0 + a_1 x_j + \cdots + a_n x_j^n]^2, \; (\mbox{all } w_j > 0),
and note that each term in the summation is a convex function of the a_k (but not strictly convex). The jth term vanishes on the hyperplane \langle p_j, \vec{a} \rangle = 0, where p_j = (1, x_j, x_j^2, \ldots, x_j^n). No nonzero \vec{a} can lie on n+1 or more of these hyperplanes, because any n+1 of the vectors p_j are linearly independent: their determinant is a Vandermonde determinant, which is nonzero for distinct x_j. So with N \ge n+1 terms, the summation does not vanish for any nonzero \vec{a}. That makes g_N(\vec{a}) a positive-definite quadratic form (that is, a strictly convex function).

Now the clincher: the integrand q(x) of g(\vec{a}) is a polynomial of degree 2n in x, so there is a quadrature scheme that renders the approximation g_N(\vec{a}) *exact*; for example, a Gauss-Legendre formula of high enough order will have the form \sum_j w_j q(x_j) \;\mbox{ with all } w_j > 0 and will evaluate g(\vec{a}) exactly.

RGV
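To see this concretely, here is a minimal numerical sketch (assuming NumPy; leggauss gives the Gauss-Legendre nodes and weights on [-1,1], which we map to [0,1]). An (n+1)-point rule is exact for polynomials of degree up to 2n+1, so it reproduces every entry M_{jk} = \int_0^1 2 x^{j+k} dx exactly:

    import numpy as np
    from numpy.polynomial.legendre import leggauss

    n = 4                 # arbitrary degree for illustration
    N = n + 1             # an N-point Gauss rule is exact up to degree 2N - 1 >= 2n
    t, w = leggauss(N)    # nodes and weights on [-1, 1]
    x = 0.5 * (t + 1.0)   # map nodes to [0, 1]
    w = 0.5 * w           # rescale weights accordingly (all still > 0)

    # Reconstruct M[j, k] = int_0^1 2 x^(j+k) dx = 2/(j+k+1) by quadrature
    M_quad = np.array([[np.sum(w * 2.0 * x**(j + k)) for k in range(n + 1)]
                       for j in range(n + 1)])
    M_exact = np.array([[2.0 / (j + k + 1) for k in range(n + 1)]
                        for j in range(n + 1)])
    print(np.max(np.abs(M_quad - M_exact)))  # ~1e-16, i.e. exact up to roundoff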
 
Hmm... I think your method only works for summations, and sorry, I don't have a background in Gauss-Legendre quadrature. My book also doesn't assume an advanced math background, so I think this problem should be solvable without it... but thanks anyway...

If I write the entries of my Hessian matrix (call this matrix M) back in terms of an integral:

2/(j+k+1) = \int_0^1 2 x^j x^k \, dx,

is this going to help us prove positive definiteness of the Hessian M?

v^T M v = \sum_j \sum_k \left[ \int_0^1 2 x^j x^k \, dx \right] v_j v_k

Is there any way to show that this is > 0?

Thanks.

Thanks.
 
Actually, I think I've found a method to prove positive definiteness of the Hessian of f.

If I write the entries of my Hessian matrix (call this matrix M) back in terms of an integral:

2/(j+k+1) = \int_0^1 2 x^j x^k \, dx,

then

v^T M v = \sum_j \sum_k \left[ \int_0^1 2 x^j x^k \, dx \right] v_j v_k = 2 \int_0^1 (v_n x^n + \cdots + v_1 x + v_0)^2 \, dx > 0

for all nonzero v, since the integrand is a nonnegative continuous function that is not identically zero (a nonzero polynomial of degree at most n has at most n roots).

So this works!

But this idea of writing the expression back in terms of an integral looks very tricky to me. Why can't we prove positive definiteness directly when we write out the entries of the Hessian matrix as [2/(j+k+1)]? They should, in principle, carry the same information, right? Where is the loss of information occurring?
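As a sanity check on this identity (not part of the proof), one can compare the two sides numerically. A small sketch assuming NumPy, with an arbitrary random v; the integral of the squared polynomial is computed exactly by polynomial arithmetic:

    import numpy as np
    from numpy.polynomial import polynomial as P

    n = 6
    rng = np.random.default_rng(0)
    v = rng.standard_normal(n + 1)   # an arbitrary nonzero coefficient vector

    M = np.array([[2.0 / (j + k + 1) for k in range(n + 1)]
                  for j in range(n + 1)])
    quad_form = v @ M @ v            # v^T M v

    # Right-hand side: 2 * int_0^1 p(x)^2 dx with p(x) = v_0 + v_1 x + ... + v_n x^n
    # (square the coefficients, antidifferentiate, evaluate at 1; at 0 it is 0)
    sq = P.polymul(v, v)
    rhs = 2.0 * P.polyval(1.0, P.polyint(sq))
    print(quad_form, rhs)            # the two values agree to machine precision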
 
kingwinner said:
Writing the entries back as integrals gives v^T M v = 2 \int_0^1 (v_n x^n + \cdots + v_0)^2 dx > 0, so this works! But why can't we prove positive definiteness directly from the entries [2/(j+k+1)]? Where is the loss of information occurring?

Sometimes a result about one thing is obtained most easily (if at all) by examining a seemingly-different result or system or method, or whatever. That is common, and you get used to it after a while. This is such a case.

There is another way you could see the result, but it still involves integration: change to an orthonormal set of polynomials on [0,1]: u_i is a polynomial of degree i with <u_i, u_i> = 1 and <u_i, u_j> = 0 \mbox{ for } i \not= j. Here, <u,v> means <u,v> = \int_0^1 u(x) v(x) dx. We can obtain u_0, u_1, \ldots, u_n from p_0 = 1, p_1 = x, p_2 = x^2, \ldots, p_n = x^n using Gram-Schmidt orthogonalization. Therefore, in principle we can get the constant coefficients \{c_{ij}\} and \{e_{ij}\} giving u_i(x) = \sum_j c_{ij} p_j(x) and p_i(x) = \sum_j e_{ij} u_j(x). Your quadratic form is Q = (1/2) a^T H a = \int_0^1 q(x)^2 dx, where q(x) = \sum_i a_i p_i(x). Writing, instead, q(x) = \sum_j b_j u_j(x), we get Q = b_0^2 + b_1^2 + b_2^2 + \cdots + b_n^2. Since each b_i is a linear combination of the a_j, with known coefficients, we end up with Q as a sum of squares of linear functions of the a_j, as you wanted originally.

RGV
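A small sketch of this change of basis, assuming NumPy: polynomials are coefficient vectors in the monomial basis, the inner product is encoded by the Gram matrix H_{jk} = 1/(j+k+1) = M_{jk}/2, and plain Gram-Schmidt (numerically fragile for large n, but fine for illustration) produces the orthonormal basis:

    import numpy as np

    n = 4
    # Gram matrix of the monomial basis under <u, v> = int_0^1 u(x) v(x) dx:
    # <x^j, x^k> = 1/(j+k+1), i.e. H = M/2
    H = np.array([[1.0 / (j + k + 1) for k in range(n + 1)]
                  for j in range(n + 1)])

    def ip(u, v):
        # inner product of two polynomials given by coefficient vectors
        return u @ H @ v

    # Gram-Schmidt: column i of C holds the coefficients of u_i in the monomial basis
    C = np.eye(n + 1)
    for i in range(n + 1):
        for j in range(i):
            C[:, i] -= ip(C[:, j], C[:, i]) * C[:, j]
        C[:, i] /= np.sqrt(ip(C[:, i], C[:, i]))

    # In the u-basis the Gram matrix is the identity,
    # so Q = a^T H a becomes a plain sum of squares b_0^2 + ... + b_n^2
    print(np.allclose(C.T @ H @ C, np.eye(n + 1)))  # True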
 
I see. I also think that in lower dimensions, completing the square may help show positive definiteness; a small worked example is below.
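For instance, in the n = 1 case of this problem (a worked example, not from the original discussion), the Hessian is M = \begin{pmatrix} 2 & 1 \\ 1 & 2/3 \end{pmatrix} and completing the square gives

v^T M v = 2 v_0^2 + 2 v_0 v_1 + \tfrac{2}{3} v_1^2 = 2\left(v_0 + \tfrac{1}{2} v_1\right)^2 + \tfrac{1}{6} v_1^2 > 0 \quad \text{for } (v_0, v_1) \ne (0, 0).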

In this example, there is exactly one local minimum.
By positive definiteness of the Hessian of f at every \vec{a} ∈ R^{n+1}, f is strictly convex on R^{n+1}. This implies that every local minimum must be a global minimum, so I know the local minimum is a global minimum. But could there be more than one global minimum?

Claim: the global minimum point is unique.

How can uniqueness of the global minimum be proved/justified in this example?

Thanks.
 
kingwinner said:
f is strictly convex on R^{n+1}, so every local minimum is a global minimum. But could there be more than one global minimum? How can uniqueness of the global minimum be proved in this example?

The global minimum of a strictly convex function is unique: if \vec{a} \ne \vec{b} were both global minima with value m, strict convexity would give f\bigl(\tfrac{1}{2}\vec{a} + \tfrac{1}{2}\vec{b}\bigr) < \tfrac{1}{2} f(\vec{a}) + \tfrac{1}{2} f(\vec{b}) = m, contradicting that m is the minimum value.

RGV
 
