MHB: $f$ convex iff the Hessian matrix is positive semidefinite

SUMMARY

A twice continuously differentiable function \( f:\mathbb{R}^n\rightarrow \mathbb{R} \) is convex if and only if its Hessian matrix is positive semidefinite for all \( x \in \mathbb{R}^n \). The discussion confirms that the condition \( g''(t) \ge 0 \) follows from the positive semidefiniteness of the Hessian matrix \( H \), and clarifies that total derivatives rather than partial derivatives must be used when computing \( g'(t) \) and \( g''(t) \). The proof combines Taylor's theorem with the relationship between the second derivative and the Hessian matrix.

PREREQUISITES
  • Understanding of convex functions in \( \mathbb{R}^n \)
  • Knowledge of Hessian matrices and their properties
  • Familiarity with Taylor's theorem
  • Ability to compute total derivatives
NEXT STEPS
  • Study the properties of positive semidefinite matrices
  • Learn about Taylor series expansions in multiple dimensions
  • Explore the implications of convexity in optimization problems
  • Investigate the relationship between Hessians and local minima
USEFUL FOR

Mathematicians, data scientists, and optimization specialists interested in the theoretical foundations of convex analysis and its applications in optimization and machine learning.

mathmari
Hey!

A function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex if for all $x,y\in \mathbb{R}^n$ the inequality $$f(tx+(1-t)y)\leq tf(x)+(1-t)f(y)$$ holds for all $t\in [0,1]$.

Show that a twice continuously differentiable function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex iff the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. I have read the idea of the proof on some sites, but I am not sure that I understood it correctly. (Worried)

We suppose that the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. We define the function:
$$g(t) = f(t \mathbf{x} + (1-t) \mathbf{y})$$
Then we can compute the derivatives of $g$:
$$g'(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla}f(t \mathbf{x} + (1-t) \mathbf{y})$$
$$g''(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla^2}f(t \mathbf{x} + (1-t) \mathbf{y}) ( \mathbf{x} - \mathbf{y})$$

For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
:unsure:

Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

Then we use this with Taylor's theorem to prove that:
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
If $t \in [0,1]$, these can be combined to give:
$$
g(t) \le tg(1) + (1-t)g(0)
$$
which is equivalent to the inequality we wanted to prove. :unsure:
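
For the record, that final combination step can be made explicit: multiplying the first inequality by $(1-t)\ge 0$ and the second by $t\ge 0$ and adding them gives
$$(1-t)g(0)+t\,g(1) \;\ge\; g(t) + g'(t)\big[(1-t)(-t)+t(1-t)\big] \;=\; g(t),$$
and since $g(0)=f(\mathbf y)$, $g(1)=f(\mathbf x)$ and $g(t)=f(t\mathbf x+(1-t)\mathbf y)$, this is exactly the convexity inequality.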
 
mathmari said:
For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

Hey mathmari!

We cannot take partial derivatives with respect to $t$, because $t$ is not a parameter of $f$. :eek:

Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

mathmari said:
Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

We have that $g''(t)=(\mathbf x-\mathbf y)^T H(\mathbf x-\mathbf y)$ with $H$ positive semidefinite.

By definition, a matrix $H$ is called positive semidefinite if for all $\mathbf z$ we have $\mathbf z^T H \mathbf z \ge 0$. 🤔
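
As a quick numerical illustration of this definition (a minimal numpy sketch; the matrix $H$ below is just an arbitrary positive semidefinite example, not tied to any particular $f$):

```python
import numpy as np

# H = A^T A is positive semidefinite for any A, so it serves as an
# arbitrary example Hessian here.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
H = A.T @ A

# The definition: z^T H z >= 0 for every z. Check it on random vectors,
# allowing a tiny tolerance for floating-point rounding.
for _ in range(1000):
    z = rng.standard_normal(3)
    assert z @ H @ z >= -1e-12

# Equivalently, a symmetric matrix is positive semidefinite iff all of
# its eigenvalues are nonnegative.
print(np.linalg.eigvalsh(H))  # all entries >= 0
```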
 
Klaas van Aarsen said:
Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

Ah ok! This is clear now! (Malthe)

So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

:unsure:
 
mathmari said:
So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
Nope. (Shake)

It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
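
A quick numerical sanity check of this second-derivative formula, sketched with numpy on an arbitrary quadratic test function (chosen so the Hessian is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.standard_normal((3, 3))
Q = Q.T @ Q  # make Q symmetric (and PSD, so f below is convex)

def f(u):
    # Quadratic test function f(u) = u^T Q u; its Hessian is 2*Q everywhere.
    return u @ Q @ u

x = rng.standard_normal(3)
y = rng.standard_normal(3)
g = lambda t: f(t * x + (1 - t) * y)

# Compare a central finite difference for g''(t) with (x-y)^T H (x-y).
t, h = 0.3, 1e-4
g2_fd = (g(t + h) - 2 * g(t) + g(t - h)) / h**2
g2_formula = (x - y) @ (2 * Q) @ (x - y)
print(g2_fd, g2_formula)  # the two values should agree closely
```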
 
Klaas van Aarsen said:
It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
Ah ok!

Then do we get \begin{align*}\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\end{align*}

🤔

But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ? 🤔
 
mathmari said:
But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ?
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
 
Klaas van Aarsen said:
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
Ahh yes!

So we have in this case:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\\ &=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\\ & =(\mathbf{x}-\mathbf{y})^T\cdot\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\end{align*}
The middle term is the Hessian matrix? 🤔

 
mathmari said:
The middle term is the Hessian matrix?
Yep. (Nod)
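
Written out in index form, the identity being used here is
$$\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\,(x_j-y_j)(x_i-y_i)=(\mathbf{x}-\mathbf{y})^T H\,(\mathbf{x}-\mathbf{y}),$$
where $H$ is the Hessian of $f$ evaluated at $t\mathbf{x}+(1-t)\mathbf{y}$: each term of the double sum is an entry of $H$ times the $j$th and $i$th components of $\mathbf{x}-\mathbf{y}$, which is exactly how the quadratic form $\mathbf z^T H \mathbf z$ expands.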
 
Klaas van Aarsen said:
Yep. (Nod)

Great!

How do we get from Taylor's theorem that
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
? Do we use this formula for $k=2$? :unsure:
 
mathmari said:
Do we use this formula for $k=2$? :unsure:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐
 
Klaas van Aarsen said:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐

So we have
$$g(t)=g(0)+g'(0)(t-0)+\frac{g''(\xi)}{2}(t-0)^2 \Rightarrow \frac{g''(\xi)}{2}(t-0)^2=g(t)-g(0)-g'(0)(t-0)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-0)^2\geq 0 \Rightarrow g(t)-g(0)-g'(0)(t-0)\geq0 $$

$$g(t)=g(1)+g'(1)(t-1)+\frac{g''(\xi)}{2}(t-1)^2 \Rightarrow \frac{g''(\xi)}{2}(t-1)^2=g(t)-g(1)-g'(1)(t-1)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-1)^2\geq 0 \Rightarrow g(t)-g(1)-g'(1)(t-1)\geq0 $$

Is that correct so far? :unsure:
 
mathmari said:
Is that correct so far?
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔
 
Klaas van Aarsen said:
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔

You mean at a different point? 🤔
 
mathmari said:
You mean at a different point?
Yep. (Nod)
 
Klaas van Aarsen said:
Yep. (Nod)

I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if, in the formula from Wikipedia, we take $a=t$ and $x=0$. But can we take these values? :unsure:

 
mathmari said:
I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if, in the formula from Wikipedia, we take $a=t$ and $x=0$. But can we take these values?
We are free to choose any values that we want.
So yes, we can take those values. :geek:
 
Klaas van Aarsen said:
We are free to choose any values that we want.
So yes, we can take those values. :geek:

So that means that we expand at $a=t$ ? :unsure:
 
mathmari said:
So that means that we expand at $a=t$?
Let's try. :unsure:
 
Klaas van Aarsen said:
Let's try. :unsure:

We have then that $$g(x)=g(t)+g'(t)(x-t)+\frac{g''(\xi)}{2}(x-t)^2\Rightarrow \frac{g''(\xi)}{2}(x-t)^2=g(x)-g(t)-g'(t)(x-t)\geq 0$$ For $x=0$ we get $$g(0)-g(t)-g'(t)(0-t)\geq 0$$ For $x=1$ we get $$g(1)-g(t)-g'(t)(1-t)\geq 0$$

Is that correct? :unsure:
 
Looks good. (Nod)
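
As a numerical sanity check of the two tangent-line inequalities (a minimal sketch; the convex test function $f(u)=\|u\|^2$ and the sample points below are arbitrary choices):

```python
import numpy as np

# Convex test function f(u) = ||u||^2 and arbitrary sample points.
f = lambda u: u @ u
x = np.array([1.0, -2.0])
y = np.array([0.5, 3.0])
g = lambda t: f(t * x + (1 - t) * y)

# Check g(0) >= g(t) + g'(t)(0 - t) and g(1) >= g(t) + g'(t)(1 - t)
# on a grid of t values, estimating g'(t) by a central difference.
h = 1e-6
for t in np.linspace(0.05, 0.95, 19):
    gp = (g(t + h) - g(t - h)) / (2 * h)
    assert g(0) >= g(t) + gp * (0 - t) - 1e-8
    assert g(1) >= g(t) + gp * (1 - t) - 1e-8
print("both tangent-line inequalities hold at every sampled t")
```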
 
Klaas van Aarsen said:
Looks good. (Nod)

I think this direction is now clear to me! 🤩

For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positive semidefinite.
Do we need the function $g$ again in this direction? :unsure:
 
mathmari said:
For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positive semidefinite.
Do we need the function $g$ again in this direction?

Do you have a hint for the other direction? 🤔
 
Klaas van Aarsen said:
Do you have a hint for the other direction? 🤔

I found a proof online:

[Attached image: a screenshot of the proof found online.]
So doing it in the same way as above, we need to take the same $g$ as before, or not? :unsure:
 
mathmari said:
So doing it in the same way as above, we need to take the same $g$ as before, or not?

I don't think so.
The $g(t)=f(t\mathbf{x}+(1-t)\mathbf{y})$ that we had matches $f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))$ in the $H \succcurlyeq 0 \implies \text{convex}$ direction, where it is used in a Taylor expansion.
But for the converse we need a different $g$. 🤔
 
I found in some other notes a proof using the function $g(t)=f(\mathbf{x}+t\mathbf{v})$ and now it is clear! :)
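
For completeness, a sketch of that converse argument: fix $\mathbf x,\mathbf v\in\mathbb R^n$ and set $g(t)=f(\mathbf{x}+t\mathbf{v})$. Since $f$ is convex and $t\mapsto \mathbf{x}+t\mathbf{v}$ is affine, $g$ is convex, so $g(0)\le \frac12 g(h)+\frac12 g(-h)$ for every $h$, and therefore
$$g''(0)=\lim_{h\to 0}\frac{g(h)-2g(0)+g(-h)}{h^2}\ge 0.$$
By the same chain-rule computation as before, $g''(0)=\mathbf{v}^T\nabla^2 f(\mathbf{x})\,\mathbf{v}$, so the Hessian is positive semidefinite at every $\mathbf{x}$.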
 
