MHB F convex iff Hessian matrix positive semidefinite

mathmari
Gold Member
MHB
Messages
4,984
Reaction score
7
Hey!

A function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex if for all $x,y\in \mathbb{R}^n$ the inequality $$f(tx+(1-t)y)\leq tf(x)+(1-t)f(y)$$ holds for all $t\in [0,1]$.

Show that a twice continuously differentiable funtion $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex iff the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. I have read on some sites the idea of the proof but I am not sure that I understood that correctly. (Worried)

We suppose that the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. We define the function:
$$g(t) = f(t \mathbf{x} + (1-t) \mathbf{y})$$
Then we can compute the derivatives of $g$:
$$g'(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla}f(t \mathbf{x} + (1-t) \mathbf{y})$$
$$g''(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla^2}f(t \mathbf{x} + (1-t) \mathbf{y}) ( \mathbf{x} - \mathbf{y})$$

For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_1}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
:unsure:

Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

Then we use this with Taylor's theorem to prove that:
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
Then if $t \in [0,1]$, these can then be combined to give:
$$
g(t) \le tg(1) + (1-t)g(0)
$$
which is equivalent to the inequality we wanted to prove. :unsure:
 
Physics news on Phys.org
mathmari said:
For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_1}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

Hey mathmari!

We cannot take partial derivatives with respect to $t$, because $t$ is not a parameter of $f$. :eek:

Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

mathmari said:
Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

We have that $g''(t)=(\mathbf x-\mathbf y)^T H(\mathbf x-\mathbf y)$ with $H$ positive semidefinite.

By definition matrix $H$ is called positive semidefinite if for all $\mathbf z$ we have that $\mathbf z^T H \mathbf z \ge 0$. 🤔
 
Klaas van Aarsen said:
Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

Ah ok! This is clear now! (Malthe)

So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

:unsure:
 
mathmari said:
So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
Nope. (Shake)

It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
 
Klaas van Aarsen said:
It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
Ah ok!

Then do we get \begin{align*}\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\end{align*}

🤔

But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ? 🤔
 
mathmari said:
But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ?
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
 
Klaas van Aarsen said:
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
Ahh yes!

So we have in this case:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\\ &=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\\ & =(\mathbf{x}-\mathbf{y})^T\cdot\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\end{align*}
The middle term is the Hessian matrix? 🤔

 
mathmari said:
The middle term is the Hessian matrix?
Yep. (Nod)
 
Klaas van Aarsen said:
Yep. (Nod)

Great!

How do we get from Taylor's theorem that
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
? Do we use this formula fro $k=2$ ? :unsure:
 
  • #10
mathmari said:
Do we use this formula fro $k=2$ ? :unsure:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐
 
  • #11
Klaas van Aarsen said:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐

So we have
$$g(t)=g(0)+g'(0)(t-0)+\frac{g''(\xi)}{2}(t-0)^2 \Rightarrow \frac{g''(\xi)}{2}(t-0)^2=g(t)-g(0)-g'(0)(t-0)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-0)^2\geq 0 \Rightarrow g(t)-g(0)-g'(0)(t-0)\geq0 $$

$$g(t)=g(1)+g'(1)(t-1)+\frac{g''(\xi)}{2}(t-1)^2 \Rightarrow \frac{g''(\xi)}{2}(t-1)^2=g(t)-g(1)-g'(1)(t-1)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-1)^2\geq 0 \Rightarrow g(t)-g(1)-g'(1)(t-1)\geq0 $$

Is that correct so far? :unsure:
 
  • #12
mathmari said:
Is that correct so far?
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔
 
  • #13
Klaas van Aarsen said:
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔

You mean at a different point? 🤔
 
  • #14
mathmari said:
You mean at a different point?
Yep. (Nod)
 
  • #15
Klaas van Aarsen said:
Yep. (Nod)

I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if we take at the formula of Wikipedia $a=t$ and $x=0$. But can we take these values? :unsure:

 
  • #16
mathmari said:
I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if we take at the formula of Wikipedia $a=t$ and $x=0$. But can we take these values?
We are free to choose any values that we want.
So yes, we can take those values. :geek:
 
  • #17
Klaas van Aarsen said:
We are free to choose any values that we want.
So yes, we can take those values. :geek:

So that means that we expand at $a=t$ ? :unsure:
 
  • #18
mathmari said:
So that means that we expand at $a=t$?
Let's try. :unsure:
 
  • #19
Klaas van Aarsen said:
Let's try. :unsure:

We have then that $$g(x)=g(t)+g'(t)(x-t)+\frac{g''(\xi)}{2}(x-t)^2\Rightarrow \frac{g''(\xi)}{2}(x-t)^2=g(x)-g(t)-g'(t)(x-t)\geq 0$$ For $x=0$ we get $$g(0)-g(t)-g'(t)(0-t)\geq 0$$ For $x=1$ we get $$g(1)-g(t)-g'(t)(1-t)\geq 0$$

Is that correct? :unsure:
 
  • #20
Looks good. (Nod)
 
  • #21
Klaas van Aarsen said:
Looks good. (Nod)

I think this direction is now clear to me! 🤩

For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positiv semidefinite.
Do we need in this direction again the function $g$ ? :unsure:
 
  • #22
mathmari said:
For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positiv semidefinite.
Do we need in this direction again the function $g$ ?

Do you have a hint for the other direction? 🤔
 
  • #23
Klaas van Aarsen said:
Do you have a hint for the other direction? 🤔

I found a proof online:

1621666181354.png
So doing it in the way as above, we needto take the $g$ as before, or not? :unsure:
 
  • #24
mathmari said:
So doing it in the way as above, we needto take the $g$ as before, or not?

I don't think so.
The $g(t)=f(tx+(1-t)y)$ that we had, matches $f(x+t(y-x)$ in the $H \succcurlyeq 0 \implies \text{convex}$ direction, which is used in a Taylor expansion.
But for the converse we need a different $g$. 🤔
 
  • #25
I found in some other notes a proof using the function $g(t)=f(\mathbf{x}+t\mathbf{v})$ and now it is clear! :)
 
Back
Top