F convex iff Hessian matrix positive semidefinite

Click For Summary

Discussion Overview

The discussion revolves around the characterization of convex functions through the properties of their Hessian matrices. Participants explore the relationship between the convexity of a twice continuously differentiable function and the condition that its Hessian matrix is positive semidefinite. The conversation includes attempts to prove this relationship using derivatives and Taylor's theorem, with a focus on the mathematical reasoning involved.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • Some participants assert that a function is convex if the inequality involving the Hessian matrix being positive semidefinite holds for all points in its domain.
  • There is a discussion about the correct application of the chain rule versus the total derivative when computing derivatives of the function defined in terms of two points.
  • Some participants express uncertainty about how to derive that the second derivative of the function is non-negative given the positive semidefiniteness of the Hessian.
  • One participant proposes that the second derivative can be expressed in terms of the Hessian matrix and the difference between the two points.
  • There is a clarification regarding the use of Taylor's theorem and how it applies to the inequalities being discussed.

Areas of Agreement / Disagreement

Participants generally agree on the definitions and properties of convex functions and Hessians, but there is disagreement and uncertainty regarding the application of derivatives and the specific steps in the proof. The discussion remains unresolved in terms of the complete proof and understanding of the relationships involved.

Contextual Notes

Some participants express confusion about the mathematical steps involved, particularly regarding the application of derivatives and the implications of positive semidefiniteness. There are also unresolved questions about the application of Taylor's theorem in this context.

mathmari
Gold Member
MHB
Messages
4,984
Reaction score
7
Hey!

A function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex if for all $x,y\in \mathbb{R}^n$ the inequality $$f(tx+(1-t)y)\leq tf(x)+(1-t)f(y)$$ holds for all $t\in [0,1]$.

Show that a twice continuously differentiable funtion $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex iff the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. I have read on some sites the idea of the proof but I am not sure that I understood that correctly. (Worried)

We suppose that the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. We define the function:
$$g(t) = f(t \mathbf{x} + (1-t) \mathbf{y})$$
Then we can compute the derivatives of $g$:
$$g'(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla}f(t \mathbf{x} + (1-t) \mathbf{y})$$
$$g''(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla^2}f(t \mathbf{x} + (1-t) \mathbf{y}) ( \mathbf{x} - \mathbf{y})$$

For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_1}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
:unsure:

Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

Then we use this with Taylor's theorem to prove that:
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
Then if $t \in [0,1]$, these can then be combined to give:
$$
g(t) \le tg(1) + (1-t)g(0)
$$
which is equivalent to the inequality we wanted to prove. :unsure:
 
Physics news on Phys.org
mathmari said:
For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_1}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

Hey mathmari!

We cannot take partial derivatives with respect to $t$, because $t$ is not a parameter of $f$. :eek:

Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

mathmari said:
Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

We have that $g''(t)=(\mathbf x-\mathbf y)^T H(\mathbf x-\mathbf y)$ with $H$ positive semidefinite.

By definition matrix $H$ is called positive semidefinite if for all $\mathbf z$ we have that $\mathbf z^T H \mathbf z \ge 0$. 🤔
 
Klaas van Aarsen said:
Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

Ah ok! This is clear now! (Malthe)

So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

:unsure:
 
mathmari said:
So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
Nope. (Shake)

It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
 
Klaas van Aarsen said:
It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
Ah ok!

Then do we get \begin{align*}\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\end{align*}

🤔

But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ? 🤔
 
mathmari said:
But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ?
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
 
Klaas van Aarsen said:
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
Ahh yes!

So we have in this case:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\\ &=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\\ & =(\mathbf{x}-\mathbf{y})^T\cdot\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\end{align*}
The middle term is the Hessian matrix? 🤔

 
mathmari said:
The middle term is the Hessian matrix?
Yep. (Nod)
 
Klaas van Aarsen said:
Yep. (Nod)

Great!

How do we get from Taylor's theorem that
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
? Do we use this formula fro $k=2$ ? :unsure:
 
  • #10
mathmari said:
Do we use this formula fro $k=2$ ? :unsure:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐
 
  • #11
Klaas van Aarsen said:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐

So we have
$$g(t)=g(0)+g'(0)(t-0)+\frac{g''(\xi)}{2}(t-0)^2 \Rightarrow \frac{g''(\xi)}{2}(t-0)^2=g(t)-g(0)-g'(0)(t-0)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-0)^2\geq 0 \Rightarrow g(t)-g(0)-g'(0)(t-0)\geq0 $$

$$g(t)=g(1)+g'(1)(t-1)+\frac{g''(\xi)}{2}(t-1)^2 \Rightarrow \frac{g''(\xi)}{2}(t-1)^2=g(t)-g(1)-g'(1)(t-1)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-1)^2\geq 0 \Rightarrow g(t)-g(1)-g'(1)(t-1)\geq0 $$

Is that correct so far? :unsure:
 
  • #12
mathmari said:
Is that correct so far?
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔
 
  • #13
Klaas van Aarsen said:
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔

You mean at a different point? 🤔
 
  • #14
mathmari said:
You mean at a different point?
Yep. (Nod)
 
  • #15
Klaas van Aarsen said:
Yep. (Nod)

I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if we take at the formula of Wikipedia $a=t$ and $x=0$. But can we take these values? :unsure:

 
  • #16
mathmari said:
I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if we take at the formula of Wikipedia $a=t$ and $x=0$. But can we take these values?
We are free to choose any values that we want.
So yes, we can take those values. :geek:
 
  • #17
Klaas van Aarsen said:
We are free to choose any values that we want.
So yes, we can take those values. :geek:

So that means that we expand at $a=t$ ? :unsure:
 
  • #18
mathmari said:
So that means that we expand at $a=t$?
Let's try. :unsure:
 
  • #19
Klaas van Aarsen said:
Let's try. :unsure:

We have then that $$g(x)=g(t)+g'(t)(x-t)+\frac{g''(\xi)}{2}(x-t)^2\Rightarrow \frac{g''(\xi)}{2}(x-t)^2=g(x)-g(t)-g'(t)(x-t)\geq 0$$ For $x=0$ we get $$g(0)-g(t)-g'(t)(0-t)\geq 0$$ For $x=1$ we get $$g(1)-g(t)-g'(t)(1-t)\geq 0$$

Is that correct? :unsure:
 
  • #20
Looks good. (Nod)
 
  • #21
Klaas van Aarsen said:
Looks good. (Nod)

I think this direction is now clear to me! 🤩

For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positiv semidefinite.
Do we need in this direction again the function $g$ ? :unsure:
 
  • #22
mathmari said:
For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positiv semidefinite.
Do we need in this direction again the function $g$ ?

Do you have a hint for the other direction? 🤔
 
  • #23
Klaas van Aarsen said:
Do you have a hint for the other direction? 🤔

I found a proof online:

1621666181354.png
So doing it in the way as above, we needto take the $g$ as before, or not? :unsure:
 
  • #24
mathmari said:
So doing it in the way as above, we needto take the $g$ as before, or not?

I don't think so.
The $g(t)=f(tx+(1-t)y)$ that we had, matches $f(x+t(y-x)$ in the $H \succcurlyeq 0 \implies \text{convex}$ direction, which is used in a Taylor expansion.
But for the converse we need a different $g$. 🤔
 
  • #25
I found in some other notes a proof using the function $g(t)=f(\mathbf{x}+t\mathbf{v})$ and now it is clear! :)
 

Similar threads

  • · Replies 9 ·
Replies
9
Views
3K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 4 ·
Replies
4
Views
1K
Replies
5
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 5 ·
Replies
5
Views
1K
  • · Replies 1 ·
Replies
1
Views
698