MHB: $f$ convex iff the Hessian matrix is positive semidefinite

SUMMARY

A twice continuously differentiable function \( f:\mathbb{R}^n\rightarrow \mathbb{R} \) is convex if and only if its Hessian matrix is positive semidefinite for all \( x \in \mathbb{R}^n \). The discussion confirms that the condition \( g''(t) \ge 0 \) follows from the positive semidefiniteness of the Hessian matrix \( H \), and clarifies that total derivatives rather than partial derivatives must be used when computing \( g'(t) \) and \( g''(t) \). The proof combines Taylor's theorem with the relationship between the second derivative and the Hessian matrix.

PREREQUISITES
  • Understanding of convex functions in \( \mathbb{R}^n \)
  • Knowledge of Hessian matrices and their properties
  • Familiarity with Taylor's theorem
  • Ability to compute total derivatives
NEXT STEPS
  • Study the properties of positive semidefinite matrices
  • Learn about Taylor series expansions in multiple dimensions
  • Explore the implications of convexity in optimization problems
  • Investigate the relationship between Hessians and local minima
USEFUL FOR

Mathematicians, data scientists, and optimization specialists interested in the theoretical foundations of convex analysis and its applications in optimization and machine learning.

mathmari
Hey!

A function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex if for all $x,y\in \mathbb{R}^n$ the inequality $$f(tx+(1-t)y)\leq tf(x)+(1-t)f(y)$$ holds for all $t\in [0,1]$.

Show that a twice continuously differentiable function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex iff the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. I have read the idea of the proof on some sites, but I am not sure that I understood it correctly. (Worried)

We suppose that the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. We define the function:
$$g(t) = f(t \mathbf{x} + (1-t) \mathbf{y})$$
Then we can compute the derivatives of $g$:
$$g'(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla}f(t \mathbf{x} + (1-t) \mathbf{y})$$
$$g''(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla^2}f(t \mathbf{x} + (1-t) \mathbf{y}) ( \mathbf{x} - \mathbf{y})$$

For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
:unsure:

Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

Then we use this with Taylor's theorem to prove that:
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
If $t \in [0,1]$, these can be combined to give:
$$
g(t) \le tg(1) + (1-t)g(0)
$$
which is equivalent to the inequality we wanted to prove. :unsure:
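
For the record, that final combination step can be made explicit: multiplying the first inequality by $(1-t)\ge 0$ and the second by $t\ge 0$ and adding them gives
$$(1-t)g(0)+t\,g(1) \;\ge\; g(t) + g'(t)\big[(1-t)(-t)+t(1-t)\big] \;=\; g(t),$$
and since $g(0)=f(\mathbf y)$, $g(1)=f(\mathbf x)$ and $g(t)=f(t\mathbf x+(1-t)\mathbf y)$, this is exactly the convexity inequality.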
 
mathmari said:
For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

Hey mathmari!

We cannot take partial derivatives with respect to $t$, because $t$ is not a parameter of $f$. :eek:

Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

mathmari said:
Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

We have that $g''(t)=(\mathbf x-\mathbf y)^T H(\mathbf x-\mathbf y)$ with $H$ positive semidefinite.

By definition, a matrix $H$ is called positive semidefinite if for all $\mathbf z$ we have $\mathbf z^T H \mathbf z \ge 0$. 🤔
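
As a quick numerical illustration of this definition (a minimal numpy sketch; the matrix $H$ below is just an arbitrary positive semidefinite example, not tied to any particular $f$):

```python
import numpy as np

# H = A^T A is positive semidefinite for any A, so it serves as an
# arbitrary example Hessian here.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
H = A.T @ A

# The definition: z^T H z >= 0 for every z. Check it on random vectors,
# allowing a tiny tolerance for floating-point rounding.
for _ in range(1000):
    z = rng.standard_normal(3)
    assert z @ H @ z >= -1e-12

# Equivalently, a symmetric matrix is positive semidefinite iff all of
# its eigenvalues are nonnegative.
print(np.linalg.eigvalsh(H))  # all entries >= 0
```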
 
Klaas van Aarsen said:
Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

Ah ok! This is clear now! (Malthe)

So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

:unsure:
 
mathmari said:
So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial{f_i}^2}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
Nope. (Shake)

It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
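
A quick numerical sanity check of this second-derivative formula, sketched with numpy on an arbitrary quadratic test function (chosen so the Hessian is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.standard_normal((3, 3))
Q = Q.T @ Q  # make Q symmetric (and PSD, so f below is convex)

def f(u):
    # Quadratic test function f(u) = u^T Q u; its Hessian is 2*Q everywhere.
    return u @ Q @ u

x = rng.standard_normal(3)
y = rng.standard_normal(3)
g = lambda t: f(t * x + (1 - t) * y)

# Compare a central finite difference for g''(t) with (x-y)^T H (x-y).
t, h = 0.3, 1e-4
g2_fd = (g(t + h) - 2 * g(t) + g(t - h)) / h**2
g2_formula = (x - y) @ (2 * Q) @ (x - y)
print(g2_fd, g2_formula)  # the two values should agree closely
```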
 
Klaas van Aarsen said:
It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
Ah ok!

Then do we get \begin{align*}\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\end{align*}

🤔

But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ? 🤔
 
mathmari said:
But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ?
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
 
Klaas van Aarsen said:
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
Ahh yes!

So we have in this case:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\\ &=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\\ & =(\mathbf{x}-\mathbf{y})^T\cdot\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\end{align*}
The middle term is the Hessian matrix? 🤔

 
mathmari said:
The middle term is the Hessian matrix?
Yep. (Nod)
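
Written out in index form, the identity being used here is
$$\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\,(x_j-y_j)(x_i-y_i)=(\mathbf{x}-\mathbf{y})^T H\,(\mathbf{x}-\mathbf{y}),$$
where $H$ is the Hessian of $f$ evaluated at $t\mathbf{x}+(1-t)\mathbf{y}$: each term of the double sum is an entry of $H$ times the $j$th and $i$th components of $\mathbf{x}-\mathbf{y}$, which is exactly how the quadratic form $\mathbf z^T H \mathbf z$ expands.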
 
Klaas van Aarsen said:
Yep. (Nod)

Great!

How do we get from Taylor's theorem that
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
? Do we use this formula for $k=2$? :unsure:
 
mathmari said:
Do we use this formula for $k=2$? :unsure:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐
 
Klaas van Aarsen said:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐

So we have
$$g(t)=g(0)+g'(0)(t-0)+\frac{g''(\xi)}{2}(t-0)^2 \Rightarrow \frac{g''(\xi)}{2}(t-0)^2=g(t)-g(0)-g'(0)(t-0)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-0)^2\geq 0 \Rightarrow g(t)-g(0)-g'(0)(t-0)\geq0 $$

$$g(t)=g(1)+g'(1)(t-1)+\frac{g''(\xi)}{2}(t-1)^2 \Rightarrow \frac{g''(\xi)}{2}(t-1)^2=g(t)-g(1)-g'(1)(t-1)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-1)^2\geq 0 \Rightarrow g(t)-g(1)-g'(1)(t-1)\geq0 $$

Is that correct so far? :unsure:
 
mathmari said:
Is that correct so far?
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔
 
Klaas van Aarsen said:
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔

You mean at a different point? 🤔
 
mathmari said:
You mean at a different point?
Yep. (Nod)
 
Klaas van Aarsen said:
Yep. (Nod)

I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if, in the formula from Wikipedia, we take $a=t$ and $x=0$. But can we take these values? :unsure:

 
mathmari said:
I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if, in the formula from Wikipedia, we take $a=t$ and $x=0$. But can we take these values?
We are free to choose any values that we want.
So yes, we can take those values. :geek:
 
Klaas van Aarsen said:
We are free to choose any values that we want.
So yes, we can take those values. :geek:

So that means that we expand at $a=t$ ? :unsure:
 
mathmari said:
So that means that we expand at $a=t$?
Let's try. :unsure:
 
Klaas van Aarsen said:
Let's try. :unsure:

We have then that $$g(x)=g(t)+g'(t)(x-t)+\frac{g''(\xi)}{2}(x-t)^2\Rightarrow \frac{g''(\xi)}{2}(x-t)^2=g(x)-g(t)-g'(t)(x-t)\geq 0$$ For $x=0$ we get $$g(0)-g(t)-g'(t)(0-t)\geq 0$$ For $x=1$ we get $$g(1)-g(t)-g'(t)(1-t)\geq 0$$

Is that correct? :unsure:
 
Looks good. (Nod)
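
As a numerical sanity check of the two tangent-line inequalities (a minimal sketch; the convex test function $f(u)=\|u\|^2$ and the sample points below are arbitrary choices):

```python
import numpy as np

# Convex test function f(u) = ||u||^2 and arbitrary sample points.
f = lambda u: u @ u
x = np.array([1.0, -2.0])
y = np.array([0.5, 3.0])
g = lambda t: f(t * x + (1 - t) * y)

# Check g(0) >= g(t) + g'(t)(0 - t) and g(1) >= g(t) + g'(t)(1 - t)
# on a grid of t values, estimating g'(t) by a central difference.
h = 1e-6
for t in np.linspace(0.05, 0.95, 19):
    gp = (g(t + h) - g(t - h)) / (2 * h)
    assert g(0) >= g(t) + gp * (0 - t) - 1e-8
    assert g(1) >= g(t) + gp * (1 - t) - 1e-8
print("both tangent-line inequalities hold at every sampled t")
```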
 
Klaas van Aarsen said:
Looks good. (Nod)

I think this direction is now clear to me! 🤩

For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positive semidefinite.
Do we need the function $g$ again in this direction? :unsure:
 
mathmari said:
For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positive semidefinite.
Do we need the function $g$ again in this direction?

Do you have a hint for the other direction? 🤔
 
Klaas van Aarsen said:
Do you have a hint for the other direction? 🤔

I found a proof online:

[Attached image: a screenshot of the proof found online.]
So doing it in the same way as above, we need to take the same $g$ as before, or not? :unsure:
 
mathmari said:
So doing it in the same way as above, we need to take the same $g$ as before, or not?

I don't think so.
The $g(t)=f(t\mathbf{x}+(1-t)\mathbf{y})$ that we had matches $f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))$ in the $H \succcurlyeq 0 \implies \text{convex}$ direction, where it is used in a Taylor expansion.
But for the converse we need a different $g$. 🤔
 
I found in some other notes a proof using the function $g(t)=f(\mathbf{x}+t\mathbf{v})$ and now it is clear! :)
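
For completeness, a sketch of that converse argument: fix $\mathbf x,\mathbf v\in\mathbb R^n$ and set $g(t)=f(\mathbf{x}+t\mathbf{v})$. Since $f$ is convex and $t\mapsto \mathbf{x}+t\mathbf{v}$ is affine, $g$ is convex, so $g(0)\le \frac12 g(h)+\frac12 g(-h)$ for every $h$, and therefore
$$g''(0)=\lim_{h\to 0}\frac{g(h)-2g(0)+g(-h)}{h^2}\ge 0.$$
By the same chain-rule computation as before, $g''(0)=\mathbf{v}^T\nabla^2 f(\mathbf{x})\,\mathbf{v}$, so the Hessian is positive semidefinite at every $\mathbf{x}$.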
 
