Question about convex property in Jensen's inequality

psie · Jun 28, 2024

I am reading a proof of Jensen's inequality. The proof goes like this.

Theorem 4.3: Let ##(\Omega, \mathcal A,\mu)## be a probability space and let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function. Then for every ##f\in L^1(\Omega, \mathcal A,\mu)##, $$\int_\Omega \varphi\circ f\, d\mu\geq\varphi\left(\int_\Omega f\, d\mu\right).$$

Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\ \varphi(x)\geq ax+b\}.$$ Then by elementary properties of convex functions, $$\varphi(x)=\sup_{(a,b)\in \mathcal E_{\varphi}}(ax+b).\tag1$$ ... ... ...

I do not know much about convex functions, but why does (1) hold?

The definition of convex I'm using is that $$\varphi(tx+(1-t)y)\leq t\varphi(x)+(1-t)\varphi(y)$$ holds for all ##x,y\in\mathbb R## and all ##t\in[0,1]##.

psie · Jun 28, 2024

From here, I found the answer. However I still have some questions:

If ##\phi## is convex, for each point ##(\alpha, \phi(\alpha))##, there exists an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## such that
- the line ##L_\alpha## corresponding to ##f_\alpha## passes through ##(\alpha, \phi(\alpha))##;
- the graph of ##\phi## lies above ##L_\alpha##.
Let ##A = \{f_\alpha: \alpha \in \mathbb{R}\}## be the set of all such functions. We have
- ##\sup_{f_\alpha \in A} f_\alpha(x) \geq f_x(x) = \phi(x)## because ##f_x## passes through ##(x, \phi(x))##;
- ##\sup_{f_\alpha \in A} f_\alpha(x) \leq \phi(x)## because every ##f_\alpha## lies below ##\phi##.
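
The two bullet points above can be checked numerically. The following is only a sketch under an assumed example, ##\phi(x)=x^2##, for which the tangent at ##\alpha## serves as the supporting line ##f_\alpha##; the names `phi`, `supporting_line`, `alphas` and `xs` are illustrative and not part of the quoted argument.

```python
import numpy as np

def phi(x):
    # Example convex function (an assumption for illustration): phi(x) = x^2.
    return x ** 2

def supporting_line(alpha):
    # For phi(x) = x^2 the tangent at alpha is a supporting line:
    # f_alpha(x) = 2*alpha*x - alpha^2, i.e. a_alpha = 2*alpha, b_alpha = -alpha^2.
    return 2.0 * alpha, -alpha ** 2

alphas = np.linspace(-3.0, 3.0, 601)  # touch points alpha
xs = np.linspace(-3.0, 3.0, 601)      # evaluation points (same grid, so each x is also an alpha)

a, b = supporting_line(alphas)
# lines[i, j] = f_{alpha_i}(x_j)
lines = a[:, None] * xs[None, :] + b[:, None]

# Pointwise supremum over the (finite) family of supporting lines.
sup_of_lines = lines.max(axis=0)

# Second bullet: every line lies below phi (up to rounding).
all_below = bool(np.all(lines <= phi(xs)[None, :] + 1e-12))
# First bullet: the supremum at x is attained by the line f_x.
sup_matches = bool(np.allclose(sup_of_lines, phi(xs)))

print(all_below, sup_matches)
```

Using the same grid for `alphas` and `xs` mirrors the argument: the line touching at ##\alpha = x## is what makes the supremum reach ##\phi(x)##.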

How does one show that such an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## exists with those properties, given the definition of convexity I gave above?

fresh_42 · Jun 28, 2024

A function is convex iff its epigraph is convex. In the differentiable case, I assume we can simply use the tangents at ##(a, \varphi(a)).## The difficulty is the points that do not have a tangent. My next guess is that such points have at least one-sided tangents that will do. These ideas are based on the functions ##x^2## and ##|x|##, which I think are typical.

Wikipedia said:

A convex or concave function defined on an open interval is locally Lipschitz continuous and hence, by Rademacher's theorem, differentiable almost everywhere. It is left- and right-differentiable at every point.

Looks like my intuition is correct but the proof needs some consideration.

psie · Jun 28, 2024

Here is Durrett's proof of the inequality (in his book Probability: Theory and Examples). He uses ##\phi## to denote the function ##\varphi## in the theorem in my original post:

Proof. Let ##c=\int f \,d \mu## and let ##l(x)=ax+b## be a linear function that has ##l(c)= \phi(c)## and ##\phi(x) \geq l(x)##. To see that such a function exists, recall that convexity implies
$$\lim_{h \to 0^+} \frac{\phi(c)-\phi(c-h)}{h} \leq \lim_{h \to 0^+} \frac{\phi(c+h)-\phi(c)}{h}\tag2$$
(The limits exist since the sequences are monotone.)

If we let ##a## be any number between the two limits and let ##l(x) = a(x - c) + \phi(c)##, then ##l## has the desired properties. With the existence of ##l## established, the rest is easy. From the fact that if ##g \leq f## a.e., then ##\int g\, d\mu \leq \int f \,d\mu##, we have
$$ \int \phi(f) \,d\mu \geq \int (af + b) \,d\mu = a \int f \,d\mu + b = l\left(\int f \,d\mu \right)= \phi\left(\int f \,d\mu \right),$$ since ##c = \int f \,d \mu## and ##l(c) = \phi(c)##.
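
The chain of inequalities in the proof can be traced on a small example. This is a minimal sketch, not Durrett's construction verbatim: it assumes ##\phi(x)=|x|##, a hypothetical probability space of five equally likely outcomes (so the integral is an arithmetic mean), and a slope `a` chosen arbitrarily between the two one-sided derivatives at ##c##.

```python
import numpy as np

def phi(x):
    # Example convex function with a kink at 0 (illustrative choice).
    return np.abs(x)

# Hypothetical finite probability space: five equally likely outcomes,
# so the integral of f with respect to mu is just the arithmetic mean.
f = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
c = float(f.mean())  # c = integral of f d(mu)

# One-sided difference quotients at c. For |x| at c = 0 they already equal
# the one-sided derivatives; for a general phi they only approximate them.
h = 1e-6
left_slope = (phi(c) - phi(c - h)) / h
right_slope = (phi(c + h) - phi(c)) / h

# Any slope a between the two one-sided derivatives is allowed in the proof.
a = left_slope + 0.65 * (right_slope - left_slope)

def l(x):
    # The line through (c, phi(c)) with slope a.
    return a * (x - c) + phi(c)

grid = np.linspace(-5.0, 5.0, 1001)
line_below = bool(np.all(l(grid) <= phi(grid) + 1e-12))  # phi >= l on the grid

lhs = float(phi(f).mean())  # integral of phi(f)
mid = float(l(f).mean())    # integral of l(f), which should equal l(c)
rhs = float(phi(c))         # phi(integral of f)

print(line_below, lhs >= mid, np.isclose(mid, rhs))
```

The point ##c=0## is deliberately the kink of ##|x|##, where no tangent exists and the freedom in choosing ##a## is visible.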

I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x - c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?

fresh_42 · Jun 28, 2024

psie said:

Here is Durrett's proof of the inequality (in his book Probability: Theory and Examples). He uses ##\phi## to denote the function ##\varphi## in the theorem in my original post:

I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x - c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?

Isn't ##l(x)=a(x - c) + \phi(c) =ax + \underbrace{(\phi(c)-ac)}_{=b}\leq \phi(x)## simply the defining condition of ##\mathcal{E}_\varphi =\mathcal{E}_\phi##?

Durrett probably means with sequence something like ...
$$
\lim_{h \to 0^+} f(h) = \lim_{n \to \infty} f(n^{-1})
$$
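
For reference, here is one standard way to fill in both open points, sketched using only the definition of convexity from the first post. The notation ##D^-##, ##D^+## for the two one-sided limits in ##(2)## is introduced here for convenience and is not Durrett's.

For ##u<v<w##, writing ##v=tu+(1-t)w## with ##t=\frac{w-v}{w-u}## and applying convexity gives the three-slope inequality
$$\frac{\phi(v)-\phi(u)}{v-u}\leq\frac{\phi(w)-\phi(u)}{w-u}\leq\frac{\phi(w)-\phi(v)}{w-v}.$$
Taking ##u=c##, ##v=c+h_1##, ##w=c+h_2## with ##0<h_1<h_2## shows that ##h\mapsto\frac{\phi(c+h)-\phi(c)}{h}## is nondecreasing in ##h>0##. Taking ##u=c-h_2##, ##v=c-h_1##, ##w=c## shows that ##h\mapsto\frac{\phi(c)-\phi(c-h)}{h}## is nonincreasing in ##h>0##. These difference quotients are presumably the monotone "sequences" in question, and monotonicity together with boundedness yields
$$D^-:=\sup_{h>0}\frac{\phi(c)-\phi(c-h)}{h}\;\leq\; D^+:=\inf_{h>0}\frac{\phi(c+h)-\phi(c)}{h}.$$
Now let ##D^-\leq a\leq D^+##. For ##x>c##, put ##h=x-c##:
$$\frac{\phi(x)-\phi(c)}{x-c}\geq D^+\geq a\quad\Longrightarrow\quad \phi(x)\geq a(x-c)+\phi(c).$$
For ##x<c##, put ##h=c-x##:
$$\frac{\phi(c)-\phi(x)}{c-x}\leq D^-\leq a\quad\Longrightarrow\quad \phi(x)\geq a(x-c)+\phi(c).$$
At ##x=c## both sides equal ##\phi(c)##, so ##\phi(x)\geq l(x)## for all ##x##.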
