Undergrad Question about convex property in Jensen's inequality

SUMMARY

The discussion centers on the convex property in Jensen's inequality, specifically Theorem 4.3, which states that for a convex function ##\varphi:\mathbb R\to\mathbb R_+## and a function ##f\in L^1(\Omega, \mathcal A,\mu)##, the inequality $$\int_\Omega \varphi\circ f\, d\mu\geq\varphi\left(\int_\Omega f\, d\mu\right)$$ holds. Participants explore the existence of affine functions ##f_\alpha(x) = a_\alpha x + b_\alpha## that satisfy certain properties related to convex functions. The proof involves the use of limits and the properties of convexity, particularly in establishing the existence of linear functions that lie below the convex function.

PREREQUISITES
  • Understanding of convex functions and their properties.
  • Familiarity with measure theory, specifically the concept of integrable functions in the space ##L^1(\Omega, \mathcal A,\mu)##.
  • Knowledge of limits and differentiability, particularly in the context of convex analysis.
  • Basic understanding of Rademacher's theorem and its implications for differentiability of convex functions.
NEXT STEPS
  • Study the proof of Jensen's inequality in detail, focusing on the role of convex functions.
  • Learn about the properties of affine functions and their relationship to convex functions.
  • Explore Rademacher's theorem and its application in convex analysis.
  • Investigate the concept of epigraphs and their significance in understanding convexity.
USEFUL FOR

Mathematicians, statisticians, and students studying convex analysis, particularly those interested in inequalities and their proofs in probability theory.

psie
TL;DR
I am reading a proof of Jensen's inequality. I am getting stuck on an "elementary property" of convex functions.
I am reading a proof of Jensen's inequality. The proof goes like this.

Theorem 4.3: Let ##(\Omega, \mathcal A,\mu)## be a probability space and let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function. Then for every ##f\in L^1(\Omega, \mathcal A,\mu)##, $$\int_\Omega \varphi\circ f\, d\mu\geq\varphi\left(\int_\Omega f\, d\mu\right).$$ Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\varphi(x)\geq ax+b\}.$$ Then by elementary properties of convex functions, $$\varphi \left(x\right)=\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }}\left(ax+b\right).\tag1$$ ... ... ...
I do not know much about convex functions, but why does (1) hold?

The definition of convex I'm using is that $$\varphi(tx+(1-t)y)\leq t\varphi(x)+(1-t)\varphi(y)$$ holds for all ##x,y\in\mathbb R## and all ##t\in[0,1]##.
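As a quick numerical sanity check of Theorem 4.3 (not part of the book's proof), here is a minimal Python sketch on a finite probability space. The weights, the values of ##f##, and the helper name `jensen_gap` are made up for illustration; ##\varphi=\exp## and ##\varphi(x)=x^2## are chosen only because they are convex and nonnegative.

```python
import math


def jensen_gap(phi, weights, values):
    """Return E[phi(f)] - phi(E[f]) on a finite probability space.

    weights[i] is the mass of the i-th point and values[i] is f at that point.
    (Hypothetical helper, for illustration only.)
    """
    if abs(sum(weights) - 1.0) > 1e-12 or min(weights) < 0:
        raise ValueError("weights must form a probability vector")
    mean_f = sum(w * v for w, v in zip(weights, values))
    mean_phi_f = sum(w * phi(v) for w, v in zip(weights, values))
    return mean_phi_f - phi(mean_f)


# Illustrative data (assumed, not from the thread).
weights = [0.2, 0.5, 0.3]
values = [-1.0, 0.5, 2.0]

gap_exp = jensen_gap(math.exp, weights, values)         # should be >= 0
gap_sq = jensen_gap(lambda x: x * x, weights, values)   # equals the variance of f
gap_affine = jensen_gap(lambda x: 3 * x + 1, weights, values)  # affine: equality
```

For ##\varphi(x)=x^2## the gap is the variance of ##f##, and for an affine function it vanishes up to rounding, which matches the equality case of the inequality.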
 
From here, I found the answer. However I still have some questions:
If ##\phi## is convex, for each point ##(\alpha, \phi(\alpha))##, there exists an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## such that
- the line ##L_\alpha## corresponding to ##f_\alpha## passes through ##(\alpha, \phi(\alpha))##;
- the graph of ##\phi## lies above ##L_\alpha##.
Let ##A = \{f_\alpha: \alpha \in \mathbb{R}\}## be the set of all such functions. We have
- ##\sup_{f_\alpha \in A} f_\alpha(x) \geq f_x(x) = \phi(x)## because ##f_x## passes through ##(x, \phi(x))##;
- ##\sup_{f_\alpha \in A} f_\alpha(x) \leq \phi(x)## because every ##f_\alpha## lies below ##\phi##.
How does one show that such an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## with those properties exists, given the definition I gave above?
 
A function is convex iff its epigraph is convex. In case of differentiability, I assume we can simply use the tangents at ##(a, \varphi (a)).## The difficulty is the points that do not have one. My next assumption is, that such points have at least one-sided tangents that will do. These ideas are based on the functions ##x^2## and ##|x|## and I think they are typical.
Wikipedia said:
A convex or concave function defined on an open interval is locally Lipschitz continuous and hence, by Rademacher's theorem, differentiable almost everywhere. It is left- and right-differentiable at every point.

Looks like my intuition is correct but the proof needs some consideration.
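To make the one-sided-tangent idea concrete, here is a rough numerical sketch in Python. It approximates the left and right difference quotients at a point, takes a slope between them, and checks on a grid that the resulting line stays below the function. The helper names, the step size `h`, the grid, and the tolerance are all assumptions for illustration; this is a finite-difference check, not a proof.

```python
def one_sided_slopes(phi, c, h=1e-6):
    """Approximate the left and right difference quotients of phi at c."""
    left = (phi(c) - phi(c - h)) / h
    right = (phi(c + h) - phi(c)) / h
    return left, right


def supporting_line(phi, c, h=1e-6):
    """Return (a, b) with l(x) = a*x + b passing through (c, phi(c)).

    The slope a is the midpoint of the two one-sided quotients, so it lies
    between them (any value in between would do for a convex phi).
    """
    left, right = one_sided_slopes(phi, c, h)
    a = 0.5 * (left + right)
    b = phi(c) - a * c
    return a, b


def lies_below(phi, a, b, xs, tol=1e-9):
    """Check a*x + b <= phi(x) on the sample points xs, up to rounding."""
    return all(a * x + b <= phi(x) + tol for x in xs)


xs = [k / 10 for k in range(-50, 51)]  # grid on [-5, 5]

# |x| at its kink: one-sided slopes are -1 and +1, midpoint slope 0.
a_abs, b_abs = supporting_line(abs, 0.0)

# x^2 at c = 1: both one-sided slopes are close to the derivative 2.
square = lambda x: x * x
a_sq, b_sq = supporting_line(square, 1.0)

ok_abs = lies_below(abs, a_abs, b_abs, xs)
ok_sq = lies_below(square, a_sq, b_sq, xs)
```

At the kink of ##|x|## any slope in ##[-1,1]## gives a line below the graph, while for ##x^2## the two quotients squeeze to the ordinary tangent slope.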
 
Here is Durrett's proof of the inequality (in his book Probability: Theory and Examples). He uses ##\phi## to denote the function ##\varphi## in the theorem in my original post:

Proof. Let ##c=\int f \,d \mu## and let ##l(x)=ax+b## be a linear function that has ##l(c)= \phi(c)## and ##\phi(x) \geq l(x)##. To see that such a function exists, recall that convexity implies
$$\lim_{h \to 0^+} \frac{\phi(c)-\phi(c-h)}{h} \leq \lim_{h \to 0^+} \frac{\phi(c+h)-\phi(c)}{h}\tag2$$(The limits exist since the sequences are monotone.)

If we let ##a## be any number between the two limits and let ##l(x) = a(x - c) + \phi(c)##, then ##l## has the desired properties. With the existence of ##l## established, the rest is easy. From the fact that if ##g \leq f## a.e., then ##\int g\, d\mu \leq \int f \,d\mu##, we have
$$ \int \phi(f ) \,d\mu \geq \int (af + b) \,d\mu = a \int f \,d\mu + b = l\left(\int f \,d\mu \right)= \phi\left(\int f \,d\mu \right),$$ since ##c = \int f \,d \mu## and ##l(c) = \phi(c)##.
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x - c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
 
psie said:
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x - c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
Isn't ##l(x)=a(x - c) + \phi(c) =ax + \underbrace{(\phi(c)-ac)}_{=b}\leq \phi(x)## simply the condition defining ##\mathcal{E}_\varphi =\mathcal{E}_\phi\;##?

By "sequences" Durrett probably means something like ...
$$
\lim_{h \to 0^+} f(h) = \lim_{n \to \infty} f(n^{-1})
$$
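
For what it is worth, here is a sketch of how both points can be derived from the definition of convexity alone. It assumes the standard three-slope inequality, which is not stated in the thread: for ##x<y<z##,
$$
\frac{\phi(y)-\phi(x)}{y-x}\;\leq\;\frac{\phi(z)-\phi(x)}{z-x}\;\leq\;\frac{\phi(z)-\phi(y)}{z-y}.
$$
It follows from the definition by writing ##y=tx+(1-t)z## with ##t=\frac{z-y}{z-x}## and rearranging ##\phi(y)\leq t\phi(x)+(1-t)\phi(z)##.

Applied to the points ##c<c+h_1<c+h_2##, it shows that the right quotient ##h\mapsto\frac{\phi(c+h)-\phi(c)}{h}## is nondecreasing in ##h>0##. Applied to ##c-h_2<c-h_1<c##, it shows that the left quotient ##h\mapsto\frac{\phi(c)-\phi(c-h)}{h}## is nonincreasing in ##h>0##. Applied to ##c-h<c<c+h'##, it shows that every left quotient is at most every right quotient. These monotone, mutually bounded quotients are presumably the "sequences" meant; their limits as ##h\to0^+## exist, call them ##D^-\leq D^+##, and this is ##(2)##.

Now take ##D^-\leq a\leq D^+##. By monotonicity, for ##x>c##,
$$
\frac{\phi(x)-\phi(c)}{x-c}\;\geq\;D^+\;\geq\;a
\quad\Longrightarrow\quad
\phi(x)\geq\phi(c)+a(x-c)=l(x),
$$
and for ##x<c##,
$$
\frac{\phi(c)-\phi(x)}{c-x}\;\leq\;D^-\;\leq\;a
\quad\Longrightarrow\quad
\phi(c)-\phi(x)\leq a(c-x)
\quad\Longrightarrow\quad
\phi(x)\geq\phi(c)+a(x-c)=l(x).
$$
At ##x=c## there is equality, so ##l\leq\phi## everywhere and ##l(c)=\phi(c)##.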
 