Question about convex property in Jensen's inequality


Discussion Overview

The discussion revolves around the properties of convex functions in the context of Jensen's inequality, specifically exploring the proof and implications of the theorem involving convex functions and integrals. Participants examine definitions, properties, and specific examples related to convexity, as well as the existence of certain affine functions.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant questions the validity of a specific property regarding the supremum of affine functions related to a convex function.
  • Another participant discusses the relationship between convex functions and their epigraphs, suggesting that differentiability plays a role in establishing certain properties.
  • A later reply presents a proof from Durrett's book, raising questions about the existence of a linear function that satisfies certain conditions and the meaning of monotonicity in the context of limits.
  • Participants express uncertainty about why a specific linear function satisfies the inequality involving the convex function and seek clarification on the sequences mentioned in the proof.

Areas of Agreement / Disagreement

Participants do not reach consensus on several points, including the existence of specific affine functions and the implications of the proof presented. Multiple competing views and questions remain unresolved.

Contextual Notes

Participants reference various properties of convex functions, including differentiability and the definition of convexity, but do not fully resolve the implications or assumptions underlying these discussions.

psie
TL;DR
I am reading a proof of Jensen's inequality. I am getting stuck on an "elementary property" of convex functions.
I am reading a proof of Jensen's inequality. The proof goes like this.

Theorem 4.3: Let ##(\Omega, \mathcal A,\mu)## be a probability space and let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function. Then for every ##f\in L^1(\Omega, \mathcal A,\mu)##, $$\int_\Omega \varphi\circ f\, d\mu\geq\varphi\left(\int_\Omega f\, d\mu\right).$$ Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\varphi(x)\geq ax+b\}.$$ Then by elementary properties of convex functions, $$\varphi \left(x\right)=\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }}\left(ax+b\right).\tag1$$ ... ... ...
I do not know much about convex functions, but why does (1) hold?

The definition of convex I'm using is that $$\varphi(tx+(1-t)y)\leq t\varphi(x)+(1-t)\varphi(y)$$ holds for all ##x,y\in\mathbb R## and all ##t\in[0,1]##.
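For what it's worth, here is a quick numerical sanity check of ##(1)## that I put together myself (the choice ##\varphi(x)=x^2## and the sample grids are mine, and I assume differentiability so that tangent lines can stand in for the affine minorants):

```python
import numpy as np

# Numerical check of (1) for the convex function phi(x) = x^2 (my choice).
# For a differentiable convex phi, the tangent at alpha,
#   L_alpha(x) = phi(alpha) + phi'(alpha) * (x - alpha),
# lies below phi, so (phi'(alpha), phi(alpha) - phi'(alpha)*alpha) is in E_phi.
phi = lambda x: x**2
dphi = lambda x: 2 * x

xs = np.linspace(-3, 3, 7)             # test points
alphas = np.linspace(-5, 5, 2001)      # tangency points

# sup over the sampled affine minorants, evaluated at each x
sup_affine = np.max(phi(alphas) + dphi(alphas) * (xs[:, None] - alphas), axis=1)
print(np.max(np.abs(sup_affine - phi(xs))))  # ~0: the sup recovers phi
```

Of course this only illustrates the smooth case; my question is about the general one.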
 
From here, I found the answer. However, I still have some questions:
If ##\phi## is convex, then for each point ##(\alpha, \phi(\alpha))##, there exists an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## such that
- the line ##L_\alpha## corresponding to ##f_\alpha## passes through ##(\alpha, \phi(\alpha))##;
- the graph of ##\phi## lies above ##L_\alpha##.
Let ##A = \{f_\alpha: \alpha \in \mathbb{R}\}## be the set of all such functions. We have
- ##\sup_{f_\alpha \in A} f_\alpha(x) \geq f_x(x) = \phi(x)##, because ##f_x## passes through ##(x, \phi(x))##;
- ##\sup_{f_\alpha \in A} f_\alpha(x) \leq \phi(x)##, because every ##f_\alpha## lies below ##\phi##.
How does one show that there exists such an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## with these properties, given the definition above?
 
A function is convex iff its epigraph is convex. In the differentiable case, I assume we can simply use the tangent at ##(a, \varphi (a)).## The difficulty lies in the points that do not have one. My next assumption is that such points have at least one-sided tangents that will do. These ideas are based on the functions ##x^2## and ##|x|##, which I think are typical.
Wikipedia said:
A convex or concave function defined on an open interval is locally Lipschitz continuous and therefore, by Rademacher's theorem, differentiable almost everywhere. It is left- and right-differentiable at every point.

Looks like my intuition is correct but the proof needs some consideration.
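A tiny sketch of that idea on the kink example ##|x|## (my own illustration, not a proof): at ##0## the one-sided derivatives are ##-1## and ##+1##, and every slope in between gives a line through the point that stays below the graph.

```python
# Sketch for the kink example phi(x) = |x| at alpha = 0 (my own illustration):
# the one-sided derivatives there are -1 and +1, and every slope a in [-1, 1]
# yields a supporting line l(x) = a*(x - alpha) + phi(alpha) with phi >= l.
phi = abs
alpha = 0.0
xs = [x / 10 for x in range(-50, 51)]          # sample points in [-5, 5]

for a in (-1.0, -0.3, 0.0, 0.7, 1.0):          # sample slopes in [-1, 1]
    assert all(phi(x) >= a * (x - alpha) + phi(alpha) for x in xs)

print("every sampled slope in [-1, 1] gives a supporting line at 0")
```

Slopes outside ##[-1,1]## fail, e.g. ##a=1.5## gives ##1.5 > |1|## at ##x=1##, so the interval between the one-sided derivatives is exactly the set of admissible slopes here.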
 
Here is Durrett's proof of the inequality (in his book Probability: Theory and Examples). He uses ##\phi## to denote the function ##\varphi## in the theorem in my original post:

Proof. Let ##c=\int f \,d \mu## and let ##l(x)=ax+b## be a linear function that has ##l(c)= \phi(c)## and ##\phi(x) \geq l(x)##. To see that such a function exists, recall that convexity implies
$$\lim_{h \to 0^+} \frac{\phi(c)−\phi(c−h)}{h} \leq \lim_{h \to 0^+} \frac{\phi(c+h)−\phi(c)}{h}\tag2$$(The limits exist since the sequences are monotone.)

If we let ##a## be any number between the two limits and let ##l(x) = a(x − c) + \phi(c)##, then ##l## has the desired properties. With the existence of ##l## established, the rest is easy. From the fact that if ##g \leq f## a.e., then ##\int g\, d\mu \leq \int f \,d\mu##, we have
$$ \int \phi(f ) \,d\mu \geq \int (af + b) \,d\mu = a \int f \,d\mu + b = l\left(\int f \,d\mu \right)= \phi\left(\int f \,d\mu \right),$$ since ##c = \int f \,d \mu## and ##l(c) = φ(c)##.
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x − c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
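As a sanity check of the final chain of inequalities (not of the existence question), here is the statement on a made-up finite probability space with ##\phi = \exp##:

```python
import numpy as np

# Check of Jensen's inequality on a 6-point probability space (data made up).
rng = np.random.default_rng(0)
f = rng.normal(size=6)            # values of f on Omega = {1, ..., 6}
w = rng.random(6)
w /= w.sum()                      # probability weights mu({i}), summing to 1

phi = np.exp                      # a convex phi
lhs = float(np.sum(w * phi(f)))   # int phi(f) dmu
rhs = float(phi(np.sum(w * f)))   # phi(int f dmu)
assert lhs >= rhs                 # Jensen: int phi(f) dmu >= phi(int f dmu)
print(lhs, rhs)
```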
 
psie said:
I do not see why the function ##l(x) = a(x − c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. [...] Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
Isn't ##l(x)=a(x − c) + \phi(c) =ax + \underbrace{(\phi(c)-ac)}_{=b}\leq \phi(x)## simply the definition of ##\mathcal{E}_\varphi =\mathcal{E}_\phi\;##?

By "the sequences" Durrett probably means the difference quotients taken along a monotone sequence such as ##h_n = n^{-1}##, i.e.
$$
\lim_{h \to 0^+} f(h) = \lim_{n \to \infty} f(n^{-1})
$$
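A small numerical illustration of that monotonicity (my own example, ##\phi(x)=x^4## at ##c=1##):

```python
# Monotone difference quotients for the convex phi(x) = x^4 at c = 1 (my example).
# Along h_n = 1/n the right quotients (phi(c+h)-phi(c))/h form a decreasing
# sequence and the left quotients (phi(c)-phi(c-h))/h an increasing one,
# so both limits in (2) exist, and every left quotient <= every right quotient.
phi = lambda x: x**4
c = 1.0
hs = [1 / n for n in range(1, 50)]                        # h_n = 1/n, decreasing

right = [(phi(c + h) - phi(c)) / h for h in hs]
left = [(phi(c) - phi(c - h)) / h for h in hs]

assert all(r1 >= r2 for r1, r2 in zip(right, right[1:]))  # monotone decreasing
assert all(l1 <= l2 for l1, l2 in zip(left, left[1:]))    # monotone increasing
assert max(left) <= min(right)                            # left limit <= right limit
```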
 