On Jensen's inequality for conditional expectation

psie
TL;DR Summary
I am reading a proof of Jensen's inequality for conditional expectation in Le Gall's book Measure Theory, Probability and Stochastic Processes. I am a bit surprised that this inequality does not simply follow from the measure theoretic form that has been previously established, but requires a new, somewhat technical proof. I have some questions about the proof.
Theorem. Let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function and let ##X\in L^1##. Then $$E[\varphi(X)\mid\mathcal B]\geq\varphi(E[X\mid\mathcal B]).$$

Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\ \varphi(x)\geq ax+b\}.$$ Then by convexity of ##\varphi##, $$\varphi \left(x\right)=\underbrace{\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }}\left(ax+b\right)}_{g(x)}=\underbrace{\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }\cap \mathbb Q^2}\left(ax+b\right)}_{h(x)}.$$ We can take advantage of the fact that ##\mathbb Q^2## is countable to discard [the book has the typo "disgard"] a countable collection of sets of probability zero and to get that, a.s., \begin{align*} E[\varphi(X)\mid \mathcal B]&=E\left[\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }\cap \mathbb Q^2}\left(aX+b\right)\Bigm\vert \mathcal B\right] \\ &\geq \sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }\cap \mathbb Q^2}E[aX+b\mid\mathcal B] \\ &=\varphi(E[X\mid\mathcal B]).\end{align*}
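As a quick numerical sanity check of the representation of ##\varphi## by rational supporting lines (a toy illustration of the ##g=h## identity, not part of Le Gall's proof; the names `phi` and `h` below are mine), one can take ##\varphi(x)=x^2##, whose supporting line at a point ##t## is ##y=2tx-t^2##, so ##(2t,-t^2)\in\mathbb Q^2## whenever ##t\in\mathbb Q##:

```python
# Toy check: for phi(x) = x^2, the sup over rational supporting lines
# (a, b) = (2t, -t^2), t rational, recovers phi(x) up to grid error.
from fractions import Fraction

def phi(x):
    return x * x

def h(x, n=1000, bound=5):
    """Sup of a*x + b over rational lines (a, b) = (2t, -t^2), t = k/n."""
    best = float("-inf")
    for k in range(-bound * n, bound * n + 1):
        t = Fraction(k, n)
        best = max(best, float(2 * t * x - t * t))
    return best

for x in [0.0, 1.3, -2.7, 3.14159]:
    # gap phi(x) - h(x) = min_t (x - t)^2, at most the squared grid spacing
    assert abs(phi(x) - h(x)) < 1e-4
print("sup over rational supporting lines matches phi on the test points")
```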

Questions:

1. I am a bit unsure why ##g(x)=h(x)##. Clearly ##g(x)\geq h(x)##, but why is ##g(x)\leq h(x)##? Here's my explanation, which is kind of lengthy, but maybe you have a better one.

If ##(a,b)\in\mathcal E_{\varphi}## is such that ##\varphi(x)>ax+b## for all ##x\in\mathbb R##, then pick a number ##q_x## in between. Let ##q_x=a'x+b'## where by denseness we choose ##(a',b')## sufficiently close to ##(a,b)## so that ##q_x## satisfies the inequality ##\varphi(x)>q_x>ax+b## for all ##x\in\mathbb R##. Then ##(a',b')\in\mathcal E_\varphi\cap\mathbb Q^2##, and since ##(a,b)## was arbitrary, this shows that ##g(x)\leq h(x)## when ##\varphi(x)>ax+b##. If ##(a,b)\in\mathcal E_{\varphi}## is such that ##\varphi(x)=ax+b##, then we approximate ##(a,b)## from below by rational pairs, and the supremum will give that ##g(x)=h(x)##. Does this make sense?
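A concrete instance of the difficulty in question 1 (my own toy example, not from the book): for ##\varphi(x)=x^2## the unique supporting line at ##x_0=\sqrt 2## is ##y=2\sqrt2\,x-2##, whose slope is irrational, so no rational pair ##(a,b)## attains the sup at ##x_0##; still, rational supporting lines nearby drive the sup up to ##\varphi(x_0)=2## from below:

```python
# For phi(x) = x^2 the unique supporting line at x0 = sqrt(2) has the
# irrational slope 2*sqrt(2), so no rational pair (a, b) attains the sup
# at x0.  Rational supporting lines (2t, -t^2) with t near sqrt(2) still
# approach phi(x0) = 2 from below.
import math

x0 = math.sqrt(2)
sup_val = float("-inf")
for k in range(1, 200001):
    t = k / 100000.0                     # rational slope parameter t
    sup_val = max(sup_val, 2 * t * x0 - t * t)

# each value equals x0^2 - (x0 - t)^2 <= x0^2, and the gap shrinks like
# the square of the grid spacing
assert 0 <= x0 * x0 - sup_val < 1e-8
print("sup over rational lines at sqrt(2):", sup_val)
```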

2. I do not understand what the author means by "We can take advantage of the fact that ##\mathbb Q^2## is countable to [discard] a countable collection of sets of probability zero...". Moreover, I am a bit unsure about the last inequality in the proof. Is this simply an application of monotonicity, i.e. $$\sup(aX+b)\geq aX+b\implies E[\sup(aX+b)\mid\mathcal B]\geq E[aX+b\mid\mathcal B],$$ so that taking the supremum over ##(a,b)## in this last inequality gives the desired inequality at the end of the proof? If my reasoning is correct, I don't see why we need to consider ##\mathcal E_\varphi\cap\mathbb Q^2##.
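On the countability point, here is a sketch of the step I believe the author is compressing (my reading, not a quote from the book): for each fixed pair ##(a,b)##, linearity and monotonicity of conditional expectation hold only outside some null set ##N_{a,b}##. Over the countable index set, the exceptional sets can be lumped together: $$N=\bigcup_{(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2}N_{a,b},\qquad P(N)\leq\sum_{(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2}P(N_{a,b})=0,$$ so outside the single null set ##N## the inequalities ##E[\varphi(X)\mid\mathcal B]\geq aE[X\mid\mathcal B]+b## hold simultaneously for all rational pairs, and one can take the supremum pointwise. Over all of ##\mathcal E_\varphi## the union would be uncountable and need not be null, which is why the restriction to ##\mathbb Q^2## matters.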
 
psie said:
1. I do not understand your problem. ##\mathbb{Q}^2 \subseteq \mathbb{R}^2## is dense and ##g(x) \in \mathbb{R}_+## is a single real number, so there is always a sequence of pairs in ##\mathbb{Q}^2## along which the values ##ax+b## converge to it. Density is the key here, and the supremum works like a topological closure.

2. As soon as we are working over a countable set, we do not need to bother with the exceptional sets of probability zero. I do not see why that wouldn't be true for a real index set, too, if we add a.s.; presumably countability is what guarantees that the union of those sets still has measure zero.

3. The expectation of the supremum includes the real limits, whereas the supremum of the expectations is restricted to rational pairs, so the latter can be smaller simply because the set the supremum is taken over is smaller. That would be my interpretation, but I would also like to see an example where the strict inequality holds.
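Regarding an example with strict inequality: if ##\mathcal B## is trivial (so conditional expectation reduces to plain expectation), ##\varphi(x)=x^2## and ##X=\pm1## with probability ##1/2## each give ##E[\varphi(X)]=1## while ##\varphi(E[X])=0##. A quick simulation of this gap (my own toy check; the variable names are mine):

```python
# With B trivial, E[sup(aX+b)] = E[phi(X)] and sup E[aX+b] = phi(E[X])
# by the supporting-line representation of phi.  For phi(x) = x^2 and
# X = +/-1 with probability 1/2 each, the gap between the two is 1.
import random

random.seed(0)
xs = [random.choice([-1.0, 1.0]) for _ in range(100_000)]

e_sup = sum(x * x for x in xs) / len(xs)   # estimates E[phi(X)] = 1
sup_e = (sum(xs) / len(xs)) ** 2           # estimates phi(E[X]) = 0

assert e_sup > sup_e + 0.9                 # strict gap, about 1 here
print("E[phi(X)] =", e_sup, " phi(E[X]) =", sup_e)
```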
 