On Jensen's inequality for conditional expectation

psie
TL;DR Summary
I am reading a proof of Jensen's inequality for conditional expectation in Le Gall's book Measure Theory, Probability and Stochastic Processes. I am a bit surprised that this inequality does not simply follow from the measure theoretic form that has been previously established, but requires a new, somewhat technical proof. I have some questions about the proof.
Theorem. Let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function and let ##X\in L^1##. Then $$E[\varphi(X)\mid\mathcal B]\geq\varphi(E[X\mid\mathcal B]).$$

Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\ \varphi(x)\geq ax+b\}.$$ Then by convexity of ##\varphi##, $$\varphi \left(x\right)=\underbrace{\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }}\left(ax+b\right)}_{g(x)}=\underbrace{\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }\cap \mathbb Q^2}\left(ax+b\right)}_{h(x)}.$$ We can take advantage of the fact that ##\mathbb Q^2## is countable to discard [the book has the typo "disgard"] a countable collection of sets of probability zero and to get that, a.s., \begin{align*} E[\varphi(X)\mid \mathcal B]&=E\left[\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }\cap \mathbb Q^2}\left(aX+b\right)\Bigm\vert \mathcal B\right] \\ &\geq \sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }\cap \mathbb Q^2}E[aX+b\mid\mathcal B] \\ &=\varphi(E[X\mid\mathcal B]).\end{align*}
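As a quick numerical sanity check of the representation of ##\varphi## by rational supporting lines (a toy illustration of the ##g=h## identity, not part of Le Gall's proof; the names `phi` and `h` below are mine), one can take ##\varphi(x)=x^2##, whose supporting line at a point ##t## is ##y=2tx-t^2##, so ##(2t,-t^2)\in\mathbb Q^2## whenever ##t\in\mathbb Q##:

```python
# Toy check: for phi(x) = x^2, the sup over rational supporting lines
# (a, b) = (2t, -t^2), t rational, recovers phi(x) up to grid error.
from fractions import Fraction

def phi(x):
    return x * x

def h(x, n=1000, bound=5):
    """Sup of a*x + b over rational lines (a, b) = (2t, -t^2), t = k/n."""
    best = float("-inf")
    for k in range(-bound * n, bound * n + 1):
        t = Fraction(k, n)
        best = max(best, float(2 * t * x - t * t))
    return best

for x in [0.0, 1.3, -2.7, 3.14159]:
    # gap phi(x) - h(x) = min_t (x - t)^2, at most the squared grid spacing
    assert abs(phi(x) - h(x)) < 1e-4
print("sup over rational supporting lines matches phi on the test points")
```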

Questions:

1. I am a bit unsure why ##g(x)=h(x)##. Clearly ##g(x)\geq h(x)##, but why is ##g(x)\leq h(x)##? Here's my explanation, which is kind of lengthy, but maybe you have a better one.

If ##(a,b)\in\mathcal E_{\varphi}## is such that ##\varphi(x)>ax+b## for all ##x\in\mathbb R##, then pick a number ##q_x## in between. Let ##q_x=a'x+b'## where by denseness we choose ##(a',b')## sufficiently close to ##(a,b)## so that ##q_x## satisfies the inequality ##\varphi(x)>q_x>ax+b## for all ##x\in\mathbb R##. Then ##(a',b')\in\mathcal E_\varphi\cap\mathbb Q^2##, and since ##(a,b)## was arbitrary, this shows that ##g(x)\leq h(x)## when ##\varphi(x)>ax+b##. If ##(a,b)\in\mathcal E_{\varphi}## is such that ##\varphi(x)=ax+b##, then we approximate ##(a,b)## from below by rational pairs, and the supremum will give that ##g(x)=h(x)##. Does this make sense?
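A concrete instance of the difficulty in question 1 (my own toy example, not from the book): for ##\varphi(x)=x^2## the unique supporting line at ##x_0=\sqrt 2## is ##y=2\sqrt2\,x-2##, whose slope is irrational, so no rational pair ##(a,b)## attains the sup at ##x_0##; still, rational supporting lines nearby drive the sup up to ##\varphi(x_0)=2## from below:

```python
# For phi(x) = x^2 the unique supporting line at x0 = sqrt(2) has the
# irrational slope 2*sqrt(2), so no rational pair (a, b) attains the sup
# at x0.  Rational supporting lines (2t, -t^2) with t near sqrt(2) still
# approach phi(x0) = 2 from below.
import math

x0 = math.sqrt(2)
sup_val = float("-inf")
for k in range(1, 200001):
    t = k / 100000.0                     # rational slope parameter t
    sup_val = max(sup_val, 2 * t * x0 - t * t)

# each value equals x0^2 - (x0 - t)^2 <= x0^2, and the gap shrinks like
# the square of the grid spacing
assert 0 <= x0 * x0 - sup_val < 1e-8
print("sup over rational lines at sqrt(2):", sup_val)
```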

2. I do not understand what the author means by "We can take advantage of the fact that ##\mathbb Q^2## is countable to [discard] a countable collection of sets of probability zero...". Moreover, I am a bit unsure about the last inequality in the proof. Is this simply an application of monotonicity, i.e. $$\sup(aX+b)\geq aX+b\implies E[\sup(aX+b)\mid\mathcal B]\geq E[aX+b\mid\mathcal B],$$ so that taking the supremum over ##(a,b)## in this last inequality gives the desired inequality at the end of the proof? If my reasoning is correct, I don't see why we need to consider ##\mathcal E_\varphi\cap\mathbb Q^2##.
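On the countability point, here is a sketch of the step I believe the author is compressing (my reading, not a quote from the book): for each fixed pair ##(a,b)##, linearity and monotonicity of conditional expectation hold only outside some null set ##N_{a,b}##. Over the countable index set, the exceptional sets can be lumped together: $$N=\bigcup_{(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2}N_{a,b},\qquad P(N)\leq\sum_{(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2}P(N_{a,b})=0,$$ so outside the single null set ##N## the inequalities ##E[\varphi(X)\mid\mathcal B]\geq aE[X\mid\mathcal B]+b## hold simultaneously for all rational pairs, and one can take the supremum pointwise. Over all of ##\mathcal E_\varphi## the union would be uncountable and need not be null, which is why the restriction to ##\mathbb Q^2## matters.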
 
psie said:
1. I do not understand your problem. ##\mathbb{Q}^2 \subseteq \mathbb{R}^2## is dense and ##g(x) \in \mathbb{R}_+## is a single real number, so there is always a sequence of pairs in ##\mathbb{Q}^2## along which the values ##ax+b## converge to it. Density is the key here, and the supremum works like a topological closure.

2. As soon as we are working over a countable set, we do not need to bother with the exceptional sets of probability zero. I do not see why that wouldn't be true for a real index set, too, if we add a.s.; presumably countability is what guarantees that the union of those sets still has measure zero.

3. The expectation of the supremum includes the real limits, whereas the supremum of the expectations is restricted to rational pairs, so the latter can be smaller simply because the set the supremum is taken over is smaller. That would be my interpretation, but I would also like to see an example where the strict inequality holds.
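Regarding an example with strict inequality: if ##\mathcal B## is trivial (so conditional expectation reduces to plain expectation), ##\varphi(x)=x^2## and ##X=\pm1## with probability ##1/2## each give ##E[\varphi(X)]=1## while ##\varphi(E[X])=0##. A quick simulation of this gap (my own toy check; the variable names are mine):

```python
# With B trivial, E[sup(aX+b)] = E[phi(X)] and sup E[aX+b] = phi(E[X])
# by the supporting-line representation of phi.  For phi(x) = x^2 and
# X = +/-1 with probability 1/2 each, the gap between the two is 1.
import random

random.seed(0)
xs = [random.choice([-1.0, 1.0]) for _ in range(100_000)]

e_sup = sum(x * x for x in xs) / len(xs)   # estimates E[phi(X)] = 1
sup_e = (sum(xs) / len(xs)) ** 2           # estimates phi(E[X]) = 0

assert e_sup > sup_e + 0.9                 # strict gap, about 1 here
print("E[phi(X)] =", e_sup, " phi(E[X]) =", sup_e)
```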
 