Calculating a CDF Identity: Derivation and Explanation

pluviosilla · Sep 8, 2008

I ran across this identity in some actuarial literature:

Pr( (x_1 \le X \le x_2) \ \cap \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)

First of all, I am not certain this is correct. I think the expression on the LHS is equal to the following double integral, which is by no means obviously equal to the CDF expression on the RHS:

Pr( (x_1 \le X \le x_2) \cap (y_1 \le Y \le y_2) ) = \int_{x_1 }^{x_2}\int_{y_1}^{y_2}f(x,y)dydx

I suspect the author intended an OR condition in the expression on the left. Did he mean to say this?

Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)

Either way, I would like to see the derivation. Any help would be much appreciated.

Thanks!

gel · Sep 8, 2008

That's (almost) right. If I(S) is the indicator function of a set S then

 I(x_1\le X<x_2,\ y_1\le Y<y_2) = \left(I(X<x_2)-I(X<x_1)\right)\left(I(Y<y_2)-I(Y<y_1)\right) 

 =I(X<x_2,Y<y_2)+I(X<x_1,Y<y_1)-I(X<x_1,Y<y_2)-I(X<x_2,Y<y_1)

pluviosilla · Sep 8, 2008

Fascinating! I'll read up on the indicator function, because it looks like something I could use to find shortcuts! :-)

A couple of questions (if you have time):
(1.) Does the equation you posted require X & Y to be independent RVs?
(2.) What do you get when the two parentheses are ORed instead of ANDed?

Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) )

gel · Sep 9, 2008

pluviosilla said:

(1.) Does the equation you posted require X & Y to be independent RVs?
(2.) What do you get when the two parentheses are ORed instead of ANDed?

(1) no
(2) It's a little bit messier, but you can use I(A \cup B) = I(A) + I(B) - I(A)I(B).

The "Basic properties" section of the Wikipedia article on the indicator function (http://en.wikipedia.org/wiki/Indicator_function) should also help answer your questions.
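
For what it's worth, that identity can be carried through to a formula for the OR case. The following is only a sketch: it assumes (X, Y) has a joint density, so that \le and < give the same probabilities, and it introduces the marginal CDF notation F_X(x) = F(x, \infty) and F_Y(y) = F(\infty, y), which does not appear in the original question.

```latex
\begin{align*}
A &= \{x_1 \le X \le x_2\}, \qquad B = \{y_1 \le Y \le y_2\} \\
\Pr(A \cup B) &= \Pr(A) + \Pr(B) - \Pr(A \cap B) \\
 &= \left(F_X(x_2) - F_X(x_1)\right) + \left(F_Y(y_2) - F_Y(y_1)\right) \\
 &\quad - \left(F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)\right)
\end{align*}
```

So the four-term CDF expression is the AND probability; the OR probability needs the two marginal terms as well.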

pluviosilla · Oct 13, 2008

I read the Wikipedia article which provides the identities you used above. In particular,

I_{A \cap B} = I_A \cdot I_B

Are you saying that we can extrapolate this relationship to all functions of an intersection? If so, how would you prove that?

It is true that a CDF is, in a sense, a function of an intersection:

F(x, y) = P(X < x AND Y < y)

But it is not generally true that F(x, y) = F_X(x) * F_Y(y), where F_X and F_Y are the marginal CDFs. This identity only works when X & Y are independent.

No doubt, people familiar with the indicator function will quickly see how it applies to a multivariate CDF, but I am having trouble filling in the gaps.

Do you know where I can find some good proofs that use the indicator function to explore the properties of CDFs? I skimmed through a basic Probability text (Sheldon Ross) and found interesting applications (notably, a proof that E[I_A] = P(A)). But I found nothing of relevance to this discussion.

statdad · Oct 13, 2008

Try this.
 \begin{align*} \Pr(x_1 \le X \le x_2 \cap y_1 \le Y \le y_2) & = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x,y)\,dy\,dx\\ & = \int_{x_1}^{x_2} \left(\int_{-\infty}^{y_2} f(x,y)\,dy - \int_{-\infty}^{y_1} f(x,y)\,dy\right) dx\\ & = \int_{x_1}^{x_2} \int_{-\infty}^{y_2} f(x,y) \, dy\, dx - \int_{x_1}^{x_2} \int_{-\infty}^{y_1} f(x,y) \, dy\, dx \\ & = \int_{-\infty}^{x_2} \left(\int_{-\infty}^{y_2} f(x,y) \, dy - \int_{-\infty}^{y_1} f(x,y)\,dy\right) dx\\ & \quad - \int_{-\infty}^{x_1} \left(\int_{-\infty}^{y_2} f(x,y) \, dy - \int_{-\infty}^{y_1} f(x,y)\,dy\right) dx\\ & = F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1) \end{align*}
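
A quick numerical sanity check of this identity is sketched below. It is not part of the derivation: the sampling scheme (Y = X + noise, chosen so that X and Y are dependent), the rectangle corners, and all function names are illustrative assumptions. It works with sample counts, for which the identity holds exactly on the half-open rectangle x_1 < X \le x_2, y_1 < Y \le y_2.

```python
import random

def sample_dependent_pairs(n, seed=0):
    """Draw n pairs (X, Y) with Y = X + noise, so X and Y are dependent."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 0.5)
        pairs.append((x, y))
    return pairs

def count_below(pairs, a, b):
    """Number of samples with X <= a and Y <= b (n times the empirical joint CDF)."""
    return sum(1 for x, y in pairs if x <= a and y <= b)

def count_in_rectangle(pairs, x1, x2, y1, y2):
    """Number of samples with x1 < X <= x2 and y1 < Y <= y2, counted directly."""
    return sum(1 for x, y in pairs if x1 < x <= x2 and y1 < y <= y2)

def rectangle_via_cdf(pairs, x1, x2, y1, y2):
    """The same count, assembled from four empirical-CDF terms."""
    return (count_below(pairs, x2, y2) - count_below(pairs, x1, y2)
            - count_below(pairs, x2, y1) + count_below(pairs, x1, y1))

pairs = sample_dependent_pairs(10_000)
x1, x2, y1, y2 = -0.5, 1.0, -1.0, 0.75  # arbitrary corners with x1 < x2, y1 < y2
direct = count_in_rectangle(pairs, x1, x2, y1, y2)
via_cdf = rectangle_via_cdf(pairs, x1, x2, y1, y2)
print(direct, via_cdf)  # the two counts agree
```

Because the two counts agree even though Y was built from X, the check is consistent with the earlier answer that independence is not required.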

pluviosilla · Oct 13, 2008

Yes, of course. This approach should have been obvious, but I didn't think of it. Thanks!

Do you understand how to prove the identity using the indicator function?

statdad · Oct 13, 2008

Sorry it took a while to get all the integrals correct - the LaTeX here was giving me fits (not updating, not parsing all the code, giving "Latex image not valid" messages) - it just isn't my day, I guess. The work with indicators is similar:

 \begin{align*} I(x_1 \le X \le x_2, y_1 \le Y \le y_2) & = I(x_1 \le X \le x_2) \cdot I(y_1 \le Y \le y_2)\\ & = I(x_1 \le X \le x_2) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right) \\ & = \left(I(X \le x_2) - I(X \le x_1)\right) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right) \end{align*} 

Multiply these out and then integrate the entire shebang w.r.t. dF(x,y) = f(x,y) \,dxdy
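
Carried out explicitly, that last step might look like the following. This is a sketch that assumes (X, Y) has a joint density f, so boundary points carry no probability and the choice between \le and < does not matter. The symbols a and b are placeholders introduced here for a generic corner point.

```latex
\begin{align*}
\iint I(x \le a)\, I(y \le b)\, f(x,y)\,dx\,dy &= \Pr(X \le a,\ Y \le b) = F(a,b) \\
\Pr(x_1 \le X \le x_2,\ y_1 \le Y \le y_2)
 &= \iint \left(I(x \le x_2) - I(x \le x_1)\right)\left(I(y \le y_2) - I(y \le y_1)\right) f(x,y)\,dx\,dy \\
 &= \iint \left[ I(x \le x_2) I(y \le y_2) - I(x \le x_1) I(y \le y_2) \right. \\
 &\qquad \left. -\, I(x \le x_2) I(y \le y_1) + I(x \le x_1) I(y \le y_1) \right] f(x,y)\,dx\,dy \\
 &= F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1)
\end{align*}
```

Each of the four products is the indicator of a lower-left quadrant, which is why each one integrates to a single value of the joint CDF.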

pluviosilla · Oct 13, 2008

I see! You might say the indicator function takes the place of the integration limits. That's powerful!

It was your last statement (Integrate PDF * Expression with Indicator Function) that made this click for me. I'm a self-taught statistician, so I've got these annoying gaps in my training. This thread is the first I'd ever heard of the indicator function, but it clearly has some very useful properties.

Thanks very much to both you and gel.

gel · Oct 13, 2008

pluviosilla said:

I see! You might say the indicator function takes the place of the integration limits. That's powerful!

Indeed. It works because integration is a linear function of the integrand.
Rather than writing out the integral in full, it's often useful to use the standard notation E(Z) for the expected value of a random variable Z. Then, for any event S, the indicator function I(S) is a random variable taking the values 0 and 1, and
 E(I(S)) = P(S) .
Writing probabilities in terms of expected values in this way is often handy for rearranging expressions such as the one you were asking about.
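
To see E(I(S)) = P(S) and linearity at work on something small, here is an exact check on a toy discrete distribution. Everything in it is a hypothetical illustration rather than part of the original argument: X is the first of two fair dice, Y is the larger of the two (so X and Y are dependent), and the function names are made up. Since the distribution is discrete, the rectangle is taken half-open, x_1 < X \le x_2 and y_1 < Y \le y_2, with F(a, b) = P(X \le a, Y \le b).

```python
from fractions import Fraction
from itertools import product

# Toy example: X is the first die, Y is the larger of two dice.
# All 36 outcomes of the two dice are equally likely.
outcomes = [(d1, max(d1, d2)) for d1, d2 in product(range(1, 7), repeat=2)]
weight = Fraction(1, len(outcomes))

def expectation(g):
    """E[g(X, Y)] as an exact weighted sum over the outcomes."""
    return sum(weight * g(x, y) for x, y in outcomes)

def indicator(condition):
    """1 if the condition holds, else 0."""
    return 1 if condition else 0

def cdf(a, b):
    """F(a, b) = P(X <= a, Y <= b), written as E[I(X <= a, Y <= b)]."""
    return expectation(lambda x, y: indicator(x <= a and y <= b))

x1, x2, y1, y2 = 2, 4, 3, 5

# P(x1 < X <= x2, y1 < Y <= y2) as the expectation of a single indicator.
lhs = expectation(lambda x, y: indicator(x1 < x <= x2 and y1 < y <= y2))

# The same probability after expanding the indicator product and using linearity.
rhs = cdf(x2, y2) - cdf(x1, y2) - cdf(x2, y1) + cdf(x1, y1)

print(lhs, rhs)  # both are 7/36
```

Using Fraction keeps the arithmetic exact, so the two sides can be compared with == rather than a tolerance.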

pluviosilla · Oct 14, 2008

Where are you taught that an integral is actually just a linear function of the integrand? Analysis class, perhaps?

I somehow managed to get a bachelor's degree in physics without a single course in analysis, chemistry or - alas - statistics. In the interview I had with the department chairman before graduating he said, "I admit that you have fulfilled all the requirements even though you are missing these courses, but I have to ask: how on Earth did you do it?" At the time, I thought I was clever to avoid these courses. Now, I just feel like a moron.

statdad · Oct 14, 2008

"Where are you taught that an integral is actually just a linear function of the integrand? "

If you have seen that

 \int_a^b \left(c \cdot f(x) + d \cdot g(x) \right) \, dx = c \int_a^b f(x) \, dx + d \int_a^b g(x) \, dx 

then you've seen the property you reference. Proving this property holds requires a class in which the properties of Riemann integration are developed; that may be an advanced calculus class or a first analysis class (more generalities in the latter). I saw it in advanced calculus as a junior and in a mathematical statistics class as a senior.
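
If it helps to see the property numerically, the sketch below checks it with a midpoint Riemann sum. The integrands (sin x and x^2), the constants c and d, and the helper name midpoint_integral are arbitrary choices made for illustration. It is a check on one example, not a proof.

```python
import math

def midpoint_integral(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

c, d = 2.0, -3.0
a, b = 0.0, 1.0

# Integrate the linear combination in one go...
combined = midpoint_integral(lambda x: c * math.sin(x) + d * x * x, a, b)

# ...and compare with the same combination of the two separate integrals.
separate = (c * midpoint_integral(math.sin, a, b)
            + d * midpoint_integral(lambda x: x * x, a, b))

print(combined, separate)  # equal up to floating-point rounding
```

The agreement is no accident: a Riemann sum is itself a finite sum, and finite sums are linear, which is the idea the proof in an analysis course makes rigorous.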
