# Proving a CDF Identity

1. Sep 8, 2008

### pluviosilla

I ran across this identity in some actuarial literature:

$$Pr( (x_1 \le X \le x_2) \ \cap \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)$$

First of all, I am not certain this is correct. I think the expression on the LHS is equal to the following double integral, which is by no means obviously equal to the CDF expression on the RHS:

$$Pr( (x_1 \le X \le x_2) \cap (y_1 \le Y \le y_2) ) = \int_{x_1 }^{x_2}\int_{y_1}^{y_2}f(x,y)dydx$$

I suspect the author may have intended to use the OR condition on the left-hand side. Did he mean to say this?

$$Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)$$

Either way, I would like to see the derivation. Any help would be much appreciated.

Thanks!

2. Sep 8, 2008

### gel

Your first version, with the intersection, is (almost) right; the only catch is which endpoints are included. If I(S) is the indicator function of a set S, then

$$I(x_1\le X<x_2,\ y_1\le Y<y_2) = \left(I(X<x_2)-I(X<x_1)\right)\left(I(Y<y_2)-I(Y<y_1)\right)$$

$$=I(X<x_2,Y<y_2)+I(X<x_1,Y<y_1)-I(X<x_1,Y<y_2)-I(X<x_2,Y<y_1)$$
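A quick brute-force check of the identity above, as a sketch: it evaluates both sides at every point of a small integer grid, boundary points included, using the same strict `<` convention as the formula. The helper names (`ind`, `lhs`, `rhs`) and the rectangle bounds are illustrative choices, not anything from the thread. The second comparison, against a closed rectangle, shows where the "(almost)" comes from: with `<=` at both ends the four-term expansion misses the upper edges.

```python
from itertools import product

def ind(cond: bool) -> int:
    """Indicator function: 1 if the condition holds, else 0."""
    return 1 if cond else 0

def lhs(X, Y, x1, x2, y1, y2):
    # I(x1 <= X < x2, y1 <= Y < y2): half-open rectangle
    return ind(x1 <= X < x2 and y1 <= Y < y2)

def lhs_closed(X, Y, x1, x2, y1, y2):
    # I(x1 <= X <= x2, y1 <= Y <= y2): closed rectangle, for comparison
    return ind(x1 <= X <= x2 and y1 <= Y <= y2)

def rhs(X, Y, x1, x2, y1, y2):
    # The four expanded terms, with the strict "<" convention
    return (ind(X < x2 and Y < y2) + ind(X < x1 and Y < y1)
            - ind(X < x1 and Y < y2) - ind(X < x2 and Y < y1))

x1, x2, y1, y2 = 1, 3, 0, 2                    # hypothetical rectangle bounds
grid = list(product(range(-1, 5), repeat=2))   # includes every boundary value

mismatches = [(X, Y) for X, Y in grid
              if lhs(X, Y, x1, x2, y1, y2) != rhs(X, Y, x1, x2, y1, y2)]
closed_mismatches = [(X, Y) for X, Y in grid
                     if lhs_closed(X, Y, x1, x2, y1, y2) != rhs(X, Y, x1, x2, y1, y2)]

print(mismatches)              # → []
print(len(closed_mismatches))  # → 5 (the points with X == 3 or Y == 2)
```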

3. Sep 8, 2008

### pluviosilla

Fascinating! I'll read up on the indicator function, because it looks like something I could use to find shortcuts! :-)

A couple of questions (if you have time):

(1.) Does the equation you posted require X and Y to be independent random variables?

(2.) What do you get when the two conditions are ORed instead of ANDed?

$$Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) )$$

4. Sep 9, 2008

### gel

(1) No.

(2) It's a little messier, but you can use $$I(A \cup B) = I(A) + I(B) - I(A)\,I(B)$$.

The "Basic properties" section of the Wikipedia article on the indicator function should also help answer your questions.
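To see the union rule at work, here is a sketch that assumes a concrete joint density, f(x, y) = x + y on the unit square, chosen only because its joint CDF $$F(x,y) = (x^2 y + x y^2)/2$$ has a closed form and X, Y are dependent. Taking expectations of $$I(A \cup B) = I(A) + I(B) - I(A)\,I(B)$$ term by term gives P(A) + P(B) - P(A and B), where P(A) and P(B) come from the marginal CDFs $$F(x, \infty)$$ and $$F(\infty, y)$$. The result is checked against a direct midpoint-rule integral of f over the union; all function names are hypothetical.

```python
INF = float("inf")

def F(x, y):
    """Joint CDF for the assumed density f(x, y) = x + y on the unit square."""
    x = min(max(x, 0.0), 1.0)   # clamp: F is flat outside the support
    y = min(max(y, 0.0), 1.0)
    return (x * x * y + x * y * y) / 2

def p_rect(x1, x2, y1, y2):
    """P(A and B) via the four-corner CDF identity."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

def p_union(x1, x2, y1, y2):
    """P(A or B) = P(A) + P(B) - P(A and B), from I(AuB) = I(A) + I(B) - I(A)I(B)."""
    p_a = F(x2, INF) - F(x1, INF)   # marginal of X: F(x, +inf)
    p_b = F(INF, y2) - F(INF, y1)   # marginal of Y: F(+inf, y)
    return p_a + p_b - p_rect(x1, x2, y1, y2)

def p_union_numeric(x1, x2, y1, y2, n=200):
    """Midpoint-rule integral of f(x, y) * I(A or B) over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x1 <= x <= x2 or y1 <= y <= y2:
                total += (x + y) * h * h
    return total

print(p_union(0.2, 0.7, 0.1, 0.5))          # about 0.645
print(p_union_numeric(0.2, 0.7, 0.1, 0.5))  # agrees up to rounding
```

The bounds here fall exactly on cell edges of the grid and f is linear, so the midpoint rule is exact on each cell and the two numbers should agree to rounding error.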

5. Oct 13, 2008

### pluviosilla

I read the Wikipedia article which provides the identities you used above. In particular,

$$I_{A \cap B} = I_A \cdot I_B$$

Are you saying that we can extrapolate this relationship to all functions of an intersection? If so, how would you prove that?

It is true that a CDF is, in a sense, a function of an intersection:

$$F(x, y) = P(\{X < x\} \cap \{Y < y\})$$

But it is not generally true that $$F(x, y) = F_X(x) \cdot F_Y(y)$$, where $$F_X$$ and $$F_Y$$ are the marginal CDFs. That factorization holds for all x and y only when X and Y are independent.

No doubt, people familiar with the indicator function will quickly see how it applies to a multivariate CDF, but I am having trouble filling in the gaps.

Do you know where I can find some good proofs that use the indicator function to explore the properties of CDFs? I skimmed through a basic Probability text (Sheldon Ross) and found interesting applications (notably, a proof that $$E[I_A] = P(A)$$). But I found nothing of relevance to this discussion.
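The identity $$I_{A \cap B} = I_A \cdot I_B$$ holds point by point for any events A and B; what needs independence is the further step $$E[I_A I_B] = E[I_A]\,E[I_B]$$, which is exactly $$F(x,y) = F_X(x)\,F_Y(y)$$ with marginal CDFs $$F_X, F_Y$$. The indicator arguments in this thread only use linearity of expectation, never that step. A small numeric sketch, assuming an illustrative dependent density f(x, y) = x + y on the unit square and, for contrast, two independent Uniform(0, 1) variables:

```python
INF = float("inf")

def F(x, y):
    """Joint CDF for the assumed (dependent) density f(x, y) = x + y on [0, 1]^2."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return (x * x * y + x * y * y) / 2

def F_indep(x, y):
    """Joint CDF of two independent Uniform(0, 1) variables, for contrast."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return x * y

for name, cdf in (("dependent", F), ("independent", F_indep)):
    joint = cdf(0.5, 0.5)                                  # E[I_A * I_B]
    product_of_marginals = cdf(0.5, INF) * cdf(INF, 0.5)   # E[I_A] * E[I_B]
    print(name, joint, product_of_marginals)
# dependent 0.125 0.140625
# independent 0.25 0.25
```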

6. Oct 13, 2008

Try this.
\begin{align*}
\Pr(x_1 \le X \le x_2 \cap y_1 \le Y \le y_2) & = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x,y)\,dy\,dx\\
& = \int_{x_1}^{x_2} \left(\int_{-\infty}^{y_2} - \int_{-\infty}^{y_1}\right) f(x,y)\,dy\,dx\\
& = \int_{x_1}^{x_2} \int_{-\infty}^{y_2} f(x,y)\,dy\,dx - \int_{x_1}^{x_2} \int_{-\infty}^{y_1} f(x,y)\,dy\,dx\\
& = \int_{-\infty}^{x_2} \left(\int_{-\infty}^{y_2} f(x,y)\,dy - \int_{-\infty}^{y_1} f(x,y)\,dy\right) dx\\
& \quad - \int_{-\infty}^{x_1} \left(\int_{-\infty}^{y_2} f(x,y)\,dy - \int_{-\infty}^{y_1} f(x,y)\,dy\right) dx\\
& = F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1)
\end{align*}
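A numerical spot check of this derivation, as a sketch: it assumes an illustrative density f(x, y) = x + y on the unit square (not from the thread), approximates every double integral with a midpoint rule, and builds each CDF value F(x, y) as an integral from the lower edge of the support, mirroring how the derivation rewrites each limit as a difference of integrals from $$-\infty$$. Because this f is linear, the midpoint rule is exact up to rounding, so both sides should come out at 0.15.

```python
def f(x, y):
    """Assumed joint density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(xa, xb, ya, yb, n=100):
    """Midpoint-rule approximation of the double integral of f over [xa, xb] x [ya, yb]."""
    hx, hy = (xb - xa) / n, (yb - ya) / n
    return sum(f(xa + (i + 0.5) * hx, ya + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

def F(x, y):
    # The -inf lower limits reduce to 0 because f vanishes below the support.
    return integrate(0.0, x, 0.0, y)

x1, x2, y1, y2 = 0.2, 0.7, 0.1, 0.5
lhs = integrate(x1, x2, y1, y2)                       # first line of the derivation
rhs = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)   # last line of the derivation
print(lhs, rhs)   # both about 0.15
```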

7. Oct 13, 2008

### pluviosilla

Yes, of course. This approach should have been obvious, but I didn't think of it. Thanks!

Do you understand how to prove the identity using the indicator function?

8. Oct 13, 2008

Sorry it took a while to get all the integrals correct; the LaTeX here was giving me fits (not updating, not parsing all the code, giving "Latex image not valid" messages). It just isn't my day, I guess. The work with indicators is similar:

\begin{align*} I(x_1 < X \le x_2, y_1 < Y \le y_2) & = I(x_1 < X \le x_2) \cdot I(y_1 < Y \le y_2)\\ & = I(x_1 < X \le x_2) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right) \\ & = \left(I(X \le x_2) - I(X \le x_1)\right) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right) \end{align*}

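Multiply these out: each product of indicators is the indicator of an intersection, e.g. $$I(X \le x_2) \cdot I(Y \le y_1) = I(X \le x_2, Y \le y_1)$$. Then integrate the entire shebang w.r.t. $$dF(x,y) = f(x,y) \,dx\,dy$$; each of the four terms integrates to a value of the joint CDF, such as $$F(x_2, y_1)$$.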
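The "multiply these out" step generalizes: expanding a product of differences picks, from each factor, either the upper term (sign +) or the lower term (sign -), so the probability is a signed sum of F over the corners of the box. Here is a sketch of that bookkeeping with a hypothetical helper `box_probability` that works in any number of dimensions. It uses the half-open convention lower < X <= upper, for which the expansion is exact even for discrete variables; the example CDF (for the density f(x, y) = x + y on the unit square) is an assumption for illustration.

```python
from itertools import product
from typing import Callable, Sequence

def box_probability(F: Callable[..., float],
                    lower: Sequence[float], upper: Sequence[float]) -> float:
    """P(lower_k < X_k <= upper_k for every k), computed from a joint CDF F.

    Multiplying out prod_k [I(X_k <= upper_k) - I(X_k <= lower_k)] gives one
    term per corner of the box; each lower endpoint picked contributes a
    factor of -1, and each term integrates to F evaluated at that corner.
    """
    total = 0.0
    for picks in product((False, True), repeat=len(lower)):  # True = lower endpoint
        corner = [lo if pick else hi for pick, lo, hi in zip(picks, lower, upper)]
        sign = -1.0 if sum(picks) % 2 else 1.0
        total += sign * F(*corner)
    return total

def F(x, y):
    """Illustrative joint CDF for the density f(x, y) = x + y on the unit square."""
    x, y = min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)
    return (x * x * y + x * y * y) / 2

print(box_probability(F, (0.2, 0.1), (0.7, 0.5)))   # about 0.15
```

In two dimensions the four terms are exactly F(x_2,y_2) - F(x_2,y_1) - F(x_1,y_2) + F(x_1,y_1); with three variables the same loop produces the eight-term version.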

9. Oct 13, 2008

### pluviosilla

I see!! You might say the indicator function takes the place of the integration limits. That's powerful!

It was your last statement (integrate the PDF times the expression with the indicator functions) that made this click for me. I'm a self-taught statistician, so I've got these annoying gaps in my training. This thread is the first time I've ever heard of the indicator function, but it clearly has some very useful properties.

Thanks very much to both you and gel.

10. Oct 13, 2008

### gel

Indeed. It works because integration is a linear function of the integrand.
Rather than writing out the integral in full, it's often useful to use the standard notation E(Z) for the expected value of a random variable Z. Then, for any event S, the indicator function I(S) is a random variable taking the values 0 and 1, and
$$E(I(S)) = P(S)$$.
Writing probabilities in terms of expected values in this way is often handy for rearranging expressions such as the one you were asking about.
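Because E(I(S)) = P(S), averaging an indicator over simulated draws estimates a probability, which gives an independent check on the CDF identity. A sketch, assuming an illustrative density f(x, y) = x + y on the unit square, sampled by rejection (a technique chosen here for simplicity, not something from the thread). With 100,000 draws and a fixed seed, the estimate should land well within 0.01 of the exact value 0.15.

```python
import random

def sample_xy(rng):
    """One draw from the assumed density f(x, y) = x + y on [0, 1]^2, by rejection sampling."""
    while True:
        x, y = rng.random(), rng.random()
        if 2.0 * rng.random() < x + y:   # accept with probability f(x, y) / max f = (x + y) / 2
            return x, y

def mean_of_indicator(n, x1, x2, y1, y2, seed=0):
    """Sample average of I(x1 <= X <= x2, y1 <= Y <= y2), an estimate of its expectation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = sample_xy(rng)
        hits += (x1 <= x <= x2) and (y1 <= y <= y2)   # a bool counts as 0 or 1
    return hits / n

def F(x, y):
    """Closed-form joint CDF of the same density, for x, y inside the unit square."""
    return (x * x * y + x * y * y) / 2

x1, x2, y1, y2 = 0.2, 0.7, 0.1, 0.5
estimate = mean_of_indicator(100_000, x1, x2, y1, y2)
exact = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)
print(estimate, exact)
```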

11. Oct 14, 2008

### pluviosilla

Where are you taught that an integral is actually just a linear function of the integrand? Analysis class, perhaps?

I somehow managed to get a bachelor's degree in physics without a single course in analysis, chemistry, or (alas) statistics. In the interview I had with the department chairman before graduating, he said, "I admit that you have fulfilled all the requirements even though you are missing these courses, but I have to ask: how on earth did you do it?" At the time, I thought I was clever to avoid these courses. Now, I just feel like a moron.

12. Oct 14, 2008

$$\int_a^b \left(c \cdot f(x) + d \cdot g(x) \right) \, dx = c \int_a^b f(x) \, dx + d \int_a^b g(x) \, dx$$
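Any quadrature rule that is a weighted sum of function values inherits this linearity, so it is easy to see numerically. A sketch using a composite trapezoidal rule, with arbitrarily chosen test functions (sin and exp) and constants c = 3, d = -2; the same property, applied to E, is what lets each indicator term in the expansions earlier in the thread be integrated separately.

```python
import math

def trapezoid(func, a, b, n=1000):
    """Composite trapezoidal rule: a weighted sum of function values."""
    h = (b - a) / n
    interior = sum(func(a + k * h) for k in range(1, n))
    return h * (0.5 * func(a) + interior + 0.5 * func(b))

c, d = 3.0, -2.0
lhs = trapezoid(lambda x: c * math.sin(x) + d * math.exp(x), 0.0, 1.0)
rhs = c * trapezoid(math.sin, 0.0, 1.0) + d * trapezoid(math.exp, 0.0, 1.0)
print(lhs, rhs)   # equal up to floating-point rounding
```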