Calculating a CDF Identity: Derivation and Explanation

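In summary, the identity states that the probability that (X, Y) falls in the rectangle x1 <= X <= x2, y1 <= Y <= y2 equals F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1), an alternating sum of the joint CDF at the rectangle's four corners. It follows from splitting the double integral of the joint density (or from multiplying out indicator functions) and does not require X and Y to be independent.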
  • #1
pluviosilla
I ran across this identity in some actuarial literature:

[tex]Pr( (x_1 \le X \le x_2) \ \cap \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)[/tex]

First of all, I am not certain this is correct. I think the expression on the LHS is equal to the following double integral, which is by no means obviously equal to the CDF expression on the RHS:

[tex]Pr( (x_1 \le X \le x_2) \cap (y_1 \le Y \le y_2) ) = \int_{x_1 }^{x_2}\int_{y_1}^{y_2}f(x,y)dydx[/tex]

I suspect that maybe the author intended to use the OR condition in the expression on the left. Did he mean to say this?

[tex]Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)[/tex]

Either way, I would like to see the derivation. Any help would be much appreciated.

Thanks!
 
  • #2
That's (almost) right. If I(S) is the indicator function of a set S then

[tex]
I(x_1\le X<x_2,\ y_1\le Y<y_2) =
\left(I(X<x_2)-I(X<x_1)\right)\left(I(Y<y_2)-I(Y<y_1)\right)
[/tex]

[tex]
=I(X<x_2,Y<y_2)+I(X<x_1,Y<y_1)-I(X<x_1,Y<y_2)-I(X<x_2,Y<y_1)
[/tex]
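
The indicator identity above can be checked pointwise by brute force. The following is only a sketch: the helper names (`ind`, `lhs`, `rhs`, `expanded`), the grid, and the rectangle bounds are illustrative choices, not part of the original post, and the check assumes x1 <= x2 and y1 <= y2.

```python
from itertools import product

def ind(condition):
    """Indicator function: 1 if the condition holds, else 0."""
    return 1 if condition else 0

def lhs(x, y, x1, x2, y1, y2):
    # I(x1 <= X < x2, y1 <= Y < y2)
    return ind(x1 <= x < x2 and y1 <= y < y2)

def rhs(x, y, x1, x2, y1, y2):
    # (I(X < x2) - I(X < x1)) * (I(Y < y2) - I(Y < y1))
    return (ind(x < x2) - ind(x < x1)) * (ind(y < y2) - ind(y < y1))

def expanded(x, y, x1, x2, y1, y2):
    # The product multiplied out into four joint indicators.
    return (ind(x < x2 and y < y2) + ind(x < x1 and y < y1)
            - ind(x < x1 and y < y2) - ind(x < x2 and y < y1))

# Illustrative rectangle and test grid (assumed values).
x1, x2, y1, y2 = 0.0, 1.0, 0.25, 0.75
grid = [k / 4 for k in range(-4, 9)]  # -1.0, -0.75, ..., 2.0

all_match = all(
    lhs(x, y, x1, x2, y1, y2)
    == rhs(x, y, x1, x2, y1, y2)
    == expanded(x, y, x1, x2, y1, y2)
    for x, y in product(grid, grid)
)
print(all_match)
```

The grid deliberately includes points on the rectangle's edges, which is where the strict versus non-strict inequalities matter.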
 
  • #3
Fascinating! I'll read up on the indicator function, because it looks like something I could use to find shortcuts! :-)

A couple of questions (if you have time):
(1.) Does the equation you posted require X & Y to be independent RVs?
(2.) What do you get when the two parentheses are ORed instead of ANDed?

[tex]Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) )[/tex]
 
  • #4
pluviosilla said:
(1.) Does the equation you posted require X & Y to be independent RVs?
(2.) What do you get when the two parentheses are ORed instead of ANDed?

(1) no
(2) It's a little bit messier, but you can use I(AuB)=I(A)+I(B)-I(A)I(B).

The "Basic properties" section of the Wikipedia article on the indicator function (http://en.wikipedia.org/wiki/Indicator_function) should also help answer your questions.
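
As a sketch of where the union formula leads, take expectations of I(A∪B) = I(A) + I(B) - I(A)I(B) with A = {x_1 ≤ X ≤ x_2} and B = {y_1 ≤ Y ≤ y_2}. The marginal CDFs F_X and F_Y are notation introduced here for illustration, and the derivation assumes a continuous distribution, so that single endpoints carry zero probability:

[tex]
\begin{align*}
\Pr(A \cup B) & = \Pr(A) + \Pr(B) - \Pr(A \cap B)\\
& = \left[F_X(x_2) - F_X(x_1)\right] + \left[F_Y(y_2) - F_Y(y_1)\right]\\
& \quad - \left[F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1)\right]
\end{align*}
[/tex]

Here [tex]F_X(x) = \lim_{y \to \infty} F(x,y)[/tex] and similarly for [tex]F_Y[/tex]. This differs from the RHS of the original identity, so the identity as quoted belongs to the AND case.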
 
  • #5
I read the Wikipedia article which provides the identities you used above. In particular,

[tex]I_{A \cap B} = I_A \cdot I_B[/tex]

Are you saying that we can extrapolate this relationship to all functions of an intersection? If so, how would you prove that?

It is true that a CDF is, in a sense, a function of an intersection:

F(x, y) = P(X <= x AND Y <= y)

But it is not generally true that F(x, y) = F_X(x) * F_Y(y), where F_X and F_Y are the marginal CDFs. This identity only works when X & Y are independent.

No doubt, people familiar with the indicator function will quickly see how it applies to a multivariate CDF, but I am having trouble filling in the gaps.

Do you know where I can find some good proofs that use the indicator function to explore the properties of CDFs? I skimmed through a basic Probability text (Sheldon Ross) and found interesting applications (notably, a proof that [tex]E[I_A] = P(A)[/tex]). But I found nothing of relevance to this discussion.
 
  • #6
Try this.
[tex]
\begin{align*}
\Pr(x_1 \le X \le x_2 \cap y_1 \le Y \le y_2) & = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x,y)\,dydx\\
& = \int_{x_1}^{x_2} \left(\int_{-\infty}^{y_2} - \int_{-\infty}^{y_1}\right) f(x,y)\,dydx\\
& = \int_{x_1}^{x_2} \int_{-\infty}^{y_2} f(x,y) \, dy dx - \int_{x_1}^{x_2} \int_{-\infty}^{y_1} f(x,y) \, dy dx \\
& = \int_{-\infty}^{x_2} \left(\int_{-\infty}^{y_2} f(x,y) \, dy - \int_{-\infty}^{y_1} f(x,y) \, dy\right) dx\\
& \quad - \int_{-\infty}^{x_1} \left(\int_{-\infty}^{y_2} f(x,y) \, dy - \int_{-\infty}^{y_1} f(x,y) \, dy\right) dx\\
& = F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1)
\end{align*}
[/tex]
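
A numerical sanity check of this derivation, as a sketch only. It assumes a specific dependent pair (X, Y), the Farlie-Gumbel-Morgenstern distribution on the unit square with joint CDF F(x, y) = xy[1 + θ(1 - x)(1 - y)] and density f(x, y) = 1 + θ(1 - 2x)(1 - 2y). That distribution, the value θ = 0.5, and all function names are illustrative choices that do not appear in the thread.

```python
def joint_cdf(x, y, theta=0.5):
    """FGM joint CDF on the unit square (arguments clamped to [0, 1])."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return x * y * (1.0 + theta * (1.0 - x) * (1.0 - y))

def joint_pdf(x, y, theta=0.5):
    """FGM joint density, valid for 0 <= x, y <= 1 and |theta| <= 1."""
    return 1.0 + theta * (1.0 - 2.0 * x) * (1.0 - 2.0 * y)

def rectangle_prob_cdf(x1, x2, y1, y2, theta=0.5):
    """Rectangle probability from the four-corner CDF identity."""
    return (joint_cdf(x2, y2, theta) - joint_cdf(x1, y2, theta)
            - joint_cdf(x2, y1, theta) + joint_cdf(x1, y1, theta))

def rectangle_prob_integral(x1, x2, y1, y2, theta=0.5, n=200):
    """Rectangle probability from a midpoint-rule double integral of the density."""
    hx = (x2 - x1) / n
    hy = (y2 - y1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * hx
        for j in range(n):
            y = y1 + (j + 0.5) * hy
            total += joint_pdf(x, y, theta)
    return total * hx * hy

# Illustrative rectangle inside the unit square.
via_cdf = rectangle_prob_cdf(0.2, 0.7, 0.1, 0.6)
via_integral = rectangle_prob_integral(0.2, 0.7, 0.1, 0.6)
# Product of the marginal probabilities (both marginals are uniform here).
if_independent = (0.7 - 0.2) * (0.6 - 0.1)
print(via_cdf, via_integral, if_independent)
```

The two computed probabilities agree with each other but not with the product of the marginal probabilities, which illustrates that the identity holds without independence.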
 
  • #7
Yes, of course. This approach should have been obvious, but I didn't think of it. Thanks!

Do you understand how to prove the identity using the indicator function?
 
  • #8
Sorry it took some time to get all the integrals correct - the Latex here was giving me fits (not updating, not parsing all the code, giving "Latex image not valid" messages) - it just isn't my day, I guess. The work with indicators is similar:

[tex]
\begin{align*}
I(x_1 \le X \le x_2, y_1 \le Y \le y_2) & = I(x_1 \le X \le x_2) \cdot I(y_1 \le Y \le y_2)\\
& = I(x_1 \le X \le x_2) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right) \\
& = \left(I(X \le x_2) - I(X \le x_1)\right) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right)
\end{align*}
[/tex]

Multiply these out and then integrate the entire shebang w.r.t. [tex] dF(x,y) = f(x,y) \,dxdy[/tex]
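
Spelled out as a sketch (the constants a and b are placeholders introduced here, and a continuous distribution is assumed so the endpoints do not matter), the multiplied-out product is

[tex]
I(X \le x_2, Y \le y_2) - I(X \le x_1, Y \le y_2) - I(X \le x_2, Y \le y_1) + I(X \le x_1, Y \le y_1)
[/tex]

and each term integrates to a value of the joint CDF, because for any a and b

[tex]
\int\!\!\int I(x \le a,\ y \le b)\, f(x,y)\,dy\,dx = \int_{-\infty}^{a}\int_{-\infty}^{b} f(x,y)\,dy\,dx = F(a,b)
[/tex]

Integrating the four terms one at a time then gives [tex]F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1)[/tex].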
 
  • #9
I see! You might say the indicator function takes the place of the integration limits. That's powerful!

It was your last statement (Integrate PDF * Expression with Indicator Function) that made this click for me. I'm a self-taught statistician, so I've got these annoying gaps in my training. This thread is the first I'd ever heard of the indicator function, but it clearly has some very useful properties.

Thanks very much to both you and gel.
 
  • #10
pluviosilla said:
I see! You might say the indicator function takes the place of the integration limits. That's powerful!

Indeed. It works because integration is a linear function of the integrand.
Rather than writing out the integral in full, it's often useful to use the standard notation E(Z) for the expected value of a random variable Z. Then, for any event S, the indicator function I(S) is a random variable taking the values 0 and 1, and
[tex]
E(I(S)) = P(S)
[/tex].
Writing probabilities in terms of expected values in this way is often handy for rearranging expressions such as the one you were asking about.
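
A small simulation can illustrate E(I(S)) = P(S) and the linearity argument together. This is a sketch under assumptions that are not in the thread: the sampling scheme (X = U, Y = (U + V)/2 with U, V independent uniforms, so X and Y are dependent), the seed, and the function names are all illustrative. It uses the half-open rectangle x1 < X <= x2, y1 < Y <= y2, for which the indicator identity holds exactly on every sample.

```python
import random

def sample_pairs(n, seed=0):
    """Draw n dependent pairs (X, Y) with X = U and Y = (U + V) / 2."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()
        v = rng.random()
        pairs.append((u, 0.5 * (u + v)))
    return pairs

def empirical_cdf(pairs, x, y):
    """Sample mean of I(X <= x, Y <= y), an estimate of F(x, y)."""
    return sum(1 for px, py in pairs if px <= x and py <= y) / len(pairs)

def empirical_rectangle(pairs, x1, x2, y1, y2):
    """Sample mean of I(x1 < X <= x2, y1 < Y <= y2)."""
    hits = sum(1 for px, py in pairs if x1 < px <= x2 and y1 < py <= y2)
    return hits / len(pairs)

def rectangle_from_cdf(pairs, x1, x2, y1, y2):
    """Four-corner combination of the empirical CDF."""
    return (empirical_cdf(pairs, x2, y2) - empirical_cdf(pairs, x1, y2)
            - empirical_cdf(pairs, x2, y1) + empirical_cdf(pairs, x1, y1))

pairs = sample_pairs(2000)
direct = empirical_rectangle(pairs, 0.2, 0.7, 0.3, 0.6)
from_cdf = rectangle_from_cdf(pairs, 0.2, 0.7, 0.3, 0.6)
print(direct, from_cdf)
```

Because a sample mean is linear in the same way an integral is, the two estimates agree up to floating-point rounding, not just approximately.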
 
  • #11
Where are you taught that an integral is actually just a linear function of the integrand? Analysis class, perhaps?

I somehow managed to get a bachelor's degree in physics without a single course in analysis, chemistry or - alas - statistics. In the interview I had with the department chairman before graduating he said, "I admit that you have fulfilled all the requirements even though you are missing these courses, but I have to ask: how on Earth did you do it?" At the time, I thought I was clever to avoid these courses. Now, I just feel like a moron.
 
  • #12
"Where are you taught that an integral is actually just a linear function of the integrand? "

If you have seen that

[tex]
\int_a^b \left(c \cdot f(x) + d \cdot g(x) \right) \, dx = c \int_a^b f(x) \, dx + d \int_a^b g(x) \, dx
[/tex]

then you've seen the property you reference. Proving this property holds requires a class in which the properties of Riemann integration are developed; that may be an advanced calculus class or a first analysis class (more generalities in the latter). I saw it in advanced calculus as a junior and in a mathematical statistics class as a senior.
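
For a quick numerical feel for this property (not a proof), here is a sketch using a midpoint Riemann sum. The integrands sin(x) and x^2, the constants c and d, and the function name `midpoint_integral` are illustrative assumptions.

```python
import math

def midpoint_integral(func, a, b, n=1000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

c, d = 2.0, -3.0

# Integral of the linear combination c*f + d*g ...
lhs = midpoint_integral(lambda x: c * math.sin(x) + d * x * x, 0.0, 1.0)

# ... versus the same linear combination of the separate integrals.
rhs = (c * midpoint_integral(math.sin, 0.0, 1.0)
       + d * midpoint_integral(lambda x: x * x, 0.0, 1.0))

print(lhs, rhs)
```

The two values agree up to rounding because a finite sum is itself linear, and the Riemann integral inherits linearity from such sums in the limit.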
 

1. What is a CDF identity?

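A cumulative distribution function (CDF) identity is an equation that expresses a probability in terms of one or more CDFs evaluated at specific points, or that relates the CDFs of different random variables. The rectangle formula discussed in this thread is an example: it writes a joint probability as a combination of four values of the joint CDF.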

2. Why is it important to prove a CDF identity?

Proving a CDF identity is important because it helps us understand the relationship between different random variables and their distributions. It also allows us to use the properties of CDFs to solve complex probability problems.

3. How do you prove a CDF identity?

To prove a CDF identity, you need to use mathematical techniques such as integration, differentiation, and algebraic manipulation. It is important to follow the rules of mathematical proofs and show each step clearly.

4. What are some common CDF identities?

Common examples include the complement rule P(X > x) = 1 - F(x), the interval rule P(a < X <= b) = F(b) - F(a), and the two-dimensional rectangle rule P(x1 < X <= x2, y1 < Y <= y2) = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1).

5. Can CDF identities be used in real-world applications?

Yes, CDF identities are widely used in fields such as statistics, engineering, and finance. For example, the rectangle rule gives the probability that two quantities fall jointly within given ranges directly from a joint CDF, which is useful in risk assessment, forecasting, and modeling.
