Calculating a CDF Identity: Derivation and Explanation

  • Context: Graduate
  • Thread starter: pluviosilla
  • Tags: CDF identity
Discussion Overview

The discussion revolves around the derivation and understanding of a specific cumulative distribution function (CDF) identity related to joint probabilities of random variables X and Y. Participants explore the mathematical expressions involved, including the use of indicator functions, and question the conditions under which these identities hold. The scope includes theoretical derivations, mathematical reasoning, and conceptual clarifications.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a CDF identity from actuarial literature and expresses uncertainty about its correctness, suggesting a possible misinterpretation involving an OR condition instead of AND.
  • Another participant introduces the use of indicator functions to express joint probabilities and provides an equation involving these functions.
  • Questions arise regarding the independence of random variables X and Y in relation to the presented equations.
  • Discussion includes the complexity of deriving probabilities when using OR conditions instead of AND conditions, with references to properties of indicator functions.
  • A participant seeks proofs that utilize indicator functions to explore properties of CDFs, noting gaps in their understanding despite having read relevant literature.
  • Further elaboration on the integration process and the application of indicator functions to simplify expressions is provided, with acknowledgment of the challenges faced in formatting mathematical expressions.
  • Participants reflect on their educational backgrounds and the implications of missing certain foundational courses in analysis and statistics.

Areas of Agreement / Disagreement

Participants express differing views on the correctness of the initial CDF identity and the implications of using indicator functions. There is no consensus on the interpretation of the identity or the conditions under which it holds, indicating ongoing debate and exploration of the topic.

Contextual Notes

Some participants note the limitations in their understanding of the properties of integrals and indicator functions, suggesting a need for further exploration of these concepts in the context of CDFs and joint probabilities.

pluviosilla
I ran across this identity in some actuarial literature:

[tex]Pr( (x_1 \le X \le x_2) \ \cap \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)[/tex]

First of all, I am not certain this is correct. I think the expression on the LHS is equal to the following double integral, which is by no means obviously equal to the CDF expression on the RHS:

[tex]Pr( (x_1 \le X \le x_2) \cap (y_1 \le Y \le y_2) ) = \int_{x_1 }^{x_2}\int_{y_1}^{y_2}f(x,y)dydx[/tex]

I suspect that maybe the author intended to use the OR condition in the expression on the left. Did he mean to say this?

[tex]Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) ) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)[/tex]

Either way, I would like to see the derivation. Any help would be much appreciated.

Thanks!
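For what it's worth, the AND (intersection) version of the identity can be sanity-checked numerically. A minimal sketch in Python, assuming two independent Uniform(0,1) variables so that F(x, y) = xy on [0, 1]² (an illustrative choice, not from the original post):

```python
# Sanity check of the rectangle identity for independent Uniform(0,1)
# variables, where the joint CDF is F(x, y) = x * y on [0, 1]^2.
def F(x, y):
    return x * y

x1, x2, y1, y2 = 0.2, 0.7, 0.1, 0.6

# Right-hand side of the identity: the four-term CDF expression
rhs = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

# Left-hand side: Pr(x1 <= X <= x2 AND y1 <= Y <= y2), which for
# independent uniforms is just the area of the rectangle
lhs = (x2 - x1) * (y2 - y1)

print(abs(lhs - rhs) < 1e-12)  # True: the AND version checks out
```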
 
That's (almost) right: the only discrepancy is whether the interval endpoints are included, which makes no difference for continuous distributions. If I(S) is the indicator function of a set S then

[tex] I(x_1\le X<x_2,\ y_1\le Y<y_2) = \left(I(X<x_2)-I(X<x_1)\right)\left(I(Y<y_2)-I(Y<y_1)\right)[/tex]

[tex] =I(X<x_2,Y<y_2)+I(X<x_1,Y<y_1)-I(X<x_1,Y<y_2)-I(X<x_2,Y<y_1)[/tex]
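The expansion can be spot-checked exhaustively on sample points below, inside, and above each interval. A minimal sketch, with `ind` as a hypothetical helper standing in for the indicator function:

```python
# Exhaustive spot-check of the indicator product expansion; `ind` is a
# hypothetical helper returning 1 if the condition holds, else 0.
def ind(cond):
    return 1 if cond else 0

x1, x2, y1, y2 = 0.0, 1.0, 0.0, 1.0
ok = True
for X in (-0.5, 0.3, 1.5):       # below, inside, above [x1, x2)
    for Y in (-0.5, 0.3, 1.5):   # below, inside, above [y1, y2)
        joint = ind(x1 <= X < x2 and y1 <= Y < y2)
        product = (ind(X < x2) - ind(X < x1)) * (ind(Y < y2) - ind(Y < y1))
        ok = ok and (joint == product)

print(ok)  # True
```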
 
Fascinating! I'll read up on the indicator function, because it looks like something I could use to find shortcuts! :-)

A couple of questions (if you have time):
(1.) Does the equation you posted require X & Y to be independent RVs?
(2.) What do you get when the two parentheses are ORed instead of ANDed?

[tex]Pr( (x_1 \le X \le x_2) \ \cup \ (y_1 \le Y \le y_2) )[/tex]
 
pluviosilla said:
(1.) Does the equation you posted require X & Y to be independent RVs?
(2.) What do you get when the two parentheses are ORed instead of ANDed?

(1) No.
(2) It's a little messier, but you can use I(A ∪ B) = I(A) + I(B) - I(A)I(B).

The "Basic properties" section of the Wikipedia article on the indicator function (http://en.wikipedia.org/wiki/Indicator_function) should also help answer your questions.
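The union identity mentioned above can be verified over all four truth combinations; a minimal sketch:

```python
# Check I(A or B) == I(A) + I(B) - I(A) * I(B) for every truth combination.
def ind(cond):
    return 1 if cond else 0

ok = all(
    ind(a or b) == ind(a) + ind(b) - ind(a) * ind(b)
    for a in (False, True)
    for b in (False, True)
)
print(ok)  # True
```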
 
I read the Wikipedia article which provides the identities you used above. In particular,

[tex]I_{A \cap B} = I_A \cdot I_B[/tex]

Are you saying that we can extrapolate this relationship to all functions of an intersection? If so, how would you prove that?

It is true that a CDF is, in a sense, a function of an intersection:

F(x, y) = P(X < x AND Y < y)

But it is not generally true that F(x, y) = F(x) * F(y). This identity only works when X & Y are independent.

No doubt, people familiar with the indicator function will quickly see how it applies to a multivariate CDF, but I am having trouble filling in the gaps.

Do you know where I can find some good proofs that use the indicator function to explore the properties of CDFs? I skimmed through a basic Probability text (Sheldon Ross) and found interesting applications (notably, a proof that [tex]E[I_A] = P(A)[/tex]). But I found nothing of relevance to this discussion.
 
Try this.
[tex] \begin{align*}
\Pr( (x_1 \le X \le x_2) \ \cap \ (y_1 \le Y \le y_2) ) & = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x,y)\,dy\,dx\\
& = \int_{x_1}^{x_2} \left(\int_{-\infty}^{y_2} f(x,y)\,dy - \int_{-\infty}^{y_1} f(x,y)\,dy\right) dx\\
& = \int_{x_1}^{x_2} \int_{-\infty}^{y_2} f(x,y)\,dy\,dx - \int_{x_1}^{x_2} \int_{-\infty}^{y_1} f(x,y)\,dy\,dx\\
& = \int_{-\infty}^{x_2} \int_{-\infty}^{y_2} f(x,y)\,dy\,dx - \int_{-\infty}^{x_2} \int_{-\infty}^{y_1} f(x,y)\,dy\,dx\\
& \quad - \int_{-\infty}^{x_1} \int_{-\infty}^{y_2} f(x,y)\,dy\,dx + \int_{-\infty}^{x_1} \int_{-\infty}^{y_1} f(x,y)\,dy\,dx\\
& = F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1)
\end{align*}[/tex]
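The derivation can also be checked numerically. A sketch assuming independent Exp(1) variables (an illustrative choice, not from the thread), so f(x, y) = e^(-x-y) and F(x, y) = (1 - e^(-x))(1 - e^(-y)), comparing a midpoint-rule double integral against the four-term CDF expression:

```python
import math

# Illustrative choice: independent Exp(1) variables, so the joint density
# is f(x, y) = exp(-x - y) and the joint CDF factorizes.
f = lambda x, y: math.exp(-x - y)
F = lambda x, y: (1 - math.exp(-x)) * (1 - math.exp(-y))

x1, x2, y1, y2 = 0.5, 2.0, 0.3, 1.5

# Midpoint-rule approximation of the double integral over the rectangle
n = 400
hx, hy = (x2 - x1) / n, (y2 - y1) / n
integral = sum(
    f(x1 + (i + 0.5) * hx, y1 + (j + 0.5) * hy) * hx * hy
    for i in range(n)
    for j in range(n)
)

# Four-term CDF expression
cdf_expr = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

print(abs(integral - cdf_expr) < 1e-4)  # the two sides agree numerically
```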
 
Yes, of course. This approach should have been obvious, but I didn't think of it. Thanks!

Do you understand how to prove the identity using the indicator function?
 
Sorry it took some time to get all the integrals correct; the LaTeX here was giving me fits (not updating, not parsing all the code, giving "LaTeX image not valid" messages). It just isn't my day, I guess. The work with indicators is similar:

[tex] \begin{align*}
I(x_1 \le X \le x_2,\ y_1 \le Y \le y_2) & = I(x_1 \le X \le x_2) \cdot I(y_1 \le Y \le y_2)\\
& = I(x_1 \le X \le x_2) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right) \\
& = \left(I(X \le x_2) - I(X \le x_1)\right) \cdot \left(I(Y \le y_2) - I(Y \le y_1)\right)
\end{align*}[/tex]

Multiply these out and then integrate the entire shebang w.r.t. [tex]dF(x,y) = f(x,y) \,dxdy[/tex]
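The "multiply out and integrate" step can be illustrated by Monte Carlo, since integrating against dF(x, y) is just taking an expectation. A sketch assuming independent Uniform(0,1) variables (an illustrative choice):

```python
import random

# Monte Carlo illustration: integrating the expanded indicator product
# against dF(x, y) is an expectation, so a sample average of the product
# should approach the rectangle probability. Illustrative choice:
# independent Uniform(0,1) variables.
random.seed(0)

def ind(cond):
    return 1 if cond else 0

x1, x2, y1, y2 = 0.2, 0.7, 0.1, 0.6
N = 200_000

total = 0
for _ in range(N):
    X, Y = random.random(), random.random()
    total += (ind(X <= x2) - ind(X <= x1)) * (ind(Y <= y2) - ind(Y <= y1))

estimate = total / N
exact = (x2 - x1) * (y2 - y1)  # 0.25 for uniforms
print(abs(estimate - exact) < 0.01)  # True to within Monte Carlo error
```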
 
I see! You might say the indicator function takes the place of the integration limits. That's powerful!

It was your last statement (Integrate PDF * Expression with Indicator Function) that made this click for me. I'm a self-taught statistician, so I've got these annoying gaps in my training. This thread is the first I'd ever heard of the indicator function, but it clearly has some very useful properties.

Thanks very much to both you and gel.
 
pluviosilla said:
I see! You might say the indicator function takes the place of the integration limits. That's powerful!

Indeed. It works because integration is a linear function of the integrand.
Rather than writing out the integral in full, it's often useful to use the standard notation E(Z) for the expected value of a random variable Z. Then, for any event S, the indicator function I(S) is a random variable taking the values 0 and 1, and
[tex] E(I(S)) = P(S)[/tex].
Writing probabilities in terms of expected values in this way is often handy for rearranging expressions such as the one you were asking about.
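E(I(S)) = P(S) is easy to see by simulation; a minimal sketch with S = {X ≤ 0.5} for X ~ Uniform(0,1) (an illustrative choice):

```python
import random

# Simulation of E(I(S)) = P(S) with S = {X <= 0.5} and X ~ Uniform(0,1).
random.seed(1)
samples = [random.random() for _ in range(100_000)]

# The sample mean of the indicator estimates the expectation E(I(S))
mean_indicator = sum(1 for x in samples if x <= 0.5) / len(samples)

print(abs(mean_indicator - 0.5) < 0.01)  # close to P(S) = 0.5
```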
 
Where are you taught that an integral is actually just a linear function of the integrand? Analysis class, perhaps?

I somehow managed to get a bachelor's degree in physics without a single course in analysis, chemistry or - alas - statistics. In the interview I had with the department chairman before graduating he said, "I admit that you have fulfilled all the requirements even though you are missing these courses, but I have to ask: how on Earth did you do it?" At the time, I thought I was clever to avoid these courses. Now, I just feel like a moron.
 
pluviosilla said:
Where are you taught that an integral is actually just a linear function of the integrand?

If you have seen that

[tex] \int_a^b \left(c \cdot f(x) + d \cdot g(x) \right) \, dx = c \int_a^b f(x) \, dx + d \int_a^b g(x) \, dx[/tex]

then you've seen the property in question. Proving that it holds requires a class in which the properties of Riemann integration are developed; that may be an advanced calculus class or a first analysis class (with more generality in the latter). I saw it in advanced calculus as a junior and again in a mathematical statistics class as a senior.
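Linearity can also be observed numerically: any fixed quadrature rule is itself linear in the integrand. A sketch using a simple trapezoid rule (`trapz` is a hypothetical helper, not a library function):

```python
# Trapezoid-rule quadrature (`trapz` is a hypothetical helper) used to
# observe that integration is linear in the integrand: the same grid
# applied to c*f + d*g gives c*(integral of f) + d*(integral of g).
def trapz(func, a, b, n=1000):
    h = (b - a) / n
    interior = sum(func(a + i * h) for i in range(1, n))
    return h * (interior + 0.5 * (func(a) + func(b)))

f = lambda x: x ** 2
g = lambda x: x ** 3
c, d, a, b = 2.0, -1.0, 0.0, 1.0

lhs = trapz(lambda x: c * f(x) + d * g(x), a, b)
rhs = c * trapz(f, a, b) + d * trapz(g, a, b)

print(abs(lhs - rhs) < 1e-12)  # linearity holds (up to rounding)
```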
 
