CDF and PDF of order statistics

  • Context: Graduate
  • Thread starter: EngWiPy
  • Tags: Cdf, Pdf, Statistics

Discussion Overview

The discussion revolves around finding the cumulative distribution function (CDF) and probability density function (PDF) of the sum of the largest two random variables from a set of K independent and identically distributed (i.i.d.) exponentially distributed random variables with mean unity. The scope includes theoretical exploration and mathematical reasoning regarding order statistics and their properties.

Discussion Character

  • Exploratory, Technical explanation, Mathematical reasoning, Debate/contested

Main Points Raised

  • One participant seeks to find the CDF and PDF of the sum of the largest two i.i.d. exponentially distributed random variables, noting that the combinations are not independent.
  • Another participant suggests that the CDF of the sum can be expressed in terms of the probabilities of sums of pairs of the random variables, but questions the independence of those sums.
  • There is a mention of using order statistics to derive the distributions for the largest and second largest variables, with a suggestion to use moment generating functions (MGFs) or convolution for the sum.
  • Several mathematical expressions are provided for the CDF of the highest and second highest variables, along with integrals for the sum of the highest two, indicating a complex relationship between the variables.
  • One participant expresses uncertainty about the independence of events related to the sums of the random variables.
  • Another participant revisits the independence issue and proposes a different approach to calculate the CDF of the sum of the highest two variables, indicating that the problem remains challenging.

Areas of Agreement / Disagreement

Participants express uncertainty regarding the independence of the sums of the random variables, and there is no consensus on the best approach to derive the CDF and PDF of the sum of the largest two variables. Multiple competing views and methods are presented without resolution.

Contextual Notes

Participants highlight the complexity of the problem, particularly regarding the independence of the random variables involved in the sums and the need for careful application of order statistics and convolution methods. The discussion includes various mathematical formulations that may depend on specific assumptions about the distributions.

EngWiPy
Hi,

I have K i.i.d. exponentially distributed random variables with mean unity. I need to find the CDF and PDF of the summation of the largest two random variables. How can I do that? The problem in this case is that the combinations are not independent.

Thanks in advance
 
EDIT: Oops, sorry, I was mistaken when I said it was clear that the $X_i + X_j$ are iid. I'm not sure that they're not independent, but if they are independent, it certainly isn't clear.

--------

The CDF of this random variable, let's call it $X$, is given by:

$$a \mapsto P(X < a)$$

What's the probability that the sum of the largest two of your exponentials is less than $a$? Well, the sum of the largest two is less than $a$ iff the sum of every two is less than $a$. In other words, if we let $X_1, \dots, X_K$ be your exponentials, then:

$$P(X < a) = P(X_1 + X_2 < a \text{ and } X_1 + X_3 < a \text{ and } \dots \text{ and } X_{K-1} + X_K < a)$$

Since the $X_i$ are iid, [struck out: it's not hard to see that the $X_i + X_j$ are iid]. If they were, we could rewrite the above:

$$P(X < a) = \prod_{1 \leq i < j \leq K} P(X_i + X_j < a) = P(X_1 + X_2 < a)^{\binom{K}{2}}$$

$a \mapsto P(X_1 + X_2 < a)$ is the CDF of a $\Gamma(2,1)$ distributed random variable.
 
AKG said:
$$P(X < a) = \prod_{1 \leq i < j \leq K} P(X_i + X_j < a) = P(X_1 + X_2 < a)^{\binom{K}{2}}$$

Interesting! But are the events $X_1 + X_2 < a$ and $X_1 + X_3 < a$ independent?
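A quick numerical check suggests the answer is no: $X_1 + X_2$ and $X_1 + X_3$ share $X_1$, so their covariance is $\mathrm{Var}(X_1) = 1$ rather than 0. The sketch below is not from the thread. It assumes NumPy is available, and the function names `gamma21_cdf`, `product_formula` and `simulate` are made up for illustration. It compares a Monte Carlo estimate of $P(X < a)$ with the product formula for $K = 3$, $a = 3$.

```python
import math
import numpy as np

def gamma21_cdf(a):
    """CDF of X_1 + X_2 for two independent Exp(1) variables, i.e. Gamma(2,1)."""
    return 1.0 - math.exp(-a) * (1.0 + a)

def product_formula(a, K):
    """The product formula P(X_1 + X_2 < a) ** C(K, 2), which assumes independent pair sums."""
    return gamma21_cdf(a) ** math.comb(K, 2)

def simulate(a, K, n_samples=200_000, seed=0):
    """Monte Carlo estimates of P(sum of two largest < a) and Cov(X_1+X_2, X_1+X_3).

    Requires K >= 3 so that the pair sums X_1+X_2 and X_1+X_3 both exist.
    """
    rng = np.random.default_rng(seed)
    x = rng.exponential(1.0, size=(n_samples, K))
    # Sort each row and add its two largest entries.
    top_two = np.sort(x, axis=1)[:, -2:].sum(axis=1)
    p_top_two = float(np.mean(top_two < a))
    # Sample covariance of two pair sums that share X_1.
    cov = float(np.cov(x[:, 0] + x[:, 1], x[:, 0] + x[:, 2])[0, 1])
    return p_top_two, cov

p_mc, cov_mc = simulate(a=3.0, K=3)
print("Monte Carlo P(X < 3):", round(p_mc, 3))
print("product formula:     ", round(product_formula(3.0, 3), 3))
print("Cov(X1+X2, X1+X3):   ", round(cov_mc, 3))
```

If the pair sums were independent, the first two printed numbers would agree up to sampling noise and the covariance would be near zero.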
 
S_David said:
I have K i.i.d. exponentially distributed random variables with mean unity. I need to find the CDF and PDF of the summation of the largest two random variables.

Hey S_David.

The first thing is to get the distributions of the largest and second-largest variables. This can be done with order statistics.

Once you have these distributions, if they are of the same type you can use MGFs to find the distribution of their sum (usually this is a good idea, because in many situations adding two i.i.d. variables gives the same distribution with different parameters).

If the type is complex, use the convolution theorem to get the CDF and hence the PDF of the sum of the two variables. These variables have to be independent, but not necessarily identically distributed.

For the convolution algorithm in more depth:

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter7.pdf

Formula for order statistic given a known PDF can be found here:

http://www.encyclopediaofmath.org/index.php/Order_statistic
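One caveat is that the largest and second-largest values are not independent of each other, so convolving their marginal distributions does not apply directly. For exponentials there is a standard way around this that is not mentioned in the thread: Rényi's representation writes the order statistics as sums of independent exponential spacings, which gives $X_{(K)} + X_{(K-1)} \overset{d}{=} E_1 + 2\sum_{i=2}^{K} E_i/i$ with $E_i$ i.i.d. Exp(1). The terms are independent, so the MGF is the product $\frac{1}{1-t}\prod_{i=2}^{K}\frac{1}{1-2t/i}$ for $t < 1$. The sketch below assumes NumPy, uses hypothetical function names, and checks the representation by simulation.

```python
import numpy as np

def top_two_sum_direct(K, n_samples, rng):
    """Draw K i.i.d. Exp(1) variables per row and add the two largest."""
    x = rng.exponential(1.0, size=(n_samples, K))
    return np.sort(x, axis=1)[:, -2:].sum(axis=1)

def top_two_sum_renyi(K, n_samples, rng):
    """Sample E_1 + 2 * sum_{i=2}^{K} E_i / i with E_i i.i.d. Exp(1)."""
    e = rng.exponential(1.0, size=(n_samples, K))
    weights = 2.0 / np.arange(1, K + 1)  # 2/i for i = 1..K
    weights[0] = 1.0                     # the i = 1 term has weight 1, not 2
    return e @ weights

def exact_mean(K):
    """Mean implied by the representation: 1 + 2 * (1/2 + ... + 1/K)."""
    return 1.0 + 2.0 * sum(1.0 / i for i in range(2, K + 1))

rng = np.random.default_rng(0)
K = 5
direct = top_two_sum_direct(K, 200_000, rng)
renyi = top_two_sum_renyi(K, 200_000, rng)
print("exact mean:       ", round(exact_mean(K), 4))
print("direct simulation:", round(float(direct.mean()), 4))
print("Renyi simulation: ", round(float(renyi.mean()), 4))
```

Because the representation is a sum of independent exponentials with rates $1, 1, 3/2, \dots, K/2$, the PDF could in principle be obtained by partial fractions of the MGF, taking care with the repeated rate 1.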
 
CDF of highest of $n$: $F^n(x)$
2nd highest $< x$ if either highest $< x$ or exactly one (any of the $n$) $> x$:
$$\text{CDF of 2nd highest} = F^n(x) + (1 - F(x))\, n\, F^{n-1}(x) = n F^{n-1}(x) - (n-1) F^n(x)$$
$$\text{CDF of sum of highest two} = \int_y \left( n F^{n-1}(x-y) - (n-1) F^n(x-y) \right) dF^n(y)$$
$$= \int_y \left( n F^{n-1}(x-y) - (n-1) F^n(x-y) \right) n F^{n-1}(y)\, dF(y)$$
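The two marginal CDFs in this post can be checked by simulation. The sketch below assumes NumPy and Exp(1) variables, so $F(x) = 1 - e^{-x}$, and the function names are hypothetical. It only tests the marginals of the highest and second-highest values. It does not test the convolution integral for the sum, which treats the two as independent.

```python
import numpy as np

def cdf_highest(F, n):
    """CDF of the largest of n i.i.d. variables, given F = F(x)."""
    return F ** n

def cdf_second_highest(F, n):
    """CDF of the second largest: n F^(n-1) - (n-1) F^n."""
    return n * F ** (n - 1) - (n - 1) * F ** n

def empirical_cdfs(x, n, n_samples=200_000, seed=0):
    """Empirical P(highest < x) and P(second highest < x) for Exp(1) samples."""
    rng = np.random.default_rng(seed)
    s = np.sort(rng.exponential(1.0, size=(n_samples, n)), axis=1)
    return float(np.mean(s[:, -1] < x)), float(np.mean(s[:, -2] < x))

n, x = 5, 1.5
F = 1.0 - np.exp(-x)  # Exp(1) CDF evaluated at x
emp_hi, emp_2nd = empirical_cdfs(x, n)
print("highest:     formula", round(float(cdf_highest(F, n)), 4), "empirical", round(emp_hi, 4))
print("2nd highest: formula", round(float(cdf_second_highest(F, n)), 4), "empirical", round(emp_2nd, 4))
```

For $n = 2$ the second highest is the minimum, and the formula reduces to $2F - F^2 = 1 - (1-F)^2$ as expected.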
 
haruspex said:
$$\text{CDF of sum of highest two} = \int_y \left( n F^{n-1}(x-y) - (n-1) F^n(x-y) \right) n F^{n-1}(y)\, dF(y)$$
On second thoughts, still an independence problem there.
Try 2:
$$P[\text{2nd highest} < x \mid \text{highest} = y > x] = \left( \frac{F(x)}{F(y)} \right)^{n-1}$$
$$P[\text{2nd highest} < x \mid \text{highest} = y < x] = 1$$
$$\text{CDF of sum of highest two} = F^n(x/2) + \int_{y > x/2} \left( \frac{F(x-y)}{F(y)} \right)^{n-1} dF^n(y)$$
$$= F^n(x/2) + n \int_{y > x/2} F(x-y)^{n-1}\, dF(y)$$
Looks nicer at least.
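For the Exp(1) case in the original question, "Try 2" can be evaluated numerically. The sketch below assumes NumPy and $F(t) = 1 - e^{-t}$, uses a plain trapezoid rule, and the function names are hypothetical. The upper limit of the integral is $x$ because $F(x - y) = 0$ for $y > x$. The PDF shown is an added step, not from the thread: differentiating the CDF in $x$ makes the boundary terms cancel and leaves $n(n-1)\int_{x/2}^{x} F(x-y)^{n-2} f(x-y) f(y)\,dy$.

```python
import math
import numpy as np

def _trapezoid(values, grid):
    """Plain trapezoid rule on a 1-D grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def top_two_sum_cdf(x, n, n_grid=20_001):
    """F^n(x/2) + n * int_{x/2}^{x} F(x-y)^(n-1) f(y) dy for Exp(1)."""
    if x <= 0.0:
        return 0.0
    y = np.linspace(x / 2.0, x, n_grid)
    integrand = (1.0 - np.exp(-(x - y))) ** (n - 1) * np.exp(-y)
    return float((1.0 - math.exp(-x / 2.0)) ** n + n * _trapezoid(integrand, y))

def top_two_sum_pdf(x, n, n_grid=20_001):
    """n (n-1) * int_{x/2}^{x} F(x-y)^(n-2) f(x-y) f(y) dy for Exp(1), n >= 2."""
    if x <= 0.0:
        return 0.0
    y = np.linspace(x / 2.0, x, n_grid)
    # f(x-y) * f(y) = exp(-(x-y)) * exp(-y) = exp(-x) for Exp(1).
    integrand = (1.0 - np.exp(-(x - y))) ** (n - 2) * math.exp(-x) * np.ones_like(y)
    return float(n * (n - 1) * _trapezoid(integrand, y))

# Sanity check: with n = 2 the "highest two" are both variables, so the sum is Gamma(2,1).
print("n=2 CDF at 3:", top_two_sum_cdf(3.0, 2), "Gamma(2,1):", 1 - 4 * math.exp(-3))
print("n=2 PDF at 3:", top_two_sum_pdf(3.0, 2), "Gamma(2,1):", 3 * math.exp(-3))
print("n=3 CDF at 3:", top_two_sum_cdf(3.0, 3))
```

The $n = 2$ case recovering the $\Gamma(2,1)$ CDF $1 - e^{-x}(1 + x)$ and PDF $x e^{-x}$ is a useful consistency check on the formula.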
 