CDF and PDF of order statistics

EngWiPy · May 28, 2012

Hi,

I have K i.i.d. exponentially distributed random variables with unit mean. I need to find the CDF and PDF of the sum of the two largest. How can I do that? The problem is that the pairwise sums are not independent.

Thanks in advance

AKG · May 28, 2012

EDIT: Oops, sorry, I was mistaken when I said the X_i+X_j are clearly iid. I'm not sure that they're not independent, but if they are independent, it certainly isn't clear.

--------

The CDF of this random variable, let's call it X, is given by:

a \mapsto P(X < a)

What's the probability that the sum of the largest two of your exponentials is less than a? Well, the sum of the largest two is less than a iff the sum of every two is less than a. In other words, if we let X_1, \dots, X_K be your exponentials, then:

P(X < a) = P(X_1 + X_2 < a\mbox{ and }X_1 + X_3 < a\mbox{ and } \dots \mbox{ and }X_{K-1} + X_K < a)

Since the X_i are iid, [STRIKE]it's not hard to see that the X_i+X_j are iid[/STRIKE]. As such, we can rewrite the above:

P(X < a) = \prod_{1\leq i<j\leq K}P(X_i + X_j < a) = P(X_1 + X_2 < a)^{{K\choose 2}}

a \mapsto P(X_1 + X_2 < a) is the CDF of a \Gamma (2,1) distributed random variable.

EngWiPy · May 29, 2012

Interesting! But are the events X_1+X_2<a and X_1+X_3<a independent?
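One way to probe this question is a quick simulation. The sketch below is only an illustration, not part of the original posts: it estimates P(X_1+X_2<a and X_1+X_3<a) for unit-mean exponentials and compares it with P(X_1+X_2<a)^2, the value the joint probability would take if the two events were independent. The function names, the trial count, and the choice a = 2 are assumptions made for the example.

```python
import math
import random

def gamma2_cdf(a):
    """CDF of Gamma(2, 1): the sum of two independent unit-mean exponentials."""
    return 1.0 - math.exp(-a) * (1.0 + a)

def joint_vs_product(a, trials=200_000, seed=0):
    """Return (estimated joint probability, product expected under independence)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        x1 = rng.expovariate(1.0)
        x2 = rng.expovariate(1.0)
        x3 = rng.expovariate(1.0)
        # Both events share X_1, which is what couples them.
        if x1 + x2 < a and x1 + x3 < a:
            hits += 1
    return hits / trials, gamma2_cdf(a) ** 2

joint, product = joint_vs_product(2.0)
print(f"joint estimate = {joint:.4f}, product under independence = {product:.4f}")
```

If the two printed numbers differ by much more than the Monte Carlo noise, the events are not independent and the product formula cannot be used as written.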

chiro · May 29, 2012

Hey S_David.

The first thing is to get the distributions of the largest and second-largest values. This can be done with order statistics.

Once you have these distributions, then if they are of the same type you can use MGFs to find the distribution of their sum (usually this is a good idea, because in many situations adding two i.i.d. variables gives the same type of distribution with different parameters).

If the type is complex, use the convolution theorem to get the CDF, and hence the PDF, of the sum of the two variables. These variables have to be independent, but not necessarily identically distributed.

For the convolution algorithm in more depth:

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter7.pdf

Formula for order statistic given a known PDF can be found here:

http://www.encyclopediaofmath.org/index.php/Order_statistic
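Since the convolution route needs the two variables to be independent, it may be worth checking that condition for the largest and second-largest values before relying on it. The sketch below is a rough illustration under assumed names and parameters (k = 5, a fixed seed, a hand-rolled Pearson correlation): it estimates the correlation between the top two order statistics of k unit-mean exponentials.

```python
import math
import random

def top_two_correlation(k, trials=100_000, seed=0):
    """Estimate the Pearson correlation between the largest and second-largest of k draws."""
    rng = random.Random(seed)
    largest, second = [], []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(k))
        largest.append(sample[-1])
        second.append(sample[-2])
    mean_l = sum(largest) / trials
    mean_s = sum(second) / trials
    cov = sum((a - mean_l) * (b - mean_s) for a, b in zip(largest, second)) / trials
    var_l = sum((a - mean_l) ** 2 for a in largest) / trials
    var_s = sum((b - mean_s) ** 2 for b in second) / trials
    return cov / math.sqrt(var_l * var_s)

print(f"correlation of the top two (k=5): {top_two_correlation(5):.3f}")
```

A correlation clearly away from zero means the two order statistics are dependent, so a plain convolution of their marginal distributions would not give the distribution of their sum.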

haruspex · May 29, 2012

CDF of highest of n = F^n(x)
2nd highest < x if either highest < x or exactly one (any of the n) > x:
CDF of 2nd highest = F^n(x) + n(1-F(x))F^{n-1}(x)
= nF^{n-1}(x) - (n-1)F^n(x)
CDF of sum of highest two = \int_y (nF^{n-1}(x-y) - (n-1)F^n(x-y)) \, dF^n(y)
= \int_y (nF^{n-1}(x-y) - (n-1)F^n(x-y)) \, nF^{n-1}(y) \, dF(y)
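The closed form for the CDF of the 2nd highest can be sanity-checked by simulation. The sketch below is a minimal check, assuming unit-mean exponentials so that F(x) = 1 - e^{-x}; the function names, n = 5 and x = 1.5 are illustrative choices. It only tests the second-highest CDF, not the integral for the sum.

```python
import math
import random

def second_highest_cdf(x, n):
    """n*F^(n-1)(x) - (n-1)*F^n(x) with F the unit-mean exponential CDF."""
    F = 1.0 - math.exp(-x)
    return n * F ** (n - 1) - (n - 1) * F ** n

def simulate_second_highest_cdf(x, n, trials=100_000, seed=0):
    """Fraction of simulated samples whose second-largest value is below x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        if sample[-2] < x:
            hits += 1
    return hits / trials

formula = second_highest_cdf(1.5, 5)
estimate = simulate_second_highest_cdf(1.5, 5)
print(f"formula = {formula:.4f}, simulation = {estimate:.4f}")
```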

haruspex · May 29, 2012

On second thoughts, still an independence problem there.
Try 2:
P[2nd highest < x | highest = y > x] = (F(x)/F(y))^{n-1}
P[2nd highest < x | highest = y < x] = 1
CDF of sum of highest two = F^n(x/2) + \int_{y>x/2} (F(x-y)/F(y))^{n-1} \, dF^n(y)
= F^n(x/2) + n\int_{y>x/2} F(x-y)^{n-1} \, dF(y)
Looks nicer at least.
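For the unit-mean exponentials in the original question, the last expression can be evaluated numerically and compared with a simulation. The sketch below is an illustration under stated assumptions: F(y) = 1 - e^{-y}, dF(y) = e^{-y} dy, n >= 2, and the integral is cut off at y = x because F(x-y) = 0 beyond that for non-negative variables. The midpoint rule, the step count and the function names are choices made for the example.

```python
import math
import random

def top_two_sum_cdf(x, n, steps=2000):
    """F^n(x/2) + n * integral over y in (x/2, x) of F(x-y)^(n-1) * e^(-y) dy (midpoint rule)."""
    if x <= 0:
        return 0.0
    lo, hi = x / 2.0, x
    h = (hi - lo) / steps
    integral = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        integral += (1.0 - math.exp(-(x - y))) ** (n - 1) * math.exp(-y)
    integral *= h
    return (1.0 - math.exp(-lo)) ** n + n * integral

def simulate_top_two_sum_cdf(x, n, trials=100_000, seed=0):
    """Fraction of simulated samples whose two largest values sum to less than x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        if sample[-1] + sample[-2] < x:
            hits += 1
    return hits / trials

print(f"formula    = {top_two_sum_cdf(4.0, 5):.4f}")
print(f"simulation = {simulate_top_two_sum_cdf(4.0, 5):.4f}")
```

For n = 2 the sum of the highest two is just X_1 + X_2, so the formula should reproduce the \Gamma(2,1) CDF, 1 - e^{-x}(1+x), which gives a simple consistency check. Differentiating the same expression in x (the boundary terms cancel when n >= 2) suggests the PDF n(n-1)\int_{x/2}^{x} F(x-y)^{n-2} f(x-y) f(y) \, dy, where f = F'.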
