CDF and PDF of order statistics

In summary, the problem is to find the CDF and PDF of the sum of the largest two of K i.i.d. exponentially distributed random variables with unit mean. One suggestion is to obtain the distributions of the largest and second-largest values from order statistics and then combine them using MGFs or convolution. Both of those tools require independence, though, and the two largest values are not independent. Another approach conditions on the highest value and uses the conditional CDF of the second highest to obtain the CDF of the sum of the highest two.
  • #1
EngWiPy
Hi,

I have K i.i.d. exponentially distributed random variables with mean unity. I need to find the CDF and PDF of the summation of the largest two random variables. How can I do that? The problem in this case is that the combinations are not independent.

Thanks in advance
 
  • #2
EDIT: Oops, sorry, when I said it was clear that the [itex]X_i+X_j[/itex] are iid, I was mistaken. I'm not sure that they're not independent, but if they are independent, it certainly isn't clear.

--------

The CDF of this random variable, let's call it [itex]X[/itex], is given by:

[tex]a \mapsto P(X < a)[/tex]

What's the probability that the sum of the largest two of your exponentials is less than [itex]a[/itex]? Well, the sum of the largest two is less than [itex]a[/itex] iff the sum of every two is less than [itex]a[/itex]. In other words, if we let [itex]X_1, \dots, X_K[/itex] be your exponentials, then:

[tex]P(X < a) = P(X_1 + X_2 < a\mbox{ and }X_1 + X_3 < a\mbox{ and } \dots \mbox{ and }X_{K-1} + X_K < a)[/tex]

Since the [itex]X_i[/itex] are iid, [STRIKE]it's not hard to see that the [itex]X_i+X_j[/itex] are iid[/STRIKE]. As such, we can rewrite the above:

[tex]P(X < a) = \prod_{1\leq i<j\leq K}P(X_i + X_j < a) = P(X_1 + X_2 < a)^{{K\choose 2}}[/tex]

[itex]a \mapsto P(X_1 + X_2 < a)[/itex] is the CDF of a [itex]\Gamma (2,1)[/itex] distributed random variable.
 
  • #3
AKG said:
EDIT: Oops, sorry, when I said it was clear that the [itex]X_i+X_j[/itex] are iid, I was mistaken. I'm not sure that they're not independent, but if they are independent, it certainly isn't clear.

--------

The CDF of this random variable, let's call it [itex]X[/itex], is given by:

[tex]a \mapsto P(X < a)[/tex]

What's the probability that the sum of the largest two of your exponentials is less than [itex]a[/itex]? Well, the sum of the largest two is less than [itex]a[/itex] iff the sum of every two is less than [itex]a[/itex]. In other words, if we let [itex]X_1, \dots, X_K[/itex] be your exponentials, then:

[tex]P(X < a) = P(X_1 + X_2 < a\mbox{ and }X_1 + X_3 < a\mbox{ and } \dots \mbox{ and }X_{K-1} + X_K < a)[/tex]

Since the [itex]X_i[/itex] are iid, [STRIKE]it's not hard to see that the [itex]X_i+X_j[/itex] are iid[/STRIKE]. As such, we can rewrite the above:

[tex]P(X < a) = \prod_{1\leq i<j\leq K}P(X_i + X_j < a) = P(X_1 + X_2 < a)^{{K\choose 2}}[/tex]

[itex]a \mapsto P(X_1 + X_2 < a)[/itex] is the CDF of a [itex]\Gamma (2,1)[/itex] distributed random variable.

Interesting! But are the events [itex]X_1+X_2<a[/itex] and [itex]X_1+X_3<a[/itex] independent?
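
A quick simulation suggests they are not. The sketch below is only an illustration (the function name, the choice a = 2, and the trial count are assumptions, not from the thread): it estimates the joint probability of the two events and compares it with the product of their individual probabilities, which is what independence would require.

```python
import math
import random

def joint_vs_product(a=2.0, trials=200_000, seed=0):
    """Estimate P(X1+X2<a and X1+X3<a) and P(X1+X2<a)*P(X1+X3<a) by simulation."""
    rng = random.Random(seed)
    both = first = second = 0
    for _ in range(trials):
        # Three independent unit-mean exponentials.
        x1 = rng.expovariate(1.0)
        x2 = rng.expovariate(1.0)
        x3 = rng.expovariate(1.0)
        e1 = x1 + x2 < a
        e2 = x1 + x3 < a
        first += e1
        second += e2
        both += e1 and e2
    return both / trials, (first / trials) * (second / trials)

joint, product = joint_vs_product()
print(f"joint = {joint:.3f}, product = {product:.3f}")
```

For a = 2 the exact values work out to [itex]1 - 4e^{-2} - e^{-4} \approx 0.440[/itex] for the joint probability and [itex](1 - 3e^{-2})^2 \approx 0.353[/itex] for the product. The two events share [itex]X_1[/itex], so they are positively dependent and the product formula underestimates the joint probability.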
 
  • #4
S_David said:
Hi,

I have K i.i.d. exponentially distributed random variables with mean unity. I need to find the CDF and PDF of the summation of the largest two random variables. How can I do that? The problem in this case is that the combinations are not independent.

Thanks in advance

Hey S_David.

The first thing is to get the distributions of the largest and second-largest variables. This can be done with order statistics.

Once you have these distributions, then if they are of the same type you can use MGFs to find the distribution of their sum (usually this is a good idea, because in many situations adding two i.i.d. variables gives the same type of distribution with different parameters).

If the type is complex, use the convolution theorem to get the CDF, and hence the PDF, of the sum of the two variables. These variables have to be independent, but not necessarily identically distributed.

For the convolution algorithm in more depth:

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter7.pdf

The formula for an order statistic given a known PDF can be found here:

http://www.encyclopediaofmath.org/index.php/Order_statistic
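
For the unit-mean exponential in this thread, [itex]F(x) = 1 - e^{-x}[/itex], the standard order-statistic results can be written out as a sketch. The notation [itex]X_{(K)}[/itex] for the largest and [itex]X_{(K-1)}[/itex] for the second-largest value is introduced here for illustration and is not taken from the thread:

[tex]P(X_{(K)} \le x) = (1-e^{-x})^{K}[/tex]

[tex]P(X_{(K-1)} \le x) = K(1-e^{-x})^{K-1} - (K-1)(1-e^{-x})^{K}[/tex]

[tex]f_{X_{(K-1)},X_{(K)}}(u,v) = K(K-1)\,(1-e^{-u})^{K-2}\,e^{-u}\,e^{-v}, \qquad 0 \le u \le v[/tex]

The joint density is supported only on [itex]u \le v[/itex], so the two order statistics are dependent. That means MGFs or a convolution of the two marginals would not apply directly; the sum would instead have to come from integrating the joint density over [itex]u + v \le x[/itex].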
 
  • #5
CDF of highest of n = [itex]F^n(x)[/itex]
2nd highest < x if either highest < x or exactly one (any of the n) > x:
CDF of 2nd highest = [itex]F^n(x) + n\,(1-F(x))\,F^{n-1}(x)[/itex]
[itex]= n F^{n-1}(x) - (n-1) F^n(x)[/itex]
CDF of sum of highest two = [itex]\int_y \left(n F^{n-1}(x-y) - (n-1) F^n(x-y)\right) dF^n(y)[/itex]
[itex]= \int_y \left(n F^{n-1}(x-y) - (n-1) F^n(x-y)\right) n F^{n-1}(y)\, dF(y)[/itex]
 
  • #6
haruspex said:
CDF of highest of n = [itex]F^n(x)[/itex]
2nd highest < x if either highest < x or exactly one (any of the n) > x:
CDF of 2nd highest = [itex]F^n(x) + n\,(1-F(x))\,F^{n-1}(x)[/itex]
[itex]= n F^{n-1}(x) - (n-1) F^n(x)[/itex]
CDF of sum of highest two = [itex]\int_y \left(n F^{n-1}(x-y) - (n-1) F^n(x-y)\right) dF^n(y)[/itex]
[itex]= \int_y \left(n F^{n-1}(x-y) - (n-1) F^n(x-y)\right) n F^{n-1}(y)\, dF(y)[/itex]
On second thoughts, still an independence problem there.
Try 2:
P[2nd highest < x | highest = y > x] = [itex](F(x)/F(y))^{n-1}[/itex]
P[2nd highest < x | highest = y < x] = 1
CDF of sum of highest two = [itex]F^n(x/2) + \int_{y>x/2} \left(F(x-y)/F(y)\right)^{n-1} dF^n(y)[/itex]
[itex]= F^n(x/2) + n\int_{y>x/2} F(x-y)^{n-1}\, dF(y)[/itex]
Looks nicer at least.
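
As a sanity check, the last expression can be evaluated numerically for unit-mean exponentials and compared with a simulation. This is a minimal sketch, not a definitive implementation: the function names, the midpoint rule, the step count, and the test point (n = 5, x = 4) are all choices made here, not part of the thread. For nonnegative variables F(x-y) = 0 when y > x, so the integral only runs from x/2 to x.

```python
import math
import random

def F(t):
    """CDF of a unit-mean exponential."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0

def cdf_top_two_sum(x, n, steps=2000):
    """F^n(x/2) + n * integral over y in (x/2, x) of F(x-y)^(n-1) * exp(-y) dy."""
    if x <= 0:
        return 0.0
    lo, hi = x / 2.0, x  # F(x - y) vanishes for y > x
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += F(x - y) ** (n - 1) * math.exp(-y)
    return F(x / 2.0) ** n + n * h * total

def simulate(x, n, trials=100_000, seed=1):
    """Monte Carlo estimate of P(sum of the two largest of n draws < x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = sorted(rng.expovariate(1.0) for _ in range(n))
        hits += draws[-1] + draws[-2] < x
    return hits / trials

n, x = 5, 4.0
print(f"formula = {cdf_top_two_sum(x, n):.4f}, simulation = {simulate(x, n):.4f}")
```

For n = 2 the formula reduces to [itex]1 - e^{-x}(1 + x)[/itex], the [itex]\Gamma(2,1)[/itex] CDF, as it should, since the two largest of two variables are just both of them. Differentiating the same expression with [itex]F(x) = 1 - e^{-x}[/itex] should give the PDF

[tex]g(x) = n(n-1)\,e^{-x}\int_0^{x/2}(1-e^{-u})^{n-2}\,du[/tex]

which is [itex]x e^{-x}[/itex] for n = 2.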
 

1. What are CDF and PDF of order statistics?

When a sample of n random variables is sorted in ascending order, the k-th order statistic is the k-th smallest value. Its CDF (Cumulative Distribution Function) gives the probability that this k-th smallest value is less than or equal to a given value, and its PDF (Probability Density Function) is the corresponding density.

2. How are CDF and PDF of order statistics calculated?

For n i.i.d. variables with common CDF F, the k-th smallest value is at most x exactly when at least k of the n variables are at most x, so its CDF is the binomial tail sum [itex]\sum_{j=k}^{n} \binom{n}{j} F(x)^j (1-F(x))^{n-j}[/itex]. When F has a density, the PDF of the order statistic follows by differentiating this CDF with respect to x.
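
One standard way to compute the CDF of the k-th smallest of n i.i.d. draws is a binomial tail sum: the k-th smallest is at most x exactly when at least k of the draws are at most x. The sketch below assumes that setting; the function name `order_stat_cdf` and the evaluation point are illustrative. It checks the result against the closed forms for the highest and second-highest values quoted earlier in the thread.

```python
import math

def order_stat_cdf(F_x, k, n):
    """P(k-th smallest of n i.i.d. draws <= x), given F_x = F(x)."""
    return sum(
        math.comb(n, j) * F_x**j * (1.0 - F_x) ** (n - j)
        for j in range(k, n + 1)
    )

n = 5
F_x = 1.0 - math.exp(-1.5)  # unit-mean exponential CDF at x = 1.5

# Highest of n: should equal F^n(x).
print(order_stat_cdf(F_x, n, n), F_x**n)
# Second highest: should equal n*F^(n-1)(x) - (n-1)*F^n(x).
print(order_stat_cdf(F_x, n - 1, n), n * F_x ** (n - 1) - (n - 1) * F_x**n)
```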

3. What is the significance of CDF and PDF of order statistics?

CDF and PDF of order statistics are useful in understanding the distribution of the ordered values in a dataset. They can help in identifying outliers, determining the probability of certain values occurring, and making statistical inferences about the data.

4. Can CDF and PDF of order statistics be used for any type of data?

The standard order-statistic formulas apply to any i.i.d. sample with a known common CDF, such as normal, exponential, or uniform. A PDF for the order statistic exists when the underlying distribution is continuous with a density; for discrete data, ties need separate handling.

5. How can CDF and PDF of order statistics be visualized?

CDF and PDF of order statistics can be visualized using graphs, such as line graphs or histogram plots. These graphs can help in understanding the distribution of the ordered values and making comparisons between different datasets.
