Finding CDF of Maximum of Order Statistics of Random Variables

  • Thread starter EngWiPy
  • Tags
    Statistics
In summary, the conversation discusses the variables ##\eta_k=\alpha_k/\beta_k## and the selection of the ##M## variables with the largest denominators. The resulting composite random variables are defined as ##\eta_{i_k}=\alpha_{i_k}/\beta_{(k)}##, where ##i_k## is the index corresponding to the ##k##th largest denominator. The goal is to find the CDF of the maximum of these composite variables, denoted as ##\zeta##. Using conditional probabilities and the joint PDF of the ordered ##\beta## variables, the CDF is given by an integral involving the CDF and PDF of the unordered ##\beta## variables.
  • #1
EngWiPy
Suppose that I have these random variables ##\eta_k=\alpha_k/\beta_k## for ##k=1,\,2,\,\ldots,\,K##, where ##\{\alpha_k,\,\beta_k\}## are i.i.d. random variables. Now suppose that I select the ##M\leq K## random variables whose denominators are the ##M## largest. That is, suppose that ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq \beta_{(K)}## are the ordered denominators. Then the resulting composite random variables are ##\eta_{i_k}=\alpha_{i_k}/\beta_{(k)}##, for ##k=1,\,2,\,\ldots,\,M##, where ##i_k## is the index that corresponds to the ##k##th largest random variable ##\beta_{(k)}##.

I want to find the CDF of ##\max_{k}\eta_{i_k}##, which is defined as

[tex]\text{Pr}\left[\max_{k=1,...,M}\,\eta_{i_k}\leq x\right][/tex]

How can I find it? If the random variables ##\{\eta_{i_k}\}## were independent, we could write the above probability as

[tex]\prod_{k=1}^M\text{Pr}\left[\eta_{i_k}\leq x\right][/tex]

but they are not independent because of the ordered denominator. So, how can it be found then?

Thanks in advance
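Before looking for a formula, the quantity in question can be estimated by simulation, which gives something to check any closed form against. The sketch below is only an illustration: the function names are made up here, and it assumes every ##\alpha_k## and ##\beta_k## is a unit-rate exponential, which is one possible choice and not part of the problem statement.

```python
import random

def sample_zeta(K, M, rng):
    """Draw one realization of max_k alpha_{i_k} / beta_(k) over the M largest betas."""
    # Assumption: alpha_k and beta_k are i.i.d. Exp(1); swap in other samplers as needed.
    pairs = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(K)]
    # Sort by beta, largest first, and keep the M pairs with the largest denominators.
    selected = sorted(pairs, key=lambda ab: ab[1], reverse=True)[:M]
    return max(a / b for a, b in selected)

def empirical_cdf(x, K, M, trials=200_000, seed=0):
    """Monte Carlo estimate of Pr[max_k eta_{i_k} <= x]."""
    rng = random.Random(seed)
    hits = sum(sample_zeta(K, M, rng) <= x for _ in range(trials))
    return hits / trials
```

When ##M=K## every variable is selected, the ordering plays no role, and the ratios are independent, so under the exponential assumption the estimate should land near ##\left(x/(1+x)\right)^2##. For ##M<K## the selection couples the ratios and the product form no longer applies.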
 
  • #2
The presentation is confusing because it seeks to define ##\eta_{i_k}## to be ##\alpha_{i_k} / \beta_{(k)}## but that doesn't make sense because ##\eta_{i_k}## already has a definition (given that ##i_k## is separately defined above), which is ##\alpha_{i_k} / \beta_{i_k}##.

To avoid this confusion, it is necessary to define a new set of random variables, say ##(\xi_k)_{1\le k\le M}## such that ##\xi_k=\alpha_{i_k} / \beta_{(k)}##.

Then what you are seeking is the CDF of the random variable ##\zeta \triangleq \max_{1\le k\le M} \xi_k##, which is given by:
$$F_\zeta(y) \triangleq Pr[\zeta\le y] = Pr\left[\bigwedge_{1\le k\le M}\xi_k \le y\right]$$



To calculate that you can use conditional probabilities. Provided the distributions are absolutely continuous, we can use the formula from https://en.wikipedia.org/wiki/Order...tics_of_an_absolutely_continuous_distribution

for the joint pdf of ##\beta_{(1)},...,\beta_{(M)}##. We write that as ##f_{\vec\beta}(\vec x)## where ##\vec\beta## and ##\vec x## are the vectors ##(\beta_{(1)},...,\beta_{(M)})## and ##(x_1,...,x_M)##. Then, writing ##d^M\vec x## for the incremental product ##dx_1dx_2...dx_M## and ##A## for the subset of ##\mathbb R^M## such that ##x_1\le x_2\le ...\le x_M##, we have:

\begin{align*}
F_\zeta(y)
% 1
&=Pr[\zeta \le y]
% 2
\\&= \int_A Pr[\zeta \le y\ |\ \vec\beta=\vec x]f_{\vec\beta}(\vec x)d^M\vec x
% 3
\\&=
\int_A
Pr\left[\bigwedge_{k=1}^M\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 4
\\&=
\int_A
\prod_{k=1}^M Pr\left[\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
since the ##\xi_k## are conditionally independent once ##\vec\beta## is fixed by the outer integral,
\begin{align*}
% 5
&=
\int_A
\prod_{k=1}^M Pr\left[\alpha_{i_k} \le x_ky\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 6
\\&=
\int_A
\prod_{k=1}^M F_\alpha(x_ky)
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
where we have dropped the conditionality, since the ##\alpha##s are independent of the ##\beta##s, and ##F_\alpha## is the common CDF of the ##\alpha##s. Multiplying through by ##x_k## in step 5 assumes the ##\beta##s are positive.

To take it further requires specification of the pdfs of ##\alpha## and ##\beta##. For some distributions, the integral will be simple. For others it will not be possible to do analytically and will require numeric integration.

I just noticed that in the link, the subscript (1) is used for the smallest order statistic, whereas you have used it for the largest one. So the formula from the link will need a little adaptation.
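To illustrate the final formula in the simplest case, take ##M=1##, with the convention that ##\beta_{(1)}## is the largest of the ##K## denominators and the ##\beta##s are positive. The pdf of the largest order statistic is ##K[F_\beta(x)]^{K-1}f_\beta(x)##, so, writing ##F_\alpha## for the CDF of ##\alpha##:

$$F_\zeta(y)=\int_0^\infty F_\alpha(xy)\,K\left[F_\beta(x)\right]^{K-1}f_\beta(x)\,dx$$

As an assumed example (these distributions are not part of the question so far), if ##\alpha## and ##\beta## are unit-rate exponentials then, for ##y\ge 0##, substituting ##u=e^{-x}## turns this into a Beta integral:

$$F_\zeta(y)=1-K\int_0^1u^{y}(1-u)^{K-1}\,du=1-\frac{K!}{\prod_{j=1}^K(y+j)}$$

For ##K=1## this reduces to ##y/(1+y)##, the CDF of the ratio of two independent unit-rate exponentials.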
 
  • #3
Thanks. You are right. I need to define a new random variable for ##\alpha_{i_k}/\beta_{(k)}##. I continued the analysis using the same idea yesterday, namely, using conditional independence. The CDF of ##\zeta## (according to your notation) is

[tex]\text{Pr}\left[\zeta\leq y\right]=\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right][/tex]

which can be written as (I like to detail the notations. It is more clear to me. I should write the conditions as ##\beta_{(k)}=x_k##, but to save some time I didn't do that)

[tex]\text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right]\Big|\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right] \\= \text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]\Big|\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right] \\= \int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)\,d\beta_{(1)}d\beta_{(2)}\cdots d\beta_{(M)}[/tex]

I found that the joint PDF for ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq\beta_{(M)}\geq 0## can be written as

[tex]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)=M!{K\choose M}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M f_{\beta}(\beta_{(k)})[/tex]

where ##F_{\beta}(.)## and ##f_{\beta}(.)## are the CDF and PDF of the unordered random variables ##\{\beta_k\}##. So, the CDF in question becomes

[tex]\text{Pr}\left[\zeta\leq y\right]=M!{K\choose M}\int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta}(\beta_{(k)})\,d\beta_{(k)}[/tex]

Now suppose these CDF and PDF of the unordered random variables ##\{\beta_k\}## are given by ##F_{\beta}(x)=1-e^{-x}## and ##f_{\beta}(x)=e^{-x}##. How can we proceed? I would like to find a closed form solution to the above integral. Is that possible?

Thanks
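As a quick sanity check on the joint PDF (a side calculation, taking ##K=3##, ##M=2## and the exponential ##F_{\beta}##, ##f_{\beta}## above purely as an example), it should integrate to one over ##x_1\geq x_2\geq 0##:

$$2!{3\choose 2}\int_0^\infty e^{-x_1}\int_0^{x_1}\left(1-e^{-x_2}\right)e^{-x_2}\,dx_2\,dx_1 = 6\int_0^\infty e^{-x_1}\left(\tfrac12-e^{-x_1}+\tfrac12e^{-2x_1}\right)dx_1 = 6\left(\tfrac12-\tfrac12+\tfrac16\right)=1$$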
 
  • #4
I have not checked the joint pdf you give for the ##\beta## order statistics but, taking that as accepted, here are a few observations:
  • The integration variables should not be ##\beta##s because those are random variables and integration needs to be done with respect to ordinary real variables (unless it is a Stieltjes integral but in that case the pdf(s) for the ##\beta##s should not be in the integrand). That is why I used ##x_1,...,x_M## above. Each ##\beta_{(*)}## in your final formula should be replaced by ##x_*##.
  • Given that you have written it as nested scalar integrals rather than a vector integral (per my post), the differential part at the end needs to have one entry per integral. So
    • where it is written ##d\beta_{(k)}## it should have ##dx_M dx_{M-1}...dx_1##; and
    • the integration limits are ##\int_{-\infty}^\infty \int_{-\infty}^{x_1}...\int_{-\infty}^{x_{M-1}}##
  • It looks to me like the probability inside the iterated product was supposed to be conditional on ##\beta_{(k)}=x_k##. Assuming that to be so, the item currently written as ##Pr[\zeta_k<y]## is really ##Pr[\zeta_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k/x_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y] = F_\alpha(x_ky)##. So at that point you need to introduce a cdf for ##\alpha##.
At this point there seems no reason to give up hope of an analytic solution existing. Whether it does will depend on the choice of distribution of ##\alpha##.
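Putting these observations together, and taking the joint pdf from the previous post as given (it is not re-derived here), the CDF would read as follows. This assumes the ##\beta##s are positive, so that multiplying by ##x_k## preserves the inequality:

$$F_\zeta(y)=M!{K\choose M}\int_{-\infty}^{\infty}\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_{M-1}}\left[F_\beta(x_M)\right]^{K-M}\prod_{k=1}^M F_\alpha(x_ky)\,f_\beta(x_k)\ dx_M\,dx_{M-1}\cdots dx_1$$

For positive ##\beta##s the lower limits are effectively 0, because ##f_\beta## vanishes on the negative axis.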
 
  • #5
Yes, I think your presentation is more accurate. The CDF of ##\alpha## is also exponential, i.e., ##F_{\alpha}(y)=1-e^{-y}##. The problem, I think, is how to evaluate the integrals given their limits. We cannot separate the nested integrals into a product of single integrals, because the upper limit of each one depends on the variable of the integral outside it.
 
  • #6
I can't be sure without doing the calcs but my expectation is that, because of the nice nature of the distributions of ##\alpha,\beta##, the integral can be done analytically. Look at the non-constant factors in the integrand. They are all of the form ##e^{-x_j}, e^{-x_jy},(1-e^{-x_j})## or ##(1-e^{-x_jy})##. When we expand out the integrand using the distributive rule (use the binomial theorem to expand the first factor, which is ##(1-e^{-x_M})^{K-M}##), we get a linear combination of terms, each of which has the form:

$$a\times \exp\left(-\left[
\sum_{k=1}^M x_k +
s x_M +
y \sum_{k\in A} x_k
\right]\right)$$

for some term-specific constant ##a##, term-specific integer ##s## and term-specific subset ##A## of ##\{1,2,...,M\}##.

Each term can be integrated separately and since ##y## is constant in all integrations, the integrations should be straightforward.

Note that, given the choice of distributions, the lower limit of each integral is 0.

I suggest having a go for the case ##K=3,M=2## and seeing if it comes out. My guess is that it will. If it does, it should point the way towards a formula for the general integral.
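The term-by-term integration described above can be sketched in code for the all-exponential case. Each expanded term is a constant times ##\exp(-\sum_k c_kx_k)## integrated over ##x_1\ge x_2\ge\cdots\ge x_M\ge 0##; changing variables to the gaps between consecutive ##x_k## shows that this integral equals ##\prod_{j=1}^M 1/(c_1+\cdots+c_j)##. The function name `cdf_zeta` is made up for this illustration, and the code assumes unit-rate exponentials for both ##\alpha## and ##\beta## together with the prefactor ##K!/(K-M)!## from the joint pdf quoted earlier in the thread.

```python
from itertools import combinations
from math import comb, factorial

def cdf_zeta(y, K, M):
    """Finite-sum evaluation of Pr[zeta <= y] for i.i.d. Exp(1) alphas and betas."""
    total = 0.0
    for s in range(K - M + 1):
        # Binomial expansion of (1 - exp(-x_M)) ** (K - M).
        binom_coeff = (-1) ** s * comb(K - M, s)
        for size in range(M + 1):
            for A in combinations(range(M), size):
                # Expanding prod_k (1 - exp(-x_k * y)) picks exp(-x_k * y) for k in A.
                sign = (-1) ** size
                # Coefficient c_k of x_k in the exponent (index 0 is the largest beta).
                c = [1.0 + (y if k in A else 0.0) for k in range(M)]
                c[M - 1] += s
                # Integral of exp(-sum c_k x_k) over x_1 >= ... >= x_M >= 0.
                integral, partial = 1.0, 0.0
                for ck in c:
                    partial += ck
                    integral /= partial
                total += binom_coeff * sign * integral
    # Prefactor M! * C(K, M) = K! / (K - M)! from the joint pdf of the top M betas.
    return factorial(K) // factorial(K - M) * total
```

The number of terms is ##(K-M+1)\,2^M##, so this is practical only for moderate ##M##, but it does suggest that a finite closed-form sum exists in the exponential case.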
 
  • #7
Yes, doing it for a special case should point to some pattern in the nested integrations. Thanks for your replies.
 
  • #8
I am changing some of the notation compared to before, but I think this post is self-contained.

To simplify things, I assumed that I have two random variables ##\beta_1, \beta_2## and that ##\beta_{(1)}\geq\beta_{(2)}## are the order statistics. The random variable ##\alpha_k## is selected whenever the random variable ##\beta_k## is selected, and the index corresponding to selecting ##\beta_{(k)}## is ##i_k##, for ##k=1,2##. I assumed that ##\eta_{k}=\alpha_{i_k}/\beta_{(k)}##. I need to find the CDF

[tex]\text{Pr}\left[\max_{k=1,2}\eta_k\leq \zeta\right]=\text{Pr}\left[\frac{\alpha_1}{\beta_{(1)}}\leq \zeta,\,\frac{\alpha_2}{\beta_{(2)}}\leq \zeta\right][/tex]

which can be written as

[tex]\int_0^{\infty}\int_0^{x_1}\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta,\,\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta_{(1)},\beta_{(2)}}(x_1,\,x_2)\,dx_2\,dx_1\\=2\int_0^{\infty}\int_0^{x_1}\left(\prod_{k=1}^2\text{Pr}\left[\frac{\alpha_k}{x_k}\leq \zeta\right]f_{\beta}(x_k)\right)\,dx_2\,dx_1\\=2\int_0^{\infty}\left[\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta\right]f_{\beta}(x_1)\int_0^{x_1}\text{Pr}\left[\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta}(x_2)\,dx_2\right]\,dx_1[/tex]

I assume that all random variables are independent and identically distributed exponential random variables with parameter 1. So, the above integral can be written as

[tex]2\int_0^{\infty}\left[\left(1-e^{-x_1\zeta}\right)e^{-x_1}\underbrace{\int_0^{x_1}\left(1-e^{-x_2\zeta}\right)e^{-x_2}\,dx_2}_{-e^{-x_1}+1+\frac{e^{-x_1[\zeta+1]}-1}{\zeta+1}}\right]\,dx_1[/tex]

Although the above integral is easy to evaluate, things appear to get messy with this approach in the general case of ##K## random variables. Is there another way to find the CDF?
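For the record, the two-variable integral above closes neatly. Writing ##G(x_1)## (a name introduced here) for the inner integral, the outer factor ##\left(1-e^{-x_1\zeta}\right)e^{-x_1}## is exactly ##G'(x_1)##, with ##G(0)=0## and ##G(\infty)=1-\frac{1}{\zeta+1}##, so

$$2\int_0^\infty G'(x_1)\,G(x_1)\,dx_1=\left[G(x_1)^2\right]_0^\infty=\left(1-\frac{1}{\zeta+1}\right)^2=\left(\frac{\zeta}{\zeta+1}\right)^2$$

This agrees with a direct argument: when both of the ##K=2## variables are selected, the ordering is irrelevant and the maximum is over two independent ratios, each with ##\text{Pr}\left[\alpha/\beta\leq\zeta\right]=\zeta/(\zeta+1)## for unit-rate exponentials. The same shortcut does not obviously carry over to ##M<K##, where the factor ##\left[F_{\beta}(x_M)\right]^{K-M}## enters the integrand.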
 
  • #9
In this reference, Tests of significance for samples of the χ2 population with two degrees of freedom, the author showed that the order statistics can be transformed into independent random variables, but I didn't quite understand it, or whether I can use it here.
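One well-known result of this kind for exponential variables is that the scaled gaps between consecutive order statistics are themselves independent exponentials: if ##\beta_{(K)}\leq\cdots\leq\beta_{(1)}## are the order statistics of ##K## i.i.d. unit-rate exponentials, the ##j##th gap counted from the smallest is a unit-rate exponential divided by ##K-j+1##. Whether this is exactly the construction used in the cited reference is an assumption on my part. A minimal sketch, with a made-up function name, that builds the ordered ##\beta##s from independent draws:

```python
import random

def exp_order_stats_descending(K, rng):
    """Generate beta_(1) >= ... >= beta_(K) for K i.i.d. Exp(1) variables via independent gaps."""
    stats = []
    current = 0.0
    # Build from the smallest up: the j-th gap (j = 1..K) is Exp(1) divided by (K - j + 1).
    for j in range(1, K + 1):
        current += rng.expovariate(1.0) / (K - j + 1)
        stats.append(current)
    # Reverse so that index 0 holds the largest value, matching beta_(1) in this thread.
    stats.reverse()
    return stats
```

With this representation each ##\beta_{(k)}## is a sum of independent terms, which may make the nested integration limits easier to handle, although the ##\beta_{(k)}## themselves remain dependent because they share gaps.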
 

1. What is the definition of the CDF of Maximum of Order Statistics of Random Variables?

The CDF (cumulative distribution function) of the maximum of order statistics of random variables is a mathematical function that describes the probability that the maximum value of a set of ordered random variables is equal to or less than a given value.

2. How is the CDF of Maximum of Order Statistics of Random Variables calculated?

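For independent random variables, the CDF of the maximum is the product of the individual CDFs. When the variables are dependent, as with ratios whose denominators are order statistics, one conditions on the ordered variables, uses the conditional independence that results, and integrates against the joint PDF of the order statistics.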

3. What is the significance of the CDF of Maximum of Order Statistics of Random Variables?

The CDF of the maximum of order statistics of random variables is useful in various statistical analyses, such as determining the distribution of the maximum value in a data set or calculating the probability of extreme events.

4. How is the CDF of Maximum of Order Statistics of Random Variables used in real-world applications?

The CDF of the maximum of order statistics of random variables is commonly used in risk assessment, reliability analysis, and extreme value analysis in fields such as finance, engineering, and environmental science.

5. Is there a difference between the CDF of Maximum of Order Statistics of Random Variables and the CDF of the Maximum of Random Variables?

Yes, there is a difference. The maximum of a plain set of random variables is simply the largest order statistic, and for independent variables its CDF is the product of the individual CDFs. In the thread above, the maximum is taken over ratios that are selected according to the order statistics of their denominators, which makes the ratios dependent, so the product rule no longer applies directly.
