Finding CDF of Maximum of Order Statistics of Random Variables

EngWiPy · May 8, 2018

Suppose that I have the random variables ##\eta_k=\alpha_k/\beta_k## for ##k=1,\,2,\,\ldots,\,K##, where ##\{\alpha_k,\,\beta_k\}## are i.i.d. random variables. Now suppose that I select ##M\leq K## of these random variables such that their denominators are the ##M## largest ones. That is, suppose that ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq \beta_{(K)}##. Then the resulting composite random variables are ##\eta_{i_k}=\alpha_{i_k}/\beta_{(k)}##, for ##k=1,\,2,\,\ldots,\,M##, where ##i_k## is the index that corresponds to the ##k##th largest random variable ##\beta_{(k)}##.

I want to find the CDF of ##\max_{k}\eta_{i_k}##, which is defined as

[tex]\text{Pr}\left[\max_{k=1,...,M}\,\eta_{i_k}\leq x\right][/tex]

How can I find it? If the random variables ##\{\eta_{i_k}\}## were independent, we could write the above probability as

[tex]\prod_{k=1}^M\text{Pr}\left[\eta_{i_k}\leq x\right][/tex]

but they are not independent because of the ordered denominators. So how can it be found?

Thanks in advance
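Before looking for a closed form, a brute-force simulation gives a reference value to check any formula against. Below is a minimal Python sketch. It assumes all ##\alpha_k## and ##\beta_k## are i.i.d. Exp(1), which is the choice made later in the thread, and the function name is hypothetical. Note that ##\beta_{(k)}=\beta_{i_k}##, so each selected ratio is simply ##\alpha_{i_k}/\beta_{i_k}##.

```python
import random


def simulate_max_ratio_cdf(y, K, M, n_trials=100_000, seed=0):
    """Monte Carlo estimate of Pr[max_k alpha_{i_k} / beta_(k) <= y].

    Hypothetical helper. Assumption: all alpha_k and beta_k are i.i.d. Exp(1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        alpha = [rng.expovariate(1.0) for _ in range(K)]
        beta = [rng.expovariate(1.0) for _ in range(K)]
        # i_1, ..., i_M: indices of the M largest betas, largest first.
        top = sorted(range(K), key=beta.__getitem__, reverse=True)[:M]
        # beta_(k) = beta[i_k], so each ratio keeps its own alpha/beta pair.
        largest_ratio = max(alpha[i] / beta[i] for i in top)
        if largest_ratio <= y:
            hits += 1
    return hits / n_trials
```

When ##M=K## the ordering plays no role, because every pair is selected. In that case the estimate should approach ##(y/(1+y))^K##, since ##\text{Pr}[\alpha/\beta\le y]=y/(1+y)## for i.i.d. Exp(1) variables.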

andrewkirk · May 9, 2018

The presentation is confusing because it seeks to define ##\eta_{i_k}## to be ##\alpha_{i_k} / \beta_{(k)}## but that doesn't make sense because ##\eta_{i_k}## already has a definition (given that ##i_k## is separately defined above), which is ##\alpha_{i_k} / \beta_{i_k}##.

To avoid this confusion, it is necessary to define a new set of random variables, say ##(\xi_k)_{1\le k\le M}## such that ##\xi_k=\alpha_{i_k} / \beta_{(k)}##.

Then what you are seeking is the CDF of the random variable ##\zeta \triangleq \max_{1\le k\le M} \xi_k##, which is given by:
$$F_\zeta(y) \triangleq Pr[\zeta\le y] = Pr\left[\bigwedge_{1\le k\le M}\xi_k \le y\right]$$

To calculate that you can use conditional probabilities. Provided the distributions are absolutely continuous, we can use the formula from https://en.wikipedia.org/wiki/Order...tics_of_an_absolutely_continuous_distribution

for the joint pdf of ##\beta_{(1)},...,\beta_{(M)}##. We write that as ##f_{\vec\beta}(\vec x)## where ##\vec\beta## and ##\vec x## are the vectors ##(\beta_{(1)},...,\beta_{(M)})## and ##(x_1,...,x_M)##. Then, writing ##d^M\vec x## for the incremental product ##dx_1dx_2...dx_M## and ##A## for the subset of ##\mathbb R^M## such that ##x_1\ge x_2\ge ...\ge x_M## (matching the ordering ##\beta_{(1)}\ge\cdots\ge\beta_{(M)}##), we have:

\begin{align*}
F_\zeta(y)
% 1
&=Pr[\zeta \le y]
% 2
\\&= \int_A Pr[\zeta \le y\ |\ \vec\beta=\vec x]f_{\vec\beta}(\vec x)d^M\vec x
% 3
\\&=
\int_A
Pr\left[\bigwedge_{k=1}^M\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 4
\\&=
\int_A
\prod_{k=1}^M Pr\left[\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
since the ##\xi_k## are conditionally independent once ##\vec\beta## is fixed by the outer integral
\begin{align*}
% 5
\quad\quad\quad\quad
&=
\int_A
\prod_{k=1}^M Pr\left[\alpha_{i_k} \le x_ky\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 6
\\&=
\int_A
\prod_{k=1}^M F_\alpha(x_ky)
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
The first line uses ##\xi_k=\alpha_{i_k}/x_k## on the event ##\vec\beta=\vec x## (with ##x_k>0##). The second line drops the conditioning, since the ##\alpha##s are independent of the ##\beta##s. Here ##F_\alpha## denotes the common CDF of the ##\alpha##s.
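With ##F_\alpha## the CDF of ##\alpha##, the final line can be read as ##F_\zeta(y)=\text{E}\big[\prod_{k=1}^M F_\alpha(\beta_{(k)}y)\big]##. That suggests a cheap numerical check: simulate only the ##\beta##s, sort them, and average the product of ##\alpha##-CDFs. Below is a minimal Python sketch. It assumes ##\alpha,\beta\sim\text{Exp}(1)## (the choice made later in the thread) and ##y\ge0##, and the function names are hypothetical.

```python
import math
import random


def exp1_cdf(t):
    """CDF of Exp(1); assumed distribution of the alphas."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


def conditional_mc_cdf(y, K, M, F_alpha=exp1_cdf, n_trials=100_000, seed=1):
    """Hypothetical helper: estimate E[prod_{k=1}^M F_alpha(beta_(k) * y)], y >= 0.

    Only the betas are simulated (assumed i.i.d. Exp(1)). Given the ordered
    betas, the selected alphas are independent of them, so the alphas are
    integrated out exactly through F_alpha instead of being sampled.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        beta = sorted((rng.expovariate(1.0) for _ in range(K)), reverse=True)
        prod = 1.0
        for x_k in beta[:M]:          # x_k is a draw of beta_(k), largest first
            prod *= F_alpha(x_k * y)  # Pr[alpha <= x_k * y] for x_k > 0
        total += prod
    return total / n_trials
```

Because the ##\alpha##s are integrated out exactly, this estimator typically has lower variance than one that simulates both ##\alpha## and ##\beta##.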

To take it further requires specification of the pdfs of ##\alpha## and ##\beta##. For some distributions, the integral will be simple. For others it may not be possible to evaluate it analytically, and numerical integration will be needed.

I just noticed that in the link, the subscript (1) is used for the smallest order statistic, whereas you have used it for the largest one. So the formula from the link will need a little adaptation.
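For reference, here is one way to adapt the link's formula to the largest-first labelling ##\beta_{(1)}\ge\cdots\ge\beta_{(M)}##. This is our own restatement for i.i.d. absolutely continuous ##\beta_k## with CDF ##F_\beta## and pdf ##f_\beta##, so it is worth checking against the source:

$$f_{\vec\beta}(\vec x)=\frac{K!}{(K-M)!}\,\big[F_\beta(x_M)\big]^{K-M}\prod_{k=1}^M f_\beta(x_k)\quad\text{for } x_1\ge x_2\ge\cdots\ge x_M,$$

and ##f_{\vec\beta}(\vec x)=0## otherwise. The constant ##\frac{K!}{(K-M)!}=M!{K\choose M}## counts the ways to choose which samples are the top ##M## and the ways to order them. Each of the other ##K-M## samples must fall below ##x_M##, which contributes one factor of ##F_\beta(x_M)##.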

EngWiPy · May 9, 2018

Thanks. You are right. I need to define a new random variable for ##\alpha_{i_k}/\beta_{(k)}##. I continued the analysis using the same idea yesterday, namely, using conditional independence. The CDF of ##\zeta## (according to your notation) is

[tex]\text{Pr}\left[\zeta\leq y\right]=\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right][/tex]

which can be written as follows. I like to spell out the notation because it is clearer to me. Strictly, I should write the conditions as ##\beta_{(k)}=x_k##, but to save time I didn't.

[tex]\text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right]\Big|\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right] \\= \text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]\Big|\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right] \\= \int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)\,d\beta_{(1)}d\beta_{(2)}\cdots d\beta_{(M)}[/tex]

I found that the joint PDF for ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq\beta_{(M)}\geq 0## can be written as

[tex]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)=M!{K\choose M}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M f_{\beta}(\beta_{(k)})[/tex]

where ##F_{\beta}(.)## and ##f_{\beta}(.)## are the CDF and PDF of the unordered random variables ##\{\beta_k\}##. So, the CDF in question becomes

[tex]\text{Pr}\left[\zeta\leq y\right]=M!{K\choose M}\int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta}(\beta_{(k)})\,d\beta_{(k)}[/tex]

Now suppose the CDF and PDF of the unordered random variables ##\{\beta_k\}## are given by ##F_{\beta}(x)=1-e^{-x}## and ##f_{\beta}(x)=e^{-x}##. How can we proceed? I would like to find a closed-form solution to the above integral. Is that possible?

Thanks

andrewkirk · May 9, 2018

I have not checked the formula for the joint pdf of the ##\beta## order statistics but, taking it as given, here are a few observations:

The integration variables should not be ##\beta##s because those are random variables and integration needs to be done with respect to ordinary real variables (unless it is a Stieltjes integral but in that case the pdf(s) for the ##\beta##s should not be in the integrand). That is why I used ##x_1,...,x_M## above. Each ##\beta_{(*)}## in your final formula should be replaced by ##x_*##.
Given that you have written it as nested scalar integrals rather than a vector integral (per my post), the differential part at the end needs to have one entry per integral. So
- where it is written ##d\beta_{(k)}## it should have ##dx_M dx_{M-1}...dx_1##; and
- the integration limits are ##\int_{-\infty}^\infty \int_{-\infty}^{x_1}...\int_{-\infty}^{x_{M-1}}##
It looks to me like the probability inside the iterated product was supposed to be conditional on ##\beta_{(k)}=x_k##. Assuming that to be so, the item currently written as ##Pr[\zeta_k\le y]## is really ##Pr[\zeta_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k/x_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y] = F_\alpha(x_ky)##. So at that point you need to introduce a cdf for ##\alpha##.

At this point there seems no reason to give up hope of an analytic solution existing. Whether it does will depend on the choice of distribution of ##\alpha##.

EngWiPy · May 10, 2018

Yes, I think your presentation is more accurate. The CDF of ##\alpha## is also exponential, i.e., ##F_{\alpha}(y)=1-e^{-y}##. The problem, I think, is how to evaluate the integrals given their limits. We cannot separate them into a product of single integrals, because each upper limit depends on the next variable out.
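The dependence between the limits can be handled numerically without separating the integrals. The idea is to tabulate the innermost integral on a grid with a cumulative sum, then reuse it as a factor in the next integral out. Below is a minimal Python sketch for

$$\frac{K!}{(K-M)!}\int_0^\infty\int_0^{x_1}\cdots\int_0^{x_{M-1}}\big[F_\beta(x_M)\big]^{K-M}\prod_{k=1}^M F_\alpha(x_ky)f_\beta(x_k)\,dx_M\cdots dx_1.$$

It assumes ##\alpha,\beta\sim\text{Exp}(1)## and ##y\ge0##. The function name, the truncation point `upper`, and the grid size `n` are hypothetical, arbitrary choices.

```python
import math


def cdf_by_nested_quadrature(y, K, M, upper=40.0, n=20_000):
    """Hypothetical helper: Pr[zeta <= y] over x_1 >= x_2 >= ... >= x_M >= 0.

    Assumes alpha, beta ~ Exp(1) and y >= 0; `upper` truncates the infinite
    outer limit and `n` is the number of trapezoid steps.
    """
    h = upper / n
    grid = [i * h for i in range(n + 1)]

    def F(t):                      # Exp(1) CDF, used for both alpha and beta
        return 1.0 - math.exp(-t)

    def f(t):                      # Exp(1) PDF of beta
        return math.exp(-t)

    def cumulative(values):
        """out[i] approximates the integral of `values` over [0, grid[i]]."""
        out = [0.0]
        for i in range(1, len(values)):
            out.append(out[-1] + 0.5 * h * (values[i - 1] + values[i]))
        return out

    # Innermost variable x_M: integrand carries the extra F_beta(x_M)^(K-M).
    inner = cumulative([F(t) ** (K - M) * F(t * y) * f(t) for t in grid])
    # Move outward through x_{M-1}, ..., x_1; each upper limit is the next
    # variable out, so the tabulated inner integral is evaluated at t itself.
    for _ in range(M - 1):
        inner = cumulative([F(t * y) * f(t) * inner[i] for i, t in enumerate(grid)])
    coeff = math.factorial(K) // math.factorial(K - M)   # = M! * C(K, M)
    return coeff * inner[-1]
```

The cost is ##O(Mn)## rather than the ##O(n^M)## of naive nested quadrature. As a check, at ##y=1## the case ##K=3,\,M=2## can be done by hand with the substitution ##u=1-e^{-x}##, which gives ##6\cdot\frac1{15}=\frac25##. `cdf_by_nested_quadrature(1.0, K=3, M=2)` should agree with that to several decimals.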

andrewkirk · May 10, 2018

I can't be sure without doing the calcs but my expectation is that, because of the nice nature of the distributions of ##\alpha,\beta##, the integral can be done analytically. Look at the non-constant factors in the integrand. They are all of the form ##e^{-x_j}, e^{-x_jy},(1-e^{-x_j})## or ##(1-e^{-x_jy})##. When we expand out the integrand using the distributive rule (use the binomial theorem to expand the first factor, which is ##(1-e^{-x_M})^{K-M}##), we get a linear combination of terms, each of which has the form:

$$a\times \exp\left(-\left[
\sum_{k=1}^M x_k +
s x_M +
y \sum_{k\in A} x_k
\right]\right)$$

for some term-specific constant ##a##, term-specific integer ##s## and term-specific subset ##A## of ##\{1,2,...,M\}##.

Each term can be integrated separately and since ##y## is constant in all integrations, the integrations should be straightforward.

Note that, given the choice of distributions, the lower limit of each integral is 0.
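Each term can in fact be integrated in one go. Substitute the spacings ##u_j=x_j-x_{j+1}## (with ##u_M=x_M##), which have unit Jacobian and range over ##[0,\infty)^M##. Then ##\sum_k c_kx_k=\sum_j u_j(c_1+\cdots+c_j)##, so

$$\int_{x_1\ge\cdots\ge x_M\ge0}\exp\Big(-\sum_{k=1}^M c_kx_k\Big)\,d^M\vec x=\prod_{j=1}^M\frac{1}{c_1+\cdots+c_j}$$

whenever every partial sum is positive. Here ##c_k=1+y\,\mathbf 1\{k\in A\}+s\,\mathbf 1\{k=M\}##, and each term carries the coefficient ##\frac{K!}{(K-M)!}{K-M\choose s}(-1)^{s+|A|}##. The Python sketch below enumerates all ##(K-M+1)\,2^M## terms. It assumes ##\alpha,\beta\sim\text{Exp}(1)## and ##y\ge0##. This is our own working rather than a result stated in the thread, and the function names are hypothetical.

```python
import math
from itertools import combinations


def ordered_exp_integral(c):
    """Integral of exp(-sum_k c[k] x_k) over x_1 >= ... >= x_M >= 0.

    Hypothetical helper. Equals prod_j 1 / (c_1 + ... + c_j); all partial
    sums must be positive.
    """
    result, partial = 1.0, 0.0
    for ck in c:
        partial += ck
        result /= partial
    return result


def cdf_closed_form(y, K, M):
    """Hypothetical helper: exact Pr[zeta <= y], assuming alpha, beta ~ Exp(1), y >= 0."""
    total = 0.0
    for s in range(K - M + 1):                      # term e^{-s x_M} of (1 - e^{-x_M})^(K-M)
        binom = math.comb(K - M, s) * (-1) ** s
        for size in range(M + 1):
            for A in combinations(range(M), size):  # factors contributing -e^{-x_k y}
                c = [1.0 + (y if k in A else 0.0) for k in range(M)]
                c[-1] += s                          # extra e^{-s x_M} on the last variable
                total += binom * (-1) ** size * ordered_exp_integral(c)
    return math.factorial(K) // math.factorial(K - M) * total
```

For example, `cdf_closed_form(1.0, 3, 2)` evaluates to ##0.4## up to rounding, which agrees with integrating directly at ##y=1## via ##u=1-e^{-x}##. `cdf_closed_form(y, 2, 2)` reduces to ##(y/(1+y))^2##. For large ##M## the alternating signs can cost floating-point accuracy.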

I suggest having a go for the case ##K=3,M=2## and seeing if it comes out. My guess is that it will. If it does, it should point the way towards a formula for the general integral.
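Here is one attempt at that ##K=3,\,M=2## case with ##\alpha,\beta\sim\text{Exp}(1)##, following the expansion above. This is our own working, so it is worth double-checking. The factor ##(1-e^{-x_2})^{1}## gives ##s\in\{0,1\}##, and ##A## runs over the subsets of ##\{1,2\}##. Each exponential term integrates over ##x_1\ge x_2\ge0## to ##1/[c_1(c_1+c_2)]##, where ##c_k## is the coefficient of ##x_k## in the exponent. Collecting terms:

$$F_\zeta(y)=6\left[\frac12-\frac{1}{(1+y)(2+y)}-\frac{1}{2+y}+\frac{1}{2(1+y)^2}\right]-6\left[\frac13-\frac{1}{(1+y)(3+y)}-\frac{1}{3+y}+\frac{1}{(1+y)(3+2y)}\right]$$

$$=\frac{3y^2}{(1+y)^2}-2+\frac{6(2+y)}{(1+y)(3+y)}-\frac{6}{(1+y)(3+2y)},\qquad y\ge0.$$

As checks, this gives ##F_\zeta(0)=0##, ##F_\zeta(y)\to1## as ##y\to\infty##, and ##F_\zeta(1)=2/5##.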

EngWiPy · May 10, 2018

Yes, doing it for a special case should point to some pattern in the nested integrations. Thanks for your replies.

EngWiPy · Jun 28, 2018

I am changing some notation compared to before, but I think this post is self-contained.

To simplify things, I assume that I have two random variables ##\beta_1, \beta_2## and that ##\beta_{(1)}\geq\beta_{(2)}## are the order statistics. The random variable ##\alpha_k## is selected whenever the random variable ##\beta_k## is selected, and the index corresponding to selecting ##\beta_{(k)}## is ##i_k##, for ##k=1,2##. I define ##\eta_{k}=\alpha_{i_k}/\beta_{(k)}##. I need to find the CDF

[tex]\text{Pr}\left[\max_{k=1,2}\eta_k\leq \zeta\right]=\text{Pr}\left[\frac{\alpha_{i_1}}{\beta_{(1)}}\leq \zeta,\,\frac{\alpha_{i_2}}{\beta_{(2)}}\leq \zeta\right][/tex]

which can be written as

[tex]\int_0^{\infty}\int_0^{x_1}\text{Pr}\left[\frac{\alpha_{i_1}}{x_1}\leq \zeta,\,\frac{\alpha_{i_2}}{x_2}\leq \zeta\right]f_{\beta_{(1)},\beta_{(2)}}(x_1,\,x_2)\,dx_2\,dx_1\\=2\int_0^{\infty}\int_0^{x_1}\prod_{k=1}^2\left(\text{Pr}\left[\frac{\alpha_{i_k}}{x_k}\leq \zeta\right]f_{\beta}(x_k)\right)dx_2\,dx_1\\=2\int_0^{\infty}\left[\text{Pr}\left[\frac{\alpha_{i_1}}{x_1}\leq \zeta\right]f_{\beta}(x_1)\int_0^{x_1}\text{Pr}\left[\frac{\alpha_{i_2}}{x_2}\leq \zeta\right]f_{\beta}(x_2)\,dx_2\right]\,dx_1[/tex]

I assume that all random variables are independent and identically distributed exponential random variables with parameter 1. So, the above integral can be written as

[tex]2\int_0^{\infty}\left[\left(1-e^{-x_1\zeta}\right)e^{-x_1}\underbrace{\int_0^{x_1}\left(1-e^{-x_2\zeta}\right)e^{-x_2}\,dx_2}_{-e^{-x_1}+1+\frac{e^{-x_1[\zeta+1]}-1}{\zeta+1}}\right]\,dx_1[/tex]
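Here is one way to finish this integral (our own working). Write ##c=1/(\zeta+1)##, so the braced inner integral is ##(1-c)-e^{-x_1}+c\,e^{-x_1(\zeta+1)}##. Multiplying by ##(1-e^{-x_1\zeta})e^{-x_1}## and integrating each exponential term over ##x_1\in[0,\infty)## gives

$$2\left[\frac{\zeta}{1+\zeta}-\frac12+\frac{1}{(1+\zeta)(2+\zeta)}+\frac{1}{2+\zeta}-\frac{\zeta}{(1+\zeta)^2}-\frac{1}{2(1+\zeta)^2}\right]=\frac{\zeta^2}{(1+\zeta)^2}=\left(\frac{\zeta}{1+\zeta}\right)^2.$$

This is exactly ##\text{Pr}[\alpha/\beta\le\zeta]^2## for one pair of i.i.d. Exp(1) variables, since ##\text{Pr}[\alpha\le\zeta\beta]=\text{E}[1-e^{-\zeta\beta}]=\zeta/(1+\zeta)##. That is to be expected. With only two variables both are selected (##M=K##), so the maximum runs over all pairs ##\alpha_k/\beta_k## regardless of their order, and those pairs are independent. The ordering only matters when ##M<K##.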

Although the above integral is easy to evaluate, things appear to get messy with this approach in the general case of ##K## random variables. Is there another way to find the CDF?

EngWiPy · Jun 28, 2018

In this reference, Tests of significance for samples of the χ2 population with two degrees of freedom, the author showed that the order statistics can be transformed into independent random variables. I didn't quite understand it, or whether I can use it here.
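This looks like the classical exponential-spacings result, nowadays usually called the Rényi representation. Suppose ##\beta_1,\ldots,\beta_K## are i.i.d. Exp(1), ##\beta_{(1)}\ge\cdots\ge\beta_{(K)}##, and ##\beta_{(K+1)}=0##. Then the scaled spacings ##k\,(\beta_{(k)}-\beta_{(k+1)})##, ##k=1,\ldots,K##, are i.i.d. Exp(1). Equivalently, ##\beta_{(k)}=\sum_{j=k}^K E_j/j## with ##E_j## i.i.d. Exp(1). It applies here because the ##\beta##s are exponential (a ##\chi^2## variable with two degrees of freedom is exponential). Below is a minimal Python sketch that uses it to draw the top ##M## order statistics without sorting, then averages ##\prod_k F_\alpha(\beta_{(k)}\zeta)##. It assumes ##\alpha\sim\text{Exp}(1)## as well, and the function names are hypothetical.

```python
import math
import random


def top_order_stats_renyi(K, M, rng):
    """Hypothetical helper: beta_(1) >= ... >= beta_(M) from K i.i.d. Exp(1) draws.

    Renyi representation (assumes Exp(1) betas): beta_(k) = sum_{j=k}^{K} E_j / j
    with E_1, ..., E_K i.i.d. Exp(1), so no sorting is needed.
    """
    E = [rng.expovariate(1.0) for _ in range(K)]        # E[j - 1] plays E_j
    stats = [sum(E[j - 1] / j for j in range(M, K + 1))]  # beta_(M)
    for k in range(M - 1, 0, -1):                       # beta_(k) = beta_(k+1) + E_k / k
        stats.append(stats[-1] + E[k - 1] / k)
    return stats[::-1]                                  # largest first


def cdf_renyi_mc(zeta, K, M, n_trials=100_000, seed=2):
    """Hypothetical helper: Monte Carlo for Pr[max_k alpha_{i_k}/beta_(k) <= zeta], alpha ~ Exp(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        prod = 1.0
        for x_k in top_order_stats_renyi(K, M, rng):
            prod *= 1.0 - math.exp(-x_k * zeta)         # F_alpha(x_k * zeta) for Exp(1)
        total += prod
    return total / n_trials
```

In integral form, the representation corresponds to the change of variables ##u_k=x_k-x_{k+1}## over the ordered region ##x_1\ge\cdots\ge x_M\ge0##. That substitution removes the dependence between the integration limits.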
