Undergrad Finding CDF of Maximum of Order Statistics of Random Variables

  • Thread starter: EngWiPy
  • Tags: Statistics
The discussion focuses on finding the cumulative distribution function (CDF) of the maximum of order statistics derived from random variables defined as the ratio of independent identically distributed (i.i.d.) variables. The challenge arises from the dependence introduced by selecting the largest denominators for the ratios. A new set of random variables is proposed to simplify the analysis, leading to the CDF of the maximum being expressed in terms of conditional probabilities and joint distributions. The conversation also explores the possibility of obtaining a closed-form solution through specific distributions, particularly exponential ones, while addressing the complexities of nested integrals. Ultimately, the participants suggest that while the problem is intricate, it may yield to analytical solutions under certain conditions.
EngWiPy
Suppose that I have the random variables ##\eta_k=\alpha_k/\beta_k## for ##k=1,\,2,\,\ldots,\,K##, where ##\{\alpha_k,\,\beta_k\}## are i.i.d. random variables. Now suppose that I select ##M\leq K## of them such that their denominators are the ##M## largest. That is, suppose that ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq \beta_{(K)}##. Then the resulting composite random variables are ##\eta_{i_k}=\alpha_{i_k}/\beta_{(k)}##, for ##k=1,\,2,\,\ldots,\,M##, where ##i_k## is the index that corresponds to the ##k##th largest random variable ##\beta_{(k)}##.

I want to find the CDF of ##\max_{k}\eta_{i_k}##, which is defined as

$$\text{Pr}\left[\max_{k=1,...,M}\,\eta_{i_k}\leq x\right]$$

How can I find it? If the random variables ##\{\eta_{i_k}\}## were independent, we could write the above probability as

$$\prod_{k=1}^M\text{Pr}\left[\eta_{i_k}\leq x\right]$$

but they are not independent because of the ordered denominator. So, how can it be found then?

Thanks in advance
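
A quick Monte Carlo simulation is one way to sanity-check any candidate formula for this CDF. The sketch below assumes, purely for illustration, that all ##\alpha_k## and ##\beta_k## are independent Exp(1) variables (the question does not fix a distribution at this point). The function name `empirical_cdf_max_ratio` and its parameters are hypothetical.

```python
import random

def empirical_cdf_max_ratio(K, M, x, n_trials=100_000, seed=0):
    """Estimate Pr[max_{k<=M} alpha_{i_k} / beta_(k) <= x] by simulation.

    Assumption (illustrative only): alpha_k, beta_k ~ Exp(1), all independent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Draw K (alpha, beta) pairs; each alpha stays attached to its own beta.
        pairs = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(K)]
        # Keep the M pairs whose denominators beta are the largest.
        pairs.sort(key=lambda p: p[1], reverse=True)
        selected = pairs[:M]
        # The maximum ratio is <= x exactly when every selected ratio is <= x.
        if all(a <= x * b for a, b in selected):
            hits += 1
    return hits / n_trials
```

Comparing `empirical_cdf_max_ratio(K, M, x)` for ##M<K## against the product of marginals would show how much the ordering of the denominators matters. For ##M=K## the selection is vacuous and the independent-product formula applies.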
 
The presentation is confusing because it seeks to define ##\eta_{i_k}## to be ##\alpha_{i_k} / \beta_{(k)}## but that doesn't make sense because ##\eta_{i_k}## already has a definition (given that ##i_k## is separately defined above), which is ##\alpha_{i_k} / \beta_{i_k}##.

To avoid this confusion, it is necessary to define a new set of random variables, say ##(\xi_k)_{1\le k\le M}## such that ##\xi_k=\alpha_{i_k} / \beta_{(k)}##.

Then what you are seeking is the CDF of the random variable ##\zeta \triangleq \max_{1\le k\le M} \xi_k##, which is given by:
$$F_\zeta(y) \triangleq Pr[\zeta\le y] = Pr\left[\bigwedge_{1\le k\le M}\xi_k \le y\right]$$



To calculate that you can use conditional probabilities. Provided the distributions are absolutely continuous, we can use the formula from https://en.wikipedia.org/wiki/Order...tics_of_an_absolutely_continuous_distribution

for the joint pdf of ##\beta_{(1)},...,\beta_{(M)}##. We write that as ##f_{\vec\beta}(\vec x)## where ##\vec\beta## and ##\vec x## are the vectors ##(\beta_{(1)},...,\beta_{(M)})## and ##(x_1,...,x_M)##. Then, writing ##d^M\vec x## for the incremental product ##dx_1dx_2...dx_M## and ##A## for the subset of ##\mathbb R^M## such that ##x_1\le x_2\le ...\le x_M##, we have:

\begin{align*}
F_\zeta(y)
% 1
&=Pr[\zeta \le y]
% 2
\\&= \int_A Pr[\zeta \le y\ |\ \vec\beta=\vec x]f_{\vec\beta}(\vec x)d^M\vec x
% 3
\\&=
\int_A
Pr\left[\bigwedge_{k=1}^M\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 4
\\&=
\int_A
\prod_{k=1}^M Pr\left[\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
since the ##\xi_k## are conditionally independent once ##\vec\beta## is fixed by the outer integral
\begin{align*}
% 5
\\
\quad\quad\quad\quad
&=
\int_A
\prod_{k=1}^M Pr\left[\alpha_k \le x_ky\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 6
\\&=
\int_A
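\prod_{k=1}^M F_\alpha(x_ky)
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
where we have dropped the conditioning, since the ##\alpha##s are independent of the ##\beta##s, and ##F_\alpha## denotes the CDF of the ##\alpha##s. (The step from ##\xi_k\le y## to ##\alpha_k\le x_ky## assumes the ##\beta##s are positive.)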

To take it further requires specification of the pdfs of ##\alpha## and ##\beta##. For some distributions, the integral will be simple. For others it will not be possible to do analytically and will require numeric integration.

I just noticed that in the link, the subscript (1) is used for the smallest order statistic, whereas you have used it for the largest one. So the formula from the link will need a little adaptation.
 
Thanks. You are right. I need to define a new random variable for ##\alpha_{i_k}/\beta_{(k)}##. I continued the analysis using the same idea yesterday, namely, using conditional independence. The CDF of ##\zeta## (according to your notation) is

$$\text{Pr}\left[\zeta\leq y\right]=\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right]$$

which can be written as follows (I like to spell out the notation, since it is clearer to me. I should write the conditions as ##\beta_{(k)}=x_k##, but to save some time I didn't do that)

$$\begin{aligned}
&\text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\,\Big|\,\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right]\right] \\
&= \text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\,\Big|\,\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right]\right] \\
&= \int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)\,d\beta_{(1)}d\beta_{(2)}\cdots d\beta_{(M)}
\end{aligned}$$

I found that the joint PDF for ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq\beta_{(M)}\geq 0## can be written as

$$f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)=M!{K\choose M}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M f_{\beta}(\beta_{(k)})$$

where ##F_{\beta}(.)## and ##f_{\beta}(.)## are the CDF and PDF of the unordered random variables ##\{\beta_k\}##. So, the CDF in question becomes

$$\text{Pr}\left[\zeta\leq y\right]=M!{K\choose M}\int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta}(\beta_{(k)})\,d\beta_{(k)}$$

Now suppose that the CDF and PDF of the unordered random variables ##\{\beta_k\}## are given by ##F_{\beta}(x)=1-e^{-x}## and ##f_{\beta}(x)=e^{-x}##. How can we proceed? I would like to find a closed-form solution to the above integral. Is that possible?

Thanks
 
I have not checked the formula for the joint pdf of the ##\beta## order statistics but, taking that as accepted, here are a few observations:
  • The integration variables should not be ##\beta##s because those are random variables and integration needs to be done with respect to ordinary real variables (unless it is a Stieltjes integral but in that case the pdf(s) for the ##\beta##s should not be in the integrand). That is why I used ##x_1,...,x_M## above. Each ##\beta_{(*)}## in your final formula should be replaced by ##x_*##.
  • Given that you have written it as nested scalar integrals rather than a vector integral (per my post), the differential part at the end needs to have one entry per integral. So
    • where it is written ##d\beta_{(k)}## it should have ##dx_M dx_{M-1}...dx_1##; and
    • the integration limits are ##\int_{-\infty}^\infty \int_{-\infty}^{x_1}...\int_{-\infty}^{x_{M-1}}##
  • It looks to me like the probability inside the iterated product was supposed to be conditional on ##\beta_{(k)}=x_k##. Assuming that to be so, the item currently written as ##Pr[\zeta_k<y]## is really ##Pr[\zeta_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k/x_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y] = F_\alpha(x_ky)##. So at that point you need to introduce a cdf for ##\alpha##.
At this point there seems no reason to give up hope of an analytic solution existing. Whether it does will depend on the choice of distribution of ##\alpha##.
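
Putting the points above together gives the following sketch of the corrected formula. It takes the joint pdf from the earlier post as given and assumes the ##\beta##s are nonnegative, so that the lower limits become 0 and dividing by ##x_k## preserves the inequality. For ##y\ge 0##:

$$F_\zeta(y)=M!\binom{K}{M}\int_{0}^{\infty}\int_{0}^{x_1}\cdots\int_{0}^{x_{M-1}}\left[F_\beta(x_M)\right]^{K-M}\prod_{k=1}^M F_\alpha(x_k y)\,f_\beta(x_k)\;dx_M\,dx_{M-1}\cdots dx_1$$

Here ##x_1\ge x_2\ge\cdots\ge x_M## stand for the values of ##\beta_{(1)},\ldots,\beta_{(M)}##, with the innermost integral taken over ##x_M##.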
 
Yes, I think your presentation is more accurate. The CDF of ##\alpha## is also exponential, i.e., ##F_{\alpha}(y)=1-e^{-y}##. The problem, I think, is how to evaluate the integrals given their limits. We cannot separate them into a product of single integrals because the limits of each depend on the variables of the others.
 
I can't be sure without doing the calcs but my expectation is that, because of the nice nature of the distributions of ##\alpha,\beta##, the integral can be done analytically. Look at the non-constant factors in the integrand. They are all of the form ##e^{-x_j}, e^{-x_jy},(1-e^{-x_j})## or ##(1-e^{-x_jy})##. When we expand out the integrand using the distributive rule (use the binomial theorem to expand the first factor, which is ##(1-e^{-x_M})^{K-M}##), we get a linear combination of terms, each of which has the form:

$$a\times \exp\left(-\left[
\sum_{k=1}^M x_k +
s x_M +
y \sum_{k\in A} x_k
\right]\right)$$

for some term-specific constant ##a##, term-specific integer ##s## and term-specific subset ##A## of ##\{1,2,...,M\}##.

Each term can be integrated separately and since ##y## is constant in all integrations, the integrations should be straightforward.

Note that, given the choice of distributions, the lower limit of each integral is 0.

I suggest having a go for the case ##K=3,M=2## and seeing if it comes out. My guess is that it will. If it does, it should point the way towards a formula for the general integral.
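
A hedged sketch of how the suggested case ##K=3,M=2## can be worked term by term, assuming all variables are Exp(1) and using the joint pdf ##6\,(1-e^{-x_2})\,e^{-x_1}e^{-x_2}## on ##x_1\ge x_2\ge 0##. After expansion, every term reduces to ##\int_0^\infty e^{-ax}(1-e^{-bx})/b\,dx = 1/(a(a+b))##. The helper names are hypothetical, and the midpoint-rule integrator is included only as a cross-check.

```python
import math

def h(a, b):
    """Integral over x in [0, inf) of e^{-a x} * (1 - e^{-b x}) / b, for a, b > 0."""
    return 1.0 / (a * (a + b))

def cdf_k3_m2_closed(y):
    """Term-by-term integration for K=3, M=2 (all variables Exp(1), assumed)."""
    if y <= 0:
        return 0.0
    total = 0.0
    # Outer factor: (1 - e^{-x1 y}) e^{-x1} = e^{-x1} - e^{-(1+y) x1}.
    for sign_a, a in ((1.0, 1.0), (-1.0, 1.0 + y)):
        # Inner integrand (1 - e^{-x2})(1 - e^{-x2 y}) e^{-x2} expands into
        # four exponentials with the rates b below.
        for sign_b, b in ((1.0, 1.0), (-1.0, 2.0), (-1.0, 1.0 + y), (1.0, 2.0 + y)):
            total += sign_a * sign_b * h(a, b)
    return 6.0 * total

def cdf_k3_m2_numeric(y, upper=30.0, n_outer=600, n_inner=200):
    """Midpoint-rule value of the same nested integral, truncated at `upper`."""
    def g(x):
        # Pr[alpha <= x*y] * f_beta(x) for Exp(1) alpha and beta.
        return (1.0 - math.exp(-x * y)) * math.exp(-x)
    dx1 = upper / n_outer
    total = 0.0
    for i in range(n_outer):
        x1 = (i + 0.5) * dx1
        dx2 = x1 / n_inner
        inner = 0.0
        for j in range(n_inner):
            x2 = (j + 0.5) * dx2
            # Extra factor F_beta(x2)^(K-M) = (1 - e^{-x2}) for the unselected beta.
            inner += (1.0 - math.exp(-x2)) * g(x2)
        total += g(x1) * inner * dx2 * dx1
    return 6.0 * total
```

Under these assumptions `cdf_k3_m2_closed(1.0)` evaluates to 0.4 (up to rounding), and the numeric version agrees to about three decimal places, which suggests the same bookkeeping extends to general ##K## and ##M##.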
 
Yes, doing it for a special case should point to some pattern in the nested integrations. Thanks for your replies.
 
I am changing some notation compared to before, but I think this post is self-contained.

To simplify things, I assumed that I have two random variables ##\beta_1, \beta_2## and that ##\beta_{(1)}\geq\beta_{(2)}## are the order statistics. The random variable ##\alpha_k## is selected whenever the random variable ##\beta_k## is selected, and the index corresponding to selecting ##\beta_{(k)}## is ##i_k##, for ##k=1,2##. I assumed that ##\eta_{k}=\alpha_{i_k}/\beta_{(k)}##. I need to find the CDF

$$\text{Pr}\left[\max_{k=1,2}\eta_k\leq \zeta\right]=\text{Pr}\left[\frac{\alpha_1}{\beta_{(1)}}\leq \zeta,\,\frac{\alpha_2}{\beta_{(2)}}\leq \zeta\right]$$

which can be written as

$$\begin{aligned}
&\int_0^{\infty}\int_0^{x_1}\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta,\,\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta_{(1)},\beta_{(2)}}(x_1,\,x_2)\,dx_2\,dx_1\\
&=2\int_0^{\infty}\int_0^{x_1}\left(\prod_{k=1}^2\text{Pr}\left[\frac{\alpha_k}{x_k}\leq \zeta\right]f_{\beta}(x_k)\right)dx_2\,dx_1\\
&=2\int_0^{\infty}\left[\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta\right]f_{\beta}(x_1)\int_0^{x_1}\text{Pr}\left[\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta}(x_2)\,dx_2\right]\,dx_1
\end{aligned}$$

I assume that all random variables are independent and identically distributed exponential random variables with parameter 1. So, the above integral can be written as

$$2\int_0^{\infty}\left[\left(1-e^{-x_1\zeta}\right)e^{-x_1}\underbrace{\int_0^{x_1}\left(1-e^{-x_2\zeta}\right)e^{-x_2}\,dx_2}_{-e^{-x_1}+1+\frac{e^{-x_1[\zeta+1]}-1}{\zeta+1}}\right]\,dx_1$$

Although the above integral is easy to evaluate, things appear to get messy with this approach in the general case of ##K## random variables. Is there another way to find the CDF?
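
As a side note, this particular integral collapses without expanding anything. Write ##g(x)=(1-e^{-x\zeta})e^{-x}## and ##G(x)=\int_0^x g(t)\,dt## (symbols introduced here only for illustration). The integrand is then ##2\,g(x_1)G(x_1)=\frac{d}{dx_1}G(x_1)^2##, so

$$2\int_0^\infty g(x_1)\,G(x_1)\,dx_1=G(\infty)^2=\left(1-\frac{1}{\zeta+1}\right)^2=\left(\frac{\zeta}{\zeta+1}\right)^2$$

This agrees with the independent-product formula, as it should: with ##M=K=2## both pairs are selected, so the maximum is taken over the two original, independent ratios. The dependence only shows up when ##M<K##.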
 
In the reference "Tests of significance for samples of the χ2 population with two degrees of freedom", the author showed that the order statistics can be transformed into independent random variables, but I didn't quite understand it or whether I can use it here.
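
The transformation mentioned here appears to be the classical result, often called the Rényi representation, that the spacings of exponential order statistics are independent. For i.i.d. Exp(1) variables, the ##k##th largest of ##K## can be written as ##\beta_{(k)}=\sum_{j=k}^{K}E_j/j## with ##E_j## i.i.d. Exp(1). If ##\alpha## is also Exp(1), as assumed earlier in the thread, then ##\text{Pr}[\max\le y]=\text{E}\left[\prod_{k=1}^M\left(1-e^{-y\beta_{(k)}}\right)\right]##, and expanding the product gives a finite sum with no nested integrals. The following is a minimal sketch under those assumptions; the function name and signature are hypothetical.

```python
from itertools import combinations

def cdf_max_ratio_exp(K, M, y):
    """Pr[max_{k<=M} alpha_{i_k} / beta_(k) <= y], all variables Exp(1) (assumed).

    Uses beta_(k) = sum_{j=k}^{K} E_j / j with E_j i.i.d. Exp(1), so that
    E[exp(-y * sum_{k in S} beta_(k))] = prod_{j=1}^{K} j / (j + y * c_j),
    where c_j counts the elements of S that are <= j.
    """
    if y <= 0:
        return 0.0
    total = 0.0
    # Inclusion-exclusion over subsets S of {1, ..., M} from expanding the product.
    for size in range(M + 1):
        for S in combinations(range(1, M + 1), size):
            term = 1.0
            for j in range(1, K + 1):
                c_j = sum(1 for k in S if k <= j)
                term *= j / (j + y * c_j)
            total += (-1) ** size * term
    return total
```

For example, `cdf_max_ratio_exp(2, 2, y)` reproduces ##(y/(1+y))^2##. The sum has ##2^M## terms, so this form is practical only for moderate ##M##, and whether it simplifies further is left open here.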
 
