Finding CDF of Maximum of Order Statistics of Random Variables

  • Context: Undergrad 
  • Thread starter: EngWiPy
  • Tags: Statistics

Discussion Overview

The discussion revolves around finding the cumulative distribution function (CDF) of the maximum of a set of random variables defined as the ratio of independent identically distributed (i.i.d.) random variables. Participants explore the implications of selecting the largest denominators from these variables and the challenges posed by their dependence.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant defines random variables as ##\eta_k=\alpha_k/\beta_k## and seeks the CDF of ##\max_{k}\eta_{i_k}##, noting the dependence introduced by the ordered denominators.
  • Another participant suggests redefining the random variables to avoid confusion and proposes calculating the CDF of a new variable ##\zeta = \max_{1\le k\le M} \xi_k## using conditional probabilities.
  • Further analysis involves expressing the CDF in terms of integrals over the joint PDF of the ordered random variables, with a focus on the conditional independence of the ##\xi_k## given the ##\beta## values.
  • Participants discuss the joint PDF for the ordered variables and the implications of the distributions of ##\alpha## and ##\beta## on the integrals needed for the CDF.
  • One participant expresses uncertainty about the limits of integration and the need to adjust the notation for clarity, emphasizing the conditional nature of the probabilities involved.
  • Another participant agrees on the need for clarity and introduces the exponential distribution for ##\alpha##, raising concerns about the complexity of evaluating the integrals due to their interdependencies.
  • Finally, a participant expresses optimism about the potential for an analytic solution based on the properties of the distributions involved.

Areas of Agreement / Disagreement

Participants generally agree on the need to redefine variables for clarity and the importance of conditional probabilities. However, there remains uncertainty regarding the feasibility of obtaining an analytic solution and the correct approach to evaluating the integrals.

Contextual Notes

The discussion highlights limitations related to the dependence of the random variables, the complexity of the integrals, and the need for specific distribution forms for ##\alpha## and ##\beta## to facilitate analysis.

EngWiPy
Suppose that I have the random variables ##\eta_k=\alpha_k/\beta_k## for ##k=1,\,2,\,\ldots,\,K##, where ##\{\alpha_k,\,\beta_k\}## are i.i.d. random variables. Now suppose that I select ##M\leq K## of these random variables, namely those whose denominators are the ##M## largest. That is, with the order statistics ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq \beta_{(K)}##, the selected composite random variables are ##\eta_{i_k}=\alpha_{i_k}/\beta_{(k)}##, for ##k=1,\,2,\,\ldots,\,M##, where ##i_k## is the index that corresponds to the ##k##th largest random variable ##\beta_{(k)}##.

I want to find the CDF of ##\max_{k}\eta_{i_k}##, which is defined as

$$\text{Pr}\left[\max_{k=1,...,M}\,\eta_{i_k}\leq x\right].$$

How can I find it? If the random variables ##\{\eta_{i_k}\}## were independent, we could write the above probability as

$$\prod_{k=1}^M\text{Pr}\left[\eta_{i_k}\leq x\right],$$

but they are not independent because of the ordered denominators. So how can it be found?

Thanks in advance
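A brute-force simulation is a useful reference for checking any candidate formula for this CDF. The sketch below assumes, purely for illustration, that every ##\alpha_k## and ##\beta_k## is Exp(1); the function name `simulate_max_ratio` and its parameters are hypothetical, not from the thread.

```python
import random


def simulate_max_ratio(x, K, M, n=100_000, seed=1):
    """Monte Carlo estimate of Pr[max_{k<=M} alpha_{i_k} / beta_(k) <= x].

    Assumption (illustrative only): alpha_k and beta_k are i.i.d. Exp(1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        alphas = [rng.expovariate(1.0) for _ in range(K)]
        betas = [rng.expovariate(1.0) for _ in range(K)]
        # i_1, ..., i_M: indices of the M largest betas, largest first
        selected = sorted(range(K), key=lambda i: betas[i], reverse=True)[:M]
        # alphas[i] / betas[i] with i = i_k is exactly alpha_{i_k} / beta_(k)
        if max(alphas[i] / betas[i] for i in selected) <= x:
            hits += 1
    return hits / n
```

For example, `simulate_max_ratio(1.0, 3, 2)` estimates the CDF at ##x=1## for ##K=3,M=2##. When ##M=K## every pair is selected, so the ordering is irrelevant and the estimate should approach ##(x/(1+x))^K##, the product of the marginal CDFs ##\text{Pr}[\alpha/\beta\le x]=x/(1+x)## for Exp(1) variables.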
 
The presentation is confusing because it seeks to define ##\eta_{i_k}## to be ##\alpha_{i_k} / \beta_{(k)}## but that doesn't make sense because ##\eta_{i_k}## already has a definition (given that ##i_k## is separately defined above), which is ##\alpha_{i_k} / \beta_{i_k}##.

To avoid this confusion, it is necessary to define a new set of random variables, say ##(\xi_k)_{1\le k\le M}## such that ##\xi_k=\alpha_{i_k} / \beta_{(k)}##.

Then what you are seeking is the CDF of the random variable ##\zeta \triangleq \max_{1\le k\le M} \xi_k##, which is given by:
$$F_\zeta(y) \triangleq Pr[\zeta\le y] = Pr\left[\bigwedge_{1\le k\le M}\xi_k \le y\right]$$



To calculate that, you can use conditional probabilities. Provided the distributions are absolutely continuous, we can use the formula from https://en.wikipedia.org/wiki/Order...tics_of_an_absolutely_continuous_distribution

for the joint pdf of ##\beta_{(1)},...,\beta_{(M)}##. We write that as ##f_{\vec\beta}(\vec x)## where ##\vec\beta## and ##\vec x## are the vectors ##(\beta_{(1)},...,\beta_{(M)})## and ##(x_1,...,x_M)##. Then, writing ##d^M\vec x## for the incremental product ##dx_1dx_2...dx_M## and ##A## for the subset of ##\mathbb R^M## such that ##x_1\ge x_2\ge ...\ge x_M## (since ##\beta_{(1)}## is the largest), we have:

\begin{align*}
F_\zeta(y)
% 1
&=Pr[\zeta \le y]
% 2
\\&= \int_A Pr[\zeta \le y\ |\ \vec\beta=\vec x]f_{\vec\beta}(\vec x)d^M\vec x
% 3
\\&=
\int_A
Pr\left[\bigwedge_{k=1}^M\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 4
\\&=
\int_A
\prod_{k=1}^M Pr\left[\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
since the ##\xi_k## are conditionally independent once ##\vec\beta## is fixed by the outer integral
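\begin{align*}
% 5
F_\zeta(y)
&=
\int_A
\prod_{k=1}^M Pr\left[\alpha_{i_k} \le x_ky\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 6
\\&=
\int_A
\prod_{k=1}^M F_\alpha(x_ky)
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
where the first line uses ##\xi_k=\alpha_{i_k}/x_k## (assuming the ##\beta##s are positive, so that ##\alpha_{i_k}/x_k\le y## is equivalent to ##\alpha_{i_k}\le x_ky##), and in the second line we have dropped the conditioning, since the ##\alpha##s are independent of the ##\beta##s; ##F_\alpha## is the common CDF of the ##\alpha##s.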

To take it further requires specifying the pdfs of ##\alpha## and ##\beta##. For some distributions the integral will be simple; for others it will not be possible to evaluate analytically and will require numerical integration.
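By the conditioning argument above, ##F_\zeta(y)## is the expectation of ##\prod_{k=1}^M F_\alpha(\beta_{(k)}y)## over the ordered ##\beta##s, where ##F_\alpha## is the CDF of ##\alpha##. That suggests a simple numerical route when no closed form is available: sample only the ##\beta##s and average the product, so the ##\alpha##s are integrated out exactly. A minimal conditional Monte Carlo sketch, assuming positive ##\beta##s; the names `cdf_conditional_mc`, `sample_beta`, `cdf_alpha`, `exp_sampler` and `exp_cdf` are hypothetical, and the Exp(1) example is only an illustration.

```python
import math
import random


def cdf_conditional_mc(y, K, M, sample_beta, cdf_alpha, n=100_000, seed=0):
    """Estimate F_zeta(y) = E[prod_{k<=M} F_alpha(beta_(k) * y)] by simulation.

    Only the betas are sampled; the alphas are integrated out exactly via
    cdf_alpha, mirroring the conditional-independence step.
    Assumes the betas are positive, so alpha / x <= y  <=>  alpha <= x * y.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        betas = sorted((sample_beta(rng) for _ in range(K)), reverse=True)
        prod = 1.0
        for x in betas[:M]:  # beta_(1) >= beta_(2) >= ... >= beta_(M)
            prod *= cdf_alpha(x * y)
        total += prod
    return total / n


# Illustrative choice (assumption): alpha and beta both Exp(1).
def exp_sampler(rng):
    return rng.expovariate(1.0)


def exp_cdf(t):
    return 1.0 - math.exp(-t)
```

Usage: `cdf_conditional_mc(1.0, 3, 2, exp_sampler, exp_cdf)`. Because each sample is the conditional probability given the ##\beta##s rather than a 0/1 indicator, the law of total variance guarantees its variance is never larger than that of a plain simulation of both ##\alpha## and ##\beta##.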

I just noticed that in the link, the subscript (1) is used for the smallest order statistic, whereas you have used it for the largest one. So the formula from the link will need a little adaptation.
 
Thanks. You are right: I need to define a new random variable for ##\alpha_{i_k}/\beta_{(k)}##. I continued the analysis yesterday using the same idea, namely conditional independence. The CDF of ##\zeta## (in your notation) is

$$\text{Pr}\left[\zeta\leq y\right]=\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right]$$

which can be written as follows (I like to spell out the notation in detail, as it is clearer to me; strictly, I should write the conditions as ##\beta_{(k)}=x_k##, but to save time I didn't):

\begin{align*}
&\text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right]\Big|\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right]
\\&= \text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]\Big|\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right]
\\&= \int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)\,d\beta_{(1)}d\beta_{(2)}\cdots d\beta_{(M)}
\end{align*}

I found that the joint PDF for ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq\beta_{(M)}\geq 0## can be written as

$$f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)=M!{K\choose M}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M f_{\beta}(\beta_{(k)})$$

where ##F_{\beta}(\cdot)## and ##f_{\beta}(\cdot)## are the CDF and PDF of the unordered random variables ##\{\beta_k\}##. So the CDF in question becomes

$$\text{Pr}\left[\zeta\leq y\right]=M!{K\choose M}\int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta}(\beta_{(k)})\,d\beta_{(k)}$$

Now suppose the CDF and PDF of the unordered random variables ##\{\beta_k\}## are ##F_{\beta}(x)=1-e^{-x}## and ##f_{\beta}(x)=e^{-x}## for ##x\geq 0##. How can we proceed? I would like to find a closed-form solution to the above integral. Is that possible?
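As a quick sanity check on the joint pdf of the top ##M## order statistics (note ##M!{K\choose M}=K!/(K-M)!##), the sketch below integrates it numerically for ##M=2## with the Exp(1) choice just given; the total mass should come out close to 1. The helper names, truncation point and grid sizes are arbitrary illustrative choices, not part of the thread.

```python
import math


def joint_pdf_top2(x1, x2, K):
    """Joint pdf of (beta_(1), beta_(2)) for K i.i.d. Exp(1) betas, i.e. M = 2."""
    if not x1 >= x2 >= 0.0:
        return 0.0
    const = math.factorial(K) / math.factorial(K - 2)  # = M! * C(K, M) with M = 2
    F = 1.0 - math.exp(-x2)  # F_beta evaluated at beta_(M), here beta_(2)
    return const * F ** (K - 2) * math.exp(-x1) * math.exp(-x2)


def total_mass(K, upper=40.0, n=400, m=200):
    """Nested midpoint rule over the region 0 <= x2 <= x1 <= upper."""
    h1 = upper / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h1
        h2 = x1 / m  # inner grid adapts to the upper limit x1
        inner = sum(joint_pdf_top2(x1, (j + 0.5) * h2, K) for j in range(m))
        total += inner * h2 * h1
    return total
```

For instance, `total_mass(3)` should be close to 1; the truncation at `upper=40` discards only a mass of order ##e^{-40}##.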

Thanks
 
I have not checked the derivation of the joint pdf of the ##\beta## order statistics but, taking it as given, here are a few observations:
  • The integration variables should not be ##\beta##s because those are random variables and integration needs to be done with respect to ordinary real variables (unless it is a Stieltjes integral but in that case the pdf(s) for the ##\beta##s should not be in the integrand). That is why I used ##x_1,...,x_M## above. Each ##\beta_{(*)}## in your final formula should be replaced by ##x_*##.
  • Given that you have written it as nested scalar integrals rather than a vector integral (per my post), the differential part at the end needs to have one entry per integral. So
    • where it is written ##d\beta_{(k)}## it should have ##dx_M dx_{M-1}...dx_1##; and
    • the integration limits are ##\int_{-\infty}^\infty \int_{-\infty}^{x_1}...\int_{-\infty}^{x_{M-1}}##
  • It looks to me like the probability inside the iterated product was supposed to be conditional on ##\beta_{(k)}=x_k##. Assuming that to be so, the item currently written as ##Pr[\zeta_k\le y]## is really ##Pr[\zeta_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k/x_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y] = F_\alpha(x_ky)##. So at that point you need to introduce a cdf for ##\alpha##.
At this point there seems to be no reason to give up hope that an analytic solution exists. Whether one does will depend on the choice of distribution for ##\alpha##.
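Putting these corrections together, and assuming positive ##\beta##s so that each lower limit is 0, the CDF becomes (using ##M!{K\choose M}=K!/(K-M)!##, with integration region ##x_1\ge x_2\ge\cdots\ge x_M\ge 0##):

$$F_\zeta(y)=\frac{K!}{(K-M)!}\int_{0}^{\infty}\int_{0}^{x_1}\cdots\int_{0}^{x_{M-1}}\left[F_\beta(x_M)\right]^{K-M}\prod_{k=1}^{M}F_\alpha(x_k y)\,f_\beta(x_k)\;dx_M\cdots dx_2\,dx_1$$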
 
Yes, I think your presentation is more accurate. The CDF of ##\alpha## is also exponential, i.e., ##F_{\alpha}(y)=1-e^{-y}##. The problem, I think, is how to evaluate the integrals given their limits: we cannot separate them into a product of single integrals because the limits depend on each other.
 
I can't be sure without doing the calcs but my expectation is that, because of the nice nature of the distributions of ##\alpha,\beta##, the integral can be done analytically. Look at the non-constant factors in the integrand. They are all of the form ##e^{-x_j}, e^{-x_jy},(1-e^{-x_j})## or ##(1-e^{-x_jy})##. When we expand out the integrand using the distributive rule (use the binomial theorem to expand the first factor, which is ##(1-e^{-x_M})^{K-M}##), we get a linear combination of terms, each of which has the form:

$$a\times \exp\left(-\left[
\sum_{k=1}^M x_k +
s x_M +
y \sum_{k\in A} x_k
\right]\right)$$

for some term-specific constant ##a##, term-specific integer ##s## and term-specific subset ##A## of ##\{1,2,...,M\}##.

Each term can be integrated separately and since ##y## is constant in all integrations, the integrations should be straightforward.

Note that, given the choice of distributions, the lower limit of each integral is 0.

I suggest having a go for the case ##K=3,M=2## and seeing if it comes out. My guess is that it will. If it does, it should point the way towards a formula for the general integral.
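The term-by-term strategy can be mechanised. The sketch below assumes Exp(1) ##\alpha## and ##\beta##, expands the integrand into terms ##a\,e^{-\sum_k c_k x_k}## as described, and integrates the innermost variable first using ##\int_0^{x_{m-1}}e^{-c_m x_m}\,dx_m=(1-e^{-c_m x_{m-1}})/c_m##, which splits each term in two. Using `Fraction` keeps the arithmetic exact for rational ##y##. The function name `cdf_max_ratio_exact` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def cdf_max_ratio_exact(y, K, M):
    """Exact F_zeta(y) when alpha and beta are i.i.d. Exp(1) (assumption).

    Each term of the expanded integrand is stored as (a, [c_1, ..., c_m]),
    meaning a * exp(-(c_1 x_1 + ... + c_m x_m)), and is integrated over
    x_1 >= x_2 >= ... >= x_M >= 0, innermost variable first.
    """
    if not 1 <= M <= K:
        raise ValueError("need 1 <= M <= K")
    y = Fraction(y)
    const = factorial(K) // factorial(K - M)  # = M! * C(K, M)

    # Expand (1 - e^{-x_M})^{K-M} (binomial theorem, index s) and
    # prod_k (1 - e^{-y x_k}) (subset A); each factor e^{-x_k} contributes 1 to c_k.
    terms = []
    for s in range(K - M + 1):
        for r in range(M + 1):
            for A in combinations(range(M), r):
                rates = [Fraction(1) + (y if k in A else 0) for k in range(M)]
                rates[-1] += s
                coef = const * comb(K - M, s) * (-1) ** (s + r)
                terms.append((coef, rates))

    # Integrate x_m over [0, x_{m-1}] for m = M, ..., 2:
    # a e^{-c_m x_m} -> (a / c_m) (1 - e^{-c_m x_{m-1}}).
    for m in range(M, 1, -1):
        new_terms = []
        for a, c in terms:
            cm = c[m - 1]
            new_terms.append((a / cm, c[: m - 1]))
            new_terms.append((-a / cm, c[: m - 2] + [c[m - 2] + cm]))
        terms = new_terms

    # Finally integrate x_1 over [0, inf): int_0^inf e^{-c x} dx = 1 / c.
    return sum((a / c[0] for a, c in terms), Fraction(0))
```

For the suggested case, `cdf_max_ratio_exact(1, 3, 2)` gives `Fraction(2, 5)`. The number of terms grows like ##2^M(K-M+1)\cdot 2^{M-1}##, so this is practical only for small ##M##.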
 
Yes, doing it for a special case should point to some pattern in the nested integrations. Thanks for your replies.
 
I am changing some notation compared to before, but I think this post is self-contained.

To simplify things, I assume that I have two random variables ##\beta_1, \beta_2## and that ##\beta_{(1)}\geq\beta_{(2)}## are their order statistics (so ##K=M=2##). The random variable ##\alpha_k## is selected whenever ##\beta_k## is selected, and ##i_k## is the index corresponding to ##\beta_{(k)}##, for ##k=1,2##. I define ##\eta_{k}=\alpha_{i_k}/\beta_{(k)}##. I need to find the CDF

$$\text{Pr}\left[\max_{k=1,2}\eta_k\leq \zeta\right]=\text{Pr}\left[\frac{\alpha_{i_1}}{\beta_{(1)}}\leq \zeta,\,\frac{\alpha_{i_2}}{\beta_{(2)}}\leq \zeta\right]$$

which can be written as

\begin{align*}
&\int_0^{\infty}\int_0^{x_1}\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta,\,\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta_{(1)},\beta_{(2)}}(x_1,\,x_2)\,dx_2\,dx_1
\\&=2\int_0^{\infty}\int_0^{x_1}\prod_{k=1}^2\left(\text{Pr}\left[\frac{\alpha_k}{x_k}\leq \zeta\right]f_{\beta}(x_k)\right)dx_2\,dx_1
\\&=2\int_0^{\infty}\left[\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta\right]f_{\beta}(x_1)\int_0^{x_1}\text{Pr}\left[\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta}(x_2)\,dx_2\right]\,dx_1
\end{align*}

I assume that all random variables are independent and identically distributed exponential random variables with parameter 1. So, the above integral can be written as

$$2\int_0^{\infty}\left[\left(1-e^{-x_1\zeta}\right)e^{-x_1}\underbrace{\int_0^{x_1}\left(1-e^{-x_2\zeta}\right)e^{-x_2}\,dx_2}_{-e^{-x_1}+1+\frac{e^{-x_1[\zeta+1]}-1}{\zeta+1}}\right]\,dx_1$$
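For this two-variable case the outer integral can be finished by hand as a check. Writing ##c=\zeta+1##, the outer factor is ##(1-e^{-x_1\zeta})e^{-x_1}=e^{-x_1}-e^{-cx_1}##, and after multiplying out, each term is of the form ##\int_0^\infty e^{-\lambda x_1}\,dx_1=1/\lambda##:

\begin{align*}
&2\int_0^{\infty}\left(e^{-x_1}-e^{-cx_1}\right)\left[1-\frac{1}{c}-e^{-x_1}+\frac{e^{-cx_1}}{c}\right]dx_1
\\&=2\left[\left(1-\frac{1}{c}\right)-\frac{1}{2}+\frac{1}{c(c+1)}-\frac{1}{c}\left(1-\frac{1}{c}\right)+\frac{1}{c+1}-\frac{1}{2c^2}\right]
\\&=\left(1-\frac{1}{c}\right)^2=\left(\frac{\zeta}{\zeta+1}\right)^2.
\end{align*}

This equals ##\text{Pr}[\alpha/\beta\le\zeta]^2## with ##\text{Pr}[\alpha/\beta\le\zeta]=\zeta/(\zeta+1)##, as it should: with two variables and ##M=2## both pairs are always selected, so ordering the denominators does not change the maximum.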

Although the above integral is easy to evaluate, things appear to get messy with this approach in the general case of ##K## random variables. Is there another way to find the CDF?
 
In the reference "Tests of significance for samples of the χ² population with two degrees of freedom", the author shows that the order statistics can be transformed into independent random variables, but I didn't quite understand it, or whether I can use it here.
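If the result in that reference is the normalized-spacings (Rényi) representation of exponential order statistics (a ##\chi^2## variable with two degrees of freedom is exponential), then it does apply when ##\beta\sim## Exp(1), which is assumed here. With ##E_1,\ldots,E_K## i.i.d. Exp(1), the order statistics satisfy ##\beta_{(k)}\overset{d}{=}\sum_{j=k}^{K}E_j/j## jointly for ##k=1,\ldots,K##. Taking ##F_\alpha(t)=1-e^{-t}##, expanding ##\prod_{k=1}^M(1-e^{-y\beta_{(k)}})## over subsets ##S\subseteq\{1,\ldots,M\}## and using ##\text{E}[e^{-tE_j/j}]=j/(j+t)## gives a closed form with no nested integrals:

$$F_\zeta(y)=\sum_{S\subseteq\{1,\ldots,M\}}(-1)^{|S|}\prod_{j=1}^{K}\frac{j}{j+y\,n_j(S)},\qquad n_j(S)=\left|\{k\in S: k\le j\}\right|.$$

A sketch in Python, exact for rational ##y##. The function name `cdf_max_ratio_renyi` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def cdf_max_ratio_renyi(y, K, M):
    """F_zeta(y) for i.i.d. Exp(1) alpha and beta via the Renyi representation.

    beta_(k) = sum_{j=k}^{K} E_j / j with E_j i.i.d. Exp(1), so
    E[exp(-y * sum_{k in S} beta_(k))] = prod_j j / (j + y * n_j(S)),
    where n_j(S) is the number of k in S with k <= j.
    """
    if not 1 <= M <= K:
        raise ValueError("need 1 <= M <= K")
    y = Fraction(y)
    total = Fraction(0)
    # Expand prod_{k=1}^{M} (1 - exp(-y * beta_(k))) as a signed sum over subsets S.
    for r in range(M + 1):
        for S in combinations(range(1, M + 1), r):
            term = Fraction(1)
            for j in range(1, K + 1):
                n_j = sum(1 for k in S if k <= j)
                term *= Fraction(j) / (j + y * n_j)
            total += (-1) ** r * term
    return total
```

For example, `cdf_max_ratio_renyi(1, 3, 2)` returns `Fraction(2, 5)`, and for ##M=K## the sum collapses to ##(y/(1+y))^K##. The cost is ##2^M## products of ##K## factors, with no integration at all.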
 
