Undergrad Finding CDF of Maximum of Order Statistics of Random Variables

EngWiPy · May 8, 2018

Suppose that I have the random variables ##\eta_k=\alpha_k/\beta_k## for ##k=1,\,2,\,\ldots,\,K##, where ##\{\alpha_k,\,\beta_k\}## are i.i.d. random variables. Now suppose that I select the ##M\leq K## random variables whose denominators are the ##M## largest. That is, suppose that ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq \beta_{(K)}##. Then the resulting composite random variables are ##\eta_{i_k}=\alpha_{i_k}/\beta_{(k)}##, for ##k=1,\,2,\,\ldots,\,M##, where ##i_k## is the index that corresponds to the ##k##th largest random variable ##\beta_{(k)}##.

I want to find the CDF of ##\max_{k}\eta_{i_k}##, which is defined as

$$\text{Pr}\left[\max_{k=1,\ldots,M}\,\eta_{i_k}\leq x\right]$$

How can I find it? If the random variables ##\{\eta_{i_k}\}## were independent, we could write the above probability as

$$\prod_{k=1}^M\text{Pr}\left[\eta_{i_k}\leq x\right]$$

but they are not independent because of the ordered denominator. So, how can it be found then?

Thanks in advance

andrewkirk · May 9, 2018

The presentation is confusing because it seeks to define ##\eta_{i_k}## to be ##\alpha_{i_k} / \beta_{(k)}## but that doesn't make sense because ##\eta_{i_k}## already has a definition (given that ##i_k## is separately defined above), which is ##\alpha_{i_k} / \beta_{i_k}##.

To avoid this confusion, it is necessary to define a new set of random variables, say ##(\xi_k)_{1\le k\le M}## such that ##\xi_k=\alpha_{i_k} / \beta_{(k)}##.

Then what you are seeking is the CDF of the random variable ##\zeta \triangleq \max_{1\le k\le M} \xi_k##, which is given by:
$$F_\zeta(y) \triangleq Pr[\zeta\le y] = Pr\left[\bigwedge_{1\le k\le M}\xi_k \le y\right]$$

To calculate that you can use conditional probabilities. Provided the distributions are absolutely continuous, we can use the formula from https://en.wikipedia.org/wiki/Order...tics_of_an_absolutely_continuous_distribution

for the joint pdf of ##\beta_{(1)},...,\beta_{(M)}##. We write that as ##f_{\vec\beta}(\vec x)## where ##\vec\beta## and ##\vec x## are the vectors ##(\beta_{(1)},...,\beta_{(M)})## and ##(x_1,...,x_M)##. Then, writing ##d^M\vec x## for the incremental product ##dx_1dx_2...dx_M## and ##A## for the subset of ##\mathbb R^M## such that ##x_1\le x_2\le ...\le x_M##, we have:

\begin{align*}
F_\zeta(y)
% 1
&=Pr[\zeta \le y]
% 2
\\&= \int_A Pr[\zeta \le y\ |\ \vec\beta=\vec x]f_{\vec\beta}(\vec x)d^M\vec x
% 3
\\&=
\int_A
Pr\left[\bigwedge_{k=1}^M\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 4
\\&=
\int_A
\prod_{k=1}^M Pr\left[\xi_k \le y\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
since the ##\xi_k## are conditionally independent once ##\vec\beta## is fixed by the outer integral. Continuing,
\begin{align*}
% 5
F_\zeta(y)
&=
\int_A
\prod_{k=1}^M Pr\left[\alpha_{i_k} \le x_ky\ \middle|\ \vec\beta=\vec x\right]
f_{\vec\beta}(\vec x)d^M\vec x
% 6
\\&=
\int_A
\prod_{k=1}^M F_\alpha(x_ky)
f_{\vec\beta}(\vec x)d^M\vec x
\end{align*}
where we have dropped the conditionality, since the ##\alpha##s are independent of the ##\beta##s, and ##F_\alpha## is the CDF of ##\alpha##.

To take it further requires specification of the pdfs of ##\alpha## and ##\beta##. For some distributions, the integral will be simple. For others it will not be possible to do analytically and will require numeric integration.

I just noticed that in the link, the subscript (1) is used for the smallest order statistic, whereas you have used it for the largest one. So the formula from the link will need a little adaptation.

EngWiPy · May 9, 2018

Thanks. You are right. I need to define a new random variable for ##\alpha_{i_k}/\beta_{(k)}##. I continued the analysis using the same idea yesterday, namely, using conditional independence. The CDF of ##\zeta## (according to your notation) is

$$\text{Pr}\left[\zeta\leq y\right]=\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\right]$$

which can be written as follows (I like to spell out the notation, since it is clearer to me; I should write the conditions as ##\beta_{(k)}=x_k##, but to save time I didn't do that)

\begin{align*}
&\text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\text{Pr}\left[\zeta_1\leq y,\,\zeta_2\leq y,\ldots,\,\zeta_M\leq y\ \middle|\ \beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right]\right]
\\&= \text{E}_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left[\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\ \middle|\ \beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right]\right]
\\&= \int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)\,d\beta_{(1)}d\beta_{(2)}\cdots d\beta_{(M)}
\end{align*}

I found that the joint PDF for ##\beta_{(1)}\geq\beta_{(2)}\geq\cdots\geq\beta_{(M)}\geq 0## can be written as

$$f_{\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}}\left(\beta_{(1)},\,\beta_{(2)},\ldots,\,\beta_{(M)}\right)=M!{K\choose M}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M f_{\beta}(\beta_{(k)})$$

where ##F_{\beta}(.)## and ##f_{\beta}(.)## are the CDF and PDF of the unordered random variables ##\{\beta_k\}##. So, the CDF in question becomes

$$\text{Pr}\left[\zeta\leq y\right]=M!{K\choose M}\int_{\beta_{(1)}}\int_{\beta_{(2)}}\cdots\int_{\beta_{(M)}}\left[F_{\beta}(\beta_{(M)})\right]^{K-M}\prod_{k=1}^M\text{Pr}\left[\zeta_k\leq y\right]f_{\beta}(\beta_{(k)})\,d\beta_{(k)}$$

Now suppose the CDF and PDF of the unordered random variables ##\{\beta_k\}## are given by ##F_{\beta}(x)=1-e^{-x}## and ##f_{\beta}(x)=e^{-x}##. How can we proceed? I would like to find a closed-form solution to the above integral. Is that possible?

Thanks

andrewkirk · May 9, 2018

I have not checked the implementation of the cdfs for the ##\beta## order statistics but, taking that as accepted, here are a few observations:

The integration variables should not be ##\beta##s because those are random variables and integration needs to be done with respect to ordinary real variables (unless it is a Stieltjes integral but in that case the pdf(s) for the ##\beta##s should not be in the integrand). That is why I used ##x_1,...,x_M## above. Each ##\beta_{(*)}## in your final formula should be replaced by ##x_*##.
Given that you have written it as nested scalar integrals rather than a vector integral (per my post), the differential part at the end needs to have one entry per integral. So
- where it is written ##d\beta_{(k)}## it should have ##dx_M dx_{M-1}...dx_1##; and
- the integration limits are ##\int_{-\infty}^\infty \int_{-\infty}^{x_1}...\int_{-\infty}^{x_{M-1}}##
It looks to me like the probability inside the iterated product was supposed to be conditional on ##\beta_{(k)}=x_k##. Assuming that to be so, the item currently written as ##Pr[\zeta_k<y]## is really ##Pr[\zeta_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k/x_k\le y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y\ |\ \beta_{(k)}=x_k] = Pr[\alpha_k\le x_k y] = F_\alpha(x_ky)##. So at that point you need to introduce a cdf for ##\alpha##.

At this point there seems no reason to give up hope of an analytic solution existing. Whether it does will depend on the choice of distribution of ##\alpha##.

EngWiPy · May 10, 2018

Yes, I think your presentation is more accurate. The CDF of ##\alpha## is also exponential, i.e., ##F_{\alpha}(y)=1-e^{-y}##. The problem, I think, is how to evaluate the integrals because of their limits. We cannot separate them into a product of single integrals, because the limits of each integral depend on the other integration variables.
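Before attempting the nested integrals, it can help to have a number to compare against. The following is a minimal Monte Carlo sketch, assuming that all ##\alpha_k## and ##\beta_k## are independent unit-rate exponentials as stated in this thread. The function names `selected_max_below` and `empirical_cdf` are hypothetical, introduced only for this illustration.

```python
import random


def selected_max_below(K, M, y, rng):
    """Return True if max_k alpha_{i_k} / beta_(k) <= y for one random draw.

    Assumption: alpha_k and beta_k are independent Exp(1) variables.
    """
    # Each pair is (alpha_k, beta_k); alpha_k travels with its own beta_k.
    pairs = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(K)]
    # Keep the M pairs with the largest denominators beta.
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:M]
    # alpha / beta <= y is tested as alpha <= y * beta to avoid dividing.
    return all(a <= y * b for a, b in top)


def empirical_cdf(K, M, y, trials=100_000, seed=0):
    """Estimate Pr[max_k alpha_{i_k} / beta_(k) <= y] by simulation."""
    rng = random.Random(seed)
    hits = sum(selected_max_below(K, M, y, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Example: K = 3 candidates, keep the M = 2 with the largest beta.
    print(empirical_cdf(K=3, M=2, y=1.0))
```

Any candidate closed form can then be checked against `empirical_cdf` at a few values of ##y## before trusting it.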

andrewkirk · May 10, 2018

I can't be sure without doing the calcs but my expectation is that, because of the nice nature of the distributions of ##\alpha,\beta##, the integral can be done analytically. Look at the non-constant factors in the integrand. They are all of the form ##e^{-x_j}, e^{-x_jy},(1-e^{-x_j})## or ##(1-e^{-x_jy})##. When we expand out the integrand using the distributive rule (use the binomial theorem to expand the first factor, which is ##(1-e^{-x_M})^{K-M}##), we get a linear combination of terms, each of which has the form:

$$a\times \exp\left(-\left[
\sum_{k=1}^M x_k +
s x_M +
y \sum_{k\in A} x_k
\right]\right)$$

for some term-specific constant ##a##, term-specific integer ##s## and term-specific subset ##A## of ##\{1,2,...,M\}##.

Each term can be integrated separately and since ##y## is constant in all integrations, the integrations should be straightforward.

Note that, given the choice of distributions, the lower limit of each integral is 0.

I suggest having a go for the case ##K=3,M=2## and seeing if it comes out. My guess is that it will. If it does, it should point the way towards a formula for the general integral.
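As a sketch of how that special case might go, assume ##\alpha## and ##\beta## are unit-rate exponentials, order the integration variables as ##x_1\ge x_2\ge 0##, and take the joint pdf of the two largest of three ##\beta##s to be ##6\,(1-e^{-x_2})\,e^{-x_1}e^{-x_2}##. The shorthand ##a=1+y## is introduced here only for brevity. Integrating over ##x_1## first:

\begin{align*}
F_\zeta(y)
&= 6\int_0^\infty (1-e^{-x_2})\left(e^{-x_2}-e^{-a x_2}\right)\left[\int_{x_2}^\infty \left(e^{-x_1}-e^{-a x_1}\right)dx_1\right]dx_2
\\&= 6\int_0^\infty (1-e^{-x_2})\left(e^{-x_2}-e^{-a x_2}\right)\left(e^{-x_2}-\frac{e^{-a x_2}}{a}\right)dx_2
\\&= 1-\frac{6}{a(a+2)}+\frac{3}{a^2(2a+1)}
\end{align*}

Under these assumptions the result is 0 at ##y=0## (where ##a=1##) and tends to 1 as ##y\to\infty##, as a CDF should; at ##y=1## it evaluates to ##0.4##.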

EngWiPy · May 10, 2018

Yes, doing it for a special case should point to some pattern in the nested integrations. Thanks for your replies.

EngWiPy · Jun 28, 2018

I am changing some of the notation compared to before, but I think this post is self-contained.

To simplify things, I assumed that I have two random variables ##\beta_1, \beta_2## and that ##\beta_{(1)}\geq\beta_{(2)}## are the order statistics. The random variable ##\alpha_k## is selected whenever the random variable ##\beta_k## is selected, and the index corresponding to selecting ##\beta_{(k)}## is ##i_k##, for ##k=1,2##. I assumed that ##\eta_{k}=\alpha_{i_k}/\beta_{(k)}##. I need to find the CDF

$$\text{Pr}\left[\max_{k=1,2}\eta_k\leq \zeta\right]=\text{Pr}\left[\frac{\alpha_{i_1}}{\beta_{(1)}}\leq \zeta,\,\frac{\alpha_{i_2}}{\beta_{(2)}}\leq \zeta\right]$$

which can be written as

\begin{align*}
&\int_0^{\infty}\int_0^{x_1}\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta,\,\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta_{(1)},\beta_{(2)}}(x_1,\,x_2)\,dx_2\,dx_1
\\&=2\int_0^{\infty}\int_0^{x_1}\prod_{k=1}^2\text{Pr}\left[\frac{\alpha_k}{x_k}\leq \zeta\right]f_{\beta}(x_k)\,dx_2\,dx_1
\\&=2\int_0^{\infty}\left[\text{Pr}\left[\frac{\alpha_1}{x_1}\leq \zeta\right]f_{\beta}(x_1)\int_0^{x_1}\text{Pr}\left[\frac{\alpha_2}{x_2}\leq \zeta\right]f_{\beta}(x_2)\,dx_2\right]\,dx_1
\end{align*}

I assume that all random variables are independent and identically distributed exponential random variables with parameter 1. So, the above integral can be written as

$$2\int_0^{\infty}\left[\left(1-e^{-x_1\zeta}\right)e^{-x_1}\underbrace{\int_0^{x_1}\left(1-e^{-x_2\zeta}\right)e^{-x_2}\,dx_2}_{-e^{-x_1}+1+\frac{e^{-x_1[\zeta+1]}-1}{\zeta+1}}\right]\,dx_1$$

Although the above integral is easy to evaluate, things appear to get messy with this approach in the general case of ##K## random variables. Is there another way to find the CDF?
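One way to finish this two-variable case, sketched under the same unit-rate exponential assumption: write ##g(x)=\left(1-e^{-x\zeta}\right)e^{-x}## and ##G(x)=\int_0^x g(t)\,dt## (both names are introduced here only for this sketch). The underbraced term is then ##G(x_1)##, and the integrand ##2\,g\,G## is the derivative of ##G^2##:

$$2\int_0^{\infty}g(x_1)\,G(x_1)\,dx_1=\Big[G(x_1)^2\Big]_0^{\infty}=\left(1-\frac{1}{\zeta+1}\right)^2=\left(\frac{\zeta}{\zeta+1}\right)^2$$

This agrees with a direct argument: when ##M=K## every ratio is selected, so the maximum is taken over ##K## i.i.d. ratios and the ordering of the denominators plays no role. For ##M<K## the extra factor ##\left[F_\beta(x_M)\right]^{K-M}## in the joint pdf breaks this shortcut.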

EngWiPy · Jun 28, 2018

In the reference "Tests of significance for samples of the ##\chi^2## population with two degrees of freedom", the author showed that the order statistics can be transformed into independent random variables, but I didn't quite understand it, or whether I can use it here.
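A minimal sketch of how such a transformation might be used, assuming unit-rate exponentials as elsewhere in this thread. It relies on the standard spacings representation of exponential order statistics, ##\beta_{(k)}=\sum_{j=k}^{K}E_j/j## with ##E_1,\ldots,E_K## independent unit-rate exponentials, and is not taken from the cited reference. Expanding ##\prod_{k=1}^M\left(1-e^{-y\beta_{(k)}}\right)## over subsets ##S\subseteq\{1,\ldots,M\}## and taking expectations factor by factor gives a finite sum. The name `cdf_via_spacings` is hypothetical.

```python
from itertools import combinations


def cdf_via_spacings(K, M, y):
    """Pr[max_{k<=M} alpha_{i_k} / beta_(k) <= y] for Exp(1) alpha and beta.

    Assumption: beta_(k), the k-th largest of K, equals sum_{j=k}^{K} E_j / j
    with independent Exp(1) variables E_j (spacings representation).
    """
    total = 0.0
    for size in range(M + 1):
        for S in combinations(range(1, M + 1), size):
            # E[exp(-y * sum_{k in S} beta_(k))] factorises over the E_j.
            term = 1.0
            for j in range(1, K + 1):
                c = sum(1 for k in S if k <= j)  # how many beta_(k) contain E_j
                term *= j / (j + y * c)          # E[exp(-y * c * E_j / j)]
            total += (-1) ** size * term         # inclusion-exclusion sign
    return total


if __name__ == "__main__":
    print(cdf_via_spacings(K=3, M=2, y=1.0))
```

The sum has ##2^M## terms, so this is only practical for moderate ##M##, but it avoids the nested integration limits entirely.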
