Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

I Order Statistics: CDF

  1. Jul 13, 2018 #1
    Suppose I have the random variables ##Z_k=X_k/Y_k## with a PDF ##f_{Z_k}(z_k)## for ##k=1,\,2\,\ldots, K##, where ##\{X_k, Y_k\}## are i.i.d. random variables. I can find

    [tex]\text{Pr}\left[\sum_{i=1}^3Z_k\leq \eta\right][/tex]

    as

    [tex]\int_{z_3=0}^{\eta}\int_{z_2=0}^{\eta-z_3}\int_{z_1=0}^{\eta-z_3-z_2}f_{Z_1}(z_1)f_{Z_2}(z_2)f_{Z_3}(z_3)\,dz_1dz_2dz_3[/tex]

    Now suppose I arrange the random variables ##\{X_k\}_{k=1}^K## as

    [tex]X_{(1)}\leq X_{(2)}\leq \cdots \leq X_{(K)}[/tex]

    and define ##G_k=X_{(k)}/Y_{i_k}##, where ##i_k## is the index of the ##k^{\text{th}}## order statistic.

    How can I find

    [tex]\text{Pr}\left[\sum_{i=1}^3G_k\leq \eta\right][/tex]
     
  2. jcsd
  3. Jul 13, 2018 #2

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    This doesn't make it clear what is identically distributed to what, or what is independent of what.

    The maximal interpretation is that all ##X_k## and all ##Y_k## have the same distribution, with PDF ##f_X##, and that the set of random variables ##\{X_1,...,X_K,Y_1,...,Y_K\}## is pairwise independent. Or it could mean that ##X_k## has the same distribution as ##Y_k## and the two are independent of each other, but the distributions may differ from that of ##X_j## for ##j\neq k##, and there may be dependences between ##X_k## and ##X_j##. There are a variety of possible interpretations.

    I will assume that the set of all ##2K## random variables is pairwise independent, and that all ##X_k## have the same PDF ##f_X## and all the ##Y_k## have the same PDF ##f_Y##, but the two PDFs may differ from one another. I will also assume that the support of all the random variables is the positive reals, which is implied by the integral you have written.

    With those assumptions, the distribution of ##X_{(k)}/Y_{i_k}## is the same as that of ##X_{(k)}/Y_{j}## for any ##j##. Then the CDF of ##G_k## satisfies:

    $$ F_{G_k}(a) = \int_0^\infty\int_0^{\eta a}f_{X_{(k)}}(x) \,dx\,f_Y(y)\,dy$$

    The PDF ##f_{X_{(k)}}## of the ##k##th order statistic of ##X## can be calculated using the formula here.

    The PDF ##f_{G_k}## of ##G_k## is obtained by differentiating the above expression.
    Having obtained that, we calculate the CDF of ##G_1+G_2+G_3## using the triple integral you wrote above, but replacing each ##f_{Z_k}## by ##f_{G_k}##.
     
  4. Jul 13, 2018 #3
    What I meant by ##\{X_k, Y_k\}## of being i.i.d. is that every single random variable is independent of all other random variables, and all have the same PDF distribution. I could have written that the random variables ##\{X_k\}_{k=1}^{2K}## are i.i.d., but because I need to separate the nominator and denominator, I used different symbols. These random variables are exponential random variables, so yes, the support is positive real. I wasn't clear about that. My mistake.


    OK, so we can average over each ##G_k## as if the random variables are independent with no ordering, but what about the limits? I mean, ##G_3=\frac{X_{(3)}}{Y_3}## has the limits ##0## to ##\eta##, but ##G_2## contains ##X_{(2)}## which is less than or equal ##X_{(3)}## in ##G_3##. Or this doesn't affect the evaluation?
     
  5. Jul 14, 2018 #4

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    You are right to point out that the new ##G_k##s are not mutually independent, because they are based on order statistics. So we have to change the integrand in your triple integral, because it is based on the Zs being independent.

    Instead we'll have to do (take a deep breath!):
    \begin{align*}
    \text{Pr}&\left[\sum_{i=1}^3G_k\leq \eta\right]
    =
    \text{Pr}\left[\frac{X_{(1)}}{Y_1}+\frac{X_{(2)}}{Y_2}+\frac{X_{(3)}}{Y_3}\leq \eta\right]
    \\
    &=
    \int_{y_1=0}^\infty \int_{x_1=0}^{y_1\eta}
    \int_{y_2=0}^\infty \int_{x_2=0}^{y_2(\eta - x_1/y_1)}
    \int_{y_3=0}^\infty \int_{x_3=0}^{y_3(\eta - x_1/y_1-x_2/y_2)}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    f_Y(y_1) f_Y(y_2) f_Y(y_3)
    f_{X_{(1)},X_{(2)},X_{(3)}}(x_1,x_2,x_3)
    \,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
    \\
    &=
    \int_{y_1=0}^\infty f_Y(y_1) \int_{x_1=0}^{y_1\eta}
    \int_{y_2=0}^\infty f_Y(y_2) \int_{x_2=0}^{y_2(\eta - x_1/y_1)}
    \int_{y_3=0}^\infty f_Y(y_3) \int_{x_3=0}^{y_3(\eta - x_1/y_1-x_2/y_2)}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    f_{X_{(1)},X_{(2)},X_{(3)}}(x_1,x_2,x_3)
    \,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
    \end{align*}

    It looks horrible, but it should simplify down if all your distributions are exponentials.

    You can get the three-way joint probability function for the Xs from the wiki section I linked above. The formula is on the last line in the section, with ##n=3##.

    EDIT: Actually the formulas in that section don't give exactly what we want, which is the joint density for the three smallest order stats, but we can infer a formula using the same principles. It should be:

    $$\frac{K!}{(K-3)!}f_X(x_1)f_X(x_2)f_X(x_3) \left[1-F_X(x_3)\right]^{K-3}$$

    where ##x_1\le x_2\le x_3##, otherwise zero. To enforce that inequality condition we need to change the limits for the X integrals, to get:
    \begin{align*}
    \text{Pr}&\left[\sum_{i=1}^3G_k\leq \eta\right]
    =
    \int_{y_1=0}^\infty \int_{x_1=0}^{y_1\eta}
    \int_{y_2=0}^\infty \int_{x_2=x_1}^{\max(x_1,y_2(\eta - x_1/y_1))}
    \int_{y_3=0}^\infty \int_{x_3=x_2}^{\max(x_2,y_3(\eta - x_1/y_1-x_2/y_2))}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \frac{K!}{(K-3)!}
    f_Y(y_1)f_Y(y_2)f_Y(y_3)
    f_X(x_1)f_X(x_2)f_X(x_3) \left[1-F_X(x_3)\right]^{K-3}
    \,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
    \\&=
    \frac{K!}{(K-3)!}
    \int_{y_1=0}^\infty f_Y(y_1) \int_{x_1=0}^{y_1\eta} f_X(x_1)
    \int_{y_2=0}^\infty f_Y(y_2) \int_{x_2=x_1}^{\max(x_1,y_2(\eta - x_1/y_1))} f_X(x_2)
    \int_{y_3=0}^\infty f_Y(y_3) \int_{x_3=x_2}^{\max(x_2,y_3(\eta - x_1/y_1-x_2/y_2))}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    f_X(x_3) \left[1-F_X(x_3)\right]^{K-3}
    \,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
    \end{align*}
     
    Last edited: Jul 14, 2018
  6. Jul 14, 2018 #5
    Thanks. As expected. The result is nasty. Could you explain how you arrived to the integrals ( and their limits) in a little more detail?

    Another but related question, can this CDF be evaluated numerically using any software tool for any value ##1\leq M\leq K## instead of 3?
     
  7. Jul 14, 2018 #6

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    If all the distributions are exponentials, I think there's a reasonable chance you could evaluate this analytically, starting from the inside and working out. It's possible the max-es in the upper limits may throw a spanner in the works, but it's worth a try. If not then a good mathematical software tool should be able to numerically integrate it. If it complains about the six-deep nesting, it shouldn't be too hard to write a little program to do it, in something like Matlab, Python or R, assuming the exponential parameter and the value of ##\eta## are both known.

    If we change 3 to some other M in 1...K, the same principles apply, but instead of six-deep nesting of integrals it will be 2M-deep! Maybe there's some formula that can reduce the number of integrals, but I can't see one from here.

    Setting ##x_0=0## to make the pattern of limits clearer, the formula would be:

    \begin{align*}
    \text{Pr}&\left[\sum_{i=1}^M G_k\leq \eta\right]
    =
    \int_{y_1=0}^\infty \int_{x_1=x_0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)}
    \int_{y_2=0}^\infty \int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)}
    \\
    &\quad\quad\quad\quad\quad\quad\quad\quad\quad...
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \int_{y_M=0}^\infty \int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \frac{K!}{(K-M)!}
    f_Y(y_1)f_Y(y_2)...f_Y(y_M)
    f_X(x_1)f_X(x_2)...f_X(x_M) \left[1-F_X(x_M)\right]^{K-M}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
    %
    \\&=
    \frac{K!}{(K-M)!}
    \int_{y_1=0}^\infty f_Y(y_1)
    \int_{x_1=0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)} f_X(x_1)
    \int_{y_2=0}^\infty f_Y(y_2)
    \int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)} f_X(x_2)
    \\
    &\quad\quad\quad\quad\quad\quad\quad\quad\quad...
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \int_{y_M=0}^\infty f_Y(y_M)
    \int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)} f_X(x_M)
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \left[1-F_X(x_M)\right]^{K-M}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
    \end{align*}

    The complexity of the integral is nearly all in the limits. In the centre is just the joint PDF of all six variables. That is the product of three PDFs for the three independent Y random variable and the joint PDF for the three Xs. The latter is converted to a product of three independent X PDFs, with a combinatoric coefficient because of the different possible orderings, and the factor ##\left[1-F_X(x_M)\right]^{K-M}## that ensures that all the other ##M-K## X values are greater than ##X_{(M)}##.

    The integration limits are set up from the outside in. The range of the Y variables is unconstrained. But each X variable must have an upper limit to ensure that it doesn't cause the sum of all the Gs covered by the integral signs thus far to exceed ##\eta##. Also, each X has to have a lower limit of the previous X value, to protect the required ordering of the order statistics, and that requires an adjustment to the upper limit as well, so that if the lower limit is greater than the upper limit, we get zero rather than a negative number from a backwards integral, which is a possible interpretation when the lower integral limit exceeds the upper one.
     
  8. Jul 14, 2018 #7
    That makes sense. The complexity of the mathematical expression is not an issue. Eventually, I need to evaluate the CDF. It doesn't matter how (except not using Monte-Carlo simulation, because I use it for verification). Of course, if a numerical solution is possible, that is great, because it would be easier to evaluate, but numerical integration is just fine. The analytical solution doesn't have a pattern in the case of no ordering, so I suspect it would have with ordering. I will try this again anyway.

    Could you give me a hint on how to write a program in either MATLAB or Python to evaluate these ##2M## nested integrals? In MATLAB I think you cannot go for more than 2 or maybe three nested integrals. I need a code that can be used for any value of M.
     
  9. Jul 14, 2018 #8

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    For the general case of M rather than 3, the depth of nesting is unknown, so it's easiest to use recursion.
    I wouldn't normally do this but, since I'm learning Python, I wrote the following Python code for practice. It is untested but should do the trick if you enter the necessary PDFs for the X and Y distributions in the function skeletons below.

    The integration is bog-basic rectangular integration. If you want you can sex it up to trapezoidal or even Simpson integration for a more accurate result. But unless it takes a long time to run, it's probably easier to keep it rectangular and just increase the granularity.

    Code (Text):
    granularity = 1000
    lim_y = 100
    M = 10 # order statistic
    K = 30 # number of Xs (also the number of Ys)

    def pdf_y(y)
      # define pdf function for y here

    def pdf_x(x)
      # define pdf function for x here

    def cdf_x(x)
      # define cdf function for x here

    def permut(n, k)
      # calc number of permutations of k chosen from n
      perm = n
      j = 1
      while j < k  
        perm *= n - j
        j += 1
      return perm
    # end of permutation function

    def integ2(eta, x_prev, sum_x, m, k)
      y = 0
      integ_y = 0
      # integrate across this y_(M-m)
      while y < lim_y
        y += dy
        lim_x = max(x_prev, y * (eta - sum_x))
        dx = (limx - x_prev) / granularity
        x = x_prev
        integ_x = 0
        # integrate across this x_(M-m)
        while x < lim_x
           x += dx
           if m > 0
             integrand = integ2(eta, x, sum_x + x / y, m - 1, k - 1))
           else
             integrand = (1 - cdf_x(x)) ^ k
           integ_x += integrand * pdf_x(x) * dx
        integ_y += integ_x * pdf_y(y) * dy
      return integ_y
    # end of definition of function integ2

    dy = lim_y / granularity
    # call the function to find the probability
    prob = permut(K, M) * integ2(eta, 0, 0, M - 1, K - 1)
    print(prob)
     
     
  10. Jul 15, 2018 #9
    I will need to go over the math and the code in more details, and compare the results with those of Monte-Carlo simulations at the end. I will report back. Thanks for your responses. They were very helpful.
     
  11. Aug 11, 2018 #10
    @andrewkirk Apparently, for ##M\geq 3## the CDF is very complicated to be evaluated numerically, and simulations has to be done. But simulations isn't an efficient way to evaluate nested interdependent integrals. So, I am trying to evaluate a closed-form solution for the special case of ##M=2##. In particular, I want to evaluate the following CDF (note that I took another way of evaluating the integrals, only to make the approach consistent of some other work I am doing. I think both are essentially the same.)

    [tex]F_Z(z) = \text{Pr}\left[\frac{X_{(1)}}{Y_1}+\frac{X_{(2)}}{Y_2}\leq z\right] = \int_{y_2}^{\infty}\int_{x_2=0}^{y_2z}\int_{y_2=0}^{\infty}\int_{x_1=0}^{\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)}f_{(1).(2)}(x_1,x_2)f_Y(y_1)f_Y(y_2)\,dx_1\,dx_2\,dy_1\,dy_2[/tex]

    where ##f_{(1),(2)}(x_1,x_2)## is the joint PDF of the order statistics ##X_{(1)}## and ##X_{(2)}##, and ##f_Y(y_k)## is the PDF of the random variable ##Y_k##.

    The joint PDF ##f_{(1),(2)}(x_1,x_2)## can be written as (all random variables are i.i.d. exponential RVs with parameter 1)

    [tex]f_{(1),(2)}(x_1,x_2) = K(K-1)e^{-x_1}e^{-(K-1)x_2}[/tex]

    Thus the original CDF can be written as

    [tex]F_Z(z) = K(K-1)\int_{y_2}^{\infty}\int_{x_2=0}^{y_2z}\int_{y_2=0}^{\infty}\int_{x_1=0}^{\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)}e^{-x_1}e^{-(K-1)x_2}e^{-y_1}e^{-y_2}\,dx_1\,dx_2\,dy_1\,dy_2[/tex]

    The inner most integral can be evaluate as

    [tex]\int_{x_1=0}^{\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)}e^{-x_1}\,dx_1 = 1-\exp\left(-\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)\right)[/tex]

    So

    [tex]F_Z(z) = K(K-1)\left[I_1(z) - I_2(z)\right][/tex]

    ##I_1(z)## can be evaluated easily, but I got stuck in evaluating ##I_2## which is

    [tex]I_2(z)=\int_{y_2}^{\infty}\int_{x_2=0}^{y_2z}\int_{y_1=0}^{\infty}\exp\left(-\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)\right)e^{-(K-1)x_2}e^{-y_1}e^{-y_2}\,dx_2\,dy_1\,dy_2[/tex]

    I divided the integrals as follows

    [tex]I_2(z) = \int_{y_2}^{\infty}\int_{y_1}^{\infty}\int_{x_2=0}^{\frac{zy_1y_2}{y_1+y_2}}e^{-Kx_2}e^{-y_1}e^{-y_2}\,dx_2\,dy_1\,dy_2+ \int_{y_2}^{\infty}\int_{y_1}^{\infty}\int_{x_2=\frac{zy_1y_2}{y_1+y_2}}^{y_2z}e^{-y_1(z-\frac{x_2}{y_2}+1)}e^{-(K-1)x_2}e^{-y_2}\,dx_2\,dy_1\,dy_2[/tex]

    The limit ##\frac{zy_1y_2}{y_1+y_2}## makes the outer integrals difficult to evaluate. Is there another way or any tricks I can do to evaluate the original CDF in closed-form?
     
    Last edited: Aug 11, 2018
  12. Aug 11, 2018 #11

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    I am not optimistic of being able to get a nice analytic formula (closed-form), because of the complicated, inter-related limits. But I think we can get a reasonable algorithm for numerical integration, using the same technique as we discussed recently for doing nested integrals by iteration from the inside out.

    It is based on the formulas I wrote above:

    \begin{align*}
    \text{Pr}&\left[\sum_{i=1}^M G_k\leq \eta\right]
    =
    \int_{y_1=0}^\infty \int_{x_1=x_0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)}
    \int_{y_2=0}^\infty \int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)}
    \\
    &\quad\quad\quad\quad\quad\quad\quad\quad\quad...
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \int_{y_M=0}^\infty \int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \frac{K!}{(K-M)!}
    f_Y(y_1)f_Y(y_2)...f_Y(y_M)
    f_X(x_1)f_X(x_2)...f_X(x_M) \left[1-F_X(x_M)\right]^{K-M}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1

    \\&=
    \frac{K!}{(K-M)!}
    \int_{y_1=0}^\infty f_Y(y_1)
    \int_{x_1=0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)} f_X(x_1)
    \int_{y_2=0}^\infty f_Y(y_2)
    \int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)} f_X(x_2)
    \\
    &\quad\quad\quad\quad\quad\quad\quad\quad\quad...
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \int_{y_M=0}^\infty f_Y(y_M)
    \int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)} f_X(x_M)
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \left[1-F_X(x_M)\right]^{K-M}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
    \end{align*}

    For ##k=1,...,M## define functions
    ##I_k^M:\mathbb R^2\to \mathbb R## and ##J_k^M:\mathbb R^3\to \mathbb R## such that:

    ##I_k^M(x_{k-1},s_{k-1}) = \int_0^\infty f_Y(y_k) J_k^M(x_{k-1},s_{k-1}, y_k)\,dy_k##

    ##J_k^M(x_{k-1},s_{k-1}, y_k) =
    \begin{cases}
    0&\mbox{, if }x_{k-1} \ge y_ks_{k-1}\textrm{, else}\\
    \int_{x_{k-1}}^{y_ks_{k-1}} f_X(x_k)\left(1-F_X(x_k)\right)^{K-M}\,dx_k
    &\mbox{, if } k=M \textrm{, else}\\
    \int_{x_{k-1}}^{y_ks_{k-1}} f_X(x_k)
    I_{k+1}^M(x_k, s_{k-1} - x_k/y_k)
    \,dx_k
    &\mbox{, otherwise}
    \end{cases} ##

    Note that ##s_k=s_0 - \sum_{j=1}^k x_k/y_k##.

    Then by suitably constraining and segmenting the range of possible values for ##x, y, s## so that each range contains say ##N## points for ##x,y## and ##L>N## points for ##s##, we can calculate and store these functions as ##N\times L##-shaped tables for the ##I_k^M## and ##N\times L\times N##-shaped tables for the ##J_k^M##. We start with ##J_M^M## and then work outwards, at each stage using the table just determined to do the integration for the new function. For accuracy we should use Simpson's Rule, with an extra trapezoid at the end when the number of segments in the integration region is even.

    This approach will take longer to run than the one considered earlier, having the order of ##N^3L## iterations. But for any given value of ##K## and ##M##, it only needs to be run once, and only the table for ##I_1^M## needs to be retained. Then for any ##\eta\in\mathbb R^+## we can calculate:

    $$\text{Pr}\left[\sum_{i=1}^M G_i\leq \eta\right]
    = \frac{K!}{(K-M)!}I_1^M(0, \eta)$$

    This can be programmed in a language like Python, following the same principles as the earlier example. It will be somewhat more complex because of the two extra dimensions, but eminently doable. I suggest starting with a very coarse grid until it is fully debugged, then running it overnight on a finer grid.
     
    Last edited: Aug 31, 2018
  13. Aug 11, 2018 #12
    Thanks, but I still don't get the implementation of the example you did in the other thread in R and Python without selection/ordering. I tried to start there because it is an easier case, and then come back here. But since I didn't understand the implementation there, I thought may be I can do some analytical solution for the special case ##M=2##. I found an analytic solution when the ordering includes the whole fraction ##X_k/Y_k##. In this case, I just have to consider the whole fraction as a new RV, and thus will have double integral. But when ordering is done on one of the RVs ##X_k## or ##Y_k##, it becomes much more complicated, and you need to consider the individual RVs.

    Without ordering there was ##M-1## nested integrals (which would correspond to selecting ##M## out of ##K## random variables at random, and ##-1## because for the last integral we used the CDF), now there would be ##2M## nested integrals (here we need to consider the joint PDF, and we cannot use the CDFs). I get the math and how to get the limits, but I don't understand the implementation. I do for the recursion, though but it is not efficient.

    So, you think even for ##M=2## there is no analytic solution? In that case, doing a numerical integration for ##M=2## is not as a contribution as for a general ##M##. Probably I should read your other posts again.

    Thanks for your responses.

    PS: Just a side note, wouldn't it be easier if you consider the larger RVs first? In that case the lower limits for the integrals would be 0, and the upper limits would be min instead of max. It probably would make a difference for an analytic solution, but maybe not for a numerical solution. Just a remark.
     
    Last edited: Aug 11, 2018
  14. Aug 12, 2018 #13

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    Approaching it analytically with ##M=2## I can get one step further than was reached above, to where there are two integrals left to evaluate. But at that point it looks about to become horribly messy, because the next integration variable ##x_1## appears in both the numerator and denominator of exponents, as well as in the denominator of the integrand. Wolfram doesn't produce anything that looks easy to work with. FWIW, here's my development:

    For ##M=2## and ##X,Y## both exponential with parameter 1, we have:
    %
    \begin{align*}
    \text{Pr}&\left[\sum_{i=1}^M G_k\leq \eta\right]
    =
    \frac{K!}{(K-2)!}
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \int_{y_2=0}^\infty e^{-y_2}
    \int_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)} e^{-x_2}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    \left[1-\left(1-e^{-x_2}\right)\right]^{K-2}
    \,dx_2\,dy_2\,dx_1\,dy_1 \\
    % 2
    &=
    K(K-1)
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \int_{y_2=0}^\infty e^{-y_2}
    \int_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)} e^{-x_2}
    e^{-x_2(K-2)}
    \,dx_2\,dy_2\,dx_1\,dy_1 \\
    % 3
    &=
    K(K-1)
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \int_{y_2=0}^\infty e^{-y_2}
    \int_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)}
    e^{-x_2(K-1)}
    \,dx_2\,dy_2\,dx_1\,dy_1
    \\% 4
    &=
    \frac{K(K-1)}{K-1}
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \int_{y_2=0}^\infty e^{-y_2}
    \left[
    -e^{-x_2(K-1)}
    \right]_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)}
    \,dy_2\,dx_1\,dy_1
    %
    \\% 5
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \int_{y_2=0}^\infty e^{-y_2}
    \left(
    e^{-x_1(K-1)}
    -
    e^{-\left(\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)\right)(K-1)}
    \right)
    \,dy_2\,dx_1\,dy_1
    %
    \\% 6
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \left(
    e^{-x_1(K-1)}
    -
    \int_{y_2=0}^\infty e^{-y_2}
    e^{-\left(\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)\right)(K-1)}
    \,dy_2 \right)
    \,dx_1\,dy_1
    %
    \\% 7
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \bigg(
    e^{-x_1(K-1)}
    - \int_{y_2=0}^{\frac{x_1}{\eta - x_1/y_1}} e^{-y_2}
    e^{-x_1(K-1)} \,dy_2
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    - \int_{y_2={\frac{x_1}{\eta - x_1/y_1}}}^\infty e^{-y_2}
    e^{-y_2\left(\eta - x_1/y_1\right)(K-1)} \,dy_2
    \bigg)
    \,dx_1\,dy_1
    %
    \\% 8
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \bigg(
    e^{-x_1(K-1)}
    +
    e^{-x_1(K-1)}\bigg[e^{-y_2}\bigg]_{y_2=0}^{\frac{x_1}{\eta - x_1/y_1}}
    \\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
    - \int_{y_2={\frac{x_1}{\eta - x_1/y_1}}}^\infty
    e^{-y_2\left( 1 + \left(\eta - x_1/y_1\right)(K-1)\right)} \,dy_2
    \bigg)
    \,dx_1\,dy_1
    %
    \\% 9
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \bigg(
    e^{-\frac{x_1}{\eta - x_1/y_1}(K-1)}
    + \frac{
    \left[
    e^{-y_2\left( 1 + \left(\eta - x_1/y_1\right)(K-1)\right)}
    \right]_{y_2={\frac{x_1}{\eta - x_1/y_1}}}^\infty
    }{1 + \left(\eta - x_1/y_1\right)(K-1)}
    \bigg)
    \,dx_1\,dy_1
    %
    \\% 10
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta} e^{-x_1}
    \bigg(
    e^{-\frac{x_1}{\eta - x_1/y_1}(K-1)}
    - \frac{
    e^{-{{{\frac{x_1}{\eta - x_1/y_1}}} }\left( 1 + \left(\eta - x_1/y_1\right)(K-1)\right)}
    }{1 + \left(\eta - x_1/y_1\right)(K-1)}
    \bigg)
    \,dx_1\,dy_1
    %
    \\% 11
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta}
    \bigg(
    e^{-x_1-\frac{x_1}{\eta - x_1/y_1}(K-1)}
    +
    \frac{
    e^{-\frac{x_1}{\eta - x_1/y_1} - x_1(K-1)}
    }{1 + \left(\eta - x_1/y_1\right)(K-1)}
    \bigg)
    \,dx_1\,dy_1
    %
    \\% 12
    &=
    K
    \int_{y_1=0}^\infty e^{-y_1}
    \int_{x_1=0}^{y_1\eta}
    e^{-\frac{x_1}{\eta - x_1/y_1}(K-1)}
    \bigg(
    e^{-x_1}
    +
    \frac{
    e^{-x_1(K-1)}
    }{1 + \left(\eta - x_1/y_1\right)(K-1)}
    \bigg)
    \,dx_1\,dy_1
    \end{align*}

    That's the point where I usually give up and take the numeric approach.
     
  15. Aug 12, 2018 #14
    I reached to almost a similar result. You divided the integral over ##y_2##, while I divided the integral over ##x_2##. It seems that numerical integration is the way to go.
     
  16. Aug 12, 2018 #15
    I tried to do the numerical integrals for your results. If we write your result as ##K(I_1(\eta)+I_2(\eta)))##, then the first integral ##I_1(\eta)## was evaluated OK, but the second ##I_2(\eta)## gives me a division by zero error. It gives me the same error when I divide the integral over ##x_2## instead of ##y_2##!!
     
  17. Aug 12, 2018 #16

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    I expect that's because the denominator ##(\eta - x_1/y_1)## of the exponent of the factor ##e^{-\frac{x_1}{\eta - x_1/y_1}(K-1)}## in the inner integrand goes to zero at the upper inner integration limit. The integral is an improper integral, but the integrand can be made continuous. You should be able to get around the problem by changing the integrand to be a branched function that gives zero if ##x_1 = y_1\eta## and what is written otherwise.
     
  18. Aug 12, 2018 #17
    You are right. When I accounted for the case ##\eta - x_1/y_1 = 0## I got the result without any warning or errors. I thought the software takes care of this, since ##e^{-\infty} = 0##! Apparently not. Thanks
     
  19. Aug 30, 2018 #18
    Have you run this code? Why do you pass M-1 and not M to the original function? Also why do you pass K-1 (it is not needed)? I am trying to implement the mathematical integration numerically in Python (I did it in a slightly different way), but it is not working. I am trying to figure out why. I wrote the recursive function mathematically as the following:

    [tex]F_{m-1}(z_{m-2}) = \int_{y_{m-1}=0}^{\infty}\int_{x_{m-1}=x_{m-2}}^{\max(x_{m-2},\,y_{m-1}\,z_{m-2})}F_m\left(z_{m-2}-\frac{x_{m-1}}{y_{m-1}}\right)f_X(x_{m-1})f_Y(y_{m-1})\,dx_{m-1}\,dy_{m-1}[/tex]

    where

    [tex]F_{M}(z_{M-1}) = \int_{y_{M}=0}^{\infty}\int_{x_{M}=x_{M-1}}^{\max(x_{M-1},\,y_{M}\,z_{M-1})}\left[1-F_X(x_M)\right]^{K-M}f_X(x_{M})f_Y(y_{M})\,dx_{M}\,dy_{M}[/tex]

    So, we need to find ##\frac{K!}{(K-M)!}F_1(z_0)##.

    The code corresponding to this recursive function is

    Code (Python):
    import math
    K=3
    M=2
    dx = 0.01
    dy = 0.01
    y_lim = 100

    #the CDF of X
    def F(x):
        return math.exp(-x*(K-M))

    #The pdf of Xs and Ys
    def f(u):
        return math.exp(-u)

    def G(z, m, x_prev_lim):
        if m == M:
            res = 0
            ux = x_prev_lim
            uy = 0
            while uy < y_lim:
                uy += dy
                resx = 0
                while ux <= max(x_prev_lim, uy*z):
                    ux += dx
                    resx += F(ux)*f(ux)*dx
                res += resx*f(uy)*dy
     
        else:
            res = 0
            ux = x_prev_lim
            uy = 0
            while uy < y_lim:
                uy += dy
                resx = 0
                while ux <= max(x_prev_lim, uy*z):
                    ux += dx
                    resx += G(z - (ux/uy), m+1, ux)*f(ux)*dx
                res += resx*f(uy)*dy
        return res

    print((math.factorial(K)/math.factorial(K-M))*G(1, 1, 0))
    Where did I make a mistake?
     
    Last edited: Aug 30, 2018
  20. Aug 30, 2018 #19

    andrewkirk

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    Because there are M nested integrals if the innermost integrand is a PDF, but in the above code the innermost integrand is a CDF, which is the integral of a PDF, so we use only M-1 nested integrals.

    Here is that innermost integrand:
    Code (Text):
    integrand = (1 - cdf_x(x)) ^ k
    The K is used as the exponent in that innermost integrand.

    I am away from home till Monday, and hence away from my Python compiler. But I hope the above is of some help.
     
  21. Aug 30, 2018 #20
    OK, that makes sense when the inner most double-integrals (##F_M(z_{M-1})## in the above equation) are evaluated numerically. Otherwise, the CDF of ##X## (which appears what you are referring to) is part of the joint PDF of the order statistics. And thus we still have ##M## nested double-integrals, right?

    Regarding the argument ##k-1##, in the mathematical formulation, ##K## appears only once in evaluating ##F_M(z_{M-1})##, and it is constant. I wonder why you included it as an argument.
     
    Last edited: Aug 30, 2018
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook

Have something to add?
Draft saved Draft deleted