Order Statistics: CDF Calculation for i.i.d. Random Variables

  • Context: Undergrad
  • Thread starter: EngWiPy
  • Tags: CDF, Statistics
SUMMARY

This discussion focuses on calculating the cumulative distribution function (CDF) for a sum of order statistics derived from independent and identically distributed (i.i.d.) random variables. The main formula presented is the probability expression for the sum of transformed order statistics, specifically Pr[sum_{k=1}^{3} G_k ≤ η], where G_k = X_{(k)}/Y_{i_k}. The discussion clarifies the assumptions regarding the independence and distribution of the random variables involved, emphasizing that all X_k and Y_k have the same probability density functions (PDFs) f_X and f_Y, respectively. The final CDF expression is derived through a series of nested integrals, highlighting the complexity introduced by the order statistics.

PREREQUISITES
  • Understanding of cumulative distribution functions (CDF) and probability density functions (PDF)
  • Familiarity with order statistics in probability theory
  • Knowledge of integration techniques, particularly multiple integrals
  • Basic programming skills in Python or MATLAB for numerical evaluation
NEXT STEPS
  • Learn about the properties of order statistics and their applications in statistical analysis
  • Study numerical integration techniques in Python, focusing on libraries like SciPy
  • Explore the derivation and application of joint probability density functions for multiple random variables
  • Investigate the use of Monte Carlo simulations for validating analytical results in probability
USEFUL FOR

Statisticians, data scientists, and researchers working with probabilistic models, particularly those dealing with order statistics and their applications in statistical inference and analysis.

  • #31
Solving for ##dy## yields the following formula

$$dy = y\left[\frac{1}{1-y\,\delta}-1\right]$$

To have a positive increment ##dy##, we need ##0<y\,\delta<1##. How do we incorporate this into the calculation of ##dy##? In other words, what should ##dy## be when the condition is not met? Can we say

$$dy=\max\left[\rho,\,y\left(\frac{1}{1-y\,\delta}-1\right)\right]$$

where ##\rho = 0.001##, for example? Also, I think we need to put some condition on ##dy##, like ##0<dy<1##, right?

The code becomes (before applying the conditions on ##dy##)

Python:
import math
K = 3
M = 2
dx = 0.01
delta = 0.01

y_Upper_lim = 7

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim, y_current_lower_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = y_current_lower_lim
    dy = uy*((1/(1-uy*delta)) - 1)
    while uy < y_Upper_lim:
        while ux <= uy*z:
            if m == M:
                res += F(ux)*f(ux)*f(uy)*dx*dy
                ux += dx
            else:
                res += G(z - (ux/uy), m+1, ux, z/ux)*f(ux)*f(uy)*dx*dy
                ux += dx
        uy += dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0.0001, 0.0001)

but ##dy## sometimes becomes negative, and at other times very large and positive (e.g., 1000)!
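As a side note (my own quick check, not from the thread): the step formula changes sign at ##y\,\delta = 1##, and in the recursive call the starting value z/ux can be huge when ux is small, which would explain both the negative and the very large positive dy values:

```python
# Quick check of where dy = y*(1/(1 - y*delta) - 1) misbehaves (delta = 0.01).
# The step blows up as y*delta -> 1 and turns negative for y*delta > 1;
# a recursion started at a large y (e.g. z/ux with a tiny ux) lands in that region.
delta = 0.01

def step(y):
    return y*((1/(1 - y*delta)) - 1)

assert step(7) > 0        # inside the intended range [0, 7] the step is positive
assert step(99.9) > 100   # just below y = 1/delta = 100 the step explodes
assert step(150) < 0      # past y = 1/delta the step is negative
```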
 
  • #32
The calculation of dy needs to be inside the 'while uy...' loop, otherwise the increment size will be the same throughout that integration.

In the recursive call of G, the last argument needs to be ux/z rather than z/ux.

EngWiPy said:
To have a positive increment dy we need 0<yδ<1. How to incorporate this in the calculation of dy?
That condition will automatically be met in the range being considered. With ##y\le L=7## and ##\delta=0.01##, we'll have ##y\delta \le 0.07##.

EngWiPy said:
Also, I think we need to put some condition on dy like 0<dy<1
Again, with ##L=7## and ##\delta=0.01## that will be unnecessary. I think the largest value of dy we get will be 49/93 which is <1.

Lastly, to speed things up, you may wish to put a minimum of ##\varepsilon=0.0001## on each dy that is calculated; otherwise the first dy in the outer loop will be around ##10^{-6}##, which may make it quite slow. Since that minimum has already been applied implicitly in the first subinterval, accuracy will not be materially degraded by applying it for intervals with larger values of y.
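The bounds claimed above can be checked directly (a quick sketch of my own, using the step formula from post #31):

```python
# Check the claimed bounds for L = 7, delta = 0.01.
# dy = y*(1/(1 - y*delta) - 1) simplifies to y**2*delta/(1 - y*delta),
# which is increasing on [0, L], so its maximum is attained at y = L.
L_lim = 7
delta = 0.01

def step(y):
    return y*((1/(1 - y*delta)) - 1)

assert 0 < L_lim*delta < 1                # the condition 0 < y*delta < 1 holds
assert abs(step(L_lim) - 49/93) < 1e-12   # largest dy is 49/93 ≈ 0.527 < 1
assert step(1e-3) < 1e-6                  # the first dy is tiny, hence the floor
```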
 
  • #33
Oops, you are right. I updated the code as follows, yet it is not working correctly

Python:
import math
K = 3
M = 2
dx = 0.001
dy = 0.001
delta = 0.001

y_Upper_lim = 7

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim, y_current_lower_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = y_current_lower_lim
    while uy < y_Upper_lim:
        while ux < uy*z:
            print(f'ux = {ux}, uy*z = {uy*z}, uy = {uy}, and dy = {dy}')
            if m == M:
                res += F(ux)*f(ux)*f(uy)*dx*dy
                ux += dx
            else:
                res += G(z - (ux/uy), m+1, ux, ux/z)*f(ux)*f(uy)*dx*dy
                ux += dx
        uy += dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0, 0.0001)

I put a print statement inside the while loop of ux, and it seems that ux is always close to uy*z. Is this what is supposed to happen? Also, from that print statement, controlling dy is tricky, because when it is very small (1e-6) we need to set it to, say, 0.0001, but it grows larger than 0.0001 at some point, in which case we probably want to make it smaller. So, I fixed it for now. The numerical result for the above parameters is 3.3406241938703932e-09, while the MC simulation for the same parameters gives 0.0489, which makes much more sense!
 
  • #34
Rather than moving the calculation of dy to inside the 'while uy...' loop, it has been moved to the top, where it sets dy to be constant at 0.001. It needs to be inside the aforesaid loop, and to use the formula that relates it to delta and uy.

Given epsilon = 0.0001, the statement is:

dy = max(epsilon, uy*((1/(1-uy*delta)) - 1))

EngWiPy said:
it seems that ux is always close uy*z
For uy close to 0 that should usually be the case. But I would not expect it for uy>1.
EngWiPy said:
but it grows larger than 0.0001 at some point, in which case we probably want to make it smaller
No, that would just slow the program down with no material improvement in accuracy. See second-last para of post #32.
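Putting the two points together, the outer loop would look roughly like this (a sketch of the structure only; the integrand here is a stand-in exp(-uy), whereas the real one lives in G):

```python
import math

# Sketch: dy is recomputed for the current uy inside the loop, floored at
# epsilon per post #32. Integrating the stand-in exp(-uy) over [0.0001, 7]
# should give roughly 1 - exp(-7) ≈ 0.999.
epsilon = 0.0001
delta = 0.001
y_upper_lim = 7

uy = 0.0001
total = 0.0
while uy < y_upper_lim:
    dy = max(epsilon, uy*((1/(1 - uy*delta)) - 1))  # step depends on current uy
    total += math.exp(-uy)*dy                        # left-endpoint rectangle
    uy += dy
```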
 
  • #35
I did exactly this: dy = max(epsilon, uy*((1/(1-uy*delta)) - 1)), but I noticed that dy grows larger, which I thought wouldn't give accurate results. But still, this doesn't give the expected result. There is some logical error. Now I have this code

Python:
import math
K = 3
M = 2
dx = 0.001
#dy = 0.001
delta = 0.001

y_Upper_lim = 7

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim, y_current_lower_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = y_current_lower_lim
    while uy < y_Upper_lim:
        dy = max(0.0001, uy*((1/(1-uy*delta)) - 1))
        while ux < uy*z:
            print(f'ux = {ux}\n z = {z}\n uy*z = {uy*z}\n uy = {uy}\n ux/uy = {ux/uy}\n dy = {dy}\n')
            if m == M:
                res += F(ux)*f(ux)*f(uy)*dx*dy
                ux += dx
            else:
                res += G(z - (ux/uy), m+1, ux, ux/z)*f(ux)*f(uy)*dx*dy
                ux += dx
        uy += dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0, 0.0001)

and the first and last 4 print statements in the while loop of ux are as follows

ux = 0
z = 1
uy*z = 0.0001
uy = 0.0001
ux/uy = 0.0
dy = 0.0001

ux = 0
z = 1.0
uy*z = 0.0001
uy = 0.0001
ux/uy = 0.0
dy = 0.0001

ux = 0.001
z = 1.0
uy*z = 0.0010000000000000002
uy = 0.0010000000000000002
ux/uy = 0.9999999999999998
dy = 0.0001

ux = 0.002
z = 1.0
uy*z = 0.0020000000000000005
uy = 0.0020000000000000005
ux/uy = 0.9999999999999998
dy = 0.0001

...

ux = 6.96200000000066
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9994323099317346
dy = 0.04886491290055385

ux = 6.96300000000066
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9995758652764534
dy = 0.04886491290055385

ux = 6.96400000000066
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9997194206211721
dy = 0.04886491290055385

ux = 6.965000000000661
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9998629759658909
dy = 0.04886491290055385

which show that ux, uy, and uy*z are almost equal once ##ux\neq 0## (and thus ux/uy ≈ 1), and that ##z## remains 1 (as it was passed) throughout the execution of the code, for all the recursive calls! This shouldn't happen: z should change with each recursive call.
 
  • #36
The problem was that ux was not reset to x_prev_lim for each uy value, and there was a mistake in the MC simulation. Now it works perfectly, and agrees with the MC simulation. Thanks for your responses @andrewkirk.

For the integration over the Xs, dx is constant, and thus applying Simpson's rule for some optimization is straightforward: we simply compute dx = (uy*z - x_prev_lim)/L, where L is an even number (say 600) representing the number of subintervals in the numerical integration.

However, for the Ys the step size changes within the same integral. How can we apply Simpson's rule here, and make sure that the number of subintervals is even?
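One possible way out (my own suggestion, not established in the thread): the varying step dy = uy*(1/(1 - uy*delta) - 1) is exactly a uniform step of size delta in v = 1/y, since 1/y_{n+1} = 1/y_n - delta. Substituting v = 1/y therefore turns the y-integral into one over a uniform grid, where composite Simpson's rule with an even number of subintervals applies directly:

```python
import math

# Sketch, assuming the substitution v = 1/y:
#   int_a^b g(y) dy = int_{1/b}^{1/a} g(1/v)/v**2 dv,
# and the right-hand side runs over a uniform v-grid, so Simpson applies.

def simpson_over_y(g, a, b, n):
    """Integrate g over [a, b] in y via v = 1/y with n (even) subintervals."""
    if n % 2:
        n += 1                     # Simpson's rule needs an even subinterval count
    va, vb = 1.0/b, 1.0/a
    h = (vb - va)/n
    integrand = lambda v: g(1.0/v)/v**2   # 1/v**2 is the Jacobian of y = 1/v
    s = integrand(va) + integrand(vb)
    for i in range(1, n):
        s += (4 if i % 2 else 2)*integrand(va + i*h)
    return s*h/3

# the y-grid generated by uniform v-steps reproduces the thread's dy formula
delta = 0.001
y0 = 0.5
y1 = 1.0/(1.0/y0 - delta)
assert abs((y1 - y0) - y0*((1/(1 - y0*delta)) - 1)) < 1e-12

# sanity check against an exact integral of exp(-y) over [0.5, 7]
approx = simpson_over_y(lambda y: math.exp(-y), 0.5, 7.0, 1000)
assert abs(approx - (math.exp(-0.5) - math.exp(-7))) < 1e-6
```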
 
