Numerical Integration in Python

EngWiPy · Aug 22, 2018

I want to find the numerical solution to the following nested integral in Python

[tex] \frac{K!}{(K-M)!}\int_{x_1=0}^{y}\int_{x_{2}=x_1}^{\max(x_1,\,y-x_1)}\cdots \int_{x_M=x_{M-1}}^{\max(x_{M-1},\,y-\sum_{m=1}^{M-1}x_m)} \frac{1}{(1+x_M)^{K-M+2}}\prod_{m=1}^{M-1} \frac{1}{(1+x_m)^2}\,dx_1\,dx_2\cdots\,dx_M[/tex]

To that end, I did the following

Python:

import math
K=4
M=2
du = 0.01
def F(u):
      return 1/(1+u)**(K-M+2)
#Define the PDF of U
def f(u):
     return 1/((1+u))**2
 

def G(y, m, prev_lim):
    #print(f'h is {h}, and k is {k}')
    if m == M:
        res = F(y)
    else:
        res =  0
        u = prev_lim
        while u < y:
            res += G(y-u, m+1, u)*f(u)*du
            u += du
    return (math.factorial(K)/math.factorial(K-M))*res

print(G(2, 1, 0))

The above integral is the CDF of a random variable, but the result isn't between 0 and 1. So, I guess there is an error, but where?

Thanks

willem2 · Aug 22, 2018

EngWiPy said:

The above integral is the CDF of a random variable, but the result isn't between 0 and 1. So, I guess there is an error, but where?

I can find 2 related errors.
- With M = 2, you have a 2 dimensional integral. How big is the area that you integrate over? how many points are in this area? It's useful to let your program count this, I get only 200 if I increase a global variable count for each computation of F(y).
- Each of the points represents an area of size (du)² for a double integral.

EngWiPy · Aug 22, 2018

willem2 said:

I can find 2 related errors.
- With M = 2, you have a 2 dimensional integral. How big is the area that you integrate over? how many points are in this area? It's useful to let your program count this, I get only 200 if I increase a global variable count for each computation of F(y).
- Each of the points represents an area of size (du)² for a double integral.

I don't think I understand what you mean, and how these are errors, and how to fix them. Here M can be any positive integer between 1 and K. M=2 is just a case scenario for testing. Could you elaborate more on the errors, and how to fix them?

willem2 · Aug 22, 2018

EngWiPy said:

I don't think I understand what you mean, and how these are errors, and how to fix them. Here M can be any positive integer between 1 and K. M=2 is just a case scenario for testing. Could you elaborate more on the errors, and how to fix them?

For M=2, you should integrate over 0 x₁ < 1, x₂ > x₁, x₂> 2-x₁. which has an area of 1.
Each of the points that you sum over represents an area of du*du. since du = 0.01 there should be about 10000 points to sum. (ignoring the triangular areas at the edges for small du) You only have 200 points, because the recursion is only one level deep, Because the recursion is only one level deep, you multiply the contributions of each point with du, instead of du*du.
I think, starting with m=0 will solve both problems

EngWiPy · Aug 22, 2018

willem2 said:

For M=2, you should integrate over 0 x₁ < 1, x₂ > x₁, x₂> 2-x₁. which has an area of 1.
Each of the points that you sum over represents an area of du*du. since du = 0.01 there should be about 10000 points to sum. (ignoring the triangular areas at the edges for small du) You only have 200 points, because the recursion is only one level deep, Because the recursion is only one level deep, you multiply the contributions of each point with du, instead of du*du.
I think, starting with m=0 will solve both problems

I tried m=0, but didn't fix the issue. I think I forgot to incorporate the max in the recursion. I did the following

Python:

def G(y, m, prev_lim):
    #print(f'h is {h}, and k is {k}')
    if m == M:
        res = F(y)
    else:
        res =  0
        u = prev_lim
        while u < max(prev_lim, y):
            res += G(y-u, m+1, u)*f(u)*du
            u += du
    return (math.factorial(K)/math.factorial(K-M))*res

But it didn't work as well!

andrewkirk · Aug 22, 2018

It looks to me like the multiplication by factorials, in the last line of function G, will happen for every level of recursion, whereas it should only happen once. To fix that, take that multiplication out of the G function and put it in the last line of the main code, which becomes

Code:

print((math.factorial(K) / math.factorial(K-M)) * G(2, 1, 0))

If you still have accuracy problems, you could consider using Simpson integration rather than the rectangular integration that is currently used.

If it runs for too long, there are other techniques that could be tried along the lines of what we've looked at earlier.

EngWiPy · Aug 22, 2018

andrewkirk said:
It looks to me like the multiplication by factorials, in the last line of function G, will happen for every level of recursion, whereas it should only happen once. To fix that, take that multiplication out of the G function and put it in the last line of the main code, which becomes
Code:
print((math.factorial(K) / math.factorial(K-M)) * G(2, 1, 0))
If you still have accuracy problems, you could consider using Simpson integration rather than the rectangular integration that is currently used.

If it runs for too long, there are other techniques that could be tried along the lines of what we've looked at earlier.

Thanks. You are right. The factorial fraction should be multiplied by the final result once at the end. Doing this started to give results between 0 and 1. That is a progress. But relative to the Monte-Carlo simulations, which is found as following

Python:

count = 0
N=100000
for n in range(0, N):
      X = np.random.exponential(1, (1, K))
      Y = np.random.exponential(1, (1, K))
      Z = X/Y
      Z = np.sort(Z[0])
      if np.sum(Z[:M]) <= 2:
         count += 1

print(count/N)

the results are not accurate. Is calling the function G(2, 1, 0) is correct, or we need to call G(2, 0, 0)? Also I am not sure about this segment

Python:

while u < max(prev_lim, y):
            res += G(y-u, m+1, u)*f(u)*du

Is the condition of the while loop correct? Because I choose the max in the while loop condition, I don't have to choose the max in the subsequent calling of G, right?

andrewkirk · Aug 23, 2018

Re your two questions, your code looks to me as though it implements the formula in the OP reliably.

Re the comparison to simulation results, what are you getting as the results from the numeric integration and the simulation? Note that the comparison assumes that the formula in the OP is correct (see last para below).

Since either of those could have an error, why not calculate an analytic value to compare it to? As I recall, when we tried to calculate the integral analytically, it worked for a two-deep nesting, then started to look too hard. Since your nesting above is only two-deep, why not dig out the analytic formula you got for a two-deep nest, calculate the value using that, and compare it to the numeric and simulation values to see which is more accurate?

Also, I am not sure the latex formula in the OP is correct. Can you link to where you derived it? I vaguely recall that the inner integral should have a factor that is a CDF of the inner integation variable, raised to some power, and another factor that is the PDF of the inner integration variable. But the version in the OP only has the CDF factor.

EngWiPy · Aug 23, 2018

andrewkirk said:

Re your two questions, your code looks to me as though it implements the formula in the OP reliably.

Re the comparison to simulation results, what are you getting as the results from the numeric integration and the simulation? Note that the comparison assumes that the formula in the OP is correct (see last para below).

Since either of those could have an error, why not calculate an analytic value to compare it to? As I recall, when we tried to calculate the integral analytically, it worked for a two-deep nesting, then started to look too hard. Since your nesting above is only two-deep, why not dig out the analytic formula you got for a two-deep nest, calculate the value using that, and compare it to the numeric and simulation values to see which is more accurate?

Also, I am not sure the latex formula in the OP is correct. Can you link to where you derived it? I vaguely recall that the inner integral should have a factor that is a CDF of the inner integation variable, raised to some power, and another factor that is the PDF of the inner integration variable. But the version in the OP only has the CDF factor.

I did derive the analytic result for ##M=2##. So, I compared the analytic results, with the numerical integration and the MC simulations. Both the analytic result and the MC coincide, while the numerical integration is way off. To be specific, for ##K=4##, ##M=2##, and ##y=2##, the analytical and MC results are 0.8105581 and 0.8114, respectively, while the numerical integration result is 0.14021.

OK, we didn't actually talked about this case. Here I have ##K## random variables ##X_1,\,X_2,\,\ldots,\,X_K##, where ##X_k=A_k/B_k##, where ##A_k## and ##B_k## are exponential RV with parameter 1 (all RVs are independent). Then I order the RVs as

[tex]X_{(1)}\leq X_{(2)}\leq\cdots X_{(K)}[/tex]

and form the new RV ##Y## as the summation of the ##M## smallest random variables as

[tex]Y=\sum_{m=1}^MX_{(m)}[/tex]

I want to find the CDF of ##Y##, which is

[tex]F_Y(y)=\text{Pr}\left[\sum_{m=1}^MX_{(m)}\leq y\right]= \int_{x_1=0}^{y}\int_{x_{2}=x_1}^{\max(x_1,\,y-x_1)}\cdots \int_{x_M=x_{M-1}}^{\max(x_{M-1},\,y-\sum_{m=1}^{M-1}x_m)}f_{(1),(2),\ldots,(M)}(x_1,\,x_2,\,\ldots,\,x_M)\,dx_m\,dx_{M-1}\cdots dx_1[/tex]

where

[tex]f_{(1),(2),\ldots,(M)}(x_1,\,x_2,\,\ldots,\,x_M)=\frac{K!}{(K-M)!}\left[1-F_X(x_M)\right]^{K-M}\prod_{m=1}^Mf_X(x_m)[/tex]

is the joint PDF of the order statistics ##\{X_{(m)}\}_{m=1}^M##, and ##F_X(x_k)=1-1/(1+x_k)## and ##f_X(x_k)=1/(1+x_k)^2## are the CDF and PDF of the random variable ##X_k##. All of this yields eventually to the formula I posted in the original post. Is this correct?

PS: if I evaluate ##1-F_Y(y)##, I get closer results to the analytic and MC simulations, but I am not sure if this is a coincidence, because my derivations is for ##F_Y(y)##. and not ##1-F_Y(y)##!

SlowThinker · Aug 23, 2018

EngWiPy said:
Python:
while u < max(prev_lim, y):
 res += G(y-u, m+1, u)*f(u)*du
Is the condition of the while loop correct? Because I choose the max in the while loop condition, I don't have to choose the max in the subsequent calling of G, right?

Well the use of "max" function looks suspicious. It seems to do nothing in your code. Perhaps you meant something like

Python:

while u < y-prev_lim:

but this will make the result even smaller so there's got to be something else.
Maybe make du=0.5 or 0.25 and trace all evaluations? (I assume Python cannot be debugged. If you can then it should be easy to find the mistake).

EngWiPy · Aug 23, 2018

SlowThinker said:
Well the use of "max" function looks suspicious. It seems to do nothing in your code. Perhaps you meant something like
Python:
while u < y-prev_lim:
but this will make the result even smaller so there's got to be something else.
Maybe make du=0.5 or 0.25 and trace all evaluations? (I assume Python cannot be debugged. If you can then it should be easy to find the mistake).

The max in the while loop implements the max in the upper limit of the inner integrals. I think it is correct as it is. If we remove it, the upper limit will be only y-prev_lim, which isn't always greater than prev_lim.

SlowThinker · Aug 23, 2018

EngWiPy said:

The max in the while loop implements the max in the upper limit of the inner integrals. I think it is correct as it is. If we remove it, the upper limit will be only y-prev_lim, which isn't always greater than prev_lim.

Yes but you start from prev_lim anyway and go only up. So the max function does not add any cycles of the while loop.

EngWiPy · Aug 23, 2018

SlowThinker said:

Yes but you start from prev_lim anyway and go only up. So the max function does not add any cycles of the while loop.

Ooops, the while loop condition should be

Python:

while u < max(prev_lim, y-prev_lim):

It is correct on my computer, but editing it here I forgot it. Does it make sense now?

SlowThinker · Aug 23, 2018

EngWiPy said:
Ooops, the while loop condition should be
Python:
while u < max(prev_lim, y-prev_lim):
It is correct on my computer, but editing it here I forgot it. Does it make sense now?

It's still the same. You should get the same result with the max, or just u<y-prev_lim.

EngWiPy · Aug 23, 2018

SlowThinker said:

It's still the same. You should get the same result with the max, or just u<y-prev_lim.

Actually, I do, but why? Where is the max limit, and how to implement it effectively. y-prev_lim alone doesn't sound right, and indeed the result is still not consistent with Monte-Carlo simulations!

SlowThinker · Aug 23, 2018

EngWiPy said:

Actually, I do, but why? Where is the max limit, and how to implement it effectively. y-prev_lim alone doesn't sound right, and indeed the result is still not consistent with Monte-Carlo simulations!

Maybe you need the max in mathematics because ##\int_1^0 x\,dx## is valid, but the code will return 0 if bottom bound is higher than upper bound.

Anyway the code looks about right, there's probably some m+1 instead of m or something similar. The best way to find it is by debugging. A worse way is to integrate function f(x)=1 and if it works, move to the full function in small steps.
It might help if you rewrite the integral expression to the form you implemented, with ##\frac{1}{(1+x_m)^2}## terms moved outside the integrals.

Mark44 · Aug 23, 2018

SlowThinker said:

I assume Python cannot be debugged.

Not true. There's a debugger that comes with Python distributions. It's pretty primitive, but it's much better than no debugger at all. I wrote a couple of Insights articles on how to use it --

SlowThinker · Aug 23, 2018

Mark44 said:

Not true. There's a debugger that comes with Python distributions. It's pretty primitive

Uh huh. In that case it might be actually easier to use debugging prints (with a large du of course).

EngWiPy · Aug 23, 2018

I don't know how to use the debugger.

I think I know what the problem is. In my code, once ##m=M##, the function ##G## returns ##1/(1+x)^{K-M+2}## only for the first value of the most inner integral, while it should computer ##1/(1+x)^{K-M+2}## for all values between ##x_{M-1}## up to ##\max(x_{M-1}, y-\sum_{m=1}^{M-1}x_m)##. Also, in the recursion, the first argument should be the same as the upper limit of the while loop, and not ##y-u##. So, the code must be something like this

Python:

K = 4
M = 2
du =0.001
N = 100000
def F(u):
      return 1/(1+u)**(K-M+2)
#Define the PDF of U
def f(u):
     return 1/((1+u))**2
 

def G(h, m, prev_lim):
    #print(f'h is {h}, and k is {k}')
    if m == M:
        res = 0
        u = prev_lim
        while u <= max(prev_lim, h-prev_lim):
            res += F(u)*du
            u += du
    else:
        res =  0
        u = prev_lim
        #while u <= h-prev_lim:#max(prev_lim, ):
        while u <= max(prev_lim, h-prev_lim):
            res += G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*du
            u += du
    return res
#(math.factorial(K)/math.factorial(K-M))*

print((math.factorial(K)/math.factorial(K-M))*G(2, 1, 0))

# Simulation
count = 0
for n in range(0, N):
    X = np.random.exponential(1, (1, K))
    Y = np.random.exponential(1, (1, K))
    Z = X/Y
    Z = np.sort(Z[0])
    if np.sum(Z[:M]) <= 2:
        count += 1
     
print(count/N)

In this case the numerical integration gives 0.819, while the simulations give 0.8121, which is very close.

Mark44 · Aug 23, 2018

SlowThinker said:

Uh huh. In that case it might be actually easier to use debugging prints (with a large du of course).

It's not so primitive that extra print statements are necessary. You can set breakpoints and single-step through the code, query for the values of variables, and lots of other operations.

EngWiPy · Aug 23, 2018

Specifying ##du=0.001## as an absolute value, regardless of ##y## is not practical, because if ##y## is very large, the numerical integration would take forever to evaluate, as the number of points would be huge. Instead of specifying the step size ##du## in the numerical integration, I can specify the number of areas/points ##N##, and then calculate ##du=y/N##, and then pass it as an argument to the recursive function. Does this make sense?

SlowThinker · Aug 24, 2018

EngWiPy said:

Specifying ##du=0.001## as an absolute value, regardless of ##y## is not practical, because if ##y## is very large, the numerical integration would take forever to evaluate, as the number of points would be huge. Instead of specifying the step size ##du## in the numerical integration, I can specify the number of areas/points ##N##, and then calculate ##du=y/N##, and then pass it as an argument to the recursive function. Does this make sense?

You may first want to let it run with small du to make sure that it works now, before optimizing.
You can play with du a bit but it will always be a trade between speed and accuracy.
Then you can try Simpson's rule, that's reasonably simple and improves accuracy even with bigger du. Requires constant du though.
If you really want to spend the time on it, you can instead make du smaller where the function changes faster (where u is small).

Is it possible to use the analytical solution for the last 2 levels? After you verify that the algorithm works, of course.

EngWiPy · Aug 24, 2018

The code is working as I explained in post #19. I guess I can find an analytic solution to the last two levels. I will check Simpson's rule. I may have to go to C++ for faster running time (in the past I got 30x speedup for the same code in C++ as that on Python). Unless there is a way to vectorize my code in Python (or MATLAB), that can speed up the running time significantly, which would be easier for me, as I am not that proficient in C++, and it is not as user-friendly.

EngWiPy · Aug 24, 2018

I was reading about the Simpson's rule, and it states that

[tex]\int_a^bf(x)\,dx\simeq \frac{\Delta x}{3}\left[y_0+4\sum_{i=0}^{n/2}y_{2i+1}+2\sum_{j=1}^{n/2}y_{2j}\right][/tex]

where ##y_i=f(a+i\Delta x)##, and ##\Delta x = \frac{b-a}{n}##, where ##n## is even.

So, in my code, I just need to alternate between

Python:

res += 4*G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*(du/3)

and

Python:

res += 2*G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*(du/3)

after the first evaluation, and that's it?

EngWiPy · Aug 24, 2018

Apparently so

SlowThinker · Aug 25, 2018

EngWiPy said:

Apparently so

The last sample needs to have weight 1 as well. So it's 1, 4, 2, 4, 2, ..., 2, 4, 1. Also make sure that the number of samples is even (perhaps by tweaking du a bit before the loop, or starting at u=h/2 and going both ways at once).

EngWiPy · Aug 25, 2018

SlowThinker said:

The last sample needs to have weight 1 as well. So it's 1, 4, 2, 4, 2, ..., 2, 4, 1. Also make sure that the number of samples is even (perhaps by tweaking du a bit before the loop, or starting at u=h/2 and going both ways at once).

Right, the last sample should have a weight of 1, not 2. What do you mean by "going both ways at once"?

SlowThinker · Aug 25, 2018

EngWiPy said:

Right, the last sample should have a weight of 1, not 2. What do you mean by "going both ways at once"?

I meant something like this

Code:

        res =  0
        u = h/2 #prev_lim
        while u <= max(prev_lim, h-prev_lim):
            res += G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*du
            res += G(max(prev_lim, h-prev_lim), m+1, h-u)*f(h-u)*du
            u += du

but now it doesn't seem like a good idea. You'd still have to make sure that the number of cycles is even, and you'd count the middle sample twice.

So just tweak the du with something like this

Code:

        if 2*prev_lim < h:
            num_cyc=((h-2*prev_lim) // du) | 1
            for cyc=range(0, num_cyc)
                u=prev_lim+du*cyc/num_cyc
                res+= as before

It may need special case for num_cyc==1.
There are always rounding errors when dividing floating-point numbers, that's why I recommend using an integer to guide the loop.

It might be that if you make du a negative power of 2 (say 1/512) you'll automatically always get even number of steps. That would be the easiest and most accurate way to do this.

EngWiPy · Sep 1, 2018

Now I have this numerical integration

[tex]K(K-2)\int_{y_1=0}^{\infty}\int_{x_1=x_0}^{\max(x_0,\,y_1z_0)}\left[\int_{y_2=0}^{\infty}\int_{x_2=x_1}^{\max(x_1,\,y_2z_1)}\left[1-F_X(x_2)\right]^{K-2}f_X(x_2)\,dx_2\,dy_2\right]f_X(x_1)\,f_Y(y_1)\,dx_1\,dy_1[/tex]

where ##z_0=z## and ##x_0=0##. I wrote this as a recursive function as

[tex]G_{m-1}(z_{m-2})=\int_{y_{m-1}=0}^{\infty}\left[\int_{x_{m-1}=x_{m-2}}^{\max(x_{m-2},\,y_{m-1}\,z_{m-2})}G_{m}(z_{m-2}-\frac{x_{m-1}}{y_{m-1}})\,f_X(x_{m-1})dx_{m-1}\right]f_Y(y_{m-1})\,dy_{m-1}[/tex]

where

[tex]G_M(z_{M-1})=\int_{y_{M}=0}^{\infty}\left[\int_{x_{M}=x_{M-1}}^{\max(x_{M-1},\,y_{M}\,z_{M-1})}\left[1-F_X(x_M)\right]^{K-M}\,f_X(x_{M})dx_{M}\right]f_Y(y_{M})\,dy_{M}[/tex]

where ##F_X(x)=1-\exp(-x)##, ##f_X(x)=\exp(-x)##, and ##f_Y(y)=\exp(-y)##. The code for this

Python:

import math

#Define some constants
K = 3
M = 2
dx = 0.01
dy = 0.01
#Intead the upper bound infinity over ys
y_lim = 50

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = 0
    while uy < y_lim:
        uy += dy
        resx = 0
        while ux <= max(x_prev_lim, uy*z):
            ux += dx
            if m == M:
                resx += F(ux)*f(ux)*dx
            else:
                resx += G(z - (ux/uy), m+1, ux)*f(ux)*dx
        res += resx*f(uy)*dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0)

But it yields a different result than the MC simulations, which is 0.0512. Why?

SlowThinker · Sep 2, 2018

In the first formula, is it missing ##f_Y(y_2)## somewhere? Where do the ##z_i## come from? Where does the x/y in the second formula come from?
What result do you get, and how it changes if you use smaller dx & dy?
Do you know if the MC simulation is correct?
But seriously, if you want to make programs, you need to learn to work without mistakes. Start from the first equation and rewrite it in small steps until you get a program. Rewriting such a complex formula into a program in one step and hoping for the best, isn't going to work.

Numerical Integration in Python

Similar threads

Is A.I. more than the sum of its parts?

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect