# Numerical Integration in Python

• EngWiPy
In summary: But it didn't work as well!It looks to me like the multiplication by factorials, in the last line of function G, will happen for every level of recursion, whereas it should only happen once. To fix that, take that multiplication out of the G function and put it... somewhere else?
EngWiPy
I want to find the numerical solution to the following nested integral in Python

$$\frac{K!}{(K-M)!}\int_{x_1=0}^{y}\int_{x_{2}=x_1}^{\max(x_1,\,y-x_1)}\cdots \int_{x_M=x_{M-1}}^{\max(x_{M-1},\,y-\sum_{m=1}^{M-1}x_m)} \frac{1}{(1+x_M)^{K-M+2}}\prod_{m=1}^{M-1} \frac{1}{(1+x_m)^2}\,dx_1\,dx_2\cdots\,dx_M$$

To that end, I did the following

Python:
import math
K=4
M=2
du = 0.01
def F(u):
return 1/(1+u)**(K-M+2)
#Define the PDF of U
def f(u):
return 1/((1+u))**2

def G(y, m, prev_lim):
#print(f'h is {h}, and k is {k}')
if m == M:
res = F(y)
else:
res =  0
u = prev_lim
while u < y:
res += G(y-u, m+1, u)*f(u)*du
u += du
return (math.factorial(K)/math.factorial(K-M))*res

print(G(2, 1, 0))

The above integral is the CDF of a random variable, but the result isn't between 0 and 1. So, I guess there is an error, but where?

Thanks

Last edited:
EngWiPy said:
The above integral is the CDF of a random variable, but the result isn't between 0 and 1. So, I guess there is an error, but where?

I can find 2 related errors.
- With M = 2, you have a 2 dimensional integral. How big is the area that you integrate over? how many points are in this area? It's useful to let your program count this, I get only 200 if I increase a global variable count for each computation of F(y).
- Each of the points represents an area of size (du)2 for a double integral.

willem2 said:
I can find 2 related errors.
- With M = 2, you have a 2 dimensional integral. How big is the area that you integrate over? how many points are in this area? It's useful to let your program count this, I get only 200 if I increase a global variable count for each computation of F(y).
- Each of the points represents an area of size (du)2 for a double integral.

I don't think I understand what you mean, and how these are errors, and how to fix them. Here M can be any positive integer between 1 and K. M=2 is just a case scenario for testing. Could you elaborate more on the errors, and how to fix them?

EngWiPy said:
I don't think I understand what you mean, and how these are errors, and how to fix them. Here M can be any positive integer between 1 and K. M=2 is just a case scenario for testing. Could you elaborate more on the errors, and how to fix them?

For M=2, you should integrate over 0 x1 < 1, x2 > x1, x2> 2-x1. which has an area of 1.
Each of the points that you sum over represents an area of du*du. since du = 0.01 there should be about 10000 points to sum. (ignoring the triangular areas at the edges for small du) You only have 200 points, because the recursion is only one level deep, Because the recursion is only one level deep, you multiply the contributions of each point with du, instead of du*du.
I think, starting with m=0 will solve both problems

willem2 said:
For M=2, you should integrate over 0 x1 < 1, x2 > x1, x2> 2-x1. which has an area of 1.
Each of the points that you sum over represents an area of du*du. since du = 0.01 there should be about 10000 points to sum. (ignoring the triangular areas at the edges for small du) You only have 200 points, because the recursion is only one level deep, Because the recursion is only one level deep, you multiply the contributions of each point with du, instead of du*du.
I think, starting with m=0 will solve both problems

I tried m=0, but didn't fix the issue. I think I forgot to incorporate the max in the recursion. I did the following

Python:
def G(y, m, prev_lim):
#print(f'h is {h}, and k is {k}')
if m == M:
res = F(y)
else:
res =  0
u = prev_lim
while u < max(prev_lim, y):
res += G(y-u, m+1, u)*f(u)*du
u += du
return (math.factorial(K)/math.factorial(K-M))*res

But it didn't work as well!

It looks to me like the multiplication by factorials, in the last line of function G, will happen for every level of recursion, whereas it should only happen once. To fix that, take that multiplication out of the G function and put it in the last line of the main code, which becomes
Code:
print((math.factorial(K) / math.factorial(K-M)) * G(2, 1, 0))

If you still have accuracy problems, you could consider using Simpson integration rather than the rectangular integration that is currently used.

If it runs for too long, there are other techniques that could be tried along the lines of what we've looked at earlier.

EngWiPy
andrewkirk said:
It looks to me like the multiplication by factorials, in the last line of function G, will happen for every level of recursion, whereas it should only happen once. To fix that, take that multiplication out of the G function and put it in the last line of the main code, which becomes
Code:
print((math.factorial(K) / math.factorial(K-M)) * G(2, 1, 0))

If you still have accuracy problems, you could consider using Simpson integration rather than the rectangular integration that is currently used.

If it runs for too long, there are other techniques that could be tried along the lines of what we've looked at earlier.

Thanks. You are right. The factorial fraction should be multiplied by the final result once at the end. Doing this started to give results between 0 and 1. That is a progress. But relative to the Monte-Carlo simulations, which is found as following

Python:
count = 0
N=100000
for n in range(0, N):
X = np.random.exponential(1, (1, K))
Y = np.random.exponential(1, (1, K))
Z = X/Y
Z = np.sort(Z[0])
if np.sum(Z[:M]) <= 2:
count += 1

print(count/N)
the results are not accurate. Is calling the function G(2, 1, 0) is correct, or we need to call G(2, 0, 0)? Also I am not sure about this segment

Python:
while u < max(prev_lim, y):
res += G(y-u, m+1, u)*f(u)*du

Is the condition of the while loop correct? Because I choose the max in the while loop condition, I don't have to choose the max in the subsequent calling of G, right?

Re your two questions, your code looks to me as though it implements the formula in the OP reliably.

Re the comparison to simulation results, what are you getting as the results from the numeric integration and the simulation? Note that the comparison assumes that the formula in the OP is correct (see last para below).

Since either of those could have an error, why not calculate an analytic value to compare it to? As I recall, when we tried to calculate the integral analytically, it worked for a two-deep nesting, then started to look too hard. Since your nesting above is only two-deep, why not dig out the analytic formula you got for a two-deep nest, calculate the value using that, and compare it to the numeric and simulation values to see which is more accurate?

Also, I am not sure the latex formula in the OP is correct. Can you link to where you derived it? I vaguely recall that the inner integral should have a factor that is a CDF of the inner integation variable, raised to some power, and another factor that is the PDF of the inner integration variable. But the version in the OP only has the CDF factor.

Last edited:
andrewkirk said:
Re your two questions, your code looks to me as though it implements the formula in the OP reliably.

Re the comparison to simulation results, what are you getting as the results from the numeric integration and the simulation? Note that the comparison assumes that the formula in the OP is correct (see last para below).

Since either of those could have an error, why not calculate an analytic value to compare it to? As I recall, when we tried to calculate the integral analytically, it worked for a two-deep nesting, then started to look too hard. Since your nesting above is only two-deep, why not dig out the analytic formula you got for a two-deep nest, calculate the value using that, and compare it to the numeric and simulation values to see which is more accurate?

Also, I am not sure the latex formula in the OP is correct. Can you link to where you derived it? I vaguely recall that the inner integral should have a factor that is a CDF of the inner integation variable, raised to some power, and another factor that is the PDF of the inner integration variable. But the version in the OP only has the CDF factor.

I did derive the analytic result for ##M=2##. So, I compared the analytic results, with the numerical integration and the MC simulations. Both the analytic result and the MC coincide, while the numerical integration is way off. To be specific, for ##K=4##, ##M=2##, and ##y=2##, the analytical and MC results are 0.8105581 and 0.8114, respectively, while the numerical integration result is 0.14021.

OK, we didn't actually talked about this case. Here I have ##K## random variables ##X_1,\,X_2,\,\ldots,\,X_K##, where ##X_k=A_k/B_k##, where ##A_k## and ##B_k## are exponential RV with parameter 1 (all RVs are independent). Then I order the RVs as

$$X_{(1)}\leq X_{(2)}\leq\cdots X_{(K)}$$

and form the new RV ##Y## as the summation of the ##M## smallest random variables as

$$Y=\sum_{m=1}^MX_{(m)}$$

I want to find the CDF of ##Y##, which is

$$F_Y(y)=\text{Pr}\left[\sum_{m=1}^MX_{(m)}\leq y\right]= \int_{x_1=0}^{y}\int_{x_{2}=x_1}^{\max(x_1,\,y-x_1)}\cdots \int_{x_M=x_{M-1}}^{\max(x_{M-1},\,y-\sum_{m=1}^{M-1}x_m)}f_{(1),(2),\ldots,(M)}(x_1,\,x_2,\,\ldots,\,x_M)\,dx_m\,dx_{M-1}\cdots dx_1$$

where

$$f_{(1),(2),\ldots,(M)}(x_1,\,x_2,\,\ldots,\,x_M)=\frac{K!}{(K-M)!}\left[1-F_X(x_M)\right]^{K-M}\prod_{m=1}^Mf_X(x_m)$$

is the joint PDF of the order statistics ##\{X_{(m)}\}_{m=1}^M##, and ##F_X(x_k)=1-1/(1+x_k)## and ##f_X(x_k)=1/(1+x_k)^2## are the CDF and PDF of the random variable ##X_k##. All of this yields eventually to the formula I posted in the original post. Is this correct?

PS: if I evaluate ##1-F_Y(y)##, I get closer results to the analytic and MC simulations, but I am not sure if this is a coincidence, because my derivations is for ##F_Y(y)##. and not ##1-F_Y(y)##!

EngWiPy said:
Python:
while u < max(prev_lim, y):
res += G(y-u, m+1, u)*f(u)*du

Is the condition of the while loop correct? Because I choose the max in the while loop condition, I don't have to choose the max in the subsequent calling of G, right?
Well the use of "max" function looks suspicious. It seems to do nothing in your code. Perhaps you meant something like
Python:
while u < y-prev_lim:
but this will make the result even smaller so there's got to be something else.
Maybe make du=0.5 or 0.25 and trace all evaluations? (I assume Python cannot be debugged. If you can then it should be easy to find the mistake).

SlowThinker said:
Well the use of "max" function looks suspicious. It seems to do nothing in your code. Perhaps you meant something like
Python:
while u < y-prev_lim:
but this will make the result even smaller so there's got to be something else.
Maybe make du=0.5 or 0.25 and trace all evaluations? (I assume Python cannot be debugged. If you can then it should be easy to find the mistake).

The max in the while loop implements the max in the upper limit of the inner integrals. I think it is correct as it is. If we remove it, the upper limit will be only y-prev_lim, which isn't always greater than prev_lim.

EngWiPy said:
The max in the while loop implements the max in the upper limit of the inner integrals. I think it is correct as it is. If we remove it, the upper limit will be only y-prev_lim, which isn't always greater than prev_lim.
Yes but you start from prev_lim anyway and go only up. So the max function does not add any cycles of the while loop.

SlowThinker said:
Yes but you start from prev_lim anyway and go only up. So the max function does not add any cycles of the while loop.

Ooops, the while loop condition should be

Python:
while u < max(prev_lim, y-prev_lim):

It is correct on my computer, but editing it here I forgot it. Does it make sense now?

EngWiPy said:
Ooops, the while loop condition should be

Python:
while u < max(prev_lim, y-prev_lim):

It is correct on my computer, but editing it here I forgot it. Does it make sense now?
It's still the same. You should get the same result with the max, or just u<y-prev_lim.

SlowThinker said:
It's still the same. You should get the same result with the max, or just u<y-prev_lim.

Actually, I do, but why? Where is the max limit, and how to implement it effectively. y-prev_lim alone doesn't sound right, and indeed the result is still not consistent with Monte-Carlo simulations!

EngWiPy said:
Actually, I do, but why? Where is the max limit, and how to implement it effectively. y-prev_lim alone doesn't sound right, and indeed the result is still not consistent with Monte-Carlo simulations!
Maybe you need the max in mathematics because ##\int_1^0 x\,dx## is valid, but the code will return 0 if bottom bound is higher than upper bound.

Anyway the code looks about right, there's probably some m+1 instead of m or something similar. The best way to find it is by debugging. A worse way is to integrate function f(x)=1 and if it works, move to the full function in small steps.
It might help if you rewrite the integral expression to the form you implemented, with ##\frac{1}{(1+x_m)^2}## terms moved outside the integrals.

SlowThinker said:
I assume Python cannot be debugged.
Not true. There's a debugger that comes with Python distributions. It's pretty primitive, but it's much better than no debugger at all. I wrote a couple of Insights articles on how to use it --

Mark44 said:
Not true. There's a debugger that comes with Python distributions. It's pretty primitive
Uh huh. In that case it might be actually easier to use debugging prints (with a large du of course).

I don't know how to use the debugger.

I think I know what the problem is. In my code, once ##m=M##, the function ##G## returns ##1/(1+x)^{K-M+2}## only for the first value of the most inner integral, while it should computer ##1/(1+x)^{K-M+2}## for all values between ##x_{M-1}## up to ##\max(x_{M-1}, y-\sum_{m=1}^{M-1}x_m)##. Also, in the recursion, the first argument should be the same as the upper limit of the while loop, and not ##y-u##. So, the code must be something like this

Python:
K = 4
M = 2
du =0.001
N = 100000
def F(u):
return 1/(1+u)**(K-M+2)
#Define the PDF of U
def f(u):
return 1/((1+u))**2

def G(h, m, prev_lim):
#print(f'h is {h}, and k is {k}')
if m == M:
res = 0
u = prev_lim
while u <= max(prev_lim, h-prev_lim):
res += F(u)*du
u += du
else:
res =  0
u = prev_lim
#while u <= h-prev_lim:#max(prev_lim, ):
while u <= max(prev_lim, h-prev_lim):
res += G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*du
u += du
return res
#(math.factorial(K)/math.factorial(K-M))*

print((math.factorial(K)/math.factorial(K-M))*G(2, 1, 0))

# Simulation
count = 0
for n in range(0, N):
X = np.random.exponential(1, (1, K))
Y = np.random.exponential(1, (1, K))
Z = X/Y
Z = np.sort(Z[0])
if np.sum(Z[:M]) <= 2:
count += 1

print(count/N)

In this case the numerical integration gives 0.819, while the simulations give 0.8121, which is very close.

SlowThinker said:
Uh huh. In that case it might be actually easier to use debugging prints (with a large du of course).
It's not so primitive that extra print statements are necessary. You can set breakpoints and single-step through the code, query for the values of variables, and lots of other operations.

Specifying ##du=0.001## as an absolute value, regardless of ##y## is not practical, because if ##y## is very large, the numerical integration would take forever to evaluate, as the number of points would be huge. Instead of specifying the step size ##du## in the numerical integration, I can specify the number of areas/points ##N##, and then calculate ##du=y/N##, and then pass it as an argument to the recursive function. Does this make sense?

EngWiPy said:
Specifying ##du=0.001## as an absolute value, regardless of ##y## is not practical, because if ##y## is very large, the numerical integration would take forever to evaluate, as the number of points would be huge. Instead of specifying the step size ##du## in the numerical integration, I can specify the number of areas/points ##N##, and then calculate ##du=y/N##, and then pass it as an argument to the recursive function. Does this make sense?
You may first want to let it run with small du to make sure that it works now, before optimizing.
You can play with du a bit but it will always be a trade between speed and accuracy.
Then you can try Simpson's rule, that's reasonably simple and improves accuracy even with bigger du. Requires constant du though.
If you really want to spend the time on it, you can instead make du smaller where the function changes faster (where u is small).

Is it possible to use the analytical solution for the last 2 levels? After you verify that the algorithm works, of course.

The code is working as I explained in post #19. I guess I can find an analytic solution to the last two levels. I will check Simpson's rule. I may have to go to C++ for faster running time (in the past I got 30x speedup for the same code in C++ as that on Python). Unless there is a way to vectorize my code in Python (or MATLAB), that can speed up the running time significantly, which would be easier for me, as I am not that proficient in C++, and it is not as user-friendly.

$$\int_a^bf(x)\,dx\simeq \frac{\Delta x}{3}\left[y_0+4\sum_{i=0}^{n/2}y_{2i+1}+2\sum_{j=1}^{n/2}y_{2j}\right]$$

where ##y_i=f(a+i\Delta x)##, and ##\Delta x = \frac{b-a}{n}##, where ##n## is even.

So, in my code, I just need to alternate between

Python:
res += 4*G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*(du/3)

and

Python:
res += 2*G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*(du/3)

after the first evaluation, and that's it?

Apparently so

EngWiPy said:
Apparently so
The last sample needs to have weight 1 as well. So it's 1, 4, 2, 4, 2, ..., 2, 4, 1. Also make sure that the number of samples is even (perhaps by tweaking du a bit before the loop, or starting at u=h/2 and going both ways at once).

SlowThinker said:
The last sample needs to have weight 1 as well. So it's 1, 4, 2, 4, 2, ..., 2, 4, 1. Also make sure that the number of samples is even (perhaps by tweaking du a bit before the loop, or starting at u=h/2 and going both ways at once).

Right, the last sample should have a weight of 1, not 2. What do you mean by "going both ways at once"?

EngWiPy said:
Right, the last sample should have a weight of 1, not 2. What do you mean by "going both ways at once"?
I meant something like this
Code:
        res =  0
u = h/2 #prev_lim
while u <= max(prev_lim, h-prev_lim):
res += G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*du
res += G(max(prev_lim, h-prev_lim), m+1, h-u)*f(h-u)*du
u += du
but now it doesn't seem like a good idea. You'd still have to make sure that the number of cycles is even, and you'd count the middle sample twice.

So just tweak the du with something like this
Code:
        if 2*prev_lim < h:
num_cyc=((h-2*prev_lim) // du) | 1
for cyc=range(0, num_cyc)
u=prev_lim+du*cyc/num_cyc
res+= as before
It may need special case for num_cyc==1.
There are always rounding errors when dividing floating-point numbers, that's why I recommend using an integer to guide the loop.

It might be that if you make du a negative power of 2 (say 1/512) you'll automatically always get even number of steps. That would be the easiest and most accurate way to do this.

EngWiPy
Now I have this numerical integration

$$K(K-2)\int_{y_1=0}^{\infty}\int_{x_1=x_0}^{\max(x_0,\,y_1z_0)}\left[\int_{y_2=0}^{\infty}\int_{x_2=x_1}^{\max(x_1,\,y_2z_1)}\left[1-F_X(x_2)\right]^{K-2}f_X(x_2)\,dx_2\,dy_2\right]f_X(x_1)\,f_Y(y_1)\,dx_1\,dy_1$$

where ##z_0=z## and ##x_0=0##. I wrote this as a recursive function as

$$G_{m-1}(z_{m-2})=\int_{y_{m-1}=0}^{\infty}\left[\int_{x_{m-1}=x_{m-2}}^{\max(x_{m-2},\,y_{m-1}\,z_{m-2})}G_{m}(z_{m-2}-\frac{x_{m-1}}{y_{m-1}})\,f_X(x_{m-1})dx_{m-1}\right]f_Y(y_{m-1})\,dy_{m-1}$$

where

$$G_M(z_{M-1})=\int_{y_{M}=0}^{\infty}\left[\int_{x_{M}=x_{M-1}}^{\max(x_{M-1},\,y_{M}\,z_{M-1})}\left[1-F_X(x_M)\right]^{K-M}\,f_X(x_{M})dx_{M}\right]f_Y(y_{M})\,dy_{M}$$

where ##F_X(x)=1-\exp(-x)##, ##f_X(x)=\exp(-x)##, and ##f_Y(y)=\exp(-y)##. The code for this

Python:
import math

#Define some constants
K = 3
M = 2
dx = 0.01
dy = 0.01
#Intead the upper bound infinity over ys
y_lim = 50

#[1-FX(x)]^{(K-M)}
def F(x):
return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
return math.exp(-u)

def G(z, m, x_prev_lim):
res = 0
#The lower limit of the integrations over x
ux = x_prev_lim
uy = 0
while uy < y_lim:
uy += dy
resx = 0
while ux <= max(x_prev_lim, uy*z):
ux += dx
if m == M:
resx += F(ux)*f(ux)*dx
else:
resx += G(z - (ux/uy), m+1, ux)*f(ux)*dx
res += resx*f(uy)*dy
return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0)

But it yields a different result than the MC simulations, which is 0.0512. Why?

Last edited:
In the first formula, is it missing ##f_Y(y_2)## somewhere? Where do the ##z_i## come from? Where does the x/y in the second formula come from?
What result do you get, and how it changes if you use smaller dx & dy?
Do you know if the MC simulation is correct?
But seriously, if you want to make programs, you need to learn to work without mistakes. Start from the first equation and rewrite it in small steps until you get a program. Rewriting such a complex formula into a program in one step and hoping for the best, isn't going to work.

Yes, there is a missing ##f_Y(y_2)## in the first formula. ##z_{m-1}=z_{m-2}-\sum_{l=1}^{m-1}\frac{x_l}{y_l}## for ##m=2,\,\ldots,\,M## and ##z_0=z##, which is passed as parameter. I get ##1.44\times 10^{-6}## from the program. The MC simulations is tractable, and the results make more sense. Basically, I want to evaluate

$$\text{Pr}\left[\frac{X_{(1)}}{Y_1}+\frac{X_{(2)}}{Y_2}\leq z\right]$$

where (##K\geq 2##)

$$X_{(1)}\leq X_{(2)} \leq \cdots X_{(K)}$$

are the order statistics of the RVs ##\{X_k\}_{k=1}^K##, which are i.i.d. exponential random variables with parameter 1, and ##Y_1## and ##Y_2## are also i.i.d. exponential random variables with parameter 1.

Setting dx and or dy smaller that 0.01, makes the running time significantly longer.

I think there is a problem in the logic of the program, and probably something to do with the argument ##z-ux/uy##, but not sure what it is!

What other forums I can use to get help on this? I tried Stack Overflow, but didn't get help there!

EngWiPy said:
but didn't get help there!
As the saying goes, you'll find the helping hand at the end of your arm.
After 5 minutes of debugging, it seems that you need to reset ux to x_prev_lim inside the uy-loop rather than before it.
Now I'm getting 1.999329 with the code you posted, or 1.999993 with dx & dy=0.001. Are you using K=3, M=2?

But seriously, get rid of z and start from the first formula again.
In programming, you can't just wave your hands and hope the program will work. Move in small, careful steps.

StoneTemplePython
Oh, that is right. ux must start from x_prev_lim for each uy value. This is a CDF, and the value should be between 0 and 1. In the simulations I am using this code

Python:
import numpy as np
count = 0
N=100000
for n in range(0, N):
X = np.random.exponential(1, (1, K))
Y = np.random.exponential(1, (1, K))
X=np.sort(X)
Z = X/Y
if np.sum(Z[:M]) <= 1:
count += 1
print(count/N)

and yes for K=3, and M=2.

I am trying to run the code now for dx=dy=0.001, but it has been more than 90 minutes and it is still not finished. How long did the program take on your machine? Mine is a bit old (i5@2.4 with 6GB RAM).

EngWiPy said:
I am trying to run the code now for dx=dy=0.001, but it has been more than 90 minutes and it is still not finished. How long did the program take on your machine? Mine is a bit old (i5@2.4 with 6GB RAM).
I've rewritten your code to C. It took like 20 seconds to run with 0.001 .

I'm not all that good with statistics but it seems the sort function is somehow missing in the other version. Are you sure that the first formula in #29 is correct?

• Programming and Computer Science
Replies
18
Views
1K
• Programming and Computer Science
Replies
5
Views
2K
• Programming and Computer Science
Replies
15
Views
1K
• Programming and Computer Science
Replies
8
Views
782
• Programming and Computer Science
Replies
2
Views
2K
• Programming and Computer Science
Replies
6
Views
1K
• Programming and Computer Science
Replies
55
Views
4K
• Programming and Computer Science
Replies
1
Views
679
• Programming and Computer Science
Replies
1
Views
761
• Programming and Computer Science
Replies
5
Views
2K