# Numerical Integration in Python


## Main Question or Discussion Point

I want to find the numerical solution to the following nested integral in Python

$$\frac{K!}{(K-M)!}\int_{x_1=0}^{y}\int_{x_{2}=x_1}^{\max(x_1,\,y-x_1)}\cdots \int_{x_M=x_{M-1}}^{\max(x_{M-1},\,y-\sum_{m=1}^{M-1}x_m)} \frac{1}{(1+x_M)^{K-M+2}}\prod_{m=1}^{M-1} \frac{1}{(1+x_m)^2}\,dx_1\,dx_2\cdots\,dx_M$$

To that end, I did the following

Python:
```python
import math

K = 4
M = 2
du = 0.01

def F(u):
    return 1/(1+u)**(K-M+2)

# Define the PDF of U
def f(u):
    return 1/((1+u))**2

def G(y, m, prev_lim):
    if m == M:
        res = F(y)
    else:
        res = 0
        u = prev_lim
        while u < y:
            res += G(y-u, m+1, u)*f(u)*du
            u += du
    return (math.factorial(K)/math.factorial(K-M))*res

print(G(2, 1, 0))
```
The above integral is the CDF of a random variable, but the result isn't between 0 and 1. So, I guess there is an error, but where?

Thanks


> The above integral is the CDF of a random variable, but the result isn't between 0 and 1. So, I guess there is an error, but where?
I can find two related errors.
- With M = 2, you have a two-dimensional integral. How big is the area that you integrate over? How many points are in this area? It's useful to let your program count this: I get only 200 if I increment a global counter on each evaluation of F(y).
- Each of the points represents an area of size $(du)^2$ for a double integral.
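To illustrate the point counting, here is a minimal sketch (the code from the original post, with a global counter added by me) that increments the counter on every evaluation of F. For K=4, M=2, du=0.01, y=2 it counts about 200 points rather than the roughly 10000 a true double integral would visit:

```python
import math

K, M, du = 4, 2, 0.01
count = 0  # incremented on every evaluation of F

def F(u):
    global count
    count += 1
    return 1/(1+u)**(K-M+2)

def f(u):
    return 1/(1+u)**2

def G(y, m, prev_lim):
    # same recursion as in the original post
    if m == M:
        res = F(y)
    else:
        res = 0
        u = prev_lim
        while u < y:
            res += G(y-u, m+1, u)*f(u)*du
            u += du
    return (math.factorial(K)/math.factorial(K-M))*res

G(2, 1, 0)
print(count)  # about y/du = 200 evaluations, not ~ (y/du)**2
```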

> I can find two related errors. With M = 2, you have a two-dimensional integral. How big is the area that you integrate over? How many points are in this area? It's useful to let your program count this: I get only 200 if I increment a global counter on each evaluation of F(y). Each of the points represents an area of size $(du)^2$ for a double integral.
I don't think I understand what you mean, how these are errors, or how to fix them. Here M can be any positive integer between 1 and K; M=2 is just a test case. Could you elaborate on the errors and how to fix them?

> I don't think I understand what you mean, how these are errors, or how to fix them. Here M can be any positive integer between 1 and K; M=2 is just a test case. Could you elaborate on the errors and how to fix them?
For M=2, you should integrate over $0 < x_1 < 1$, $x_1 < x_2 < 2 - x_1$, which has an area of 1.
Each of the points that you sum over represents an area of $du \cdot du$. Since du = 0.01, there should be about 10000 points to sum (ignoring the triangular areas at the edges, which matter less for small du). You only have 200 points because the recursion is only one level deep, and for the same reason you multiply the contribution of each point by du instead of $du \cdot du$.
I think starting with m=0 will solve both problems.

> For M=2, you should integrate over $0 < x_1 < 1$, $x_1 < x_2 < 2 - x_1$, which has an area of 1. Each of the points that you sum over represents an area of $du \cdot du$. Since du = 0.01, there should be about 10000 points to sum (ignoring the triangular areas at the edges, which matter less for small du). You only have 200 points because the recursion is only one level deep, and for the same reason you multiply the contribution of each point by du instead of $du \cdot du$. I think starting with m=0 will solve both problems.
I tried m=0, but it didn't fix the issue. I think I forgot to incorporate the max in the recursion, so I did the following

Python:
```python
def G(y, m, prev_lim):
    if m == M:
        res = F(y)
    else:
        res = 0
        u = prev_lim
        while u < max(prev_lim, y):
            res += G(y-u, m+1, u)*f(u)*du
            u += du
    return (math.factorial(K)/math.factorial(K-M))*res
```
But that didn't work either!

andrewkirk
Homework Helper
Gold Member
It looks to me like the multiplication by factorials, in the last line of function G, will happen at every level of recursion, whereas it should happen only once. To fix that, take that multiplication out of the G function and put it in the last line of the main code, which becomes
Code:
```python
print((math.factorial(K) / math.factorial(K-M)) * G(2, 1, 0))
```
If you still have accuracy problems, you could consider Simpson's rule rather than the rectangle-rule integration that is currently used.

If it runs for too long, there are other techniques that could be tried along the lines of what we've looked at earlier.

> It looks to me like the multiplication by factorials, in the last line of function G, will happen at every level of recursion, whereas it should happen only once. To fix that, take that multiplication out of the G function and put it in the last line of the main code, which becomes `print((math.factorial(K) / math.factorial(K-M)) * G(2, 1, 0))`. If you still have accuracy problems, you could consider Simpson's rule rather than the rectangle-rule integration that is currently used. If it runs for too long, there are other techniques that could be tried along the lines of what we've looked at earlier.
Thanks, you are right: the factorial fraction should multiply the final result once, at the end. Doing this started to give results between 0 and 1, which is progress. But relative to the Monte Carlo simulation, which is computed as follows

Python:
```python
import numpy as np

count = 0
N = 100000
for n in range(0, N):
    X = np.random.exponential(1, (1, K))
    Y = np.random.exponential(1, (1, K))
    Z = X/Y
    Z = np.sort(Z[0])
    if np.sum(Z[:M]) <= 2:
        count += 1

print(count/N)
```

the results are not accurate. Is calling the function as G(2, 1, 0) correct, or do we need to call G(2, 0, 0)? Also, I am not sure about this segment

Python:
```python
while u < max(prev_lim, y):
    res += G(y-u, m+1, u)*f(u)*du
```
Is the condition of the while loop correct? Because I use the max in the while-loop condition, I don't have to use the max in the subsequent call of G, right?

andrewkirk
Homework Helper
Gold Member
Re your two questions, your code looks to me as though it implements the formula in the OP reliably.

Re the comparison to simulation results, what are you getting as the results from the numeric integration and the simulation? Note that the comparison assumes that the formula in the OP is correct (see last para below).

Since either of those could have an error, why not calculate an analytic value to compare it to? As I recall, when we tried to calculate the integral analytically, it worked for a two-deep nesting, then started to look too hard. Since your nesting above is only two-deep, why not dig out the analytic formula you got for a two-deep nest, calculate the value using that, and compare it to the numeric and simulation values to see which is more accurate?

Also, I am not sure the LaTeX formula in the OP is correct. Can you link to where you derived it? I vaguely recall that the inner integral should have a factor that is a CDF of the inner integration variable, raised to some power, and another factor that is the PDF of the inner integration variable. But the version in the OP only has the CDF factor.

> Re your two questions, your code looks to me as though it implements the formula in the OP reliably. Re the comparison to simulation results, what are you getting as the results from the numeric integration and the simulation? Note that the comparison assumes that the formula in the OP is correct. Since either of those could have an error, why not calculate an analytic value to compare it to? As I recall, when we tried to calculate the integral analytically, it worked for a two-deep nesting, then started to look too hard. Since your nesting above is only two-deep, why not dig out the analytic formula you got for a two-deep nest, calculate the value using that, and compare it to the numeric and simulation values to see which is more accurate? Also, I am not sure the LaTeX formula in the OP is correct. Can you link to where you derived it? I vaguely recall that the inner integral should have a factor that is a CDF of the inner integration variable, raised to some power, and another factor that is the PDF of the inner integration variable. But the version in the OP only has the CDF factor.
I did derive the analytic result for $M=2$. So I compared the analytic result with the numerical integration and the MC simulations. The analytic result and the MC coincide, while the numerical integration is way off. To be specific, for $K=4$, $M=2$, and $y=2$, the analytic and MC results are 0.8105581 and 0.8114, respectively, while the numerical integration gives 0.14021.

OK, we didn't actually talk about this case. Here I have $K$ random variables $X_1,\,X_2,\,\ldots,\,X_K$, where $X_k=A_k/B_k$, and $A_k$ and $B_k$ are exponential RVs with parameter 1 (all RVs are independent). Then I order the RVs as

$$X_{(1)}\leq X_{(2)}\leq\cdots\leq X_{(K)}$$

and form the new RV $Y$ as the summation of the $M$ smallest random variables as

$$Y=\sum_{m=1}^MX_{(m)}$$

I want to find the CDF of $Y$, which is

$$F_Y(y)=\text{Pr}\left[\sum_{m=1}^MX_{(m)}\leq y\right]= \int_{x_1=0}^{y}\int_{x_{2}=x_1}^{\max(x_1,\,y-x_1)}\cdots \int_{x_M=x_{M-1}}^{\max(x_{M-1},\,y-\sum_{m=1}^{M-1}x_m)}f_{(1),(2),\ldots,(M)}(x_1,\,x_2,\,\ldots,\,x_M)\,dx_M\,dx_{M-1}\cdots dx_1$$

where

$$f_{(1),(2),\ldots,(M)}(x_1,\,x_2,\,\ldots,\,x_M)=\frac{K!}{(K-M)!}\left[1-F_X(x_M)\right]^{K-M}\prod_{m=1}^Mf_X(x_m)$$

is the joint PDF of the order statistics $\{X_{(m)}\}_{m=1}^M$, and $F_X(x_k)=1-1/(1+x_k)$ and $f_X(x_k)=1/(1+x_k)^2$ are the CDF and PDF of the random variable $X_k$. All of this eventually yields the formula I posted in the original post. Is this correct?

PS: if I evaluate $1-F_Y(y)$, I get results closer to the analytic and MC values, but I am not sure if this is a coincidence, because my derivation is for $F_Y(y)$, not $1-F_Y(y)$!
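As a sanity check on the building block above, the CDF $F_X(x)=1-1/(1+x)$ of the ratio of two unit exponentials can be spot-checked by simulation. A minimal sketch using only the standard library (the check point $x=1$, where the CDF should equal 0.5, is my arbitrary choice):

```python
import random

random.seed(0)
N = 200_000
x = 1.0  # check point: F_X(1) = 1 - 1/(1+1) = 0.5

# X = A/B with A, B independent Exp(1) should have CDF x/(1+x)
hits = sum(
    random.expovariate(1.0) / random.expovariate(1.0) <= x
    for _ in range(N)
)
print(hits / N)  # should be close to 0.5
```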

> ```python
> while u < max(prev_lim, y):
>     res += G(y-u, m+1, u)*f(u)*du
> ```
> Is the condition of the while loop correct? Because I use the max in the while-loop condition, I don't have to use the max in the subsequent call of G, right?
Well, the use of the max function looks suspicious. It seems to do nothing in your code. Perhaps you meant something like
Python:
```python
while u < y-prev_lim:
```
but this will make the result even smaller, so there has got to be something else.
Maybe make du=0.5 or 0.25 and trace all evaluations? (I assume Python cannot be debugged. If you can, then it should be easy to find the mistake.)

> Well, the use of the max function looks suspicious. It seems to do nothing in your code. Perhaps you meant something like `while u < y-prev_lim:` but this will make the result even smaller, so there has got to be something else. Maybe make du=0.5 or 0.25 and trace all evaluations? (I assume Python cannot be debugged. If you can, then it should be easy to find the mistake.)
The max in the while loop implements the max in the upper limit of the inner integrals. I think it is correct as it is. If we remove it, the upper limit will be only y-prev_lim, which isn't always greater than prev_lim.

> The max in the while loop implements the max in the upper limit of the inner integrals. I think it is correct as it is. If we remove it, the upper limit will be only y-prev_lim, which isn't always greater than prev_lim.
Yes, but you start from prev_lim anyway and only go up, so the max function does not add any cycles to the while loop.

> Yes, but you start from prev_lim anyway and only go up, so the max function does not add any cycles to the while loop.
Oops, the while loop condition should be

Python:
```python
while u < max(prev_lim, y-prev_lim):
```
It is correct on my computer, but I mistyped it when editing here. Does it make sense now?

> Oops, the while loop condition should be `while u < max(prev_lim, y-prev_lim):` It is correct on my computer, but I mistyped it when editing here. Does it make sense now?
It's still the same: you should get the same result with the max, or with just u < y-prev_lim.

> It's still the same: you should get the same result with the max, or with just u < y-prev_lim.
Actually, I do, but why? Where does the max limit go, and how do I implement it correctly? y-prev_lim alone doesn't sound right, and indeed the result is still not consistent with the Monte Carlo simulations!

> Actually, I do, but why? Where does the max limit go, and how do I implement it correctly? y-prev_lim alone doesn't sound right, and indeed the result is still not consistent with the Monte Carlo simulations!
Maybe you need the max in the mathematics because $\int_1^0 x\,dx$ is valid, but the code will return 0 if the lower bound is higher than the upper bound.

Anyway, the code looks about right; there's probably some m+1 instead of m, or something similar. The best way to find it is by debugging. A worse way is to integrate the function f(x)=1 and, if that works, move toward the full function in small steps.
It might help if you rewrite the integral expression into the form you implemented, with the $\frac{1}{(1+x_m)^2}$ terms moved outside the integrals.
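In that spirit, integrating $f(x)=1$ over the $M=2$ region with $y=2$ should return the region's area, which was stated earlier in the thread to be 1. A minimal midpoint-rule sketch (the variable names are mine, not from the code above):

```python
# Midpoint-rule check: integrating the constant 1 over the M=2 region
# {0 <= x1, x1 <= x2 <= max(x1, y - x1)} with y = 2
# should give the region's area, which is 1.
y, du = 2.0, 0.01
area = 0.0
x1 = du / 2
while x1 < y:
    hi = max(x1, y - x1)
    x2 = x1 + du / 2
    while x2 < hi:
        area += du * du  # each grid point carries du*du, not du
        x2 += du
    x1 += du
print(area)  # close to 1
```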

Mark44
Mentor
> I assume Python cannot be debugged.
Not true. There's a debugger that comes with Python distributions. It's pretty primitive, but it's much better than no debugger at all. I wrote a couple of Insights articles on how to use it --

> Not true. There's a debugger that comes with Python distributions. It's pretty primitive
Uh huh. In that case it might actually be easier to use debugging prints (with a large du, of course).

I don't know how to use the debugger.

I think I know what the problem is. In my code, once $m=M$, the function $G$ returns $1/(1+x)^{K-M+2}$ only for the first value of the innermost integral, while it should integrate $1/(1+x)^{K-M+2}$ over all values from $x_{M-1}$ up to $\max(x_{M-1}, y-\sum_{m=1}^{M-1}x_m)$. Also, in the recursive call, the first argument should be the upper limit of the while loop, not $y-u$. So the code must be something like this

Python:
```python
import math
import numpy as np

K = 4
M = 2
du = 0.001
N = 100000

def F(u):
    return 1/(1+u)**(K-M+2)

# Define the PDF of U
def f(u):
    return 1/((1+u))**2

def G(h, m, prev_lim):
    if m == M:
        res = 0
        u = prev_lim
        while u <= max(prev_lim, h-prev_lim):
            res += F(u)*du
            u += du
    else:
        res = 0
        u = prev_lim
        while u <= max(prev_lim, h-prev_lim):
            res += G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*du
            u += du
    return res

print((math.factorial(K)/math.factorial(K-M))*G(2, 1, 0))

# Simulation
count = 0
for n in range(0, N):
    X = np.random.exponential(1, (1, K))
    Y = np.random.exponential(1, (1, K))
    Z = X/Y
    Z = np.sort(Z[0])
    if np.sum(Z[:M]) <= 2:
        count += 1

print(count/N)
```
In this case the numerical integration gives 0.819, while the simulations give 0.8121, which is very close.
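One way to cross-check the M=2 case independently of the recursion is to do the innermost integral in closed form and only the outer one numerically. This is a sketch of that idea, assuming the formula exactly as posted in the OP (the helper names `inner` and `outer` are mine); with $K=4$, $M=2$, $y=2$ it lands near the values quoted above:

```python
import math

K, M, y = 4, 2, 2.0
c = math.factorial(K) // math.factorial(K - M)  # K!/(K-M)! = 12

def inner(x1):
    # exact integral of (1+x2)**-(K-M+2) for x2 from x1 to max(x1, y-x1)
    p = K - M + 1
    hi = max(x1, y - x1)
    return ((1 + x1)**-p - (1 + hi)**-p) / p

def outer(x1):
    # remaining 1D integrand: f(x1) times the exact inner integral
    return (1 + x1)**-2 * inner(x1)

# composite Simpson's rule on the outer variable (n must be even)
n = 2000
h = y / n
s = outer(0.0) + outer(y)
for i in range(1, n):
    s += (4 if i % 2 else 2) * outer(i * h)
result = c * h / 3 * s
print(result)
```

Replacing the innermost numerical loop with its antiderivative like this also removes one factor of $1/du$ from the running time.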

Mark44
Mentor
> Uh huh. In that case it might actually be easier to use debugging prints (with a large du, of course).
It's not so primitive that extra print statements are necessary. You can set breakpoints, single-step through the code, query the values of variables, and perform lots of other operations.

Specifying $du=0.001$ as an absolute value, regardless of $y$, is not practical: if $y$ is very large, the numerical integration would take forever, because the number of points would be huge. Instead of specifying the step size $du$, I can specify the number of areas/points $N$, calculate $du=y/N$, and pass it as an argument to the recursive function. Does this make sense?

> Specifying $du=0.001$ as an absolute value, regardless of $y$, is not practical: if $y$ is very large, the numerical integration would take forever, because the number of points would be huge. Instead of specifying the step size $du$, I can specify the number of areas/points $N$, calculate $du=y/N$, and pass it as an argument to the recursive function. Does this make sense?
You may first want to let it run with small du to make sure that it works now, before optimizing.
You can play with du a bit, but it will always be a trade-off between speed and accuracy.
Then you can try Simpson's rule; that's reasonably simple and improves accuracy even with a bigger du, though it requires a constant du.
If you really want to spend the time on it, you can instead make du smaller where the function changes faster (where u is small).

Is it possible to use the analytical solution for the last 2 levels? After you verify that the algorithm works, of course.

The code is working, as I explained in post #19. I guess I can find an analytic solution for the last two levels. I will check Simpson's rule. I may have to go to C++ for faster running time (in the past I got a 30x speedup for the same code in C++ relative to Python), unless there is a way to vectorize my code in Python (or MATLAB) that would speed it up significantly; that would be easier for me, as I am not that proficient in C++ and it is not as user-friendly.

$$\int_a^bf(x)\,dx\simeq \frac{\Delta x}{3}\left[y_0+4\sum_{i=0}^{n/2-1}y_{2i+1}+2\sum_{j=1}^{n/2-1}y_{2j}+y_n\right]$$

where $y_i=f(a+i\Delta x)$, and $\Delta x = \frac{b-a}{n}$, where $n$ is even.

So, in my code, I just need to alternate between

Python:
```python
res += 4*G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*(du/3)
```
and

Python:
```python
res += 2*G(max(prev_lim, h-prev_lim), m+1, u)*f(u)*(du/3)
```
after the first evaluation, and that's it?

Apparently so
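One caveat on the alternation: in composite Simpson's rule the two endpoint samples carry weight 1, and only the interior points alternate 4, 2. A small sketch of the weight pattern (the helper name `simpson_weight` is mine), checked on $x^3$, which Simpson's rule integrates exactly:

```python
def simpson_weight(i, n):
    # composite Simpson weights for nodes 0..n (n even):
    # 1 at both endpoints, 4 at odd interior nodes, 2 at even interior nodes
    if i == 0 or i == n:
        return 1
    return 4 if i % 2 else 2

# check: Simpson's rule is exact for cubics; integrate x**3 on [0, 1] with n = 4
n = 4
h = 1 / n
approx = h / 3 * sum(simpson_weight(i, n) * (i * h)**3 for i in range(n + 1))
print(approx)  # 0.25 up to rounding, matching the exact value 1/4
```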