# Solving inverse binomial equation

• EngWiPy
In summary: j 2.03653318e-05 +6.07274812e-06j 2.29999998e-05 +0.00000000e+00j 2.29999998e-05 +0.00000000e+00j 2.03653320e-05 +6.07274812e-06j 2.03653318e-05 -6.07274812e-06j 1.30650356e-05 +1.89286845e-05j 1.30650365e-05 -1.89286836e-05j 2.771
EngWiPy
Hello all,

I have this equation

$$\sum_{k=\lceil \frac{n}{2}\rceil}^n{n\choose k}P^k\left(1-P\right)^{n-k}=\epsilon$$

and I want to find P as a function of epsilon and n. Can I do that? If so, then how?

Note: $$\epsilon<10^{-3}$$ if it helps for any possible approximation.

Thanks

I doubt if there is any exact solution. However for small $\epsilon$, P will be small, so the first term is a good guess, and the second term an error estimate.

Isn't this the binomial expansion of ##(P+(1-P))^n =1^n## from [n/2] to n?

Right, and recall that the binomial distribution is symmetric around ##n/2## (or similar value closeby), and the binomial theorem can be used to solve this.

I looked at this. You can use an equation of that same form I think to express the (1-p)^w part, so it becomes pretty clear that your going to end up with some big polynomial in terms of P that needs to be solved. So it does have a solution, at least in the complex plane. Finding it seems tedious, possibly a good one for a computer program.

Fooality said:
So it does have a solution, at least in the complex plane.

Not if the polynomial is constant, which is what WWGD said. Of course, you don't have the full sum, only half of it, so you need to adjust that a bit.

Fooality said:
I looked at this. You can use an equation of that same form I think to express the (1-p)^w part, so it becomes pretty clear that your going to end up with some big polynomial in terms of P that needs to be solved. So it does have a solution, at least in the complex plane. Finding it seems tedious, possibly a good one for a computer program.

I thought about this. Is there any computer program will solve the roots as a function of n (epsilon is a given but I need to optimize n later). I know MATLAB can solve the roots if the coefficients are determined.

Thanks

S_David said:

I thought about this. Is there any computer program will solve the roots as a function of n (epsilon is a given but I need to optimize n later). I know MATLAB can solve the roots if the coefficients are determined.

Thanks

Yeah, this stuck in my craw because I ran into it before as well, so I actually just wrote up a program to solve it. However, right now its got either bugs, or big numeric problems. I can post what I have now if you want, it gives pretty good solutions for n =16 and epsilon like .5, but with smaller epsilon, there are problems. Do you want me to post it?

Fooality said:
Yeah, this stuck in my craw because I ran into it before as well, so I actually just wrote up a program to solve it. However, right now its got either bugs, or big numeric problems. I can post what I have now if you want, it gives pretty good solutions for n =16 and epsilon like .5, but with smaller epsilon, there are problems. Do you want me to post it?

Yes, if it is OK with you, and if possible the mathematics behind it to understand what is going on. Thanks

S_David said:
Yes, if it is OK with you, and if possible the mathematics behind it to understand what is going on. Thanks
Okay, here's what I got. To run this, you will need Python3 (free) and Scipy for Python3 (free) both are available for Windows, Mac, ad Linux.
Python:
from math import factorial
import numpy as np
from numpy.polynomial import Polynomial as Po
from random import random, randint

#This should run in Python 3 where big integers allowed, otherwise it should be swapped
#out with a smarter implentation, which I couldn't find in scipy for some reason.
def binomial(n, k): return factorial(n)/(factorial(k)*factorial(n-k))

#the original cumulative binomial function we are going to invert.
def cumulative_binomial(p, n): return sum([binomial(n, k)*p**k*(1-p)**(n-k) for k in range(int(n/2), n+1)])

# rephrasing of the same function, for mathematical convenience to see how the below works
def r_cumulative_binomial(p, n):
return sum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k) for l in range(n-k+1)]) for k in range(int(n/2), n+1)])

#this returns the cumulative binomial as an array with which we can create a numpy polynomial, summed from s to n
def polynomial_form(n, s):
degree = 0 #holds the highest exponent, to know how to size numpy array
ll = [] #holds the many terms, as tuple (coefficient, integer power)
for k in range(s, n+1):
for l in range(n-k+1):
ll.append((binomial(n, k)*binomial(n-k, l)*(-1)**l, l+k)) #tuple of current term
if l+k > degree: degree= l+k #if biggest seen so far, set degree

out = np.array([complex(i) for i in np.zeros(degree+1)]) #make the array that will become the polynomial
for i in ll: out[i[1]] += i[0] #set its terms
return out

#this solves it, given epsilon, s, and n
def inverse_cumulative_binomial(eps, s, n):
pp = polynomial_form(n, s) #get the polynomial term
pp[0] -= eps #subtract epsilon from it
pp = Po(pp) #covert to numpy polynomial
return pp.roots() #return all its roots

#this filters the solution into real positive numbers
def get_real_positive(l, err=.00001): return [i.real for i in l if abs(i.imag)<err and i.real > 0]

#test it with random p
test_p = randint(1,10)/1000
test_n = randint(4,16)*2
test_s = int(test_n/2)
sol = cumulative_binomial(test_p, test_n)
print ("cumulative_binomial for p=", test_p, " n=",str(test_n),": \n", sol, "\n for same r_cumubino:\n", r_cumulative_binomial(test_p, test_n))ans = inverse_cumulative_binomial(sol, int(test_n/2), test_n)
print ("real positive solutions\n", get_real_positive(ans))
print ("full solution set\n", ans)
sample output:
Code:
>>> ================================ RESTART ================================
>>>
cumulative_binomial for p= 2.3e-05  n= 26 :
5.240825291647823e-54
for same r_cumubino:
5.240825291647822e-54
real positive solutions
[2.2999999804432893e-05]
full solution set
[ -2.23307500e-05 -5.50391444e-06j  -2.23307499e-05 +5.50391451e-06j
-1.72153203e-05 -1.52510079e-05j  -1.72153202e-05 +1.52510077e-05j
-8.15610608e-06 +2.15045885e-05j  -8.15610563e-06 -2.15045885e-05j
2.77180728e-06 -2.28319337e-05j   2.77180756e-06 +2.28319344e-05j
1.30650356e-05 -1.89286845e-05j   1.30650365e-05 +1.89286836e-05j
2.03653320e-05 +1.06888085e-05j   2.03653335e-05 -1.06888091e-05j
2.29999998e-05 +8.61109091e-13j   7.01207665e-01 -2.67618123e-01j
7.01207680e-01 +2.67618132e-01j   8.68461870e-01 -3.13377128e-01j
8.68462405e-01 +3.13377188e-01j   1.00429401e+00 -3.05910115e-01j
1.00429647e+00 +3.05905794e-01j   1.11392292e+00 +2.61165944e-01j
1.11394193e+00 -2.61178577e-01j   1.19529348e+00 +1.89279568e-01j
1.19532199e+00 -1.89223865e-01j   1.24548058e+00 -9.91693272e-02j
1.24558850e+00 +9.91979540e-02j   1.26252050e+00 -6.74453283e-05j]
>>>

edit: This basically puts it in form of a polynomial, then solves the polynomial using numpy's (part of scipy) built in capabilities. Also note that I am calling your function cumulative binomial, which may be a misnomer. And I am solving for any integer s, not just N/2

Last edited:
S_David said:
Yes, if it is OK with you, and if possible the mathematics behind it to understand what is going on. Thanks

So the mathematics behind that are basically seen in the "rephrase" of your original problem:
Python:
def r_cumulative_binomial(p, n):
returnsum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k)for l inrange(n-k+1)])for k inrange(int(n/2), n+1)])
(If you don't read Python, know that ** is the same as ^ or exponent in Python ...sorry I suck at LaTex)
Basically (1-p)^n also has the form of your original equation, But with -p replacing p, and 1 replacing (1-P). This breaks down to the term you see above, where p^(l+k) is the polynomial term, and all that junk in front of it is the coefficient . The l comes from a second, inner sigma/summation form l=0 to n-k, for the term (1-p)^n.
So you end up with a bunch of these terms, many have the same integer power, so they have to be added. My code does that to get the polynomial, which I pass to scipy. for a builtin solution. Of course you would be doing the same with matlab.

Fooality said:
Okay, here's what I got. To run this, you will need Python3 (free) and Scipy for Python3 (free) both are available for Windows, Mac, ad Linux.
Python:
from math import factorial
import numpy as np
from numpy.polynomial import Polynomial as Po
from random import random, randint

#This should run in Python 3 where big integers allowed, otherwise it should be swapped
#out with a smarter implentation, which I couldn't find in scipy for some reason.
def binomial(n, k): return factorial(n)/(factorial(k)*factorial(n-k))

#the original cumulative binomial function we are going to invert.
def cumulative_binomial(p, n): return sum([binomial(n, k)*p**k*(1-p)**(n-k) for k in range(int(n/2), n+1)])

# rephrasing of the same function, for mathematical convenience to see how the below works
def r_cumulative_binomial(p, n):
return sum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k) for l in range(n-k+1)]) for k in range(int(n/2), n+1)])

#this returns the cumulative binomial as an array with which we can create a numpy polynomial, summed from s to n
def polynomial_form(n, s):
degree = 0 #holds the highest exponent, to know how to size numpy array
ll = [] #holds the many terms, as tuple (coefficient, integer power)
for k in range(s, n+1):
for l in range(n-k+1):
ll.append((binomial(n, k)*binomial(n-k, l)*(-1)**l, l+k)) #tuple of current term
if l+k > degree: degree= l+k #if biggest seen so far, set degree

out = np.array([complex(i) for i in np.zeros(degree+1)]) #make the array that will become the polynomial
for i in ll: out[i[1]] += i[0] #set its terms
return out

#this solves it, given epsilon, s, and n
def inverse_cumulative_binomial(eps, s, n):
pp = polynomial_form(n, s) #get the polynomial term
pp[0] -= eps #subtract epsilon from it
pp = Po(pp) #covert to numpy polynomial
return pp.roots() #return all its roots

#this filters the solution into real positive numbers
def get_real_positive(l, err=.00001): return [i.real for i in l if abs(i.imag)<err and i.real > 0]

#test it with random p
test_p = randint(1,10)/1000
test_n = randint(4,16)*2
test_s = int(test_n/2)
sol = cumulative_binomial(test_p, test_n)
print ("cumulative_binomial for p=", test_p, " n=",str(test_n),": \n", sol, "\n for same r_cumubino:\n", r_cumulative_binomial(test_p, test_n))ans = inverse_cumulative_binomial(sol, int(test_n/2), test_n)
print ("real positive solutions\n", get_real_positive(ans))
print ("full solution set\n", ans)
sample output:
Code:
>>> ================================ RESTART ================================
>>>
cumulative_binomial for p= 2.3e-05  n= 26 :
5.240825291647823e-54
for same r_cumubino:
5.240825291647822e-54
real positive solutions
[2.2999999804432893e-05]
full solution set
[ -2.23307500e-05 -5.50391444e-06j  -2.23307499e-05 +5.50391451e-06j
-1.72153203e-05 -1.52510079e-05j  -1.72153202e-05 +1.52510077e-05j
-8.15610608e-06 +2.15045885e-05j  -8.15610563e-06 -2.15045885e-05j
2.77180728e-06 -2.28319337e-05j   2.77180756e-06 +2.28319344e-05j
1.30650356e-05 -1.89286845e-05j   1.30650365e-05 +1.89286836e-05j
2.03653320e-05 +1.06888085e-05j   2.03653335e-05 -1.06888091e-05j
2.29999998e-05 +8.61109091e-13j   7.01207665e-01 -2.67618123e-01j
7.01207680e-01 +2.67618132e-01j   8.68461870e-01 -3.13377128e-01j
8.68462405e-01 +3.13377188e-01j   1.00429401e+00 -3.05910115e-01j
1.00429647e+00 +3.05905794e-01j   1.11392292e+00 +2.61165944e-01j
1.11394193e+00 -2.61178577e-01j   1.19529348e+00 +1.89279568e-01j
1.19532199e+00 -1.89223865e-01j   1.24548058e+00 -9.91693272e-02j
1.24558850e+00 +9.91979540e-02j   1.26252050e+00 -6.74453283e-05j]
>>>

edit: This basically puts it in form of a polynomial, then solves the polynomial using numpy's (part of scipy) built in capabilities. Also note that I am calling your function cumulative binomial, which may be a misnomer. And I am solving for any integer s, not just N/2

Thanks. I will try to understand it, but from the first look it seems that you assigned n and epsilon some values, didn't you? If so, then you are just finding the roots of a polynomial. This is not the problem I am facing. I need to write P as a closed form function of n, and n is a variable. Thanks again

Yes, the problem I've run into is I needed to find P given epsilon and n and s, (which you have as n/2). The computer function def inverse_cumulative_binomial(eps, s, n) does this, for inputs epsilon, n, and s. The program identifies the correct polynomial and solves it. Sorry if there was misunderstanding an you were looking for something else.

Fooality said:
Yes, the problem I've run into is I needed to find P given epsilon and n and s, (which you have as n/2). The computer function def inverse_cumulative_binomial(eps, s, n) does this, for inputs epsilon, n, and s. The program identifies the correct polynomial and solves it. Sorry if there was misunderstanding an you were looking for something else.

I can use it though, it seems that finding what I want is something not easy to do, if at all. I will start with fixed n and see what happens later. Thanks for your interactions.

I couldn't understand your code Fooality, so I tried to write the equations in MATLAB as following:
Code:
clear all;
clc;

n=26;
L=ceil(n/2);

PolynomilaCoeff=zeros(n+1,1);%This is the polynomial coefficient vector [p_0 ... p_n]

for kk=L:n
for ii=0:n-kk
PolynomilaCoeff(kk+ii+1)=PolynomilaCoeff(kk+ii+1)+(-1)^ii*nchoosek(n,kk)*nchoosek(n-kk,ii);%The terms with same exponent kk+ii are added together
end
end

PolynomilaCoeff(1)=-2.3e-5;%epsilon=2.3e-5

PolynomilaCoeff1 = fliplr(PolynomilaCoeff');%This flips the coeffieinct vector because the roots function works with the vector [p_n ... p_0]

r=roots(PolynomilaCoeff1);

However, I didn't get the same results as yours. I hope you understand MATLAB to let me know if I have a logical error or something else. Also, in your solution set, right before the full solution set it is written

Code:
...
real positive solutions
[2.2999999804432893e-05]
...

even though it is not shown in the solution set. Where did this come from? I can see the solution (7,1) in the solution matrix is very close to real since the imaginary part exponent is -13, which makes it effectively real.

Thanks

Hey, Yeah. Basically what's happening with the numeric problems is that Python 3 has an arbitrary length integer type, but this gets cast to C for the Python scientific, so it gets converted to a C floating point type, and data is lost. Therefore it outputs solutions that are close, but not quite. To make up for this, I strip off the imaginary part when it is below a certain threshold. Its quick hack, but next time I run into it in practice I will write something using arbitrary precision math library, but that's too much work for right now, I just wanted to know the algorithm even though its in its messy form.

Because of those numeric problems in my code, MATLAB may actually be giving you more precise answers, so that may be the difference. If not:

So you basically have $$\sum\limits_{k=n/2}^n{n \choose k}p^{k} {\left(-p + 1\right)}^{-k + n}$$
Which has another form which is more polynomial like, which is something like this: $$\sum\limits_{k=n/2}^n \sum\limits_{l=0}^{n-k}{n \choose k}{n-k \choose l}{(-1)}^l p^{k+l}$$
Now with some mathematical messing around, its probably possible to re-arrange that into a true polynomial form, but for right now, l+k gives us the same power at many different points so our job is to add up the different coefficients for each time k+l sums to the same integer.

Now I see here: http://www.mathworks.com/help/matlab/ref/roots.html MATLAB wants its input as a column vector.
1) Determine what the biggest k+l will be. You can do this in a math way, or a lazy coder way like I did. I think if you look at the equation I think 2*n.
2) Create a column vector of that size, fill it with zeros.
Go through the loops necessary to get all the coefficients. Each sigma in the rearranged form will be a separate loop, so two nested loops, like my polynomial_form function. Each time you get them, add them to the (l+k)th slot in the column vector.
Subtract epsilon to the zero power slot in the column vector. In python its the 0th, but in MATLAB it appears to be the last.
4) Call the roots function from the link on the constructed column vector.
5) filter out results you want.

Last edited:
Fooality said:
Hey, Yeah. Basically what's happening with the numeric problems is that Python 3 has an arbitrary length integer type, but this gets cast to C for the Python scientific, so it gets converted to a C floating point type, and data is lost. Therefore it outputs solutions that are close, but not quite. To make up for this, I strip off the imaginary part when it is below a certain threshold. Its quick hack, but next time I run into it in practice I will write something using arbitrary precision math library, but that's too much work for right now, I just wanted to know the algorithm to satisfy myself.

Because of those numeric problems in my code, MATLAB may actually be giving you more precise answers. The question is, when you compute the answers and plug them back into the original outputs, do you get the right answers? If so, you've got it, if not I can write you what I do really clearly in human language pseudocode if it would help you.

When I set n=5 and epsilon=2.3*10^-5 I got the following roots

Code:
r =

1.2500 + 0.3227i
1.2500 - 0.3227i
0.0133 + 0.0000i
-0.0066 + 0.0114i
-0.0066 - 0.0114i

The verification segment is

Code:
P_total=zeros(length(r),1);

for rr=1:length(r)
for kk=L:n
for ll=0:n-kk
P_total(rr)=P_total(rr)+(-1)^ll*nchoosek(n,kk)*nchoosek(n-kk,ll)*r(rr)^(kk+ll);
end
end
end

and the output is

Code:
P_total =

1.0e-04 *

0.2300 - 0.0000i
0.2300 - 0.0000i
0.2300 - 0.0000i
0.2300 - 0.0000i
0.2300 - 0.0000i

So, I guess for these parameters P=0.0133 is the solution. However, when I set n=26 (as in you example) for the same epsilon, the solution set and their verifications are

Code:
r =

1.2625 + 0.0000i
1.2455 + 0.0992i
1.2455 - 0.0992i
1.1953 + 0.1893i
1.1953 - 0.1893i
1.1139 + 0.2612i
1.1139 - 0.2612i
1.0043 + 0.3059i
1.0043 - 0.3059i
0.8685 + 0.3134i
0.8685 - 0.3134i
0.7012 + 0.2676i
0.7012 - 0.2676i
0.1468 + 0.0000i
0.1205 + 0.0759i
0.1205 - 0.0759i
0.0628 + 0.1182i
0.0628 - 0.1182i
0.0005 + 0.1261i
0.0005 - 0.1261i
-0.0532 + 0.1080i
-0.0532 - 0.1080i
-0.1121 + 0.0251i
-0.1121 - 0.0251i
-0.0919 + 0.0718i
-0.0919 - 0.0718i

Code:
P_total =

0.0022 + 0.0000i
0.0001 + 0.0019i
0.0001 - 0.0019i
-0.0010 + 0.0001i
-0.0010 - 0.0001i
-0.0000 - 0.0003i
-0.0000 + 0.0003i
0.0001 - 0.0000i
0.0001 + 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i

which isn't correct! I increased the precision, and the same thing (with more digits) happen! What could be the problem?

Thanks

Hey, I added some more detail to my response to you above before you responded, but it looks like you got it mostly.

I don't know, it could be MATLAB has worse numeric problems than Python. Does your code work for all small N?

What are you really looking for, do you need large N? I think I've read that Binomial distribution converges to the Poisson distribution, the cumulative is invertible in MATLAB:
http://www.mathworks.com/help/stats/poissinv.html

Actually, the minimum exponent is L=ceil(n/2) and the maximum exponent is n. In my code, I created a zero vector of length n, and created two for loops, one for each sum, and then I added the coefficients of the same exponent, which happens when kk+ii points to the same vector element.

I am looking for an arbitrary n. I tried for different n, and up to n=19, it gives accurate results. After that, the total probability begins to diverge from epsilon for some roots, and this gets worsen as n gets larger!

Yeah, that's the numeric issues. I would look into that Poisson approximation (I think) if you need it far larger N, if nothing else because MATLAB has the cumulative inverse ready to go...

Fooality said:
Yeah, that's the numeric issues. I would look into that Poisson approximation (I think) if you need it far larger N, if nothing else because MATLAB has the cumulative inverse ready to go...

OK, I will consider this. Thanks

S_David said:
OK, I will consider this. Thanks
If you crack it, and find a total solution in some reduced form which is easy to compute, please post it here I'd love to know.

Fooality said:
If you crack it, and find a total solution in some reduced form which is easy to compute, please post it here I'd love to know.

I read about the approximation, and it approaches Gaussian distribution with mean nP and variance nP(1-P) based on the Central Limit Theorem. This is for one sample, i.e,

$${n\choose x}P^x(1-P)^{n-x}\sim\frac{1}{\sqrt{2\pi nP(1-P)}}e^{-\frac{(x-nP)^2}{2nP(1-P)}}$$,

but I am not sure what would be the approximation for

$$\sum_{x=L}^n{n\choose x}P^x(1-P)^{n-x}$$

for arbitrary integer L! I suspect it would be also Gaussian because it is the sum of Gaussian random variables.

In MATLAB, inverse normal distribution can be found for the C.D.F, but there is no such function for the P.D.F. Can we infer the latter from the former?

Your talking to a computer science guy... I'm math literate, but not a mathematician. I do remember hearing that it was approximated by the Poisson distribution though. Also, I think this has been tackled before, in a paper called "Efficient evaluation of the inverse Binomial cumulative" which I can't connect to for some reason. But here is a post that says the Poisson distribution is good for small p and large n:
http://stackoverflow.com/questions/...the-binomial-cumulative-distribution-function

EngWiPy
Having a beer and relaxing and reflecting on this...

The very existence of the paper I mentioned, and the fact that it had to be written by post grads suggests two things 1) I will not be able to come up with an efficient hack to solve this 2) You won't come up with a closed form, and are on the right track looking into approximations.

What gives me cause to write you is a memory I have, some years ago, of having to solve this problem for a large N, and finding an approximation I was quite impressed with. I can't tell you the math, what I found was the result of digging around in open source stats libraries, stuff that's probably in MATLAB if you know where to look. But if you look, you will find it, of that I am sure. All these core stats questions have been answered in approximation by calculus gurus, and coded by numerical analysts some time ago. What's inside probably isn't pretty, but the results are quite good. Be aware for google that its sometimes called the cumulative distribution function, though that phrase seems to apply to many things:
https://en.wikipedia.org/wiki/Binomial_distribution#Cumulative_distribution_function

EngWiPy

## 1. What is an inverse binomial equation?

An inverse binomial equation is a type of mathematical equation where the independent variable appears as an exponent in the equation, and the goal is to solve for the value of the independent variable.

## 2. How do you solve an inverse binomial equation?

To solve an inverse binomial equation, you need to isolate the variable with the exponent on one side of the equation. Then, take the logarithm of both sides and use algebraic manipulation to solve for the variable. The resulting solution will be the value of the independent variable.

## 3. What are some common applications of inverse binomial equations?

Inverse binomial equations are commonly used in fields such as physics, chemistry, and economics to model exponential growth or decay phenomena. They can also be used to solve problems related to compound interest and radioactive decay.

## 4. Are there any special techniques for solving inverse binomial equations?

Yes, there are a few techniques that can be helpful when solving inverse binomial equations. These include using the laws of logarithms, factoring, and substitution. It is also important to pay attention to any restrictions on the domain of the equation.

## 5. What are some tips for solving inverse binomial equations?

Some tips for solving inverse binomial equations include reading the problem carefully and identifying the independent variable, using the correct logarithm (natural or common) depending on the problem, and checking your solution to ensure it satisfies all given conditions in the problem.

Replies
4
Views
1K
Replies
11
Views
3K
Replies
9
Views
597
Replies
2
Views
812
Replies
3
Views
976
Replies
1
Views
1K
Replies
1
Views
895
Replies
2
Views
2K
Replies
4
Views
1K
Replies
14
Views
2K