# Solving inverse binomial equation

1. Jun 18, 2015

### S_David

Hello all,

I have this equation

$$\sum_{k=\lceil \frac{n}{2}\rceil}^n{n\choose k}P^k\left(1-P\right)^{n-k}=\epsilon$$

and I want to find P as a function of epsilon and n. Can I do that? If so, then how?

Note: $$\epsilon<10^{-3}$$ if it helps for any possible approximation.

Thanks

2. Jun 18, 2015

### mathman

I doubt if there is any exact solution. However for small $\epsilon$, P will be small, so the first term is a good guess, and the second term an error estimate.

3. Jun 18, 2015

### WWGD

Isn't this the binomial expansion of $(P+(1-P))^n =1^n$ from [n/2] to n?

4. Jun 18, 2015

### micromass

Right, and recall that the binomial distribution is symmetric around $n/2$ (or similar value closeby), and the binomial theorem can be used to solve this.

5. Jun 19, 2015

### Fooality

I looked at this. You can use an equation of that same form I think to express the (1-p)^w part, so it becomes pretty clear that your going to end up with some big polynomial in terms of P that needs to be solved. So it does have a solution, at least in the complex plane. Finding it seems tedious, possibly a good one for a computer program.

6. Jun 19, 2015

### micromass

Not if the polynomial is constant, which is what WWGD said. Of course, you don't have the full sum, only half of it, so you need to adjust that a bit.

7. Jun 19, 2015

### S_David

I thought about this. Is there any computer program will solve the roots as a function of n (epsilon is a given but I need to optimize n later). I know MATLAB can solve the roots if the coefficients are determined.

Thanks

8. Jun 19, 2015

### Fooality

Yeah, this stuck in my craw because I ran into it before as well, so I actually just wrote up a program to solve it. However, right now its got either bugs, or big numeric problems. I can post what I have now if you want, it gives pretty good solutions for n =16 and epsilon like .5, but with smaller epsilon, there are problems. Do you want me to post it?

9. Jun 19, 2015

### S_David

Yes, if it is OK with you, and if possible the mathematics behind it to understand what is going on. Thanks

10. Jun 19, 2015

### Fooality

Okay, here's what I got. To run this, you will need Python3 (free) and Scipy for Python3 (free) both are available for Windows, Mac, ad Linux.
Code (Python):

from math import factorial
import numpy as np
from numpy.polynomial import Polynomial as Po
from random import random, randint

#This should run in Python 3 where big integers allowed, otherwise it should be swapped
#out with a smarter implentation, which I couldn't find in scipy for some reason.
def binomial(n, k): return factorial(n)/(factorial(k)*factorial(n-k))

#the original cumulative binomial function we are going to invert.
def cumulative_binomial(p, n): return sum([binomial(n, k)*p**k*(1-p)**(n-k) for k in range(int(n/2), n+1)])

# rephrasing of the same function, for mathematical convenience to see how the below works
def r_cumulative_binomial(p, n):
return sum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k) for l in range(n-k+1)]) for k in range(int(n/2), n+1)])

#this returns the cumulative binomial as an array with which we can create a numpy polynomial, summed from s to n
def polynomial_form(n, s):
degree = 0 #holds the highest exponent, to know how to size numpy array
ll = [] #holds the many terms, as tuple (coefficient, integer power)
for k in range(s, n+1):
for l in range(n-k+1):
ll.append((binomial(n, k)*binomial(n-k, l)*(-1)**l, l+k)) #tuple of current term
if l+k > degree: degree= l+k #if biggest seen so far, set degree

out = np.array([complex(i) for i in np.zeros(degree+1)]) #make the array that will become the polynomial
for i in ll: out[i[1]] += i[0] #set its terms
return out

#this solves it, given epsilon, s, and n
def inverse_cumulative_binomial(eps, s, n):
pp = polynomial_form(n, s) #get the polynomial term
pp[0] -= eps #subtract epsilon from it
pp = Po(pp) #covert to numpy polynomial
return pp.roots() #return all its roots

#this filters the solution into real positive numbers
def get_real_positive(l, err=.00001): return [i.real for i in l if abs(i.imag)<err and i.real > 0]

#test it with random p
test_p = randint(1,10)/1000
test_n = randint(4,16)*2
test_s = int(test_n/2)
sol = cumulative_binomial(test_p, test_n)
print ("cumulative_binomial for p=", test_p, " n=",str(test_n),": \n", sol, "\n for same r_cumubino:\n", r_cumulative_binomial(test_p, test_n))

ans = inverse_cumulative_binomial(sol, int(test_n/2), test_n)
print ("real positive solutions\n", get_real_positive(ans))
print ("full solution set\n", ans)

sample output:
Code (Text):

>>> ================================ RESTART ================================
>>>
cumulative_binomial for p= 2.3e-05  n= 26 :
5.240825291647823e-54
for same r_cumubino:
5.240825291647822e-54
real positive solutions
[2.2999999804432893e-05]
full solution set
[ -2.23307500e-05 -5.50391444e-06j  -2.23307499e-05 +5.50391451e-06j
-1.72153203e-05 -1.52510079e-05j  -1.72153202e-05 +1.52510077e-05j
-8.15610608e-06 +2.15045885e-05j  -8.15610563e-06 -2.15045885e-05j
2.77180728e-06 -2.28319337e-05j   2.77180756e-06 +2.28319344e-05j
1.30650356e-05 -1.89286845e-05j   1.30650365e-05 +1.89286836e-05j
2.03653320e-05 +1.06888085e-05j   2.03653335e-05 -1.06888091e-05j
2.29999998e-05 +8.61109091e-13j   7.01207665e-01 -2.67618123e-01j
7.01207680e-01 +2.67618132e-01j   8.68461870e-01 -3.13377128e-01j
8.68462405e-01 +3.13377188e-01j   1.00429401e+00 -3.05910115e-01j
1.00429647e+00 +3.05905794e-01j   1.11392292e+00 +2.61165944e-01j
1.11394193e+00 -2.61178577e-01j   1.19529348e+00 +1.89279568e-01j
1.19532199e+00 -1.89223865e-01j   1.24548058e+00 -9.91693272e-02j
1.24558850e+00 +9.91979540e-02j   1.26252050e+00 -6.74453283e-05j]
>>>

edit: This basically puts it in form of a polynomial, then solves the polynomial using numpy's (part of scipy) built in capabilities. Also note that I am calling your function cumulative binomial, which may be a misnomer. And I am solving for any integer s, not just N/2

Last edited: Jun 19, 2015
11. Jun 19, 2015

### Fooality

So the mathematics behind that are basically seen in the "rephrase" of your original problem:
Code (Python):

def r_cumulative_binomial(p, n):
returnsum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k)for l inrange(n-k+1)])for k inrange(int(n/2), n+1)])

(If you don't read Python, know that ** is the same as ^ or exponent in Python ...sorry I suck at LaTex)
Basically (1-p)^n also has the form of your original equation, But with -p replacing p, and 1 replacing (1-P). This breaks down to the term you see above, where p^(l+k) is the polynomial term, and all that junk in front of it is the coefficient . The l comes from a second, inner sigma/summation form l=0 to n-k, for the term (1-p)^n.
So you end up with a bunch of these terms, many have the same integer power, so they have to be added. My code does that to get the polynomial, which I pass to scipy. for a builtin solution. Of course you would be doing the same with matlab.

12. Jun 19, 2015

### S_David

Thanks. I will try to understand it, but from the first look it seems that you assigned n and epsilon some values, didn't you? If so, then you are just finding the roots of a polynomial. This is not the problem I am facing. I need to write P as a closed form function of n, and n is a variable. Thanks again

13. Jun 19, 2015

### Fooality

Yes, the problem I've run into is I needed to find P given epsilon and n and s, (which you have as n/2). The computer function def inverse_cumulative_binomial(eps, s, n) does this, for inputs epsilon, n, and s. The program identifies the correct polynomial and solves it. Sorry if there was misunderstanding an you were looking for something else.

14. Jun 19, 2015

### S_David

I can use it though, it seems that finding what I want is something not easy to do, if at all. I will start with fixed n and see what happens later. Thanks for your interactions.

15. Jun 22, 2015

### S_David

I couldn't understand your code Fooality, so I tried to write the equations in MATLAB as following:
Code (Text):

clear all;
clc;

n=26;
L=ceil(n/2);

PolynomilaCoeff=zeros(n+1,1);%This is the polynomial coefficient vector [p_0 .... p_n]

for kk=L:n
for ii=0:n-kk
PolynomilaCoeff(kk+ii+1)=PolynomilaCoeff(kk+ii+1)+(-1)^ii*nchoosek(n,kk)*nchoosek(n-kk,ii);%The terms with same exponent kk+ii are added together
end
end

PolynomilaCoeff(1)=-2.3e-5;%epsilon=2.3e-5

PolynomilaCoeff1 = fliplr(PolynomilaCoeff');%This flips the coeffieinct vector because the roots function works with the vector [p_n .... p_0]

r=roots(PolynomilaCoeff1);

However, I didn't get the same results as yours. I hope you understand MATLAB to let me know if I have a logical error or something else. Also, in your solution set, right before the full solution set it is written

Code (Text):
....
real positive solutions
[2.2999999804432893e-05]
....
even though it is not shown in the solution set. Where did this come from? I can see the solution (7,1) in the solution matrix is very close to real since the imaginary part exponent is -13, which makes it effectively real.

Thanks

16. Jun 22, 2015

### Fooality

Hey, Yeah. Basically what's happening with the numeric problems is that Python 3 has an arbitrary length integer type, but this gets cast to C for the Python scientific, so it gets converted to a C floating point type, and data is lost. Therefore it outputs solutions that are close, but not quite. To make up for this, I strip off the imaginary part when it is below a certain threshold. Its quick hack, but next time I run into it in practice I will write something using arbitrary precision math library, but that's too much work for right now, I just wanted to know the algorithm even though its in its messy form.

Because of those numeric problems in my code, MATLAB may actually be giving you more precise answers, so that may be the difference. If not:

So you basically have $$\sum\limits_{k=n/2}^n{n \choose k}p^{k} {\left(-p + 1\right)}^{-k + n}$$
Which has another form which is more polynomial like, which is something like this: $$\sum\limits_{k=n/2}^n \sum\limits_{l=0}^{n-k}{n \choose k}{n-k \choose l}{(-1)}^l p^{k+l}$$
Now with some mathematical messing around, its probably possible to re-arrange that into a true polynomial form, but for right now, l+k gives us the same power at many different points so our job is to add up the different coefficients for each time k+l sums to the same integer.

Now I see here: http://www.mathworks.com/help/matlab/ref/roots.html MATLAB wants its input as a column vector.
1) Determine what the biggest k+l will be. You can do this in a math way, or a lazy coder way like I did. I think if you look at the equation I think 2*n.
2) Create a column vector of that size, fill it with zeros.
Go through the loops necessary to get all the coefficients. Each sigma in the rearranged form will be a separate loop, so two nested loops, like my polynomial_form function. Each time you get them, add them to the (l+k)th slot in the column vector.
Subtract epsilon to the zero power slot in the column vector. In python its the 0th, but in MATLAB it appears to be the last.
4) Call the roots function from the link on the constructed column vector.
5) filter out results you want.

Last edited: Jun 22, 2015
17. Jun 22, 2015

### S_David

When I set n=5 and epsilon=2.3*10^-5 I got the following roots

Code (Text):
r =

1.2500 + 0.3227i
1.2500 - 0.3227i
0.0133 + 0.0000i
-0.0066 + 0.0114i
-0.0066 - 0.0114i
The verification segment is

Code (Text):
P_total=zeros(length(r),1);

for rr=1:length(r)
for kk=L:n
for ll=0:n-kk
P_total(rr)=P_total(rr)+(-1)^ll*nchoosek(n,kk)*nchoosek(n-kk,ll)*r(rr)^(kk+ll);
end
end
end
and the output is

Code (Text):
P_total =

1.0e-04 *

0.2300 - 0.0000i
0.2300 - 0.0000i
0.2300 - 0.0000i
0.2300 - 0.0000i
0.2300 - 0.0000i
So, I guess for these parameters P=0.0133 is the solution. However, when I set n=26 (as in you example) for the same epsilon, the solution set and their verifications are

Code (Text):
r =

1.2625 + 0.0000i
1.2455 + 0.0992i
1.2455 - 0.0992i
1.1953 + 0.1893i
1.1953 - 0.1893i
1.1139 + 0.2612i
1.1139 - 0.2612i
1.0043 + 0.3059i
1.0043 - 0.3059i
0.8685 + 0.3134i
0.8685 - 0.3134i
0.7012 + 0.2676i
0.7012 - 0.2676i
0.1468 + 0.0000i
0.1205 + 0.0759i
0.1205 - 0.0759i
0.0628 + 0.1182i
0.0628 - 0.1182i
0.0005 + 0.1261i
0.0005 - 0.1261i
-0.0532 + 0.1080i
-0.0532 - 0.1080i
-0.1121 + 0.0251i
-0.1121 - 0.0251i
-0.0919 + 0.0718i
-0.0919 - 0.0718i
Code (Text):
P_total =

0.0022 + 0.0000i
0.0001 + 0.0019i
0.0001 - 0.0019i
-0.0010 + 0.0001i
-0.0010 - 0.0001i
-0.0000 - 0.0003i
-0.0000 + 0.0003i
0.0001 - 0.0000i
0.0001 + 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
which isn't correct!! I increased the precision, and the same thing (with more digits) happen!! What could be the problem?

Thanks

18. Jun 22, 2015

### Fooality

Hey, I added some more detail to my response to you above before you responded, but it looks like you got it mostly.

I don't know, it could be MATLAB has worse numeric problems than Python. Does your code work for all small N?

What are you really looking for, do you need large N? I think I've read that Binomial distribution converges to the Poisson distribution, the cumulative is invertible in MATLAB:
http://www.mathworks.com/help/stats/poissinv.html

19. Jun 22, 2015

### S_David

Actually, the minimum exponent is L=ceil(n/2) and the maximum exponent is n. In my code, I created a zero vector of length n, and created two for loops, one for each sum, and then I added the coefficients of the same exponent, which happens when kk+ii points to the same vector element.

20. Jun 22, 2015

### S_David

I am looking for an arbitrary n. I tried for different n, and up to n=19, it gives accurate results. After that, the total probability begins to diverge from epsilon for some roots, and this gets worsen as n gets larger!!