Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Solving inverse binomial equation

  1. Jun 18, 2015 #1
    Hello all,

    I have this equation

    [tex]\sum_{k=\lceil \frac{n}{2}\rceil}^n{n\choose k}P^k\left(1-P\right)^{n-k}=\epsilon[/tex]

    and I want to find P as a function of epsilon and n. Can I do that? If so, then how?

    Note: [tex]\epsilon<10^{-3}[/tex] if it helps for any possible approximation.

    Thanks
     
  2. jcsd
  3. Jun 18, 2015 #2

    mathman

    User Avatar
    Science Advisor
    Gold Member

    I doubt if there is any exact solution. However for small [itex]\epsilon[/itex], P will be small, so the first term is a good guess, and the second term an error estimate.
     
  4. Jun 18, 2015 #3

    WWGD

    User Avatar
    Science Advisor
    Gold Member

    Isn't this the binomial expansion of ##(P+(1-P))^n =1^n## from [n/2] to n?
     
  5. Jun 18, 2015 #4

    micromass

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    Right, and recall that the binomial distribution is symmetric around ##n/2## (or similar value closeby), and the binomial theorem can be used to solve this.
     
  6. Jun 19, 2015 #5
    I looked at this. You can use an equation of that same form I think to express the (1-p)^w part, so it becomes pretty clear that your going to end up with some big polynomial in terms of P that needs to be solved. So it does have a solution, at least in the complex plane. Finding it seems tedious, possibly a good one for a computer program.
     
  7. Jun 19, 2015 #6

    micromass

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    Not if the polynomial is constant, which is what WWGD said. Of course, you don't have the full sum, only half of it, so you need to adjust that a bit.
     
  8. Jun 19, 2015 #7
    Thanks all for replying.

    I thought about this. Is there any computer program will solve the roots as a function of n (epsilon is a given but I need to optimize n later). I know MATLAB can solve the roots if the coefficients are determined.

    Thanks
     
  9. Jun 19, 2015 #8
    Yeah, this stuck in my craw because I ran into it before as well, so I actually just wrote up a program to solve it. However, right now its got either bugs, or big numeric problems. I can post what I have now if you want, it gives pretty good solutions for n =16 and epsilon like .5, but with smaller epsilon, there are problems. Do you want me to post it?
     
  10. Jun 19, 2015 #9
    Yes, if it is OK with you, and if possible the mathematics behind it to understand what is going on. Thanks
     
  11. Jun 19, 2015 #10
    Okay, here's what I got. To run this, you will need Python3 (free) and Scipy for Python3 (free) both are available for Windows, Mac, ad Linux.
    Code (Python):

    from math import factorial
    import numpy as np
    from numpy.polynomial import Polynomial as Po
    from random import random, randint

    #This should run in Python 3 where big integers allowed, otherwise it should be swapped
    #out with a smarter implentation, which I couldn't find in scipy for some reason.
    def binomial(n, k): return factorial(n)/(factorial(k)*factorial(n-k))

    #the original cumulative binomial function we are going to invert.
    def cumulative_binomial(p, n): return sum([binomial(n, k)*p**k*(1-p)**(n-k) for k in range(int(n/2), n+1)])

    # rephrasing of the same function, for mathematical convenience to see how the below works
    def r_cumulative_binomial(p, n):
        return sum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k) for l in range(n-k+1)]) for k in range(int(n/2), n+1)])

    #this returns the cumulative binomial as an array with which we can create a numpy polynomial, summed from s to n
    def polynomial_form(n, s):
        degree = 0 #holds the highest exponent, to know how to size numpy array
        ll = [] #holds the many terms, as tuple (coefficient, integer power)
        for k in range(s, n+1):
            for l in range(n-k+1):
                ll.append((binomial(n, k)*binomial(n-k, l)*(-1)**l, l+k)) #tuple of current term
                if l+k > degree: degree= l+k #if biggest seen so far, set degree
         
        out = np.array([complex(i) for i in np.zeros(degree+1)]) #make the array that will become the polynomial
        for i in ll: out[i[1]] += i[0] #set its terms
        return out

    #this solves it, given epsilon, s, and n
    def inverse_cumulative_binomial(eps, s, n):
        pp = polynomial_form(n, s) #get the polynomial term
        pp[0] -= eps #subtract epsilon from it
        pp = Po(pp) #covert to numpy polynomial
        return pp.roots() #return all its roots

    #this filters the solution into real positive numbers
    def get_real_positive(l, err=.00001): return [i.real for i in l if abs(i.imag)<err and i.real > 0]

    #test it with random p
    test_p = randint(1,10)/1000
    test_n = randint(4,16)*2
    test_s = int(test_n/2)
    sol = cumulative_binomial(test_p, test_n)
    print ("cumulative_binomial for p=", test_p, " n=",str(test_n),": \n", sol, "\n for same r_cumubino:\n", r_cumulative_binomial(test_p, test_n))


    ans = inverse_cumulative_binomial(sol, int(test_n/2), test_n)
    print ("real positive solutions\n", get_real_positive(ans))
    print ("full solution set\n", ans)

     
    sample output:
    (edit, added a smaller epsilon)
    Code (Text):

    >>> ================================ RESTART ================================
    >>>
    cumulative_binomial for p= 2.3e-05  n= 26 :
     5.240825291647823e-54
     for same r_cumubino:
     5.240825291647822e-54
    real positive solutions
     [2.2999999804432893e-05]
    full solution set
     [ -2.23307500e-05 -5.50391444e-06j  -2.23307499e-05 +5.50391451e-06j
      -1.72153203e-05 -1.52510079e-05j  -1.72153202e-05 +1.52510077e-05j
      -8.15610608e-06 +2.15045885e-05j  -8.15610563e-06 -2.15045885e-05j
       2.77180728e-06 -2.28319337e-05j   2.77180756e-06 +2.28319344e-05j
       1.30650356e-05 -1.89286845e-05j   1.30650365e-05 +1.89286836e-05j
       2.03653320e-05 +1.06888085e-05j   2.03653335e-05 -1.06888091e-05j
       2.29999998e-05 +8.61109091e-13j   7.01207665e-01 -2.67618123e-01j
       7.01207680e-01 +2.67618132e-01j   8.68461870e-01 -3.13377128e-01j
       8.68462405e-01 +3.13377188e-01j   1.00429401e+00 -3.05910115e-01j
       1.00429647e+00 +3.05905794e-01j   1.11392292e+00 +2.61165944e-01j
       1.11394193e+00 -2.61178577e-01j   1.19529348e+00 +1.89279568e-01j
       1.19532199e+00 -1.89223865e-01j   1.24548058e+00 -9.91693272e-02j
       1.24558850e+00 +9.91979540e-02j   1.26252050e+00 -6.74453283e-05j]
    >>>
     
    edit: This basically puts it in form of a polynomial, then solves the polynomial using numpy's (part of scipy) built in capabilities. Also note that I am calling your function cumulative binomial, which may be a misnomer. And I am solving for any integer s, not just N/2
     
    Last edited: Jun 19, 2015
  12. Jun 19, 2015 #11
    So the mathematics behind that are basically seen in the "rephrase" of your original problem:
    Code (Python):

    def r_cumulative_binomial(p, n):
       returnsum([sum([binomial(n, k)*binomial(n-k, l)*(-1)**l*p**(l+k)for l inrange(n-k+1)])for k inrange(int(n/2), n+1)])
     
    (If you don't read Python, know that ** is the same as ^ or exponent in Python ...sorry I suck at LaTex)
    Basically (1-p)^n also has the form of your original equation, But with -p replacing p, and 1 replacing (1-P). This breaks down to the term you see above, where p^(l+k) is the polynomial term, and all that junk in front of it is the coefficient . The l comes from a second, inner sigma/summation form l=0 to n-k, for the term (1-p)^n.
    So you end up with a bunch of these terms, many have the same integer power, so they have to be added. My code does that to get the polynomial, which I pass to scipy. for a builtin solution. Of course you would be doing the same with matlab.
     
  13. Jun 19, 2015 #12
    Thanks. I will try to understand it, but from the first look it seems that you assigned n and epsilon some values, didn't you? If so, then you are just finding the roots of a polynomial. This is not the problem I am facing. I need to write P as a closed form function of n, and n is a variable. Thanks again
     
  14. Jun 19, 2015 #13
    Yes, the problem I've run into is I needed to find P given epsilon and n and s, (which you have as n/2). The computer function def inverse_cumulative_binomial(eps, s, n) does this, for inputs epsilon, n, and s. The program identifies the correct polynomial and solves it. Sorry if there was misunderstanding an you were looking for something else.
     
  15. Jun 19, 2015 #14
    I can use it though, it seems that finding what I want is something not easy to do, if at all. I will start with fixed n and see what happens later. Thanks for your interactions.
     
  16. Jun 22, 2015 #15
    I couldn't understand your code Fooality, so I tried to write the equations in MATLAB as following:
    Code (Text):

    clear all;
    clc;

    n=26;
    L=ceil(n/2);

    PolynomilaCoeff=zeros(n+1,1);%This is the polynomial coefficient vector [p_0 .... p_n]

    for kk=L:n
        for ii=0:n-kk
            PolynomilaCoeff(kk+ii+1)=PolynomilaCoeff(kk+ii+1)+(-1)^ii*nchoosek(n,kk)*nchoosek(n-kk,ii);%The terms with same exponent kk+ii are added together
        end
    end

    PolynomilaCoeff(1)=-2.3e-5;%epsilon=2.3e-5

    PolynomilaCoeff1 = fliplr(PolynomilaCoeff');%This flips the coeffieinct vector because the roots function works with the vector [p_n .... p_0]

    r=roots(PolynomilaCoeff1);
     
    However, I didn't get the same results as yours. I hope you understand MATLAB to let me know if I have a logical error or something else. Also, in your solution set, right before the full solution set it is written

    Code (Text):
    ....
    real positive solutions
    [2.2999999804432893e-05]
    ....
    even though it is not shown in the solution set. Where did this come from? I can see the solution (7,1) in the solution matrix is very close to real since the imaginary part exponent is -13, which makes it effectively real.

    Thanks
     
  17. Jun 22, 2015 #16
    Hey, Yeah. Basically what's happening with the numeric problems is that Python 3 has an arbitrary length integer type, but this gets cast to C for the Python scientific, so it gets converted to a C floating point type, and data is lost. Therefore it outputs solutions that are close, but not quite. To make up for this, I strip off the imaginary part when it is below a certain threshold. Its quick hack, but next time I run into it in practice I will write something using arbitrary precision math library, but that's too much work for right now, I just wanted to know the algorithm even though its in its messy form.

    Because of those numeric problems in my code, MATLAB may actually be giving you more precise answers, so that may be the difference. If not:

    So you basically have [tex] \sum\limits_{k=n/2}^n{n \choose k}p^{k} {\left(-p + 1\right)}^{-k + n}[/tex]
    Which has another form which is more polynomial like, which is something like this: [tex]
    \sum\limits_{k=n/2}^n \sum\limits_{l=0}^{n-k}{n \choose k}{n-k \choose l}{(-1)}^l p^{k+l}[/tex]
    Now with some mathematical messing around, its probably possible to re-arrange that into a true polynomial form, but for right now, l+k gives us the same power at many different points so our job is to add up the different coefficients for each time k+l sums to the same integer.

    Now I see here: http://www.mathworks.com/help/matlab/ref/roots.html MATLAB wants its input as a column vector.
    1) Determine what the biggest k+l will be. You can do this in a math way, or a lazy coder way like I did. I think if you look at the equation I think 2*n.
    2) Create a column vector of that size, fill it with zeros.
    Go through the loops necessary to get all the coefficients. Each sigma in the rearranged form will be a separate loop, so two nested loops, like my polynomial_form function. Each time you get them, add them to the (l+k)th slot in the column vector.
    Subtract epsilon to the zero power slot in the column vector. In python its the 0th, but in MATLAB it appears to be the last.
    4) Call the roots function from the link on the constructed column vector.
    5) filter out results you want.
     
    Last edited: Jun 22, 2015
  18. Jun 22, 2015 #17
    When I set n=5 and epsilon=2.3*10^-5 I got the following roots

    Code (Text):
    r =

       1.2500 + 0.3227i
       1.2500 - 0.3227i
       0.0133 + 0.0000i
      -0.0066 + 0.0114i
      -0.0066 - 0.0114i
    The verification segment is

    Code (Text):
    P_total=zeros(length(r),1);

    for rr=1:length(r)
        for kk=L:n
            for ll=0:n-kk
                P_total(rr)=P_total(rr)+(-1)^ll*nchoosek(n,kk)*nchoosek(n-kk,ll)*r(rr)^(kk+ll);
            end
        end
    end
    and the output is

    Code (Text):
    P_total =

       1.0e-04 *

       0.2300 - 0.0000i
       0.2300 - 0.0000i
       0.2300 - 0.0000i
       0.2300 - 0.0000i
       0.2300 - 0.0000i
    So, I guess for these parameters P=0.0133 is the solution. However, when I set n=26 (as in you example) for the same epsilon, the solution set and their verifications are

    Code (Text):
    r =

       1.2625 + 0.0000i
       1.2455 + 0.0992i
       1.2455 - 0.0992i
       1.1953 + 0.1893i
       1.1953 - 0.1893i
       1.1139 + 0.2612i
       1.1139 - 0.2612i
       1.0043 + 0.3059i
       1.0043 - 0.3059i
       0.8685 + 0.3134i
       0.8685 - 0.3134i
       0.7012 + 0.2676i
       0.7012 - 0.2676i
       0.1468 + 0.0000i
       0.1205 + 0.0759i
       0.1205 - 0.0759i
       0.0628 + 0.1182i
       0.0628 - 0.1182i
       0.0005 + 0.1261i
       0.0005 - 0.1261i
      -0.0532 + 0.1080i
      -0.0532 - 0.1080i
      -0.1121 + 0.0251i
      -0.1121 - 0.0251i
      -0.0919 + 0.0718i
      -0.0919 - 0.0718i
    Code (Text):
    P_total =

       0.0022 + 0.0000i
       0.0001 + 0.0019i
       0.0001 - 0.0019i
      -0.0010 + 0.0001i
      -0.0010 - 0.0001i
      -0.0000 - 0.0003i
      -0.0000 + 0.0003i
       0.0001 - 0.0000i
       0.0001 + 0.0000i
       0.0000 + 0.0000i
       0.0000 - 0.0000i
       0.0000 + 0.0000i
       0.0000 - 0.0000i
       0.0000 + 0.0000i
       0.0000 - 0.0000i
       0.0000 + 0.0000i
       0.0000 + 0.0000i
       0.0000 - 0.0000i
       0.0000 - 0.0000i
       0.0000 + 0.0000i
       0.0000 + 0.0000i
       0.0000 - 0.0000i
       0.0000 + 0.0000i
       0.0000 - 0.0000i
       0.0000 - 0.0000i
       0.0000 + 0.0000i
    which isn't correct!! I increased the precision, and the same thing (with more digits) happen!! What could be the problem?

    Thanks
     
  19. Jun 22, 2015 #18
    Hey, I added some more detail to my response to you above before you responded, but it looks like you got it mostly.

    I don't know, it could be MATLAB has worse numeric problems than Python. Does your code work for all small N?

    What are you really looking for, do you need large N? I think I've read that Binomial distribution converges to the Poisson distribution, the cumulative is invertible in MATLAB:
    http://www.mathworks.com/help/stats/poissinv.html
     
  20. Jun 22, 2015 #19
    Actually, the minimum exponent is L=ceil(n/2) and the maximum exponent is n. In my code, I created a zero vector of length n, and created two for loops, one for each sum, and then I added the coefficients of the same exponent, which happens when kk+ii points to the same vector element.
     
  21. Jun 22, 2015 #20
    I am looking for an arbitrary n. I tried for different n, and up to n=19, it gives accurate results. After that, the total probability begins to diverge from epsilon for some roots, and this gets worsen as n gets larger!!
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Solving inverse binomial equation
  1. Inverse equations (Replies: 7)

  2. Solving equation (Replies: 4)

  3. Solve an equation (Replies: 8)

Loading...