Help with proof for binomial distribution

AI Thread Summary
The discussion centers on deriving a formula for the average number of steps, s_tot, in a binomial distribution model related to DNA fragment replication. The user seeks to prove that s_tot can be expressed as c*E_f*(E_f+1)^(c-1), where E_f represents copying efficiency. Initial attempts included using proof by induction and the Binomial Theorem, but challenges arose in managing the summation limits. Responses provided clarification on the relationship between the summation terms and helped refine the user's approach, leading to a better understanding of the proof process. The conversation emphasizes the importance of correctly applying binomial coefficients and summation techniques in mathematical proofs.
DaanV
Messages
26
Reaction score
0
Greetings to you, Physics Forums regulars!

Please allow me to introduce myself a bit first. I'm a student in the Life Sciences, so I don't really have a lot of knowledge on mathematics past the basics.

I'm not sure if my problem belongs here. This is my first visit to this friendly-looking forum, so please forgive my ignorance if it should be elsewhere.

The problem I will describe here is what I came across during an internship. Also I'm sorry for the enormous amount of text in here. It's the only way I can make all of this make sense in my own head.

I'm fairly certain this whole thing can be solved without all the context, so if you wish you can skip straight to the bottom and see what it is I actually need help with.

Thanks a lot in advance for any help provided!


Homework Statement


Background information:
A fragment of DNA is fixed to a solid surface. It is subjected to subsequent cycles of varying conditions, and with each cycle the fragment is copied once. The new fragment will be 1 'step length' away from the parent fragment (in a random direction).

During the next cycle, both the parent and the daughter fragment are copied again, bringing about a total of 4 daughter strands (1 at the origin, 2 that have made one step and 1 that has made two steps).

Obviously the total amount of fragments (f_{tot}) in this instance can be described by
f_{tot}=2^c
With c the number of cycles

The distribution of the amount of fragments that have made a certain number of steps as a function of the number of cycles (F(s,c)) follows a binomial distribution:
F(s,c)=\left(^{c}_{s}\right)
With s the number of steps of each fragment

Sadly, life is hardly ever perfect. So it happens that the copying efficiency is not perfect. A relation can be found describing the efficiency, it's called E_f, and is a number between 0 and 1 (efficiency 0 means no copies are ever made, efficiency 1 means perfect replication). This efficiency is assumed to be constant throughout the process.

For example, with an efficiency of 0.9, the distribution would look (on average) as follows after c=2 cycles:

1 fragment at the origin (the original fragment), 1.8 fragments at one step length away (2*0.9 from the original) and 0.81 fragments at two step lengths away (0.9 * the 0.9 that were at s=1 after 1 cycle), for a total of 3.61 fragments.

The new equations now become:
f_{tot}=(E_f+1)^c
F(s,c)=\left(^{c}_{s}\right)*E_f\;^s

In order to calculate the average number of steps after a given number of cycles, I need the total number of steps, s_{tot}. This is simply a summation over all possible amount of steps (no more than there are cycles) and multiplying by the number of fragments that have made that many steps. In formula form:

s_{tot}=\sum^{c}_{s=0}s *\left(^{c}_{s}\right)*E_f\;^s

Now, here comes the question:
I am interested in finding a more 'direct' and elegant formula for this equation.
In my quest for this, I have found that the above formula can be rewritten to:
s_{tot}=c*E_f*(E_f+1)^{c-1}

However, I am unsure how I can solidly proof this. So far I have only simulated the two equations for various values of c and E_f and it seems to fit perfectly. I'm aware though that this is no real evidence.

Homework Equations


The mission:
Prove that
\sum^{c}_{s=0}s *\left(^{c}_{s}\right)*E_f\;^s=c*E_f*(E_f+1)^{c-1}

It seems fairly obvious that the Binomial Theorem comes into play:
(x+y)^n=\sum^{n}_{k=0}\left(^{n}_{k}\right)*x^k*y^{n-k}

I'm struggling however with the differences between it and my formula.

The Attempt at a Solution


I tried to implement 'proof by induction'.

Obviously the equation goes down without a problem for c=0:
s_{tot}=0
Which also makes sense: After 0 cycles, no fragments will have made any steps.

Another basis (in case c=0) is not sufficient:
c=2

\sum^{c}_{s=0}s *\left(^{c}_{s}\right)*E_f\;^s=0*1*E_f\;^0+1*2*E_f\;^1+2*1*E_f\;^2=2*E_f+2*E_f\;^2

c*E_f*(E_f+1)^{c-1}=2*E_f*(E_f+1)^1=2*E_f+2*E_f\;^2


Next up is what I consider the hard part, the actual induction. I guess the problem is that I don't know how to solve the summation when it goes to 'k+1':

Assume that for any given value k, the following is true:
\sum^{k}_{s=0}s *\left(^{k}_{s}\right)*E_f\;^s=k*E_f*(E_f+1)^{k-1}

Now it's time to prove that the same is true for k+1:
\sum^{k+1}_{s=0}s *\left(^{k+1}_{s}\right)*E_f\;^s=^{?}(k+1)*E_f*(E_f+1)^{(k+1)-1}

So.. Is there some axiom I'm missing that deals with summations up to an unknown value? How can I rewrite this?

Any help given would be very much appreciated. If things are unclear, please don't hesitate to ask.

Regards,

Daan
 
Physics news on Phys.org
DaanV said:
Greetings to you, Physics Forums regulars!

Please allow me to introduce myself a bit first. I'm a student in the Life Sciences, so I don't really have a lot of knowledge on mathematics past the basics.

I'm not sure if my problem belongs here. This is my first visit to this friendly-looking forum, so please forgive my ignorance if it should be elsewhere.

The problem I will describe here is what I came across during an internship. Also I'm sorry for the enormous amount of text in here. It's the only way I can make all of this make sense in my own head.

I'm fairly certain this whole thing can be solved without all the context, so if you wish you can skip straight to the bottom and see what it is I actually need help with.

Thanks a lot in advance for any help provided!


Homework Statement


Background information:
A fragment of DNA is fixed to a solid surface. It is subjected to subsequent cycles of varying conditions, and with each cycle the fragment is copied once. The new fragment will be 1 'step length' away from the parent fragment (in a random direction).

During the next cycle, both the parent and the daughter fragment are copied again, bringing about a total of 4 daughter strands (1 at the origin, 2 that have made one step and 1 that has made two steps).

Obviously the total amount of fragments (f_{tot}) in this instance can be described by
f_{tot}=2^c
With c the number of cycles

The distribution of the amount of fragments that have made a certain number of steps as a function of the number of cycles (F(s,c)) follows a binomial distribution:
F(s,c)=\left(^{c}_{s}\right)
With s the number of steps of each fragment

Sadly, life is hardly ever perfect. So it happens that the copying efficiency is not perfect. A relation can be found describing the efficiency, it's called E_f, and is a number between 0 and 1 (efficiency 0 means no copies are ever made, efficiency 1 means perfect replication). This efficiency is assumed to be constant throughout the process.

For example, with an efficiency of 0.9, the distribution would look (on average) as follows after c=2 cycles:

1 fragment at the origin (the original fragment), 1.8 fragments at one step length away (2*0.9 from the original) and 0.81 fragments at two step lengths away (0.9 * the 0.9 that were at s=1 after 1 cycle), for a total of 3.61 fragments.

The new equations now become:
f_{tot}=(E_f+1)^c
F(s,c)=\left(^{c}_{s}\right)*E_f\;^s

In order to calculate the average number of steps after a given number of cycles, I need the total number of steps, s_{tot}. This is simply a summation over all possible amount of steps (no more than there are cycles) and multiplying by the number of fragments that have made that many steps. In formula form:

s_{tot}=\sum^{c}_{s=0}s *\left(^{c}_{s}\right)*E_f\;^s

Now, here comes the question:
I am interested in finding a more 'direct' and elegant formula for this equation.
In my quest for this, I have found that the above formula can be rewritten to:
s_{tot}=c*E_f*(E_f+1)^{c-1}

However, I am unsure how I can solidly proof this. So far I have only simulated the two equations for various values of c and E_f and it seems to fit perfectly. I'm aware though that this is no real evidence.

Homework Equations


The mission:
Prove that
\sum^{c}_{s=0}s *\left(^{c}_{s}\right)*E_f\;^s=c*E_f*(E_f+1)^{c-1}

It seems fairly obvious that the Binomial Theorem comes into play:
(x+y)^n=\sum^{n}_{k=0}\left(^{n}_{k}\right)*x^k*y^{n-k}

I'm struggling however with the differences between it and my formula.

The Attempt at a Solution


I tried to implement 'proof by induction'.

Obviously the equation goes down without a problem for c=0:
s_{tot}=0
Which also makes sense: After 0 cycles, no fragments will have made any steps.

Another basis (in case c=0) is not sufficient:
c=2

\sum^{c}_{s=0}s *\left(^{c}_{s}\right)*E_f\;^s=0*1*E_f\;^0+1*2*E_f\;^1+2*1*E_f\;^2=2*E_f+2*E_f\;^2

c*E_f*(E_f+1)^{c-1}=2*E_f*(E_f+1)^1=2*E_f+2*E_f\;^2


Next up is what I consider the hard part, the actual induction. I guess the problem is that I don't know how to solve the summation when it goes to 'k+1':

Assume that for any given value k, the following is true:
\sum^{k}_{s=0}s *\left(^{k}_{s}\right)*E_f\;^s=k*E_f*(E_f+1)^{k-1}

Now it's time to prove that the same is true for k+1:
\sum^{k+1}_{s=0}s *\left(^{k+1}_{s}\right)*E_f\;^s=^{?}(k+1)*E_f*(E_f+1)^{(k+1)-1}

So.. Is there some axiom I'm missing that deals with summations up to an unknown value? How can I rewrite this?

Any help given would be very much appreciated. If things are unclear, please don't hesitate to ask.

Regards,

Daan

Use the easily-demonstrated fact that ##s {c \choose s}## is ##0## if ##s = 0## and is equal to ##c s {c-1 \choose s-1}## if ##1 \leq s \leq c##.
 
Thanks for your reply.

I presume you meant s\left(^c_s\right) is equal to c\left(^{c-1}_{s-1}\right), rather than cs\left(^{c-1}_{s-1}\right).

Thank you so much so far, that has really helped me a lot. I'm not quite there yet though.

Using Bionomial Theorem with x=Ef, y=1, k=s-1 and n=c-1 brings me:

c*E_f*\left(E_f+1\right)^{c-1}\;=\;c*E_f*\sum_{s-1=0}^{c-1}\left(^{c-1}_{s-1}\right)E_f\,^{s-1}

=\;\sum_{s=1}^{c-1}c\left(^{c-1}_{s-1}\right)E_f\,^{s}\;=\;\sum_{s=0}^{c-1}s\left(^{c}_{s}\right)E_f\,^{s}

So.. Basically the only problem left now is that the summation only goes up to c-1, or have I done something wrong in the above?

Anyway, thanks a lot for your help so far!

Daan
 
DaanV said:
Thanks for your reply.

I presume you meant s\left(^c_s\right) is equal to c\left(^{c-1}_{s-1}\right), rather than cs\left(^{c-1}_{s-1}\right).

Thank you so much so far, that has really helped me a lot. I'm not quite there yet though.

Using Bionomial Theorem with x=Ef, y=1, k=s-1 and n=c-1 brings me:

c*E_f*\left(E_f+1\right)^{c-1}\;=\;c*E_f*\sum_{s-1=0}^{c-1}\left(^{c-1}_{s-1}\right)E_f\,^{s-1}

=\;\sum_{s=1}^{c-1}c\left(^{c-1}_{s-1}\right)E_f\,^{s}\;=\;\sum_{s=0}^{c-1}s\left(^{c}_{s}\right)E_f\,^{s}

So.. Basically the only problem left now is that the summation only goes up to c-1, or have I done something wrong in the above?

Anyway, thanks a lot for your help so far!

Daan

Yes, sorry: that was a typo. In the original summation, so goes from 1 to c (because we can leave off the s=0 term, which is zero in s*C(c,s)E^s). Thus, s-1 = t goes from 0 to c-1 in the new sum, so we have
\sum_{s=0}^c s {c \choose s} E^s = \sum_{s=1}^c s {c \choose s} E^s<br /> = \sum_{s=1}^c c {c-1 \choose s-1} E^s = c E \sum_{s-1=0}^{c-1} {c-1 \choose s-1} E^{s-1}\\ = cE \sum_{t=0}^{c-1} {c-1 \choose t} E^t = cE(1+E)^{c-1}.
 
Last edited:
Ah yes, I see where I went wrong now.

Thanks again for the excellent help given!
Now on to the next problem..
 

Similar threads

Replies
18
Views
2K
Replies
9
Views
2K
Replies
18
Views
2K
Replies
1
Views
2K
Replies
7
Views
1K
Replies
15
Views
2K
Replies
12
Views
3K
Replies
16
Views
3K
Replies
1
Views
1K
Back
Top