DNA sequence evolution

1. Feb 3, 2015

bowlbase

1. The problem statement, all variables and given/known data
A strand of length $L$ begins life as all A's. Assume that each letter evolves independently of all the others until today, 1000 generations later. Within each generation there is a probability $\mu$ that the letter mutates to C, G, or T. Finally, assume that once a letter mutates, it cannot mutate again.
Calculate the expected number of A's as a function of $\mu$. Then equate this expectation to $N_A$ and write down a function for $\mu$ in terms of $N_A$.
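A sketch of the expected-value calculation, assuming $n$ denotes the number of generations in which a letter gets a chance to mutate and that, as stated, each letter mutates with probability $\mu$ per generation and never reverts. A letter is still A today only if it fails to mutate in every one of those $n$ rounds, and by linearity of expectation the per-letter probabilities simply add up:

$$P(\text{letter is still A}) = (1-\mu)^n, \qquad E[N_A] = \sum_{i=1}^{L} P(\text{letter } i \text{ is A}) = L(1-\mu)^n.$$

Setting $E[N_A] = N_A$ and solving for $\mu$:

$$\mu = 1 - \left(\frac{N_A}{L}\right)^{1/n}.$$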

2. Relevant equations

3. The attempt at a solution
So, I have 1000 generations in which each A can mutate to something else with probability $\mu$. In the first generation the total number of A's is $N_A=L$. In the second generation we must multiply each A by the mutation probability. Since there are $L$ A's, we get $N_A=\mu L$. In the third generation we have to multiply the current number of A's by $\mu$ again, which gives $N_A=\mu \mu L = \mu^2 L$. Taking this to 1000 generations, we'd have $N_A= \mu^{1000-1} L$, which doesn't seem likely at all.

Any suggestions, or is this correct?

2. Feb 4, 2015

Brian T

What your solution is working toward is the number of non-A's in a given generation. What you want is to apply the opposite probability, $1-\mu$, the probability of not mutating.

For example, suppose the probability of mutating is 1%. After the first generation, you would expect $0.99L$ A's and $0.01L$ non-A's. If you just took $N_A = \mu L$, you would effectively be saying that $N_A$ in the first generation is $0.01L$, which is actually $N_{\text{not-}A}$.

So, your concept of multiplying the probability successively is correct, but you just need to use the right probability.
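The successive-multiplication idea can be checked numerically. Below is a minimal Python sketch, assuming the per-generation survival probability is $1-\mu$ and that a mutated letter never reverts. It compares the generation-by-generation product with the closed form $L(1-\mu)^n$, inverts that formula for $\mu$, and runs a small Monte Carlo simulation as a sanity check. The function names and the example values ($L = 2000$, $\mu = 0.001$, $n = 1000$) are hypothetical, chosen only for illustration.

```python
"""Expected number of A's when each A survives a generation with probability 1 - mu."""
import random


def expected_a_iterative(length: int, mu: float, generations: int) -> float:
    # Multiply the current expected A count by the survival probability (1 - mu)
    # once per generation.
    n_a = float(length)
    for _ in range(generations):
        n_a *= 1.0 - mu
    return n_a


def expected_a_closed_form(length: int, mu: float, generations: int) -> float:
    # Same quantity in closed form: L * (1 - mu)^n.
    return length * (1.0 - mu) ** generations


def mu_from_count(n_a: float, length: int, generations: int) -> float:
    # Invert N_A = L * (1 - mu)^n to get mu = 1 - (N_A / L)^(1/n).
    return 1.0 - (n_a / length) ** (1.0 / generations)


def simulate_a_count(length: int, mu: float, generations: int, seed: int = 0) -> int:
    # Monte Carlo check: each letter independently tries to mutate once per
    # generation; once it mutates it stays non-A, so we stop checking it.
    rng = random.Random(seed)
    survivors = 0
    for _ in range(length):
        for _ in range(generations):
            if rng.random() < mu:
                break  # mutated: this letter is no longer an A
        else:
            survivors += 1  # never mutated in any generation
    return survivors


if __name__ == "__main__":
    L, MU, N = 2000, 0.001, 1000  # hypothetical example values
    exact = expected_a_closed_form(L, MU, N)
    print(f"iterative:    {expected_a_iterative(L, MU, N):.1f}")
    print(f"closed form:  {exact:.1f}")
    print(f"simulated:    {simulate_a_count(L, MU, N)}")
    print(f"recovered mu: {mu_from_count(exact, L, N):.6f}")
```

The simulated count fluctuates around the closed-form value from run to run (fixed here by the seed), while the iterative and closed-form results agree up to floating-point rounding.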

Last edited: Feb 4, 2015
3. Feb 4, 2015

bowlbase

You are right. I had a feeling that I was getting the opposite result that I was meaning to get. I should have realized I had them mixed up! Thanks!

4. Feb 4, 2015

Brian T

No problem!

And one more thing: if the second generation is $P^2L$ and the third is $P^3L$, wouldn't the 1000th be $P^{1000}L$? Just wondering, since you put $P^{1000-1}L$
(assuming $P$ is the corrected probability).

Edit: never mind, the first generation is just $L$, haha
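The indexing question comes down to what counts as a "generation". A short worked version, assuming generation 1 is the initial all-A strand (as in the original attempt) and that each later generation applies one round of mutation with survival probability $1-\mu$:

$$E[N_A(g)] = (1-\mu)^{g-1} L, \qquad g = 1, 2, \dots$$

so $E[N_A(1)] = L$ and $E[N_A(1000)] = (1-\mu)^{999} L$. If "1000 generations later" is instead read as 1000 rounds of mutation after the starting strand, the exponent is 1000. The two readings differ by a single factor of $1-\mu$.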