Picking an appropriate distribution

aaaa202 · Feb 28, 2012

I am studying a biological system composed of roughly 10,000 cells. My model studies the probability that a cell accumulates four independent mutations and thus transforms into a vicious cancer cell.
Starting from the basic theory of the binomial distribution, it is easy to write an expression for the probability that a particular cell acquires k mutations after n timesteps. Calling the probability that an arbitrary cell acquires a mutation for p we have for a single cell:
pcell = p/N
And thus:

p(k mutations in n tries) = K(n,k) * (p/N)^k * (1 - p/N)^(n-k)

where K(n,k) is the binomial coefficient.

Summing these over all cells should give us the total probability that one cell has acquired k mutations. Simply multiplying by N wouldn't actually work, since p is really specific to each cell (I assumed it to be the same for simplicity).
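As a minimal sketch of the single-cell binomial expression, the snippet below evaluates the probability of exactly k mutations and of at least k mutations in n timesteps. It assumes the same per-cell probability q = p/N for every cell, and it takes the failure probability as 1 - q, the complement of the per-cell success probability. The values of p, N and n are hypothetical placeholders, not numbers from the model.

```python
import math


def binom_pmf(n, k, q):
    """Probability of exactly k mutations in n timesteps, per-step probability q."""
    return math.comb(n, k) * q**k * (1.0 - q) ** (n - k)


def binom_tail(n, k_min, q):
    """Probability of at least k_min mutations in n timesteps."""
    return 1.0 - sum(binom_pmf(n, k, q) for k in range(k_min))


if __name__ == "__main__":
    p = 0.5       # hypothetical: chance that some cell mutates in one timestep
    N = 10_000    # hypothetical: number of cells
    n = 5_000     # hypothetical: number of timesteps
    q = p / N     # per-cell probability, assumed identical for all cells

    print("P(exactly 4 mutations) =", binom_pmf(n, 4, q))
    print("P(at least 4 mutations) =", binom_tail(n, 4, q))
```

The tail is computed as one minus the sum of the first few terms, which avoids summing thousands of negligible terms when k is small compared with n.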

Now my question is: this expression becomes quite nasty once we add the fact that p differs from cell to cell. Is it possible to make some approximations that make the expression easier to work with? As N is pretty big (we could make it a lot bigger), would it be possible to model the distribution as a Poisson distribution? And would that make the cell dependence of p easier to work with, or could we at least find a straightforward expression for the deviation from the mean number of mutations?
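One way to explore the Poisson idea numerically is sketched below. When the per-step probability q_i of cell i is small and n is large, Binomial(n, q_i) is close to Poisson with rate lambda_i = n * q_i, whose mean and variance are both lambda_i. If the cells are also assumed independent, the total mutation count over all cells is approximately Poisson with rate equal to the sum of the lambda_i, so the standard deviation of the total is the square root of that sum. The per-cell probabilities and n used here are hypothetical, and independence between cells is an assumption, not something the model above states.

```python
import math


def binom_pmf(n, k, q):
    """Exact binomial probability of k mutations in n timesteps."""
    return math.comb(n, k) * q**k * (1.0 - q) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_tail(k_min, lam):
    """Poisson probability of at least k_min events."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


if __name__ == "__main__":
    n = 1_000                          # hypothetical number of timesteps
    q_per_cell = [1e-4, 2e-4, 5e-4]    # hypothetical cell-specific probabilities

    for q in q_per_cell:
        lam = n * q                    # Poisson rate for this cell
        print(f"q={q:g}  exact P(k=4)={binom_pmf(n, 4, q):.3e}  "
              f"Poisson P(k=4)={poisson_pmf(4, lam):.3e}  "
              f"P(k>=4)~{poisson_tail(4, lam):.3e}")

    # Total mutation count over all cells, assuming independent cells:
    total_rate = sum(n * q for q in q_per_cell)
    print("mean of total count:", total_rate)
    print("std dev of total count:", math.sqrt(total_rate))
```

In this approximation the cell dependence enters only through the rates lambda_i, which is what makes the heterogeneous case easier to handle than the product of distinct binomials.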

bp_psy · Feb 28, 2012

Could you explain your model a little more clearly? First, what exactly are N and "a mutation for p"?
