Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Two random distribution problems

  1. Sep 21, 2012 #1
    Basically i got to solve two problems that can be described as:

    if you throw 80 apples at 100 buckets randomly, and each apple must enter a bucket. How many buckets will on avarage contain 0 apple's? how many wil contain 1?, and 2?, and more?

    problem two:

    if a given number of random moving objects, that are contained in a given amount of space. what is the probabillity that, at any one time, two or more objects are within a given distance of one another.

    hoe can i approach these problems?
  2. jcsd
  3. Sep 21, 2012 #2

    Stephen Tashi

    User Avatar
    Science Advisor

    You can find information about he first problem by searching on "the occupancy problem" or "occupancy numbers".

    The question you ask is slightly ambiguous. When we speak of "the average" of some quantity, we have to specify what population of situaions we are averaging over. For example "The avergage number of buckets with 1 apple" computed over all possible ways the apples can land is different than "The average number of buckets with 1 apple" averaged over those situations where the number of buckets containing 2 apples is exaclty equal to the average number we would compute for that, instead of more or less. If this is a real world problem, explain what you want to do with these averages. It may not be useful to assume that the occupancy numbers will all come out to be near their respective averages at the same time.

    At the moment, I'm half asleep, so I offer this Google-it-yourself kind of answer.
  4. Sep 22, 2012 #3
    Thank you Stephen,

    I am working on a cancer research project involving the trapping of circulating tumor cells using micro-filters. For specific treatment and specific DNA sequencing it is very desirable to trap cells singly.
    The amount of single separated cells is influenced by the filter design and the filter conditions. For instance using a design with single pores in each filter well, the amount of cells separated singly in each well is increased due to the blocking of the filter pore, and the consequent change in filter flow directions. We need to be able to compare the filtration outcome to the outcome of a random filtration event to prove these designs and conditions.

    I wonder what the avarage outcome of the apple problem would be if it were to be run in a computer simulation a million or so times. And could this be calculated without actually running the simulation?

    As for my second problem, that seems to be more difficult, can that be run on a computer simulation? And if yes, is there any program available that can do this for me?

  5. Sep 22, 2012 #4
    For one: consider one bucket. You have a binomial distribution with 80 trials and 1/100 chance of success. You can easily calculate that. Each bucket has that same distribution, so multiply that answer by 100.

    For two: I would try this by calculating the probability that NO two objects are within distance X of one another. (Reversing the problem like this is always the first basic trick to try.) Start with N objects, S space, and X the distance. Chose one object: it has to be somewhere. You now have a binomial with N-1 trials and (S-2X)/S chance of success for each trial. Assume that all the trials succeed and there is no object within X distance. Take another object and repeat with N-2 trials and (S-4X)/S chance of success. Whoops! That won't work. The trials aren't independent because the regions might overlap. It's a mess. There might be some clever way to do it, but I would just simulate it with a computer.
  6. Sep 22, 2012 #5

    Stephen Tashi

    User Avatar
    Science Advisor

    I think it can be calculated instead of Monte-Carlo'ed but we might need a computer. Your request for "the average outcome" is ambiguous. You have to say precisely what that means. I suggest you develop that definition by considering what data you can actually measure from the real filters. Will you be counting the number of cells that you find in each well? Or will you do some sort of sampling of the wells? What averages will you use to summarize your data?

    Whatever your definition of "average" turns out to be, I suspect that the mathematics have been studied by some specialist. But we need a precise definition in order to look it up.

    Attemping to analyze the problem from first principles would begin like this:

    Assume each bucketl is given a unique number i = 1,2,..100. Assume we measure the number of apples in each bucket and find there are [itex] r_i [/itex] apples in bucket [itex] i [/itex]. The vector of numbers [itex] r_1,r_2,...r_{100} [/itex] is a vector of "occpancy numbers". The probabilty of a particular vector can be calculated as follows.

    The number of ways that the 80 apples can produce the occupany numbers is;
    [itex] \frac {80!} {r_1! r_2!....r_{100}! } [/itex]

    Each of the above ways has probability [itex] \frac {1}{ 100^{80}} [/itex].

    Suppose we only care how many buckets have exactly one apple in them, not which buckets do. To find the probability that 50 of the buckets have exactly 1 apple in them and the rest of the buckets have any other number of apples, you need to add up the probabilities of the vectors of occupany numbers that meet this requirement.

    A simplified example of a similar calculation is given in the last example in this PDF: http://www.google.com/url?sa=t&rct=...sg=AFQjCNFSxiYgXEk5Y7J35bSOXeTmL4Ec8A&cad=rja

    To compute the average number of buckets that have exactly 1 apple in them, you need to perform the calculation: (1)(probability exactly 1 bucket has exaclty 1 apple) + (2)(probability exactly 2 buckets have exactly 1 apple) + (3)(...) + (80) (probababilty exactly 80 buckets have exaclty 1 apple).

    (The discussion could get confusing because the typical math example is putting "balls in cells" and you are putting "cells in wells".)
  7. Sep 22, 2012 #6
    Thank you Looser and Stephen,

    Currently in unable to do much productive work. However i can state that we will most likely be comparing the percentage of singly separated cells after the filtration proceses.

  8. Sep 25, 2012 #7
    Hello again,

    Based on the PDF you linked i made the following calculation for 5 cells in 10 wells:

    1111100000 -> (10!/(5!*5!))*((5!/1!)) = 30240
    2111000000 -> (10!/(6!*3!*1!))*((5!/2!)) = 50400
    2210000000 -> (10!/(7!*2!))*((5!/2!*2!)) = 10800
    3110000000 -> (10!/(7!*2!))*((5!/3!)) = 7200
    3200000000 -> (10!/(8!))*((5!/3!*2!)) = 900
    4100000000 -> (10!/(8!))*((5!/4!)) = 450
    5000000000 -> (10!/(9!))*((5!/5!)) = 10
    Total: 100.000 different distributions

    0 cells: 30240*5/10 + 50400*6/10 + 10800*7/10 + 7200*7/10 + 900*8/10 + 450*8/10 + 10*9/10 = 59049/100000 = 59.049%

    1cell: 30240*5/10 + 50400*3/10 + 10800*1/10 + 7200*2/10 + 450*1/10 = 32805/100000 = 32.805%

    2cells: 50400*1/10 + 10800*2/10 + 900*1/10 = 7290/100000 = 7.290%

    3cells: 7200*1/10 + 900*1/10 = 810/100000 = 0.81%

    4cells: 450*1/10 = 45/100000 = 0.045%

    5cells: 10*1/10 = 1/100000 = 0.001%

    On average 59.049% of wells will contain 0 cells, 32.805% of all wells will contain 1 cell, etc.
    This means that: 32.805/(32.805+7.29+0.81+0.045+0.001) = 80.108% of all wells that contain cells, contain 1 cell.


    Assuming this is correct, how do i rewrite the outcome that 32.805% of wells contain 1 cell, to: "the percentage of cells that are singly separated in a well, compared to the amount of cells filtered"

  9. Sep 25, 2012 #8
    Problem 2:
    This is difficult and maybe you should just simulate it! However there is a related problem that is a lot easier: "what is the expected number of pairs of objects within a given distance of one another?"

    For this modified problem you can look at one pair of objects and calculate the probability p that they are within the given distance of each other. The expected number of pairs within the distance for n pairs of objects is then p * Choose(n, 2), because of linearity of expectation.
  10. Sep 25, 2012 #9

    Stephen Tashi

    User Avatar
    Science Advisor

    This problem assumes all 5 cells are filtered, so calculate the mean fraction of 5:

    (30240*5/5 + 50400*3/5 + 10800*1/5 + 7200*2/5 + 450*1/5) (1/100000) =

    To compare experimental data to this number, you would also need some idea of how the actual fraction that is singly filtered varies from its expected value as you repeat the experiment of tossing 5 balls at 10 buckets. The variance of the fraction that are singly filtered can be calculated and provides some information about this.
  11. Sep 26, 2012 #10


    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    Problem 2:
    Volume V, n particles, want prob that no two are within d.
    First particle effectively has an exclusion zone size z = 4πd3/3. Prob that 2nd particle lies outside this is 1-α, where α = z/V (ignoring boundary effects.). Given 1st two particles are at least d apart, prob that 3rd particle is at least d from each is approx 1-2α. It's a little higher than that because the first two exclusion zones can overlap. Leaving that aside for now, we get prob that no two of n are closer than d >= ∏(1-rα), r = 1.. n-1. Writing G for the gamma function (can't get the latex menu to work right now), this is αn-1G(1/α)/G((1/α)-n+1).
    So what about the overlap of exclusion zones? If I've calculated correctly, the 3rd particle's prob is 1-2α+β, where β=α269/32. The 4th particle's prob is then a bit less than 1-3α+3β (less because the overlaps can overlap). Likewise, the (r+1)th particle's prob is a little under 1-rα+r(r-1) β/2. Given that β is approx 2α2, this suggests the general approximation 1/(1+rα), giving 1/∏(1+rα), or α-n+1G((1/α)+n-1)/G(1/α).
    Either way, asymptotically it's looking like e-αn2.
  12. Sep 26, 2012 #11

    Stephen Tashi

    User Avatar
    Science Advisor

    Saying "at any one time" in the sense of clock time is a different question than asking the probability that two objects are close in a one randomly selected arrangement of the particles. For example, if we select a new random arrangement of the particles a million times per second, the probability we find an arrangement where the particles are close at some "time" in a twenty minute period might be rather high.

    it would be best to clarify how this question relates to a real world situation. Is it related to modeling how particles in a fluid impact the face of a filter as the fluid flows into the face?
  13. Sep 26, 2012 #12
    The design of the filter allows for a non random distribution of cells. However when when two cells in the suspension are contained in a sub partition of that suspension that is smaller than the volume of a filter well, then two cells can throretically still enter a single well.
    Knowing the probabillity of this could predict how often this is supposed to happen.

  14. Sep 26, 2012 #13

    Stephen Tashi

    User Avatar
    Science Advisor

    Which is the better way to visualize the filtration?:

    1. The wells are circles in the 2D face of a filter and fluid containing suspeded cells is flowing through the face of the filter (including the part of the face where there are no wells) at a uniform rate.

    2. Same as the above, except the fluid can only flow through the face of the filter where there are wells.

    3. The wells are 3D cyclinders. Filtration consists to filling these cylinders with the fluid and letting the fluid drain out through the well. (This might amount to the same situation as 2, if we consider the volume of the well in this case to be the total volume of fluid filtered by the well in situation 2.

    - or is there some other description that is better?
  15. Sep 26, 2012 #14
    3 would best fit.

    The wells are indeed 3d cylinders that contain a single small pore at the bottom of the cylinder. Once a cell blocks this pore, fluid movement is redirected to other wells.
  16. Sep 27, 2012 #15

    Stephen Tashi

    User Avatar
    Science Advisor

    I understand the second question now that I visualize the well as a cylinder. As you said before:
    A simplified model is:

    B cells are suspended uniformly in a volume V of fluid and the fluid is divided into N subvolumes of size w (so V = N w).

    To answer questions like "What is the probability that a randomly selected subvolume contains two or more cells?", we are back to scenario of throwing B apples at N buckets.

    A model showing that fluid flows through a well until the well is blocked or showing the spatial and temporal distribution of particles in 3D would require different math.

    As others have suggested, I too strongly recommend using computer simulations as a supplement to any theoretical analysis, even if you don't do highly detailed simulations.

    I think we haven't done full justice to the mathematics of the occupancy numbers yet. Modern software mght be able to handle the large numbers involved in calculating the probabilities in your problems, but I think that this model should have been studied in pre-computer day and people would have developed approximation formulas that apply for large numbers of apples and buckets. ( A first-principles approach to that would be to use Stirlings formula to compute the factorials. )
  17. Sep 27, 2012 #16
    QUOTE: I think we haven't done full justice to the mathematics of the occupancy numbers yet. Modern software mght be able to handle the large numbers involved in calculating the probabilities in your problems, but I think that this model should have been studied in pre-computer day and people would have developed approximation formulas that apply for large numbers of apples and buckets. ( A first-principles approach to that would be to use Stirlings formula to compute the factorials. )

    Well i have used the simple calculations as a threshhold because i think that the percentages dont differ that much. And shurely they will be lower. I calculated that 65% of all cells should enter a single well in the example of 5 cells in 10 wells. For 860 cells in 1920 wells this percentage should be lower. Thus any vallue we find in a real filtration that is significantly above 65% will most likely not be random.

    For the second problem:

    The vollume of a single well is about 1.4 nanoliters, the 860 cells are contained in a vollume of 1ml. Thus the amount of subpartions that those 860 are contained in is: 1000000nl/1.4nl = 714286 partitions. Then if we devide these partitions by 215 = 3322 partitions. We get the amount of partitions that hold 4 cells on average. And calculating the amount of partitions that contain 2+ cells in that example is doable. Infact i calculated it to be 0.18 partitions. So it should occur once in every 5.5 filtrations.

  18. Sep 28, 2012 #17
    For one: the number of buckets times the probability that the first bucket has exactly k balls (as per post #4).

    For two: you could apply the inclusion-exclusion principle to P(A12 u A13 u ...) where Aij is the event that particles i and j are within distance d of each other. For large numbers of particles it would be infeasible to calculate exactly (2^nC2 terms and complex cases to consider) but perhaps upper and lower bound could be estimated for the truncated sum. I think simulation is more likely to give an accurate result.
  19. Sep 29, 2012 #18
    I agree with simulation, but hoe does one simulate this? With what programm
  20. Oct 6, 2012 #19


    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    That is essentially what I did in post #10
  21. Oct 6, 2012 #20
    I can't see where the Inclusion-exclusion principle is used in post #10 (if it did, surely the argument would be much longer and with a much less elegant result :). Speaking of which, I wonder if the approximation α-n+1G((1/α)+n-1)/G(1/α) can be strengthened to an upper bound, to match the lower bound αn-1G(1/α)/G((1/α)-n+1) that is valid for α(n-1)<1?
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook