Random sequence - full alphabet run length

  1. Hi,

    Suppose we're looking at a random sequence of digits from 0 to 9. We start off reading the digits until every digit from 0 to 9 has been seen at least once and we mark the count of digits read up to that point (run length). We then reset the run length and continue until the whole random sequence has been read. In the end, for every finite random sequence, there is a corresponding sequence of run lengths.

    How would we be able to analytically arrive at average length of such these runs? What is the formal mathematical name given to such a run? What branch of mathematics concerns itself with this?

  2. jcsd
  3. Let me know if I am in the wrong place

    Just want to make sure this forum is appropriate for the question above - I've seen some people have looked at the problem but so far I garnered zero responses. This is not a homework problem.

    Again, basically, given finite alphabet and a finite sequence composed using that alphabet, we start reading the sequence until every symbol in the alphabet has been at read least once. We then record how many symbols it took us to reach that point. We repeat, until we've read the whole sequence, thus finishing with sequence of run lengths. The question is, what is average (expected) length of these runs (looks like around 27 for random sequence of digits from 0 to 9).
  4. mathman

    mathman 6,756
    Science Advisor
    Gold Member

    I suspect that you haven't gotten any response because it appears to be a very difficult calculation. Did you use some sort of Monte Carlo simulation to get 27?
  5. I believe that your question is equivalent to what's known as the "coupon collector's problem." You should be able to find an answer using that information.
  6. Once you've seen n digits, the next digit has a probability of n/10 of already been seen, so there is a probability of n/10 that a second digit is needed, n^2/100 that a third digit is needed etc. the sum of this is 1 + n/10 + n^2/100 + ... = 1/(1 - n/10). (sum of geometric series). This is the average number of digits you need to read until you get a new digit.
    To get the average run length until all the digits are seen you need to sum 1/(1 - n/10) for n ranging from 0 up to 9, so you get 1 + 10/9 + 10/8 + 10/7 + ... . + 10/2 + 10/1 wich is 29.2896....
Know someone interested in this topic? Share this thead via email, Google+, Twitter, or Facebook

Have something to add?

Draft saved Draft deleted