Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Probability question

  1. Oct 30, 2008 #1

    I have the following probability question:

    Given a random bit string S of length m, if I take a copy of a proper suffix of S (a suffix of S that is neither equal to S, nor empty), and align it with the start of the string, how many consecutive matches starting from the beginning of the string do I expect to see on average?

    For example:
    S[1..9] = 101010110
    If I take the suffix starting from position 3: S[3..9] = 1010110
    and align it with S:
    S[1..9] = 101010110
    S[3..9] = 1010110

    How many consecutive matches on average starting from position 1 do I expect to see?

    My guess is that since for each position the probability that a symbol matches another is 1/2, half of the sequence would be matches, but I am not sure of this since this line of thought doesn't take in account the consecutive ordering of the sequence of matches.

    Can someone please enlighten me?

  2. jcsd
  3. Oct 30, 2008 #2


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    Basically, you're comparing the element in position k with the element in position k+r for some r>0. If the events are independent, whether they are equal has probability 1/2. So the probability distribution of the length of consecutive matches is just geometric, with the added caveat that you can't have more than m-r consecutive matches (since the suffix is of length m).
  4. Oct 30, 2008 #3
    So the expected length of matches is 2-(m-r)?
  5. Oct 30, 2008 #4


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    Obviously not, since you expect the expected length to be at least 1/2. The probability that the length is k is going to be geometric in k (since you need to have k matches then a failure To calculate the expected value, just use the formula

    [tex]E(X) = \sum_{n=1}^{m-r}P(X=k)*k[/tex]

    Which in this case is


    I'm not sure what a closed form for that is off the top of my head
  6. Nov 2, 2008 #5
    According to the information here I don't understand how you arrived at that formula for the expected value.

    Shouldn't it be
    [tex]\sum_{k=0}^{m-r}{(1-p)^kp \cdot k}[/tex]

    Ok. For the case of binary strings your formula is correct because p = 1/2 but if you consider strings over a finite alfabet
    and this formula is the one to use.
    Last edited: Nov 2, 2008
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook