Probability Q: Average Consecutive Matches in Bit Strings

In summary, the probability of having an average of consecutive matches in a bit string can be calculated by using the binomial distribution formula. This formula takes into account the length of the bit string, the probability of a match occurring, and the number of consecutive matches desired. As the number of consecutive matches increases, the probability decreases exponentially. This concept is often used in cryptography and coding theory to analyze the likelihood of a sequence of bits being repeated in a random string.
  • #1
dabd
25
0
Hi,

I have the following probability question:

Given a random bit string S of length m, if I take a copy of a proper suffix of S (a suffix of S that is neither equal to S, nor empty), and align it with the start of the string, how many consecutive matches starting from the beginning of the string do I expect to see on average?

For example:
Given
S[1..9] = 101010110
If I take the suffix starting from position 3: S[3..9] = 1010110
and align it with S:
S[1..9] = 101010110
S[3..9] = 1010110

How many consecutive matches on average starting from position 1 do I expect to see?

My guess is that since for each position the probability that a symbol matches another is 1/2, half of the sequence would be matches, but I am not sure of this since this line of thought doesn't take in account the consecutive ordering of the sequence of matches.

Can someone please enlighten me?

Thanks.
 
Physics news on Phys.org
  • #2
Basically, you're comparing the element in position k with the element in position k+r for some r>0. If the events are independent, whether they are equal has probability 1/2. So the probability distribution of the length of consecutive matches is just geometric, with the added caveat that you can't have more than m-r consecutive matches (since the suffix is of length m).
 
  • #3
Office_Shredder said:
Basically, you're comparing the element in position k with the element in position k+r for some r>0. If the events are independent, whether they are equal has probability 1/2. So the probability distribution of the length of consecutive matches is just geometric, with the added caveat that you can't have more than m-r consecutive matches (since the suffix is of length m).

So the expected length of matches is 2-(m-r)?
 
  • #4
Obviously not, since you expect the expected length to be at least 1/2. The probability that the length is k is going to be geometric in k (since you need to have k matches then a failure To calculate the expected value, just use the formula

[tex]E(X) = \sum_{n=1}^{m-r}P(X=k)*k[/tex]

Which in this case is

[tex]\sum_{n=1}^{m-r}\frac{k}{2^k}[/tex]

I'm not sure what a closed form for that is off the top of my head
 
  • #5
Office_Shredder said:
Obviously not, since you expect the expected length to be at least 1/2. The probability that the length is k is going to be geometric in k (since you need to have k matches then a failure To calculate the expected value, just use the formula

[tex]E(X) = \sum_{n=1}^{m-r}P(X=k)*k[/tex]

Which in this case is

[tex]\sum_{n=1}^{m-r}\frac{k}{2^k}[/tex]

I'm not sure what a closed form for that is off the top of my head

According to the information here I don't understand how you arrived at that formula for the expected value.
http://en.wikipedia.org/wiki/Geometric_distribution

Shouldn't it be
[tex]\sum_{k=0}^{m-r}{(1-p)^kp \cdot k}[/tex]

Ok. For the case of binary strings your formula is correct because p = 1/2 but if you consider strings over a finite alfabet
[tex]|\Sigma|>2[/tex]
then
[tex]p=\frac{1}{|\Sigma|}[/tex]
and this formula is the one to use.
 
Last edited:

1. What is the concept of probability in the context of bit strings?

In probability, a bit string refers to a sequence of binary digits (0s and 1s) that represent the outcomes of an event. The concept of probability in this context is used to determine the likelihood of a certain sequence of bits occurring in a given set of bit strings.

2. How is probability calculated for consecutive matches in bit strings?

The probability of consecutive matches in bit strings is calculated by dividing the number of bit strings that contain the desired sequence by the total number of possible bit strings. This can be represented as a fraction or a decimal value.

3. What does the average consecutive matches in bit strings represent?

The average consecutive matches in bit strings represents the expected number of consecutive matches that will occur within a given set of bit strings. It can also be thought of as the average number of times the desired sequence will appear in a set of bit strings.

4. How is the average consecutive matches in bit strings affected by the length of the bit strings?

The average consecutive matches in bit strings is directly affected by the length of the bit strings. As the length of the bit strings increases, the average number of consecutive matches also increases. This is due to the fact that longer bit strings have a higher likelihood of containing the desired sequence.

5. Can probability be used to predict the occurrence of consecutive matches in bit strings?

Yes, probability can be used to predict the occurrence of consecutive matches in bit strings. By calculating the probability, we can determine the likelihood of a certain sequence appearing and make predictions based on that information. However, it should be noted that probability is not a guarantee and the actual outcome may differ from the predicted result.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
  • Set Theory, Logic, Probability, Statistics
2
Replies
36
Views
3K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
635
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
936
Back
Top