How Do You Calculate the Probability of Repeated Letters in Random Strings?

  • Thread starter Thread starter Robin04
  • Start date Start date
  • Tags Tags
    Probability
AI Thread Summary
The discussion focuses on calculating the probability of repeated letters in random strings formed from a four-letter alphabet, specifically the probability of having at least two identical letters next to each other in a string of length N, denoted as p(N). The problem is framed as a theory of runs issue, where the goal is to determine the probability that the maximum run size is less than or equal to 2. Various mathematical approaches are suggested, including Markov Chains, Poisson approximations, and generating functions, with a preference for using a 2x2 matrix for simplicity. The conversation emphasizes the need for relevant equations and methods to tackle the problem effectively. Overall, the participants aim to clarify the interpretation and calculation of p(N) to meet the specified probability threshold.
Robin04
Messages
259
Reaction score
16
Homework Statement
We try to send a message in a very noisy environment. The messages are coded with a four-letter alphabet. By convention, only the messages without lonely letters are considered to be meaningful. A letter is lonely if the next and previous letters are different (the first letter is lonely if the next is different and the last letter is lonely if the previous is different). At least how long does the message have to be if we want to have the chance of having a meaningful message to be less then ##10^{-10}## because of the noise? The noise changes the letters to other letters in the alphabet. Every letter appears independently with an equal chance.
Relevant Equations
-
The phrasing of the problem is a bit messy but here's how I understood it so far:
We make random character strings out of a four-letter alphabet. Every letter appears independently with an equal chance. The chance of having at least two identical letters next to each other in a string of length ##N## is ##p(N)##. We are looking for ##N## given that ##1-p(N) \leq 10^{-10}##
So the key is to find ##p(N)##. Can you help me a bit with that?
 
Physics news on Phys.org
Interesting...

I think your take on how to interpret the problem is helpful and I'd run with it -- the problem statement itself is a bit ambiguous so your interpretation seems to fill gaps. I presume this loneliness tag doesn't apply to the first and last letters -- i.e. it applies to interrior letters only.

What is missing is relevant equations-- there are many ways to tackle something like this, some of which are in reach and many (I suspect) are out of reach.

In essence it appears to be theory of runs problem -- what is the probability that the maximal run size is ##\leq 2## (i.e. event : no runs at all aka the degenerate run length of 1, union event: run of size 2) or equivalently, the probability that there is never a run of length 3. Markov Chain and poisson approximations comes to mind. There's going to be an ordinary generating function approach as well. Many other possibilities... The problem is small enough that I think a direct calculation is doable without too much trouble as well --- this requires inclusion-exclusion and carefully working through events.

edit:
my vote is for markov chains -- it boils down a 2x2 matrix... which is very easy to work with.
my second favorite here would be to set this up as a linear recurrence.
 
Last edited:
I picked up this problem from the Schaum's series book titled "College Mathematics" by Ayres/Schmidt. It is a solved problem in the book. But what surprised me was that the solution to this problem was given in one line without any explanation. I could, therefore, not understand how the given one-line solution was reached. The one-line solution in the book says: The equation is ##x \cos{\omega} +y \sin{\omega} - 5 = 0##, ##\omega## being the parameter. From my side, the only thing I could...
Back
Top