How Do You Calculate the Probability of Repeated Letters in Random Strings?

  • Thread starter Thread starter Robin04
  • Start date Start date
  • Tags Tags
    Probability
Click For Summary
SUMMARY

The discussion focuses on calculating the probability of repeated letters in random strings generated from a four-letter alphabet. The probability function, denoted as ##p(N)##, is defined for strings of length ##N##, specifically under the condition that ##1-p(N) \leq 10^{-10}##. Key methods for solving this problem include the theory of runs, Markov Chains, and generating functions. The participants emphasize the importance of relevant equations and suggest that a direct calculation using inclusion-exclusion principles is feasible.

PREREQUISITES
  • Understanding of probability theory, particularly the theory of runs.
  • Familiarity with Markov Chains and their applications in probability.
  • Knowledge of generating functions and their use in combinatorial problems.
  • Experience with inclusion-exclusion principles in probability calculations.
NEXT STEPS
  • Research the theory of runs in probability to understand its application in string analysis.
  • Learn about Markov Chain models and how to construct transition matrices for probability problems.
  • Explore generating functions and their role in solving combinatorial probability problems.
  • Study inclusion-exclusion principles in depth to apply them effectively in probability calculations.
USEFUL FOR

Mathematicians, statisticians, and computer scientists interested in probability theory, particularly those working on string analysis and combinatorial problems.

Robin04
Messages
259
Reaction score
16
Homework Statement
We try to send a message in a very noisy environment. The messages are coded with a four-letter alphabet. By convention, only the messages without lonely letters are considered to be meaningful. A letter is lonely if the next and previous letters are different (the first letter is lonely if the next is different and the last letter is lonely if the previous is different). At least how long does the message have to be if we want to have the chance of having a meaningful message to be less then ##10^{-10}## because of the noise? The noise changes the letters to other letters in the alphabet. Every letter appears independently with an equal chance.
Relevant Equations
-
The phrasing of the problem is a bit messy but here's how I understood it so far:
We make random character strings out of a four-letter alphabet. Every letter appears independently with an equal chance. The chance of having at least two identical letters next to each other in a string of length ##N## is ##p(N)##. We are looking for ##N## given that ##1-p(N) \leq 10^{-10}##
So the key is to find ##p(N)##. Can you help me a bit with that?
 
Physics news on Phys.org
Interesting...

I think your take on how to interpret the problem is helpful and I'd run with it -- the problem statement itself is a bit ambiguous so your interpretation seems to fill gaps. I presume this loneliness tag doesn't apply to the first and last letters -- i.e. it applies to interrior letters only.

What is missing is relevant equations-- there are many ways to tackle something like this, some of which are in reach and many (I suspect) are out of reach.

In essence it appears to be theory of runs problem -- what is the probability that the maximal run size is ##\leq 2## (i.e. event : no runs at all aka the degenerate run length of 1, union event: run of size 2) or equivalently, the probability that there is never a run of length 3. Markov Chain and poisson approximations comes to mind. There's going to be an ordinary generating function approach as well. Many other possibilities... The problem is small enough that I think a direct calculation is doable without too much trouble as well --- this requires inclusion-exclusion and carefully working through events.

edit:
my vote is for markov chains -- it boils down a 2x2 matrix... which is very easy to work with.
my second favorite here would be to set this up as a linear recurrence.
 
Last edited:

Similar threads

  • · Replies 16 ·
Replies
16
Views
4K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 6 ·
Replies
6
Views
2K
Replies
2
Views
1K
  • · Replies 10 ·
Replies
10
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 13 ·
Replies
13
Views
6K
Replies
6
Views
3K