Undergrad Probability of some sequence in a list of numbers

Click For Summary
The discussion centers on determining the probability of finding a specific sequence of integers within a larger sequence of random integers. It explores the relationship between the lengths of the sequences (n and m) and the range of integers (1-k), proposing a formula for calculating probability. The complexity of the problem is highlighted, noting that the probability can vary based on the specific sequence being examined and may require Markov chain analysis for accurate calculation. Additionally, the conversation touches on the challenges of applying simple formulas to scenarios like roulette betting, where long sequences of the same outcome can skew results. Overall, the topic emphasizes the intricacies of probability in random sequences and the need for careful analysis.
Vrbic
Messages
400
Reaction score
18
Hello,
would like to derive a length of list of random numbers in which I may find some special sequence of few numbers with some probability.

For clearness I give an example: I have two generator of (pseudo) random numbers with same range of numbers, let's say (1-k). First generator give a random sequence of n numbers from range (1-k) and numbers may repiet. The second do the same but sequence of m numbers and m>n. What has to be m and n if I would like to find n in m (anywhere) with probability p?
 
Mathematics news on Phys.org
Vrbic said:
Hello,
would like to derive a length of list of random numbers in which I may find some special sequence of few numbers with some probability.

For clearness I give an example: I have two generator of (pseudo) random numbers with same range of numbers, let's say (1-k). First generator give a random sequence of n numbers from range (1-k) and numbers may repiet. The second do the same but sequence of m numbers and m>n. What has to be m and n if I would like to find n in m (anywhere) with probability p?
Ok my opinion is:
in long sequence "m" is ##m-n+1## of "n" sequences. A number of combinations of small sequence "n" is ##k^n##. So probability than is ##p=\frac{m-n+1}{k^n}##
Is it right?
 
You need to explain more. For instance, you should say if you are talking about integers in some range or about real numbers. And what type of "special sequence" are you talking about? -- exact values, increasing numbers, etc. etc.
 
FactChecker said:
You need to explain more. For instance, you should say if you are talking about integers in some range or about real numbers. And what type of "special sequence" are you talking about? -- exact values, increasing numbers, etc. etc.
Sorry, I'm talking about integers. And both sequences are random sequences of numbers from range (1-k). The word special was wrong.
 
Vrbic said:
Sorry, I'm talking about integers. And both sequences are random sequences of numbers from range (1-k). The word special was wrong.
Sorry, I see that that was probably clear in your original post. I misread it.
 
FactChecker said:
Sorry, I see that that was probably clear in your original post. I misread it.
Nevermind, and what do you mean about my formula from post #2?
 
Vrbic said:
Nevermind, and what do you mean about my formula from post #2?
I don't know the answer. And I'm still not completely clear on the question. Are you starting with a given sequence that needs to be matched or are you asking about any repeats of a certain length? Either way, I am not sure that I can answer your questions. I'll have to leave that to people who are better at it.
 
It is complicated.

Let's start with an easier question: For a given sequence of n integers from 1 to k, what is the probability that this sequence appears in a sequence of m random integers from 1 to k?
It turns out that this probability depends on the sequence. As a simple example, consider k=1, m=3 and compare the sequences "11" and "12". For the string of length 3, there are 8 options (111, 112, 121, 122, 211, 212, 221, 222), three of them contain "11" but four of them contain "12". That does not mean "11" would be less likely to appear, but it appears twice in the same string (in 111) while "12" does not.

For a given sequence, you can calculate the probability that it appears with a Markov chain. There is no nice general formula, and you have to do this for every type of pattern a sequence of n integers can form.
If n=1 or k is large, the approximation of m-n+1 independent sequences of length n will give a reasonable approximation in most cases, and the cases where it fails are long repetitions, they are unlikely for large k anyway (especially for large n).
 
  • Like
Likes FactChecker
mfb said:
It is complicated.

Let's start with an easier question: For a given sequence of n integers from 1 to k, what is the probability that this sequence appears in a sequence of m random integers from 1 to k?
It turns out that this probability depends on the sequence. As a simple example, consider k=1, m=3 and compare the sequences "11" and "12". For the string of length 3, there are 8 options (111, 112, 121, 122, 211, 212, 221, 222), three of them contain "11" but four of them contain "12". That does not mean "11" would be less likely to appear, but it appears twice in the same string (in 111) while "12" does not.

For a given sequence, you can calculate the probability that it appears with a Markov chain. There is no nice general formula, and you have to do this for every type of pattern a sequence of n integers can form.
If n=1 or k is large, the approximation of m-n+1 independent sequences of length n will give a reasonable approximation in most cases, and the cases where it fails are long repetitions, they are unlikely for large k anyway (especially for large n).
What is behind:
Roulette and very well known myth about winning due a betting on one color and doubling of a bet. Probably you know it. You bet still one color (black/red) and when you lose you double your bet until you win.
So question is what is probability of sequence of let's say 6-10 same color in line?
May I use my formula for that?
 
  • #10
Neglecting 0 we have the case k=2, where your formula shows large deviations.
 
  • #11
mfb said:
Neglecting 0 we have the case k=2, where your formula shows large deviations.
Ok, so there doesn't exist some easy formula how to describe such situation?
 
  • #12
Not that I am aware of, but calculating it with Markov chains is fast. You can probably find some approximation for most cases, where only some special cases (like long series of the same color) lead to some deviations, but these special cases are rare.
 
  • #13
mfb said:
Not that I am aware of, but calculating it with Markov chains is fast. You can probably find some approximation for most cases, where only some special cases (like long series of the same color) lead to some deviations, but these special cases are rare.
Ok, thank you. And do you know about good article or some text about Markov chains?
 
  • #14
The Wikipedia article should give a good introduction, and there are various textbooks with more details.
 

Similar threads

  • · Replies 9 ·
Replies
9
Views
2K
  • · Replies 3 ·
Replies
3
Views
764
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 2 ·
Replies
2
Views
1K
  • · Replies 13 ·
Replies
13
Views
9K
  • · Replies 5 ·
Replies
5
Views
3K
  • · Replies 3 ·
Replies
3
Views
2K