Probability of some sequence in a list of numbers

Click For Summary

Discussion Overview

The discussion revolves around the probability of finding a specific sequence of integers within a larger sequence of random integers generated from a defined range. Participants explore the mathematical formulation of this probability, the conditions under which it applies, and the implications of different types of sequences.

Discussion Character

  • Exploratory
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant proposes a formula for the probability of finding a sequence of length n within a sequence of length m, suggesting that the probability p can be expressed as p = (m-n+1) / (k^n).
  • Another participant requests clarification on the nature of the sequences, questioning whether they are integers or real numbers and what constitutes a "special sequence."
  • A participant notes that the probability of a sequence appearing can depend on the specific sequence itself, providing examples to illustrate this point.
  • There is mention of using Markov chains to calculate the probability of a sequence appearing, with acknowledgment that there is no simple general formula applicable to all cases.
  • Concerns are raised about the limitations of the proposed formula, particularly in cases of long repetitions or specific patterns.
  • Participants discuss the implications of a betting strategy related to sequences in roulette, questioning the applicability of the earlier formula in that context.
  • One participant suggests that while approximations may exist for most cases, special cases can lead to significant deviations.
  • There is a request for resources on Markov chains, with a suggestion to refer to Wikipedia and textbooks for further reading.

Areas of Agreement / Disagreement

Participants express uncertainty about the validity of the proposed formula and the conditions under which it applies. There is no consensus on a definitive answer or a universally applicable formula, and multiple perspectives on the problem remain present.

Contextual Notes

The discussion highlights the complexity of calculating probabilities related to sequences, with various assumptions and conditions that may affect the outcomes. Specific cases, such as long sequences of identical numbers, are noted as potential exceptions to general approximations.

Vrbic
Messages
400
Reaction score
18
Hello,
would like to derive a length of list of random numbers in which I may find some special sequence of few numbers with some probability.

For clearness I give an example: I have two generator of (pseudo) random numbers with same range of numbers, let's say (1-k). First generator give a random sequence of n numbers from range (1-k) and numbers may repiet. The second do the same but sequence of m numbers and m>n. What has to be m and n if I would like to find n in m (anywhere) with probability p?
 
Physics news on Phys.org
Vrbic said:
Hello,
would like to derive a length of list of random numbers in which I may find some special sequence of few numbers with some probability.

For clearness I give an example: I have two generator of (pseudo) random numbers with same range of numbers, let's say (1-k). First generator give a random sequence of n numbers from range (1-k) and numbers may repiet. The second do the same but sequence of m numbers and m>n. What has to be m and n if I would like to find n in m (anywhere) with probability p?
Ok my opinion is:
in long sequence "m" is ##m-n+1## of "n" sequences. A number of combinations of small sequence "n" is ##k^n##. So probability than is ##p=\frac{m-n+1}{k^n}##
Is it right?
 
You need to explain more. For instance, you should say if you are talking about integers in some range or about real numbers. And what type of "special sequence" are you talking about? -- exact values, increasing numbers, etc. etc.
 
FactChecker said:
You need to explain more. For instance, you should say if you are talking about integers in some range or about real numbers. And what type of "special sequence" are you talking about? -- exact values, increasing numbers, etc. etc.
Sorry, I'm talking about integers. And both sequences are random sequences of numbers from range (1-k). The word special was wrong.
 
Vrbic said:
Sorry, I'm talking about integers. And both sequences are random sequences of numbers from range (1-k). The word special was wrong.
Sorry, I see that that was probably clear in your original post. I misread it.
 
FactChecker said:
Sorry, I see that that was probably clear in your original post. I misread it.
Nevermind, and what do you mean about my formula from post #2?
 
Vrbic said:
Nevermind, and what do you mean about my formula from post #2?
I don't know the answer. And I'm still not completely clear on the question. Are you starting with a given sequence that needs to be matched or are you asking about any repeats of a certain length? Either way, I am not sure that I can answer your questions. I'll have to leave that to people who are better at it.
 
It is complicated.

Let's start with an easier question: For a given sequence of n integers from 1 to k, what is the probability that this sequence appears in a sequence of m random integers from 1 to k?
It turns out that this probability depends on the sequence. As a simple example, consider k=1, m=3 and compare the sequences "11" and "12". For the string of length 3, there are 8 options (111, 112, 121, 122, 211, 212, 221, 222), three of them contain "11" but four of them contain "12". That does not mean "11" would be less likely to appear, but it appears twice in the same string (in 111) while "12" does not.

For a given sequence, you can calculate the probability that it appears with a Markov chain. There is no nice general formula, and you have to do this for every type of pattern a sequence of n integers can form.
If n=1 or k is large, the approximation of m-n+1 independent sequences of length n will give a reasonable approximation in most cases, and the cases where it fails are long repetitions, they are unlikely for large k anyway (especially for large n).
 
  • Like
Likes   Reactions: FactChecker
mfb said:
It is complicated.

Let's start with an easier question: For a given sequence of n integers from 1 to k, what is the probability that this sequence appears in a sequence of m random integers from 1 to k?
It turns out that this probability depends on the sequence. As a simple example, consider k=1, m=3 and compare the sequences "11" and "12". For the string of length 3, there are 8 options (111, 112, 121, 122, 211, 212, 221, 222), three of them contain "11" but four of them contain "12". That does not mean "11" would be less likely to appear, but it appears twice in the same string (in 111) while "12" does not.

For a given sequence, you can calculate the probability that it appears with a Markov chain. There is no nice general formula, and you have to do this for every type of pattern a sequence of n integers can form.
If n=1 or k is large, the approximation of m-n+1 independent sequences of length n will give a reasonable approximation in most cases, and the cases where it fails are long repetitions, they are unlikely for large k anyway (especially for large n).
What is behind:
Roulette and very well known myth about winning due a betting on one color and doubling of a bet. Probably you know it. You bet still one color (black/red) and when you lose you double your bet until you win.
So question is what is probability of sequence of let's say 6-10 same color in line?
May I use my formula for that?
 
  • #10
Neglecting 0 we have the case k=2, where your formula shows large deviations.
 
  • #11
mfb said:
Neglecting 0 we have the case k=2, where your formula shows large deviations.
Ok, so there doesn't exist some easy formula how to describe such situation?
 
  • #12
Not that I am aware of, but calculating it with Markov chains is fast. You can probably find some approximation for most cases, where only some special cases (like long series of the same color) lead to some deviations, but these special cases are rare.
 
  • #13
mfb said:
Not that I am aware of, but calculating it with Markov chains is fast. You can probably find some approximation for most cases, where only some special cases (like long series of the same color) lead to some deviations, but these special cases are rare.
Ok, thank you. And do you know about good article or some text about Markov chains?
 
  • #14
The Wikipedia article should give a good introduction, and there are various textbooks with more details.
 

Similar threads

  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 29 ·
Replies
29
Views
6K
  • · Replies 9 ·
Replies
9
Views
3K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 27 ·
Replies
27
Views
3K