Probability of some sequence in a list of numbers

In summary: The conversation discusses the probability of finding a given sequence of numbers in a longer sequence of random numbers. This probability depends on the specific sequence and can be calculated using Markov chains. For simpler cases, an approximation using independent sequences may suffice, but for more complex patterns this may not be accurate. The conversation also touches on roulette, the myth of doubling bets on a single color, and the probability of a long series of the same color occurring. There is no easy formula for
  • #1
Vrbic
Hello,
I would like to derive the length of a list of random numbers in which I can find a particular short sequence of numbers with some given probability.

To be clear, here is an example: I have two generators of (pseudo)random numbers with the same range, say 1 to k. The first generator gives a random sequence of n numbers from 1 to k, and numbers may repeat. The second does the same but gives a sequence of m numbers, with m>n. What do m and n have to be if I want to find the n-sequence anywhere in the m-sequence with probability p?
 
  • #2
Vrbic said:
Hello,
I would like to derive the length of a list of random numbers in which I can find a particular short sequence of numbers with some given probability.

To be clear, here is an example: I have two generators of (pseudo)random numbers with the same range, say 1 to k. The first generator gives a random sequence of n numbers from 1 to k, and numbers may repeat. The second does the same but gives a sequence of m numbers, with m>n. What do m and n have to be if I want to find the n-sequence anywhere in the m-sequence with probability p?
OK, my opinion is:
The long sequence of length m contains ##m-n+1## subsequences of length n. The number of possible sequences of length n is ##k^n##. So the probability is ##p=\frac{m-n+1}{k^n}##.
Is it right?
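
One way to read the formula in post #2, offered as a sketch rather than as something stated in the thread: let ##X_i## be the indicator that the n-sequence matches the m-sequence starting at position ##i## (with ##i = 1, \dots, m-n+1##), and let ##N = \sum_i X_i## be the number of matches. The symbols ##X_i## and ##N## are introduced here for illustration only. By linearity of expectation,

$$E[N] = \sum_{i=1}^{m-n+1} P(X_i = 1) = \frac{m-n+1}{k^n}, \qquad P(N \ge 1) \le E[N].$$

So the expression is the expected number of matches and an upper bound on the probability of at least one match. It is close to that probability only when it is much smaller than 1, and it exceeds 1 once ##m-n+1 > k^n##.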
 
  • #3
You need to explain more. For instance, you should say if you are talking about integers in some range or about real numbers. And what type of "special sequence" are you talking about? -- exact values, increasing numbers, etc. etc.
 
  • #4
FactChecker said:
You need to explain more. For instance, you should say if you are talking about integers in some range or about real numbers. And what type of "special sequence" are you talking about? -- exact values, increasing numbers, etc. etc.
Sorry, I'm talking about integers. And both sequences are random sequences of numbers from range (1-k). The word special was wrong.
 
  • #5
Vrbic said:
Sorry, I'm talking about integers. And both sequences are random sequences of numbers from range (1-k). The word special was wrong.
Sorry, I see that that was probably clear in your original post. I misread it.
 
  • #6
FactChecker said:
Sorry, I see that that was probably clear in your original post. I misread it.
Never mind. What do you think about my formula from post #2?
 
  • #7
Vrbic said:
Never mind. What do you think about my formula from post #2?
I don't know the answer. And I'm still not completely clear on the question. Are you starting with a given sequence that needs to be matched or are you asking about any repeats of a certain length? Either way, I am not sure that I can answer your questions. I'll have to leave that to people who are better at it.
 
  • #8
It is complicated.

Let's start with an easier question: For a given sequence of n integers from 1 to k, what is the probability that this sequence appears in a sequence of m random integers from 1 to k?
It turns out that this probability depends on the sequence. As a simple example, consider k=2, m=3 and compare the sequences "11" and "12". For the string of length 3, there are 8 options (111, 112, 121, 122, 211, 212, 221, 222); three of them contain "11" but four of them contain "12". That does not mean "11" occurs less often on average: both have the same expected number of occurrences, but "11" can appear twice in the same string (in 111) while "12" cannot.
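
The count for two symbols (k=2) and m=3 can be checked by brute force. The following is a minimal sketch; the function name `count_containing` is made up for this illustration, and the approach only works for small m and k because it enumerates all ##k^m## strings.

```python
from itertools import product

def count_containing(pattern, m, k):
    """Count the length-m strings over 1..k that contain `pattern` as a contiguous block."""
    pattern = tuple(pattern)
    n = len(pattern)
    hits = 0
    for s in product(range(1, k + 1), repeat=m):
        # check every possible starting position of the pattern inside s
        if any(s[i:i + n] == pattern for i in range(m - n + 1)):
            hits += 1
    return hits

print(count_containing((1, 1), m=3, k=2))  # 3  (111, 112, 211)
print(count_containing((1, 2), m=3, k=2))  # 4  (112, 121, 122, 212)
```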

For a given sequence, you can calculate the probability that it appears with a Markov chain. There is no nice general formula, and you have to do this for every type of pattern a sequence of n integers can form.
If n=1 or k is large, treating the m-n+1 subsequences of length n as independent gives a reasonable approximation in most cases. The cases where it fails are long repetitions, which are unlikely for large k anyway (especially for large n).
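
As an illustration of the Markov chain approach, here is a minimal sketch, assuming independent uniform draws from 1 to k. The state is the length of the longest prefix of the pattern that matches the end of the numbers drawn so far; reaching length n means the pattern has been found. The names `prob_contains` and `approx_independent` are hypothetical, not taken from the thread.

```python
def prob_contains(pattern, m, k):
    """Exact probability that `pattern` occurs at least once (contiguously)
    in m independent uniform draws from 1..k."""
    pattern = tuple(pattern)
    n = len(pattern)

    def next_state(j, c):
        # longest prefix of the pattern that is a suffix of pattern[:j] + (c,)
        s = pattern[:j] + (c,)
        for length in range(len(s), 0, -1):
            if s[-length:] == pattern[:length]:
                return length
        return 0

    trans = [[next_state(j, c) for c in range(1, k + 1)] for j in range(n)]

    dist = [0.0] * n   # dist[j]: probability of state j, pattern not yet found
    dist[0] = 1.0
    found = 0.0        # probability that the pattern has already occurred
    for _ in range(m):
        new = [0.0] * n
        for j, p in enumerate(dist):
            for t in trans[j]:
                if t == n:
                    found += p / k
                else:
                    new[t] += p / k
        dist = new
    return found

def approx_independent(n, m, k):
    """Approximation that treats the m-n+1 windows of length n as independent."""
    return 1 - (1 - k ** -n) ** (m - n + 1)

print(prob_contains((1, 1), m=3, k=2))    # 0.375
print(prob_contains((1, 2), m=3, k=2))    # 0.5
print(approx_independent(n=2, m=3, k=2))  # 0.4375
```

The two exact values agree with the counts 3/8 and 4/8 from the example, while the independence approximation gives the same number for both patterns. The cost is roughly m·n·k operations after the transition table is built.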
 
  • #9
mfb said:
It is complicated.

Let's start with an easier question: For a given sequence of n integers from 1 to k, what is the probability that this sequence appears in a sequence of m random integers from 1 to k?
It turns out that this probability depends on the sequence. As a simple example, consider k=2, m=3 and compare the sequences "11" and "12". For the string of length 3, there are 8 options (111, 112, 121, 122, 211, 212, 221, 222); three of them contain "11" but four of them contain "12". That does not mean "11" occurs less often on average: both have the same expected number of occurrences, but "11" can appear twice in the same string (in 111) while "12" cannot.

For a given sequence, you can calculate the probability that it appears with a Markov chain. There is no nice general formula, and you have to do this for every type of pattern a sequence of n integers can form.
If n=1 or k is large, treating the m-n+1 subsequences of length n as independent gives a reasonable approximation in most cases. The cases where it fails are long repetitions, which are unlikely for large k anyway (especially for large n).
Here is what is behind this:
Roulette, and the well-known myth about winning by betting on one color and doubling the bet. You probably know it: you keep betting on one color (black/red), and when you lose you double your bet until you win.
So the question is: what is the probability of a run of, say, 6-10 of the same color in a row?
May I use my formula for that?
 
  • #10
Neglecting 0 we have the case k=2, where your formula shows large deviations.
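
To make the size of the deviation concrete, here is a minimal sketch for the roulette case, assuming fair red/black spins with 0 neglected and looking for a run of r reds. The names `prob_run_of_red` and `naive_formula` are made up for this illustration; `naive_formula` is the expression from post #2.

```python
def prob_run_of_red(r, m):
    """Probability of at least one run of r reds in m fair red/black spins."""
    dist = [0.0] * r   # dist[j]: current red streak has length j, no run of r yet
    dist[0] = 1.0
    hit = 0.0          # probability that a run of r reds has already occurred
    for _ in range(m):
        new = [0.0] * r
        for j, p in enumerate(dist):
            new[0] += p / 2          # black resets the streak
            if j + 1 == r:
                hit += p / 2         # red completes the run
            else:
                new[j + 1] += p / 2  # red extends the streak
        dist = new
    return hit

def naive_formula(r, m, k=2):
    """The expression (m - n + 1) / k^n from post #2, with n = r."""
    return (m - r + 1) / k ** r

for m in (10, 100, 1000):
    print(m, prob_run_of_red(6, m), naive_formula(6, m))
```

For m=100 and r=6 the expression from post #2 is already 95/64, which is larger than 1, while the Markov chain result stays a valid probability.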
 
  • #11
mfb said:
Neglecting 0 we have the case k=2, where your formula shows large deviations.
OK, so there is no easy formula to describe such a situation?
 
  • #12
Not that I am aware of, but calculating it with Markov chains is fast. You can probably find some approximation for most cases, where only some special cases (like long series of the same color) lead to some deviations, but these special cases are rare.
 
  • #13
mfb said:
Not that I am aware of, but calculating it with Markov chains is fast. You can probably find some approximation for most cases, where only some special cases (like long series of the same color) lead to some deviations, but these special cases are rare.
OK, thank you. Do you know of a good article or some text about Markov chains?
 
  • #14
The Wikipedia article should give a good introduction, and there are various textbooks with more details.
 

1. What is probability?

Probability is a measure of the likelihood that a certain event will occur. It is expressed as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty.

2. How do you calculate the probability of a sequence in a list of numbers?

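For a fixed position in a list of random integers from 1 to k, the probability that a specific sequence of n numbers starts there is ##1/k^n##, and a list of m numbers has ##m-n+1## possible starting positions. The probability that the sequence appears at least once anywhere in the list depends on the sequence itself and can be calculated exactly with a Markov chain. Treating the positions as independent gives the approximation ##1-(1-1/k^n)^{m-n+1}##, which is reasonable when k is large.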

3. What factors affect the probability of a sequence in a list of numbers?

The probability of a sequence in a list of numbers is affected by the length n of the sequence, the length m of the list, the number k of possible values, and the pattern of the sequence itself (for example, whether it consists of long repetitions). A longer sequence or a shorter list results in a lower probability.

4. How can probability be used in real life?

Probability is used in a variety of fields, such as statistics, economics, and gambling. In real life, it can be used to make predictions and informed decisions, assess risk, and understand the likelihood of certain events occurring.

5. What is the relationship between probability and statistics?

Probability and statistics are closely related, as probability is the theoretical foundation of statistics. Probability is used to describe the likelihood of an event occurring, while statistics is used to analyze and interpret data in order to make predictions and draw conclusions about a population based on a sample.
