How can you calculate the longest streaks of successes in any number of trials?

KenNKC · Aug 20, 2012

Let's say you have 5 successes in 7 trials. There are a total of 21 different ways to arrange them. How many times will the longest streak of successes be 5, 4, 3, and 2? I figured this out by brute force, but wondered if there was a formula to calculate it, one that could be used for any x successes in y trials. The answers for the example are:

5 in a row: 3
4 in a row: 6
3 in a row: 9
2 in a row: 3

Thanks for any help.

Ken

awkward · Aug 20, 2012

It's fairly complicated. See
http://mathworld.wolfram.com/Run.html

KenNKC · Aug 21, 2012

Thank you for the link! I am trying to understand the formulas and adapt them for my purposes.

techmologist · Aug 22, 2012

For small numbers (say less than 10), it is probably easiest to do it recursively. Even by hand. For somewhat larger numbers, you would need a computer. Take a look at this thread:

https://www.physicsforums.com/showthread.php?t=405379

In it the original poster discovers several recursive solutions to a problem that is similar to yours, but slightly different. That might give you some ideas.

KenNKC · Aug 24, 2012

Thank you, I appreciate the helpful reply. Looking at that older thread, I believe this is going to be too involved for me to tackle. I want to apply this to a baseball season where a team goes 100-62 (but the W-L sequence is unknown), then calculate the probabilities of maximum winning streaks of various lengths.

If we change the problem to independent trials with the probability of winning each game at 100/162, there is a great site that calculates this for you:

http://www.pulcinientertainment.com/info/Streak-Calculator-enter.html

There are even Excel sheets to download in the site that show in detail how the results are determined.

techmologist · Aug 24, 2012

That is a handy website. If you are still interested in doing the problem as you originally stated it (where the number of successes and the number of trials are given), it is doable. If you are comfortable with counting techniques and binomial coefficients, I believe it is even possible to obtain a formula (kinda nasty looking...but still manageable) in terms of those two given quantities. For fairly large values like 162, it may be easier to have a computer evaluate the formula than to have it solve the problem recursively. If you want to pursue this, I'll try to help.

EDIT:
I'll go ahead and post the formula I have in mind, and you can decide if it's something you want to look into. Here n is the number of trials, s the number of successes, and R the longest run or streak of successes. Then the number of ways of getting a run of at least r is

[tex]N(R \geq r) = \sum_{b=1}^{B} \binom{n-s+1}{b}\sum_{j=1}^{J}(-1)^{j+1}\binom{b}{j}\binom{s-j(r-1)-1}{b-1}[/tex]

here J = min{b, integer part of s/r}

and

B = min{s, n-s-1}

To find the number of ways N(R=r) where the longest streak is exactly r, you can subtract N(R >= r+1) from N(R >= r).

KenNKC · Aug 25, 2012

Thanks for the formula! I minored in statistics in college many many years ago, so should be able to make sense of it. I can get estimates of some of these values via simulation, so that will help in checking against what I get from the formula.

techmologist · Aug 25, 2012

You're welcome. I hope it's correct. I posted it soon after I was convinced that the argument leading to it made sense, but I didn't try it out on any numbers.

KenNKC · Aug 31, 2012

I ran 1000 seasons of a baseball team going exactly 100-62, recording the longest winning streak in each. This would be a with replacement experiment. In the table below, the first column lists the maximum streak lengths. The second column lists the actual probabilities of at least that long of a streak for without replacement (probability of winning each game exactly 100/162), which I got from the site linked in post 5. The third column lists the percentages from the 1000 trials. As you would expect, without replacement has a better chance for longer streaks, but for the shorter ones, the results are almost identical. Overall, these numbers are a lot closer than I thought they would be.

Code:

	Random	Specific
Streak	Pct	Pct
4	100.0	100.0
5	99.9	99.9
6	98.0	98.0
7	89.9	89.3
8	74.3	73.1
9	55.7	52.5
10	38.8	37.2
11	25.8	24.8
12	16.6	14.4
13	10.5	8.7
14	6.5	5.8
15	4.1	4.0
16	2.5	2.0
17	1.5	1.0
18	0.9	0.6
19	0.6	0.5

techmologist · Sep 2, 2012

Interesting. For long streaks, which are infrequent, it looks like you can treat it as a Poisson process. If p is the probability of success and q = 1-p, then the expected number of streaks of length at least r in a sequence of n trials is given by

[tex]\lambda = p^r + (n-r)qp^r \approx (n-r)qp^r[/tex]

Then the probability of at least one streak of length r or more is given by

[tex]P(R \geq r) \approx 1-e^{-\lambda}[/tex]

Using p = 100/162, I tried it for r=10 and r = 15 and got answers that agree well with what you got by simulation.

techmologist · Sep 4, 2012

here are the values I get with the Poisson approximation (which asssumes independent trials with probability of success p = 100/162 each time):

Code:

Streak 
length

5       .996
6       .965
7       .873
8       .717
9       .539
10      .378
11      .253
12      .164
13      .104
14      .065
15      .040
16      .025
17      .015
18      .009
19      .006

KenNKC · Sep 6, 2012

I mistakenly thought the Poisson formula was for dependent trials; for independent ones, the results do agree quite closely for streaks of 11 or more. See second column of the chart in post 9 . It appears there is no easy way to calculate or estimate streak probabilities for dependent trials such as the 100-62 baseball season.

techmologist · Sep 6, 2012

Yeah, the Poisson approximation works quite well as long as trials are only weakly dependent. For your problem, I think this condition holds for streaks of moderate length, say 10 to 15 or 20. Shorter streaks happen too frequently, and so the occurrence of a streak of length 5 starting on the 10th trial eliminates the possibility of another streak of length 5 starting on the 14th trial, for example (it can't be the start of one because you are already in the middle of one). On the other hand, since we are given the total number of successes, 100, a very long streak like 30 or more means that successes are rare from there on out. So again, the trials are too strongly dependent for the Poisson approximation to apply. Still, I am surprised that the values you gave for dependent and independent trials are as close as they are over such a large range of streak lengths. Are you primarily interested in very long streaks?

EDIT:

Also, wouldn't independent trials be a better assumption for a season of baseball games, anyway? Assuming that the team was drawing from a bag with a predetermined number of "win" and "lose" marbles in it isn't all that realistic. I'm just curious to know if you are primarily interested in the mathematics, which is pretty interesting itself, or if you are more interested in an application (like wagering?).

How can you calculate the longest streaks of successes in any number of trials?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Who May Find This Useful

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad The problem of points

Graduate Expected numbers of cards of a last color remaining

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect