How can you calculate the longest streaks of successes in any number of trials?

  • Context: Undergrad 
  • Thread starter: KenNKC

Discussion Overview

The discussion revolves around calculating the longest streaks of successes in a series of trials, specifically in the context of a baseball season with a known win-loss record. Participants explore various methods for determining the probabilities of achieving streaks of different lengths, including recursive approaches and simulations, while also considering the implications of independent versus dependent trials.

Discussion Character

  • Exploratory
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a specific case of 5 successes in 7 trials and seeks a formula for calculating the longest streaks of successes.
  • Another participant suggests that for small numbers of trials, recursive methods may be the easiest approach, while larger numbers may require computational assistance.
  • A participant shares a formula for calculating the number of ways to achieve a streak of successes, noting its complexity and the need for binomial coefficients.
  • Simulation results from a participant indicate that the probabilities of maximum winning streaks for a baseball team align closely with theoretical estimates, particularly for shorter streaks.
  • Discussion includes a Poisson approximation for estimating the probability of streaks, with some participants noting its applicability under certain conditions.
  • Concerns are raised about the assumptions of independence in trials, particularly in the context of a baseball season, where outcomes may not be truly independent.

Areas of Agreement / Disagreement

Participants express varying opinions on the best methods for calculating streak probabilities, with some favoring recursive approaches and others leaning towards simulation and Poisson approximations. There is no consensus on the most effective method, and the discussion remains open-ended regarding the implications of dependent versus independent trials.

Contextual Notes

Participants acknowledge limitations in their approaches, particularly regarding the assumptions of independence in trials and the complexity of the calculations involved. The discussion highlights the challenges of applying theoretical models to real-world scenarios.

Who May Find This Useful

Readers interested in probability theory, statistical modeling, and applications in sports analytics may find the discussion relevant.

KenNKC
Let's say you have 5 successes in 7 trials. There are a total of 21 different ways to arrange them. How many times will the longest streak of successes be 5, 4, 3, and 2? I figured this out by brute force, but wondered if there was a formula to calculate it, one that could be used for any x successes in y trials. The answers for the example are:

5 in a row: 3
4 in a row: 6
3 in a row: 9
2 in a row: 3

Thanks for any help.

Ken
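The brute-force count described above can be sketched in a few lines of Python. This is only an illustration, not necessarily how the numbers were originally obtained; the function names `longest_run` and `streak_counts` are made up for this example.

```python
from collections import Counter
from itertools import combinations


def longest_run(bits):
    """Length of the longest run of 1s in a sequence of 0/1 values."""
    best = current = 0
    for b in bits:
        current = current + 1 if b else 0
        best = max(best, current)
    return best


def streak_counts(n, s):
    """Tally the arrangements of s successes in n trials by longest streak."""
    counts = Counter()
    # Every arrangement corresponds to a choice of s success positions.
    for positions in combinations(range(n), s):
        chosen = set(positions)
        bits = [1 if i in chosen else 0 for i in range(n)]
        counts[longest_run(bits)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Maps longest streak -> number of arrangements for 5 successes in 7 trials.
    print(streak_counts(7, 5))
```

For 5 successes in 7 trials this reproduces the counts 3, 6, 9 and 3 for longest streaks of 5, 4, 3 and 2. Enumeration is only practical for small cases, since the number of arrangements grows as a binomial coefficient.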
 
Thank you for the link! I am trying to understand the formulas and adapt them for my purposes.
 
For small numbers (say less than 10), it is probably easiest to do it recursively. Even by hand. For somewhat larger numbers, you would need a computer. Take a look at this thread:

https://www.physicsforums.com/showthread.php?t=405379

In it the original poster discovers several recursive solutions to a problem that is similar to yours, but slightly different. That might give you some ideas.
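One possible recursion for the problem as stated here (fixed numbers of successes and failures) is sketched below. It is not taken from the linked thread; it is just one way to set the recursion up, and the names `at_most` and `exactly` are hypothetical. The idea is to condition on the length of the leading run of successes, which must be followed by a failure unless no failures remain.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def at_most(s, f, m):
    """Arrangements of s successes and f failures with every success run <= m."""
    if f == 0:
        # Only successes remain, so they form a single run of length s.
        return 1 if s <= m else 0
    # The sequence starts with k successes (0 <= k <= m) and then one failure.
    return sum(at_most(s - k, f - 1, m) for k in range(min(s, m) + 1))


def exactly(n, s, r):
    """Arrangements of s successes in n trials whose longest run is exactly r."""
    f = n - s
    return at_most(s, f, r) - at_most(s, f, r - 1)


if __name__ == "__main__":
    for r in (5, 4, 3, 2):
        print(r, exactly(7, 5, r))
```

With memoization the recursion touches at most about s × f states for each run limit m, so even the 100-62 season is well within reach of a computer.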
 
Thank you, I appreciate the helpful reply. Looking at that older thread, I believe this is going to be too involved for me to tackle. I want to apply this to a baseball season where a team goes 100-62 (but the W-L sequence is unknown), then calculate the probabilities of maximum winning streaks of various lengths.

If we change the problem to independent trials with the probability of winning each game at 100/162, there is a great site that calculates this for you:

http://www.pulcinientertainment.com/info/Streak-Calculator-enter.html

There are even Excel sheets to download in the site that show in detail how the results are determined.
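For independent trials, one standard way to get an exact answer is to track the length of the current streak as a small Markov chain. The sketch below shows that idea only; it is not claimed to be the method used by the linked site or its Excel sheets, and the function name `prob_streak_at_least` is made up for illustration.

```python
def prob_streak_at_least(n, p, r):
    """P(longest success run >= r) in n independent trials, success prob p.

    Assumes r >= 1.
    """
    q = 1.0 - p
    # state[k] = probability that the current run has length k (0 <= k < r)
    # and that no run of length r has occurred so far.
    state = [0.0] * r
    state[0] = 1.0
    hit = 0.0  # probability that a run of length r has already occurred
    for _ in range(n):
        new = [0.0] * r
        new[0] = q * sum(state)          # a failure resets the current run
        for k in range(r - 1):
            new[k + 1] = p * state[k]    # a success extends the current run
        hit += p * state[r - 1]          # a success completes a run of length r
        state = new
    return hit


if __name__ == "__main__":
    for r in range(5, 20):
        print(r, round(prob_streak_at_least(162, 100 / 162, r), 3))
```

As a small check, three fair coin flips contain a run of at least two heads in 3 of the 8 outcomes, and the function returns 0.375 for `prob_streak_at_least(3, 0.5, 2)`.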
 
That is a handy website. If you are still interested in doing the problem as you originally stated it (where the number of successes and the number of trials are given), it is doable. If you are comfortable with counting techniques and binomial coefficients, I believe it is even possible to obtain a formula (kinda nasty looking...but still manageable) in terms of those two given quantities. For fairly large values like 162, it may be easier to have a computer evaluate the formula than to have it solve the problem recursively. If you want to pursue this, I'll try to help.

EDIT:
I'll go ahead and post the formula I have in mind, and you can decide if it's something you want to look into. Here n is the number of trials, s the number of successes, and R the longest run or streak of successes. Then the number of ways of getting a run of at least r is

N(R \geq r) = \sum_{b=1}^{B} \binom{n-s+1}{b}\sum_{j=1}^{J}(-1)^{j+1}\binom{b}{j}\binom{s-j(r-1)-1}{b-1}

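Here b counts the separate blocks (runs) of successes,

J = min{b, integer part of s/r}

and

B = min{s, n-s+1}

since the n-s failures leave n-s+1 gaps in which a block of successes can sit.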

To find the number of ways N(R=r) where the longest streak is exactly r, you can subtract N(R >= r+1) from N(R >= r).
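The formula can be evaluated directly, as in the rough sketch below. It takes the upper limit on b to be min(s, n-s+1), i.e. the number of gaps around the n-s failures; with that limit the 5-successes-in-7-trials example reproduces 3, 6, 9, 3. The function names are hypothetical and r is assumed to be at least 1.

```python
from math import comb


def ways_at_least(n, s, r):
    """Arrangements of s successes in n trials with longest run >= r (r >= 1)."""
    total = 0
    max_blocks = min(s, n - s + 1)       # B: a block needs a gap and a success
    for b in range(1, max_blocks + 1):
        max_j = min(b, s // r)           # J: at most s // r blocks can reach r
        inner = 0
        for j in range(1, max_j + 1):
            # Inclusion-exclusion over which j blocks have length >= r.
            inner += (-1) ** (j + 1) * comb(b, j) * comb(s - j * (r - 1) - 1, b - 1)
        total += comb(n - s + 1, b) * inner
    return total


def ways_exactly(n, s, r):
    """Arrangements whose longest run is exactly r: N(R >= r) - N(R >= r+1)."""
    return ways_at_least(n, s, r) - ways_at_least(n, s, r + 1)


if __name__ == "__main__":
    for r in (5, 4, 3, 2):
        print(r, ways_exactly(7, 5, r))
    # Probability of a streak of at least 10 in a 100-62 season.
    print(ways_at_least(162, 100, 10) / comb(162, 100))
```

Python integers are arbitrary precision, so the large alternating sums for n = 162 are computed exactly before the final division.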
 
Thanks for the formula! I minored in statistics in college many many years ago, so should be able to make sense of it. I can get estimates of some of these values via simulation, so that will help in checking against what I get from the formula.
 
You're welcome. I hope it's correct. I posted it soon after I was convinced that the argument leading to it made sense, but I didn't try it out on any numbers.
 
I ran 1000 seasons of a baseball team going exactly 100-62, recording the longest winning streak in each. This would be a without-replacement experiment. In the table below, the first column lists the streak lengths. The second column lists the actual probabilities of a streak at least that long with replacement (independent games, probability of winning each game exactly 100/162), which I got from the site linked in post 5. The third column lists the percentages from the 1000 simulated seasons. As you would expect, with replacement has a better chance for longer streaks, but for the shorter ones, the results are almost identical. Overall, these numbers are a lot closer than I thought they would be.
Code:
Streak	Random Pct	Specific Pct
4	100.0	100.0
5	99.9	99.9
6	98.0	98.0
7	89.9	89.3
8	74.3	73.1
9	55.7	52.5
10	38.8	37.2
11	25.8	24.8
12	16.6	14.4
13	10.5	8.7
14	6.5	5.8
15	4.1	4.0
16	2.5	2.0
17	1.5	1.0
18	0.9	0.6
19	0.6	0.5
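A simulation along these lines might look like the sketch below. It is only a guess at the setup: each season is a random shuffle of exactly 100 wins and 62 losses, and the function names, the fixed seed and the use of Python's `random` module are assumptions made for illustration.

```python
import random


def longest_win_streak(games):
    """Length of the longest run of True values (wins) in a season."""
    best = current = 0
    for won in games:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


def simulate_fixed_record(wins, losses, seasons, seed=0):
    """Fraction of shuffled seasons whose longest streak is at least r."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    season = [True] * wins + [False] * losses
    longest = []
    for _ in range(seasons):
        rng.shuffle(season)              # random order, exact W-L record kept
        longest.append(longest_win_streak(season))
    return {r: sum(1 for x in longest if x >= r) / seasons
            for r in range(1, max(longest) + 1)}


if __name__ == "__main__":
    result = simulate_fixed_record(100, 62, 1000)
    for r in sorted(result):
        if r >= 4:
            print(r, round(100 * result[r], 1))
```

With only 1000 seasons the estimates for the longer streaks carry a sampling error of roughly a percentage point, so small differences between the two columns should not be over-interpreted.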
 
  • #10
Interesting. For long streaks, which are infrequent, it looks like you can treat it as a Poisson process. If p is the probability of success and q = 1-p, then the expected number of streaks of length at least r in a sequence of n trials is given by

\lambda = p^r + (n-r)qp^r \approx (n-r)qp^r

Then the probability of at least one streak of length r or more is given by

P(R \geq r) \approx 1-e^{-\lambda}


Using p = 100/162, I tried it for r=10 and r = 15 and got answers that agree well with what you got by simulation.
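The approximation is easy to evaluate numerically. The short sketch below simply codes the two formulas above; the function name `poisson_streak_prob` is made up for illustration.

```python
from math import exp


def poisson_streak_prob(n, p, r):
    """Poisson approximation to P(longest run >= r) for independent trials."""
    q = 1.0 - p
    # Expected number of runs of length >= r: one that can start at trial 1,
    # plus (n - r) later starting points that must be preceded by a failure.
    lam = p ** r + (n - r) * q * p ** r
    return 1.0 - exp(-lam)


if __name__ == "__main__":
    for r in (10, 15):
        print(r, round(poisson_streak_prob(162, 100 / 162, r), 3))
```

With n = 162 and p = 100/162 this gives about 0.378 for r = 10 and about 0.040 for r = 15.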
 
  • #11
Here are the values I get with the Poisson approximation (which assumes independent trials with probability of success p = 100/162 each time):

Code:
Streak length   P(R >= r)

5       .996
6       .965
7       .873
8       .717
9       .539
10      .378
11      .253
12      .164
13      .104
14      .065
15      .040
16      .025
17      .015
18      .009
19      .006
 
  • #12
I mistakenly thought the Poisson formula was for dependent trials; for independent ones, the results do agree quite closely for streaks of 11 or more. See the second column of the chart in post 9. It appears there is no easy way to calculate or estimate streak probabilities for dependent trials such as the 100-62 baseball season.
 
  • #13
Yeah, the Poisson approximation works quite well as long as trials are only weakly dependent. For your problem, I think this condition holds for streaks of moderate length, say 10 to 15 or 20. Shorter streaks happen too frequently, and so the occurrence of a streak of length 5 starting on the 10th trial eliminates the possibility of another streak of length 5 starting on the 14th trial, for example (it can't be the start of one because you are already in the middle of one). On the other hand, since we are given the total number of successes, 100, a very long streak like 30 or more means that successes are rare from there on out. So again, the trials are too strongly dependent for the Poisson approximation to apply. Still, I am surprised that the values you gave for dependent and independent trials are as close as they are over such a large range of streak lengths. Are you primarily interested in very long streaks?

EDIT:

Also, wouldn't independent trials be a better assumption for a season of baseball games, anyway? Assuming that the team was drawing from a bag with a predetermined number of "win" and "lose" marbles in it isn't all that realistic. I'm just curious to know if you are primarily interested in the mathematics, which is pretty interesting itself, or if you are more interested in an application (like wagering?).
 
