## Longest streaks of successes

Let's say you have 5 successes in 7 trials. There are a total of 21 different ways to arrange them. How many times will the longest streak of successes be 5, 4, 3, and 2? I figured this out by brute force, but wondered if there was a formula to calculate it, one that could be used for any x successes in y trials. The answers for the example are:

5 in a row: 3
4 in a row: 6
3 in a row: 9
2 in a row: 3

Thanks for any help.

Ken
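For reference, the brute-force count described above can be sketched in a few lines of Python. This is only an illustration of the enumeration idea, not necessarily how the original count was done; the function names `longest_run` and `streak_counts_brute` are made up for this sketch.

```python
from collections import Counter
from itertools import combinations


def longest_run(outcomes):
    """Length of the longest run of truthy values in a sequence."""
    best = current = 0
    for won in outcomes:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


def streak_counts_brute(n, s):
    """Tally arrangements of s successes in n trials by longest streak.

    Enumerates all C(n, s) placements, so it is only practical for small n.
    """
    counts = Counter()
    for positions in combinations(range(n), s):
        chosen = set(positions)
        counts[longest_run([i in chosen for i in range(n)])] += 1
    return dict(counts)


# 5 successes in 7 trials: 21 arrangements in total
print(sorted(streak_counts_brute(7, 5).items(), reverse=True))
# -> [(5, 3), (4, 6), (3, 9), (2, 3)]
```

The enumeration grows as C(n, s), so it would not be feasible for something like 100 successes in 162 trials.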

 It's fairly complicated. See http://mathworld.wolfram.com/Run.html
 Thank you for the link! I am trying to understand the formulas and adapt them for my purposes.

For small numbers (say less than 10), it is probably easiest to do it recursively. Even by hand. For somewhat larger numbers, you would need a computer. Take a look at this thread:

In it the original poster discovers several recursive solutions to a problem that is similar to yours, but slightly different. That might give you some ideas.
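As a rough illustration of what a recursive approach might look like (this is a sketch of one possible recursion, not the one from the linked thread), you can count the arrangements whose runs all stay shorter than r by tracking the successes left, the failures left, and the length of the current run. The names `count_max_run_below` and `count_longest_exactly` are hypothetical.

```python
from functools import lru_cache


def count_max_run_below(n, s, r):
    """Count arrangements of s successes in n trials with every run shorter than r."""

    @lru_cache(maxsize=None)
    def ways(succ_left, fail_left, run):
        if run >= r:
            return 0  # the current run has already reached length r
        if succ_left == 0 and fail_left == 0:
            return 1  # a complete, valid arrangement
        total = 0
        if succ_left:
            total += ways(succ_left - 1, fail_left, run + 1)  # extend the run
        if fail_left:
            total += ways(succ_left, fail_left - 1, 0)  # a failure resets the run
        return total

    return ways(s, n - s, 0)


def count_longest_exactly(n, s, r):
    """Arrangements whose longest run is exactly r."""
    return count_max_run_below(n, s, r + 1) - count_max_run_below(n, s, r)


print([count_longest_exactly(7, 5, r) for r in (5, 4, 3, 2)])
# -> [3, 6, 9, 3]
```

Because the intermediate results are cached, the same functions also handle something the size of 100 successes in 162 trials; Python's integers do not overflow on the large counts.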

 Thank you, I appreciate the helpful reply. Looking at that older thread, I believe this is going to be too involved for me to tackle. I want to apply this to a baseball season where a team goes 100-62 (but the W-L sequence is unknown), then calculate the probabilities of maximum winning streaks of various lengths. If we change the problem to independent trials with the probability of winning each game at 100/162, there is a great site that calculates this for you: http://www.pulcinientertainment.com/...tor-enter.html There are even Excel sheets to download in the site that show in detail how the results are determined.
 That is a handy website. If you are still interested in doing the problem as you originally stated it (where the number of successes and the number of trials are given), it is doable. If you are comfortable with counting techniques and binomial coefficients, I believe it is even possible to obtain a formula (kinda nasty looking, but still manageable) in terms of those two given quantities. For fairly large values like 162, it may be easier to have a computer evaluate the formula than to have it solve the problem recursively. If you want to pursue this, I'll try to help.

 EDIT: I'll go ahead and post the formula I have in mind, and you can decide if it's something you want to look into. Here n is the number of trials, s the number of successes, and R the longest run or streak of successes. Then the number of ways of getting a run of at least r is

 $$N(R \geq r) = \sum_{b=1}^{B} \binom{n-s+1}{b}\sum_{j=1}^{J}(-1)^{j+1}\binom{b}{j}\binom{s-j(r-1)-1}{b-1}$$

 Here J = min{b, integer part of s/r} and B = min{s, n-s+1}, where b counts the separate runs of successes and n-s+1 is the number of slots before, between, and after the n-s failures. To find the number of ways N(R = r) where the longest streak is exactly r, you can subtract N(R >= r+1) from N(R >= r).
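A minimal sketch of how the formula could be evaluated by computer, in Python. It takes the upper limit on b to be min(s, n-s+1), the number of slots around the n-s failures, which is the limit under which the 5-in-7 example from the first post is reproduced. The function names are made up for illustration, and the inner loop simply stops once the last binomial coefficient becomes zero instead of computing J explicitly.

```python
from math import comb


def count_at_least(n, s, r):
    """Arrangements of s successes in n trials with a run of at least r (r >= 1)."""
    failures = n - s
    total = 0
    # b = number of separate runs of successes; they occupy b of the
    # failures + 1 slots before, between, and after the failures.
    for b in range(1, min(s, failures + 1) + 1):
        inner = 0
        for j in range(1, b + 1):
            top = s - j * (r - 1) - 1
            if top < b - 1:
                break  # this and all later terms vanish
            inner += (-1) ** (j + 1) * comb(b, j) * comb(top, b - 1)
        total += comb(failures + 1, b) * inner
    return total


def count_exactly(n, s, r):
    """Arrangements whose longest run is exactly r."""
    return count_at_least(n, s, r) - count_at_least(n, s, r + 1)


print([count_exactly(7, 5, r) for r in (5, 4, 3, 2)])
# -> [3, 6, 9, 3]
```

For the baseball case, dividing `count_at_least(162, 100, r)` by `comb(162, 100)` would give the probability of a streak of at least r when every ordering of the 100 wins and 62 losses is equally likely.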
 Thanks for the formula! I minored in statistics in college many many years ago, so should be able to make sense of it. I can get estimates of some of these values via simulation, so that will help in checking against what I get from the formula.
 You're welcome. I hope it's correct. I posted it soon after I was convinced that the argument leading to it made sense, but I didn't try it out on any numbers.
 I ran 1000 seasons of a baseball team going exactly 100-62, recording the longest winning streak in each. This is a without-replacement experiment. In the table below, the first column lists the maximum streak lengths. The second column ("Random") lists the actual probabilities of a streak at least that long with replacement (independent games, probability of winning each game exactly 100/162), which I got from the site linked in post 5. The third column ("Specific") lists the percentages from the 1000 simulated 100-62 seasons. As you would expect, with replacement has a better chance for longer streaks, but for the shorter ones the results are almost identical. Overall, these numbers are a lot closer than I thought they would be.

| Streak | Random Pct | Specific Pct |
|---|---|---|
| 4 | 100.0 | 100.0 |
| 5 | 99.9 | 99.9 |
| 6 | 98.0 | 98.0 |
| 7 | 89.9 | 89.3 |
| 8 | 74.3 | 73.1 |
| 9 | 55.7 | 52.5 |
| 10 | 38.8 | 37.2 |
| 11 | 25.8 | 24.8 |
| 12 | 16.6 | 14.4 |
| 13 | 10.5 | 8.7 |
| 14 | 6.5 | 5.8 |
| 15 | 4.1 | 4.0 |
| 16 | 2.5 | 2.0 |
| 17 | 1.5 | 1.0 |
| 18 | 0.9 | 0.6 |
| 19 | 0.6 | 0.5 |
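For anyone who wants to repeat this kind of experiment, here is a minimal Python sketch. It is an assumption about how such a simulation could be set up, not the code actually used for the table: each simulated season is a random ordering of exactly 100 wins and 62 losses, and the function names and the seed are made up for illustration.

```python
import random


def longest_run(outcomes):
    """Length of the longest run of wins (1s) in a sequence."""
    best = current = 0
    for won in outcomes:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


def simulate_fixed_record(wins, losses, seasons, seed=None):
    """Longest winning streak in each of `seasons` random orderings of a fixed record."""
    rng = random.Random(seed)
    season = [1] * wins + [0] * losses
    longest = []
    for _ in range(seasons):
        rng.shuffle(season)  # every ordering of the fixed record is equally likely
        longest.append(longest_run(season))
    return longest


def pct_at_least(longest, r):
    """Percentage of simulated seasons whose longest streak is at least r."""
    return 100.0 * sum(x >= r for x in longest) / len(longest)


results = simulate_fixed_record(100, 62, seasons=1000, seed=2024)
for r in range(4, 20):
    print(r, round(pct_at_least(results, r), 1))
```

With only 1000 seasons, each percentage carries a sampling error of roughly one to two points, so individual runs will differ somewhat from one another.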
 Interesting. For long streaks, which are infrequent, it looks like you can treat it as a Poisson process. If p is the probability of success and q = 1-p, then the expected number of streaks of length at least r in a sequence of n trials is given by $$\lambda = p^r + (n-r)qp^r \approx (n-r)qp^r$$ Then the probability of at least one streak of length r or more is given by $$P(R \geq r) \approx 1-e^{-\lambda}$$ Using p = 100/162, I tried it for r=10 and r = 15 and got answers that agree well with what you got by simulation.
 Here are the values I get with the Poisson approximation (which assumes independent trials with probability of success p = 100/162 each time):

| Streak length r | P(R ≥ r) |
|---|---|
| 5 | .996 |
| 6 | .965 |
| 7 | .873 |
| 8 | .717 |
| 9 | .539 |
| 10 | .378 |
| 11 | .253 |
| 12 | .164 |
| 13 | .104 |
| 14 | .065 |
| 15 | .040 |
| 16 | .025 |
| 17 | .015 |
| 18 | .009 |
| 19 | .006 |
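The Poisson approximation from the previous post is easy to evaluate directly. A minimal Python sketch, assuming n = 162 trials and p = 100/162 as in the thread (the function name `poisson_streak_prob` is made up for illustration):

```python
from math import exp


def poisson_streak_prob(n, p, r):
    """Approximate P(longest run >= r) in n independent trials with success probability p."""
    q = 1.0 - p
    # Expected number of runs of length at least r: one that can start at
    # trial 1, plus n - r later starting points that must follow a failure.
    lam = p**r + (n - r) * q * p**r
    return 1.0 - exp(-lam)


for r in range(5, 20):
    print(r, round(poisson_streak_prob(162, 100 / 162, r), 3))
```

For r = 10 this gives about .378, in line with the values listed above.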
 I mistakenly thought the Poisson formula was for dependent trials; for independent ones, the results do agree quite closely for streaks of 11 or more. See the second column of the chart in post 9. It appears there is no easy way to calculate or estimate streak probabilities for dependent trials such as the 100-62 baseball season.
 Yeah, the Poisson approximation works quite well as long as trials are only weakly dependent. For your problem, I think this condition holds for streaks of moderate length, say 10 to 15 or 20. Shorter streaks happen too frequently, and so the occurrence of a streak of length 5 starting on the 10th trial eliminates the possibility of another streak of length 5 starting on the 14th trial, for example (it can't be the start of one because you are already in the middle of one). On the other hand, since we are given the total number of successes, 100, a very long streak like 30 or more means that successes are rare from there on out. So again, the trials are too strongly dependent for the Poisson approximation to apply. Still, I am surprised that the values you gave for dependent and independent trials are as close as they are over such a large range of streak lengths. Are you primarily interested in very long streaks? EDIT: Also, wouldn't independent trials be a better assumption for a season of baseball games, anyway? Assuming that the team was drawing from a bag with a predetermined number of "win" and "lose" marbles in it isn't all that realistic. I'm just curious to know if you are primarily interested in the mathematics, which is pretty interesting itself, or if you are more interested in an application (like wagering?).