# What is the probability of breaking a 56 game hitting streak record?

538 is a website that does election data and polling analysis. Its Culture page carries math, probability, and statistics puzzles. The following one stumped me:
Joe DiMaggio's record 56-game hitting streak (at least one base hit in each of 56 consecutive games) is considered hard to break.
The question: what is the probability that a .300 hitter breaks this record in a 20-year baseball career?
The simplifying assumptions are:
a) The hitter gets 4 at-bats in each game.
b) He plays 160 games per year in each of his 20 years.
c) There are no walks or sacrifice flies in his plate appearances.
d) A hitting streak can span seasons.

I got part of the puzzle right but couldn't complete it.
The parts I got right:
A) The probability of getting exactly one hit in a game:
.3^1 x .7^3 = .1029, and .1029 x 4 = 41.16%, since there are 4 ways of getting one hit.
The probability of exactly 2 hits in a game is .0441 x 6 = 26.46%, since there are 6 ways of doing this.
Similarly, the probability of exactly 3 hits in a game is 7.56%,
and the probability of 4 hits in a game is 0.81%.

The sum, 75.99%, is the probability of getting at least one hit in a game.
The other part I think is right: the probability of a 57-game hitting streak is (0.7599)^57.

Now the part where I am confused.
How do you figure out the probability of getting this streak in a 3200-game career (160 games per year x 20 years)?


## Answers and Replies

BvU
Science Advisor
Homework Helper
Hi,

How many different 57 game streaks are there in a 3200 game career ?

If there is only one 57-game hitting streak in the career, it can start anywhere from the first game of his career to game (3200-57), i.e. game 3143. So I would think there are 3143 possible streaks.

PeroK
Science Advisor
Homework Helper
Gold Member
2020 Award
If there is only one 57-game hitting streak in the career, it can start anywhere from the first game of his career to game (3200-57), i.e. game 3143. So I would think there are 3143 possible streaks.
That gets you started. Can you take it any further?

Another approach is as follows:

Calculate the probability of getting a base hit in a single game. Let's call this ##p##.

His career is essentially a sequence of binomial trials with parameter ##p##.

Now, we consider each "streak" of consecutive games with a base hit. A streak could be any length from ##0##. What is the probability of a streak being 57 games or more? (a)

Next, can you calculate the average length of a streak? Note that there is a general result for binomial trials.

That would give you an estimate of the number of streaks in a 3200-game career. (b)

Finally, can you use (a) and (b) to estimate the probability of a 57 game streak in a 3200 game career?

PeroK
Science Advisor
Homework Helper
Gold Member
2020 Award
I got part of the puzzle right but couldn't complete it.
The parts I got right:
A) The probability of getting exactly one hit in a game:
.3^1 x .7^3 = .1029, and .1029 x 4 = 41.16%, since there are 4 ways of getting one hit.
The probability of exactly 2 hits in a game is .0441 x 6 = 26.46%, since there are 6 ways of doing this.
Similarly, the probability of exactly 3 hits in a game is 7.56%,
and the probability of 4 hits in a game is 0.81%.

The sum, 75.99%, is the probability of getting at least one hit in a game.
The other part I think is right: the probability of a 57-game hitting streak is (0.7599)^57.

Now the part where I am confused.
How do you figure out the probability of getting this streak in a 3200-game career (160 games per year x 20 years)?
This is correct. There was an easier way. The probability of no hit in a single at-bat is ##0.7##. So the probability of no hit in a 4-at-bat game is ##0.7^4## and the probability of at least one hit is ##1 - 0.7^4 = 0.7599##. Let's call this ##p##, as above.

And, yes, if you take the first streak of a player's career, then the probability of getting 57 hitting games is ##p^{57}##.

Now see post #4 for how to continue.
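
The per-game arithmetic above can be checked with a few lines of Python. This is only a sketch under the puzzle's assumptions (4 at-bats per game, a .300 chance of a hit in each at-bat, at-bats independent); the names `prob_k_hits`, `by_sum`, and `by_complement` are made up for illustration.

```python
from math import comb

BATTING_AVG = 0.300  # from the puzzle statement
AT_BATS = 4          # assumption (a) of the puzzle


def prob_k_hits(k, n=AT_BATS, avg=BATTING_AVG):
    """Binomial probability of exactly k hits in n independent at-bats."""
    return comb(n, k) * avg**k * (1 - avg) ** (n - k)


# Long route: add up the chances of exactly 1, 2, 3 and 4 hits.
by_sum = sum(prob_k_hits(k) for k in range(1, AT_BATS + 1))

# Short route: one minus the chance of going hitless.
by_complement = 1 - (1 - BATTING_AVG) ** AT_BATS

# Chance that one specific run of 57 consecutive games all contain a hit.
streak_57 = by_complement**57

print(f"sum over k:  {by_sum:.4f}")
print(f"complement:  {by_complement:.4f}")
print(f"p^57:        {streak_57:.3e}")
```

Both routes should agree on 0.7599, which is the ##p## used in the rest of the thread.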

Playing 160 games per year for 20 years is a heroic assumption. If you look at the careers of the elite few hitters with 20-year careers, there are few seasons where they played 160 games.

Adrian Beltre, for example, just retired after 21 seasons, in which he totaled 2933 games and 11,068 at-bats, for an average of 140 games per season and 3.8 at-bats per game.

https://www.baseball-reference.com/players/b/beltrad01.shtml
If you are going to calculate the odds of a streak, you can't assume 4 at-bats per game. There are games where a player, for one reason or another, gets only one or two attempts at the plate.

And averaging hit probabilities entails averaging pitcher difficulty, which, when calculating a streak, is not a good assumption. You can't assume batting .300 in a game against an elite pitcher who is performing well. A player will see a couple of these in any 56-game streak.

Of course the real probability relative to DiMaggio is not the odds of a single .300 hitter, but the odds of any hitter in MLB getting a 56-game streak. The sample is every batter in the league who plays at least 56 games.

PeroK
Science Advisor
Homework Helper
Gold Member
2020 Award
Of course the real probability relative to DiMaggio is not the odds of a single .300 hitter, but the odds of any hitter in MLB getting a 56-game streak. The sample is every batter in the league who plays at least 56 games.
The problem as stated is the odds against an individual batter achieving something over a career. The odds that any batter does something in a given season is a different problem altogether.

StoneTemplePython
Science Advisor
Gold Member
The Riddler puzzles at 538 can be a lot of fun.

With no pun intended, this is a problem of runs, and dealing with dependencies is the challenge here. I haven't seen this done properly yet with any of the posts on this thread. I've written about this topic many times on the forum, most recently at major length here

https://www.physicsforums.com/threads/probability-of-n-consecutive-tails-in-n-coin-tosses.959909/
- - - - -
You can also get an extremely nice Poisson estimation for this problem, which is a standard technique when dealing with rare events like having 56 games in a row with some attribute. (Though understanding why it works and in particular bounding the error, takes a lot more machinery.)

I like simple, powerful and general things, so my vote is to get the Union Bound estimate, which I suppose is what posts 2 and 4 might be guiding towards. For this puzzle, union bound gives a very good estimate for all of the 'regular' batters (and a much weaker bound for the steroids batter).

Marching through (i.e. calculating via a for loop) the recurrence relation is a reasonable, albeit computationally intensive way of getting the exact answer. There are a few ways of setting up the recurrence -- the one I used for this Riddler appears longer than the official solution, though it has much nicer attributes (in particular is amenable to exponential tilting).
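
One way to march through a recurrence with a for loop is sketched below. It is not necessarily the recurrence referred to in this post, nor the official Riddler solution; it is a plain dynamic program over the current streak length, with "a streak of ##k## has already happened" as an absorbing state. The function name `prob_streak_at_least` and its parameters are illustrative assumptions.

```python
def prob_streak_at_least(n_games, k, p):
    """Exact probability of at least one run of k hitting games in n_games,
    assuming independent games that each contain a hit with probability p."""
    # state[j] = P(current streak is exactly j games and no run of k has occurred yet)
    state = [0.0] * k
    state[0] = 1.0
    done = 0.0  # probability mass already absorbed (a run of k has occurred)
    for _ in range(n_games):
        new_state = [0.0] * k
        # A hitless game resets the streak to 0 from any live state.
        new_state[0] = (1 - p) * sum(state)
        # A hitting game extends a streak of j games to j + 1.
        for j in range(k - 1):
            new_state[j + 1] = p * state[j]
        # A hitting game from a streak of k - 1 completes the run.
        done += p * state[k - 1]
        state = new_state
    return done


p = 1 - 0.7**4  # per-game hit probability from the puzzle
exact = prob_streak_at_least(3200, 57, p)
print(f"P(streak of 57 or more in 3200 games) = {exact:.7f}")
```

The loop does about 3200 x 57 multiplications, so it runs quickly even in plain Python. Its result can be compared against the union bound, which it should never exceed.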

- - - - -
I put a lot of work into that linked thread. Unfortunately OP didn't understand the math or that this is a solved problem but somehow felt qualified to argue in favor of a personal theory.

Thanks PeroK, but I am still stumped. I do not see why the probability is not just 3143 x (.7599)^57.

StoneTemplePython
Science Advisor
Gold Member
Thanks PeroK, but I am still stumped. I do not see why the probability is not just 3143 x (.7599)^57.
Setting aside minor 0 vs 1 indexing issues, this is the result that can be arrived at by applying the union bound.
- - - - -
You should know basic probability rules -- you can multiply probabilities of (intersections of) independent events, and you can add probabilities of (unions of) mutually exclusive events.

This puzzle isn't purely composed of either, so you simply cannot write the answer as an equality with ##\sum_{i=1}^{3143} p^{57}##, where ##p = 0.7599##.

You can use that sum as an inequality, though: that is the union bound.
- - - - -

A smarter approach:
as outlined in that link, partition things so that
##A_0## is the event that the first ##k=57## games form a run/streak,
and everything else of interest is
##A_i##, for ##i\geq 1##: the event that a run/streak occurs preceded by a non-hitting game.

Thus you should have
##P\big(A_0\big) = p^{57}## and
##P\big(A_i\big) = (1-p)\cdot p^{57}##
again, where ##p = 0.7599##
This should give union bound estimate of
##P\big(A_0\big)+ (n-k) \cdot P\big(A_i\big) = 0.0001206##

(with possibly an immaterial fencepost error on n-k vs n-k+1 related to indexing nits, no pun intended of course)

While technically an upper bound on the probability of having a record-breaking streak, this is extremely close to the exact result.
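
The union bound arithmetic from this post can be reproduced as follows. This is a sketch using the post's own quantities (##p = 0.7599##, ##k = 57##, ##n = 3200##); the variable names are illustrative. The naive figure 3143 x p^57 from the earlier question is included for comparison.

```python
p = 1 - 0.7**4   # probability of at least one hit in a game
k = 57           # streak length needed to break the record
n = 3200         # games in the career

p_a0 = p**k                # P(A_0): the first k games all contain a hit
p_ai = (1 - p) * p**k      # P(A_i): a hitless game followed by k hitting games

# Partitioned union bound from the post: P(A_0) + (n - k) * P(A_i)
union_bound = p_a0 + (n - k) * p_ai

# Naive sum over every possible starting game, ignoring the overlaps
naive = (n - k) * p**k

print(f"partitioned union bound: {union_bound:.7f}")
print(f"naive 3143 * p^57:       {naive:.7f}")
```

The naive sum comes out roughly four times larger because overlapping windows belonging to the same long streak are counted repeatedly. Requiring a hitless game in front of each window removes most of that double counting.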

PeroK
Science Advisor
Homework Helper
Gold Member
2020 Award
Thanks PeroK, but I am still stumped. I do not see why the probability is not just 3143 x (.7599)^57.
There are two obvious problems with that. First, if ##n = 3143## is the number of games, then by increasing ##n## you would get a probability greater than 1.

Second, if ##p_s## is the probability of a streak of 57 games, then a player does not have 3143 streaks in his career. You could look at each streak as a "set" of variable length. A typical career might look like:

##1101011111100110- \dots##

This is the first 16 games of a player's career with ##1## for a hit and ##0## for no hits in that game. You can see that there were only 5 streaks there.

The probability you have calculated is roughly the probability of a player beating the record if they play until they have had a total of 3143 games without a hit. That's more like 10,000 - 15,000 games.

That's why the average number of games in a streak is important, as that determines how many chances the player has in his career to break the record. It's not directly the number of games.
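
The streak count in the 16-game example, and the "10,000 - 15,000 games" figure, can be checked with a short sketch. It assumes, as in this post, that every streak (including one of length 0) is terminated by a hitless game; the variable names are illustrative.

```python
career = "1101011111100110"  # the first 16 games from the example above

# Each "0" ends one streak; whatever follows the last "0" is an unfinished streak.
pieces = career.split("0")
streak_lengths = [len(run) for run in pieces[:-1]]

print(streak_lengths)        # lengths of the completed streaks
print(len(streak_lengths))   # number of completed streaks

# Games needed, on average, to accumulate 3143 hitless games (3143 streaks).
p = 1 - 0.7**4
games_needed = 3143 / (1 - p)
print(round(games_needed))
```

With one hitless game in about every 4.16 games, 3143 streaks take a little over 13,000 games on average, which sits inside the range quoted above.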

Buzz Bloom
Gold Member
Hi @Thecla:

I hope you will find this of some interest.

Using a spreadsheet, I performed a Monte Carlo simulation of 10,000 careers, each of 3200 games, each game with a probability of 0.7599 of at least one hit in that game. I saw two careers with a continuous series of games, each with at least one hit, in which the series was equal to or greater than 56 games. One was 56 games, and the other 58 games.
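
A comparable Monte Carlo experiment can be sketched in Python instead of a spreadsheet. This is not the spreadsheet used above; the function names, the seed, and the reduced number of careers in the example call are all assumptions chosen so the sketch runs quickly.

```python
import random


def longest_streak(n_games, p, rng):
    """Longest run of consecutive hitting games in one simulated career."""
    longest = current = 0
    for _ in range(n_games):
        if rng.random() < p:      # at least one hit in this game
            current += 1
            if current > longest:
                longest = current
        else:                     # hitless game ends the current streak
            current = 0
    return longest


def count_careers_with_streak(n_careers, n_games, p, min_streak, rng):
    """Number of simulated careers whose longest streak reaches min_streak."""
    return sum(
        longest_streak(n_games, p, rng) >= min_streak for _ in range(n_careers)
    )


rng = random.Random(2024)  # arbitrary seed, only for reproducibility
p = 1 - 0.7**4

# 200 careers keeps the run short; raise it to 10,000 to mirror the post
# (that means 32 million simulated games, so expect a longer wait).
n_careers = 200
hits = count_careers_with_streak(n_careers, 3200, p, 56, rng)
print(f"{hits} of {n_careers} careers had a streak of 56 or more games")
```

Because the event is so rare, a small number of careers will usually show no qualifying streak at all, and even 10,000 careers gives only a handful, so the simulated frequency is a noisy estimate.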

I think that there is a possible psychological element that statistics do not consider. After a batter has begun a long streak, pitchers may well change their style of pitching, especially in the hitter's last at-bat of a game. It would be interesting to know, during a streak of say more than 30 games, the relative frequency of a hit in the last at-bat compared with other at-bats.

Regards,
Buzz

PeroK
Science Advisor
Homework Helper
Gold Member
2020 Award
Hi @Thecla:

I hope you will find this of some interest.

Using a spreadsheet, I performed a Monte Carlo simulation of 10,000 careers, each of 3200 games, each game with a probability of 0.7599 of at least one hit in that game. I saw two careers with a continuous series of games, each with at least one hit, in which the series was equal to or greater than 56 games. One was 56 games, and the other 58 games.
To back this up with my crude estimate: we have a series of binomial trials with success probability ##0.76##. The average length of a run of games is ##4.16## games (including the failure). In general it is 1/(probability of failure).

Roughly, each batter gets ##3200/4.16## runs of games in a career, which is roughly the number of chances he has to break the record. I reduced this to ##3144/4.16 = 756##, as any run that starts too late in his career cannot break the record.

The probability of getting a run of 57 or more games is about ##1.6 \times 10^{-7}##.

Finally, therefore, the probability of breaking the record in a career with about ##756## attempts at it is roughly ##756## times the above probability. This gives a probability of ##0.000121##.

This is roughly once in every ##8300## careers.

PS the estimate of equalling or breaking the record (i.e. getting 56 or more games) is once every ##6300## careers.
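
The crude estimate above can be retraced numerically. This sketch uses unrounded intermediate values, so the number of attempts comes out near 755 rather than the 756 obtained with the rounded 4.16; the variable names are illustrative.

```python
p = 1 - 0.7**4                 # per-game hit probability, 0.7599

mean_run = 1 / (1 - p)         # average run length, counting the hitless game
attempts = 3144 / mean_run     # runs that start early enough to reach 57 games

p_break = attempts * p**57         # estimate: streak of 57 or more
p_tie_or_break = attempts * p**56  # estimate: streak of 56 or more

print(f"mean run length:       {mean_run:.2f} games")
print(f"attempts per career:   {attempts:.0f}")
print(f"P(break record):       {p_break:.6f}")
print(f"careers per break:     {1 / p_break:.0f}")
print(f"careers per tie/break: {1 / p_tie_or_break:.0f}")
```

The "careers per" figures land within rounding of the 8300 and 6300 quoted in the post.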
