What is the probability of breaking a 56 game hitting streak record?

In summary, 538 publishes election data, polling analysis, and probability puzzles. One puzzle that stumped the original poster asks for the probability that a .300 hitter breaks Joe DiMaggio's record 56 game hitting streak over a 20 year career. With 4 at-bats per game, the probability of at least one hit in a game is 1 - .7^4 = .7599. A player's career is then essentially a sequence of binomial trials with that parameter, and the thread's estimates put the probability of a 57 game hitting streak in a 3200 game career at about 0.00012.
  • #1
Thecla
538 is a website that does election data and polling analysis. On its Culture page there are some math, probability, and statistics puzzles. The following is one that stumped me:
Joe DiMaggio's record 56 game hitting streak (getting at least one base hit in 56 consecutive games) is considered hard to break.
The question was: what is the probability of a .300 hitter breaking this record in a 20 year baseball career?
The simplifying assumptions are:
a) The hitter gets 4 at-bats in each game
b) he plays in 160 games/year in each of his 20 years of playing.
c) There are no walks or sacrifice flies in his plate appearances
d) a hitting streak can span across seasons

I got part of the puzzle right but I couldn't complete it.
The parts I got right:
A) The probability of getting exactly one hit in a game is
.3^1 x .7^3 = .1029, and .1029 x 4 = 41.16%, since there are 4 ways of getting one hit.
The probability of 2 hits in a game is .3^2 x .7^2 = .0441, and .0441 x 6 = 26.46%, since there are 6 ways of doing this.
Similarly, for 3 hits in a game the probability is 7.56%,
and for 4 hits in a game the probability is 0.81%.

The sum, which is the probability of getting at least one hit in a game, is 75.99%.
The other part I think is right is that the probability of one particular 57 game hitting streak is (0.7599)^57.

Now the part where I am confused.
How do you figure out the probability of getting this streak in a 3200 game career (160 games per year x 20 years)?
 
  • #2
Hi,

How many different 57 game streaks are there in a 3200 game career ?
 
  • #3
If there is only one 57 game hitting streak in the career, then the hitting streak can start on the first game of his career or as late as the (3200-57)th game, i.e. the 3143rd game of his career. So I would think there are 3143 such streaks.
 
  • #4
Thecla said:
If there is only one 57 game hitting streak in the career, then the hitting streak can start on the first game of his career or as late as the (3200-57)th game, i.e. the 3143rd game of his career. So I would think there are 3143 such streaks.
That gets you started. Can you take it any further?

Another approach is as follows:

Calculate the probability of getting a base hit in a single game. Let's call this ##p##.

His career is essentially a sequence of binomial trials with parameter ##p##.

Now, we consider each "streak" of consecutive games with a base hit. A streak can have any length from ##0## upwards. What is the probability of a streak being 57 games or more? (a)

Next, can you calculate the average length of a streak? Note that there is a general result for binomial trials.

That would give you an estimate of the number of streaks in a 3200-game career. (b)

Finally, can you use (a) and (b) to estimate the probability of a 57 game streak in a 3200 game career?
 
  • #5
Thecla said:
I got part of the puzzle right but I couldn't complete it.
The parts I got right:
A) The probability of getting exactly one hit in a game is
.3^1 x .7^3 = .1029, and .1029 x 4 = 41.16%, since there are 4 ways of getting one hit.
The probability of 2 hits in a game is .3^2 x .7^2 = .0441, and .0441 x 6 = 26.46%, since there are 6 ways of doing this.
Similarly, for 3 hits in a game the probability is 7.56%,
and for 4 hits in a game the probability is 0.81%.

The sum, which is the probability of getting at least one hit in a game, is 75.99%.
The other part I think is right is that the probability of one particular 57 game hitting streak is (0.7599)^57.

Now the part where I am confused.
How do you figure out the probability of getting this streak in a 3200 game career(160 games per year x 20 years)?

This is correct. There was an easier way. The probability of no hit in a single at-bat is ##0.7##. So the probability of no hit in a 4-at-bat game is ##0.7^4## and the probability of at least one hit is ##1 - 0.7^4 = 0.7599##. Let's call this ##p##, as above.

And, yes, if you take the first streak of a player's career, then the probability of getting 57 hitting games is ##p^{57}##.
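
As a quick numerical check, the two routes to the per-game probability can be compared in a few lines of Python. This is only a sketch under the puzzle's simplifying assumptions (4 independent at-bats per game, each a hit with probability 0.3); the function names are illustrative and not from the thread.

```python
from math import comb

def prob_k_hits(k, at_bats=4, avg=0.3):
    """Binomial probability of exactly k hits in one game."""
    return comb(at_bats, k) * avg**k * (1 - avg)**(at_bats - k)

def prob_at_least_one_hit(at_bats=4, avg=0.3):
    """Complement rule: 1 minus the probability of a hitless game."""
    return 1 - (1 - avg)**at_bats

# Route 1: add up the cases of exactly 1, 2, 3 and 4 hits
by_sum = sum(prob_k_hits(k) for k in range(1, 5))
# Route 2: complement of going 0 for 4
p = prob_at_least_one_hit()

print(f"{by_sum:.4f} {p:.4f}")  # the two routes should agree
print(f"{p**57:.3e}")           # probability that one specific run of 57 games all have a hit
```

Both routes give 0.7599, and the last line shows how small ##p^{57}## is for any single specified run of 57 games.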

Now see post #4 for how to continue.
 
  • #6
Playing 160 games per year for 20 years is a heroic assumption. If you look at the careers of the elite few hitters with 20 year careers, there are few seasons in which they played 160 games.

Adrian Beltre, for example, just retired after 21 seasons, in which he played a total of 2933 games with 11,068 at-bats, for an average of 140 games per season and 3.8 at-bats per game.

https://www.baseball-reference.com/players/b/beltrad01.shtml
If you are going to calculate the odds of a streak, you can't assume 4 at-bats per game: there are games where a player, for one reason or another, only gets one or two attempts at the plate.

And averaging hit probabilities entails averaging pitcher difficulty, which is not a good assumption when calculating a streak. You can't assume a player bats .300 in a game against an elite pitcher who is performing well, and a player will see a couple of these in any 56 game streak.

Of course the real probability relative to DiMaggio is not the odds of a single .300 hitter, but the odds of any hitter in MLB getting a 56 game streak - the sample is every batter in the league who plays at least 56 games.
 
  • #7
BWV said:
Of course the real probability relative to DiMaggio is not the odds of a single .300 hitter, but the odds of any hitter in MLB getting a 56 game streak - the sample is every batter in the league who plays at least 56 games.

The problem as stated is the odds against an individual batter achieving something over a career. The odds that any batter does something in a given season is a different problem altogether.
 
  • #8
The Riddler puzzles at 538 can be a lot of fun.

With no pun intended, this is a problem of runs, and dealing with dependencies is the challenge here. I haven't seen this done properly yet with any of the posts on this thread. I've written about this topic many times on the forum, most recently at major length here

https://www.physicsforums.com/threads/probability-of-n-consecutive-tails-in-n-coin-tosses.959909/
- - - - -
You can also get an extremely nice Poisson estimation for this problem, which is a standard technique when dealing with rare events like having 56 games in a row with some attribute. (Though understanding why it works and in particular bounding the error, takes a lot more machinery.)

I like simple, powerful and general things, so my vote is to get the Union Bound estimate, which I suppose is what posts 2 and 4 might be guiding towards. For this puzzle, union bound gives a very good estimate for all of the 'regular' batters (and a much weaker bound for the steroids batter).

Marching through (i.e. calculating via a for loop) the recurrence relation is a reasonable, albeit computationally intensive way of getting the exact answer. There are a few ways of setting up the recurrence -- the one I used for this Riddler appears longer than the official solution, though it has much nicer attributes (in particular is amenable to exponential tilting).
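
One possible way to set up such a recurrence is sketched below in Python. It is not necessarily the recurrence used in this post or in the official solution. It assumes independent games with per-game hit probability ##p## and conditions on the game at which the first run of ##k## hitting games is completed: writing ##u_n## for the probability that a run has occurred within the first ##n## games, ##u_n = u_{n-1} + (1-p)\,p^k\,(1 - u_{n-k-1})## for ##n > k##, with ##u_k = p^k## and ##u_n = 0## for ##n < k##. The function name is illustrative.

```python
def prob_streak(n_games, k, p):
    """Probability of at least one run of k consecutive successes in n_games trials."""
    if n_games < k:
        return 0.0
    # u[n] = P(a run of k has been completed within the first n games)
    u = [0.0] * (n_games + 1)
    u[k] = p**k
    first_at = (1 - p) * p**k  # a hitless game followed by k straight hitting games
    for n in range(k + 1, n_games + 1):
        # The first run ends exactly at game n when: no run in the first n-k-1 games,
        # game n-k is hitless, and games n-k+1..n all have a hit.
        u[n] = u[n - 1] + first_at * (1 - u[n - k - 1])
    return u[n_games]
```

Calling `prob_streak(3200, 57, 1 - 0.7**4)` marches through all 3200 games and, under these assumptions, gives a value of about ##1.2 \times 10^{-4}##. Small cases can be checked by hand, e.g. `prob_streak(3, 2, 0.5)` is 3/8.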

- - - - -
I put a lot of work into that linked thread. Unfortunately OP didn't understand the math or that this is a solved problem but somehow felt qualified to argue in favor of a personal theory.
 
  • #9
Thanks PeroK but I am still stumped. I do not see why the probability is not just 3143x(.7599)^57
 
  • #10
Thecla said:
Thanks PeroK but I am still stumped. I do not see why the probability is not just 3143x(.7599)^57

Setting aside minor 0 vs 1 indexing issues, this is the result that can be arrived at by applying the union bound.
- - - - -
You should know basic probability rules -- you can multiply probabilities of (intersections of) independent events, and you can add probabilities of (unions of) mutually exclusive events.

This puzzle is not purely composed of either, so you simply cannot write the answer as exactly equal to ##\sum_{i=1}^{3143} p^{57}##, where ##p = 0.7599##.

You can use that sum as an inequality, though: that is the union bound.
- - - - -

A smarter approach, as outlined in that link, is to partition things so that
##A_0## is the event that the first ##k=57## games form a run/streak,
and everything else of interest is
##A_i##, the event that a run/streak occurs preceded by a non-hitting game, for ##i\geq 1##.

Thus you should have
##P\big(A_0\big) = p^{57}## and
##P\big(A_i\big) = (1-p)\cdot p^{57}##
again, where ##p = 0.7599##
This should give a union bound estimate, with ##n = 3200## games, of
##P\big(A_0\big)+ (n-k) \cdot P\big(A_i\big) = 0.0001206##

(with possibly an immaterial fencepost error on n-k vs n-k+1 related to indexing nits, no pun intended of course)

while technically an upper bound on the probability of having a record breaking streak, this is extremely close to the exact result.
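
The arithmetic behind the union bound figure can be reproduced with a short Python sketch. It assumes ##n = 3200## and ##k = 57## as in the puzzle, and the variable names are illustrative. It also evaluates the plain ##3143 \times p^{57}## sum for comparison.

```python
p = 1 - 0.7**4   # probability of at least one hit in a game
n, k = 3200, 57  # career length in games, streak length needed to break the record

p_a0 = p**k            # A_0: the first k games all have a hit
p_ai = (1 - p) * p**k  # A_i: a hitless game followed by k hitting games
union_bound = p_a0 + (n - k) * p_ai

naive_sum = (n - k) * p**k  # the 3143 x p^57 figure, without the (1 - p) factor

print(f"{union_bound:.7f}")
print(f"{naive_sum:.7f}")
```

The first value comes out at roughly 0.00012 and the second at roughly 0.0005, about four times larger. The difference is the factor ##(1-p)## that comes from requiring each later streak to be preceded by a hitless game, which stops overlapping windows of the same long streak from being counted repeatedly.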
 
  • #11
Thecla said:
Thanks PeroK but I am still stumped. I do not see why the probability is not just 3143x(.7599)^57

There are two obvious problems with that. First, if ##n = 3143## is the number of games, then by increasing ##n## you would get a probability greater than 1.

Second, if ##p_s## is the probability of a streak of 57 games, then a player does not have 3143 streaks in his career. You could look at each streak as a "set" of variable length. A typical career might look like:

##1101011111100110- \dots##

This is the first 16 games of a player's career with ##1## for a hit and ##0## for no hits in that game. You can see that there were only 5 streaks there.

The probability you have calculated is roughly the probability of a player beating the record if they play until they have had a total of 3143 games without a hit. That's more like 10,000 - 15,000 games.

That's why the average number of games in a streak is important, as that determines how many chances the player has in his career to break the record. It's not directly the number of games.
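
The count of 5 streaks in the 16-game example can be checked with a small Python sketch. It assumes the reading in which every hitless game closes a streak and a streak may have length 0, as in post #4. The helper name is illustrative.

```python
career = "1101011111100110"  # the 16-game example: 1 = game with a hit, 0 = hitless game

def split_streaks(games):
    """Split a career into streaks, each closed by a hitless game.

    Returns the lengths of the completed streaks (a streak may have length 0)
    and the length of any unfinished streak at the end of the string.
    """
    lengths, current = [], 0
    for g in games:
        if g == "1":
            current += 1
        else:
            lengths.append(current)  # a hitless game closes the current streak
            current = 0
    return lengths, current

lengths, unfinished = split_streaks(career)
print(lengths, unfinished)  # prints [2, 1, 6, 0, 2] 0
```

Under this reading the number of streaks equals the number of hitless games, which is why a 3200-game career offers far fewer than 3143 separate chances.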
 
  • #12
Hi @Thecla:

I hope you will find this of some interest.

Using a spreadsheet, I performed a Monte Carlo simulation of 10,000 careers, each of 3200 games, each game with a probability of 0.7599 of at least one hit in that game. I saw two careers with a continuous series of games, each with at least one hit, in which the series was equal to or greater than 56 games. One was 56 games, and the other 58 games.
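
For anyone who wants to repeat this kind of experiment outside a spreadsheet, a minimal Python sketch follows. It is not the spreadsheet that was actually used. It assumes independent games with a fixed per-game probability, and the function names and the seed are illustrative.

```python
import random

def longest_streak(n_games, p, rng):
    """Longest run of consecutive hitting games in one simulated career."""
    best = current = 0
    for _ in range(n_games):
        if rng.random() < p:  # this game has at least one hit
            current += 1
            if current > best:
                best = current
        else:                 # a hitless game resets the streak
            current = 0
    return best

def count_careers(n_careers, n_games, p, target, seed=0):
    """Number of simulated careers whose longest streak is at least `target` games."""
    rng = random.Random(seed)
    return sum(longest_streak(n_games, p, rng) >= target for _ in range(n_careers))
```

Calling `count_careers(10_000, 3200, 0.7599, 56)` mirrors the setup described above. That is 32 million random draws, so it takes a while in pure Python, and with an event this rare the count will vary noticeably from seed to seed.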

I think that there is a possible psychological element that statistics do not consider. After a batter has begun a long streak, pitchers may well change their style of pitching, especially in the hitter's last at-bat of a game. It would be interesting to know, during a streak of say more than 30 games, the relative frequency of a hit in the last at-bat compared with other at-bats.

Regards,
Buzz
 
  • #13
Buzz Bloom said:
Hi @Thecla:

I hope you will find this of some interest.

Using a spreadsheet, I performed a Monte Carlo simulation of 10,000 careers, each of 3200 games, each game with a probability of 0.7599 of at least one hit in that game. I saw two careers with a continuous series of games, each with at least one hit, in which the series was equal to or greater than 56 games. One was 56 games, and the other 58 games.

To back this up with my crude estimate: we have a series of binomial trials with success probability ##0.76##. The average length of a run of games is ##4.16## games (including the failure). In general it is ##1/(\text{probability of failure})##.

Roughly, each batter gets ##3200/4.16## runs of games in a career, which is roughly the number of chances he has to break the record. I reduced this to ##3144/4.16 = 756##, as any run that starts too late in his career cannot break the record.

The probability of getting a run of 57 or more games is about ##1.6 \times 10^{-7}##.

Finally, therefore, the probability of breaking the record in a career with about ##756## attempts at it is roughly ##756## times the above probability. This gives a probability of ##0.000121##.

This is roughly once in every ##8300## careers.

PS the estimate of equalling or breaking the record (i.e. getting 56 or more games) is once every ##6300## careers.
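
The steps of this crude estimate can be followed in a short Python sketch. It uses the rounded per-game probability 0.76 and the 3144-game cutoff from the post. Treating each run as one independent attempt is the post's approximation, not an exact method, and the variable names are illustrative.

```python
p = 0.76                    # per-game probability of a hit, rounded as in the post
mean_run = 1 / (1 - p)      # average run length, including the closing hitless game
attempts = 3144 / mean_run  # runs that start early enough to reach 57 games

p_break = attempts * p**57  # break the record: a run of 57 or more games
p_equal = attempts * p**56  # equal or break it: a run of 56 or more games

print(f"{mean_run:.2f} {attempts:.0f}")
print(f"{p_break:.6f} about 1 in {1 / p_break:.0f}")
print(f"{p_equal:.6f} about 1 in {1 / p_equal:.0f}")
```

The results land close to the rounded figures quoted above (about 1 in 8300 and 1 in 6300). Small differences come from rounding ##p## and the intermediate values.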
 

1. What is the probability of breaking a 56 game hitting streak record?

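The probability of breaking the 56 game hitting streak record is extremely low. A streak that long has occurred only once in the history of Major League Baseball, when Joe DiMaggio set the record in 1941. Under the simplifying assumptions used in this thread (a .300 hitter, 4 at-bats per game, a 3200 game career), the chance of a 57 game streak at some point in a career is estimated at about 0.00012, or roughly 1 in 8,300.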

2. How do you calculate the probability of breaking a 56 game hitting streak record?

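The probability that one particular run of games all contain a hit is found by multiplying the individual per-game probabilities. For example, if a player has a 25% chance of getting a hit in each game, the probability that a given 56 consecutive games all contain a hit is 0.25^56, which is about 2 x 10^-34. To get the probability of such a streak occurring somewhere in a career, this per-run probability must then be combined with the number of opportunities, for example through the union bound or a recurrence relation as discussed in the thread.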

3. What factors affect the probability of breaking a 56 game hitting streak record?

There are several factors that can affect the probability of breaking a 56 game hitting streak record. These include the skill level of the player, the quality of opposing pitchers, injuries, and luck. Additionally, as the streak continues, the pressure and media attention can also impact a player's performance.

4. Has anyone come close to breaking the 56 game hitting streak record since Joe DiMaggio?

Yes, there have been a few players who have come close to breaking the 56 game hitting streak record since Joe DiMaggio. In 1978, Pete Rose had a 44 game hitting streak, and in 1987, Paul Molitor had a 39 game hitting streak. However, no one has been able to surpass DiMaggio's record.

5. Is it possible for a player to break the 56 game hitting streak record in the modern era?

While it is not impossible for a player to break the 56 game hitting streak record in the modern era, it is highly unlikely. The level of competition and specialization in pitching has increased significantly since DiMaggio's record was set in 1941. Additionally, with the advancements in technology and analytics, opposing teams can better strategize and prepare for a player on a hitting streak. This makes it even more challenging for a player to maintain a long hitting streak.
