Improving Tennis Analysis: Incorporating Three Outcomes and Serve Probabilities

Wes Turner · Aug 26, 2018

I am trying to analyze the game of tennis. I started very simply by analyzing just the ground strokes, ignoring the serve, and calculating the odds of winning a single point given the odds of returning each ground stroke.

I got a surprising (to me) result showing that the player who goes first always has a lower probability of winning the point than the opponent. My derivations and results are in the attached PDF. I would appreciate any critical comments.

Are my formulas correct?

Thanks

PeroK · Aug 26, 2018

If I understand what you are doing: you have a game where the players take turns to hit a ball. If it lands in, the point continues. If it lands out the player loses the point.

If the players have an equal chance on every stroke of hitting the ball in, then the player to go first will lose more often than win.

That would seem to be obvious, since there's nothing to be gained by going first. You can't put your opponent under pressure or influence their stroke in any way, under your assumptions. All you can do is lose the point. The best you can hope for is to get close to 50%.

Wes Turner · Aug 26, 2018

It may be obvious to you, but it wasn't to me and it goes against my on-court experience. So either my perception of my on-court experience is wrong or my assumptions are incorrect or incomplete.

One thing that came out of this analysis that I hadn't thought about before is that the player who goes first is also the one with the first chance to lose. I think that's part of what you are saying, and I think it's a useful way to look at the game.

It's not accurate to say that the players cannot put pressure on each other. They can and do. I don't think that falls out of my assumptions. Some players try to hit "winners" on almost every shot. If they succeed (that is, get the ball in), they usually win the point, but they also make more errors than less aggressive players. Even cautious players try to hit the ball where it is more difficult to return. My analysis is based on averages. Aggressive players win more points outright, but they also hit more losers; cautious players hit fewer winners, but also fewer losers.

This simple analysis does not accurately reflect the actual game of tennis because it does not include the serve, which is how tennis points really start. I plan to expand the analysis to include percentages for the serve and serve return, which could change the odds. The serve is the one shot where the server has complete control. In the pros, most players "hold serve" most of the time. This is partly because they win points outright with serves that are not returned, and partly because a weak return gives the server an advantage on the first ground stroke. But at our level of play, I would wager that the serve stats are not that much different from those for the ground strokes.

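I have uploaded the spreadsheet that I used to make these calculations. It shows that you are correct that the best one can do is get close to 50%. Here are some numbers (PGS-A and PGS-B are the percentages of ground strokes that players A and B get in; PPA-A and PPA-B are each player's probability of winning the point, with A hitting first):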

Code:

 PGS-A   PGS-B   PPA-A   PPA-B
  50%     50%     33%     67%
  60%     60%     38%     63%
  70%     70%     41%     59%
  80%     80%     44%     56%
  90%     90%     47%     53%
  95%     95%     49%     51%

And here's another interesting (to me) artifact. A player whose ground stroke percentage is 50% can never have more than a 50% chance of winning a point in which they hit first, because they lose half of those points on their first hit. Here are some more numbers showing the difference in return rates needed for Player A to have more than a 50% chance of winning the point.

Code:

 PGS-A   PGS-B   PPA-A   PPA-B
  50%      2%     49%     51%
  60%     30%     51%     49%
  70%     55%     51%     49%
  80%     74%     51%     49%
  90%     88%     52%     48%
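As a consistency check, both tables can be regenerated from the closed form PeroK gives later in the thread, ##P_A = \frac{p_A(1-p_B)}{1-p_A p_B}##, where A hits first. This is a minimal sketch that assumes PGS is the per-shot probability of getting the ball in and PPA is the probability of winning the point; the function name `point_prob_first` is made up for this example. Printed values should agree with the tables to within rounding (for example, the 60%/60% row comes out at exactly 37.5%/62.5%).

```python
def point_prob_first(p_a: float, p_b: float) -> float:
    """Probability that player A, who hits first, wins the rally.

    Assumes each shot independently lands in with a fixed probability
    (p_a for A, p_b for B), a miss loses the point at once,
    and p_a * p_b < 1.
    """
    return p_a * (1 - p_b) / (1 - p_a * p_b)


# (PGS-A, PGS-B) pairs copied from the two tables above
table_1 = [(0.50, 0.50), (0.60, 0.60), (0.70, 0.70),
           (0.80, 0.80), (0.90, 0.90), (0.95, 0.95)]
table_2 = [(0.50, 0.02), (0.60, 0.30), (0.70, 0.55),
           (0.80, 0.74), (0.90, 0.88)]

for rows in (table_1, table_2):
    for p_a, p_b in rows:
        ppa_a = point_prob_first(p_a, p_b)
        # columns: PGS-A, PGS-B, PPA-A, PPA-B
        print(f"{p_a:6.0%} {p_b:6.0%} {ppa_a:6.1%} {1 - ppa_a:6.1%}")
    print()
```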

Did you check my equations to see if they are correct?

PeroK · Aug 26, 2018

Wes Turner said:

It may be obvious to you, but it wasn't to me and it goes against my on-court experience. So either my perception of my on-court experience is wrong or my assumptions are incorrect or incomplete.

The assumptions are certainly inadequate to bear any relation to on-court experience. Your assumptions are more in line with, say, a game where each player takes turns to serve and the first to serve a fault loses. That would fit your model. Or, where each player tosses a coin and the first to toss a tail loses. In all these cases, going first is clearly a disadvantage.

But, each shot in a rally may be different from the previous one, so there is no obvious way to analyse the problem along these lines. Each player must choose between a low risk shot and a high risk shot each time etc.

Wes Turner said:

I have uploaded the spreadsheet that I used to make these calculations. It shows that you are correct that the best one can do is get close to 50%.

There is a quick way to do this for the simple situation where each player takes turns and each is equally likely to fail on each attempt:

Let ##p## be the probability of success on any one shot and ##P## be the probability that the player going first wins.

The first shot has two possible outcomes: success (with probability ##p##) or failure (with probability ##1-p##). In order for the first player to win the game they must be successful on their first shot. If they are, then the second player is left in an identical position, from which their chance of winning must be ##P##. This gives:

##P = p(1-P)##

Which can be rearranged to give:

##P = \frac{p}{1+p}##

For example, if ##p = 1/2## then ##P = 1/3##.
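For anyone who prefers to check the formula empirically, here is a small Monte Carlo sketch (the function names are hypothetical). It plays many rallies of the turn-taking game under the same assumptions, where both players get each shot in with probability ##p## and a miss loses the point, then compares the first player's winning fraction with ##p/(1+p)##.

```python
import random


def simulate_rally(p: float, rng: random.Random) -> bool:
    """Play one rally and return True if the player who hits first wins.

    Both players get each shot in with the same probability p (0 <= p < 1);
    the first player to miss loses the point.
    """
    first_player_hitting = True
    while True:
        if rng.random() >= p:                # this shot misses...
            return not first_player_hitting  # ...so the hitter loses
        first_player_hitting = not first_player_hitting  # other player's turn


def estimate_first_player_win(p: float, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated rallies won by the player who hits first."""
    rng = random.Random(seed)
    wins = sum(simulate_rally(p, rng) for _ in range(trials))
    return wins / trials


for p in (0.5, 0.7, 0.9):
    print(p, round(estimate_first_player_win(p), 3), round(p / (1 + p), 3))
```

The simulated and exact columns should differ only by sampling noise, which shrinks as `trials` grows.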

Wes Turner · Aug 26, 2018

PeroK said:

The assumptions are certainly inadequate to bear any relation to on-court experience.

Isn't that just a tad harsh? Surely they bear some relation.

Your assumptions are more in line with, say, a game where each player takes turns to serve and the first to serve a fault loses. That would fit your model. Or, where each player tosses a coin and the first to toss a tail loses. In all these cases, going first is clearly a disadvantage.

But, each shot in a rally may be different from the previous one, so there is no obvious way to analyze the problem along these lines. Each player must choose between a low risk shot and a high risk shot each time etc.

Yes, of course, in a real game no two shots are exactly the same and players will take more chances in some situations than others. But averages are still of some use in analyzing what makes a difference in the play. No two games in any sport are exactly the same, yet the odds makers still publish and use averages. A .250 hitter will, on average, get a hit every 4th time at bat even though the pitchers and the pitches vary dramatically as do the situations. But a .300 hitter will always get paid more than a .250 hitter, everything else equal, because, on average, they will get more hits.

My goal is to get a general sense of the impact on a player's overall success (winning points, games, sets, and matches) by improving their success rate at returning each ball. If a player improves their return rate by 5%, what does that do to their odds of winning that point, that game, that set, and that match? For that analysis, I think working with averages has merit. It's not perfect, but I think it has validity.

I posted here mainly to check my preliminary equations. So far, neither you nor anyone else has commented on that.

Once I confirm that these equations are correct, I plan to expand them to include parameters for first and second serve and the returns, with the ability to assign different average probabilities for each. That will allow me to calculate expected odds of winning a point, game, set, and match.
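To get a feel for the point-to-game step, here is a rough sketch. It assumes, as a simplification, that player A hits first on every point of a game and that points are independent with the same winning probability. It uses the rally-only closed form that PeroK gives later in the thread for the point, and the standard formula for winning a game with deuce (first to four points, win by two). The function names and the 80%/85% inputs are hypothetical, and "improves by 5%" is read as five percentage points.

```python
def point_prob_first(p_a: float, p_b: float) -> float:
    """Chance that A, who hits first, wins the point in the rally-only model.

    Assumes p_a * p_b < 1.
    """
    return p_a * (1 - p_b) / (1 - p_a * p_b)


def game_prob(w: float) -> float:
    """Chance of winning a standard game (first to 4 points, win by 2)
    when every point is won independently with probability w."""
    q = 1 - w
    before_deuce = w**4 * (1 + 4 * q + 10 * q**2)  # win 4-0, 4-1 or 4-2
    reach_deuce = 20 * w**3 * q**3                  # 3-3 after six points
    win_from_deuce = w**2 / (w**2 + q**2)           # first to lead by two
    return before_deuce + reach_deuce * win_from_deuce


# Hypothetical numbers: B keeps 80% of shots in; A improves from 80% to 85%.
for p_a in (0.80, 0.85):
    w = point_prob_first(p_a, 0.80)
    print(f"p_A = {p_a:.2f}: point {w:.3f}, game {game_prob(w):.3f}")
```

With these made-up inputs the point probability rises from about 0.44 to about 0.53, while the game probability rises from about 0.36 to about 0.58, which shows how a small per-shot change gets amplified over a game.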

There is a quick way to do this for the simple situation where each player takes turns and each is equally likely to fail on each attempt:

Let ##p## be the probability of success on any one shot and ##P## be the probability that the player going first wins.

The first shot has two possible outcomes: success (with probability ##p##) or failure (with probability ##1-p##). In order for the first player to win the game they must be successful on their first shot. If they are, then the second player is left in an identical position, from which their chance of winning must be ##P##. This gives:

##P = p(1-P)##

Which can be rearranged to give:

##P = \frac{p}{1+p}##

For example, if ##p = 1/2## then ##P = 1/3##.

That agrees with my results for the special case where both players have the same return rates. But I don't follow the logic that got you to

##P = p(1-P)##

How did you avoid having to sum the infinite series? Maybe my math skills are not up to this task.

In any case, I am looking for the more general case where the players are not at the same skill level. That's why I used the infinite series, as explained in the PDF I attached.

Is there a similar simple approach to the case where p1 is the probability of success on any one shot for player 1 and p2 is the probability for player 2?

mfb · Aug 26, 2018

The first two shots have completely different success rates (you can even repeat the first one once!). I would expect that even the third and fourth ones differ.

Wes Turner said:

How did you avoid having to sum the infinite series?

By using the symmetry. If the first shot was successful (and if we assume every shot is the same), then the second player is now in the position the first player started in, so they must win with probability P. That means player 1 wins with probability 1-P in this case. To win, player 1 must make their first shot (probability p) and then get this 1-P chance. Therefore their winning probability is p(1-P). But it is P by definition, so P = p(1-P).
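To see that the symmetry shortcut gives the same answer as the infinite series, here is a small sketch (the function name is hypothetical) that adds up the series term by term for the equal-skill case. The first player wins when the first ##2k+1## shots all land in and the second player then misses shot ##2k+2##, so ##P = \sum_{k \ge 0} p^{2k+1}(1-p) = \frac{p(1-p)}{1-p^2} = \frac{p}{1+p}##. The partial sums approach that value.

```python
def series_partial_sum(p: float, n_terms: int) -> float:
    """Sum of the first n_terms of P = sum_k p**(2k+1) * (1 - p).

    Term k is the chance that the first 2k+1 shots land in and the
    second player then misses shot 2k+2.
    """
    return sum(p ** (2 * k + 1) * (1 - p) for k in range(n_terms))


p = 0.7
for n in (1, 2, 5, 10, 50):
    print(n, series_partial_sum(p, n))
print("closed form:", p / (1 + p))
```

The recursion ##P = p(1-P)## gets the limit directly, without ever writing down the terms.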

PeroK · Aug 27, 2018

Wes Turner said:

In any case, I am looking for the more general case where the players are not at the same skill level. That's why I used the infinite series, as explained in the PDF I attached.

Is there a similar simple approach to the case where p1 is the probability of success on any one shot for player 1 and p2 is the probability for player 2?

Yes:

##P_1 = p_1(1 - P_2)##

Where ##P_1## is the probability of player 1 winning the game (if they start) and ##P_2## is the probability of player 2 winning the game if they start. With ##p_1, p_2## the probabilities of success on any shot.

And, again by symmetry:

##P_2 = p_2(1 - P_1)##

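Substituting the second equation into the first gives ##P_1 = p_1(1 - p_2 + p_2P_1)##, that is, ##P_1(1 - p_1p_2) = p_1(1-p_2)##, so: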

##P_1 = \frac{p_1(1-p_2)}{1-p_1p_2}##

Note (sanity check): if ##p_1 = p_2## then this reduces to the previous result.

Note also that this is not, in my opinion, of any use for tennis, but is valid for other games where the players take turns at a non-changing task, with different skill levels (or different odds of success).

PS you can do these by infinite sums, but the recursive shortcut is a neat trick in this type of problem.

PeroK · Aug 27, 2018

Wes Turner said:

My goal is to get a general sense of the impact on a player's overall success (winning points, games, sets, and matches) by improving their success rate at returning each ball. If a player improves their return rate by 5%, what does that do to their odds of winning that point, that game, that set, and that match? For that analysis, I think working with averages has merit. It's not perfect, but I think it has validity.

Let's take the serve as an example. I start with a pit-pat serve that you hit past me for a winner every time. I say to my coach: "I'm getting 100% of first serves in, but I'm losing every point." The answer is that I'm not hitting my first serve hard enough.

So, I serve harder and get only 50% of first serves in. But, I win 80% of those points. Now, I'm winning more points with a lower first serve percentage.

It's really what you do with the ball (not just getting it into court) that counts.

PeroK · Aug 27, 2018

@Wes Turner Here's an older thread that might interest you:

https://www.physicsforums.com/threads/easy-probability-problem.768127/#post-4836431

Wes Turner · Aug 29, 2018

PeroK said:

Yes:

##P_1 = p_1(1 - P_2)##

Where ##P_1## is the probability of player 1 winning the game (if they start) and ##P_2## is the probability of player 2 winning the game if they start. With ##p_1, p_2## the probabilities of success on any shot.

And, again by symmetry:

##P_2 = p_2(1 - P_1)##

Putting these together gives:

##P_1 = \frac{p_1(1-p_2)}{1-p_1p_2}##

Note (sanity check): if ##p_1 = p_2## then this reduces to the previous result.

That's the same result I got using infinite sums, but I just can't follow your symmetry path. Is there someplace you suggest I go read up?

Note also that this is not, in my opinion, of any use for tennis, but is valid for other games where the players take turns at a non-changing task, with different skill levels (or different odds of success).

Yes, you have said that several times. And I have replied each time that this is just the first step in a much larger effort to approximate the impact of improving one player's efficiency at getting the ball in the court, everything else held constant. If I can figure this step out, I'll add the rest, and then maybe you will agree with me that it is more than "not of any use".

PS you can do these by infinite sums, but the recursive shortcut is a neat trick in this type of problem.

Clearly, if only I could comprehend it.

Wes Turner · Aug 29, 2018

PeroK said:

Let's take the serve as an example. I start with a pit-pat serve that you hit past me for a winner every time. I say to my coach: "I'm getting 100% of first serves in, but I'm losing every point." The answer is that I'm not hitting my first serve hard enough.

This is a pathological and essentially impossible scenario. But suppose you got your weak serve in 95% of the time, I hit a winner off of it 95% of the time, and there were some odds for the returns after that. In my final (completed) analysis, I would have the 95% for your pit-pat serve, the 95% for my return, and the odds for the returns after that as parameters. Then you could change your serve percentage parameter from 95% to 96% and see, everything else the same, what difference that would make. Or you could lower it to 90% and also lower the return percentage and see what that does.

So, I serve harder and get only 50% of first serves in. But, I win 80% of those points. Now, I'm winning more points with a lower first serve percentage.

It's really what you do with the ball (not just getting it into court) that counts.

Yes, of course. But is it better to get 50% in and win 80% or get 60% in and win 75%? If I can get this analysis done, you should be able to make that comparison.
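One way to make that comparison, as a rough sketch rather than a full serve model: if a fraction ##f## of first serves go in, the server wins a fraction ##w## of those points, and the server wins a fraction ##s## of the points that go to a second serve (double faults included), then the server wins ##f w + (1-f) s## of service points. The thread gives no value for ##s##, so the sketch tries a few hypothetical ones; the function name is also made up.

```python
def service_points_won(first_in: float, win_on_first: float, win_on_second: float) -> float:
    """Fraction of service points won.

    first_in      -- fraction of first serves that land in
    win_on_first  -- fraction of points won when the first serve is in
    win_on_second -- fraction of points won when a second serve is needed
                     (double faults included); a hypothetical input here
    """
    return first_in * win_on_first + (1 - first_in) * win_on_second


for s in (0.40, 0.50, 0.60):
    a = service_points_won(0.50, 0.80, s)   # 50% in, win 80% of those
    b = service_points_won(0.60, 0.75, s)   # 60% in, win 75% of those
    print(f"s = {s:.2f}   50%/80%: {a:.3f}   60%/75%: {b:.3f}")
```

Under these assumptions the two options break even at ##s = 0.5##: ##0.40 + 0.5s = 0.45 + 0.4s## gives ##s = 0.5##, so the 60%/75% serve comes out ahead whenever the server wins fewer than half of the second-serve points.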

Again, these are averages, so actual play will differ, but over the hundreds of shots in a tennis match, I think they will indicate something useful.

So, if you will just be a little patient and help me get the math right, I think I will be able to show you some value. Deal?

PeroK · Aug 29, 2018

I'm not sure how much I can help. If you want to increase the complexity you may be better with a computer model.

Anyway, take some time out to watch the US Open.

Wes Turner · Aug 29, 2018

PeroK said:

I'm not sure how much I can help. If you want to increase the complexity you may be better with a computer model.

You have already been very helpful. As I improve my model, I'll post more questions. If you have something to contribute, I'll be happy to have it.

Anyway, take some time out to watch the US Open.

Ok

PeroK · Aug 29, 2018

Wes Turner said:

That's the same result I got using infinite sums, but I just can't follow your symmetry path. Is there someplace you suggest I go read up?

Clearly, if only I could comprehend it.

I'm going to use ##p,q## for the probabilities to save all those subscripts.

If we look at the first two shots, then we have three outcomes:

The first player loses on the first shot, with probability ##1-p##.

The first player wins if the second player misses their first shot, with probability ##p(1-q)##.

Both players make their first shot, with probability ##pq##.

In this last case we are back to the initial situation. The first player has the same probability of winning from here as he did initially. If this probability is ##P##, then we have:

##P = p(1-q) + pqP##

Which we can easily solve for ##P##: ##P(1 - pq) = p(1-q)##, so ##P = \frac{p(1-q)}{1-pq}##, the same result as before.

This avoids the need to do an infinite sum. And is even neater!

russ_watters · Aug 29, 2018

Wes Turner said:

Isn't that just a tad harsh? Surely they bear some relation.

You pretty much pulled the assumptions out of thin air even while knowing they go against on-court experience and you know the resulting answer goes against on-court experience. I have to ask: why haven't you pulled actual data to find out what the real probabilities are? The US Open is going on right now. Pull some stats!

...also, I only skimmed, but I see you defined terms for serve and return percentages, but then didn't use them? They are the most critical factors in winning percentage. Heck, even the following groundstroke winning percentages are influenced by the serve.

Dr. Courtney · Aug 30, 2018

Your model emphasizes avoiding unforced errors rather than winners. An unforced error is a ball a player gets a racquet on with a good chance to return it, but hits it out or into the net. A winner is an offensive shot placed so well that the opponent can't touch it (or can only hit it with the frame rather than the strings).

I'm an amateur tennis enthusiast and local Senior Golden Games champ in my age bracket. Competing in the state Senior Olympics next month. Doing well there would qualify me for the national Senior Olympics next June. In any case, my style of play is what is most commonly called being a "pusher." Just trying to get everything back against most opponents while waiting for them to make the first unforced error. At my level of recreational play, this can be a very effective strategy, but it does tend to lead to long points and lots of running around the court chasing the ball down. Against a lot of opponents, my fitness really helps. (I bike 2000 miles a year.) Most of my matches are won by making few unforced errors. Highlight reel winners are rare events.

But a few weeks ago, I played a gentleman who was an even better and more committed pusher than I was. The only chance I had was to create offense, so I devised a strategy to wait for a good approach shot and then go to the net to try and force an error or hit a winner. This dude had a great lob, so I lost the first match, but made some adjustments and beat him the following week. Your model would work very poorly for matches like this where winners are a large component.

Take a gander at a few US Open matches this week. After each set, they have a graphic showing (among other vital stats) the number of winners and the number of unforced errors each player had in the set. See: https://www.eurosport.com/tennis/roger-federer-v-marin-cilic-match-stats_sto6503082/story.shtml

bhobba · Aug 30, 2018

Interesting thread. Tennis and cricket are my favorite watching sports - table tennis was my favorite playing sport before arthritis stopped me.

Just a bit of humor (with maybe some serious stuff thrown in) - analysing tennis is interesting - but can you analyse Kyrgios - if so Tennis Australia will likely pay you a fortune. Player psychology seems one of the most important factors, if not the most important factor. Kyrgios doesn't even seem to care when playing lesser players, in fact he seems to play tennis just for fun - he is not really in love with Tennis - it's just something he, when hot, is very good at. Kind of reminds me of another great Australian player - Lew Hoad. He couldn't care less about playing lesser players but brought his A game out when playing someone he thought worth it such as his good friend Pancho Gonzales who said of Lew - 'He was the only guy who, if I was playing my best tennis, could still beat me'. Gonzales was well known not to be - how to put it - the friendly type - but he really liked Lew - everybody did - for Lew having good friends was more important than winning. When matched against someone who he thought worthy of his talent Lew gave his best - not because he enjoyed humbling opponents - only because it was unworthy of him as their friend not to give his best.

Thanks
Bill

bhobba · Aug 30, 2018

Dr. Courtney said:

In any case, my style of play is what is most commonly called being a "pusher."

So was I - the other characteristic of a pusher, and this was me for sure - they do not see the court as side to side but up and down. I would try to get to the mid court. If a player stayed on the baseline I would try and drop them or come in and do a drop volley - if they came in I would try and lob them. Like all pushers my nemesis was a true serve volleyer - they made me look sick - it was humiliating.

Seriously, tactics vary with the type of player you are. Best to be an all-court player - they have so many different strategies they can employ. They can be the unpredictable player - serving and volleying or staying back totally unpredictably. They can play percentage tennis. They can be the chameleon - the player your opponent would least like to see, e.g. for a serve-volleyer the best strategy is to serve volley and take the net away from them - they hate that.

Thanks
Bill

Orodruin · Aug 30, 2018

As has already been indicated, the first step in making a more realistic model would be to include three possible outcomes: winner, return, and unforced error. That should already make the model a lot better, even if in reality there is a continuum of possible shots. The next step would likely be to include different probabilities for the serve, since that is the most impactful shot and the server has complete control.
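A minimal sketch of that three-outcome version, using the same recursive trick as earlier in the thread. It assumes each shot by player ##i## is independently a winner (probability ##w_i##), a ball kept in play (##r_i##), or an error (##e_i = 1 - w_i - r_i##), with fixed per-player values. If A hits first, ##P_A = w_A + r_A(1 - P_B)## and ##P_B = w_B + r_B(1 - P_A)##, which solve to ##P_A = \frac{w_A + r_A e_B}{1 - r_A r_B}##. Setting ##w_A = w_B = 0## recovers ##\frac{p_1(1-p_2)}{1-p_1p_2}##. The function name and the example numbers are hypothetical, and every shot is still treated alike, so the serve would need its own parameters on top of this.

```python
def rally_win_prob(w_a: float, r_a: float, w_b: float, r_b: float) -> float:
    """Probability that A, who hits first, wins the point.

    Each shot is independently a winner (w), kept in play (r), or an
    error (1 - w - r), with fixed per-player probabilities.
    """
    e_a, e_b = 1 - w_a - r_a, 1 - w_b - r_b
    if min(w_a, r_a, e_a, w_b, r_b, e_b) < -1e-12:
        raise ValueError("each player's w and r must be >= 0 and sum to <= 1")
    if r_a * r_b >= 1:
        raise ValueError("the rally never ends if both players always keep the ball in play")
    # Solution of P_A = w_A + r_A * (1 - P_B) and P_B = w_B + r_B * (1 - P_A)
    return (w_a + r_a * e_b) / (1 - r_a * r_b)


# Hypothetical "aggressive" A against a steadier B
print(rally_win_prob(w_a=0.10, r_a=0.70, w_b=0.02, r_b=0.88))

# With no winners this reduces to p_A(1 - p_B) / (1 - p_A p_B)
print(rally_win_prob(0.0, 0.8, 0.0, 0.8), 0.8 / 1.8)
```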

As it stands, your model goes clearly against the most basic experience and data and in my book that makes it useless unless you refine it.
