Prove Player A Has Higher Batting Avg Than B for Entire Season

  • Thread starter: bigplanet401
SUMMARY

The discussion centers on proving whether Player A has a higher batting average than Player B for the entire baseball season, given that Player A has higher averages in both halves of the season. Participants analyze the problem using weighted averages based on the number of at-bats and batting averages for each half. A counterexample is provided, demonstrating that despite Player A's higher averages in both halves, Player B can still achieve a higher overall average due to differing at-bat counts, illustrating the complexities of comparing averages across different sample sizes.

PREREQUISITES
  • Understanding of arithmetic mean and weighted averages.
  • Familiarity with baseball statistics, specifically batting averages.
  • Basic algebra skills for manipulating inequalities.
  • Knowledge of Simpson's Paradox and its implications in statistics.
NEXT STEPS
  • Study the concept of Simpson's Paradox in detail.
  • Learn how to calculate weighted averages in statistical analysis.
  • Explore examples of counterexamples in statistical proofs.
  • Investigate the impact of sample size on statistical conclusions.
USEFUL FOR

Statisticians, baseball analysts, mathematics students, and anyone interested in understanding the implications of averages in comparative statistics.

bigplanet401

Homework Statement


Player A has a higher batting average than player B for the first half of the baseball season. Player A also has a higher batting average than player B for the second half of the season. Prove, or disprove, that player A has a higher batting average than player B for the entire season.

Homework Equations



Arithmetic mean

The Attempt at a Solution



Let rA, rB be the batting averages of A and B, respectively, in the first half of the season (and rA', rB' in the second half). I tried to compare the overall averages by taking a weighted average of each player's performance in the first and second halves (nA and nB are the number of balls hit in the first half, primes for the second half):

$$
\frac{n_A r_A + n^\prime_A r^\prime_A}{n_A + n^\prime_A} \lessgtr \frac{n_B r_B + n^\prime_B r^\prime_B}{n_B + n^\prime_B}
$$

But then I get lost when I try to find something that has rA and rB on one side of the inequality sign. The algebra seems to get very tedious and I'm wondering if I'm on the right track.
 
You have generalised the question to the point where it is no longer true. E.g. scores (0; 8, 8) produce averages (0; 8), while scores (1, 1; 9) produce averages (1; 9). Take the 'half season' more literally.
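
Those numbers can be checked exactly. The sketch below assumes the semicolon separates the two halves of the season, and it labels the first set of scores player B and the second set player A (my labels, not part of the problem statement).

```python
from fractions import Fraction

def mean(scores):
    """Exact arithmetic mean of a non-empty list of scores."""
    return Fraction(sum(scores), len(scores))

# Per-half scores from the post: (first half, second half).
# The names player_a / player_b are assumed labels.
player_b = ([0], [8, 8])
player_a = ([1, 1], [9])

# A is ahead of B in each half taken separately.
for half in (0, 1):
    assert mean(player_a[half]) > mean(player_b[half])

# Pool both halves to get the whole-season mean.
season_a = mean(player_a[0] + player_a[1])
season_b = mean(player_b[0] + player_b[1])
print(season_a, season_b, season_a < season_b)  # 11/3 16/3 True
```

A leads in each half, yet the pooled mean is 11/3 for A against 16/3 for B, because the two players' halves carry different numbers of scores.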
 
I'm not sure I understand. Do you mean take nA = n'A and nB = n'B? The number of pitches (the n's here) might be different between seasons (more balls were pitched) and batters.
 
Do you suspect it's true and thus are trying to prove that it's true?

or -

Do you suspect it's false, so that you need a counterexample?
 
bigplanet401 said:
I'm not sure I understand. Do you mean take nA = n'A and nB = n'B? The number of pitches (the n's here) might be different between seasons (more balls were pitched) and batters.
I think I just displayed my ignorance of sports.
If the 'half season' is irrelevant, for the reason you give, then the question is flawed. It should read "prove or disprove that player A necessarily has a higher batting average...". On that basis, my first reply was overly helpful.
 
I'm not sure and am only trying to work through the algebra (if this is the right approach) so that I can make the right deduction. Right now this looks like an algebraic mess, which makes me believe there's a simpler way.
 
bigplanet401 said:
I'm not sure and am only trying to work through the algebra (if this is the right approach) so that I can make the right deduction. Right now this looks like an algebraic mess, which makes me believe there's a simpler way.
If the claim is true then your algebraic approach should be the way to go. If you cannot see how to proceed, maybe it's because the claim is false. Time to look for a counterexample.
(With such questions, I generally alternate between looking for a proof that it's true and looking for a proof that it's not true. Dead ends in one can lead to insights into how to proceed with the other.)
 
Okay...I think I might have a solution.

Suppose A has the higher overall average. Then

$$
\frac{n_A r_A + n^\prime_A r^\prime_A}{n_A + n^\prime_A} > \frac{n_B r_B + n^\prime_B r^\prime_B}{n_B + n^\prime_B}
$$

Now suppose, without loss of generality, that B's performance was at its best in the second half and that A's was at its worst in the first half. If B had a very large number of at-bats in the second half, and A a very large number in the first half, then the worst A could have done is r_A and the best B could have done is r'_B, and we have to ask whether

$$
r_A > r^\prime_B \qquad (*)
$$

We're only told that r_A > r_B and r'_A > r'_B, though. What if r_A = 0.200 and r'_B = 0.250? We can just say r_B = 0.175 and r'_A = 0.300 and still keep these assumptions. So (*) fails and we can't conclude that A has the higher average for the season.
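
To make the "very large number of at-bats" step explicit, here is a sketch of the limiting case, assuming the count for the other half stays fixed while one count grows without bound:

```latex
\lim_{n_A \to \infty} \frac{n_A r_A + n^\prime_A r^\prime_A}{n_A + n^\prime_A} = r_A ,
\qquad
\lim_{n^\prime_B \to \infty} \frac{n_B r_B + n^\prime_B r^\prime_B}{n_B + n^\prime_B} = r^\prime_B
```

So with lopsided enough counts the season comparison reduces to (*), which the two given inequalities do not guarantee.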
 
As every sports commentator who ever lived would dispute your suggestion that this is false, can we have a clear counterexample with all the relevant numbers?

Hint: why not have ##n_A = 1## and ##n'_B = 1##?
 
  • #10
Here is one example that meets the initial assumptions but shows why B wins out in the end:

nA = 100, n'A = 5; nB = 5, n'B = 100

rA = .200, r'A = .300; rB = .175, r'B = .250

$$
\frac{20 + 1.5}{105} \overset{?}{>} \frac{0.875 + 25}{105}
$$

$$
21.5 \overset{?}{>} 25.875 \qquad \times
$$
 
  • #11
Would that be a surprise for most baseball fans, do you think?
 
  • #12
Why didn't I notice this earlier? (← rhetorical question to self)

Batting average is: ##\ \displaystyle \frac{\text{number of hits}}{\text{number of (official) at bats}}##.
 
  • #13
bigplanet401 said:
Here is one example that meets the initial assumptions but shows why B wins out in the end:

nA = 100, n'A = 5; nB = 5, n'B = 100

rA = .200, r'A = .300; rB = .175, r'B = .250

$$
\frac{20 + 1.5}{105} \overset{?}{>} \frac{0.875 + 25}{105}
$$

$$
21.5 \overset{?}{>} 25.875 \qquad \times
$$
Perhaps it was too cryptic, but if you reread my post #2 you'll see I gave you a simple counterexample.
 
  • #14
PeroK said:
As every sports commentator who ever lived would dispute your suggestion that this is false, can we have a clear counterexample with all the relevant numbers?

Hint: why not have ##n_A = 1## and ##n'_B = 1##?

What message are you responding to here?
 
  • #15
Ray Vickson said:
What message are you responding to here?
As I read the thread, post #8 offered an algebraic and somewhat handwaving 'disproof'. PeroK's post #9 points out that since many would find the result surprising, it would be rather more persuasive to construct a detailed counterexample.
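
For what it's worth, the numbers in post #10 can be checked exactly. The sketch below treats each n as a count of at-bats and each r as the average for that half. Because those figures imply fractional hits (1.5 and 0.875), it also includes a whole-number variant, which is my own hypothetical example and not taken from the thread.

```python
from fractions import Fraction

def season_average(halves):
    """Season batting average from (at_bats, average) pairs, one pair per half."""
    hits = sum(at_bats * avg for at_bats, avg in halves)
    total_at_bats = sum(at_bats for at_bats, _ in halves)
    return hits / total_at_bats

# Numbers from post #10.
a = [(100, Fraction("0.200")), (5, Fraction("0.300"))]
b = [(5, Fraction("0.175")), (100, Fraction("0.250"))]

# Hypothetical whole-hit variant (assumed figures):
# A goes 20-for-100 then 3-for-10; B goes 1-for-10 then 25-for-100.
a_int = [(100, Fraction(20, 100)), (10, Fraction(3, 10))]
b_int = [(10, Fraction(1, 10)), (100, Fraction(25, 100))]

for name, pa, pb in (("post #10", a, b), ("whole hits", a_int, b_int)):
    # A leads in each half taken separately...
    assert all(ra > rb for (_, ra), (_, rb) in zip(pa, pb))
    # ...but B leads over the whole season.
    print(name, season_average(pa), season_average(pb),
          season_average(pa) < season_average(pb))
```

In both cases A is ahead in each half and behind over the season: roughly .205 against .246 for the post #10 numbers, and 23/110 against 26/110 for the whole-hit variant.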
 
  • #16
SammyS said:
Batting average is: ##\ \displaystyle \frac{\text{number of hits}}{\text{number of (official) at bats}}\ ##.
Maybe that was too subtle, so to be more direct:

If rA, rB, ... are batting averages, and nA, nB, ... are numbers of hits, then products such as ## n_A \cdot r_A## are not very helpful.

If we let mA, mB, ... be the number of 'at bats', then ##\displaystyle r_A=\frac{n_A}{m_A}\ ,##

and ##\displaystyle \frac{m_A r_A + m^\prime_A r^\prime_A}{m_A + m^\prime_A} ## is the full season batting average of player A.
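
Spelling that out, and assuming every official at bat falls in exactly one of the two halves: since ##m_A r_A = n_A## and ##m'_A r'_A = n'_A##,

```latex
\frac{m_A r_A + m^\prime_A r^\prime_A}{m_A + m^\prime_A}
  = \frac{n_A + n^\prime_A}{m_A + m^\prime_A}
  = \frac{\text{total hits}}{\text{total at bats}}
```

so the weights in the weighted average have to be at bats, not hits.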
 
  • #17
This is not true, but is usually discussed in the setting of Simpson's Paradox.
 
  • #18
statdad said:
This is not true, but is usually discussed in the setting of Simpson's Paradox.
There are handy "Quote" and "Reply" features to help give readers a clue as to which particular post you may be responding to.

Beyond that, you could indicate explicitly what it is that your "This" refers to.
 
