# Question on averages

1. Mar 4, 2015

### bigplanet401

1. The problem statement, all variables and given/known data
Player A has a higher batting average than player B for the first half of the baseball season. Player A also has a higher batting average than player B for the second half of the season. Prove, or disprove, that player A has a higher batting average than player B for the entire season.

2. Relevant equations

Arithmetic mean

3. The attempt at a solution

Let rA, rB be the batting averages of A and B, respectively, in the first half of the season (and rA', rB' in the second half). I tried to compare the overall averages by taking a weighted average of each player's performance in the two halves (nA and nB are the number of balls hit in the first half; primes denote the second half):

$$\frac{n_A r_A + n^\prime_A r^\prime_A}{n_A + n^\prime_A} \lessgtr \frac{n_B r_B + n^\prime_B r^\prime_B}{n_B + n^\prime_B}$$

But then I get lost when I try to find something that has rA and rB on one side of the inequality sign. The algebra seems to get very tedious and I'm wondering if I'm on the right track.

2. Mar 4, 2015

### haruspex

You have generalised the question to the point where it is no longer true. E.g. scores (0; 8, 8) produce averages (0; 8), while scores (1, 1; 9) produce averages (1; 9). Take the 'half season' more literally.
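As a quick check of the scores quoted above, here is a minimal Python sketch. The function name `summary` and the use of exact fractions are illustrative choices, not part of the original post; the code only computes each player's half-season means and the mean over all of that player's scores pooled together.

```python
from fractions import Fraction

def summary(first_half, second_half):
    """Return (first-half mean, second-half mean, full-season mean)."""
    def mean(xs):
        return Fraction(sum(xs), len(xs))
    return mean(first_half), mean(second_half), mean(first_half + second_half)

# Scores as quoted in the post: one player has (0; 8, 8), the other (1, 1; 9).
p = summary([0], [8, 8])
q = summary([1, 1], [9])

print("first player :", [float(x) for x in p])
print("second player:", [float(x) for x in q])
```

The second player has the higher mean in each half, yet the first player has the higher mean over the pooled scores (16/3 against 11/3), because the two halves carry different weights for each player.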

3. Mar 4, 2015

### bigplanet401

I'm not sure I understand. Do you mean take nA = n'A and nB = n'B? The number of pitches (the n's here) might differ between the two halves of the season (more balls were pitched in one of them) and between batters.

4. Mar 4, 2015

### SammyS

Staff Emeritus
Do you suspect it's true and thus are trying to prove that it's true?

or -

Do you suspect it's false, so you need a counterexample?

5. Mar 4, 2015

### haruspex

I think I just displayed my ignorance of sports.
If the 'half season' is irrelevant, for the reason you give, then the question is flawed. It should read "prove or disprove that player A necessarily has a higher batting average....". On that basis, my first reply was overhelpful.

6. Mar 4, 2015

### bigplanet401

I'm not sure and am only trying to work through the algebra (if this is the right approach) so that I can make the right deduction. Right now this looks like an algebraic mess, which makes me believe there's a simpler way.

7. Mar 4, 2015

### haruspex

If the claim is true then your algebraic approach should be the way to go. If you cannot see how to proceed, maybe it's because the claim is false. Time to look for a counterexample.
(With such questions, I generally alternate between looking for a proof that it's true and looking for a proof that it's not true. Dead ends in one can lead to insights into how to proceed with the other.)

8. Mar 5, 2015

### bigplanet401

Okay...I think I might have a solution.

Suppose A has the higher overall average. Then

$$\frac{n_Ar_A+n^\prime_A r^\prime_A}{n_A+n^\prime_A} > \frac{n_B r_B+n^\prime_B r^\prime_B}{n_B+n^\prime_B}$$

Now suppose, without loss of generality, that B's performance was his best in the second half of the season, and that A's performance was his worst in the first half. If B had a very large number of at-bats in the second half, and A a very large number in the first half, then the worst A could have done is r_A and the best B could have done is r'_B, and we have to ask if

$$r_A > r^\prime_B \, (*)$$

We're only told that r_A > r_B and r'_A > r'_B, though. What if r_A = 0.200 and r'_B = 0.250? We can just say r_B = 0.175 and r'_A = 0.300 and still keep these assumptions. So (*) breaks and we can't assume that A has the higher average for the season.

9. Mar 5, 2015

### PeroK

As every sports commentator who ever lived would dispute your suggestion that this is false, can we have a clear counterexample with all the relevant numbers?

Hint: why not have $n_A = 1$ and $n'_B = 1$?

10. Mar 5, 2015

### bigplanet401

Here is one example that meets the initial assumptions but shows why B wins out in the end:

nA = 100, n'A = 5; nB = 5, n'B = 100

rA = .200, r'A = .300; rB = .175, r'B = .250

$$\frac{20 + 1.5}{105} \overset{?}{>} \frac{0.875 + 25}{105}$$
$$21.5 \overset{?}{>} 25.875 \qquad \times$$
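The arithmetic above can be reproduced with a short Python sketch. The helper name `season_average` is an assumption made for illustration; it implements the weighted average from the first post, with the n's as weights, and uses exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction as F

def season_average(n1, r1, n2, r2):
    """Weighted average of half-season averages r1, r2 with weights n1, n2."""
    return (n1 * r1 + n2 * r2) / (n1 + n2)

# Numbers from the post: A is ahead of B in each half.
a = season_average(100, F("0.200"), 5, F("0.300"))
b = season_average(5, F("0.175"), 100, F("0.250"))

print("A:", float(a))
print("B:", float(b))
print("A ahead over the full season?", a > b)
```

A is ahead in both halves (.200 > .175 and .300 > .250), but the numerators 21.5 and 25.875 over the common denominator 105 put B ahead overall.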

11. Mar 5, 2015

### PeroK

Would that be a surprise for most baseball fans, do you think?

12. Mar 5, 2015

### SammyS

Staff Emeritus
Why didn't I notice this earlier? (← rhetorical question to self)

Batting average is: $\ \displaystyle \frac{\text{number of hits}}{\text{number of (official) at bats}}$


13. Mar 5, 2015

### haruspex

Perhaps it was too cryptic, but if you reread my post #2 you'll see I gave you a simple counterexample.

14. Mar 5, 2015

### Ray Vickson

What message are you responding to here?

15. Mar 5, 2015

### haruspex

As I read the thread, post #8 offered an algebraic and somewhat handwaving 'disproof'. PeroK's post #9 points out that since many would find the result surprising, it would be rather more persuasive to construct a detailed counterexample.

16. Mar 5, 2015

### SammyS

Staff Emeritus
Maybe that was too subtle, so to be more direct:

If rA, rB, ... are batting averages, and nA, nB, ... are numbers of hits, then products such as $n_A \cdot r_A$ are not very helpful.

If we let mA, mB, ... be the number of 'at bats', then $\displaystyle r_A=\frac{n_A}{m_A}\ ,$

and $\displaystyle \frac{m_A r_A + m^\prime_A r^\prime_A}{m_A + m^\prime_A}$ is the full season batting average of player A.
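Since $m_A r_A = n_A$, the expression above is just total hits over total at-bats, $(n_A + n^\prime_A)/(m_A + m^\prime_A)$. A minimal Python sketch of a counterexample in these terms follows. The hit and at-bat counts are hypothetical whole numbers chosen for illustration (they do not appear in the thread), and the function names are likewise assumptions.

```python
from fractions import Fraction

def batting_average(hits, at_bats):
    """Batting average = hits / at-bats, as an exact fraction."""
    return Fraction(hits, at_bats)

def full_season(halves):
    """Full-season average: total hits over total at-bats."""
    total_hits = sum(h for h, _ in halves)
    total_at_bats = sum(m for _, m in halves)
    return batting_average(total_hits, total_at_bats)

# Hypothetical (hits, at_bats) for the first and second half of the season.
player_a = [(20, 100), (3, 10)]   # .200, then .300
player_b = [(1, 10), (25, 100)]   # .100, then .250

for name, halves in (("A", player_a), ("B", player_b)):
    per_half = [float(batting_average(h, m)) for h, m in halves]
    print(name, per_half, float(full_season(halves)))
```

With these counts A leads in each half, while B leads over the full season (26/110 against 23/110), because most of B's at-bats fall in his stronger half and most of A's fall in his weaker one.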

17. Mar 9, 2015

This is not true, but is usually discussed in the setting of Simpson's Paradox.

18. Mar 9, 2015

### SammyS

Staff Emeritus
There are handy "Quote" and "Reply" features to help give readers a clue as to which particular post you may be responding to.

Beyond that, you could indicate explicitly what it is that your "This" refers to.