Calculate the mean, variance, stdev, and CV with 2 methods: check answers

Beth N · Oct 5, 2019

THE PROBLEM
6) Table 1 below contains data on offensive statistics for each game in the 2019 UW Husky Baseball
season. Answer the following questions and/or complete the specified tasks using these data. Do
everything by hand and show your work (good practice for the tests).
a. Construct a relative frequency histogram of the runs per game. You will first need to
construct a frequency table. Use individual bins for 0, 1, 2, 3…9 runs per game and then a
final bin of 10+ runs.
b. Calculate and plot the mean and median value using the raw data. Also calculate the mean
using the frequency table from part a. and compare (don’t plot).
c. Calculate the variance, standard deviation, and coefficient of variation using the raw data.
Repeat using the frequency table from part a. and compare (don’t plot). Why do the two
methods differ so much when the means were fairly close?
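
For checking hand work on part a., the frequency table with a final open-ended bin could be sketched in Python roughly as follows. The `runs` list here is made-up illustrative data, not the Table 1 values, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(runs, cap=10):
    """Count games per run total, pooling every value >= cap into one 'cap+' bin."""
    counts = Counter(min(r, cap) for r in runs)
    n = len(runs)
    table = []
    for value in range(cap + 1):
        label = f"{cap}+" if value == cap else str(value)
        freq = counts.get(value, 0)
        # Each row holds (bin label, frequency, relative frequency).
        table.append((label, freq, freq / n))
    return table

# Hypothetical runs-per-game values, for illustration only (not the Table 1 data).
runs = [3, 0, 7, 12, 5, 3, 10, 25, 4, 3]

for label, freq, rel in frequency_table(runs):
    print(f"{label:>3}  {freq:2d}  {rel:.2f}")
```

The relative frequencies are the bar heights of the histogram and should sum to 1.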

MY ATTEMPT:
- Using the raw data, I get mean = 5.71, variance = 14.07, stdev = 3.7 and CV = 65.68%.
- Using the frequency table, where the last few data points are grouped into one bin (10+), I get mean = 6.740, variance = 4.778, stdev = 2.186 and CV = 32.4%.
I am concerned because the prompt says that the means obtained using both methods are fairly close. I don't think mine are very close, so I wonder whether my method is wrong at some point.

I am using the population formulas, not the sample ones. For the frequency table, I set my last bin to be 10-25 with a midpoint of 17.5. Is this accurate?
Attached below are the prompts and the data table.

Thank you very much for your consideration.
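
The frequency-table calculation described in this post can be sketched as below, assuming the population convention stated above (divide by N, not N - 1) and one representative value per bin. The function name `grouped_stats` and the three-bin table are hypothetical, chosen only to show the arithmetic; they are not the Table 1 data.

```python
import math

def grouped_stats(values, freqs):
    """Population mean, variance, stdev and CV (%) from a frequency table."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    # Population variance: weighted squared deviations divided by n, not n - 1.
    var = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / n
    sd = math.sqrt(var)
    cv = 100 * sd / mean
    return mean, var, sd, cv

# Hypothetical three-bin table for illustration (not the Table 1 data).
values = [0, 1, 2]
freqs = [1, 2, 1]
mean, var, sd, cv = grouped_stats(values, freqs)
print(mean, var, round(sd, 4), round(cv, 2))  # 1.0 0.5 0.7071 70.71
```

With individual bins for 0 through 9, only the value chosen to represent the 10+ bin can make the result differ from the raw-data calculation.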

scottdave · Oct 5, 2019

Just a quick look at the runs which are 10 or more: they are not evenly distributed between 10 and 25, so does it make sense to use 17.5 (halfway between the min and max of that bin)?

If bins were gathering counts of something like 1-10, 11-20, 21-30, etc., then it may be a good idea to pick the middle of the bin, especially for bins not at the edge of the distribution.

Unless every piece of data can be recreated from its histogram bin, the result is going to be an estimate.

What might be a better number to use for the 10+ bin?

Beth N · Oct 5, 2019

scottdave said:

Just a quick look at the runs which are 10 or more: they are not evenly distributed between 10 and 25, so does it make sense to use 17.5 (halfway between the min and max of that bin)?

If bins were gathering counts of something like 1-10, 11-20, 21-30, etc., then it may be a good idea to pick the middle of the bin, especially for bins not at the edge of the distribution.

Unless every piece of data can be recreated from its histogram bin, the result is going to be an estimate.

What might be a better number to use for the 10+ bin?

Thank you so much for answering! I see, that makes sense; 17.5 doesn't seem to be the best number. Does that mean I cannot use the midpoint in this case? Or should I calculate the mean of this data group and use it in place of the midpoint? In an example in class, I saw that the professor left the midpoint empty for the open-ended bin. But then how can I find the mean using the method as instructed?

This is the frequency of each data point for the 10+ class. The median would be 11, the mean is 12.6 and the mode is 10. Should I use one of these numbers for the midpoint?

Runs	Frequency
10	5
11	3
12	1
19	1
25	1
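
As a quick check on these summary values, the table can be expanded back into one value per game and passed to Python's standard `statistics` module. This is only a sketch; the variable names `bin_table` and `games` are illustrative.

```python
import statistics

# (runs, number of games) pairs copied from the 10+ table above.
bin_table = [(10, 5), (11, 3), (12, 1), (19, 1), (25, 1)]

# Expand the table back into one value per game.
games = [runs for runs, count in bin_table for _ in range(count)]

print(len(games))                # 11 games in the bin
print(statistics.mean(games))    # 139 / 11, about 12.64
print(statistics.median(games))  # 11 (the 6th of the 11 sorted values)
print(statistics.mode(games))    # 10
```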

scottdave · Oct 5, 2019

So their hint told you that the histogram method would be close to the answer calculated from the raw data. What if I Googled these baseball stats and found a page that showed only the histogram, without more information? You know what the data is, but think about baseball and how often there are high-scoring games.

I'm not sure what they are looking for, but I don't think they want you to calculate the average of the raw data within that bin. I wouldn't take a midpoint, but rather the value where most of the data in the bin is (or is likely to be). You have the advantage of knowing where all the actual data is, so what do you think would be a good guess? I have my idea in mind.

WWGD · Oct 5, 2019

Beth, just to understand the methodology you're using: when you compute the sample stats, do you use the midpoints of each bin to do the computations?

Beth N · Oct 5, 2019

WWGD said:

Beth, just to understand the methodology you're using: when you compute the sample stats, do you use the midpoints of each bin to do the computations?

Hello!
Yes, I use the midpoint, but I didn't give the bins a lower and upper limit except for the last bin, so for the other bins the midpoint is the data value itself.

Beth N · Oct 5, 2019

scottdave said:

So their hint told you that the histogram method would be close to the answer calculated from the raw data. What if I Googled these baseball stats and found a page that showed only the histogram, without more information? You know what the data is, but think about baseball and how often there are high-scoring games.

I'm not sure what they are looking for, but I don't think they want you to calculate the average of the raw data within that bin. I wouldn't take a midpoint, but rather the value where most of the data in the bin is (or is likely to be). You have the advantage of knowing where all the actual data is, so what do you think would be a good guess? I have my idea in mind.

10 as it is the value with the greatest frequency?
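
One way to compare candidate values for the 10+ bin is to see how far each one moves the total run count away from what was actually scored in those games. The sketch below uses the 10+ frequencies posted earlier in the thread; `total_error` is a hypothetical helper name.

```python
# (runs, number of games) pairs for the 10+ bin, as listed earlier in the thread.
bin_table = [(10, 5), (11, 3), (12, 1), (19, 1), (25, 1)]

n_bin = sum(count for _, count in bin_table)                   # games in the bin
actual_total = sum(runs * count for runs, count in bin_table)  # runs actually scored

def total_error(rep):
    """Runs gained (+) or lost (-) in the season total if every game in the bin counts as `rep`."""
    return n_bin * rep - actual_total

for rep in (10, 11, 17.5):
    print(rep, total_error(rep))
# 10 -29
# 11 -18
# 17.5 53.5
```

Because every other bin holds a single exact value, dividing these differences by the total number of games gives the gap between the frequency-table mean and the raw-data mean. Using 10 undercounts by 29 runs, while 17.5 overcounts by 53.5.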
