# Calculate the mean, variance, stdev, and CV with 2 methods: check answers


## Summary:

I am calculating some statistical measures for a homework assignment, but my answers do not seem to be correct. I would love to clarify my understanding.

## Main Question or Discussion Point

THE PROBLEM
6) Table 1 below contains data on offensive statistics for each game in the 2019 UW Husky Baseball season. Answer the following questions and/or complete the specified tasks using these data. Do everything by hand and show your work (good practice for the tests).

a. Construct a relative frequency histogram of the runs per game. You will first need to construct a frequency table. Use individual bins for 0, 1, 2, 3…9 runs per game and then a final bin of 10+ runs.

b. Calculate and plot the mean and median value using the raw data. Also calculate the mean using the frequency table from part a. and compare (don’t plot).

c. Calculate the variance, standard deviation, and coefficient of variation using the raw data. Repeat using the frequency table from part a. and compare (don’t plot). Why do the two methods differ so much when the means were fairly close?

MY ATTEMPT:
- Using the raw data, I get mean = 5.71, variance = 14.07, stdev = 3.7 and CV = 65.68%.
- Using the frequency table, where the last few data points are grouped into one bin (10+), I get mean = 6.740, variance = 4.778, stdev = 2.186 and CV = 32.4%.

I am concerned because the prompt says that the means from the two methods are fairly close. Mine are not very close, so I suspect my method is wrong at some point.

I am using the population formulas, not the sample ones. For the frequency table, I took my last bin to be 10-25, with a midpoint of 17.5. Is this accurate?
Attached below are the prompts and the data table.

Thank you very much for your consideration.
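For reference, the population formulas for a frequency table can be written out as below. This is a sketch of the standard grouped-data formulas, not taken from the assignment itself; the symbols $m_i$ (the value representing bin $i$), $f_i$ (the number of games in bin $i$) and $N$ (the total number of games) are notation introduced here.

$$N = \sum_i f_i, \qquad \mu = \frac{1}{N}\sum_i f_i\, m_i, \qquad \sigma^2 = \frac{1}{N}\sum_i f_i\,(m_i - \mu)^2, \qquad \mathrm{CV} = \frac{\sigma}{\mu}$$

For the bins 0 through 9, $m_i$ is the run count itself, so those bins reproduce the raw data exactly. Only the value chosen for the 10+ bin can make the two methods disagree.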
scottdave
Homework Helper
Just a quick look at the runs which are 10 or more: they are not evenly distributed between 10 and 25, so does it make sense to use 17.5 (halfway between the min and max of that bin)?

If bins were gathering counts of something like 1-10, 11-20, 21-30, etc, then it may be a good idea to pick the middle of the bin, especially for bins not at the edge of the distribution.

Unless each histogram bin lets you recreate every piece of data in it, the result is only going to be an estimate of the raw-data value.

What might be a better number to use for the 10+ bin?

• Beth N
Thank you so much for answering! I see, that makes sense: 17.5 doesn't seem to be the best number. Does it mean I cannot use the midpoint in this case? Or should I calculate the mean of this data group and use it as the midpoint? In an example in class, I saw that the professor left the midpoint empty for the open-ended bin. But then how can I find the mean using the method as instructed?

This is the frequency of each data point in the 10+ class. The median is 11, the mean is 12.6 and the mode is 10. Should I use one of these numbers for the midpoint?

| Runs | Frequency |
|------|-----------|
| 10   | 5         |
| 11   | 3         |
| 12   | 1         |
| 19   | 1         |
| 25   | 1         |
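The candidate values for the 10+ bin can be checked with a short script. This is only a sketch: the frequencies are the ones listed in the post, and the names `bin_counts` and `games` are made up for illustration.

```python
from statistics import mean, median, multimode

# Frequencies for the 10+ bin as listed in the post: runs -> number of games
bin_counts = {10: 5, 11: 3, 12: 1, 19: 1, 25: 1}

# Expand the frequency table back into one value per game
games = [runs for runs, count in bin_counts.items() for _ in range(count)]

bin_mean = mean(games)        # average runs over the games in this bin
bin_median = median(games)    # middle game once the values are sorted
bin_mode = multimode(games)   # most frequent value(s), returned as a list
midpoint = (10 + 25) / 2      # halfway between the smallest and largest value

print(len(games), round(bin_mean, 2), bin_median, bin_mode, midpoint)
```

With frequencies taken into account, the 11 games in this bin have a median of 11 and a mean of about 12.6, both well below the 17.5 midpoint.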

scottdave
Homework Helper
So their hint told you that the histogram method would be close to the answer calculated from the raw data. What if I Googled these baseball stats and found a page which showed only the histogram, without more information? You know what the data is, but think about baseball, and how often there are high-scoring games.

I'm not sure what they are looking for, but I don't think they want you to calculate the average of the raw data within that bin. I wouldn't take the midpoint, but rather the value where most of the data within the bin is (or is likely to be). You have the advantage of knowing where all the actual data is, so where would you think would be a good guess? I have my idea in mind.

WWGD
Gold Member
Beth, just to understand the methodology you're using: when you compute the stats from the frequency table, do you use the midpoints of each bin to do the computations?

• Beth N
Hello!
Yes, I use the midpoint, but I didn't give the bins a lower and upper limit except for the last bin, so for the earlier bins the midpoint is the data value itself.

Would it be 10, as it is the value with the greatest frequency in that bin?
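The effect of this choice on the mean can be worked out without the full table. As a sketch, let $r$ be the value used for the 10+ bin and $N$ the total number of games (both symbols are introduced here; $N$ is not stated in the thread). The 10+ bin holds 11 games whose runs sum to 139, and every other bin is exact, so

$$\bar{x}_{\text{grouped}} - \bar{x}_{\text{raw}} = \frac{11\,r - 139}{N}$$

With $r = 17.5$ the numerator is $+53.5$, with $r = 10$ it is $-29$, and with the bin mean $r = 139/11 \approx 12.6$ it is zero. The gap of about 1.03 between the two reported means (6.740 versus 5.71) is what $r = 17.5$ would give for $N$ of roughly 52, although the full table is not shown here to confirm that. Whatever single value is used, the spread inside the 10+ bin (from 10 up to 25) is lost, so the grouped variance is affected more strongly than the grouped mean.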