Calculate the mean, variance, stdev, and CV with 2 methods: check answers

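In summary, the poster used the runs per game in Table 1 to build a relative frequency histogram and then computed the mean, variance, standard deviation, and coefficient of variation both from the raw data and from the grouped frequency table. The two sets of results differed much more than expected, and the replies discuss how the value chosen to represent the open-ended 10+ bin affects the grouped estimates.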
  • #1
Beth N
TL;DR Summary
I am calculating some statistical measures for a homework assignment but do not seem to get the correct answer. I would like to clarify my understanding.
THE PROBLEM
6) Table 1 below contains data on offensive statistics for each game in the 2019 UW Husky Baseball season. Answer the following questions and/or complete the specified tasks using these data. Do everything by hand and show your work (good practice for the tests).
a. Construct a relative frequency histogram of the runs per game. You will first need to construct a frequency table. Use individual bins for 0, 1, 2, 3…9 runs per game and then a final bin of 10+ runs.
b. Calculate and plot the mean and median value using the raw data. Also calculate the mean using the frequency table from part a. and compare (don’t plot).
c. Calculate the variance, standard deviation, and coefficient of variation using the raw data. Repeat using the frequency table from part a. and compare (don’t plot). Why do the two methods differ so much when the means were fairly close?
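For reference, the population formulas for both methods (population rather than sample, matching the choice described later in the thread) can be sketched as below. The symbols are introduced here for illustration: x_i is the number of runs in game i of N games; for the frequency table, f_j is the number of games in bin j and m_j is the single value chosen to represent that bin. For bins 0 through 9, m_j is simply the bin's value, so only the 10+ bin needs an assumed m_j.

```latex
\begin{aligned}
\text{Raw data:}\quad
  &\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
  &&\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,\\
\text{Frequency table:}\quad
  &\mu_g = \frac{1}{N}\sum_{j=1}^{k} f_j m_j,
  &&\sigma_g^2 = \frac{1}{N}\sum_{j=1}^{k} f_j (m_j - \mu_g)^2,
  \qquad N = \sum_{j=1}^{k} f_j,\\
\text{Both:}\quad
  &\sigma = \sqrt{\sigma^2},
  &&\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%.
\end{aligned}
```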

MY ATTEMPT:
- Using the raw data, I get mean = 5.71, variance = 14.07, stdev = 3.75, and CV = 65.68%.
- Using the frequency table, where the largest values are grouped into one bin (10+), I obtained mean = 6.740, variance = 4.778, stdev = 2.186, and CV = 32.4%.
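As a quick consistency check (a sketch that takes the reported means and variances at face value), the standard deviation should be the square root of the variance and CV = stdev / mean × 100. Both sets of reported numbers pass this check, which suggests that any problem lies in the means and variances themselves rather than in the last two steps. The helper name `stdev_and_cv` is hypothetical.

```python
import math

# Means and variances as reported in the post (population formulas assumed).
reported = {
    "raw data": {"mean": 5.71, "variance": 14.07},
    "frequency table": {"mean": 6.740, "variance": 4.778},
}


def stdev_and_cv(mean, variance):
    """Return (standard deviation, CV in percent) from a mean and a variance."""
    sd = math.sqrt(variance)
    return sd, sd / mean * 100


for method, v in reported.items():
    sd, cv = stdev_and_cv(v["mean"], v["variance"])
    print(f"{method}: stdev = {sd:.3f}, CV = {cv:.2f}%")
```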
I am concerned because the prompt says that the means obtained with the two methods are fairly close. I don't think mine are very close, so I wonder whether my method is wrong somewhere.

I am using the population formulas, not the sample formulas. For the frequency table, I set my last bin to 10-25 with a midpoint of 17.5. Is this correct?
Attached below are the prompt and the data table.

Thank you very much for your consideration.
Untitled.png
Untitled2.png
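A minimal sketch of both calculations, assuming population formulas as described above and a frequency table stored as a Python dict mapping each bin's representative value to its count (the function names `raw_stats` and `grouped_stats` are hypothetical). Since Table 1 itself is only in the attachment, the example uses just the eleven 10+ games listed later in the thread: grouping with one bin per distinct value reproduces the raw results exactly, while collapsing all eleven games onto 17.5 moves their mean from about 12.64 to 17.5 and removes the spread inside the bin entirely.

```python
import math
from collections import Counter


def raw_stats(values):
    """Population mean, variance, stdev, and CV (%) from raw observations."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n  # divide by N: population formula
    sd = math.sqrt(var)
    return mean, var, sd, sd / mean * 100


def grouped_stats(freq):
    """Same measures from a frequency table {representative value: count}."""
    n = sum(freq.values())
    mean = sum(m * f for m, f in freq.items()) / n
    var = sum(f * (m - mean) ** 2 for m, f in freq.items()) / n
    sd = math.sqrt(var)
    return mean, var, sd, sd / mean * 100


# Runs in the eleven 10+ games only (frequencies given later in the thread).
ten_plus = [10] * 5 + [11] * 3 + [12, 19, 25]

exact = raw_stats(ten_plus)
one_bin_per_value = grouped_stats(Counter(ten_plus))  # should match `exact`
collapsed = grouped_stats({17.5: len(ten_plus)})      # every game counted as 17.5

for label, stats in [("raw", exact), ("grouped", one_bin_per_value), ("collapsed", collapsed)]:
    print(label, [round(s, 2) for s in stats])
```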
 
  • #2
Just a quick look at the runs which are 10 or more: they are not evenly distributed between 10 and 25, so does it make sense to use 17.5 (halfway between the min and max of that bin)?

If the bins were gathering counts of something like 1-10, 11-20, 21-30, etc., then it may be a good idea to pick the middle of each bin, especially for bins not at the edge of the distribution.

Unless each histogram bin can recreate every piece of data, anything computed from the histogram is going to be an estimate of what the raw data give.

What might be a better number to use for the 10+ bin?
 
  • #3
scottdave said:
What might be a better number to use for the 10+ bin?
Thank you so much for answering! I see, that makes sense; 17.5 doesn't seem to be the best number. Does that mean I cannot use the midpoint in this case? Or should I calculate the mean of this data group and use it as the midpoint? In an example in class, I saw that the professor left the midpoint empty for the open-ended bin. However, how can I find the mean using the method as instructed?

These are the frequencies of each value in the 10+ class (runs, then number of games). The median would be 11, the mean is 12.6, and the mode is 10. Should I use one of these numbers for the midpoint?
Runs  Games
10    5
11    3
12    1
19    1
25    1
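The frequencies above can be expanded back into one value per game to check the bin's summary statistics. A minimal sketch using Python's standard `statistics` module; with 11 games, the median is the 6th smallest value.

```python
import statistics

# Frequency table for the 10+ bin (runs: number of games), as listed above.
bin_freq = {10: 5, 11: 3, 12: 1, 19: 1, 25: 1}

# Expand back to one entry per game so the statistics module can be used directly.
runs = [r for r, f in bin_freq.items() for _ in range(f)]

print(len(runs))                        # 11 games
print(round(statistics.mean(runs), 2))  # 12.64
print(statistics.median(runs))          # 11
print(statistics.mode(runs))            # 10
```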
 
  • #4
So their hint told you that the histogram method would give a mean close to the one calculated from the raw data. What if I Googled these baseball stats and found a page that showed the histogram without any more information? You know what the data are, but think about baseball and how often there are high-scoring games.

I'm not sure what they are looking for, but I don't think they want you to calculate the average of the raw data within that bin. I wouldn't take the midpoint, but rather a value where most of the data within the bin are (or are likely to be). You have the advantage of knowing where all the actual data are; where do you think a good guess would be? I have an idea in mind.
 
  • #5
Beth, just to understand the methodology you're using: when you compute the sample stats, do you use the midpoint of each bin to do the computations?
 
  • #6
WWGD said:
Beth, just to understand the methodology you're using: when you compute the sample stats, do you use the midpoint of each bin to do the computations?
Hello!
Yes, I use the midpoint, but I didn't give the bins lower and upper limits except for the last bin, so for the other bins the midpoint is the data value itself.
 
  • #7
scottdave said:
I'm not sure what they are looking for, but I don't think they want you to calculate the average of the raw data within that bin. I wouldn't take a midpoint, but where most of the data is within the bin (or is likely to be). You have the advantage of knowing where all the actual data is, where would you think would be a good guess? I have my idea in mind.
10, since it is the value with the greatest frequency?
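Whichever value is chosen, the grouped mean treats all 11 games in the 10+ bin as if each scored that value, so the bin contributes f × m runs instead of the 139 it actually contains. A short sketch comparing the candidates mentioned in the thread (the helper `runs_error` is hypothetical): 10 under-counts the bin by 29 runs, while 17.5 over-counts it by 53.5. Dividing an error by the total number of games in Table 1 gives the resulting shift in the grouped mean.

```python
# Runs scored in the 10+ games: {runs: number of games}, from the table in post #3.
bin_freq = {10: 5, 11: 3, 12: 1, 19: 1, 25: 1}
games = sum(bin_freq.values())                          # 11 games
actual_total = sum(r * f for r, f in bin_freq.items())  # 139 runs


def runs_error(rep_value):
    """Runs over- (+) or under- (-) counted when every game in the bin is set to rep_value."""
    return rep_value * games - actual_total


candidates = {
    "mode": 10,
    "median": 11,
    "bin mean": actual_total / games,
    "range midpoint": 17.5,
}

for label, m in candidates.items():
    print(f"{label:>15} ({m:.2f}): {runs_error(m):+.1f} runs")
```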
 

FAQ: Calculate the mean, variance, stdev, and CV with 2 methods: check answers

1. What is the mean and how is it calculated?

The mean is a measure of central tendency in a set of data and is calculated by adding all the values and dividing by the number of values in the set.

2. How is variance calculated?

Variance is a measure of how spread out the data are around the mean. For a population, it is the average of the squared differences between each value and the mean (divide by N); for a sample, the sum of squared differences is divided by n − 1 instead.
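Since the poster uses population formulas, here is a small sketch of the difference using Python's standard `statistics` module (`pvariance` and `pstdev` divide by N; `variance` and `stdev` divide by n − 1), applied only to the eleven 10+ games from the thread.

```python
import statistics

# Runs in the eleven 10+ games from the thread.
ten_plus = [10] * 5 + [11] * 3 + [12, 19, 25]

pop_var = statistics.pvariance(ten_plus)  # sum of squared deviations / N
samp_var = statistics.variance(ten_plus)  # sum of squared deviations / (n - 1)

print("population:", round(pop_var, 3), round(statistics.pstdev(ten_plus), 3))
print("sample:    ", round(samp_var, 3), round(statistics.stdev(ten_plus), 3))
```

With only 11 values the two differ by a factor of 11/10; the gap shrinks as the number of games grows.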

3. What is the standard deviation and how is it related to variance?

The standard deviation measures how much the data typically deviate from the mean. It is the square root of the variance, which puts it back in the same units as the data.

4. What are the two methods for calculating mean, variance, standard deviation, and coefficient of variation?

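In this thread, the two methods are (1) computing the statistics directly from the raw, ungrouped data and (2) computing them from the frequency table, where every value in a bin is represented by a single value such as the bin midpoint. The grouped mean equals the raw-data mean when each bin is represented by the mean of its own data, but the grouped variance still ignores the spread inside each bin, and a poorly chosen value for an open-ended bin such as 10+ can shift all of the results noticeably.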

5. How is the coefficient of variation (CV) calculated and what does it represent?

The coefficient of variation is the standard deviation divided by the mean, multiplied by 100 to express it as a percentage. It represents the relative variability of the data, taking into account the size of the mean: a lower CV indicates less relative variability, and a higher CV indicates more.
