Statistics: How to find mean of bins

  • Thread starter: Niles
  • Tags: Mean, Statistics

Homework Help Overview

The original poster asks how to calculate the mean of binned data, given the range of each bin and the number of data points it contains. The subject area is statistics, specifically the mean of grouped data.

Discussion Character

  • Conceptual clarification, Mathematical reasoning

Approaches and Questions Raised

  • Participants discuss the concept of selecting representative values from bins for calculating the mean, questioning what values should be used for the bins. There is a focus on the idea of using midpoints or averages of the bins versus the actual data points.

Discussion Status

The discussion explores how to represent the values within each bin when calculating the mean. The original poster first proposes averaging the actual data points in each bin; a respondent clarifies that the value must come from the bin structure rather than from the collected data, and the thread settles on using the midpoint of each bin.

Contextual Notes

Participants are navigating the definitions and methods for calculating the mean of binned data, with an emphasis on understanding the appropriate values to use in the formula. The original poster's question reflects a common uncertainty in handling binned data in statistics.

Niles

Homework Statement


Hi

Say I have the following bins, where the number in parentheses is the number of data points contained in the bin:

20-29 : (2)
30-39 : (7)
40-49 : (12)
50-59 : (14)

How would I go about finding the mean of this binned data? I know that I should use

[tex] mean = \frac{1}{N}\sum\limits_j {n_j x_j },[/tex]

where bin j corresponds to a value x_j and contains n_j elements. But in my case, what are the values of the bins?
 
You usually treat this as a weighted mean problem. Think this way: if you needed to select one value from inside each bin, what value (intuitively) would be the one to pick if you didn't want to over- or under-estimate typical values in the bin? That's the value you use for x.
 
statdad said:
You usually treat this as a weighted mean problem. Think this way: if you needed to select one value from inside each bin, what value (intuitively) would be the one to pick if you didn't want to over- or under-estimate typical values in the bin? That's the value you use for x.

I would use the average value of the data samples in that particular bin. Would you also do that?
 
Niles said:
I would use the average value of the data samples in that particular bin. Would you also do that?

No - you need to use a number that comes from the bins, not the collected data.
 
Then the average of the bin limits, i.e. for 20-29 it is 24.5?
 
Yes - it's called the midpoint of the bin.
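
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the midpoint approach applied to the bins from the question. The function name `binned_mean` and the tuple layout are illustrative assumptions, not something given in the thread, and the result is only an approximation of the mean of the underlying data.

```python
# Bins from the question as (lower limit, upper limit, count).
# The tuple layout and the function name are assumptions for illustration.
bins = [(20, 29, 2), (30, 39, 7), (40, 49, 12), (50, 59, 14)]


def binned_mean(bins):
    """Approximate the mean of binned data using each bin's midpoint as x_j."""
    # N: total number of data points across all bins.
    total_count = sum(count for _, _, count in bins)
    # Sum of n_j * x_j, where x_j = (lower + upper) / 2 is the bin midpoint.
    weighted_sum = sum(count * (lo + hi) / 2 for lo, hi, count in bins)
    return weighted_sum / total_count


mean = binned_mean(bins)
print(round(mean, 2))  # 45.36
```

With midpoints 24.5, 34.5, 44.5 and 54.5, the weighted sum is 1587.5 and N = 35, so the estimate is 1587.5 / 35, about 45.36.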
 
Thanks, it is kind of you to help me.

Best wishes,
Niles.
 
