Histogram Bin Sizing Methodology

  • Context: Undergrad
  • Thread starter: cmmcnamara
  • Tags: Bin, Histogram, Sizing
SUMMARY

The discussion focuses on selecting bin sizes for histograms in MATLAB, specifically using the Freedman-Diaconis rule. This rule sets the bin width to twice the interquartile range divided by the cube root of the number of data points, which makes it less sensitive to outliers than a rule based on the standard deviation. The user is analyzing class height data from a sample of 32 students and seeks validation for their choice of methodology. Participants suggest that the choice of bin size may not significantly affect the result, but they stress that justifying the selection is what earns a better grade.
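Written out, the rule described above looks like this. The symbols are notation chosen here for illustration: h is the bin width, IQR(x) is the interquartile range of the sample x, and n is the number of data points.

```latex
h = \frac{2\,\operatorname{IQR}(x)}{n^{1/3}},
\qquad
n = 32 \;\Rightarrow\; n^{1/3} \approx 3.17
\;\Rightarrow\; h \approx 0.63\,\operatorname{IQR}(x)
```

So for the class of 32 discussed in this thread, the rule gives a bin width of roughly 0.63 times the interquartile range of the heights.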

PREREQUISITES
  • Understanding of MATLAB programming for data visualization
  • Basic knowledge of statistical concepts, particularly histograms
  • Familiarity with the Freedman-Diaconis bin sizing method
  • Ability to interpret interquartile range and its significance in data analysis
NEXT STEPS
  • Research the Freedman-Diaconis method in detail to understand its advantages and limitations
  • Explore alternative bin sizing methods such as Sturges' Rule and the Square Root method
  • Learn how to implement histogram plotting in MATLAB using different bin sizes
  • Investigate the impact of sample size on histogram representation and data interpretation
USEFUL FOR

This discussion is beneficial for students in programming and statistics, particularly those working with MATLAB for data visualization and analysis. It is also relevant for educators and anyone interested in understanding histogram methodologies and their implications in data representation.

cmmcnamara
Hey all, I have what I assume to be a fairly vague question. I'm taking a MATLAB programming class, and our current project is a simple histogram display of data read from an Excel file. The coding is extremely easy. However, having never taken a statistics course and having only a vague understanding of the subject, I am a bit stumped on how to properly choose the bin size, which determines the number of bins MATLAB draws in the histogram. I have been reading about the topic on Wikipedia, but the methodology for choosing a bin size seems highly subjective to me and dependent on the nature of the collected data.

For my project, the professor chose the heights of our class as the data to be analyzed. My rationale for the methodology is as follows. I chose the Freedman-Diaconis rule, which sets the bin size to twice the interquartile range divided by the cube root of the number of data points (see http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width). Based on Wikipedia's description, it tends to be less sensitive to extreme data points than the standard deviation rule. Given my class's small size (32 people), I thought this method was the best choice: with such a small sample, I figured the histogram would be highly sensitive to extreme values, so a rule based on the standard deviation would be less representative than one that is less sensitive to extreme values. Is this correct reasoning for selecting the bin size by this method? There are quite a few other methods listed, but some, such as the square root method, don't seem to have any useful description. Could someone validate my reasoning here? I realize that this topic seems to have "no right answer," but I at least want to believe my logic is sound. Thanks in advance!
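The rule described in this post can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB code from the class project. The function names and the height values are made up for the example, and the quartile convention (`method="inclusive"`) is an assumption, since different tools compute quartiles slightly differently.

```python
import math
import statistics


def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR / n**(1/3) (Freedman-Diaconis rule)."""
    n = len(data)
    # Quartile convention is an assumption; other tools may differ slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2 * (q3 - q1) / n ** (1 / 3)


def bin_count(data, width):
    """Number of equal-width bins needed to cover the data range."""
    if width <= 0:
        raise ValueError("bin width must be positive (is the IQR zero?)")
    return max(1, math.ceil((max(data) - min(data)) / width))


# Hypothetical heights in cm for 32 students (illustrative values only,
# not the actual class data). The last value plays the role of an outlier.
heights = [152, 155, 157, 158, 160, 160, 161, 162, 163, 163, 164, 165,
           165, 166, 167, 168, 168, 169, 170, 170, 171, 172, 173, 174,
           175, 176, 178, 179, 180, 182, 185, 198]

width = freedman_diaconis_width(heights)
bins = bin_count(heights, width)
print(f"bin width = {width:.2f} cm, bins = {bins}")
```

Because the width depends only on the quartiles and the sample size, moving the single tall value further out does not change the width, although it does stretch the range and so the number of bins needed to cover it.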
 
Your logic seems OK to me. Maybe I'm being a wet blanket but histograms are a rather crude technique and I think it doesn't really matter. It depends on what you want to know. Just try something and decide whether it looks alright. There are more important things to worry about. This attitude might get you a bad grade on a test, though.
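One way to "just try something" is to compare what the simplest rules of thumb suggest for a sample of this size. The sketch below is in Python with hypothetical function names, and uses the usual textbook forms of the square-root choice and Sturges' rule. Both depend only on the number of data points, not on the values.

```python
import math


def sqrt_rule(n):
    """Square-root choice: k = ceil(sqrt(n)) bins."""
    return math.ceil(math.sqrt(n))


def sturges_rule(n):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1


n = 32  # class size mentioned in the thread
print(sqrt_rule(n), sturges_rule(n))  # prints "6 6"
```

For 32 data points both rules land on 6 bins, which fits the point that the choice of rule is unlikely to change the picture much for a sample this small.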
 
What will get you a good (or at least better) grade is motivating the choice you have made. In this case I agree with ImaLooser that it probably doesn't matter much. Even if it did, I would give partial credit to a well-motivated but incorrect choice and none to an unmotivated correct one, because for all I know that could have been a lucky guess.
 
