Histogram Bin Sizing Methodology

In summary, choosing the bin size for a histogram is partly subjective and depends on the nature of the data being analyzed. One useful option is the Freedman-Diaconis rule, which sets the bin width from the interquartile range and is therefore less sensitive to extreme data points than rules based on the standard deviation. For a simple display, the exact rule often matters less than other aspects of the analysis. The key is to have well-motivated reasoning for the chosen method.
  • #1
cmmcnamara
Hey all, I have what I assume is a fairly vague question. I'm taking a MATLAB programming class, and our current project is a simple histogram display built from data read out of an Excel file. The coding is extremely easy. However, having never taken a statistics course and having only a vague understanding of the subject, I am a bit stumped on how to properly choose the bin size, which determines the number of bins MATLAB draws in the histogram. I have been reading about the topic on Wikipedia, but the methodology for choosing a bin size seems highly subjective to me and dependent on the nature of the collected data.

For my project the professor chose the heights of the students in our class as the data to analyze. My rationale and choice of method are as follows. I chose the Freedman-Diaconis rule, which sets the bin size to twice the interquartile range divided by the cube root of the number of data points (see http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width). Based on Wikipedia's description, it tends to be less sensitive to extreme data points than the standard deviation rule. Given my class's small size (32 people), I thought this method was the best: with such a small sample, I figured that the histogram would be highly sensitive to extreme values, so a rule that reacts strongly to them would be less representative than one that is less sensitive to them. Is this correct reasoning for selecting the bin size by this method? There are quite a few other methods listed, but some, such as the square root method, don't seem to have any useful description. Could someone validate my reasoning here? I realize that this topic seems to have "no right answer", but I at least want to believe my logic is sound. Thanks in advance!
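
As a rough illustration of the Freedman-Diaconis rule described above, here is a minimal Python sketch (the thread uses MATLAB; Python is used here only for illustration). The heights are made-up values, not the actual class data, and `freedman_diaconis_width` is a hypothetical helper name. Quartile conventions differ between packages, so the width reported by MATLAB or another tool may differ slightly.

```python
import math
import statistics


def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR / n**(1/3)."""
    # Quartiles via linear interpolation over the sample (one of several conventions).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


# Hypothetical heights in cm for a class of 32 (illustrative values only).
heights_cm = [
    152, 155, 157, 158, 160, 160, 162, 163,
    163, 165, 165, 166, 167, 168, 168, 169,
    170, 170, 171, 172, 173, 174, 175, 175,
    177, 178, 180, 181, 183, 185, 188, 193,
]

width = freedman_diaconis_width(heights_cm)
# Number of bins: data range divided by the bin width, rounded up.
num_bins = math.ceil((max(heights_cm) - min(heights_cm)) / width)
print(f"bin width = {width:.2f} cm, number of bins = {num_bins}")
```

The bin count follows from the width, so a tool that asks for a number of bins rather than a width can be given `num_bins`.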
 
  • #2
Your logic seems OK to me. Maybe I'm being a wet blanket but histograms are a rather crude technique and I think it doesn't really matter. It depends on what you want to know. Just try something and decide whether it looks alright. There are more important things to worry about. This attitude might get you a bad grade on a test, though.
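
One way to "just try something" is to draw the same data with a few different bin counts and compare. The sketch below is a minimal Python illustration under stated assumptions: the heights are made-up values, `bin_counts` is a hypothetical helper, and the candidate bin counts 4, 6, and 10 are arbitrary.

```python
def bin_counts(data, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical heights in cm (illustrative values only).
heights_cm = [
    152, 155, 157, 158, 160, 160, 162, 163,
    163, 165, 165, 166, 167, 168, 168, 169,
    170, 170, 171, 172, 173, 174, 175, 175,
    177, 178, 180, 181, 183, 185, 188, 193,
]

# Print a crude text histogram for each candidate bin count.
for k in (4, 6, 10):
    print(f"{k} bins:")
    for count in bin_counts(heights_cm, k):
        print("  " + "#" * count)
```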
 
  • #3
What will get you a good (or at least, better) grade is motivating the choice you have made. In this case I would agree with ImaLooser that it probably doesn't matter that much but even if it did, I would give partial credit to a well-motivated (but incorrect) choice but none to an unmotivated correct choice (because for all I know, that could have been a lucky guess).
 

What is Histogram Bin Sizing Methodology?

Histogram bin sizing methodology is the set of rules used to choose the width and number of bins for a histogram. A histogram is a visual representation of the distribution of the values in a dataset.

Why is Histogram Bin Sizing Methodology important?

Bin sizing is important because it affects how well the histogram represents the distribution of the data. Too few bins can hide structure in the data, while too many bins can make random noise look like structure, so a poor choice can give a misleading picture.

How is Histogram Bin Sizing Methodology calculated?

Bin width is typically calculated with a rule such as the Freedman-Diaconis rule or Scott's normal reference rule. Both divide a measure of spread by the cube root of the number of data points: Freedman-Diaconis uses twice the interquartile range, while Scott's rule uses about 3.49 times the sample standard deviation. The number of bins then follows from dividing the range of the data by the bin width.
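
To make the difference between the two rules concrete, here is a minimal Python sketch that computes both widths for a made-up sample, then repeats the calculation with one extreme value added. The data, the 230 cm outlier, and the helper names `fd_width` and `scott_width` are all assumptions for illustration. With these numbers the single extreme value should shift Scott's width noticeably more than the Freedman-Diaconis width, which is the sense in which the latter is less sensitive to extreme data points.

```python
import statistics


def fd_width(data):
    """Freedman-Diaconis: h = 2 * IQR / n**(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


def scott_width(data):
    """Scott's normal reference rule: h = 3.49 * s / n**(1/3)."""
    return 3.49 * statistics.stdev(data) / len(data) ** (1 / 3)


# Hypothetical heights in cm (illustrative values only).
heights_cm = [
    152, 155, 157, 158, 160, 160, 162, 163,
    163, 165, 165, 166, 167, 168, 168, 169,
    170, 170, 171, 172, 173, 174, 175, 175,
    177, 178, 180, 181, 183, 185, 188, 193,
]
# Same sample plus one made-up extreme value.
with_outlier = heights_cm + [230]

# Ratio of the width with the outlier to the width without it.
fd_ratio = fd_width(with_outlier) / fd_width(heights_cm)
scott_ratio = scott_width(with_outlier) / scott_width(heights_cm)
print(f"Freedman-Diaconis width changes by a factor of {fd_ratio:.2f}")
print(f"Scott width changes by a factor of {scott_ratio:.2f}")
```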

What are the limitations of Histogram Bin Sizing Methodology?

One limitation is that each rule rests on assumptions that may not suit a specific dataset. Scott's rule, for example, is derived for normally distributed data. Additionally, different bin sizes can make the same data look quite different, so the chosen bin size should be considered carefully and stated alongside the histogram.

How can Histogram Bin Sizing Methodology be applied in research?

In research, a bin sizing rule gives a reproducible, documented way to visualize a dataset. Using the same rule across datasets makes their histograms easier to compare, and a sensible bin size helps patterns in the data stand out. An appropriate bin size does not by itself make findings reliable, but it reduces the risk that the picture misleads.
