Histogram to PDF Conversion: Confirm Experimental Approximation

  • Context: Undergrad 
  • Thread starter: tangodirt
  • Tags: Histogram, PDF

Discussion Overview

The discussion revolves around the conversion of a histogram to an experimental approximation of a probability density function (PDF) using a large dataset of 1,000,000 points. Participants explore the validity of this method, including considerations of terminology and alternative techniques for density estimation.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant asserts that dividing the number of items in each histogram bin by the total number of items yields an experimental approximation of the PDF.
  • Another participant agrees but emphasizes that it is more accurate to refer to it as an estimate rather than an approximation.
  • A different participant suggests that in addition to dividing by the total number of items, one should also divide by the width of the bins to obtain a proper estimate of the PDF.
  • Some participants propose using kernel density estimation (KDE) techniques as an alternative method for generating a smoother and potentially more accurate PDF from the dataset.
  • References to external resources are provided for further reading on density estimation techniques and the theoretical underpinnings of histograms as density estimators.

Areas of Agreement / Disagreement

While there is some agreement on the method of dividing by the total number of items to estimate the PDF, there is no consensus on the terminology (estimate vs. approximation) or the additional step of dividing by bin width. The discussion includes multiple viewpoints on the adequacy of the histogram method and alternative techniques like KDE.

Contextual Notes

Participants mention considerations regarding the choice of bin width and the quality of the estimator, referencing studies that have explored these topics. However, specific details about these studies or their findings are not resolved in the discussion.

tangodirt
Say I have a large data set of 1,000,000 points. If I plot a histogram of this data, I get a bar chart with bins along the x-axis and the number of items in each bin along the y-axis.

If I take the number of items in each bin and divide this by the total number of items (1,000,000 in this case), have I arrived at an experimental approximation of the probability density function?

Everything I know says that yes, dividing the histogram by the total number of points gets me to an experimental approximation of the PDF, but I want someone who is more familiar with this to confirm. Thank you!
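
The procedure described in the question can be sketched as follows. This is only an illustration: the normally distributed `data` and the choice of 50 bins are assumptions made for the example, not details from the post. It shows that counts divided by the total give per-bin relative frequencies that sum to one.

```python
import numpy as np

# Stand-in data (assumption): 1,000,000 draws from a standard normal.
rng = np.random.default_rng(0)
data = rng.normal(size=1_000_000)

# Raw histogram: number of items in each bin, plus the bin edges.
counts, edges = np.histogram(data, bins=50)

# Divide by the total number of items, as in the question.
# Each entry is the fraction of the data that fell into that bin.
rel_freq = counts / data.size

print(rel_freq.sum())  # the fractions add up to one
```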
 
Yes, it does.

Strictly speaking it's an estimate, not an approximation. In almost all cases they'll be the same thing. Better to call it an estimate though. It's the best estimate you can make in the absence of any other info.
 
Yes, estimate is a much better word! Thank you for clarifying.
 
If you have access to the dataset, you can also use density estimation techniques such as kernel density estimation.

You can find a very good explanation of this technique here: http://www.mglerner.com/blog/?p=28

I hope this helps!
 
Wow, this is really cool. I am playing with KDE techniques now and the results look great. At the very least, it really helps to "smooth" the discrete data to generate a more accurate PDF.
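
For anyone who wants to try the same thing, here is a minimal sketch of a one-dimensional Gaussian KDE written with NumPy only. It is not the code from the linked post. The function name `gaussian_kde_1d`, the synthetic sample, and the normal-reference rule of thumb for the default bandwidth are all assumptions made for the illustration.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate on the points in `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the only choice).
        bandwidth = 1.06 * data.std(ddof=1) * data.size ** (-0.2)
    # Scaled distance from every grid point to every data point.
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the result integrates to one.
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
sample = rng.normal(size=2_000)      # small stand-in sample
grid = np.linspace(-6.0, 6.0, 241)
pdf = gaussian_kde_1d(sample, grid)

# Riemann-sum check that the estimate has unit area.
area = pdf.sum() * (grid[1] - grid[0])
```

The broadcasting above builds a grid-by-data matrix, so for the full 1,000,000 points you would probably want to subsample, process the data in chunks, or use a library implementation.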
 
Divide by the total number of items *and* by the width of the bins. Now you have an estimate of the probability density function (a function such that the area under the "curve" equals one).
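
A short sketch of that normalization, under the assumption of a synthetic normal sample and 50 equal-width bins (the helper name `histogram_density` is made up for the example):

```python
import numpy as np

def histogram_density(data, bins=50):
    """Histogram-based density estimate: count / (N * bin width)."""
    counts, edges = np.histogram(data, bins=bins)
    widths = np.diff(edges)                      # width of each bin
    density = counts / (counts.sum() * widths)   # divide by N and by width
    return density, edges

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)                # stand-in data (assumption)

density, edges = histogram_density(sample)

# The area under the step function: sum of height * width over all bins.
area = np.sum(density * np.diff(edges))
```

NumPy can do the same normalization directly with `np.histogram(sample, bins=50, density=True)`.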

How good an estimator is it? And how should you choose the bin width? That has been studied in many papers, for instance:
David Freedman, Persi Diaconis, "On the histogram as a density estimator: L2 theory", Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, December 1981, Volume 57, Issue 4, pp 453-476. http://link.springer.com/article/10.1007/BF01025868
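
As one concrete example of a bin-width choice, the rule of thumb usually credited to these authors sets the width to 2 * IQR * n^(-1/3). The sketch below is only an illustration of that rule, with a hypothetical helper `fd_bin_width` and a synthetic sample; it is not taken from the paper.

```python
import numpy as np

def fd_bin_width(data):
    """Bin width from the Freedman-Diaconis rule of thumb: 2 * IQR / n^(1/3)."""
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    return 2.0 * (q75 - q25) / np.cbrt(data.size)

rng = np.random.default_rng(0)
sample = rng.normal(size=10_000)     # stand-in data (assumption)

h = fd_bin_width(sample)
# Number of equal-width bins needed to cover the range of the data.
n_bins = int(np.ceil((sample.max() - sample.min()) / h))
```

NumPy exposes this rule as `bins="fd"` in `np.histogram` and `np.histogram_bin_edges`.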
 
