Histogram to PDF Conversion: Confirm Experimental Approximation

  • Thread starter: tangodirt
  • Tags: Histogram, PDF
SUMMARY

Dividing the counts in each histogram bin by the total number of items (1,000,000 in the example) gives the estimated probability of each bin. To obtain an estimate of the probability density function (PDF), also divide by the bin width, so that the total area under the histogram equals one. Kernel Density Estimation (KDE) techniques are recommended for smoothing the data and producing a continuous PDF estimate. For how good the histogram estimate is and how to choose the bin width, see the paper by Freedman and Diaconis on the histogram as a density estimator.
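In symbols (notation introduced here for illustration): with N data points, n_k of them falling in bin k, and h_k the width of bin k, the two normalizations compare as follows.

```latex
\hat{p}_k = \frac{n_k}{N}, \qquad \sum_k \hat{p}_k = 1
\qquad \text{(estimated probability of bin } k\text{)}

\hat{f}(x) = \frac{n_k}{N\,h_k} \quad \text{for } x \text{ in bin } k,
\qquad \int \hat{f}(x)\,dx = \sum_k \frac{n_k}{N\,h_k}\,h_k = \sum_k \frac{n_k}{N} = 1
```

When every bin has width 1 the two coincide, which is why dividing by N alone can look correct.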

PREREQUISITES
  • Understanding of histograms and their construction
  • Familiarity with probability density functions (PDFs)
  • Knowledge of Kernel Density Estimation (KDE) techniques
  • Basic statistical concepts such as bin-width selection
NEXT STEPS
  • Research Kernel Density Estimation (KDE) techniques for improved PDF estimation
  • Study the impact of bin-width selection on histogram accuracy
  • Read the paper "On the histogram as a density estimator: L2 theory" by Freedman and Diaconis
  • Explore practical applications of PDF estimation in data analysis
USEFUL FOR

Data scientists, statisticians, and analysts involved in data visualization and probability density estimation will benefit from this discussion.

tangodirt
Say I have a large data set of 1,000,000 points. If I plot a histogram of this data, I get a bar chart with bins along the x-axis and the number of items in each bin along the y-axis.

If I take the number of items in each bin and divide this by the total number of items (1,000,000 in this case), have I arrived at an experimental approximation of the probability density function?

Everything I know says that yes, dividing the histogram by the total number of points gets me to an experimental approximation of the PDF, but I want someone who is more familiar with this to confirm. Thank you!
 
Yes, it does.

Strictly speaking it's an estimate, not an approximation. In almost all cases they'll be the same thing, but it's better to call it an estimate. It's the best estimate you can make in the absence of any other information.
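A minimal NumPy sketch of the normalization asked about in the question. The data are an assumption: simulated standard-normal draws standing in for the original poster's unspecified 1,000,000 points, and the bin counts (50 and 100) are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for the thread's data set: 1,000,000 draws from a
# standard normal distribution.
rng = np.random.default_rng(0)
data = rng.standard_normal(1_000_000)

# Histogram with 50 equal-width bins (an arbitrary choice).
counts, edges = np.histogram(data, bins=50)

# Counts divided by the total number of points: the estimated probability
# of a point falling in each bin. These fractions sum to 1.
bin_prob = counts / data.size
print(bin_prob.sum())

# Same data, 100 bins of half the width: each fraction roughly halves,
# so these numbers depend on the bin width and are not yet a density.
counts_fine, _ = np.histogram(data, bins=100)
ratio = bin_prob.max() / (counts_fine / data.size).max()
print(ratio)
```

The fractions are valid per-bin probability estimates, but because they shrink as the bins get narrower, they only become a density after the further division by bin width discussed later in the thread.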
 
Yes, estimate is a much better word! Thank you for clarifying.
 
If you have access to the dataset, you can also use density estimation techniques such as kernel density estimation (KDE).

You can find a very good explanation of this technique here: http://www.mglerner.com/blog/?p=28

I hope this helps!
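As a rough illustration of kernel density estimation (written from scratch, not taken from the linked post), the sketch below places a Gaussian bump of unit area on every sample and averages them. The function name `gaussian_kde_estimate`, the normal-reference bandwidth rule h = 1.06 σ n^(-1/5), and the simulated data are all assumptions for illustration.

```python
import numpy as np

def gaussian_kde_estimate(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at `grid`.

    If no bandwidth is given, use the normal-reference rule of thumb
    h = 1.06 * sigma * n**(-1/5) (an assumption; other rules exist).
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel evaluated at each distance.
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by 1/h so each bump has unit area.
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(1)
samples = rng.standard_normal(10_000)    # hypothetical data
grid = np.linspace(-5, 5, 201)           # spacing 0.05
density = gaussian_kde_estimate(samples, grid)

# Riemann-sum check: the estimate integrates to about 1 over the grid.
area = density.sum() * (grid[1] - grid[0])
print(area, density[100])   # density[100] is the estimate at x = 0
```

SciPy provides a ready-made version as `scipy.stats.gaussian_kde`. In practice the bandwidth usually affects the result more than the choice of kernel.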
 
Wow, this is really cool. I am playing with KDE techniques now and the results look great. At the very least, it really helps to "smooth" the discrete data to generate a more accurate PDF.
 
Divide by the total number of items *and* by the width of the bins. Now you have an estimate of the probability density function: a function whose total area under the "curve" equals one.
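A short NumPy sketch of this recipe, again on simulated standard-normal data (an assumption, as is the choice of 50 bins). It checks that the area is one, that the result matches NumPy's own `density=True` normalization, and that it tracks the true PDF at the bin centres.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1_000_000)   # hypothetical data set

counts, edges = np.histogram(data, bins=50)
widths = np.diff(edges)

# Divide by the number of points AND by each bin's width.
density = counts / (data.size * widths)

# Area under the step "curve": sum of height * width over all bins.
area = np.sum(density * widths)

# NumPy applies the same normalization when density=True.
np_density, _ = np.histogram(data, bins=edges, density=True)

# Compare with the true standard-normal PDF at the bin centres.
centres = 0.5 * (edges[:-1] + edges[1:])
true_pdf = np.exp(-0.5 * centres**2) / np.sqrt(2 * np.pi)
max_error = np.max(np.abs(density - true_pdf))

print(area, np.allclose(density, np_density), max_error)
```

Dividing by each bin's own width (rather than a single constant) keeps the recipe correct even if the bins are unequal.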

How good an estimator is it, and how should the bin width be chosen? Both questions have been studied in many papers, for instance in http://link.springer.com/article/10.1007/BF01025868
On the histogram as a density estimator: L2 theory
David Freedman, Persi Diaconis
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete
December 1981, Volume 57, Issue 4, pp 453-476
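The bin-width rule usually credited to this paper (the "Freedman-Diaconis rule") sets the width to 2 × IQR × n^(-1/3), where IQR is the interquartile range. A minimal sketch, assuming simulated standard-normal data; the helper name `freedman_diaconis_width` is hypothetical, while `bins="fd"` is NumPy's built-in option for the same rule.

```python
import numpy as np

def freedman_diaconis_width(data):
    """Bin width 2 * IQR * n**(-1/3) (the Freedman-Diaconis rule)."""
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    return 2.0 * (q75 - q25) * data.size ** (-1 / 3)

rng = np.random.default_rng(0)
data = rng.standard_normal(1_000_000)   # hypothetical data set

h = freedman_diaconis_width(data)
# Number of equal-width bins needed to cover the data range.
n_bins = int(np.ceil((data.max() - data.min()) / h))

# NumPy implements the same rule via bins="fd".
fd_edges = np.histogram_bin_edges(data, bins="fd")
print(h, n_bins, len(fd_edges) - 1)
```

Because the width shrinks like n^(-1/3), a million points support far narrower bins (a few hundred here) than a small sample would.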
 
