Histogram to PDF Conversion: Confirm Experimental Approximation

tangodirt · Oct 3, 2015

Say I have a large data set of 1,000,000 points. If I plot a histogram of this data, I get a bar chart with bins along the x-axis and the number of items in each bin along the y-axis.

If I take the number of items in each bin and divide this by the total number of items (1,000,000 in this case), have I arrived at an experimental approximation of the probability density function?

Everything I know says that yes, dividing the histogram by the total number of points gets me to an experimental approximation of the PDF, but I want someone who is more familiar with this to confirm. Thank you!

andrewkirk · Oct 3, 2015

Yes, it does.

Strictly speaking it's an estimate, not an approximation. In almost all cases they'll be the same thing. Better to call it an estimate though. It's the best estimate you can make in the absence of any other info.

tangodirt · Oct 3, 2015

Yes, estimate is a much better word! Thank you for clarifying.

h6ss · Oct 4, 2015

If you have access to the dataset, you can also use density estimation techniques such as kernel density estimation.

You can find a very good explanation of this technique here: http://www.mglerner.com/blog/?p=28

I hope this helps!

tangodirt · Oct 8, 2015

Wow, this is really cool. I am playing with KDE techniques now and the results look great. At the very least, it really helps to "smooth" the binned data into a cleaner-looking estimate of the PDF.
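
For anyone following along, here is a minimal sketch of what a Gaussian KDE does under the hood, written with only the Python standard library. The function name `gaussian_kde`, the synthetic normal sample, and the normal-reference bandwidth rule `1.06 * sigma * n**(-1/5)` are my own assumptions for illustration. They are not taken from the linked blog post, and a real project would more likely use a library implementation.

```python
import math
import random
import statistics


def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density built from Gaussian kernels."""
    n = len(data)
    # Each kernel is a normal density with standard deviation `bandwidth`;
    # the estimate is the average of the n kernels.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


random.seed(1)
# Synthetic stand-in for the real data set (assumption: standard normal).
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Normal-reference rule of thumb for the bandwidth (an assumption, not
# the only reasonable choice).
sigma = statistics.stdev(sample)
bandwidth = 1.06 * sigma * len(sample) ** (-1.0 / 5.0)

pdf = gaussian_kde(sample, bandwidth)

# Riemann sum on a grid: the estimate should integrate to roughly one.
step = 0.05
grid = [-6.0 + i * step for i in range(241)]
area = sum(pdf(x) * step for x in grid)
print(f"bandwidth = {bandwidth:.4f}, area = {area:.4f}, pdf(0) = {pdf(0.0):.4f}")
```

The bandwidth plays the same role as the bin width in a histogram: too small and the estimate is noisy, too large and real features get smoothed away.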

gill1109 · Oct 11, 2015

Divide by the total number of items *and* by the width of the bins. Now you have an estimate of the probability density function. (Now you have a function such that the area under the "curve" equals one.)
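
A small sketch of that normalization, using only the Python standard library. The helper name `histogram_density`, the equal-width bins, and the synthetic normal sample are assumptions for illustration. Dividing by the count alone gives relative frequencies that sum to one; dividing by the bin width as well gives heights whose total area is one.

```python
import random


def histogram_density(data, num_bins):
    """Return (bin_edges, densities) with density = count / (n * bin_width)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Clamp the index so that x == hi lands in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(data)
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, densities


random.seed(0)
# Synthetic stand-in for the real data set (assumption: standard normal).
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

edges, densities = histogram_density(sample, num_bins=50)
bin_width = edges[1] - edges[0]

# The area under the bars should be one, up to floating-point error.
area = sum(d * bin_width for d in densities)
print(f"area under histogram: {area:.6f}")
print(f"tallest bar: {max(densities):.4f}")
```

If NumPy is available, `numpy.histogram(data, bins=..., density=True)` applies the same normalization.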

How good an estimator is it? And how to choose the bin width? That has been studied in many papers, for instance:

David Freedman, Persi Diaconis. "On the histogram as a density estimator: L2 theory." Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, Volume 57, Issue 4, pp. 453-476, December 1981. http://link.springer.com/article/10.1007/BF01025868
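
As a rough illustration of choosing the bin width, here is a sketch of the rule of thumb commonly called the Freedman-Diaconis rule and usually attributed to this paper: h = 2 * IQR * n^(-1/3). The function name, the synthetic normal sample, and the use of `statistics.quantiles` to get the quartiles are my assumptions; other quartile conventions give slightly different widths.

```python
import math
import random
import statistics


def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR * n**(-1/3)."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return 2.0 * iqr * len(data) ** (-1.0 / 3.0)


random.seed(2)
# Synthetic stand-in for the real data set (assumption: standard normal).
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

h = freedman_diaconis_width(sample)
num_bins = math.ceil((max(sample) - min(sample)) / h)
print(f"bin width = {h:.4f}, number of bins = {num_bins}")
```

Because the width shrinks like n^(-1/3), a larger data set gets narrower bins, but the number of points per bin still grows.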
