Histogram to PDF Conversion: Confirm Experimental Approximation

In summary: A histogram is a bar chart with bins along the x-axis and the number of items in each bin along the y-axis; it is a common way of displaying how a data set is distributed. Dividing each bin count by the total number of points gives an estimate of the probability of a point falling in that bin, and dividing by the bin width as well gives an estimate of the underlying probability density function.
  • #1
tangodirt
Say I have a large data set of 1,000,000 points. If I plot a histogram of this data, I get a bar chart with bins along the x-axis and the number of items in each bin along the y-axis.

If I take the number of items in each bin and divide this by the total number of items (1,000,000 in this case), have I arrived at an experimental approximation of the probability density function?

Everything I know says that yes, dividing the histogram by the total number of points gets me to an experimental approximation of the PDF, but I want someone who is more familiar with this to confirm. Thank you!
 
  • #2
Yes, it does.

Strictly speaking it's an estimate, not an approximation. In almost all cases they'll be the same thing, but it's better to call it an estimate: it's the best estimate you can make in the absence of any other information.
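For concreteness, here is a minimal NumPy sketch of the estimate being discussed. The data set and bin count below are synthetic stand-ins chosen only for illustration, not part of the original posts:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1_000_000)        # stand-in for the real 1,000,000-point data set

counts, edges = np.histogram(data, bins=100)

# Dividing each bin count by the total number of points gives the relative
# frequency of each bin: an estimate of the probability that a point falls there.
relative_freq = counts / counts.sum()

# (As a later post notes, dividing by the bin width as well turns this into an
# estimate of the probability density function itself.)
```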
 
  • #3
Yes, estimate is a much better word! Thank you for clarifying.
 
  • #4
tangodirt said:
Say I have a large data set of 1,000,000 points. If I plot a histogram of this data, I get a bar chart with bins along the x-axis and the number of items in each bin along the y-axis.

If I take the number of items in each bin and divide this by the total number of items (1,000,000 in this case), have I arrived at an experimental approximation of the probability density function?

Everything I know says that yes, dividing the histogram by the total number of points gets me to an experimental approximation of the PDF, but I want someone who is more familiar with this to confirm. Thank you!

If you have access to the data set, you can also use density estimation techniques such as kernel density estimation.

You can find a very good explanation of this technique here: http://www.mglerner.com/blog/?p=28

I hope this helps!
 
  • #5
h6ss said:
If you have access to the data set, you can also use density estimation techniques such as kernel density estimation.

You can find a very good explanation of this technique here: http://www.mglerner.com/blog/?p=28

I hope this helps!

Wow, this is really cool. I am playing with KDE techniques now and the results look great. At the very least, it really helps to "smooth" the discrete data to generate a more accurate PDF.
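A minimal sketch of the kind of kernel density estimate mentioned above, assuming SciPy is available; the synthetic data, grid size, and bandwidth choice are illustrative only:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
data = rng.normal(size=10_000)            # stand-in for the real data set

kde = gaussian_kde(data)                  # Gaussian kernel; bandwidth set by Scott's rule by default
grid = np.linspace(data.min(), data.max(), 200)
smooth_pdf = kde(grid)                    # smooth density estimate evaluated on a grid

# For comparison, the normalized histogram of the same data:
hist_pdf, edges = np.histogram(data, bins=50, density=True)
```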
 
  • #6
tangodirt said:
Say I have a large data set of 1,000,000 points. If I plot a histogram of this data, I get a bar chart with bins along the x-axis and the number of items in each bin along the y-axis.

If I take the number of items in each bin and divide this by the total number of items (1,000,000 in this case), have I arrived at an experimental approximation of the probability density function?

Everything I know says that yes, dividing the histogram by the total number of points gets me to an experimental approximation of the PDF, but I want someone who is more familiar with this to confirm. Thank you!
Divide by the total number of items *and* by the width of the bins. Now you have an estimate of the probability density function (a function such that the area under the "curve" equals one).

How good an estimator is it, and how should the bin width be chosen? That has been studied in many papers, for instance:

David Freedman and Persi Diaconis, "On the histogram as a density estimator: L2 theory," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, December 1981, Volume 57, Issue 4, pp. 453-476. http://link.springer.com/article/10.1007/BF01025868
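A short NumPy sketch of this normalization; the synthetic data is illustrative, and `bins="fd"` applies the Freedman-Diaconis bin-width rule from the paper cited above:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=1_000_000)         # stand-in for the real data set

counts, edges = np.histogram(data, bins=100)
bin_widths = np.diff(edges)

# Divide by the total number of items *and* by the bin widths, so that the
# area under the bars equals one.
pdf_estimate = counts / (counts.sum() * bin_widths)

# NumPy can do the same normalization directly: density=True divides by both,
# and bins="fd" selects the bin width with the Freedman-Diaconis rule.
pdf_direct, edges_fd = np.histogram(data, bins="fd", density=True)
```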
 

1. What is a histogram and how is it related to PDF conversion?

A histogram is a graphical representation of a dataset, where the data is organized into bins and the frequency of each bin is displayed in a bar graph. PDF conversion refers to converting a histogram into a probability density function, which is a mathematical representation of the probability distribution of a continuous random variable.

2. Why is it important to confirm experimental approximation in histogram to PDF conversion?

Confirming the experimental approximation ensures the accuracy of the conversion: it verifies that the histogram accurately reflects the underlying distribution of the data. It also helps to identify potential errors or outliers in the data.

3. What methods can be used to confirm experimental approximation in histogram to PDF conversion?

One common method is to compare the shape of the normalized histogram with the theoretical PDF of the expected distribution. Another is to use a statistical test, such as the Kolmogorov-Smirnov test, to assess the goodness of fit between the data and the candidate distribution.
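As an illustration of the second approach, here is a SciPy sketch of a Kolmogorov-Smirnov check against a normal reference distribution; the data and the choice of reference distribution are assumptions made only for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(size=5_000)             # stand-in for the experimental data

# Compare the empirical distribution of the data with a fitted normal CDF.
mu, sigma = data.mean(), data.std(ddof=1)
statistic, p_value = stats.kstest(data, "norm", args=(mu, sigma))

# A small p-value suggests the reference distribution is a poor fit.  Note that
# estimating mu and sigma from the same data makes the nominal p-value optimistic
# (a Lilliefors-type correction would be needed for a strict test).
print(statistic, p_value)
```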

4. Are there any limitations to using histogram to PDF conversion for experimental data?

Yes. One limitation is that the method assumes the underlying distribution of the data is continuous and can be accurately represented by a PDF. In addition, the accuracy of the estimate depends on the number of data points and on the choice of bin size in the histogram.

5. How can histogram to PDF conversion be useful in scientific research?

Histogram to PDF conversion is useful in scientific research as it allows for a more precise and detailed analysis of experimental data. It can also help to identify patterns and trends in the data, and can be used to make predictions and inferences about future data points.
