How to get the CDF from a histogram

In summary, the conversation discusses different methods for creating a cumulative distribution function (cdf) for a set of values. The first method is to count the total number of occurrences up to each particular value, while the second involves fitting a probability distribution to the data. The speakers also clarify the difference between the empirical cdf and the underlying cdf. Ultimately, it is agreed that counting occurrences is a valid way to create a cumulative histogram of the data, but there may be more sophisticated methods for estimating the underlying cdf.
  • #1
catalin.drago
Hello,

I have a histogram in which I count the number of occurrences of each value a function takes in the range 0.8 to 2.2.

I would like to get the cumulative distribution function for the set of values. Is it correct to just count the total number of occurrences up to each particular value?

For example, would the cdf at 0.9 be the sum of all the occurrences from 0.8 to 0.9?

Is it correct?

Thank you
 
  • #2
That would be a crude way of doing it, yes. There are a variety of techniques (e.g. maximum likelihood) for fitting a distribution to empirical data. Most statistical software (e.g. R, or Matlab with the stats toolbox) should support a few different methods.
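As one illustration of the fitting approach, here is a minimal Python sketch. It assumes the data are roughly normal, which the thread does not say, and the sample values are made up. For a normal model the maximum-likelihood estimates are the sample mean and the divide-by-n standard deviation, so the standard library is enough.

```python
from statistics import NormalDist, fmean, pstdev

# Hypothetical sample values in the range 0.8 to 2.2 (not from the thread).
samples = [0.9, 1.1, 1.2, 1.4, 1.5, 1.5, 1.6, 1.8, 1.9, 2.1]

# Maximum-likelihood estimates for a normal model:
# the sample mean and the divide-by-n (population) standard deviation.
mu = fmean(samples)
sigma = pstdev(samples)

# Assumption: a normal family is appropriate for this data.
fitted = NormalDist(mu, sigma)


def fitted_cdf(x):
    """Estimated P(X <= x) under the fitted normal model."""
    return fitted.cdf(x)


print(f"mu={mu:.3f}, sigma={sigma:.3f}, F(1.5)={fitted_cdf(1.5):.3f}")
```

If the histogram is clearly skewed or bounded, a different family would be a better assumption, and the fitting step changes accordingly.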
 
  • #3
catalin.drago said:
I would like to get the cumulative distribution function for the set of values.

To mathematicians, the usual scenario is that your data are random samples from some probability distribution (i.e. a c.d.f.). The data are not the same as the c.d.f. (unless your sample happened to come out "perfectly"). When you make the cumulative histogram of the data, it isn't the same thing as the c.d.f., so the preferred term for it would be "the empirical c.d.f." or just "the cumulative histogram".

If you are trying to make the cumulative histogram, your method is correct. If you are trying to estimate the underlying c.d.f. of the random variable then, as Number Nine mentions, there may be more sophisticated ways.
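When the raw samples are still available, the empirical c.d.f. can be computed without binning at all. The following is a minimal Python sketch; the function name `empirical_cdf` and the data are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F with F(x) = fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical data, not from the thread.
samples = [0.85, 0.95, 1.0, 1.3, 1.3, 1.7, 2.0, 2.2]
F = empirical_cdf(samples)
print(F(0.9), F(1.3), F(2.2))
```

This is a step function that jumps at each observed value. It describes the sample, not the underlying distribution, which is the distinction drawn above.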
 

1. How do I convert a histogram to a cumulative distribution function (CDF)?

The CDF can be obtained from a histogram by taking the cumulative sum of the frequencies and dividing by the total number of data points. This will give you the cumulative probability for each bin in the histogram.
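The calculation can be sketched in a few lines of Python. The bin edges and counts below are invented for illustration; only the range 0.8 to 2.2 comes from the question.

```python
from itertools import accumulate

# Hypothetical histogram: counts[i] is the number of occurrences
# falling between edges[i] and edges[i + 1].
edges = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2]
counts = [2, 5, 9, 12, 7, 4, 1]

total = sum(counts)

# Running total of counts up to and including each bin.
running = list(accumulate(counts))

# Cumulative probability at the upper edge of each bin.
cdf_at_upper_edges = [c / total for c in running]

for upper, p in zip(edges[1:], cdf_at_upper_edges):
    print(f"F({upper}) = {p:.3f}")
```

Each value is attached to the upper edge of its bin, because that is the point by which all of the bin's occurrences have been counted. The last value is always 1.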

2. Do I need to have a certain number of bins in my histogram to get an accurate CDF?

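The number of bins does not change the value of the cumulative histogram at a bin edge: the running total up to an edge is the same however the data below it are binned. What the bin count changes is resolution. With too few bins the CDF is known at only a few points and looks coarse; with many bins it approaches the empirical CDF of the raw data, even though the histogram itself becomes noisy and harder to read.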
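A small Python check of how bin count affects the result, using made-up counts: merging adjacent bins of a fine histogram gives a coarse histogram of the same data, and the two cumulative histograms agree wherever they share an edge.

```python
from itertools import accumulate


def cumulative(counts):
    """Cumulative fraction of the data at the upper edge of each bin."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]


# Hypothetical fine histogram with 8 equal-width bins.
fine_counts = [1, 3, 4, 6, 5, 3, 2, 1]

# Coarse histogram of the same data: merge adjacent pairs of bins.
coarse_counts = [a + b for a, b in zip(fine_counts[0::2], fine_counts[1::2])]

fine_cdf = cumulative(fine_counts)
coarse_cdf = cumulative(coarse_counts)

# Every coarse edge is also a fine edge (every second one),
# so these two lists should be identical.
print(coarse_cdf)
print(fine_cdf[1::2])
```

The coarse version simply has fewer points; it carries no information about how the CDF behaves inside each wide bin.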

3. Can I use any type of data to create a CDF from a histogram?

The data need to be numeric, or at least ordered, for a CDF to be meaningful. Continuous data are binned before counting. Discrete data work too and simply give a step-shaped CDF with jumps at the observed values.

4. Why is the CDF useful in data analysis?

The CDF gives, for any threshold, the probability of observing a value at or below it, and the difference between two CDF values gives the probability of a range. It also allows for easy comparison between different datasets and can help identify outliers or unusual patterns in the data.

5. Can I use software or programs to automatically generate a CDF from a histogram?

Yes, many statistical packages and programming environments, including R, Python, and Excel, provide the tools to generate a CDF from a histogram. However, it is important to understand the underlying calculations and assumptions in order to properly interpret the resulting CDF.
