How to get the CDF from a histogram

  • Context: Undergrad
  • Thread starter: catalin.drago
  • Tags: Cdf, Histogram
SUMMARY

The discussion focuses on deriving the cumulative distribution function (CDF) from a histogram of values ranging from 0.8 to 2.2. It confirms that summing occurrences up to a specific value, such as 0.9, provides a cumulative histogram, referred to as the empirical CDF. However, it emphasizes that this method is a crude approximation and suggests using more advanced techniques like maximum likelihood estimation for fitting distributions to empirical data. Statistical software such as R and Matlab with the stats toolbox can facilitate these advanced methods.

PREREQUISITES
  • Understanding of histograms and their construction
  • Familiarity with cumulative distribution functions (CDF)
  • Basic knowledge of statistical software like R or Matlab
  • Concept of maximum likelihood estimation for statistical modeling
NEXT STEPS
  • Explore how to create empirical CDFs using R
  • Learn about maximum likelihood estimation techniques
  • Investigate the differences between cumulative histograms and true CDFs
  • Study the statistical toolbox features in Matlab for distribution fitting
USEFUL FOR

Statisticians, data analysts, and researchers interested in understanding empirical data distributions and those looking to apply statistical software for advanced data analysis.

catalin.drago
Hello,

I have a histogram in which I count the number of times a function takes particular values in the range 0.8 to 2.2.

I would like to get the cumulative distribution function for this set of values. Is it correct to just count the total number of occurrences up to each particular value?

For example, would the CDF at 0.9 be the sum of all the occurrences from 0.8 to 0.9?

Is it correct?

Thank you
 
That would be a crude way of doing it, yes. There are a variety of techniques (e.g. maximum likelihood) for fitting a distribution to empirical data. Most statistical software (e.g. R, or Matlab with the stats toolbox) should support a few different methods.
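To make the maximum likelihood suggestion concrete, here is a minimal sketch in plain Python. It assumes the raw samples are still available (not only the binned counts) and that a normal distribution is a reasonable model, which nothing in the thread confirms. The sample values and the function names `fit_normal_mle` and `normal_cdf` are made up for illustration.

```python
import math

def fit_normal_mle(samples):
    """Closed-form maximum likelihood estimates (mean, std) for a normal model."""
    n = len(samples)
    mu = sum(samples) / n
    # The MLE of the variance divides by n, not by n - 1.
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical samples, invented for illustration only.
samples = [1.0, 1.2, 1.4, 1.6, 1.8]

mu, sigma = fit_normal_mle(samples)
print(f"fitted mean = {mu:.3f}, fitted std = {sigma:.3f}")
print(f"estimated P(X <= 0.9) = {normal_cdf(0.9, mu, sigma):.3f}")
```

The fitted curve is smooth and defined for every x, unlike the step-shaped cumulative histogram, but it is only as good as the assumed model. With a different family of distributions the likelihood usually has to be maximised numerically, which is what the fitting routines in R or Matlab do.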
 
catalin.drago said:
I would like to get the cumulative distribution function for the set of values.

To mathematicians, the usual scenario is that your data are random samples drawn from some probability distribution (i.e. a c.d.f.). The data are not the same as the c.d.f. (unless your sample happened to come out "perfectly"). When you make the cumulative histogram of the data, it isn't the same thing as the c.d.f., so the preferred term for it would be "the empirical c.d.f." or just "the cumulative histogram".

If you are trying to make the cumulative histogram, your method is correct. If you are trying to estimate the underlying c.d.f. of the random variable then, as Number Nine mentions, there may be more sophisticated ways.
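The cumulative histogram described above can be sketched in a few lines of plain Python. The bin edges and counts below are hypothetical (they cover only part of the 0.8 to 2.2 range from the question), and `cumulative_histogram` is an illustrative name. Note that the running totals are divided by the total count, so the result rises to 1 like a c.d.f. rather than to the number of observations.

```python
from itertools import accumulate

def cumulative_histogram(counts):
    """Return the running totals of the bin counts, normalised to end at 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in accumulate(counts)]

# Hypothetical bin edges and counts, invented for illustration only.
edges = [0.8, 0.9, 1.0, 1.1, 1.2]   # 4 bins covering 0.8 to 1.2
counts = [2, 5, 8, 5]

ecdf = cumulative_histogram(counts)

# Each value is the estimated probability of falling at or below the
# upper edge of the corresponding bin.
for upper, value in zip(edges[1:], ecdf):
    print(f"F({upper:.1f}) ~ {value:.2f}")
```

With these made-up counts, the value reported at 0.9 is the count in the first bin (0.8 to 0.9) divided by the total, which is the sum the original question describes. The estimate is only defined at the bin edges, so its resolution depends on the bin width.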
 
