How to get the CDF from a histogram

  • Thread starter: catalin.drago
  • Tags: Cdf, Histogram
Summary
To obtain the cumulative distribution function (CDF) from a histogram, one can sum the occurrences of values up to a specific point, which is a valid method for creating a cumulative histogram. However, this approach yields an empirical CDF rather than the true CDF of the underlying distribution. For a more accurate estimation of the CDF, advanced techniques such as maximum likelihood estimation can be employed, often supported by statistical software like R or Matlab. It is important to distinguish between the empirical CDF derived from data and the theoretical CDF of a random variable. Using the cumulative histogram method is appropriate for visualizing data, but estimating the underlying distribution may require more sophisticated methods.
catalin.drago
Hello,

I have a histogram in which I count how many times a function takes particular values in the range 0.8 to 2.2.

I would like to get the cumulative distribution function for this set of values. Is it correct to just count the total number of occurrences up to each particular value?

For example, would the cdf at 0.9 be the sum of all the occurrences from 0.8 to 0.9?

Is it correct?

Thank you
 
That would be a crude way of doing it, yes. There are a variety of techniques (e.g. maximum likelihood) for fitting a distribution to empirical data. Most statistical software (e.g. R, Matlab with the stats toolbox) should support a few different methods.
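
As a rough illustration of the maximum likelihood route, here is a minimal Python sketch. It assumes the data can be modelled by a normal distribution, which is purely an assumption for the example and is not stated anywhere in the thread. The `samples` list is made-up data standing in for the measured values. For a normal model the maximum likelihood estimates have a closed form, so only the standard library is needed.

```python
import random
from statistics import NormalDist, fmean, pstdev

# Hypothetical data: 500 draws standing in for the measured values.
random.seed(0)
samples = [random.gauss(1.5, 0.2) for _ in range(500)]

# For a normal model (an assumption), the maximum likelihood estimates are
# the sample mean and the divide-by-n standard deviation.
mu_hat = fmean(samples)
sigma_hat = pstdev(samples)

# The fitted distribution gives a smooth estimate of the underlying c.d.f.
fitted = NormalDist(mu_hat, sigma_hat)

# Evaluate the fitted c.d.f. at a few points in the range from the question.
for x in (0.9, 1.5, 2.2):
    print(f"F({x}) ~ {fitted.cdf(x):.3f}")
```

If the data are clearly not bell-shaped, a different family would be the sensible choice, and the estimates would generally have to be found numerically rather than in closed form.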
 
catalin.drago said:
I would like to get the cumulative distribution function for the set of values.

To mathematicians, the usual scenario is that your data are random samples from some probability distribution (i.e. one described by a c.d.f.). The data are not the same as the c.d.f. (unless your sample happened to come out "perfectly"). When you make the cumulative histogram of the data, it isn't the same thing as the c.d.f., so the preferred term for it would be "the empirical c.d.f." or just "the cumulative histogram".
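
For reference, the standard textbook definition of the empirical c.d.f. of n samples can be written as follows. The symbols (x_1, ..., x_n for the samples, F for the underlying c.d.f.) are introduced here for illustration and do not appear in the original posts.

```latex
\hat{F}_n(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},
\qquad
F(x) \;=\; P(X \le x).
```

The first expression is computed from the data. The second is a property of the random variable and is generally unknown.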

If you are trying to make the cumulative histogram, your method is correct. If you are trying to estimate the underlying c.d.f. of the random variable then, as Number Nine mentions, there may be more sophisticated ways.
 
