# Distribution of data - alternative presentation

1. Apr 29, 2013

### exponent137

When we draw an ordinary distribution of data, we have to be careful to choose appropriate classes, for instance,
1-2, 4
2-3, 6
3-4, 11
etc.
But if we draw a cumulative distribution, classes are not necessary. For instance,
1-2, 4
2-3, 10
3-4, 21
and still better:
1, 1
1.3, 2
1.4, 3
1.9, 4
2.1, 5
etc

Is there any good way of smoothing this cumulative curve and then calculating the noncumulative distribution from it again (by differentiation)?
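The class-free cumulative listing above (value, running count) can be produced directly from the raw data. Here is a minimal sketch in plain Python. The function name `empirical_cdf` is only illustrative, and giving tied values the highest rank is one possible convention, not something fixed by the post.

```python
def empirical_cdf(samples):
    """Return (value, cumulative count) pairs, one per distinct value."""
    ordered = sorted(samples)
    points = []
    for rank, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            # Tied value: keep a single point carrying the highest rank.
            points[-1] = (value, rank)
        else:
            points.append((value, rank))
    return points


# The five values listed in the post, given in arbitrary order.
samples = [1.9, 1.0, 2.1, 1.3, 1.4]
for value, count in empirical_cdf(samples):
    print(value, count)
```

No class boundaries appear anywhere in this construction. Only the order of the values matters.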

Last edited: Apr 29, 2013
2. Apr 30, 2013

### Stephen Tashi

There are many methods of smoothing, but whether they are "good" is not a precise mathematical question until you define how "goodness" will be measured. To have a precise mathematical question, you also need to provide some probability model for how the data is generated.

A simple way is to pick a family of distributions (such as a Poisson), estimate the value of the parameters of that distribution and use the mathematical formula for it to estimate both the cumulative distribution and the density.
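As a rough sketch of that parametric approach, the snippet below fits a Gaussian family to the five values listed earlier in the thread. The Gaussian is assumed purely for illustration (the data might call for a different family), and it relies only on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# The five values listed in the first post.
samples = [1.0, 1.3, 1.4, 1.9, 2.1]

# Estimate the parameters (mean and sample standard deviation) from the data.
fitted = NormalDist.from_samples(samples)


def fitted_cdf(x):
    """Estimated cumulative distribution at x."""
    return fitted.cdf(x)


def fitted_pdf(x):
    """Estimated probability density at x."""
    return fitted.pdf(x)


print(fitted.mean, fitted.stdev)
print(fitted_cdf(1.5), fitted_pdf(1.5))
```

Once the parameters are estimated, both curves come from the same formula, so no classes are involved. The price is that the result is only as good as the choice of family.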

What you have in mind may be "interpolation" - i.e. perhaps you want a function that exactly matches the cumulative at every point of the data, but gives a smooth representation of the density. If that's your goal, you should express this thought.

3. May 1, 2013

### exponent137

Interpolation is not necessary. I know that there are different distributions, for instance Gaussian or Poisson. I only asked whether the method I mentioned, or something similar, is in general use.

Last edited: May 1, 2013
4. May 1, 2013

### Stephen Tashi

I don't see that you mentioned a particular method. You only described the goal of getting a smooth representation of the density.

5. May 2, 2013

### exponent137

Yes, I do not know how to present my question clearly.
Maybe this:
The cumulative distribution is universal: it does not depend on classes or intervals, whereas the ordinary distribution does. On the other hand, the cumulative distribution is too abstract.
So I suspect that it is possible to present the ordinary distribution without intervals, as an aid to visualizing the cumulative distribution. Something like this probably exists already?

6. May 2, 2013

### Stephen Tashi

If we are talking about the cumulative histogram of independent samples of a random variable, this does depend on the interval that gives the precision of the measurement. For example, if we measure to the nearest kg, we get a different representation than if we measure to the nearest gram. All real measurements of continuous random variables have limited precision.

The only thing that prevents you from making an "exact" representation of what you call the "common distribution" is that you want a histogram. A histogram, by definition, uses intervals to classify the data. You could plot the data without using intervals. At each data point you could draw a vertical line of height k/n, where n is the number of data points and k is the number of times the value occurs, which will usually be 1.
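The vertical-line heights k/n can be computed as in the following minimal sketch. The function name `spike_heights` and the sample values are made up for illustration.

```python
from collections import Counter


def spike_heights(samples):
    """Map each distinct value to k/n, its relative frequency in the sample."""
    n = len(samples)
    counts = Counter(samples)
    return {value: k / n for value, k in sorted(counts.items())}


# Hypothetical measurements; 1.3 occurs twice, so its line has height 2/4.
print(spike_heights([1.0, 1.3, 1.3, 2.1]))
```

With continuous data recorded to many digits, nearly every value is distinct and every line has the same height 1/n.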

If we are talking about the cumulative distribution of a continuous random variable, both the cumulative distribution and the probability density are smooth curves and neither depends on interval sizes.

7. May 2, 2013

### exponent137

Yes, histogram is the correct word.

"You could plot the data without using intervals. At each data point you could draw a vertical line of height k/n, where n is the number of data points and k is the number of times the value occurs, which will usually be 1."

Yes, you could draw it that way, but if the values are continuous, for instance 1.234455, 1.345555, you get only vertical lines of height 1/N, or zeros. That does not look good. The cumulative histogram, however, has a nice shape even if you do not use intervals.
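One common class-free technique that matches this description is kernel smoothing. Each step of the cumulative curve is replaced by a small smooth step, and the derivative of the result is a smooth density (a kernel density estimate). This is not a method proposed in the thread, only a sketch of a standard one. The Gaussian kernel and the hand-picked `bandwidth` are assumptions, and the result depends on that bandwidth much as a histogram depends on its class width.

```python
from statistics import NormalDist


def smoothed_cdf(samples, bandwidth):
    """Return (cdf, density) functions built from Gaussian kernels."""
    # One Gaussian kernel centred on each data point.
    kernels = [NormalDist(mu=x, sigma=bandwidth) for x in samples]

    def cdf(t):
        # Average of kernel CDFs: a smoothed version of the cumulative steps.
        return sum(k.cdf(t) for k in kernels) / len(kernels)

    def density(t):
        # Average of kernel densities: the exact derivative of cdf(t).
        return sum(k.pdf(t) for k in kernels) / len(kernels)

    return cdf, density


# The five values from the first post; bandwidth 0.3 is chosen by hand.
samples = [1.0, 1.3, 1.4, 1.9, 2.1]
cdf, density = smoothed_cdf(samples, bandwidth=0.3)
print(cdf(1.5), density(1.5))
```

A smaller bandwidth follows the individual data points more closely and approaches the picture of isolated vertical lines. A larger one gives a smoother but blurrier curve.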

Last edited: May 2, 2013