A question about a cumulative distribution curve

  • Context: Undergrad 
  • Thread starter: bradyj7
  • Tags: Curve, Distribution
SUMMARY

The discussion centers on how to interpret a cumulative distribution function (CDF) and the bins drawn on it. The data are divided into ten bins of equal probability on the CDF, so each bin contains the same proportion (one tenth) of the total data. The confusion arises from assuming that the bin probabilities are cumulative; in fact each bin covers only its own decile, bounded by consecutive CDF values such as 0.1 and 0.2. The referenced graph shows both the CDF and the probability density function, and the representative value for each bin is taken at the bin's median rather than its mean.

PREREQUISITES
  • Understanding of cumulative distribution functions (CDF)
  • Familiarity with statistical binning techniques
  • Knowledge of probability density functions (PDF)
  • Basic concepts of quantiles and percentiles in statistics
NEXT STEPS
  • Research "Cumulative Distribution Function (CDF) in Statistics"
  • Learn about "Quantile Analysis and Binning Techniques"
  • Explore "Probability Density Functions (PDF) and Their Applications"
  • Investigate "Median vs. Mean in Statistical Analysis"
USEFUL FOR

Statisticians, data analysts, and researchers interested in understanding cumulative distribution functions and their applications in data analysis.

bradyj7
Hello there,

I have a Figure from a book and some text explaining the figure and I was hoping that somebody could explain/clarify what it means.

Here is the Figure

http://dl.dropbox.com/u/54057365/All/pic.JPG

Here is the text explaining the Figure:

"The data are divided into ten bins having the same probability on the cumulative density function (cdf). The representative driving distances in each bin are selected having the median cumulative distribution in each bin. The selected distance of one-day driving in each bin ranges from 9.56 to 81.4 miles, thus, the one-way trip distances range from 4.78 to 40.71 miles."


I'm looking for clarification on the first line.

"The data are divided into ten bins having the same probability on the cumulative density function (cdf)"

Does this mean that the data are divided into 10 bins according to the cumulative distribution curve, and that these bins are 0.1, 0.2, 0.3, ..., 1.0?

My question is: do they have the "same probability"? I would have thought that bin 0.2 would have twice the probability of bin 0.1, that bin 0.3 would have three times the probability, and so on.

Am I understanding this correctly?

Thank you for your help

John
 
Bin 0.2 refers to those items between 0.1 and 0.2, so it has the same probability as the first interval.
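
To make the point above concrete, here is a minimal sketch. It assumes a made-up, lognormally distributed sample as a stand-in for the book's driving data (the sample, its parameters, and all variable names are illustrative only). It shows that the CDF value at each bin's upper edge climbs 0.1, 0.2, ..., 1.0, while the probability of each bin, taken as the difference between the CDF at its two edges, stays at about 0.1.

```python
import bisect
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical daily driving distances in miles; NOT the data from the book.
distances = sorted(random.lognormvariate(3.3, 0.7) for _ in range(10_000))
n = len(distances)

# Upper edge of bin k is the empirical quantile at (k + 1) / 10.
upper_edges = [distances[(k + 1) * n // 10 - 1] for k in range(10)]

# Empirical CDF at each edge: the fraction of samples <= that edge.
cdf_at_edge = [bisect.bisect_right(distances, e) / n for e in upper_edges]

# Probability of a bin = CDF at its upper edge minus CDF at its lower edge.
lower_cdf = [0.0] + cdf_at_edge[:-1]
bin_probability = [hi - lo for lo, hi in zip(lower_cdf, cdf_at_edge)]

for k in range(10):
    print(f"bin {k + 1:2d}: upper edge {upper_edges[k]:7.2f} miles, "
          f"CDF = {cdf_at_edge[k]:.2f}, bin probability = {bin_probability[k]:.2f}")
```

The cumulative column is what grows from bin to bin; the last column is what "same probability" refers to.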
 
bradyj7 said:
Does this mean that the data are divided into 10 bins according to the cumulative distribution curve, and that these bins are 0.1, 0.2, 0.3, ..., 1.0?

That is what the graph appears to be showing. The bins are more often called "cells", and the analysis is by "quantile", that is, by percentage of the CDF. I have another thread where I am asking about the same subject. In my design, the cells/bins have equal probability based on the CDF (erf) of the bell curve.

My question is: do they have the "same probability"? I would have thought that bin 0.2 would have twice the probability of bin 0.1, that bin 0.3 would have three times the probability, and so on.

Am I understanding this correctly?

It can be done either way, but in this case I don't think it works the way your last statement suggests. My algorithm does not accumulate either. Going by the description in the text, I don't believe it is talking about bins that include the previous ones. Notice that the blue line is the cumulative distribution function, and that the right-hand axis of the graph gives the cumulative value up to 1.0 (100%). Then notice how the dotted black vertical lines intersect the CDF at evenly spaced percentages on that right-hand axis. So your graph, I suppose, is per decile, whereas mine is per percentile, as I break it up into 1% bins.
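
One possible reading of the book's "median cumulative distribution in each bin" is that the representative distance of each decile is the value at the middle of that bin, i.e. near CDF = 0.05, 0.15, ..., 0.95. The sketch below works under that assumption, again with a made-up lognormal sample rather than the book's data; the halving step follows the quoted text, where the one-way distance is half the one-day distance (9.56 to 4.78 miles).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical daily driving distances in miles; NOT the data from the book.
daily_miles = sorted(random.lognormvariate(3.3, 0.7) for _ in range(10_000))
n = len(daily_miles)

representatives = []
for k in range(10):
    # One equal-probability bin: the k-th tenth of the sorted sample.
    decile = daily_miles[k * n // 10:(k + 1) * n // 10]
    # Its median sits near CDF = 0.05 + 0.1 * k.
    representatives.append(statistics.median(decile))

# The quoted text halves the one-day distance to get the one-way trip.
one_way = [d / 2 for d in representatives]

for k, (day, trip) in enumerate(zip(representatives, one_way), start=1):
    print(f"bin {k:2d}: one-day {day:6.2f} miles, one-way {trip:6.2f} miles")
```

Each bin contributes exactly one representative distance, so all ten carry equal weight in whatever is computed from them afterwards.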


The probability density function also appears on that graph. Notice that it isn't a bell curve; this is a result of the number of samples being small. There are 10 bins, so the discrete binomial that would have the same shape is a Bernoulli trial with p = 0.1 and q = 0.9, scaled to the number of data points in the original sample.
Notice also that the smoothness of the graph appears to be "fudged" by using median values rather than means. Even with a million values, I still see some non-smooth deviation in my percentile cells.
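
As a rough illustration of why some unevenness remains even with a million values: if the cells are fixed in advance from a theoretical CDF (an assumption here, not something the book states), the number of samples landing in one cell of probability p follows a Binomial(n, p) distribution, with mean np and standard deviation sqrt(npq). The helper names below are made up for this sketch.

```python
import math


def cell_count_stats(n_samples, p):
    """Mean, standard deviation and relative spread of a Binomial(n, p) count."""
    mean = n_samples * p
    std = math.sqrt(n_samples * p * (1.0 - p))
    return mean, std, std / mean


def binomial_pmf(k, n_samples, p):
    """Exact probability that exactly k of n_samples fall in a cell of probability p."""
    return math.comb(n_samples, k) * p**k * (1.0 - p) ** (n_samples - k)


# Decile cells with small and moderate samples, then percentile cells with a million.
for n_samples, p in [(100, 0.1), (10_000, 0.1), (1_000_000, 0.01)]:
    mean, std, rel = cell_count_stats(n_samples, p)
    print(f"n = {n_samples:>9,d}, p = {p}: expected count {mean:9.1f}, "
          f"std {std:7.2f}, relative spread {rel:.2%}")

# Exact pmf for a small sample: chance that a decile cell gets exactly 10 of 100 points.
print(f"P(count = 10 | n = 100, p = 0.1) = {binomial_pmf(10, 100, 0.1):.4f}")
```

The relative spread shrinks like 1/sqrt(n) but never reaches zero, which is consistent with cell counts that stay slightly ragged at large n. If the cells are instead cut at the sample's own quantiles, the counts are equal by construction and this argument does not apply.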

See the graphs at the bottom of the binomial cpq thread.

I hope this helps.
If you go looking for the thread directly, note that I made a typo in the title: it is supposed to be a cumulative p,q Bernoulli distribution function (cpq, not cpk) that I am treating the cells as.

If your book happens to have a good estimation formula for the pdf (the purple-brown graph) when the number of data points n is small, I'd appreciate knowing the name of the approximation so that I can look it up. Computing an exact value for the Bernoulli trial is time-intensive...
Thanks, Andrew.
 
