
#1
Mar5-12, 04:33 AM

P: 122

Hello there,
I have a figure from a book, along with some text explaining it, and I was hoping somebody could clarify what it means.

Here is the figure: http://dl.dropbox.com/u/54057365/All/pic.JPG

Here is the text explaining the figure:

"The data are divided into ten bins having the same probability on the cumulative density function (cdf). The representative driving distances in each bin are selected having the median cumulative distribution in each bin. The selected distance of one-day driving in each bin ranges from 9.56 to 81.4 miles, thus, the one-way trip distances range from 4.78 to 40.71 miles."

I'm looking for clarification on the first sentence: "The data are divided into ten bins having the same probability on the cumulative density function (cdf)."

Does this mean that the data are divided into 10 bins according to the cumulative distribution curve, and that these bins are 0.1, 0.2, 0.3, ..., 1.0? My question is: do they have the "same probability"? I would have thought that bin 0.2 would have twice the probability of bin 0.1, that bin 0.3 would have three times the probability, and so on.

Am I understanding this correctly?

Thank you for your help,
John



#2
Mar5-12, 03:32 PM

Sci Advisor
P: 5,941

Bin 0.2 refers to those items whose cumulative probability lies between 0.1 and 0.2, so it covers a 0.1-wide slice of the cdf and has the same probability as the first interval (0 to 0.1).
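
To make the binning concrete, here is a minimal Python sketch of one way to read the book's description. It cuts the cdf at 0.0, 0.1, ..., 1.0 and takes the distance at the middle cumulative probability of each bin (0.05, 0.15, ..., 0.95) as the representative. The sample data are synthetic (a lognormal with made-up parameters, not the book's data), and the helper names `quantile` and `equal_probability_bins` are hypothetical, chosen only for illustration.

```python
import random


def quantile(sorted_xs, p):
    """Linearly interpolated quantile of pre-sorted data, for 0 <= p <= 1."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def equal_probability_bins(data, n_bins=10):
    """Split data into bins that each hold 1/n_bins of the probability."""
    xs = sorted(data)
    bins = []
    for k in range(n_bins):
        p_lo = k / n_bins            # cdf value at the bin's lower edge
        p_hi = (k + 1) / n_bins      # cdf value at the bin's upper edge
        p_mid = (p_lo + p_hi) / 2    # median cumulative probability in the bin
        bins.append({
            "cdf_range": (p_lo, p_hi),
            "probability": p_hi - p_lo,
            "distance_range": (quantile(xs, p_lo), quantile(xs, p_hi)),
            "representative": quantile(xs, p_mid),
        })
    return bins


# Synthetic one-day driving distances in miles (assumed distribution).
random.seed(0)
distances = [random.lognormvariate(3.3, 0.7) for _ in range(10_000)]

bins = equal_probability_bins(distances)
for b in bins:
    one_day = b["representative"]
    # The quoted text halves the one-day distance to get the one-way trip.
    print(b["cdf_range"], round(one_day, 2), round(one_day / 2, 2))
```

Every bin has probability 0.1 because each one spans the same height on the cdf axis. The bins differ only in how wide they are on the distance axis.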




#3
Mar8-12, 07:47 PM

P: 263

The probability density function also appears on that graph. Notice that it isn't a bell curve; this is a result of the number of samples being small. There are 10 bins, so the discrete binomial that would have the same shape is a Bernoulli trial with p=0.1 and q=0.9, scaled to the number of data points in the original sample.

Notice that the smoothness of the graph appears to be "fudged" by using "median" values rather than "mean". Even with a million values, I still see some non-smooth deviation in my percentile cells. See the graphs here: Graphs at bottom of binomial cpq thread

I hope this helps. If you go looking for the thread directly, note that I made a typo in the title: it is supposed to be a cumulative p,q Bernoulli distribution function (cpq, not cpk...!) that I am treating the cells as.

If your book happens to have a good estimation formula for the pdf (the purple brown graph) when the number of data points n is small, I'd appreciate knowing the name of the approximation so I can look it up. Computing an exact value for the Bernoulli trial is time intensive...

Thanks,
Andrew.

