- #1

- 299

- 12

## Main Question or Discussion Point

Anybody ever heard of the broken stick model? If so, what is it used for besides species abundance in ecology? Basically, n-1 points are randomly distributed on a unit interval, creating n intervals of random lengths. The points are independent, uniform random variables on the unit interval.

Here I am going to derive the expected length of I

First I'll give a brief sketch of the my derivation. I'll fill in the details later.

To begin, we need to find the probability that I

As a sub-problem, we need to find the probability that exactly j intervals have length > x, and then we can sum over j from n-r+1 to n. The set of n-1 points X1, X2, … X(n-1) can be thought of as a vector uniformly distributed on an (n-1)-cube of unit size. It turns out that the easiest probability to calculate is the probability that a particular group of s intervals (and possibly others, too) have lengths > x. This probability is just (1-sx)

For example, the probability that the first two intervals starting from the left are greater than .01 in length is (1-2*.01)

1) [tex]\sum_{s=j}^{n}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1}[/tex] if s<floor(1/x). Zero otherwise. In other words,

2) [tex]\sum_{s=j}^{\min(n,\floor{\frac{1}{x}})}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1}[/tex]

Therefore the tail probability distribution for the r-th smallest interval is given by

3) [tex]P(I_r > x) = \sum_{j=n-r+1}^{n}\sum_{s=j}^{\min(n,\floor{\frac{1}{x}})}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1 }[/tex]

To find the expected value of Ir, we integrate this tail probability over the allowable values for x. The smallest x could be is zero. The largest it could be is 1/(n-r+1). This would happen if the smallest r-1 intervals were zero length and all other intervals had length 1/(n-r+1). So we need to integrate P(I_r>x) from 0 to 1/(n-r+1).

Because the upper limit of the sum over s depends on x, we have to break the integral up into small intervals:

4) [tex]E[I_r] = \int_{0}^{\frac{1}{n}}P(I_r > x)dx + \sum_{l=1}^{r-1} \int_{\frac{1}{n-(l-1)}}^{\frac{1}{n-l}}P(I_r>x)dx[/tex]

Plugging in (3) for P(I_r>x) we get two main terms for E[I_r].

5) [tex]E[I_r] = \int_{0}^{\frac{1}{n}} \sum_{j=n-r+1}^{n}\sum_{s=j}^{n}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1} dx + \sum_{l=1}^{r-1} \int_{\frac{1}{n-(l-1)}}^{\frac{1}{n-l}}\sum_{j=n-r+1}^{n}\sum_{s=j}^{n-l}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1} dx[/tex]

After performing the integrals and rearranging the summations, you get to this:

6) [tex]E[I_r] = \frac{1}{n}\sum_{j=n-r+1}^{n} \sum_{s=j}^{n}(-1)^{s-j} \frac{ \binom{s}{j} \binom{n}{s} }{s}[/tex]

That inner sum over s had me stumped for over a week, but I finally figured out how it comes out to 1/j. That gives the desired answer

7) [tex]E[I_r] = \frac{1}{n}\sum_{j=n-r+1}^{n}\frac{1}{j}[/tex]

Here I am going to derive the expected length of I

_{r}, the r^{th}shortest interval using a brute force approach. The I’s are the order statistics for the intervals. That is, 0<=I_{1}<= I_{2}<= … I_{r}<= … I_{n}<= 1. Looking around online, there are some more sophisticated derivations for the expectation that I wasn't able to follow.First I'll give a brief sketch of the my derivation. I'll fill in the details later.

To begin, we need to find the probability that I

_{r}> x, or P(I_{r}> x). This condition holds exactly when there are at least n-r+1 intervals with length greater than x.As a sub-problem, we need to find the probability that exactly j intervals have length > x, and then we can sum over j from n-r+1 to n. The set of n-1 points X1, X2, … X(n-1) can be thought of as a vector uniformly distributed on an (n-1)-cube of unit size. It turns out that the easiest probability to calculate is the probability that a particular group of s intervals (and possibly others, too) have lengths > x. This probability is just (1-sx)

^{n-1}, provided s is less than the integer part of 1/x. Otherwise it’s zero.For example, the probability that the first two intervals starting from the left are greater than .01 in length is (1-2*.01)

^{n-1}Because of the symmetry of the problem, all such groups have the same probability, so the sum over all groups of s intervals of the probability that at least those s intervals have length > x is given by n_C_s (1-sx)^{n-1}, provided s < floor(1/x). You might think we could immediately get the probability that at least n-r+1 intervals have length > x by putting s=n-r+1. But the cases where there are more than s intervals with length > x are being counted multiple times, so we have to use the inclusion-exclusion principle. The probability that exactly j intervals have length > x is1) [tex]\sum_{s=j}^{n}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1}[/tex] if s<floor(1/x). Zero otherwise. In other words,

2) [tex]\sum_{s=j}^{\min(n,\floor{\frac{1}{x}})}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1}[/tex]

Therefore the tail probability distribution for the r-th smallest interval is given by

3) [tex]P(I_r > x) = \sum_{j=n-r+1}^{n}\sum_{s=j}^{\min(n,\floor{\frac{1}{x}})}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1 }[/tex]

To find the expected value of Ir, we integrate this tail probability over the allowable values for x. The smallest x could be is zero. The largest it could be is 1/(n-r+1). This would happen if the smallest r-1 intervals were zero length and all other intervals had length 1/(n-r+1). So we need to integrate P(I_r>x) from 0 to 1/(n-r+1).

Because the upper limit of the sum over s depends on x, we have to break the integral up into small intervals:

4) [tex]E[I_r] = \int_{0}^{\frac{1}{n}}P(I_r > x)dx + \sum_{l=1}^{r-1} \int_{\frac{1}{n-(l-1)}}^{\frac{1}{n-l}}P(I_r>x)dx[/tex]

Plugging in (3) for P(I_r>x) we get two main terms for E[I_r].

5) [tex]E[I_r] = \int_{0}^{\frac{1}{n}} \sum_{j=n-r+1}^{n}\sum_{s=j}^{n}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1} dx + \sum_{l=1}^{r-1} \int_{\frac{1}{n-(l-1)}}^{\frac{1}{n-l}}\sum_{j=n-r+1}^{n}\sum_{s=j}^{n-l}(-1)^{s-j}\binom{s}{j}\binom{n}{s}(1-sx)^{n-1} dx[/tex]

After performing the integrals and rearranging the summations, you get to this:

6) [tex]E[I_r] = \frac{1}{n}\sum_{j=n-r+1}^{n} \sum_{s=j}^{n}(-1)^{s-j} \frac{ \binom{s}{j} \binom{n}{s} }{s}[/tex]

That inner sum over s had me stumped for over a week, but I finally figured out how it comes out to 1/j. That gives the desired answer

7) [tex]E[I_r] = \frac{1}{n}\sum_{j=n-r+1}^{n}\frac{1}{j}[/tex]