# Puzzling roll X dice, choose Y highest problem

by chawk
Tags: average, choose, dice, probability, roll
 P: 523 Puzzling roll X dice, choose Y highest problem Yes, working out order statistics of discrete random variables can be tricky. We can label the sorted n-sided dicerolls as $$Z_1,\ldots,Z_x$$ where $$n\ge Z_1\ge \ldots \ge Z_x\ge 1$$. To work out the expectation $$E[Z_1+\ldots+Z_y]$$ we just need to know the marginal distribution of each Z. In fact $$P[Z_k\le j]$$ is the probability that at most k-1 of the dice are greater than j, so $$P[Z_k\le j] = \sum_{i=0}^{k-1}(^xC_i)(1-j/n)^i (j/n)^{x-i}$$ and with $$P[Z_k= j] = P[Z_k\le j] - P[Z_k\le j-1]$$ we can work out $$E[Z_k]$$ for each k.
 P: 256 Ah, so you can compute the average without having to find the distribution of Z1+...+Zy. That is very cool. I was hoping someone would post a way to do that. Thanks, bpet. Finding the distribution of Z1+...+Zy is not so bad after all. I originally thought it could only be computed recursively, and not in a way that would lead to a general formula. But then I saw how to break the problem down into smaller parts. For instance, assume that you roll x dice, each of which has z sides, and want the sum S of the y highest values rolled. Let r be the smallest value that occurs among the y highest. Suppose that exactly i of the x dice have values less than r, j are equal to r, and the rest are greater than r. If i+j = x, then each of the y highest takes on the value r, and S = y*r is the only possibility for the sum. Otherwise, if i+j < x, then that leaves x-i-j dice whose values are greater than r. Of the highest y, y - (x-i-j) of them take on the value r. So for the sum S to take on some particular value s, the remaining x-i-j dice need to sum to s-r*(y+i+j-x). Further, these values of these dice are restricted to r+1, r+2, ...., z. The number of ways this can be done is the same as the number of ways to distribute s-r*(y+i+j-x) indistinguishable balls among x-i-j cells such that every cell has between r+1 and z balls. And this is the same as the number of ways to distribute s-r*(y+i+j-x)-(r+1)*(x-i-j) = s+i+j-x-r*y balls among x-i-j cells so that each cell has between 0 and z-r-1 balls. Using the principle of inclusion-exclusion, the number of ways to distribute B indistinguishable balls among C cells so that each cell has between 0 and D balls is $$N(B;C,D) = \sum_{k=0}^C (-1)^k \binom{C}{k} \binom{B-k(D+1) + C-1}{C-1}$$ for C > 0. We take N(0;0,D) to be 1. So for each pair i,j, there are $$\frac{x!}{i!j!(x-i-j)!}$$ ways to separate the x dice into those that are less than, equal to, and greater than r. Then there are (r-1)i ways to choose the values of those less than r. And finally there are N(s+i+j-x-r*y; x-i-j, z-r-1) ways for the y highest to sum to a particular value s. So, summing the resulting expression over the allowed values of r, i, and j results in $$\sum_{r=1}^{z}\sum_{i=0}^{x-y}\sum_{j=x-y-i+1}^{x-i}\frac{x!}{i!j!(x-i-j)!}(r-1)^i N(s+i+j-x-ry; x-i-j, z-r-1)$$ ways for the sum S of the y highest of x z-sided dice to take on the value s.