Hi all, I have a stats problem I'm trying to figure out.

Suppose I have a very large population (~millions) of colored balls with exactly 50% red, 30% green, 20% blue. If I take a random sample of 1000 of these balls, the distribution of color counts [r g b] is multinomial, and for a sample this large it can be approximated by a multivariate normal distribution. I can calculate the expected value and variance of each count. The means are (obviously):

Mu = [500 300 200]

and each variance is given by

n*P*(1-P)

where P is the probability of picking a particular color (e.g. 0.5 for red) and n is the sample size, 1000 in this case, giving variances of [250 210 160]. Note that this is a valid approximation as long as the population I am sampling from is large compared to the sample size, so the sampling is effectively with replacement.

However, I'm not sure how to determine the covariance matrix. I know the diagonal elements are the variances of each count, so that part is easy. But given that the counts must add up to 1000 (r + g + b = 1000), and knowing the mean and variance of each, shouldn't I be able to use an analytical expression for the covariances? Intuitively, I know that for every extra red ball I pick, there is a 300/(300+200) probability of one less green ball and a 200/(300+200) probability of one less blue ball in the final counts. What am I missing here?

I'm about to run a Monte Carlo simulation to determine the covariances empirically, but I'm sure there's a neater way! Thanks in advance!
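In case it helps to be concrete, here's a minimal sketch of the Monte Carlo simulation I have in mind (Python/numpy; numpy.random.multinomial samples with replacement, which matches the large-population assumption above):

```python
import numpy as np

# Population proportions and sample size (from the setup above)
p = np.array([0.5, 0.3, 0.2])   # red, green, blue
n = 1000                         # balls per sample
trials = 100_000                 # number of Monte Carlo draws

# Each row is one sample: counts of [r, g, b] that sum to n.
samples = np.random.multinomial(n, p, size=trials)

print("empirical means:", samples.mean(axis=0))   # should be ~ [500 300 200]
print("empirical covariance matrix:")
print(np.cov(samples, rowvar=False))
# The diagonal should come out close to n*P*(1-P) = [250 210 160];
# the off-diagonal entries are the covariances I'm after.
```

The diagonal reproducing the n*P*(1-P) values above at least gives me a sanity check that the simulation is set up correctly, but I'd still rather have the closed-form expression for the off-diagonals.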