Hi all,

I have a stats problem I'm trying to figure out.

Suppose I have a very large population (~millions) of colored balls with exactly 50% red, 30% green, 20% blue. If I take a random sample of 1000 of these balls, the distribution of colors I end up with can be modeled as a multivariate normal distribution, with each dependent variable denoting the number of red, green and blue balls

[r g b].

I can calculate the expected values and variances for each variable:

The means are (obviously): Mu = [500 300 200]

And each variance (of the counts) will be given by n*P*(1-P)

Where P is the probability of picking a particular color (e.g. 0.5 for red) and n is the sample size, 1000 in this case. Note that this is a valid approximation as long as the population I am sampling is large compared to the sample size.
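To make the numbers concrete, here is a minimal sketch of the mean and variance calculation. It assumes each count is marginally binomial, so the variance of a count is n*P*(1-P); the names `probs` and `stats` are just illustrative. It also shows the percentage form, (100-P%)*P%/n, which gives the variance of the sample percentage rather than of the count (the two differ by a factor of (n/100)^2).

```python
# Means and variances of the color counts in a sample of n balls,
# assuming each count is (marginally) binomial with probability P.
n = 1000
probs = {"red": 0.5, "green": 0.3, "blue": 0.2}

stats = {}
for color, p in probs.items():
    mean_count = n * p                      # expected number of balls
    var_count = n * p * (1 - p)             # variance of the count
    pct = 100 * p                           # same probability in percent
    var_percent = pct * (100 - pct) / n     # variance of the sample percentage
    stats[color] = (mean_count, var_count, var_percent)

for color, (mean_count, var_count, var_percent) in stats.items():
    print(f"{color}: mean={mean_count:.0f}, var(count)={var_count:.1f}, "
          f"var(percent)={var_percent:.3f}")
```

For these proportions the count variances come out to 250, 210 and 160 for red, green and blue.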

However, I'm not sure how to determine the covariance matrix. I know the diagonal elements are the variances of each variable, so that's easy. However, given that the variables must add up to 1000 (r+g+b=1000), and knowing the means and variances of each, shouldn't I be able to use an analytical expression to determine the covariances?
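One possible route, sketched here using only the constraint r+g+b=1000 and the three variances (taken to be n*P*(1-P) for the counts, i.e. 250, 210 and 160, which is an assumption about how the variances are defined): since b = 1000 - r - g, the variance of b can be expanded in terms of r and g and then solved for the covariance.

$$
\operatorname{Var}(b) = \operatorname{Var}(1000 - r - g) = \operatorname{Var}(r) + \operatorname{Var}(g) + 2\operatorname{Cov}(r,g)
$$

$$
\operatorname{Cov}(r,g) = \tfrac{1}{2}\left[\operatorname{Var}(b) - \operatorname{Var}(r) - \operatorname{Var}(g)\right] = \tfrac{1}{2}\,(160 - 250 - 210) = -150
$$

The same argument with the roles swapped would give

$$
\operatorname{Cov}(r,b) = \tfrac{1}{2}\,(210 - 250 - 160) = -100, \qquad
\operatorname{Cov}(g,b) = \tfrac{1}{2}\,(250 - 210 - 160) = -60
$$

Under these assumptions each off-diagonal entry works out to -n*P_i*P_j, and every row of the covariance matrix sums to zero, as the constraint requires.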

Intuitively, I know that for every extra red ball I pick, this translates to 300/(300+200) probability that there will be one less green ball, and 200/(300+200) probability that there will be one less blue ball, in the final distribution.

What am I missing here? I'm about to perform a Monte Carlo simulation to empirically determine the covariances, but I'm sure there's a neater way!
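For reference, a Monte Carlo check along these lines could look like the sketch below. It assumes the population is large enough that drawing 1000 balls is effectively sampling with replacement, so each trial is a multinomial draw; the function name `simulate_color_counts`, the trial count and the seed are illustrative choices. The comparison matrix n*(diag(P) - P P^T) is the standard multinomial covariance, included here as an assumption to compare against rather than something derived in this post.

```python
import numpy as np


def simulate_color_counts(probs, sample_size, n_trials, seed=0):
    """Draw n_trials samples of sample_size balls; return counts per color."""
    rng = np.random.default_rng(seed)
    # Each row is one trial: [red, green, blue] counts summing to sample_size.
    return rng.multinomial(sample_size, probs, size=n_trials)


probs = np.array([0.5, 0.3, 0.2])   # red, green, blue
n = 1000

counts = simulate_color_counts(probs, n, n_trials=100_000)

# Empirical covariance matrix; columns are the variables (rowvar=False).
empirical_cov = np.cov(counts, rowvar=False)

# Multinomial covariance for comparison: n * (diag(P) - P P^T).
analytic_cov = n * (np.diag(probs) - np.outer(probs, probs))

# Regression slope of green on red: change in green per extra red ball.
slope_green_on_red = empirical_cov[0, 1] / empirical_cov[0, 0]

print("Empirical covariance:\n", np.round(empirical_cov, 1))
print("Multinomial covariance:\n", analytic_cov)
print("Slope of green on red:", round(slope_green_on_red, 3))
```

The slope of green on red should come out close to -300/(300+200) = -0.6, which matches the intuition about each extra red ball displacing a green one with that probability.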

Thanks in advance!

**Physics Forums | Science Articles, Homework Help, Discussion**


# Determining the covariance matrix of a multivariate normal distribution
