# Sampling distribution of mean item number?

1. Apr 12, 2012

### Economics2012

1. The problem statement, all variables and given/known data

Suppose you have a box of 1,000 items numbered 1-10, with 100 items for each number.
This gives a discrete uniform probability distribution,
with probability 1/10 for each number.

2. Relevant equations
Mean = μ = E(X)
Std dev is worked out from the variance:
Std dev = square root of the variance.
Variance (5 columns): x, x − μ, (x − μ) squared, p(x), and (x − μ) squared multiplied by p(x)

3. The attempt at a solution
For the box described above (probability 1/10 for each number), I got a mean of 5.5 and a std dev of 2.872 when I worked this out.
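The five-column table described under "Relevant equations" can be checked with a short script. This is only a sketch of that hand calculation; the variable names are illustrative and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from math import sqrt

values = range(1, 11)        # item numbers 1..10
p = Fraction(1, 10)          # each number is equally likely

# Mean: sum of x * p(x)
mean = sum(x * p for x in values)

# Variance: sum of (x - mean)^2 * p(x), i.e. the last column of the table
variance = sum((x - mean) ** 2 * p for x in values)

# Standard deviation: square root of the variance
std_dev = sqrt(variance)

print(float(mean), float(variance), round(std_dev, 3))  # 5.5 8.25 2.872
```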

If I am asked then to conduct a sampling distribution of the mean item number? with a sample size 30? how would you do this? Is it just finding the mean and std error or are there a few steps?

what new mean and standard error would it be? would it be still mean of 5.5 and std error of 0.5244? (old std dev / square root of 30)

This is as much as I can work out? I'm stuck basically on the conduct of a sampling distribution of the mean item?

If I posted this wrong can somebody tell me, I don't want a warning :)

2. Apr 12, 2012

### HallsofIvy

Staff Emeritus
"Central Limit Theorem": If you have n samples from any distribution with mean $\mu$ and standard deviation $\sigma$, then the mean value of the samples is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The larger n is, the better the approximation.

3. Apr 12, 2012

### Economics2012

Sorry, I don't understand what you mean?

4. Apr 12, 2012

### Ray Vickson

Why do you keep using question marks at the ends of sentences that are not questions? (That is a really, really annoying bad habit.) Now on to your questions.

First you need to decide whether the sampling is "with replacement" or "without replacement".

In sampling with replacement we put each sampled item back into the box before drawing out the next item; and we shake up the box vigorously, or in some other way ensure randomness, before each drawing. This ensures that the drawings are independent of each other, and makes the subsequent analysis much easier.

In sampling without replacement we draw out items one-by-one but do not put them back in the box. Therefore, the successive drawings are not independent, because, for example, if I get the number '1' on my first draw there are now 999 items left in the box and 99 of them are labelled '1'. That changes the probabilities on the next draw, etc. The sampling problem without replacement is harder to deal with.

So, let's take the case of sampling with replacement. For a sample of size n (n = 30 in your example) the mean number drawn is the sample average
$$M = \frac{X_1 + X_2 + \cdots + X_n}{n},$$ where $X_i$ is the number picked in the $i$th draw. Here the random variables $X_1, X_2, \ldots, X_n$ are independent and all have the same distribution $P\{X_i = k \} = 1/10, \; k = 1, 2, \ldots, 10.$ There are some basic facts that you can find in books or on-line: (1) the expectation of a sum is the sum of the expectations; (2) the expectation of $cX$ is $c$ times the expectation of $X$; (3) the variance of a sum of independent random variables is the sum of the variances; and (4) the variance of $cX$ is $c^2$ times the variance of $X$. We have $E(X_i) = 5.5 \text{ and } \text{Var}(X_i) = 33/4 = 8.25,$ so
$$E(M) = \frac{n \cdot 5.5}{n} = 5.5 \text{ and } \text{Var}(M) = \frac{n \cdot 8.25}{n^2} = \frac{8.25}{n}.$$ For n = 30 the variance is 8.25/30 = 0.275 and the standard deviation is the square root of this, which is 0.5244.
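One way to see these formulas at work is to simulate the sampling. The sketch below assumes sampling with replacement from a box built as described in the problem; the seed and the number of repetitions are arbitrary choices, not part of the original question.

```python
import random
from statistics import fmean, pstdev

random.seed(2012)  # arbitrary seed so the run is repeatable

# The box: 100 items for each of the numbers 1..10 (1,000 items in total)
box = [k for k in range(1, 11) for _ in range(100)]

n = 30                 # sample size from the question
repetitions = 20_000   # number of simulated samples (assumed value)

# random.choices draws WITH replacement, so the draws are independent
sample_means = [sum(random.choices(box, k=n)) / n for _ in range(repetitions)]

sim_mean = fmean(sample_means)   # should be close to 5.5
sim_se = pstdev(sample_means)    # should be close to sqrt(8.25/30) = 0.5244

print(sim_mean, sim_se)
```

A histogram of `sample_means` would be an empirical picture of the sampling distribution of the mean.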

What about the distribution of M? The exact distribution can (for given n) be obtained numerically by recursive methods, but there is no really easy way of getting it. However, if n is 'large', say >= 20, the distribution of M is approximately normal with mean 5.5 and variance 8.25/n; the approximation will be good enough for practical purposes if we stay within 2-3 standard deviations of the mean. So, for example, if n = 30 and you want $P\{ M \leq 6.5 \},$ you can use the fact that 6.5 = 5.5 + k(0.5244), where k = 1/0.5244 ≈ 1.9069, so the required probability is approximately P{N(0,1) <= 1.9069} = 0.9717.
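The normal-approximation arithmetic in this example can be reproduced with Python's standard library. This is a minimal sketch; `NormalDist` stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 5.5, 8.25, 30

se = sqrt(var / n)        # standard deviation of M, about 0.5244
k = (6.5 - mu) / se       # number of standard deviations above the mean
p = NormalDist().cdf(k)   # P{N(0,1) <= k}

print(round(se, 4), round(k, 4), round(p, 4))
```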

The so-called Central Limit Theorem guarantees that in the limit of large n, the appropriately-normalized version of M (sqrt(n)*(M - 5.5)/sqrt(8.25) in this case) converges in distribution to a standard normal, meaning that its distribution converges to that of N(0,1). That justifies the use of a normal approximation for large n.

Good luck working with the case of sampling without replacement; it can be done, but it is a lot more complicated and would require large amounts of computation to get numerical answers.
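Although the exact distribution without replacement is hard to get, the mean and standard deviation of M are not. The sketch below uses the standard finite population correction factor (N - n)/(N - 1), which is a textbook result and not something worked out in this thread; N = 1000 and n = 30 come from the problem statement.

```python
from math import sqrt

N, n = 1000, 30   # box size and sample size
var = 8.25        # variance of a single draw

# With replacement: Var(M) = var / n
se_with = sqrt(var / n)

# Without replacement: multiply by the finite population correction
fpc = (N - n) / (N - 1)
se_without = sqrt(var / n * fpc)

print(round(se_with, 4), round(se_without, 4))
```

With only 30 of 1,000 items drawn the correction is small, so the two standard errors are close; the mean of M is 5.5 in both cases.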

RGV

Last edited: Apr 12, 2012
5. Apr 12, 2012

### Economics2012

Thank you very much, you're too good.

Can I ask you one more thing?

What would be the probability of drawing 30 tiles and obtaining a mean tile number of less than 4?

I keep getting .9332 by using the tables of the normal distribution, I know that's probably wrong though :/

Last edited: Apr 12, 2012
6. Apr 12, 2012

### Ray Vickson

Show your work; otherwise it is impossible for me to tell you what you have done wrong.

RGV

7. Apr 12, 2012

### Economics2012

I knew it had to do with sampling, so I just took 4 from 5.5, got 1.5, and looked that up in the table at http://www.cs.washington.edu/homes/jrl/normal_cdf.pdf; that's how I got it. I think that's more than likely completely incorrect.
P(M>4), that's what I thought you did. Am I way off?

8. Apr 12, 2012

### Ray Vickson

I did not "do" anything like saying P(M>4); YOU did that. Anyway, you need P{M < 4}, and using the Normal approximation we need the probability that Z < z, where z = (4 - 5.5)/(0.5244) = -2.8604; that is, P{Z < -2.8604}. The normal approximation might be dicey in this case because we are right near or a bit beyond the limit of where the normal approximation is trustworthy when n is not very, very large.

Before calculating anything, it is always a good idea to get a "feel" for the range of an answer. In this case you want P{M < 4} and 4 is less than the population mean 5.5. Therefore, the probability of falling below 4 will be < 1/2. That should be enough to warn you that an answer like 0.9332 cannot possibly be right.
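The two calculations can be put side by side. In this sketch, `wrong` reproduces what the earlier post appears to describe (looking up the raw difference 1.5 in the table, without dividing by the standard error), and `right` uses the z value given above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()        # standard normal, N(0,1)
se = sqrt(8.25 / 30)    # standard error of M for n = 30

# Looking up 5.5 - 4 = 1.5 directly gives the 0.9332 figure
wrong = Z.cdf(5.5 - 4)

# Standardize first, and keep the sign: 4 is BELOW the mean
z = (4 - 5.5) / se
right = Z.cdf(z)

print(round(wrong, 4), round(z, 4), round(right, 4))
```

The second probability is well below 1/2, as the sanity check requires.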

Note added in editing: we can compute the exact answer and the normal approximation and compare them. For n = 30 and M = S/30 <= 4 we have the sum S <= 120; for M < 4 we have S < 120, so S <= 119 (because the values of the sum S are integers). I am not sure whether or not you want M <= 4 or M < 4; it makes a difference in this case.

$$P_{\text{exact}}\{ S \leq 120 \} = 0.002155197756, \; P_{\text{normal}}\{ S \leq 120 \} = 0.002115616446$$
and
$$P_{\text{exact}}\{ S \leq 119 \} = 0.001746718891, \; P_{\text{normal}} \{ S \leq 119 \} = 0.001728090515.$$
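A comparison of this kind can be set up as follows. The sketch builds the exact distribution of the sum S by repeated convolution (one draw at a time, sampling with replacement) and compares it with a normal distribution having the same mean 30(5.5) = 165 and variance 30(8.25) = 247.5. It is one possible way to do the computation, not necessarily the method used for the figures above. Since M = S/30, the event M <= 4 is the same as S <= 120.

```python
from math import sqrt
from statistics import NormalDist

n = 30

# pmf maps each possible value of the running sum to its probability.
pmf = {0: 1.0}  # before any draws the sum is 0 with certainty
for _ in range(n):
    new_pmf = {}
    for s, prob in pmf.items():
        for k in range(1, 11):  # next draw is 1..10, each with probability 1/10
            new_pmf[s + k] = new_pmf.get(s + k, 0.0) + prob / 10
    pmf = new_pmf

def exact_cdf(limit):
    """Exact P{S <= limit} for the sum of n draws."""
    return sum(prob for s, prob in pmf.items() if s <= limit)

# Normal approximation for S, with no continuity correction
normal_s = NormalDist(mu=n * 5.5, sigma=sqrt(n * 8.25))

for limit in (120, 119):
    print(limit, exact_cdf(limit), normal_s.cdf(limit))
```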

RGV

Last edited: Apr 12, 2012
9. Apr 13, 2012

### Economics2012

Thank you so much :)

Can I ask why you use 120 here?

Last edited: Apr 13, 2012