Economics2012 said:
Homework Statement
Suppose you have a box of 1,000 items with the numbers 1-10 on them, 100 for each number.
This gives a discrete uniform probability distribution,
with probability 1/10 for each number.
Homework Equations
Mean = u = E(X)
The standard deviation is worked out from the variance.
Std dev = square root of the variance.
Variance: a table with 5 columns: x, x - u, (x - u)^2, p(x), and (x - u)^2 * p(x); the variance is the sum of the last column.
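For reference, here is a minimal Python sketch of the five-column table, assuming each number 1-10 has probability 100/1000 = 1/10. The names `values`, `p` and `rows` are just illustrative; exact fractions are used so the mean and variance come out as 11/2 and 33/4 (that is, 5.5 and 8.25).

```python
import math
from fractions import Fraction

# Values on the items and their probabilities: 100 of each of 1..10 out of 1,000.
values = range(1, 11)
p = {x: Fraction(100, 1000) for x in values}

# Mean: u = E(X) = sum of x * p(x)
u = sum(x * p[x] for x in values)

# The five columns: x, x - u, (x - u)^2, p(x), (x - u)^2 * p(x)
rows = [(x, x - u, (x - u) ** 2, p[x], (x - u) ** 2 * p[x]) for x in values]

variance = sum(row[4] for row in rows)  # sum of the last column
std_dev = math.sqrt(variance)

for x, d, d2, px, contrib in rows:
    print(f"{x:>2} {float(d):>5} {float(d2):>6} {float(px):>4} {float(contrib):>6.3f}")
print("mean =", float(u), "variance =", float(variance), "std dev =", round(std_dev, 3))
```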
The Attempt at a Solution
Taking 1/10 for each number, I got a mean of 5.5 and a standard deviation of 2.872 when I worked this out.
If I am then asked to construct a sampling distribution of the mean item number? with a sample size of 30? how would you do this? Is it just finding the mean and standard error, or are there a few steps?
What would the new mean and standard error be? Would it still be a mean of 5.5 and a standard error of 0.5244? (old std dev / square root of 30)
This is as much as I can work out? I'm basically stuck on constructing the sampling distribution of the mean item number?
If I posted this in the wrong place, can somebody tell me? I don't want a warning :)
Why do you keep using question marks at the ends of sentences that are not questions? (That is a really, really annoying bad habit.) Now on to your questions.
First you need to decide whether the sampling is "with replacement" or "without replacement".
In sampling with replacement we put each sampled item back into the box before drawing out the next item; and we shake up the box vigorously, or in some other way ensure randomness, before each drawing. This ensures that the drawings are independent of each other, and makes the subsequent analysis much easier.
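A small simulation sketch of sampling with replacement, assuming the box is modelled as a Python list of 1,000 labels; `box` and `sample_with_replacement` are hypothetical names. Python's `random.choices` draws each element independently and uniformly, which matches putting the item back and re-shaking before every draw.

```python
import random

# Hypothetical model of the box: 100 items labelled with each number 1..10.
box = [k for k in range(1, 11) for _ in range(100)]

def sample_with_replacement(box, n, rng=random):
    """Draw n items, returning each item to the box before the next draw."""
    # choices() picks every draw independently from the full box,
    # which is equivalent to replacing and re-shaking before each draw.
    return rng.choices(box, k=n)

rng = random.Random(2012)  # fixed seed so the run is repeatable
draws = sample_with_replacement(box, 30, rng)
print(draws, sum(draws) / len(draws))
```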
In sampling without replacement we draw out items one by one but do not put them back in the box. Therefore, the successive drawings are not independent because, for example, if I get the number '1' on my first draw there are now 999 items left in the box and 99 of them are labelled '1'. That changes the probabilities on the next draw, etc. The sampling problem without replacement is harder to deal with.
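The without-replacement case can be sketched the same way, again with a hypothetical list-based `box` and a hypothetical helper `sample_without_replacement`. `random.sample` never reuses a position in the list, and the exact fractions show how the probability of drawing a '1' changes from 100/1000 to 99/999 once a '1' has been removed.

```python
import random
from fractions import Fraction

# Hypothetical model of the box: 100 items labelled with each number 1..10.
box = [k for k in range(1, 11) for _ in range(100)]

def sample_without_replacement(box, n, rng=random):
    """Draw n distinct items from the box; nothing is put back."""
    return rng.sample(box, n)

# Probability of a '1' on the first draw, and on the second draw given
# that the first draw was a '1' (999 items left, 99 of them labelled '1').
p_first = Fraction(box.count(1), len(box))
p_second_given_first = Fraction(box.count(1) - 1, len(box) - 1)
print(p_first, p_second_given_first, float(p_second_given_first))

rng = random.Random(2012)  # fixed seed so the run is repeatable
draws = sample_without_replacement(box, 30, rng)
```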
So, let's take the case of sampling with replacement. For a sample of size n (n = 30 in your example) the mean number drawn is the sample average
$$M = \frac{X_1 + X_2 + \cdots + X_n}{n},$$
where $X_i$ is the number picked in the $i$th draw. Here the random variables $X_1, X_2, \ldots, X_n$ are independent and all have the same distribution $P\{X_i = k\} = 1/10$, $k = 1, 2, \ldots, 10$. There are some basic facts that you can find in books or online: (1) the expectation of a sum is the sum of the expectations; (2) the expectation of $cX$ is $c$ times the expectation of $X$; (3) the variance of a sum of independent random variables is the sum of the variances; and (4) the variance of $cX$ is $c^2$ times the variance of $X$. We have $E(X_i) = 5.5$ and $\text{Var}(X_i) = (10^2 - 1)/12 = 99/12 = 8.25$, so
$$E(M) = \frac{n \cdot 5.5}{n} = 5.5 \quad \text{and} \quad \text{Var}(M) = \frac{n \cdot 8.25}{n^2} = \frac{8.25}{n}.$$
For n = 30 the variance is 8.25/30 = 0.275 and the standard deviation (the standard error of the mean) is its square root, 0.5244.
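A sketch that applies facts (1)-(4) with exact fractions and then "conducts" the sampling distribution by simulation: it repeats a with-replacement sample of size 30 many times and looks at the spread of the sample means. The seed and the 20,000 repetitions are arbitrary choices for illustration; the simulated mean and standard deviation should land close to 5.5 and 0.5244, not exactly on them.

```python
import math
import random
import statistics
from fractions import Fraction

n = 30
mu = Fraction(11, 2)      # E(X_i) = 5.5
var_x = Fraction(99, 12)  # Var(X_i) = (10^2 - 1)/12 = 8.25

# Facts (1)-(4): E(M) = n*mu/n and Var(M) = n*var_x/n^2 = var_x/n
e_m = n * mu / n
var_m = n * var_x / n**2
se = math.sqrt(var_m)
print(float(e_m), float(var_m), round(se, 4))

# Simulated sampling distribution: draw many samples of size n
# (with replacement) and record each sample mean.
rng = random.Random(0)
means = [statistics.fmean(rng.choices(range(1, 11), k=n)) for _ in range(20_000)]
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```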
What about the distribution of M? The exact distribution can (for given n) be obtained numerically by recursive methods, but there is no really easy way of getting it. However, if n is 'large', say $n \ge 20$, the distribution of M is approximately normal with mean 5.5 and variance 8.25/n; the approximation will be good enough for practical purposes if we stay within 2-3 standard deviations of the mean. So, for example, if n = 30 and you want $P\{M \le 6.5\}$, you can use the fact that $6.5 = 5.5 + k(0.5244)$, where $k = 1/0.5244 \approx 1.9069$, so the required probability is approximately $P\{N(0,1) \le 1.9069\} = 0.9717$.
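One way (not necessarily the method RGV had in mind) to get the exact distribution numerically is to build the distribution of the sum $S = X_1 + \cdots + X_n$ one draw at a time by convolution, then read off $P\{M \le 6.5\} = P\{S \le 195\}$ for n = 30. The sketch below compares that with the normal approximation; `sum_distribution` and `normal_cdf` are hypothetical helper names. The two numbers should be close but need not agree exactly, since M is discrete.

```python
import math
from fractions import Fraction

def sum_distribution(n, faces=range(1, 11)):
    """Exact pmf of S = X_1 + ... + X_n, built recursively one draw at a time."""
    dist = {0: Fraction(1)}  # distribution of the empty sum
    p = Fraction(1, len(faces))
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for k in faces:
                new[s + k] = new.get(s + k, 0) + ps * p
        dist = new
    return dist

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n = 30
dist = sum_distribution(n)
# M <= 6.5 is the same event as S <= 6.5 * 30 = 195.
exact = float(sum(ps for s, ps in dist.items() if s <= 195))
se = math.sqrt(8.25 / n)
approx = normal_cdf((6.5 - 5.5) / se)
print(round(exact, 4), round(approx, 4))
```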
The so-called Central Limit Theorem guarantees that in the limit of large n, the appropriately normalized version of M (here $\sqrt{n}\,(M - 5.5)/\sqrt{8.25}$) converges in distribution to a standard normal, meaning that its distribution converges to that of $N(0,1)$. That justifies the use of a normal approximation for large n.
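A quick Monte Carlo illustration (not a proof) of the normalization above: it estimates $P\{Z_n \le 1\}$ for $Z_n = \sqrt{n}\,(M - 5.5)/\sqrt{8.25}$ at a few sample sizes and prints $P\{N(0,1) \le 1\} \approx 0.8413$ alongside. The helper names, seed and repetition count are assumptions for illustration. For n = 1 the exact value is 0.8, and the estimates should move toward 0.8413 as n grows, though not perfectly smoothly because M is discrete.

```python
import math
import random

def standardized_mean(sample, mu=5.5, var=8.25):
    """Z_n = sqrt(n) * (M - mu) / sigma for one sample."""
    n = len(sample)
    m = sum(sample) / n
    return math.sqrt(n) * (m - mu) / math.sqrt(var)

def fraction_below(z, n, reps=20_000, seed=1):
    """Monte Carlo estimate of P(Z_n <= z) for samples drawn with replacement."""
    rng = random.Random(seed)
    hits = sum(
        standardized_mean(rng.choices(range(1, 11), k=n)) <= z for _ in range(reps)
    )
    return hits / reps

phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))  # P(N(0,1) <= 1)
for n in (1, 5, 30):
    print(n, round(fraction_below(1.0, n), 4), round(phi_1, 4))
```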
Good luck working with the case of sampling without replacement; it can be done, but it is a lot more complicated and would require large amounts of computation to get numerical answers.
RGV