Sampling distribution of mean item number?

Economics2012 · Apr 12, 2012

Homework Statement

You have a box of 1,000 items with the numbers 1-10 on them, 100 items for each number.
So the item number follows a discrete uniform probability distribution,
with probability 1/10 for each number.

Homework Equations

Mean = \mu = E(X) = sum of x p(x)
St dev is worked out from the variance.
St dev = square root of the variance.
Variance: 5 columns - x, x - \mu, (x - \mu) squared, p(x), and (x - \mu) squared multiplied by p(x); the variance is the total of the last column.

The Attempt at a Solution

For the box above (1/10 for each number) I got a mean of 5.5 and a std dev of 2.872 when I worked this out.

If I am asked then to conduct a sampling distribution of the mean item number? with a sample size 30? How would you do this? Is it just finding the mean and std error, or are there a few steps?

What new mean and standard error would it be? Would it still be a mean of 5.5 and a std error of 0.5244? (old std dev divided by the square root of 30)

This is as much as I can work out? I'm stuck basically on the conduct of a sampling distribution of the mean item? If I posted this wrong can somebody tell me, I don't want a warning :)

HallsofIvy · Apr 12, 2012

"Central Limit Theorem": If you have n independent samples from any distribution with mean \mu and finite standard deviation \sigma, then the mean value of the samples is approximately normal with mean \mu and standard deviation \sigma/\sqrt{n}. The larger n is, the better the approximation.

Economics2012 · Apr 12, 2012

Sorry, I don't understand what you mean?

Ray Vickson · Apr 12, 2012

Why do you keep using question marks at the ends of sentences that are not questions? (That is a really, really annoying bad habit.) Now on to your questions.

First you need to decide whether the sampling is "with replacement" or "without replacement".

In sampling with replacement we put each sampled item back into the box before drawing out the next item, and we shake up the box vigorously, or in some other way ensure randomness, before each drawing. This ensures that the drawings are independent of each other, and makes the subsequent analysis much easier.

In sampling without replacement we draw out items one-by-one but do not put them back in the box. Therefore, the successive drawings are not independent, because, for example, if I get the number '1' on my first draw there are now 999 items left in the box and 99 of them are labelled '1'. That changes the probabilities on the next draw, etc. The sampling problem without replacement is harder to deal with.
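To get a feel for how much the two schemes differ for this box, here is a minimal simulation sketch. The seed, the number of trials, and the variable names are illustrative choices, not part of the problem. As a reference point it also uses the standard finite-population correction for the standard error without replacement, \sigma/\sqrt{n} \cdot \sqrt{(N-n)/(N-1)}, which is a textbook result rather than something derived in this thread.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(2012)  # arbitrary seed, only so the run is repeatable

# The box: 100 items for each of the numbers 1..10 (N = 1000 items).
box = [k for k in range(1, 11) for _ in range(100)]
N, n, trials = len(box), 30, 20000

# Sample means when each drawn item is put back before the next draw.
means_with = [sum(random.choices(box, k=n)) / n for _ in range(trials)]
# Sample means when drawn items are NOT put back.
means_without = [sum(random.sample(box, n)) / n for _ in range(trials)]

sd_with = pstdev(means_with)
sd_without = pstdev(means_without)

# Theoretical standard errors for comparison.
sigma = sqrt(8.25)
se_with = sigma / sqrt(n)
se_without = se_with * sqrt((N - n) / (N - 1))  # finite-population correction

print("with replacement:    simulated", round(sd_with, 4), "theory", round(se_with, 4))
print("without replacement: simulated", round(sd_without, 4), "theory", round(se_without, 4))
```

With only 30 draws out of 1,000 items the correction factor is close to 1, so the two standard errors should come out nearly equal; the harder part of the without-replacement case is the exact distribution, not the standard error.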

So, let's take the case of sampling with replacement. For a sample of size n (n = 30 in your example) the mean number drawn is the sample average
M = \frac{X_1 + X_2 + \cdots + X_n}{n}, where X_i is the number picked in the ith draw. Here the random variables X_1, X_2, \ldots, X_n are independent and all have the same distribution P\{X_i = k \} = 1/10, \; k = 1, 2, \ldots, 10. There are some basic facts that you can find in books or on-line: (1) the expectation of a sum is the sum of the expectations; (2) the expectation of cX is c times the expectation of X; (3) the variance of a sum of independent random variables is the sum of the variances; and (4) the variance of cX is c^2 times the variance of X. We have E(X_i) = 5.5 \text{ and } \text{Var}(X_i) = 33/4 = 8.25, so
E(M) = \frac{n \cdot 5.5}{n} = 5.5 \text{ and } \text{Var}(M) = \frac{n \cdot 8.25}{n^2} = \frac{8.25}{n}. For n = 30 the variance is 8.25/30 = 0.275 and the standard deviation is the square root of this, which is 0.5244.
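The arithmetic above can be checked with a short sketch. It uses exact fractions so that 33/4 and 8.25/30 appear without rounding; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

values = range(1, 11)   # the numbers on the items
p = Fraction(1, 10)     # P{X_i = k} for each k

# Population mean and variance of a single draw.
mu = sum(x * p for x in values)
var = sum((x - mu) ** 2 * p for x in values)

# Mean and variance of the sample average M for n independent draws.
n = 30
mean_M = mu             # E(M) = n*mu/n
var_M = var / n         # Var(M) = n*var/n^2
se = sqrt(var_M)        # standard deviation of M (the standard error)

print(mu, var, var_M, round(se, 4))  # prints: 11/2 33/4 11/40 0.5244
```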

What about the distribution of M? The exact distribution can (for given n) be obtained numerically by recursive methods, but there is no really easy way of getting it. However, if n is 'large', say >= 20, the distribution of M is approximately normal with mean 5.5 and variance 8.25/n; the approximation will be good enough for practical purposes if we stay within 2-3 standard deviations of the mean. So, for example, if n = 30 and you want P\{ M \leq 6.5 \}, you can use the fact that 6.5 = 5.5 + k(0.5244), where k = 1/0.5244 ~ 1.9069, so the required probability is approximately P{N(0,1) <= 1.9069} = 0.9717.
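As a rough check of this normal-approximation step, the sketch below computes the z-value and the standard normal probability with the error function from Python's standard library instead of a printed table. The helper name `normal_cdf` is an illustrative choice.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, P{N(0,1) <= z}, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, var, n = 5.5, 8.25, 30
se = sqrt(var / n)        # standard deviation of M, about 0.5244

z = (6.5 - mu) / se       # how many standard deviations 6.5 lies above the mean
prob = normal_cdf(z)      # normal approximation to P{M <= 6.5}

print(round(z, 4), round(prob, 4))
```

The same helper works for any cutoff: replace 6.5 by the value of interest and keep the sign of z.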

The so-called Central Limit Theorem guarantees that in the limit of large n, the appropriately-normalized version of M (\sqrt{n}(M - 5.5)/\sqrt{8.25} in this case) converges in distribution to a standard normal, meaning that its distribution converges to that of N(0,1). That justifies the use of a normal approximation for large n.

Good luck working with the case of sampling without replacement; it can be done, but it is a lot more complicated and would require large amounts of computation to get numerical answers.

RGV

Economics2012 · Apr 12, 2012

Thank you very much, you're too good.

Sorry about the question marks.

Can I ask you one more thing?

What would be the probability of drawing 30 tiles and obtaining a mean tile number of less than 4?

I keep getting .9332 by using the tables of the normal distribution, I know that's probably wrong though :/

Ray Vickson · Apr 12, 2012

Show your work; otherwise it is impossible for me to tell you what you have done wrong.

RGV

Economics2012 · Apr 12, 2012

I knew it had to do with sampling, so I just took 4 from 5.5, got 1.5, and looked that up in the table at http://www.cs.washington.edu/homes/jrl/normal_cdf.pdf. That's how I got it, but I think that's more than likely completely incorrect.
P(M>4), that's what I thought you did. Am I way off?

Ray Vickson · Apr 12, 2012

I did not "do" anything like saying P(M>4); YOU did that. Anyway, you need P{M < 4}, and using the Normal approximation we need the probability that Z < z, where z = (4 - 5.5)/(0.5244) = -2.8604; that is, P{Z < -2.8604}. The normal approximation might be dicey in this case because we are right near or a bit beyond the limit of where the normal approximation is trustworthy when n is not very, very large.

Before calculating anything, it is always a good idea to get a "feel" for the range of an answer. In this case you want P{M < 4} and 4 is less than the population mean 5.5. Therefore, the probability of falling below 4 will be < 1/2. That should be enough to warn you that an answer like 0.9332 cannot possibly be right.

Note added in editing: we can compute the exact answer and the normal approximation and compare them. For n = 30 and M = S/30 <= 4 we have the sum S <= 120; for M < 4 we have S < 120, so S <= 119 (because the values of the sum S are integers). I am not sure whether or not you want M <= 4 or M < 4; it makes a difference in this case.

P_{exact}\{ S \leq 120 \} = 0.002155197756, \; P_{normal}\{ S \leq 120 \} = 0.002115616446
and
P_{exact}\{ S \leq 119 \} = 0.001746718891, \; P_{normal} \{ S \leq 119 \} = 0.001728090515.
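For anyone who wants to reproduce this kind of comparison, here is a minimal sketch under the with-replacement assumption. It builds the exact distribution of the sum S by repeated convolution (one draw at a time) and evaluates the plain normal approximation at the same cutoffs, with no continuity correction. The function names are illustrative, and this is not necessarily the method used to obtain the figures above.

```python
from math import erf, sqrt

def sum_pmf(n, faces=10):
    """Exact pmf of the sum of n independent draws, each uniform on 1..faces."""
    pmf = {0: 1.0}                       # before any draw the sum is 0
    for _ in range(n):
        nxt = {}
        for s, pr in pmf.items():
            for k in range(1, faces + 1):
                nxt[s + k] = nxt.get(s + k, 0.0) + pr / faces
        pmf = nxt
    return pmf

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n = 30
pmf = sum_pmf(n)
mean_S = 5.5 * n                         # E(S) = 165
sd_S = sqrt(8.25 * n)                    # standard deviation of S

results = {}
for cutoff in (120, 119):
    exact = sum(pr for s, pr in pmf.items() if s <= cutoff)
    approx = normal_cdf((cutoff - mean_S) / sd_S)
    results[cutoff] = (exact, approx)
    print(f"P{{S <= {cutoff}}}: exact {exact:.6f}, normal {approx:.6f}")
```

Because S only takes integer values, the event M < 4 corresponds to S <= 119 and the event M <= 4 to S <= 120, which is why both cutoffs are evaluated.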

RGV

Economics2012 · Apr 13, 2012

Thank you so much :)

Can I ask why you use 120 here?
