Sampling distribution of mean item number?

AI Thread Summary
The discussion focuses on calculating the sampling distribution of the mean from a box of 1,000 items, each labeled from 1 to 10, demonstrating a discrete uniform probability distribution. The mean is established at 5.5, with a standard deviation of 2.872, leading to a standard error of 0.5244 for a sample size of 30. It emphasizes the importance of determining whether sampling is with or without replacement, as this affects the independence of samples and subsequent calculations. The Central Limit Theorem is referenced, indicating that the distribution of sample means approaches normality with larger sample sizes. The conversation also touches on calculating probabilities related to the sample mean, highlighting the need for careful interpretation of results.
Economics2012

Homework Statement



You have a box of 1,000 items numbered 1 to 10, with 100 items for each number.
So a randomly drawn item follows a discrete uniform probability distribution,
with probability 1/10 for each number.

Homework Equations


Mean = μ = E(X) = Σ x·p(x)
The standard deviation is worked out from the variance:
St dev = square root of the variance.
Variance: a table with 5 columns, x, x - μ, (x - μ)², p(x) and (x - μ)²·p(x); the variance is the sum of the last column.
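The five-column table can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the uniform p(x) = 1/10 stated above; exact fractions are used so the mean and variance come out without rounding.

```python
from fractions import Fraction

# Discrete uniform distribution on the item numbers 1..10, p(x) = 1/10 each.
values = range(1, 11)
p = Fraction(1, 10)

# Mean: mu = E(X) = sum of x * p(x).
mu = sum(x * p for x in values)

# The five columns of the table: x, x - mu, (x - mu)^2, p(x), (x - mu)^2 * p(x).
rows = [(x, x - mu, (x - mu) ** 2, p, (x - mu) ** 2 * p) for x in values]

# The variance is the sum of the last column; the std dev is its square root.
variance = sum(row[4] for row in rows)
std_dev = float(variance) ** 0.5

print("x  x-mu  (x-mu)^2  p(x)  (x-mu)^2*p(x)")
for x, d, d2, px, term in rows:
    print(x, float(d), float(d2), float(px), float(term))
print("mean =", float(mu), " variance =", float(variance), " std dev =", round(std_dev, 3))
```

It should report a mean of 5.5, a variance of 8.25 (that is, 33/4) and a standard deviation of about 2.872.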

The Attempt at a Solution


I got a mean of 5.5 and a std dev of 2.872 when I worked this out.

If I am then asked to find the sampling distribution of the mean item number? with a sample size of 30? how would you do this? Is it just finding the mean and standard error, or are there a few more steps?

What would the new mean and standard error be? Would it still be a mean of 5.5 and a standard error of 0.5244? (old std dev / square root of 30)

This is as much as I can work out? I'm basically stuck on finding the sampling distribution of the mean item? If I posted this wrong, can somebody tell me? I don't want a warning :)
 
"Central Limit Theorem": If you have n independent samples from any distribution with mean \mu and standard deviation \sigma, then the mean value of the samples is approximately normal with mean \mu and standard deviation \sigma/\sqrt{n}. The larger n is, the better the approximation.
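A quick simulation makes this concrete. The sketch below assumes sampling with replacement and uses Python's standard `random` and `statistics` modules; the helper name `simulate_sample_means`, the seed and the trial count are illustrative choices, not part of the thread. It draws many samples of size 30 and checks that the sample means centre on \mu and spread out by about \sigma/\sqrt{n}.

```python
import random
import statistics

# Hypothetical helper: draw many samples of size n WITH replacement from the
# box (100 items each of 1..10) and return the list of sample means.
def simulate_sample_means(n=30, trials=20_000, seed=1):
    rng = random.Random(seed)
    box = [k for k in range(1, 11) for _ in range(100)]  # 1,000 items
    means = []
    for _ in range(trials):
        sample = rng.choices(box, k=n)  # with replacement
        means.append(sum(sample) / n)
    return means

means = simulate_sample_means()
sigma = statistics.pstdev(range(1, 11))  # population std dev, about 2.872
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):", round(sigma / 30 ** 0.5, 4))
```

A histogram of `means` would look roughly bell-shaped; the two summary numbers are enough to check the mean and standard deviation the theorem predicts.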
 
Sorry, I don't understand what you mean?
 
Why do you keep using question marks at the ends of sentences that are not questions? (That is a really, really annoying bad habit.) Now on to your questions.

First you need to decide whether the sampling is "with replacement" or "without replacement".

In sampling with replacement we put each sampled item back into the box before drawing out the next item; and we shake up the box vigorously, or in some other way ensure randomness, before each drawing. This ensures that the drawings are independent of each other, and makes the subsequent analysis much easier.

In sampling without replacement we draw out items one by one but do not put them back in the box. Therefore, the successive drawings are not independent because, for example, if I get the number '1' on my first draw there are now 999 items left in the box and 99 of them are labelled '1'. That changes the probabilities on the next draw, etc. The sampling problem without replacement is harder to deal with.

So, let's take the case of sampling with replacement. For a sample of size n (n = 30 in your example) the mean number drawn is the sample average
M = \frac{X_1 + X_2 + \cdots + X_n}{n},
where X_i is the number picked in the ith draw. Here the random variables X_1, X_2, \ldots, X_n are independent and all have the same distribution P\{X_i = k \} = 1/10, \; k = 1, 2, \ldots, 10. There are some basic facts that you can find in books or online: (1) the expectation of a sum is the sum of the expectations; (2) the expectation of cX is c times the expectation of X; (3) the variance of a sum of independent random variables is the sum of the variances; and (4) the variance of cX is c^2 times the variance of X. We have E(X_i) = 5.5 \text{ and } \text{Var}(X_i) = (10^2 - 1)/12 = 33/4 = 8.25, so
E(M) = \frac{n \cdot 5.5}{n} = 5.5 \text{ and } \text{Var}(M) = \frac{n \cdot 8.25}{n^2} = \frac{8.25}{n}. For n = 30 the variance is 8.25/30 = 0.275 and the standard deviation is the square root of this, which is 0.5244.
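The four rules above turn into two lines of arithmetic. This is a minimal sketch, assuming independent draws with the population moments E(X_i) = 11/2 and Var(X_i) = 33/4; exact fractions keep the steps visible.

```python
from fractions import Fraction

n = 30
# Population moments for one draw X_i (discrete uniform on 1..10).
mean_x = Fraction(11, 2)   # E(X_i) = 5.5
var_x = Fraction(33, 4)    # Var(X_i) = 8.25

# M = (X_1 + ... + X_n) / n:
# rule (1) gives E(sum) = n * E(X_i); rule (2) with c = 1/n divides by n.
mean_m = n * mean_x / n
# Rule (3) (independence) gives Var(sum) = n * Var(X_i); rule (4) divides by n^2.
var_m = n * var_x / n ** 2
se_m = float(var_m) ** 0.5

print(float(mean_m), float(var_m), round(se_m, 4))
```

This gives E(M) = 5.5, Var(M) = 11/40 = 0.275 and a standard deviation (standard error) of about 0.5244.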

What about the distribution of M? The exact distribution can (for given n) be obtained numerically by recursive methods, but there is no really easy way of getting it. However, if n is 'large', say n \geq 20, the distribution of M is approximately normal with mean 5.5 and variance 8.25/n; the approximation will be good enough for practical purposes if we stay within 2-3 standard deviations of the mean. So, for example, if n = 30 and you want P\{ M \leq 6.5 \}, you can use the fact that 6.5 = 5.5 + k(0.5244), where k = 1/0.5244 \approx 1.9069, so the required probability is approximately P\{ N(0,1) \leq 1.9069 \} = 0.9717.
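The normal-approximation arithmetic in this example can be reproduced with the standard library instead of a printed table. A minimal sketch; `normal_cdf` is a hypothetical helper built on `math.erfc`.

```python
import math

def normal_cdf(z):
    # Standard normal CDF Phi(z), written via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))

mu, n = 5.5, 30
se = math.sqrt(8.25 / n)           # standard deviation of M, about 0.5244
k = (6.5 - mu) / se                # number of standard deviations, about 1.9069
p_approx = normal_cdf(k)           # normal approximation to P{M <= 6.5}
print(round(k, 4), round(p_approx, 4))
```

The printed values should agree with k of about 1.9069 and a probability of about 0.9717.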

The so-called Central Limit Theorem guarantees that in the limit of large n, the appropriately normalized version of M (here \sqrt{n}(M - 5.5)/\sqrt{8.25}, that is, M minus its mean divided by its standard deviation) converges in distribution to a standard normal, meaning that its distribution converges to that of N(0,1). That justifies the use of a normal approximation for large n.

Good luck working with the case of sampling without replacement; it can be done, but it is a lot more complicated and would require large amounts of computation to get numerical answers.
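If exact answers are too costly, Monte Carlo simulation is one way (a swapped-in technique, not the exact method described above) to get approximate numbers for sampling without replacement. The sketch below is a hypothetical helper built on Python's `random.choices` (with replacement) and `random.sample` (without replacement, so no item is drawn twice); the seed and trial count are arbitrary.

```python
import random
import statistics

def sample_means(n=30, trials=20_000, replace=True, seed=2):
    rng = random.Random(seed)
    box = [k for k in range(1, 11) for _ in range(100)]  # 100 items of each number
    means = []
    for _ in range(trials):
        if replace:
            sample = rng.choices(box, k=n)   # item returned before the next draw
        else:
            sample = rng.sample(box, n)      # n distinct items, none returned
        means.append(sum(sample) / n)
    return means

with_rep = sample_means(replace=True)
without_rep = sample_means(replace=False)
print("with replacement:   ", round(statistics.stdev(with_rep), 4))
print("without replacement:", round(statistics.stdev(without_rep), 4))
```

Both runs should give sample means centred near 5.5. For comparison, the standard finite-population correction predicts a standard deviation of about 0.5244 × sqrt(970/999) ≈ 0.517 without replacement, only slightly below 0.5244, because 30 items is a small fraction of 1,000. Rare events such as M < 4 need many more trials for a stable estimate.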

RGV
 
Thank you very much, you're too good.

Sorry about the question marks.

Can I ask you one more thing?

What would be the probability of drawing 30 tiles and obtaining a mean tile number of less than 4?

I keep getting .9332 by using the tables of the normal distribution, I know that's probably wrong though :/
 
Show your work; otherwise it is impossible for me to tell you what you have done wrong.

RGV
 
I knew it had to do with sampling, so I just took 4 from 5.5, got 1.5, and looked 1.5 up in the normal table at http://www.cs.washington.edu/homes/jrl/normal_cdf.pdf; that's how I got 0.9332. I think that's more than likely completely incorrect.
P(M>4), that's what I thought you did. Am I way off?
 
I did not "do" anything like saying P(M>4); YOU did that. Anyway, you need P{M < 4}, and using the Normal approximation we need the probability that Z < z, where z = (4 - 5.5)/(0.5244) = -2.8604; that is, P{Z < -2.8604}. The normal approximation might be dicey in this case because we are right near or a bit beyond the limit of where the normal approximation is trustworthy when n is not very, very large.
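The z-score step can be sketched the same way, again with a hypothetical `normal_cdf` helper built on `math.erfc` and the standard error computed from 8.25/30 rather than the rounded 0.5244.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))

mu, se = 5.5, math.sqrt(8.25 / 30)   # mean and standard error of M for n = 30
z = (4 - mu) / se                     # about -2.8604
p_below_4 = normal_cdf(z)             # normal approximation to P{M < 4}
print(round(z, 4), p_below_4)
```

The result is roughly 0.0021, far below 1/2, as it must be for a cutoff below the mean. Looking up +1.5 instead of the z-score gives 0.9332, the value reported above.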

Before calculating anything, it is always a good idea to get a "feel" for the range of an answer. In this case you want P{M < 4} and 4 is less than the population mean 5.5. Therefore, the probability of falling below 4 will be < 1/2. That should be enough to warn you that an answer like 0.9332 cannot possibly be right.

Note added in editing: we can compute the exact answer and the normal approximation and compare them. For n = 30 and M = S/30 <= 4 we have the sum S <= 120; for M < 4 we have S < 120, so S <= 119 (because the values of the sum S are integers). I am not sure whether you want M <= 4 or M < 4; it makes a difference in this case.

P_{exact}\{ S \leq 120 \} = 0.002155197756, \; P_{normal}\{ S \leq 120 \} = 0.002115616446
and
P_{exact}\{ S \leq 119 \} = 0.001746718891, \; P_{normal} \{ S \leq 119 \} = 0.001728090515.
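One way to reproduce these numbers is the recursive method mentioned earlier in the thread: build the distribution of the integer sum S one draw at a time (repeated convolution), assuming sampling with replacement so all 10^30 ordered outcomes are equally likely. The function names below are hypothetical. The normal column evaluates Φ((s - 165)/√247.5) directly at the cutoff, without a continuity correction.

```python
import math

def sum_counts(n, faces=range(1, 11)):
    # Hypothetical helper: counts[s] = number of ordered outcomes (x_1, ..., x_n),
    # each x_i in `faces`, whose sum is s. Built one draw at a time
    # (repeated convolution) with exact integer arithmetic.
    counts = {0: 1}
    for _ in range(n):
        new = {}
        for s, c in counts.items():
            for x in faces:
                new[s + x] = new.get(s + x, 0) + c
        counts = new
    return counts

def normal_cdf(z):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))

n = 30
counts = sum_counts(n)
total = 10 ** n                                 # all outcomes equally likely
mean_s, sd_s = 5.5 * n, math.sqrt(8.25 * n)     # 165 and sqrt(247.5)

results = {}
for cutoff in (120, 119):   # M <= 4 <=> S <= 120;  M < 4 <=> S <= 119
    exact = sum(c for s, c in counts.items() if s <= cutoff) / total
    approx = normal_cdf((cutoff - mean_s) / sd_s)   # no continuity correction
    results[cutoff] = (exact, approx)
    print(cutoff, f"{exact:.12f}", f"{approx:.12f}")
```

Python integers have arbitrary precision, so the counts are exact. The only rounding happens in the final division to a float.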


RGV
 
Thank you so much :)

Can I ask why you use 120 here?
 