Estimating the Mean Christmas Spending: A Comparison of Two Sampling Methods

  • MHB
  • Thread starter mathmari
  • Start date
In summary, two employees, A and B, discuss the best procedure for estimating the average amount of money adults spend on Christmas presents. A suggests questioning a random sample of people and using their average spend as the estimate, while B proposes separately surveying samples of employed and unemployed people and combining the two group averages, weighted by the known shares of each group. Both estimators are unbiased, but B's estimator has the smaller variance and is therefore considered the better estimate.
  • #1
mathmari
Hey! :eek:

The variable $ Y $ denotes the amount of money that an adult person gives out for Christmas presents.
The distribution of $ Y $ depends on whether the person is employed ($ E = 1 $) or not ($ E = 0 $).
It holds that $ P (E = 1) = p $, i.e., a randomly selected person is employed with probability $ p $.

We have the following:
\begin{align*}&E(Y\mid E=1)=\mu_1 \\ &V(Y\mid E=1)=\sigma_1^2 \\ &E(Y\mid E=0)=\mu_0 \\ &V(Y\mid E=0)=\sigma_0^2 \\ &E(Y)=\mu=p\mu_1+(1-p)\mu_0 \\ &V(Y)=\sigma^2=p\sigma_1^2+(1-p)\sigma_0^2+G\end{align*}
where \begin{equation*}G=p(\mu_1-\mu)^2+(1-p)(\mu_0-\mu)^2\geq 0\end{equation*}

A research institute would like to estimate $\mu $ based on a $ n $-sized sample. The parameter $ p $ is known to the institute. Two employees of the institute, A and B, discuss the procedure.

  • A suggests questioning $n$ randomly selected people and using their average spend as an estimate for $\mu $.
  • B proposes separately surveying $ n p $ employed persons and $ n (1 - p) $ unemployed persons, and then using the estimator \begin{equation*} \overline{Y}_B = p \overline{Y}_1 + (1-p) \overline{Y}_0 \end{equation*} where $ \overline {Y}_1 $ and $ \overline{Y}_0 $ are the average spend of the employed and unemployed persons, respectively. For the sake of simplicity, we assume that $ n p $ and $ n (1 - p) $ are integers.
If I understand B's proposal correctly, we have a sample of size $n$ with $np$ employed and $n(1-p)$ unemployed persons. $Y_{1i}$ is the answer that employed person $i$ gives and $Y_{0i}$ is the answer that unemployed person $i$ gives. We calculate the mean of what the employed people spend, according to the survey, and denote that average by $\overline{Y}_1$, i.e. $ \overline{Y}_1=\frac{1}{np}\sum_{i=1}^{np}Y_{1i}$. Similarly, $ \overline{Y}_0=\frac{1}{n(1-p)}\sum_{i=1}^{n(1-p)}Y_{0i}$.
Adding these two averages, each multiplied by the respective probability, we get the estimate of the average over all people.

Have I understood that correctly? (Wondering) I want to calculate the expected values and the variances of the estimates of A and B.

How could we do that? Could you give me a hint? (Wondering)
 
  • #2
Hey mathmari! (Wave)

For A we have $E(\overline {Y_A}) = E(Y)$ and $\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n}$ don't we?
And for B we have $E(\overline {Y_B}) = E(Y)$ and $\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}$ don't we? (Wondering)
 
  • #3
I like Serena said:
For A we have $E(\overline {Y_A}) = E(Y)$ and $\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n}$ don't we?
And for B we have $E(\overline {Y_B}) = E(Y)$ and $\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}$ don't we? (Wondering)

Does it hold that $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ because $\overline {Y_A}$ and $\overline {Y_B}$ each describe the average amount of money?

Why do we not use here that $\overline{Y}_B=p \overline{Y_1} + (1-p) \overline{Y_0}$ ? From $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ we get that both estimators are unbiased, right? (Wondering) To check which estimate is better we have to compare the two variances, right? The variance of the estimate A is equal to the variance of the sample mean of the amounts of money. Does this mean that it is better than the variance of the estimate B?
Or can we not compare them? (Wondering)
 
  • #4
mathmari said:
Does it hold that $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ because $\overline {Y_A}$ and $\overline {Y_B}$ each describe the average amount of money?

Why do we not use here that $\overline{Y}_B=p \overline{Y_1} + (1-p) \overline{Y_0}$ ?

It follows mathematically.
Let's go through the steps for B, indeed using the formula for $\overline{Y_B}$.
$$
E(\overline{Y_B})=E\left(p\overline{Y_1}+(1-p)\overline{Y_0}\right)
=pE(\overline{Y_1})+(1-p)E(\overline{Y_0})
=p\mu_1 + (1-p)\mu_0
=\mu
= E(Y)
$$
Yes? (Thinking)
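
A side note on the second and third equalities, as a sketch that assumes the $np$ employed respondents are drawn at random from the employed population (so each $Y_{1i}$ has mean $\mu_1$), and likewise for the unemployed. Linearity of expectation then gives
\begin{align*}
E(\overline{Y_1}) &= \frac{1}{np}\sum_{i=1}^{np}E(Y_{1i}) = \frac{1}{np}\cdot np\,\mu_1 = \mu_1, \\
E(\overline{Y_0}) &= \frac{1}{n(1-p)}\sum_{i=1}^{n(1-p)}E(Y_{0i}) = \mu_0.
\end{align*}
The same argument applied to A's sample of $n$ randomly selected people, with answers $Y_1,\dots,Y_n$, gives $E(\overline{Y_A}) = \frac{1}{n}\sum_{i=1}^{n}E(Y_i) = \mu$.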

mathmari said:
From $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ we get that both estimators are unbiased, right?

Yep.

mathmari said:
To check which estimate is better we have to compare the two variances, right? The variance of the estimate A is equal to the variance of the sample mean of the amounts of money. Does this mean that it is better than the variance of the estimate B?
Or can we not compare them? (Wondering)

Let's compare them.

$$\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n} = \frac{p\sigma_1^2+(1-p)\sigma_0^2+G}{n} \\
\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}
=\frac{p\sigma_1^2 + (1-p)\sigma_0^2}{n}
$$
So $\sigma^2(\overline {Y_B})$ is smaller than $\sigma^2(\overline {Y_A})$ isn't it? (Wondering)
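
To spell out the variance step for B, here is a sketch under two assumptions that the thread leaves implicit: the answers within each group are independent, and the two subsamples are drawn independently of each other. Under those assumptions the variances of the group means and of their weighted sum are
\begin{align*}
\sigma^2(\overline{Y_1}) &= \frac{1}{(np)^2}\sum_{i=1}^{np}\sigma^2(Y_{1i}) = \frac{\sigma_1^2}{np}, \qquad
\sigma^2(\overline{Y_0}) = \frac{\sigma_0^2}{n(1-p)}, \\
\sigma^2\left(p\overline{Y_1}+(1-p)\overline{Y_0}\right) &= p^2\,\sigma^2(\overline{Y_1}) + (1-p)^2\,\sigma^2(\overline{Y_0}), \\
\sigma^2(\overline{Y_A}) - \sigma^2(\overline{Y_B}) &= \frac{G}{n} \geq 0.
\end{align*}
So the gap between the two variances is exactly $G/n$, and they coincide only when $G = 0$.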

And if the standard deviations $\sigma_1$ and $\sigma_0$ are comparable to or smaller than $|\mu_1 - \mu_0|$, then the standard deviation of B will be much smaller than that of A.
That's assuming that both $p$ and $1-p$ are significantly greater than 0.
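
The role of $|\mu_1 - \mu_0|$ and of $p$ can be made explicit by substituting $\mu = p\mu_1 + (1-p)\mu_0$ into the definition of $G$:
\begin{align*}
\mu_1 - \mu &= (1-p)(\mu_1-\mu_0), \qquad \mu_0 - \mu = -p\,(\mu_1-\mu_0), \\
G &= p(1-p)^2(\mu_1-\mu_0)^2 + (1-p)\,p^2(\mu_1-\mu_0)^2 = p(1-p)(\mu_1-\mu_0)^2.
\end{align*}
As a purely illustrative example with made-up values that are not part of the problem, take $p = 0.6$, $\mu_1 = 500$, $\mu_0 = 200$, $\sigma_1 = 50$, $\sigma_0 = 30$. Then $p\sigma_1^2 + (1-p)\sigma_0^2 = 1860$ and $G = 0.6 \cdot 0.4 \cdot 300^2 = 21600$, so $\sigma^2(\overline{Y_A})/\sigma^2(\overline{Y_B}) = 23460/1860 \approx 12.6$, i.e. the standard deviation of A is about $3.6$ times that of B.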
 
  • #5
I like Serena said:
It follows mathematically.
Let's go through the steps for B, indeed using the formula for $\overline{Y_B}$.
$$
E(\overline{Y_B})=E\left(p\overline{Y_1}+(1-p)\overline{Y_0}\right)
=pE(\overline{Y_1})+(1-p)E(\overline{Y_0})
=p\mu_1 + (1-p)\mu_0
=\mu
= E(Y)
$$

Ah ok!
I like Serena said:
Let's compare them.

$$\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n} = \frac{p\sigma_1^2+(1-p)\sigma_0^2+G}{n} \\
\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}
=\frac{p\sigma_1^2 + (1-p)\sigma_0^2}{n}
$$
So $\sigma^2(\overline {Y_B})$ is smaller than $\sigma^2(\overline {Y_A})$ isn't it? (Wondering)

And if the standard deviations $\sigma_1$ and $\sigma_0$ are comparable to or smaller than $|\mu_1 - \mu_0|$, then the standard deviation of B will be much smaller than that of A.
That's assuming that both $p$ and $1-p$ are significantly greater than 0.
So, since the variance of A is bigger than that of B, A's estimates will fluctuate more than B's. Thus, the estimate of B is better, isn't it? (Wondering)
 
  • #6
mathmari said:
Ah ok!

So, since the variance of A is bigger than that of B, A's estimates will fluctuate more than B's. Thus, the estimate of B is better, isn't it?

Yep. (Nod)
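
For anyone who wants to see this numerically, here is a minimal simulation sketch. It is not part of the original problem: the parameter values, the normal distribution within each group, and the function names `estimate_a`, `estimate_b`, `theoretical_variances` and `simulate` are all assumptions made for illustration. It repeats both sampling schemes many times and compares the empirical variances of the two estimators with the formulas from the thread.

```python
import random
import statistics


def theoretical_variances(p, mu1, mu0, sigma1, sigma0, n):
    """Return (Var of A's estimator, Var of B's estimator) from the thread's formulas."""
    mu = p * mu1 + (1 - p) * mu0
    within = p * sigma1**2 + (1 - p) * sigma0**2
    g = p * (mu1 - mu) ** 2 + (1 - p) * (mu0 - mu) ** 2
    return (within + g) / n, within / n


def estimate_a(rng, p, mu1, mu0, sigma1, sigma0, n):
    """Scheme A: ask n randomly selected people and average their answers."""
    total = 0.0
    for _ in range(n):
        employed = rng.random() < p  # employed with probability p
        # Assumption: spending is normally distributed within each group.
        total += rng.gauss(mu1, sigma1) if employed else rng.gauss(mu0, sigma0)
    return total / n


def estimate_b(rng, p, mu1, mu0, sigma1, sigma0, n):
    """Scheme B: ask n*p employed and n*(1-p) unemployed people, then weight."""
    n1 = round(n * p)  # n*p is assumed to be an integer, as in the problem
    n0 = n - n1
    mean1 = sum(rng.gauss(mu1, sigma1) for _ in range(n1)) / n1
    mean0 = sum(rng.gauss(mu0, sigma0) for _ in range(n0)) / n0
    return p * mean1 + (1 - p) * mean0


def simulate(reps=2000, seed=0):
    """Repeat both schemes `reps` times and summarise the two estimators."""
    rng = random.Random(seed)
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(p=0.6, mu1=500.0, mu0=200.0, sigma1=50.0, sigma0=30.0, n=50)
    a = [estimate_a(rng, **params) for _ in range(reps)]
    b = [estimate_b(rng, **params) for _ in range(reps)]
    return {
        "mean_a": statistics.fmean(a),
        "var_a": statistics.variance(a),
        "mean_b": statistics.fmean(b),
        "var_b": statistics.variance(b),
        "theory": theoretical_variances(**params),
    }


if __name__ == "__main__":
    print(simulate())
```

With these made-up values both empirical means should land near $\mu = 380$, while the empirical variance of A's estimator should come out far larger than that of B's, in line with the extra $G/n$ term.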
 
  • #7
I like Serena said:
Yep. (Nod)

Ok! Thank you very much! (Yes)
 

What is an expected value?

An expected value is the average outcome of a random variable over a large number of trials. It is calculated by multiplying each possible outcome by its probability and then summing all of these values together.

What is a variance?

A variance measures how spread out the values of a random variable are around its expected value. It is the expected squared deviation from the expected value: take the difference between each possible value and the expected value, square it, and average these squares weighted by their probabilities. A higher variance indicates a larger spread of values, while a lower variance indicates a more clustered set of values.

How do expected values and variances relate to each other?

The expected value and variance are closely related: the expected value describes the central tendency of a random variable, while the variance describes how strongly the outcomes vary around that central value. The variance is itself defined through expected values, since $V(Y) = E\left((Y - E(Y))^2\right) = E(Y^2) - (E(Y))^2$.

What is the significance of expected values and variances in statistics?

Expected values and variances are important in statistics because they help to describe and summarize a data set. They provide information about the central tendency and spread of the data, which can be used to make predictions and draw conclusions about a population. They are also used in many statistical tests and models to analyze data and make inferences.

How can expected values and variances be calculated?

Expected values and variances can be calculated using mathematical formulas, depending on the type of data and distribution being analyzed. For example, the expected value of a discrete random variable is calculated by multiplying each possible outcome by its probability and summing the results, while the variance is the sum of the squared deviations from the expected value, each weighted by its probability. For continuous variables, the sums are replaced by integrals over the probability density.
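
The discrete case can be sketched in a few lines of Python. This is only an illustration: the helper names `expected_value` and `variance` and the fair-die example are assumptions, not something taken from the thread. Exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction


def expected_value(dist):
    """Sum of outcome * probability over a {outcome: probability} mapping."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Probability-weighted sum of squared deviations from the expected value."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

if __name__ == "__main__":
    print(expected_value(die))  # 7/2
    print(variance(die))        # 35/12
```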
