How Many Cafes Should Be Surveyed for Accurate Daily Turnover Estimates?

  • Context: MHB 
  • Thread starter: mathmari

Discussion Overview

The discussion revolves around estimating the required sample size for surveying cafes to achieve accurate daily turnover estimates. Participants explore statistical methods, including Chebyshev's inequality and the Central Limit Theorem, to determine how many cafes need to be surveyed to ensure the sample mean deviates from the expected value within a specified range. The conversation also touches on the implications of a surprising survey result and the role of hypothesis testing in this context.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant calculates that at least 125 cafes need to be surveyed based on Chebyshev's inequality, but acknowledges that this may be a broad estimate.
  • Another participant suggests a tighter calculation using the normal distribution: with the 95% critical value $z^*=1.96$, a sample size greater than 24 (that is, at least 25 cafes) is needed.
  • Concerns are raised about the surprising result of a mean turnover of 690 Euros after surveying 500 cafes, with some participants questioning the accuracy of the assumed expected value and standard deviation.
  • Participants discuss the application of the Central Limit Theorem and the standard error of the sample mean in their calculations.
  • There is a proposal to calculate the probability of observing such a large deviation from the expected mean under the null hypothesis.
  • Some participants mention the significance level and p-values in the context of hypothesis testing, with discussions on how to interpret these results.
  • The Finite Population Correction Factor is introduced, with participants debating its implications for sample size and standard deviation when sampling a significant fraction of the population.
  • There is a contention regarding the limits of sampling from a finite population and the consequences of exceeding the population size.

Areas of Agreement / Disagreement

Participants express differing views on the adequacy of Chebyshev's inequality versus the normal distribution approach for determining sample size. There is also disagreement regarding the implications of the surprising survey result and how to interpret the statistical tests applied. The discussion remains unresolved on several points, particularly concerning the impact of the Finite Population Correction Factor and the limits of sampling.

Contextual Notes

Participants note limitations in their assumptions regarding the expected value and standard deviation, as well as the implications of sampling a significant fraction of the population. The discussion highlights the complexity of statistical inference in finite populations.

mathmari
Hey! :o

The daily turnover $X$ of Cafes has the expected value $ \mu_X = 600 $ Euro and the standard deviation $ \sigma_X = 30 $ Euro.

(a) At least how many cafes have to be surveyed in a random sample so that $\overline{X}_n$ deviates from $\mu_X$ by less than $12$ Euro with a probability of at least $95\%$?

(b) After a survey of $500$ Cafes the arithmetic mean is $690$. Is this result surprising in light of part (a)?
I have done the following:

(a) From Chebyshev's inequality we have that \begin{equation*} P\left (\left |\overline{X}_n-E(\overline{X}_n)\right |< \epsilon\right )\geq 1-\frac{V(\overline{X}_n)}{\epsilon^2}\end{equation*} with $E(\overline{X}_n)=\mu_X=600$ and $V(\overline{X}_n)=\frac{\sigma_X^2}{n}=\frac{30^2}{n}=\frac{900}{n}$.

It must hold \begin{align*} P\left (\left |\overline{X}_n-600\right |< 12\right )\geq 95\% &\Rightarrow 1-\frac{V(\overline{X}_n)}{12^2}\geq 95\% \\ & \Rightarrow 1-\frac{\frac{900}{n}}{144}\geq 0.95 \\ & \Rightarrow 1-0.95\geq \frac{\frac{900}{n}}{144} \\ & \Rightarrow 0.05\geq \frac{25}{4n} \\ & \Rightarrow n\geq \frac{25}{4\cdot 0.05} \\ & \Rightarrow n\geq 125\end{align*}

That means that at least $125$ Cafes have to be surveyed. Is this correct?
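
The arithmetic of the Chebyshev bound above can be checked with a few lines of Python. This is only a minimal sketch: the function name `chebyshev_sample_size` is made up for illustration and does not come from the thread.

```python
import math


def chebyshev_sample_size(sigma: float, eps: float, confidence: float) -> int:
    """Smallest n with 1 - sigma^2 / (n * eps^2) >= confidence (Chebyshev bound)."""
    alpha = 1.0 - confidence
    bound = sigma**2 / (alpha * eps**2)
    # Round before taking the ceiling so that floating-point noise in
    # 1 - 0.95 cannot push an exact integer bound up to the next integer.
    return math.ceil(round(bound, 9))


# Values from the problem statement: sigma_X = 30, epsilon = 12, 95 %.
n_chebyshev = chebyshev_sample_size(sigma=30.0, eps=12.0, confidence=0.95)
print(n_chebyshev)
```

With the values from the question this agrees with the hand calculation $n \geq 125$.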
(b) Why does this not hold when the number of surveyed Cafes is $500$? (Wondering)
 
Hey mathmari!

(a) Unfortunately Chebyshev's inequality only gives a very wide result.
We can do better. (Thinking)

Since both the expected value and the standard deviation are given, we can treat the sample mean as (approximately) normally distributed: $\overline X_n \sim N(\mu_X, \frac{\sigma_X}{\sqrt n})$, where the second parameter is the standard deviation.
Therefore we have:
$$P(|\overline X_n - \mu_X| < 12) = P\left(\frac{|\overline X_n - \mu_X|}{\sigma_X/\sqrt n} < \frac{12}{\sigma_X/\sqrt n}\right)
= P\left(|Z| < \frac{12}{\sigma_X/\sqrt n}\right)$$
The critical z-value for a confidence interval of $95\%$ is $z^*=1.96$.
Therefore the critical sample size $n^*$ satisfies:
$$z^* = \frac{12}{\sigma_X/\sqrt {n^*}}=1.96\quad\Rightarrow\quad n^*=24.0$$
Thus we need a sample size $n>24$, that is, at least $25$ cafes. (Wink)

(b) Are we surprised to see the much larger deviation $\overline x - \mu_X = 690-600=90$ when the number of surveyed Cafes is $n=500$?
Yes! It means that our assumed expected value and standard deviation are likely not correct!
Can we calculate the probability how confident we are that they are incorrect? (Wondering)
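
The normal-distribution version of part (a) can be sketched in Python as follows. It assumes, as in the post, that the sample mean is normally distributed; the helper name `normal_sample_size` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist


def normal_sample_size(sigma: float, eps: float, confidence: float):
    """Return (z*, n*, smallest integer n) for P(|mean - mu| < eps) >= confidence."""
    # Two-sided critical value, e.g. about 1.96 for 95 %.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Solve z = eps / (sigma / sqrt(n)) for n.
    n_star = (z * sigma / eps) ** 2
    return z, n_star, math.ceil(n_star)


z_star, n_star, n_min = normal_sample_size(sigma=30.0, eps=12.0, confidence=0.95)
print(round(z_star, 2), round(n_star, 2), n_min)
```

Under that assumption the critical sample size is about $24.01$, so $25$ cafes are enough, compared with $125$ from Chebyshev's inequality.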
 
I like Serena said:
(a) Unfortunately Chebyshev's inequality only gives a very wide result.
We can do better. (Thinking)

Since both the expected value and the standard deviation are given, we can treat the sample mean as (approximately) normally distributed: $\overline X_n \sim N(\mu_X, \frac{\sigma_X}{\sqrt n})$, where the second parameter is the standard deviation.
Therefore we have:
$$P(|\overline X_n - \mu_X| < 12) = P\left(\frac{|\overline X_n - \mu_X|}{\sigma_X/\sqrt n} < \frac{12}{\sigma_X/\sqrt n}\right)
= P\left(|Z| < \frac{12}{\sigma_X/\sqrt n}\right)$$
The critical z-value for a confidence interval of $95\%$ is $z^*=1.96$.
Therefore the critical sample size $n^*$ satisfies:
$$z^* = \frac{12}{\sigma_X/\sqrt {n^*}}=1.96\quad\Rightarrow\quad n^*=24.0$$
Thus we need a sample size $n>24$. (Wink)

I understand! We use here the Central Limit Theorem, or not? (Wondering)
I like Serena said:
(b) Are we surprised to see the much larger deviation $\overline x - \mu_X = 690-600=90$ when the number of surveyed Cafes is $n=500$?
Yes! It means that our assumed expected value and standard deviation are likely not correct!
Can we calculate the probability how confident we are that they are incorrect? (Wondering)
How can we calculate that probability? (Wondering)
 
mathmari said:
I understand! We use here the Central Limit Theorem, or not?

That's one way to look at it.

Then again, suppose that $X_1, ..., X_n$ are independent random variables with the same distribution $N(\mu_X,\sigma_X)$.
Then:
$$\sigma^2\left(\frac 1n(X_1+...+X_n)\right) = \frac 1{n^2}\left(\sigma^2(X_1) + ... + \sigma^2(X_n)\right) = \frac {\sigma_X^2}n
\quad\Rightarrow\quad \sigma(\overline X) = \frac{\sigma_X}{\sqrt n}
$$
Isn't it? (Wondering)

In other words, the standard deviation of the sample means $\sigma_{\overline X}$, which is also called the standard error $SE$, is:
$$SE=\sigma_{\overline X} = \frac{\sigma_X}{\sqrt n}$$
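
A quick simulation can make the standard error formula tangible. The sketch below assumes normally distributed turnovers $N(600, 30)$ as in the argument above; the sample size $n=25$, the number of replicates, the seed and the function name are arbitrary choices for illustration.

```python
import random
import statistics


def simulated_standard_error(mu, sigma, n, replicates, seed=0):
    """Standard deviation of many simulated sample means of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replicates)
    ]
    return statistics.pstdev(means)


n = 25
empirical_se = simulated_standard_error(600.0, 30.0, n, replicates=20_000)
theoretical_se = 30.0 / n**0.5  # sigma_X / sqrt(n)
print(empirical_se, theoretical_se)
```

The empirical value should land close to the theoretical $30/\sqrt{25}=6$, up to simulation noise.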

mathmari said:
How can we calculate that probability? (Wondering)

Pick the null hypothesis $H_0: \mu=600, \sigma=30$, and the alternative hypothesis $H_1: \mu\ne 600 \lor \sigma\ne 30$.
Then assuming $H_0$, $\overline X$ has the normal distribution $N(600, \frac{30}{\sqrt{500}})$ for a sample size of $n=500$.
What is:
$$P(|\overline X-600| > |690 - 600|)$$
? (Wondering)
 
I like Serena said:
That's one way to look at it.

Then again, suppose that $X_1, ..., X_n$ are independent random variables with the same distribution $N(\mu_X,\sigma_X)$.
Then:
$$\sigma^2\left(\frac 1n(X_1+...+X_n)\right) = \frac 1{n^2}\left(\sigma^2(X_1) + ... + \sigma^2(X_n)\right) = \frac {\sigma_X^2}n
\quad\Rightarrow\quad \sigma(\overline X) = \frac{\sigma_X}{\sqrt n}
$$
Isn't it? (Wondering)

In other words, the standard deviation of the sample means $\sigma_{\overline X}$, which is also called the standard error $SE$, is:
$$SE=\sigma_{\overline X} = \frac{\sigma_X}{\sqrt n}$$

I see! (Nerd)
I like Serena said:
Pick the null hypothesis $H_0: \mu=600, \sigma=30$, and the alternative hypothesis $H_1: \mu\ne 600 \lor \sigma\ne 30$.
Then assuming $H_0$, $\overline X$ has the normal distribution $N(600, \frac{30}{\sqrt{500}})$ for a sample size of $n=500$.
What is:
$$P(|\overline X-600| > |690 - 600|)$$
? (Wondering)

Do we calculate this probability using the significance level? Or am I thinking in a wrong way? (Wondering)
 
mathmari said:
Do we calculate this probability using the significance level? Or am I thinking in a wrong way? (Wondering)

Sure. Let's pick $\alpha=0.05$, although I'm really interested in the so-called p-value that we can compare with $\alpha$. (Thinking)
 
I like Serena said:
Sure. Let's pick $\alpha=0.05$, although I'm really interested in the so-called p-value that we can compare with $\alpha$. (Thinking)

We have that $$P(|\overline X-600| > |690 - 600|)=P(|\overline X-600| > 90)=1-P(|\overline X-600| \leq 90)$$ Do we use here the distribution function of the normal distribution?

I got stuck right now. (Wondering)
 
mathmari said:
We have that $$P(|\overline X-600| > |690 - 600|)=P(|\overline X-600| > 90)=1-P(|\overline X-600| \leq 90)$$ Do we use here the distribution function of the normal distribution?

I got stuck right now. (Wondering)

Yep.
We have:
$$p = P(|\overline X-600| > |690 - 600|)=P\left(\frac{|\overline X-600|}{30/\sqrt{500}} > \frac{90}{30/\sqrt{500}}\right)
\approx P\left(|Z| > 67.1\right) \approx 0.00000
$$
Since $p<\alpha$, we can reject $H_0$.

Conclusion
The daily turnover $X$ of Cafes has an expected value $\mu_X\ne 600$ Euro and/or standard deviation $\sigma_X\ne 30$ Euro ($p \approx 0$, far below $\alpha = 0.05$). (Thinking)
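
The test above can be reproduced numerically. This is a sketch under the same null hypothesis ($\mu=600$, $\sigma=30$, normally distributed sample mean); the function name `two_sided_p_value` is made up here, and the tail probability is computed with `math.erfc` because $P(|Z|>z)=\operatorname{erfc}(z/\sqrt 2)$.

```python
import math


def two_sided_p_value(sample_mean, mu0, sigma0, n):
    """Return (z, p) for a two-sided z-test of the sample mean under H0."""
    se = sigma0 / math.sqrt(n)          # standard error of the mean
    z = abs(sample_mean - mu0) / se     # standardized deviation
    p = math.erfc(z / math.sqrt(2.0))   # P(|Z| > z) for standard normal Z
    return z, p


z_obs, p_value = two_sided_p_value(sample_mean=690.0, mu0=600.0, sigma0=30.0, n=500)
print(round(z_obs, 1), p_value)
```

The z-score comes out at about $67.1$, and the p-value is so small that it is below the smallest positive double-precision number, which is why it shows up as zero.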
 
Let's also not forget the Finite Population Correction Factor. 125/600 = 21%ish. The standard rule is about 5% max. Something must be done. If you have the right circumstances, your analysis can suggest sampling more than the entire population unless you correct it for the absence of an infinite population.
 
tkhunny said:
Let's also not forget the Finite Population Correction Factor. 125/600 = 21%ish. The standard rule is about 5% max. Something must be done. If you have the right circumstances, your analysis can suggest sampling more than the entire population unless you correct it for the absence of an infinite population.

If we're sampling a significant fraction of the population, doesn't $\sigma_{\overline X}$ just go down even further?
Making the conclusion of rejecting the null hypothesis only more significant?
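
The effect of the correction factor on the standard error can be sketched as below. It uses the usual finite population correction $\sqrt{(N-n)/(N-1)}$ for sampling without replacement. The thread never states a population size, so $N=2000$ is purely a hypothetical value, as is the function name.

```python
import math


def standard_error(sigma, n, population_size=None):
    """Standard error of the mean, optionally with finite population correction."""
    se = sigma / math.sqrt(n)
    if population_size is None:
        return se  # infinite-population formula sigma / sqrt(n)
    if population_size < 2 or not 1 <= n <= population_size:
        raise ValueError("need 1 <= n <= population_size and population_size >= 2")
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return se * fpc


se_plain = standard_error(30.0, 500)
se_fpc = standard_error(30.0, 500, population_size=2000)  # hypothetical N
print(se_plain, se_fpc)
```

In this sketch the corrected standard error is smaller than the uncorrected one, and it drops to zero when the whole population is sampled.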
 
I like Serena said:
If we're sampling a significant fraction of the population, doesn't $\sigma_{\overline X}$ just go down even further?
Making the conclusion of rejecting the null hypothesis only more significant?

You cannot extend this argument forever with a finite population.

1) Eventually, your sample will exceed your population.

2) Once you have sampled the entire population, you no longer have a sample. It's a census. There is no longer ANY sampling error.

3) At times, sampling is SO EXPENSIVE that it is simply not practical to proceed. I once assisted in a court case where we needed statistical significance that would convince the court. It was a population of about 450. Without the FPCF, the sample was calculated to be about 600! That's obviously no good. With the FPCF, an appropriate and convincing sample was about 14. MUCH better.

Even without the practical concerns, with a finite population, the assumption of Normality is less clearly met. Worrying about Bias in your estimator leads to a different and unbiased test statistic. Guess what one such test statistic is? :-)
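
How the correction keeps a required sample size below the population size can be sketched with the common adjustment $n = n_0 / \left(1 + \frac{n_0 - 1}{N}\right)$, which follows from requiring the corrected variance of the mean to match the uncorrected target. The population size $N=600$ below is only an assumption read off the ratio 125/600 quoted earlier in the thread, and the function name is hypothetical. The sketch does not attempt to reproduce the court-case figures, whose inputs are not given.

```python
import math


def fpc_sample_size(n0: float, population_size: int) -> int:
    """Adjust an infinite-population sample size n0 for a finite population."""
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / population_size))


# Chebyshev-based n0 = 125 from part (a), with an assumed population of 600.
n_adjusted = fpc_sample_size(125, 600)
print(n_adjusted)
```

Under these assumptions the requirement drops from $125$ to $104$, and however large $n_0$ gets, the adjusted size stays at or below $N$.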
 
