Sample size without standard deviation


Discussion Overview

The discussion revolves around determining an appropriate sample size for estimating the mean annual income of natives in New York, specifically when the standard deviation is unknown. Participants explore various methods and assumptions related to sample size calculation, confidence intervals, and the implications of different income distributions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests using the endpoints of the income distribution (0 to $120,000) to estimate the mean and proposes finding the standard deviation based on how many standard deviations encompass 99% of the population.
  • Another participant argues that the question cannot be answered definitively without additional information about the distribution of incomes, highlighting that different standard deviations would significantly affect the required sample size.
  • A participant references Chebyshev's inequality to derive a minimum standard deviation based on the requirement of being correct within $1000 with 99% probability, concluding that the smallest standard deviation that would work is $500.
  • One participant offers a crude approximation for the standard deviation based on the range of the distribution, suggesting a factor of 6 instead of 4 for better accuracy.
  • Another participant notes that if the true standard deviation were known, a specific formula for sample size could be used, but emphasizes that the situation is more complex when the standard deviation is unknown.

Areas of Agreement / Disagreement

Participants express differing views on how to approach the problem, with no consensus on a single method or solution. Some propose using inequalities and approximations, while others emphasize the need for more information to arrive at a definitive answer.

Contextual Notes

Limitations include the lack of information regarding the actual distribution of incomes and the assumptions made about the standard deviation based on the endpoints of the distribution.

Fear_of_Math
Hello again,

I have a question here that asks me to find how large a sample size needs to be, but I have no standard deviation. How would you tackle this?

How large a sample size do we need to estimate the mean annual income of natives in New York, correct to within $1000 with probability 0.99? No information is available to us about the standard deviation of their annual income. We guess that nearly all of the incomes fall between $0 and $120,000 and that this distribution is approximately normal.

Here's what I see:
1 - alpha = 0.99, therefore alpha = 0.01 and alpha/2 = 0.005.
This gives a Z* of 2.575 (because it states a normal distribution).
The 99% CI is (0, 120000).

I know that n = [Z*s/m]^2, but I have neither s nor m...

As always, the feedback and guidance is appreciated =)
 
Practically the endpoints of the distribution are 0 and 120K. Normal dist. is symmetric, so you can figure out the mean. As for standard dev., I would assume 99% of the people are within 0 to 120K, and find out how many standard deviations it would take to get 99% of people (within ___ standard deviations around the mean).
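The suggestion above can be turned into numbers. This is a minimal sketch, assuming the interval $0 to $120,000 is read as the central 99% of a normal distribution; the function name `sample_size` and the variable names are illustrative, not from the thread.

```python
import math
from statistics import NormalDist

def sample_size(sigma, margin, confidence):
    """Known-sigma formula n = (z * sigma / margin)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

low, high = 0.0, 120_000.0
mean = (low + high) / 2                 # symmetric distribution: midpoint
z99 = NormalDist().inv_cdf(0.995)       # two-sided 99% critical value
sigma = (high - mean) / z99             # half-range equals z99 * sigma (assumption)

n = sample_size(sigma, margin=1_000, confidence=0.99)
print(f"mean = {mean:,.0f}, sigma = {sigma:,.0f}, n = {n}")
```

Under this reading the critical value cancels, so n is simply (half-range / margin)^2 = 60^2, or roughly 3,600 observations (floating-point rounding may push the computed value up by one).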
 
I don't think this question can be answered without further information. Suppose that the income is distributed with a mean of $60,000, and a standard deviation of $1. After a small number of observations we would learn that the std. dev. is small, and realize we don't need to take many more samples.

On the other hand suppose that the income is distributed with a mean of $60,000 and a standard deviation of $20,000. In that case we'd have to take a much larger number of samples to achieve the same confidence.
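To see how strongly the spread drives the answer, the known-σ formula can be evaluated for both cases. This is a sketch assuming the normal-theory formula with z ≈ 2.5758 for 99% confidence and a margin E = $1,000; the symbols n, z, and E are notation chosen here.

```latex
n = \left(\frac{z\,\sigma}{E}\right)^{2}
\qquad
\sigma = 1:\quad n = \left(\frac{2.5758 \times 1}{1000}\right)^{2} \approx 6.6 \times 10^{-6}
\quad (\text{any } n \ge 1 \text{ suffices})
\qquad
\sigma = 20000:\quad n = \left(\frac{2.5758 \times 20000}{1000}\right)^{2} \approx 2653.9
\quad (\text{round up to } 2654)
```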
 
The statement "correct to within $1000 with probability 0.99" implies a standard deviation by Chebyshev's inequality: The probability an observation is within k standard deviations of the mean is less than 1/k^2. The largest k that has 1/k^2 < .99 is 2 so 1000 must be no more than 2 standard deviations. The smallest standard deviation that will work is $500.
 
You can also try use the (very crude) approximation that

\[ \sigma \approx \frac{\text{Range}}{4} \]

presented in some texts. I suggest to students to use 6 rather than 4.
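A quick comparison shows how much the choice of divisor matters. This is a minimal sketch, assuming the range is $120,000, the margin is $1,000, and the known-σ normal formula applies; `sample_size` is an illustrative helper, not something from the thread.

```python
import math
from statistics import NormalDist

def sample_size(sigma, margin, confidence=0.99):
    """Known-sigma formula n = (z * sigma / margin)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

income_range = 120_000 - 0   # guessed range from the problem statement
results = {}
for divisor in (4, 6):
    sigma_guess = income_range / divisor          # crude range rule
    results[divisor] = sample_size(sigma_guess, margin=1_000)
    print(f"sigma ~ range/{divisor} = {sigma_guess:,.0f} -> n = {results[divisor]}")
```

Because n grows with the square of σ, moving from range/6 to range/4 multiplies the required sample size by (6/4)^2 = 2.25.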
 
HallsofIvy said:
The statement "correct to within $1000 with probability 0.99" implies a standard deviation by Chebyshev's inequality: The probability an observation is within k standard deviations of the mean is less than 1/k^2. The largest k that has 1/k^2 < .99 is 2 so 1000 must be no more than 2 standard deviations. The smallest standard deviation that will work is $500.

Chebyshev's inequality says the probability an observation is _not_ within k std. dev. of the mean is <= 1/k^2.
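Applied to the sample mean instead of a single observation, the inequality in this form gives a distribution-free sample size: P(|x̄ − μ| ≥ E) ≤ σ²/(nE²), so requiring the right-hand side to be at most α gives n ≥ σ²/(αE²). The sketch below assumes independent observations and an illustrative guess of σ = $20,000, which is not a value given in the problem; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chebyshev_sample_size(sigma, margin, alpha):
    """Distribution-free bound: n >= sigma^2 / (alpha * margin^2)."""
    return math.ceil(sigma ** 2 / (alpha * margin ** 2))

def normal_sample_size(sigma, margin, alpha):
    """Normal-theory formula n = (z * sigma / margin)^2, for comparison."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

sigma_guess = 20_000   # assumed for illustration only
n_cheb = chebyshev_sample_size(sigma_guess, margin=1_000, alpha=0.01)
n_norm = normal_sample_size(sigma_guess, margin=1_000, alpha=0.01)
print(f"Chebyshev bound: n >= {n_cheb}; normal approximation: n = {n_norm}")
```

The Chebyshev bound is far more conservative than the normal-theory figure because it makes no assumption about the shape of the income distribution.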
 
Had the true std. dev. (σ) been known, you'd use N = (zσ/x)^2, where x is the margin of error = $1,000 (or x = 1 if you express everything in $1,000). When σ is unknown the process is more complicated and you may have to iterate. This page explains how.
 