Stratified Random Sampling vs Simple Random Sampling

  • Thread starter jaumzaum
  • #1
I'm beginning to learn statistics and I didn't quite understand the formula for stratified random sampling.

Let's say we have a country with 3000 people, divided into 3 cities of 1000 people each. We want to know the proportion of women in the whole country, so we decide to take a sample of 30 people, 10 from each city. Suppose each city has the same sample mean and variance.

Using the formula for the variance of stratified random sampling:
$$s^2 = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h} = \sum_{h=1}^{3} \left(\frac{1000}{3000}\right)^2 \frac{s_h^2}{10} = \frac{s_h^2}{30}$$
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

So what was my mistake?
 

Answers and Replies

  • #2
mathman
Science Advisor
The formula looks funny. The term ##\frac{s_h^2}{n_h}## is dimensionally inconsistent with ##s^2##.
 
  • #4
Stephen Tashi
Science Advisor
This is a very good question.

"But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##."

A sample is a set of observations. Several samples "result" in several sets of observations, but this doesn't define how a particular numerical value results from those sets of observations.

Perhaps you are thinking of the case where we have 3 independent identically distributed random variables ##X_1,X_2,X_3##, each with variance ##{\sigma}^2##. If we define "the result" of ##X_1,X_2,X_3## to be the random variable ##Y = (1/3)X_1 + (1/3)X_2 + (1/3)X_3## then ##\sigma^2_Y = (1/9)\sigma^2 + (1/9)\sigma^2 + (1/9) \sigma^2 = (1/3) \sigma ^2##.
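A quick simulation can make that ##(1/3)\sigma^2## result concrete. This is only a sketch: the normal distribution, the value ##\sigma = 2##, the trial count, and the function name `variance_of_average` are illustrative assumptions, not anything from the thread.

```python
import random
import statistics

def variance_of_average(n_vars=3, sigma=2.0, trials=100_000, seed=0):
    """Estimate Var(Y) by simulation, where Y is the average of n_vars
    independent normal variables with standard deviation sigma.
    (The normal distribution is an assumption made for illustration.)"""
    rng = random.Random(seed)
    ys = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_vars)) / n_vars
        for _ in range(trials)
    ]
    return statistics.pvariance(ys)

sigma = 2.0
simulated = variance_of_average(sigma=sigma)
theoretical = sigma**2 / 3  # (1/9 + 1/9 + 1/9) * sigma^2
print(simulated, theoretical)
```

The two printed numbers should agree closely, up to simulation noise.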

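In the link you gave, the notation ##{s_h}^2## does not represent the ##\sigma^2## in the previous paragraph. A random sample of size 1 taken from the entire population of City h has variance ##{s_h}^2##. The random variable consisting of the mean of 10 such samples has variance ##{s_h}^2/10##. If each ##X_h## is taken to be such a city mean, then ##\sigma^2 = {s_h}^2/10##, and the average of the three city means has variance ##(1/3)({s_h}^2/10) = {s_h}^2/30##, which agrees with the formula.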
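The three-city example from the question can be checked numerically. The sketch below assumes the true proportion of women is 0.5 in every city and draws people with replacement, so no finite population correction applies. Both choices, and the function name `stratified_mean_variance`, are assumptions made for illustration.

```python
import random
import statistics

def stratified_mean_variance(p=0.5, n_strata=3, n_per_stratum=10,
                             trials=50_000, seed=1):
    """Simulate the stratified estimate of a proportion many times and
    return the variance of those estimates. Each observation is a 0/1
    draw with success probability p (assumed equal in every stratum)."""
    rng = random.Random(seed)
    weight = 1.0 / n_strata  # N_h / N = 1000 / 3000 for each city
    estimates = []
    for _ in range(trials):
        estimate = 0.0
        for _ in range(n_strata):
            women = sum(rng.random() < p for _ in range(n_per_stratum))
            estimate += weight * women / n_per_stratum
        estimates.append(estimate)
    return statistics.pvariance(estimates)

p = 0.5
unit_variance = p * (1 - p)      # variance of a single 0/1 observation
sim_var = stratified_mean_variance(p=p)
formula_var = unit_variance / 30  # sum over 3 cities of (1/3)^2 * s_h^2 / 10
print(sim_var, formula_var)
```

Under these assumptions the simulated variance should land near ##{s_h}^2/30##, not ##{s_h}^2/3##, because ##{s_h}^2## here is the variance of one observation, not of a city mean.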

The Wikipedia article is not well written. It begins by referring to "The mean and variance of stratified random sampling". However, "mean" and "variance" are quantities that are defined for random variables. It only makes sense to speak of the mean and variance of a sampling procedure if there is a universal understanding that the sampling procedure uses a particular random variable. I haven't done a wide survey of articles on stratified sampling, but I did find https://jkim.public.iastate.edu/teaching/book5.pdf
In example 5.1 of that text, the random variable associated with stratified sampling is, in the notation of the Wikipedia article, ##\hat{x} = \sum_{h=1}^L N_h \hat{ \bar{x}_h}## while the Wikipedia article apparently deals with the random variable ##\hat{\bar x} = (1/N) \hat{x}##. So, with respect to including or omitting the constant ##1/N##, there is ambiguity about what random variable is associated with stratified sampling.
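To see how much the choice matters, the variances of the two random variables can be written side by side. This is a sketch that assumes the stratum sample means are independent and ignores any finite population correction:

$$\operatorname{Var}(\hat{x}) = \sum_{h=1}^{L} N_h^2 \frac{s_h^2}{n_h}, \qquad \operatorname{Var}(\hat{\bar x}) = \frac{1}{N^2}\operatorname{Var}(\hat{x}) = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}$$

With the numbers from the question (##N_h = 1000##, ##N = 3000##, ##n_h = 10##), the first expression gives ##300000\, s_h^2## and the second gives ##s_h^2/30##, so the two differ by the factor ##N^2 = 9 \times 10^6##.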

In the link I gave, you can alter "book5" to "book3" or "book4" to get information about the abbreviations used in "book5" (e.g. "SRS" for simple random sampling and "HT" for Horvitz-Thompson).
 