# Stratified Random Sampling vs Simple Random Sampling

#### jaumzaum

I'm beginning to learn statistics and I didn't quite understand the formula for stratified random sampling.

Let's say we have a country with 3000 people, divided into 3 cities of 1000 people each. We want to know the proportion of women in the whole country, so we decide to take a sample of 30 people, 10 from each city. Suppose each city's sample has the same mean and variance.

Using the formula for the variance of stratified random sampling:
$$s^2 = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h} = \sum_{h=1}^{3} \left(\frac{1000}{3000}\right)^2 \frac{s_h^2}{10} = \frac{s_h^2}{30}$$
But we know that when we have 3 equal samples of variance $s_h^2$, the resulting sample has variance $s_h^2/3$.

So what was my mistake?


#### mathman

The formula looks funny. The term $\frac{s_h^2}{n_h}$ is dimensionally inconsistent with $s^2$.

#### Stephen Tashi

This is a very good question.

> But we know that when we have 3 equal samples of variance $s_h^2$, the resulting sample has variance $s_h^2/3$.

A sample is a set of observations. Several samples "result" in several sets of observations, but this doesn't define how a particular numerical value results from those sets of observations.

Perhaps you are thinking of the case where we have 3 independent identically distributed random variables $X_1,X_2,X_3$, each with variance ${\sigma}^2$. If we define "the result" of $X_1,X_2,X_3$ to be the random variable $Y = (1/3)X_1 + (1/3)X_2 + (1/3)X_3$ then $\sigma^2_Y = (1/9)\sigma^2 + (1/9)\sigma^2 + (1/9) \sigma^2 = (1/3) \sigma ^2$.

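In the link you gave, the notation ${s_h}^2$ does not represent the $\sigma^2$ in the previous paragraph. A random sample of size 1 taken from the entire population of City h has variance ${s_h}^2$. The random variable consisting of the mean of 10 such samples has variance ${s_h}^2/10$. It is these per-city means, not single observations, that play the role of $X_1, X_2, X_3$, so averaging the three of them gives variance $(1/3)({s_h}^2/10) = {s_h}^2/30$.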
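
The point above can be checked numerically. The following is a minimal sketch, not taken from the linked article: it assumes a hypothetical true proportion of women of 0.5 in every city, draws people with replacement (so the finite-population correction is ignored), and compares the empirical variance of the stratified mean with $\sum_h (N_h/N)^2 s_h^2/n_h$. The function names are illustrative only.

```python
import random
import statistics


def stratified_mean_variance(weights, stratum_variances, sample_sizes):
    """Variance of the stratified sample mean: sum_h W_h^2 * s_h^2 / n_h."""
    return sum(
        w ** 2 * s2 / n
        for w, s2, n in zip(weights, stratum_variances, sample_sizes)
    )


def simulate_stratified_mean(p_women, n_per_city, n_cities, rng):
    """Sample n_per_city people (with replacement) from each city and
    return the equally weighted average of the city sample proportions."""
    city_means = []
    for _ in range(n_cities):
        draws = [1 if rng.random() < p_women else 0 for _ in range(n_per_city)]
        city_means.append(sum(draws) / n_per_city)
    return sum(city_means) / n_cities


rng = random.Random(0)          # fixed seed so the run is repeatable
p = 0.5                         # assumed proportion of women (hypothetical)
s2_h = p * (1 - p)              # variance of a single 0/1 observation

# Three equal cities: weights N_h/N = 1/3, n_h = 10 people sampled from each.
theory = stratified_mean_variance([1 / 3] * 3, [s2_h] * 3, [10] * 3)

# Repeat the whole survey many times and measure how much the estimate varies.
estimates = [simulate_stratified_mean(p, 10, 3, rng) for _ in range(20000)]
empirical = statistics.pvariance(estimates)

print("formula   s_h^2 / 30 =", theory)
print("simulated variance   =", empirical)
```

Under these assumptions the simulated variance should land close to $s_h^2/30$ rather than $s_h^2/3$, because each city contributes a mean of 10 observations rather than a single observation.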

The Wikipedia article is not well written. It begins by referring to "The mean and variance of stratified random sampling". However "mean" and "variance" are quantities that are defined for random variables. It only makes sense to speak of the mean and variance of a sampling procedure, if there is a universal understanding that the sampling procedure uses a particular random variable. I haven't done a wide survey of articles on stratified sampling, but I did find https://jkim.public.iastate.edu/teaching/book5.pdf
In example 5.1 of that text, the random variable associated with stratified sampling is, in the notation of the Wikipedia article, $\hat{x} = \sum_{h=1}^L N_h \hat{ \bar{x}_h}$, while the Wikipedia article apparently deals with the random variable $\hat{\bar x} = (1/N) \hat{x}$. So, with respect to including or omitting the constant $1/N$, there is ambiguity about which random variable is associated with stratified sampling.
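
As a rough illustration of how much the choice matters, here is the relation between the two variances, worked for the numbers in the original question. This is a sketch that assumes independent samples in each stratum and ignores the finite-population correction:

$$\operatorname{Var}(\hat{x}) = N^2 \operatorname{Var}(\hat{\bar x}) = \sum_{h=1}^{L} N_h^2 \frac{s_h^2}{n_h} = 3 \cdot 1000^2 \cdot \frac{s_h^2}{10} = 300000\, s_h^2, \qquad \operatorname{Var}(\hat{\bar x}) = \frac{300000\, s_h^2}{3000^2} = \frac{s_h^2}{30}$$

So the two conventions differ only by the constant factor $N^2$, and the mean-based version reproduces the $s_h^2/30$ from the first post.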

In the link I gave, you can alter "book5" to "book3" or "book4" to get information about the abbreviations used in "book5" (e.g. "SRS" for simple random sampling and "HT" for Horvitz-Thompson).
