Stratified Random Sampling vs Simple Random Sampling

jaumzaum · May 15, 2019

I'm beginning to learn statistics and I didn't quite understand the formula for stratified random sampling.

Let's say we have a country with 3000 people, divided into 3 cities of 1000 people each. We want to know the proportion of women in the whole country, so we decide to take a sample of 30 people, 10 from each city. Suppose each city has the same sample mean and variance.

Using the formula for the variance of stratified random sampling:
$$s^2 = \sum_{h=1}^{3} \left(\frac {N_h}{N}\right)^2 \frac {s_h^2}{n_h}=\sum_{h=1}^{3} \left(\frac {1000}{3000}\right)^2 \frac {s_h^2}{10}=\frac{s_h^2} {30} $$
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

So what was my mistake?

mathman · May 15, 2019

The formula looks funny. The term ##\frac{s_h^2}{n_h}## is dimensionally inconsistent with ##s^2##.

jaumzaum · May 15, 2019

https://en.m.wikipedia.org/wiki/Stratified_sampling

Stephen Tashi · May 16, 2019

This is a very good question.

jaumzaum said:

But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

A sample is a set of observations. Several samples "result" in several sets of observations, but this doesn't define how a particular numerical value results from those sets of observations.

Perhaps you are thinking of the case where we have 3 independent identically distributed random variables ##X_1,X_2,X_3##, each with variance ##{\sigma}^2##. If we define "the result" of ##X_1,X_2,X_3## to be the random variable ##Y = (1/3)X_1 + (1/3)X_2 + (1/3)X_3## then ##\sigma^2_Y = (1/9)\sigma^2 + (1/9)\sigma^2 + (1/9) \sigma^2 = (1/3) \sigma ^2##.
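The ##(1/3)\sigma^2## result above can be checked numerically. This is only a minimal sketch: the normal distribution, the value ##\sigma = 2##, the number of trials and the function name are all assumptions made for illustration, not anything taken from the thread.

```python
import random
import statistics

def simulate_var_of_average(sigma=2.0, trials=100_000, seed=0):
    """Estimate Var(Y) for Y = (X1 + X2 + X3) / 3 with iid X_i of variance sigma**2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    ys = []
    for _ in range(trials):
        # Three independent draws; the normal distribution is an illustrative choice.
        x1, x2, x3 = (rng.gauss(0.0, sigma) for _ in range(3))
        ys.append((x1 + x2 + x3) / 3)
    return statistics.pvariance(ys)

sigma = 2.0
estimated = simulate_var_of_average(sigma)
expected = sigma**2 / 3  # (1/9 + 1/9 + 1/9) * sigma**2
print(estimated, expected)
```

The two printed numbers should be close but not identical, since the first is a Monte Carlo estimate.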

In the link you gave, the notation ##{s_h}^2## does not represent the ##\sigma^2## in the previous paragraph. A random sample of size 1 taken from the entire population of City h has variance ##{s_h}^2##. The random variable consisting of the mean of 10 such samples has variance ##{s_h}^2/10##.
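As a worked version for the example in the question (a sketch that assumes equal strata and, like the formula in the question, ignores any finite population correction), write ##\bar{x}_h## for the mean of the 10 observations from City h. Each ##\bar{x}_h## has variance ##{s_h}^2/10##, and the estimate for the whole country is their equally weighted average, so

$$\operatorname{Var}\left(\sum_{h=1}^{3} \frac{N_h}{N}\,\bar{x}_h\right) = \sum_{h=1}^{3} \left(\frac{1}{3}\right)^2 \frac{s_h^2}{10} = \frac{1}{3}\cdot\frac{s_h^2}{10} = \frac{s_h^2}{30}$$

In other words, the "divide by 3" rule does apply, but with ##\sigma^2 = {s_h}^2/10## (the variance of one city's sample mean) rather than ##\sigma^2 = {s_h}^2##.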

The Wikipedia article is not well written. It begins by referring to "The mean and variance of stratified random sampling". However, "mean" and "variance" are quantities that are defined for random variables. It only makes sense to speak of the mean and variance of a sampling procedure if there is a universal understanding that the sampling procedure uses a particular random variable. I haven't done a wide survey of articles on stratified sampling, but I did find https://jkim.public.iastate.edu/teaching/book5.pdf
In example 5.1 of that text, the random variable associated with stratified sampling is, in the notation of the Wikipedia article, ##\hat{x} = \sum_{h=1}^L N_h \hat{ \bar{x}_h}##, while the Wikipedia article apparently deals with the random variable ##\hat{\bar x} = (1/N) \hat{x}##. So, with respect to including or omitting the constant ##1/N##, there is ambiguity about what random variable is associated with stratified sampling.
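The effect of the ##1/N## factor can be seen in a small simulation of the example from the question. This is a rough sketch under stated assumptions: each city is taken to have exactly 500 women out of 1000 (so ##{s_h}^2 = 0.25##), sampling within a city is done with replacement so that no finite population correction enters, and the function and variable names are hypothetical.

```python
import random
import statistics

def stratified_estimates(cities, n_h, rng):
    """Return (mean estimate, total estimate) from one stratified sample."""
    N = sum(len(city) for city in cities)
    total_hat = 0.0
    for city in cities:
        sample = rng.choices(city, k=n_h)  # with replacement (assumption)
        total_hat += len(city) * statistics.fmean(sample)  # N_h * xbar_h
    return total_hat / N, total_hat

rng = random.Random(1)  # fixed seed for reproducibility
# Assumed population: 3 cities, each with 500 women (1) and 500 men (0).
cities = [[1] * 500 + [0] * 500 for _ in range(3)]
N = sum(len(city) for city in cities)

results = [stratified_estimates(cities, 10, rng) for _ in range(20_000)]
var_mean = statistics.pvariance([m for m, _ in results])
var_total = statistics.pvariance([t for _, t in results])

s_h2 = 0.25  # p * (1 - p) with p = 0.5
print(var_mean, s_h2 / 30)           # estimator of the proportion
print(var_total, N**2 * s_h2 / 30)   # estimator of the total count
```

Because the total estimate is exactly ##N## times the mean estimate in every trial, the two simulated variances differ by a factor of ##N^2##. That is why it matters which random variable "the variance of stratified sampling" refers to.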

In the link I gave, you can alter "book5" to "book3" or "book4" to get information about the abbreviations used in "book5" (e.g. "SRS" for simple random sampling and "HT" for Horvitz-Thompson).
