I Stratified Random Sampling vs Simple Random Sampling

jaumzaum · May 15, 2019

I'm beginning to learn statistics and I didn't quite understand the formula for the variance in stratified random sampling.

Let's say we have a country with 3000 people, divided into 3 cities of 1000 people each. We want to know the proportion of women in the whole country, so we decide to take a sample of 30 people, 10 from each city. Suppose each city has the same sample mean and variance.

Using the formula for the variance of stratified random sampling:
$$s^2 = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h} = \sum_{h=1}^{3} \left(\frac{1000}{3000}\right)^2 \frac{s_h^2}{10} = \frac{s_h^2}{30}$$
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.
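The arithmetic in the formula can be checked with exact fractions. This is a minimal sketch: `stratified_variance` is a hypothetical helper implementing ##\sum_h (N_h/N)^2 s_h^2/n_h## as written (no finite population correction), and the value ##s_h^2 = 1/4## is an assumed example, corresponding to a proportion of women of 1/2 in every city.

```python
from fractions import Fraction

def stratified_variance(N_h, n_h, s2_h):
    """Hypothetical helper: sum over strata of (N_h/N)^2 * s_h^2 / n_h.

    No finite population correction is applied, matching the formula above.
    """
    N = sum(N_h)
    return sum(Fraction(Nh, N) ** 2 * s2 / nh for Nh, nh, s2 in zip(N_h, n_h, s2_h))

# Three cities of 1000 people, 10 sampled from each, common variance s_h^2.
s2 = Fraction(1, 4)  # assumed value: p(1 - p) with p = 1/2
v = stratified_variance([1000, 1000, 1000], [10, 10, 10], [s2, s2, s2])
print(v, v == s2 / 30)  # 1/120 True
```

Each of the three terms is ##\frac{1}{9}\cdot\frac{s_h^2}{10}##, so the sum is ##\frac{s_h^2}{30}##, not ##\frac{s_h^2}{3}##.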

So what was my mistake?

mathman · May 15, 2019

The formula looks funny. The term ##\frac{s_h^2}{n_h}## is dimensionally inconsistent with ##s^2##.

jaumzaum · May 15, 2019

https://en.m.wikipedia.org/wiki/Stratified_sampling

Stephen Tashi · May 16, 2019

This is a very good question.

jaumzaum said:

But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

A sample is a set of observations. Several samples "result" in several sets of observations, but this doesn't define how a particular numerical value results from those sets of observations.

Perhaps you are thinking of the case where we have 3 independent identically distributed random variables ##X_1,X_2,X_3##, each with variance ##{\sigma}^2##. If we define "the result" of ##X_1,X_2,X_3## to be the random variable ##Y = (1/3)X_1 + (1/3)X_2 + (1/3)X_3## then ##\sigma^2_Y = (1/9)\sigma^2 + (1/9)\sigma^2 + (1/9) \sigma^2 = (1/3) \sigma ^2##.
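The identity ##\sigma^2_Y = \sigma^2/3## can be checked numerically. This is a Monte Carlo sketch under assumptions not in the post: the ##X_i## are taken to be normal with mean 0, and ##\sigma^2 = 4## is an arbitrary illustrative value. Any i.i.d. distribution with finite variance would give the same ratio.

```python
import random
import statistics

random.seed(0)
sigma2 = 4.0  # assumed common variance of X1, X2, X3
trials = 200_000

# Y = (1/3) X1 + (1/3) X2 + (1/3) X3 with X_i i.i.d. normal(0, sqrt(sigma2))
ys = [
    sum(random.gauss(0.0, sigma2 ** 0.5) for _ in range(3)) / 3
    for _ in range(trials)
]
empirical = statistics.pvariance(ys)
print(empirical, sigma2 / 3)  # the two values should be close
```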

In the link you gave, the notation ##{s_h}^2## does not represent the ##\sigma^2## of the previous paragraph. A random sample of size 1 taken from the entire population of City h has variance ##{s_h}^2##. The mean of 10 such samples is a random variable with variance ##{s_h}^2/10##.
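Putting the two paragraphs together: each city mean has variance ##s_h^2/10##, and averaging three such independent means with weights 1/3 gives ##\frac{1}{3}\cdot\frac{s_h^2}{10} = \frac{s_h^2}{30}##, which agrees with the stratified formula. The simulation below is a sketch on a made-up population (3 cities of 1000 people, 500 women in each, coded as 1); these numbers are assumptions for illustration only. It repeatedly draws 10 people per city and compares the variance of the weighted estimate with ##\sum_h (N_h/N)^2 s_h^2/n_h##.

```python
import random
import statistics

random.seed(1)

# Assumed population: 3 cities of 1000 people, 500 women in each (1 = woman).
cities = [[1] * 500 + [0] * 500 for _ in range(3)]
N_h = [len(c) for c in cities]
N = sum(N_h)
n_h = 10  # sample size per city

def stratified_mean(strata, n_h, N):
    """Estimate of the population mean: sum_h (N_h / N) * (sample mean in city h)."""
    return sum(len(c) / N * (sum(random.sample(c, n_h)) / n_h) for c in strata)

estimates = [stratified_mean(cities, n_h, N) for _ in range(50_000)]
empirical = statistics.pvariance(estimates)

# Formula from the thread, with s_h^2 the within-city variance (divisor N_h - 1).
s2_h = [statistics.variance(c) for c in cities]
formula = sum((Nh / N) ** 2 * s2 / n_h for Nh, s2 in zip(N_h, s2_h))
print(empirical, formula, s2_h[0] / 30)
```

`random.sample` draws without replacement, so the simulated variance should sit slightly below the formula, by roughly the finite population correction ##1 - n_h/N_h = 0.99##, which the formula omits.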

The Wikipedia article is not well written. It begins by referring to "The mean and variance of stratified random sampling". However, "mean" and "variance" are quantities that are defined for random variables. It only makes sense to speak of the mean and variance of a sampling procedure if there is a universal understanding that the sampling procedure uses a particular random variable. I haven't done a wide survey of articles on stratified sampling, but I did find https://jkim.public.iastate.edu/teaching/book5.pdf
In example 5.1 of that text, the random variable associated with stratified sampling is, in the notation of the Wikipedia article, ##\hat{x} = \sum_{h=1}^L N_h \hat{\bar{x}}_h##, while the Wikipedia article apparently deals with the random variable ##\hat{\bar x} = (1/N) \hat{x}##. So, with respect to including or omitting the constant ##1/N##, there is ambiguity about what random variable is associated with stratified sampling.
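The two candidate random variables differ only by the constant ##1/N##, so their variances differ by the factor ##N^2##, since ##\operatorname{Var}(cX) = c^2 \operatorname{Var}(X)##. The sketch below illustrates this on an assumed population (3 cities of 1000 people, 500 women each); `draw_total` is a hypothetical helper computing ##\hat{x}##, and ##\hat{\bar x}## is obtained by dividing each draw by ##N##.

```python
import random
import statistics

random.seed(2)

# Assumed population: 3 cities of 1000 people, 500 women in each (1 = woman).
strata = [[1] * 500 + [0] * 500 for _ in range(3)]
N = sum(len(c) for c in strata)
n_h = 10

def draw_total(strata, n_h):
    """Hypothetical helper: x_hat = sum_h N_h * (sample mean in stratum h)."""
    return sum(len(c) * sum(random.sample(c, n_h)) / n_h for c in strata)

totals = [draw_total(strata, n_h) for _ in range(20_000)]
means = [t / N for t in totals]  # xbar_hat = x_hat / N for the same draws

var_total = statistics.pvariance(totals)
var_mean = statistics.pvariance(means)
print(var_total / var_mean, N ** 2)  # equal up to floating-point rounding
```

Both random variables come from the same samples; which one is called "the" stratified sampling estimator only changes the scale of the reported variance.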

In the link I gave, you can change "book5" to "book3" or "book4" to get information about the abbreviations used in "book5" (e.g. "SRS" for simple random sampling and "HT" for Horvitz-Thompson).
