Stratified Random Sampling vs Simple Random Sampling

  • Context: Undergrad 
  • Thread starter: jaumzaum
  • Tags: Random Sampling

Discussion Overview

The discussion revolves around the concepts of stratified random sampling and simple random sampling, particularly focusing on the variance formula used in stratified sampling. Participants explore the implications of the formula and clarify misunderstandings related to variance calculations in the context of sampling from different populations.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant expresses confusion regarding the variance formula for stratified random sampling, questioning their understanding of how the variance is calculated when sampling from multiple cities.
  • Another participant points out a potential dimensional inconsistency in the formula presented by the first participant.
  • A later reply elaborates on the relationship between independent identically distributed random variables and their variances, suggesting that the misunderstanding may stem from how the results of multiple samples are interpreted.
  • There is a discussion about the notation used in the Wikipedia article on stratified sampling, with one participant arguing that it lacks clarity regarding the random variables involved in the sampling process.
  • References to external resources are provided to support the discussion, including a link to a Wikipedia article and a specific textbook that may clarify the concepts further.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the correct interpretation of the variance formula or the clarity of the Wikipedia article. Multiple competing views remain regarding the definitions and implications of the terms used in stratified sampling.

Contextual Notes

There are limitations in the discussion regarding the assumptions made about the random variables and the definitions used in the variance calculations. The ambiguity in notation and the relationship between different sampling procedures are also noted.

jaumzaum
I'm beginning to learn statistics and I didn't quite understand the formula for stratified random sampling.

Let's say we have a country with 3000 people, divided into 3 cities of 1000 people each. We want to know the proportion of women in the whole country, so we decide to take a sample of 30 people, 10 from each city. Suppose each city has the same sample mean and variance.

Using the formula for the variance of stratified random sampling:
$$s^2 = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h} = \sum_{h=1}^{3} \left(\frac{1000}{3000}\right)^2 \frac{s_h^2}{10} = 3 \cdot \frac{1}{9} \cdot \frac{s_h^2}{10} = \frac{s_h^2}{30}$$
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

So what was my mistake?
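For reference, here is a minimal Python sketch that evaluates the formula exactly as written in the post (no finite population correction), using exact fractions so the arithmetic can be checked. The common within-city variance `s2 = 1/4` is a hypothetical value (the largest possible variance of a 0/1 indicator), and the function name `stratified_variance` is illustrative only.

```python
from fractions import Fraction

def stratified_variance(pop_sizes, sample_sizes, stratum_variances):
    """Sum over strata of (N_h/N)^2 * s_h^2 / n_h, as in the post (no finite population correction)."""
    N = sum(pop_sizes)
    return sum(
        Fraction(N_h, N) ** 2 * s2_h / n_h
        for N_h, n_h, s2_h in zip(pop_sizes, sample_sizes, stratum_variances)
    )

s2 = Fraction(1, 4)  # hypothetical common within-city variance s_h^2
result = stratified_variance([1000, 1000, 1000], [10, 10, 10], [s2, s2, s2])
print(result)             # 1/120
print(result == s2 / 30)  # True
```

The computed value matches ##s_h^2/30##; note that this number is the variance of the stratified estimate of the mean, a different quantity from the within-city variance ##s_h^2## itself.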
 
The formula looks funny. The term ##\frac{s_h^2}{n_h}## is dimensionally inconsistent with ##s^2##.
 
This is a very good question.

jaumzaum said:
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##

A sample is a set of observations. Several samples "result" in several sets of observations, but this doesn't define how a particular numerical value results from those sets of observations.

Perhaps you are thinking of the case where we have 3 independent identically distributed random variables ##X_1,X_2,X_3##, each with variance ##{\sigma}^2##. If we define "the result" of ##X_1,X_2,X_3## to be the random variable ##Y = (1/3)X_1 + (1/3)X_2 + (1/3)X_3## then ##\sigma^2_Y = (1/9)\sigma^2 + (1/9)\sigma^2 + (1/9) \sigma^2 = (1/3) \sigma ^2##.
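The same rule (for independent variables, the variance of a weighted sum is the sum of squared weights times the individual variances) can be checked exactly in a short Python sketch. It contrasts averaging three single observations, which gives ##\sigma^2/3##, with averaging three city means that are each the mean of 10 independent observations, which gives ##\sigma^2/30##. The helper `var_weighted_sum` is a hypothetical name, and ##\sigma^2 = 1## is chosen only for illustration.

```python
from fractions import Fraction

def var_weighted_sum(weights, variances):
    """Variance of sum(w_i * X_i) for *independent* X_i: sum(w_i**2 * Var(X_i))."""
    return sum(w ** 2 * v for w, v in zip(weights, variances))

sigma2 = Fraction(1)   # take sigma^2 = 1 for illustration
third = Fraction(1, 3)

# Y = (1/3)X1 + (1/3)X2 + (1/3)X3, each X_i a single observation
var_single = var_weighted_sum([third] * 3, [sigma2] * 3)

# Same weights, but each X_i is a city mean of 10 independent observations
var_city_mean = sigma2 / 10
var_means = var_weighted_sum([third] * 3, [var_city_mean] * 3)

print(var_single)  # 1/3
print(var_means)   # 1/30
```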

In the link you gave, the notation ##{s_h}^2## does not represent the ##\sigma^2## of the previous paragraph. A random sample of size 1 taken from the entire population of City h has variance ##{s_h}^2##. The random variable defined as the mean of 10 such independent samples has variance ##{s_h}^2/10##.
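A Monte Carlo sketch in Python can illustrate the distinction between the variance of a single observation and the variance of an averaged estimate. Assumptions not taken from the thread: hypothetical city compositions of 400, 500 and 600 women per 1000 people, and sampling *with* replacement within each city, which is the case where a mean of 10 draws has variance exactly ##{s_h}^2/10## (sampling without replacement would add a finite population correction).

```python
import random
import statistics

# Hypothetical country: three cities of 1000 people each (1 = woman, 0 = man).
# The women counts 400, 500, 600 are made up for illustration.
cities = [[1] * k + [0] * (1000 - k) for k in (400, 500, 600)]
N_h = [len(c) for c in cities]
N = sum(N_h)
n_per_city = 10  # people sampled from each city
reps = 20000     # number of simulated surveys

# Formula sum_h (N_h/N)^2 * sigma_h^2 / n_h, where sigma_h^2 is the variance of a
# single draw from city h (population variance, divisor N_h).
predicted = sum(
    (Nh / N) ** 2 * statistics.pvariance(c) / n_per_city for Nh, c in zip(N_h, cities)
)

rng = random.Random(12345)  # fixed seed so the run is reproducible
estimates = []         # stratified estimate of the national proportion, one per survey
pooled_variances = []  # variance of the 30 individual answers, one per survey
for _ in range(reps):
    # Draw 10 people from each city *with* replacement
    samples = [[rng.choice(c) for _ in range(n_per_city)] for c in cities]
    estimates.append(sum((Nh / N) * statistics.fmean(s) for Nh, s in zip(N_h, samples)))
    p = statistics.fmean([x for s in samples for x in s])
    pooled_variances.append(p * (1 - p))  # population variance of 0/1 data is p(1 - p)

print(f"formula:                        {predicted:.5f}")
print(f"simulated variance of estimate: {statistics.pvariance(estimates):.5f}")
print(f"mean variance of 30 answers:    {statistics.fmean(pooled_variances):.3f}")
```

The simulated variance of the estimate should land close to the formula's value, while the spread of the 30 individual answers stays roughly 30 times larger: the formula describes the estimate, not the pooled observations.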

The Wikipedia article is not well written. It begins by referring to "The mean and variance of stratified random sampling". However "mean" and "variance" are quantities that are defined for random variables. It only makes sense to speak of the mean and variance of a sampling procedure if there is a universal understanding that the sampling procedure uses a particular random variable. I haven't done a wide survey of articles on stratified sampling, but I did find https://jkim.public.iastate.edu/teaching/book5.pdf
In example 5.1 of that text, the random variable associated with stratified sampling is, in the notation of the Wikipedia article, ##\hat{x} = \sum_{h=1}^L N_h \hat{\bar{x}}_h## while the Wikipedia article apparently deals with the random variable ##\hat{\bar x} = (1/N) \hat{x}##. So, with respect to including or omitting the constant ##1/N##, there is ambiguity about what random variable is associated with stratified sampling.
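The two choices differ only by a constant: under the same independence assumptions, the estimated total ##\hat{x} = \sum_h N_h \hat{\bar{x}}_h## has variance ##\sum_h N_h^2 s_h^2/n_h##, and dividing by ##N## to get the estimated mean divides that variance by ##N^2##. A short exact check in Python, with hypothetical function names and the hypothetical value ##s_h^2 = 1/4##, no finite population correction:

```python
from fractions import Fraction

def var_total(pop_sizes, sample_sizes, variances):
    """Variance of the estimated total sum(N_h * xbar_h), no finite population correction."""
    return sum(
        Fraction(N_h) ** 2 * s2 / n_h
        for N_h, n_h, s2 in zip(pop_sizes, sample_sizes, variances)
    )

def var_mean(pop_sizes, sample_sizes, variances):
    """Variance of the estimated mean (1/N) * sum(N_h * xbar_h)."""
    N = sum(pop_sizes)
    return var_total(pop_sizes, sample_sizes, variances) / N ** 2

pops, ns = [1000, 1000, 1000], [10, 10, 10]
s2 = [Fraction(1, 4)] * 3  # hypothetical common within-city variance
print(var_total(pops, ns, s2))  # 75000
print(var_mean(pops, ns, s2))   # 1/120
```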

In the link I gave, you can change "book5" to "book3" or "book4" to get information about the abbreviations used in "book5" (e.g., "SRS" for simple random sampling and "HT" for Horvitz-Thompson).
 