Stratified Random Sampling vs Simple Random Sampling

In summary, the formula for the variance of stratified random sampling gives ##s^2 = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h} = \sum_{h=1}^{3} \left(\frac{1000}{3000}\right)^2 \frac{s_h^2}{10} = \frac{s_h^2}{30}## for the example discussed in this thread.
  • #1
jaumzaum
I'm beginning to learn statistics and I didn't quite understand the formula for stratified random sampling.

Let's say we have a country with 3000 people, divided into 3 cities of 1000 people each. We want to know the proportion of women in the whole country, so we decide to take a sample of 30 people, 10 from each city. Suppose each city has the same sample mean and variance.

Using the formula for the variance of stratified random sampling:
$$s^2 = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h} = \sum_{h=1}^{3} \left(\frac{1000}{3000}\right)^2 \frac{s_h^2}{10} = \frac{s_h^2}{30}$$
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

So what was my mistake?
 
  • #2
The formula looks funny. The term ##\frac{s_h^2}{n_h}## is dimensionally inconsistent with ##s^2##.
 
  • #4
This is a very good question.

jaumzaum said:
But we know that when we have 3 equal samples of variance ##s_h^2##, the resulting sample has variance ##s_h^2/3##.

A sample is a set of observations. Several samples "result" in several sets of observations, but this doesn't define how a particular numerical value results from those sets of observations.

Perhaps you are thinking of the case where we have 3 independent identically distributed random variables ##X_1,X_2,X_3##, each with variance ##{\sigma}^2##. If we define "the result" of ##X_1,X_2,X_3## to be the random variable ##Y = (1/3)X_1 + (1/3)X_2 + (1/3)X_3## then ##\sigma^2_Y = (1/9)\sigma^2 + (1/9)\sigma^2 + (1/9) \sigma^2 = (1/3) \sigma ^2##.

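In the link you gave, the notation ##{s_h}^2## does not represent the ##\sigma^2## in the previous paragraph. A random sample of size 1 taken from the entire population of City h has variance ##{s_h}^2##. The random variable consisting of the mean of 10 such independent samples has variance ##{s_h}^2/10##, and it is this variance that plays the role of ##\sigma^2## above. Averaging the three city means then gives ##(1/3)({s_h}^2/10) = {s_h}^2/30##, which agrees with the formula.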
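As a rough numerical check of this point, here is a minimal simulation sketch. It assumes each city has exactly 50% women (so a single draw has variance 0.25) and samples without replacement using only the Python standard library; the function name and the 50% figure are illustrative assumptions, not something taken from the thread.

```python
import random
import statistics


def simulate_stratified_mean_variance(p=0.5, city_size=1000, n_per_city=10,
                                      n_cities=3, reps=10000, seed=0):
    """Estimate the variance of the stratified estimate of the proportion.

    Every city is assumed to have the same proportion p of women (coded as 1).
    """
    rng = random.Random(seed)
    n_women = round(p * city_size)
    city = [1] * n_women + [0] * (city_size - n_women)

    estimates = []
    for _ in range(reps):
        # One sample mean per city, each from 10 people drawn without replacement.
        city_means = [statistics.fmean(rng.sample(city, n_per_city))
                      for _ in range(n_cities)]
        # Equal weights N_h / N = 1/3, so the estimate is the plain average.
        estimates.append(statistics.fmean(city_means))
    return statistics.pvariance(estimates)


single_draw_variance = 0.5 * (1 - 0.5)  # variance of one 0/1 observation
simulated = simulate_stratified_mean_variance()
print("simulated variance of the estimate:", simulated)
print("s_h^2 / 30:", single_draw_variance / 30)
print("s_h^2 / 3: ", single_draw_variance / 3)
```

The simulated value should land close to ##s_h^2/30## and far from ##s_h^2/3##. It comes out slightly below ##s_h^2/30## because sampling without replacement adds a small finite population correction that the formula in the question ignores.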

The Wikipedia article is not well written. It begins by referring to "The mean and variance of stratified random sampling". However "mean" and "variance" are quantities that are defined for random variables. It only makes sense to speak of the mean and variance of a sampling procedure, if there is a universal understanding that the sampling procedure uses a particular random variable. I haven't done a wide survey of articles on stratified sampling, but I did find https://jkim.public.iastate.edu/teaching/book5.pdf
In example 5.1 of that text, the random variable associated with stratified sampling is, in the notation of the Wikipedia article, ##\hat{x} = \sum_{h=1}^L N_h \hat{ \bar{x}_h}##, while the Wikipedia article apparently deals with the random variable ##\hat{\bar x} = (1/N) \hat{x}##. So, with respect to including or omitting the constant ##1/N##, there is ambiguity about which random variable is associated with stratified sampling.
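To make the effect of the constant ##1/N## concrete, here is a short worked relation between the two variances. It assumes the stratum samples are independent and ignores the finite population correction, as the formula in the original question does.

$$\operatorname{Var}(\hat{x}) = \sum_{h=1}^{L} N_h^2 \frac{s_h^2}{n_h}, \qquad \operatorname{Var}(\hat{\bar x}) = \frac{1}{N^2}\operatorname{Var}(\hat{x}) = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}$$

With ##N_h = 1000##, ##N = 3000##, ##n_h = 10## and a common ##s_h^2##, this gives ##\operatorname{Var}(\hat{x}) = 300000\, s_h^2## for the estimated total and ##\operatorname{Var}(\hat{\bar x}) = s_h^2/30## for the estimated mean.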

In the link I gave, you can alter "book5" to "book3" or "book4" to get information about the abbreviations used in "book5" (e.g. "SRS" for simple random sampling and "HT" for Horvitz-Thompson).
 

1. What is the difference between stratified random sampling and simple random sampling?

Stratified random sampling is a sampling technique where the population is divided into non-overlapping subgroups, or strata, and a simple random sample is then taken from each stratum. Simple random sampling, on the other hand, draws the sample from the whole population at once, so that every possible sample of the chosen size is equally likely and each individual has an equal chance of being selected.
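The two procedures can be sketched in a few lines of Python. This is only an illustration: the function names, the city labels, and the 10-per-city allocation are assumptions chosen to mirror the example in the thread.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n units from the whole population without replacement."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate simple random sample inside each stratum."""
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}


rng = random.Random(1)

# Hypothetical country: 3 cities of 1000 people, each person tagged by city.
strata = {city: [(city, i) for i in range(1000)]
          for city in ("city_a", "city_b", "city_c")}
population = [unit for units in strata.values() for unit in units]

srs = simple_random_sample(population, 30, rng)
strat = stratified_sample(strata, {city: 10 for city in strata}, rng)

# Under SRS the number of people per city varies from draw to draw;
# under stratified sampling it is fixed at 10 by design.
srs_counts = {city: sum(1 for c, _ in srs if c == city) for city in strata}
strat_counts = {city: len(units) for city, units in strat.items()}
print("SRS counts per city:       ", srs_counts)
print("stratified counts per city:", strat_counts)
```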

2. Which sampling method is more representative of the population?

Stratified random sampling is generally considered to be more representative of the population because it guarantees that each subgroup or stratum appears in the sample in a known proportion. When the strata differ from one another, this typically lowers the variance of the estimate compared with a simple random sample of the same size; when the strata are all alike, the two methods perform about the same.

3. When should stratified random sampling be used?

Stratified random sampling is typically used when the population is heterogeneous, meaning there are distinct subgroups within the population that may have different characteristics. This sampling method is useful for ensuring that each subgroup is represented in the sample and can provide more accurate results for each subgroup.
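A small simulation can illustrate why heterogeneity matters. The sketch below assumes three cities of 1000 people whose proportions of women are 20%, 50% and 80%; these figures and the function name are made up for illustration. It compares the variance of the estimated overall proportion under simple random sampling of 30 people with that under stratified sampling of 10 people per city.

```python
import random
import statistics


def compare_designs(city_props=(0.2, 0.5, 0.8), city_size=1000,
                    n_per_city=10, reps=5000, seed=0):
    """Return (SRS variance, stratified variance) of the estimated proportion."""
    rng = random.Random(seed)
    cities = []
    for p in city_props:
        n_women = round(p * city_size)
        cities.append([1] * n_women + [0] * (city_size - n_women))
    country = [x for city in cities for x in city]
    n_total = n_per_city * len(cities)

    srs_estimates, stratified_estimates = [], []
    for _ in range(reps):
        # Simple random sample of 30 people from the whole country.
        srs_estimates.append(statistics.fmean(rng.sample(country, n_total)))
        # Stratified sample: 10 people per city, equal weights N_h / N = 1/3.
        city_means = [statistics.fmean(rng.sample(city, n_per_city))
                      for city in cities]
        stratified_estimates.append(statistics.fmean(city_means))
    return (statistics.pvariance(srs_estimates),
            statistics.pvariance(stratified_estimates))


var_srs, var_stratified = compare_designs()
print(f"SRS variance:        {var_srs:.5f}")
print(f"stratified variance: {var_stratified:.5f}")
```

Under these assumptions the stratified variance should come out smaller, because the differences between cities no longer contribute to the sampling error. If all three proportions were set equal, the two variances would be nearly the same.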

4. What are the advantages of simple random sampling?

The main advantage of simple random sampling is its simplicity. It is easy to understand and implement, making it a quick and efficient sampling method. It also ensures that each individual in the population has an equal chance of being selected, which can help to reduce bias.

5. Are there any drawbacks to using stratified random sampling?

One potential drawback of stratified random sampling is that it can be more time-consuming and complex to implement compared to simple random sampling. It also requires prior knowledge of the population and its subgroups, which may not always be available. Additionally, if the subgroups are not well-defined, it may be difficult to accurately divide the population into strata.
