Estimating group sample size

To estimate the average size of clans in a country, a survey of 1000 randomly selected people is conducted, with each person reporting the size of their own clan. A plain average of the responses overestimates the clan size, because larger clans contribute more respondents. One reply suggests weighting each respondent by 1/n, where n is the reported size of their clan. Another notes that a proportion-weighted average runs into trouble if several clans share the same size or if the answers are not exact, and proposes using the total population of the country to estimate the number of clans of each reported size, then dividing the total population by the number of clans to get the average clan size.
  • #1
zut837
I'm working on this puzzle:

The people in a country are partitioned into clans. In order to estimate the average size of a clan, a survey is conducted where 1000 randomly selected people are asked to state the size of the clan to which they belong. How does one compute an estimate of the average clan size from the data collected?

I am a bit stuck. If you take a pure average of the surveyed people, you will overestimate the group size because you will have more representatives from the larger clans. Thus each sampled value needs to be downweighted in some way, to factor out multiple samples from the same clan.

Any ideas?
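
A quick simulation can make this size bias concrete. The sketch below is not from the thread: the clan sizes, the seed, and the variable names are all made up for illustration, and sampling with replacement is used as an approximation to surveying a large population.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical country: 300 clans with sizes between 1 and 200.
clan_sizes = [random.randint(1, 200) for _ in range(300)]
true_mean = sum(clan_sizes) / len(clan_sizes)

# Picking a random person selects a clan with probability
# proportional to its size, so large clans are over-represented.
responses = random.choices(clan_sizes, weights=clan_sizes, k=1000)

naive_mean = sum(responses) / len(responses)

print(f"true average clan size:     {true_mean:.1f}")
print(f"plain average of responses: {naive_mean:.1f}")
```

The plain average of the responses should come out noticeably larger than the true average over clans.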
 
  • #2
1. Create a histogram and group the results together to whatever degree of accuracy you want (e.g. lump all people who answered 100 plus/minus 10 together into the "100" bin).

2. Now if you plot this and squint you basically have a function f(x) where f is the number of clans and x is the size of the clan. f(x) is essentially the probability of finding a clan of size x.

3. Now we want to calculate the "expectation value" of x, or the average value of the size of the clan. For a continuous function this is usually done with an integral like this:

[tex]\int_{-\infty}^{\infty} x \, f(x) \, dx[/tex]

Since we don't have a continuous function, you can use the summation:

[tex]\sum_{x} x \, P(x)[/tex]

Where P(x) is the probability of finding a clan of size x. Then you sum over all x's.
 
  • #3
You can just weight each respondent as 1/n, where n is the group size reported.
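
One way to read this suggestion, sketched under my own assumptions rather than taken from the thread: a clan of size n is n times as likely to show up in the sample as a clan of size 1, so weighting each response by 1/n undoes that. The weighted average of the responses then reduces to the harmonic mean, N / (sum of 1/n over all N responses). The function name and the sample data are hypothetical.

```python
def estimate_mean_clan_size(responses):
    """Weighted average of the responses, with weight 1/n for a response n."""
    weights = [1.0 / n for n in responses]
    # Each term w * n is 1, so the numerator is just the number of responses;
    # the information about clan sizes lives in the sum of the weights.
    weighted_sum = sum(w * n for w, n in zip(weights, responses))
    return weighted_sum / sum(weights)

# Hypothetical data: two thirds of respondents report 100, one third report 50.
responses = [100, 100, 100, 100, 50, 50]
estimate = estimate_mean_clan_size(responses)
print(estimate)
```

These proportions are what one clan of 100 and one clan of 50 would produce, and the estimate comes out at 75 (up to floating-point rounding), compared with a plain average of about 83.3.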
 
  • #4
With respect to the first response, how is that not the same as a weighted average?
Since your sample will have more members of larger clans, you will still be overestimating the frequency of the larger clans.

For the second, if you weight each response n by 1/n, aren't you simply scaling all the responses to 1? I'm not 100% sure I understand your approach; could you try to be a bit more explicit?
 
  • #5
zut837 said:
With respect to the first response, how is that not the same as a weighted average?
Since your sample will have more members of larger clans, you will still be overestimating the frequency of the larger clans.

For the second, if you weight each response n by 1/n, aren't you simply scaling all the responses to 1? I'm not 100% sure I understand your approach; could you try to be a bit more explicit?

If I understand this correctly, you have two numbers, n and N. For the number of people in a given clan you have a label n(i), in terms of sampling order, for each member i of the clan. There are j clans, so every individual in the sample space can be labeled n(i,j). Consider an array of i rows and j columns. What is a summation over columns, over rows, and over the whole array?

Note: In a true random sample, every individual will have an equal probability of being selected, so the proportional size of the clans in the sample will approach the true proportions as N grows large.
 
  • #6
A weighted average (i.e. the proportion of respondents per response multiplied by the magnitude of the response, where the "response" is the size of the respondent's clan) would absolutely work, given two assumptions:
1) No two clans are the same size.
2) Answers are 100% accurate, not estimated/rounded.

Difficulties may arise if either of the preceding two conditions is violated. Consider the following: of 3 persons surveyed, 2 respond that they come from a clan with 100 total members, while the remaining person responds that his/her clan comprises 50. Now, the first 2 individuals may or may not be from the same clan, and neither alternative is outside the realm of statistical possibility. Consequently, average clan size could be either (100+100+50)/3 = 83.33 ppl (three distinct clans) or (100+50)/2 = 75 ppl (two distinct clans). We simply don't know... UNLESS we know the total population of the country. If the country has only 150 people, we know our second calculation must be correct.

Similarly, we must know the total population in order to determine whether there are multiple clans of identical size, which would otherwise obscure our calculations. Given a sample size of 1000, the survey should "even itself out," and the percentage of respondents from a given clan should be roughly equal to the percentage of the national population made up of members of that clan. That is: R/1000 = S/T, where R is the number of respondents from clan A, S is the size of clan A, and T is the total population of the country. Given this information, we can estimate the number of clans of a given size as (R/1000)/(S/T), where R is now the number of respondents who report that clan size S. If 1/4th of the respondents report a clan size that constitutes 1/12th of the population, this is a good indication that there are 3 clans of this size. Depending on the number of different reported sizes, this may be cumbersome to do by hand, but it should give you the right answer.
# CLANS = (R1/1000)/(S1/T) + (R2/1000)/(S2/T) + (R3/1000)/(S3/T) + ...
Once you have the number of clans, you can divide the total population by number of clans to get the average size!
PPL/CLAN = T/(# CLANS)
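
The two formulas above can be sketched as follows. This is an illustration under stated assumptions, not the poster's own code: the function name, the survey data, and the population figures are hypothetical. Note that algebraically T/(# CLANS) = 1000 / (R1/S1 + R2/S2 + ...), so T cancels and the result does not depend on knowing the total population; the loop below checks this numerically with two different values of T.

```python
from collections import Counter

def mean_clan_size_via_counts(responses, total_population):
    """Estimate the number of clans per reported size, then divide T by the total."""
    sample_size = len(responses)
    respondents_by_size = Counter(responses)  # R for each reported size S
    n_clans = sum(
        (r / sample_size) / (s / total_population)
        for s, r in respondents_by_size.items()
    )
    return total_population / n_clans

# Hypothetical survey of 1000 people: 1/4 report size 120, 3/4 report size 60.
responses = [120] * 250 + [60] * 750

for T in (1440, 14400):
    print(T, mean_clan_size_via_counts(responses, T))
```

With T = 1440 the size-120 group makes up 1/12 of the population per clan, so the formula gives 3 clans of that size, as in the 1/4 versus 1/12 example above, plus 18 clans of size 60.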
 

What is group sample size estimation?

Group sample size estimation is a statistical process used to determine the number of individuals that need to be included in a research study or experiment in order to obtain reliable and meaningful results.

Why is group sample size estimation important?

Estimating the appropriate sample size is important because it ensures that the study's results are accurate and representative of the population being studied. A sample size that is too small may lead to biased or inconclusive results, while a sample size that is too large may be a waste of resources.

What factors should be considered when estimating group sample size?

Factors that should be considered when estimating group sample size include the desired level of precision, the expected effect size, the variability of the data, and the desired level of confidence in the results. Other factors may include the research design, the type of analysis being performed, and the resources available for the study.

How is group sample size estimated?

Group sample size can be estimated using statistical power analysis, which takes into account the factors mentioned above to determine the minimum sample size needed to detect a significant effect. There are also online calculators and software programs available to help with sample size estimation.
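
As a rough illustration of such a calculation, here is a minimal sketch using the common normal-approximation formula for comparing the means of two equal-sized groups, n per group = 2 * ((z_(1-alpha/2) + z_power) / d)^2, where d is the standardized effect size. The function name and default values are assumptions for this example; dedicated power-analysis software based on the t distribution will typically give slightly larger numbers.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided, two-group test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(sample_size_per_group(0.5))  # medium standardized effect
print(sample_size_per_group(0.2))  # smaller effect needs a larger sample
```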

Can group sample size be adjusted during the course of a study?

Yes, group sample size can be adjusted during a study if necessary. This may be done if the initial sample size was too small or too large, or if there are unforeseen circumstances that require a change in the sample size. However, it is important to carefully consider the potential effects of changing the sample size and to justify any adjustments made.
