Is this a valid method for selecting a Simple Random Sample?

  • Context: MHB
  • Thread starter: Ackbach
  • Tags: Random
SUMMARY

The discussion centers on whether a student's method for selecting a Simple Random Sample (SRS) of apartment complexes from the random digits of Table D is valid. The student's approach applies a chain of arithmetic operations to the digits, and that arithmetic introduces bias: some outcomes become more likely than others. One participant also objected that repeating the process on the same digits always returns the same result. The thread starter replied that a fresh line of the table would be used for each new sample, so the real question is whether the arithmetic distorts the probabilities. A simulation in LibreOffice Calc produced a unimodal, clearly non-flat distribution of results, and the thread concluded that the method does not produce an SRS.

PREREQUISITES
  • Understanding of Simple Random Sampling (SRS)
  • Familiarity with random number tables, specifically Table D
  • Basic arithmetic operations and their implications in sampling
  • Experience with statistical software, such as LibreOffice Calc
NEXT STEPS
  • Study the principles of Simple Random Sampling (SRS) and its requirements
  • Learn how to use random number tables effectively for sampling
  • Explore statistical simulation techniques using LibreOffice Calc or similar tools
  • Investigate potential biases in sampling methods and how to identify them
USEFUL FOR

Statisticians, data analysts, educators in statistics, and anyone involved in sampling methodologies who seeks to understand the implications of sampling methods on data integrity.

Ackbach
So, I have a jokester (MHB user Cmoney) in my class (what teacher doesn't?), who decided to go all-out on a quiz question. The question reads as follows:

You are planning a report on apartment living in a college town. You decide to select three apartment complexes at random for in-depth interviews with residents.

(a) Explain how you would use a line of Table D to choose an SRS (Simple Random Sample) of 3 complexes from the list below. Explain your method clearly enough for a classmate to obtain your results.

(b) Use line 117 to select the sample. Show how you use each of the digits.

Now Table D is a table of random digits as follows:

\begin{array}{cllllllll}
{\bf Line} \\
116 &14459 &26056 &31424 &80371 &65103 &62253 &50490 &61181 \\
117 &38167 &98532 &62183 &70632 &23417 &26185 &41448 &75532 \\
118 &73190 &32533 &04470 &29669 &84407 &90785 &65956 &86382
\end{array}

The apartment complex listing has 33 names in it - that's all that's really important.
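For comparison, here is a minimal Python sketch of the usual textbook way to use a line of Table D, assuming the complexes are labeled 01 to 33 in list order: read the line in two-digit groups, keep each label between 01 and 33, and skip 00, labels above 33, and repeats. The function name `table_srs` is hypothetical.

```python
def table_srs(line: str, population: int, sample_size: int) -> list[int]:
    """Select labels by reading two-digit groups from a line of random digits.

    Assumes labels run 01..population with population <= 99. Groups outside
    that range (including 00) and labels already chosen are skipped.
    """
    digits = line.replace(" ", "")
    chosen: list[int] = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


LINE_117 = "38167 98532 62183 70632 23417 26185 41448 75532"
print(table_srs(LINE_117, population=33, sample_size=3))  # [16, 32, 18]
```

Under that labeling, line 117 gives complexes 16, 32, and 18. Each step only filters labels and never transforms them, so every in-range label stays equally likely.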

For part (a), my student's answer is as follows:

First, I would obtain the second digit of every group in lines 116-118 (4,6,1,0,5,2,0,1,8,8,2,0,3,6,1,5,3,2,4,9,4,0,5,6). Second, split them into pairs: (46,10,52,01,88,20,36,15,32,49,40,56). Third, out of 33 apartments, labeled 1-33, take the first pair and last and subtract, then take the next two and subtract and so forth until you get three. (10,30,12). Fourth, the ones that were chosen were: (and he gives the three apartment complexes).
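A minimal Python sketch of the first three steps of the student's part (a) answer. Steps 1 and 2 are unambiguous. For step 3 the sketch assumes one reading of "take the first pair and last and subtract, then take the next two" (first with last, second with second-to-last, and so on, as an absolute difference). That reading, and every name in the code, is an assumption.

```python
# Lines 116-118 of Table D, as quoted above.
LINES = {
    116: "14459 26056 31424 80371 65103 62253 50490 61181",
    117: "38167 98532 62183 70632 23417 26185 41448 75532",
    118: "73190 32533 04470 29669 84407 90785 65956 86382",
}

def second_digits(lines: dict[int, str]) -> list[int]:
    # Step 1: the second digit of every five-digit group, in line order.
    return [int(group[1]) for n in sorted(lines) for group in lines[n].split()]

def pair_up(digits: list[int]) -> list[int]:
    # Step 2: join consecutive digits into two-digit numbers (4, 6 -> 46).
    return [10 * digits[i] + digits[i + 1] for i in range(0, len(digits) - 1, 2)]

def outer_differences(pairs: list[int]) -> list[int]:
    # Step 3, under one assumed reading: first with last, second with
    # second-to-last, and so on, taking the absolute difference.
    return [abs(pairs[i] - pairs[-1 - i]) for i in range(len(pairs) // 2)]

pairs = pair_up(second_digits(LINES))
print(pairs)                     # [46, 10, 52, 1, 88, 20, 36, 15, 32, 49, 40, 56]
print(outer_differences(pairs))  # [10, 30, 3, 31, 73, 16]
```

Under this reading the first two differences match the student's 10 and 30, but the third comes out as 3 rather than 12, so the written answer does not fully pin down how later pairs are chosen.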

My question: is this truly an SRS, or did he inadvertently introduce a process that makes certain samples less likely than others (for example, is some intermediate number restricted to be smaller than a certain amount)?

For part (b), my student's answer is as follows:

Line 117: (38,16,79,85,32,62,18,37,06,32,23,41,72,61,85,41,44,87,55,32).
Then add each one [edit: it looks as though he did it digit-wise]: (11,7,16,13,5,8,9,10,6,5,5,5,9,7,13,5,8,15,10,5).
Subtract with the one to the right: (4,3,3,1,1,0,2,8,7,5).
Add: (7,4,1,10,12).
Subtract: (3,9,12)
Add: (12,12)
Add: 24, which is a particular apartment.
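The part (b) reduction can be reproduced with a short Python sketch. It assumes each "Subtract" step takes the absolute difference of non-overlapping neighbours (1st with 2nd, 3rd with 4th, ...), each "Add" step sums them, and a leftover odd element is carried to the next stage unchanged, which is what the reported (3,9,12) and (12,12) imply. The operation order is copied from the answer; the function names are hypothetical.

```python
LINE_117 = "38167 98532 62183 70632 23417 26185 41448 75532"

def digit_sums(line: str) -> list[int]:
    # Split the line into two-digit groups and add the two digits of each.
    digits = line.replace(" ", "")
    return [int(digits[i]) + int(digits[i + 1]) for i in range(0, len(digits) - 1, 2)]

def combine_adjacent(values: list[int], op: str) -> list[int]:
    # Combine non-overlapping neighbours; "sub" means absolute difference,
    # anything else means sum. An odd element left over is carried as-is.
    out = [abs(a - b) if op == "sub" else a + b
           for a, b in zip(values[0::2], values[1::2])]
    if len(values) % 2:
        out.append(values[-1])
    return out

# Operation order as reported in the student's answer.
STUDENT_OPS = ("sub", "add", "sub", "add", "add")

def reduce_line(line: str, ops: tuple[str, ...] = STUDENT_OPS) -> list[list[int]]:
    # Return every intermediate stage, starting with the digit sums.
    stages = [digit_sums(line)]
    for op in ops:
        stages.append(combine_adjacent(stages[-1], op))
    return stages

for stage in reduce_line(LINE_117):
    print(stage)
```

The final stage is [24]. The steps do not strictly alternate: a "Subtract" in place of the last "Add" would have given |12 - 12| = 0.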

He stops here, so he doesn't obtain the full sample of three complexes. I know there are steps here that are suspect: the very first one has a max of 18. And is each of the possible samples equally likely?
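The "max of 18" is only part of the problem with the first step: the sum of two independent, uniformly distributed digits is not uniform on 0 to 18 but triangular, peaking at 9. A short Python check that counts all 100 equally likely digit pairs:

```python
from collections import Counter
from itertools import product

# Every ordered pair of digits 0-9 is equally likely in a table of random digits.
sums = Counter(a + b for a, b in product(range(10), repeat=2))

for total in range(19):
    print(f"{total:2d}: {sums[total]:2d}/100")
```

A total of 9 arises 10 times as often as 0 or 18, so the very first step already favours middle values before any later arithmetic is applied.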

Thanks!
 
I would not call this simple random sampling for the simple reason that if you repeat the sampling process on the same data you will get the exact same apartment each time. That would not occur with a proper random sample. If the goal is to take one element from a list of size N, then each element should have the probability 1/N of being selected. Using your student's method, once the list is made, one particular element has a 100% chance of being selected and the others have 0% no matter how many times the sampling is repeated.
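Extending the 1/N condition to a sample of three from the 33 complexes: under an SRS every set of 3 complexes is equally likely, and as a consequence each individual complex is chosen with probability 3/33.

\begin{aligned}
P(\text{a particular set of 3 complexes}) &= \frac{1}{\binom{33}{3}} = \frac{1}{5456}, \\
P(\text{complex } k \text{ is in the sample}) &= \frac{\binom{32}{2}}{\binom{33}{3}} = \frac{496}{5456} = \frac{1}{11}.
\end{aligned}

Equal inclusion probabilities are necessary but not sufficient: an SRS requires the first line, equal probability for every possible set of 3.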
 
Well, I think the idea is that if you did the sampling again, you'd use a different row of the table of random numbers. I'm not worried about the table of random numbers. If you like, imagine those numbers to have come from a pseudo-random-number-generator. I'm worried about all the arithmetic and (what I would call) shenanigans that my student is doing. Is the arithmetic he's using inadvertently making some samples less likely than others?
 
I think I get your question, but just want to point out that even if you won't get the same data each round, the fact that repeating the process on the same data always gives the same answer is bad. It's the complete opposite of random.

You are asking, I think, whether his process is inherently biased compared with a similar method that would not be. I don't think this is a standard way of sampling, but one way to test it would be to spot a pattern. I'm lazier than that, though, and would test it by coding the algorithm and running it on a huge number of 5-digit numbers to see what I get.
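Along those lines, a minimal Python sketch of the brute-force check, applied to the part (b) procedure. It assumes each table line is 40 independent uniform digits drawn from Python's `random` module (standing in for Table D) and reuses the operation order from the part (b) answer. The names and the trial count are illustrative choices.

```python
import random
from collections import Counter

def combine_adjacent(values: list[int], op: str) -> list[int]:
    # Absolute difference ("sub") or sum of non-overlapping neighbours;
    # an odd element left over is carried unchanged.
    out = [abs(a - b) if op == "sub" else a + b
           for a, b in zip(values[0::2], values[1::2])]
    if len(values) % 2:
        out.append(values[-1])
    return out

def student_reduce(digits: list[int],
                   ops: tuple[str, ...] = ("sub", "add", "sub", "add", "add")) -> int:
    # digits: the 40 digits of one table line.
    values = [digits[i] + digits[i + 1] for i in range(0, len(digits) - 1, 2)]
    for op in ops:
        values = combine_adjacent(values, op)
    return values[0]

def simulate(trials: int = 20_000, seed: int = 1) -> Counter:
    # Tally the final value over many simulated table lines.
    rng = random.Random(seed)
    return Counter(student_reduce([rng.randrange(10) for _ in range(40)])
                   for _ in range(trials))

counts = simulate()
for value in sorted(counts):
    print(f"{value:3d} {counts[value]:6d}")
```

For every complex to be equally likely, the tally would have to be flat across 1 to 33; note also that the procedure as written has no rule for results above 33.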

That's all I have to weigh in on. Maybe someone else can quickly spot a pattern.
 
Wow, seems like you have a genius on your hands there, Ackbach.

I personally would create a spreadsheet, then run the data in a histogram. Try that out, and see if you get the results that you are looking for.

Good luck
 
Well, I constructed a LibreOffice Calc spreadsheet to simulate this method of sampling. I did a histogram of the resulting numbers (over 200 of them), and there was a definite pattern. The five-number summary was {3, 14.5, 20, 24, 41}. The mean was 20.1, and the standard deviation was 7.7. The histogram was unimodal and symmetric, with a definite peak near 21. There were no outliers or gaps.
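The same summary can be computed outside a spreadsheet. A minimal Python sketch using the standard `statistics` module; the "inclusive" quartile method is an assumption, and since quartile conventions differ between tools, Q1 and Q3 may not match LibreOffice Calc exactly. The function names are hypothetical.

```python
import statistics

def five_number_summary(data: list[int]) -> tuple:
    # Minimum, Q1, median, Q3, maximum.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def describe(data: list[int], bin_width: int = 5) -> None:
    # Print the summary, mean, sample standard deviation, and a text histogram.
    print("five-number summary:", five_number_summary(data))
    print(f"mean {statistics.mean(data):.1f}, sd {statistics.stdev(data):.1f}")
    bins: dict[int, int] = {}
    for x in data:
        start = (x // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    for start in sorted(bins):
        print(f"{start:3d}-{start + bin_width - 1:3d} {'#' * bins[start]}")

print(five_number_summary([1, 2, 3, 4, 5]))  # (1, 2.0, 3.0, 4.0, 5)
```

Passing the list of simulated results to `describe` gives the summary statistics and a rough histogram in one go.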

Perhaps the most important feature was one that was missing: the histogram was by no means flat, as it would be under a uniform distribution. Therefore, I conclude that this sampling method would not produce a Simple Random Sample.
 
