# Pick k (assume k < n) items from the list

by bob j
Tags: permutation, random, shuffling
 P: 22 I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading? thanks
P: 115
 Quote by bob j I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading? thanks
I don't understand how you can have a list of n items and not know n.

Besides:
Are you picking a subset from a set?
Set from a list?
List from a set?
List from a list?

I think there is a difference.
 P: 22 Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...
P: 115
Pick k (assume k < n) items from the list

 Quote by bob j Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...
I think the best you can do is to pick k random distinct integers from {0,...,m} where m is an arbitary number bigger than k. Order the k integers in a list L.

During your one pass through, process the i-th line if i is an element in L, skip the rest.

Obviously you will not be able to read k lines if the actual value of n > m. But this is the best I can think of.
HW Helper
P: 3,684
I assume you mean k/n, not 1/n.

 Quote by bob j I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?
Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
P: 115
 Quote by CRGreathouse I assume you mean k/n, not 1/n. Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
What happens if k > n? That to me is the problem...
HW Helper
P: 3,684
 Quote by ych22 What happens if k > n? That to me is the problem...
Since the OP said

 Quote by bob j (assume k < n)
P: 115
 Quote by CRGreathouse I'm not worried about this case. My program can cause nasal demons if that happens.
Oh, fair enough. Then my algorithm should work too.

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n?
HW Helper
P: 3,684
 Quote by ych22 Oh, fair enough. Then my algorithm should work too.
Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

 Quote by ych22 However this k
Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
P: 115
 Quote by CRGreathouse Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it. Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
Oh oops. You are right on both counts!

 Related Discussions Programming & Computer Science 1 Forum Feedback & Announcements 36