pick k (assume k < n) items from the list


by bob j
Tags: permutation, random, shuffling
bob j
bob j is offline
#1
Nov26-10, 08:27 PM
P: 22
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks
Phys.Org News Partner Science news on Phys.org
SensaBubble: It's a bubble, but not as we know it (w/ video)
The hemihelix: Scientists discover a new shape using rubber bands (w/ video)
Microbes provide insights into evolution of human language
ych22
ych22 is offline
#2
Nov26-10, 10:47 PM
P: 115
Quote Quote by bob j View Post
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks
I don't understand how you can have a list of n items and not know n.

Besides:
Are you picking a subset from a set?
Set from a list?
List from a set?
List from a list?

I think there is a difference.
bob j
bob j is offline
#3
Nov26-10, 11:04 PM
P: 22
Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...

ych22
ych22 is offline
#4
Nov26-10, 11:19 PM
P: 115

pick k (assume k < n) items from the list


Quote Quote by bob j View Post
Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...
I think the best you can do is to pick k random distinct integers from {0,...,m} where m is an arbitary number bigger than k. Order the k integers in a list L.

During your one pass through, process the i-th line if i is an element in L, skip the rest.

Obviously you will not be able to read k lines if the actual value of n > m. But this is the best I can think of.
CRGreathouse
CRGreathouse is offline
#5
Nov26-10, 11:35 PM
Sci Advisor
HW Helper
P: 3,680
I assume you mean k/n, not 1/n.

Quote Quote by bob j View Post
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?
Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
ych22
ych22 is offline
#6
Nov26-10, 11:46 PM
P: 115
Quote Quote by CRGreathouse View Post
I assume you mean k/n, not 1/n.



Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
What happens if k > n? That to me is the problem...
CRGreathouse
CRGreathouse is offline
#7
Nov26-10, 11:48 PM
Sci Advisor
HW Helper
P: 3,680
Quote Quote by ych22 View Post
What happens if k > n? That to me is the problem...
Since the OP said

Quote Quote by bob j View Post
(assume k < n)
I'm not worried about this case. My program can cause nasal demons if that happens.
ych22
ych22 is offline
#8
Nov27-10, 12:05 AM
P: 115
Quote Quote by CRGreathouse View Post

I'm not worried about this case. My program can cause nasal demons if that happens.
Oh, fair enough. Then my algorithm should work too.

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n?
CRGreathouse
CRGreathouse is offline
#9
Nov27-10, 12:59 AM
Sci Advisor
HW Helper
P: 3,680
Quote Quote by ych22 View Post
Oh, fair enough. Then my algorithm should work too.
Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

Quote Quote by ych22 View Post
However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n?
Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
ych22
ych22 is offline
#10
Nov27-10, 02:45 AM
P: 115
Quote Quote by CRGreathouse View Post
Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.



Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
Oh oops. You are right on both counts!


Register to reply

Related Discussions
array shuffling in fortran Programming & Computer Science 1
More Staff Shuffling Forum Feedback & Announcements 36