Pick k (assume k < n) items from the list

  • Thread starter Thread starter bob j
  • Start date Start date
  • Tags Tags
    List
bob j
Messages
22
Reaction score
0
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks
 
Physics news on Phys.org


bob j said:
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks

I don't understand how you can have a list of n items and not know n.

Besides:
Are you picking a subset from a set?
Set from a list?
List from a set?
List from a list?

I think there is a difference.
 


Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...
 


bob j said:
Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...

I think the best you can do is to pick k random distinct integers from {0,...,m} where m is an arbitary number bigger than k. Order the k integers in a list L.

During your one pass through, process the i-th line if i is an element in L, skip the rest.

Obviously you will not be able to read k lines if the actual value of n > m. But this is the best I can think of.
 


I assume you mean k/n, not 1/n.

bob j said:
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
 


CRGreathouse said:
I assume you mean k/n, not 1/n.



Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.

What happens if k > n? That to me is the problem...
 


ych22 said:
What happens if k > n? That to me is the problem...

Since the OP said

bob j said:
(assume k < n)

I'm not worried about this case. My program can cause nasal demons if that happens.
 


CRGreathouse said:
I'm not worried about this case. My program can cause nasal demons if that happens.

Oh, fair enough. Then my algorithm should work too.

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n? :eek:
 


ych22 said:
Oh, fair enough. Then my algorithm should work too.

Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

ych22 said:
However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n? :eek:

Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
 
  • #10


CRGreathouse said:
Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.



Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.

Oh oops. You are right on both counts!
 

Similar threads

Replies
8
Views
1K
Replies
3
Views
1K
Replies
2
Views
2K
Replies
2
Views
2K
Replies
4
Views
2K
Back
Top