Pick k (assume k < n) items from the list

  • Context: Undergrad 
  • Thread starter Thread starter bob j
  • Start date Start date
  • Tags Tags
    List
Click For Summary

Discussion Overview

The discussion revolves around the problem of selecting k items from a list of n items, where k is less than n, without knowing the value of n in advance. Participants explore methods for achieving this selection in a single pass through the data, considering both theoretical and practical implications.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested, Mathematical reasoning

Main Points Raised

  • Some participants question the feasibility of selecting items from a list without knowing the total number of items, n, and express confusion about the initial premise.
  • A participant suggests a method involving picking k random distinct integers from a range and processing lines based on those integers, while acknowledging limitations if n exceeds a certain arbitrary number.
  • Another participant proposes a method where the first k items are stored, and subsequent items are selected with a probability of k/x, raising concerns about efficiency.
  • Some participants express skepticism about the assumption that k is less than n while simultaneously claiming not to know n, suggesting that there may be a range for n that is not explicitly defined.
  • There is a discussion about the implications of k being greater than n, with some participants dismissing this scenario based on the original assumption.
  • One participant believes their method achieves the desired probability of selection, while another participant questions the validity of this claim without proof.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the best method for selecting k items or the implications of the assumptions made about n and k. Multiple competing views and methods are presented, and some skepticism remains regarding the initial conditions of the problem.

Contextual Notes

Participants highlight the limitations of their proposed methods, including the dependency on assumptions about n and the efficiency of the algorithms discussed. There is also an acknowledgment of the unresolved nature of the probability claims made by different participants.

bob j
Messages
22
Reaction score
0
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks
 
Physics news on Phys.org


bob j said:
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks

I don't understand how you can have a list of n items and not know n.

Besides:
Are you picking a subset from a set?
Set from a list?
List from a set?
List from a list?

I think there is a difference.
 


Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...
 


bob j said:
Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...

I think the best you can do is to pick k random distinct integers from {0,...,m} where m is an arbitary number bigger than k. Order the k integers in a list L.

During your one pass through, process the i-th line if i is an element in L, skip the rest.

Obviously you will not be able to read k lines if the actual value of n > m. But this is the best I can think of.
 


I assume you mean k/n, not 1/n.

bob j said:
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
 


CRGreathouse said:
I assume you mean k/n, not 1/n.



Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.

What happens if k > n? That to me is the problem...
 


ych22 said:
What happens if k > n? That to me is the problem...

Since the OP said

bob j said:
(assume k < n)

I'm not worried about this case. My program can cause nasal demons if that happens.
 


CRGreathouse said:
I'm not worried about this case. My program can cause nasal demons if that happens.

Oh, fair enough. Then my algorithm should work too.

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n? :eek:
 


ych22 said:
Oh, fair enough. Then my algorithm should work too.

Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

ych22 said:
However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n? :eek:

Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
 
  • #10


CRGreathouse said:
Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.



Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.

Oh oops. You are right on both counts!
 

Similar threads

  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 20 ·
Replies
20
Views
2K
  • · Replies 29 ·
Replies
29
Views
6K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 13 ·
Replies
13
Views
2K
  • · Replies 8 ·
Replies
8
Views
2K