Pick k (assume k < n) items from the list

bob j · Nov 26, 2010

I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks

ych22 · Nov 26, 2010

bob j said:

I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks

I don't understand how you can have a list of n items and not know n.

Besides:
Are you picking a subset from a set?
Set from a list?
List from a set?
List from a list?

I think there is a difference.

bob j · Nov 26, 2010

Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...

ych22 · Nov 26, 2010

bob j said:

Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...

I think the best you can do is to pick k random distinct integers from {0,...,m} where m is an arbitary number bigger than k. Order the k integers in a list L.

During your one pass through, process the i-th line if i is an element in L, skip the rest.

Obviously you will not be able to read k lines if the actual value of n > m. But this is the best I can think of.

CRGreathouse · Nov 26, 2010

I assume you mean k/n, not 1/n.

bob j said:

I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.

ych22 · Nov 26, 2010

CRGreathouse said:

I assume you mean k/n, not 1/n.

Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.

What happens if k > n? That to me is the problem...

CRGreathouse · Nov 26, 2010

ych22 said:

What happens if k > n? That to me is the problem...

Since the OP said

bob j said:

(assume k < n)

I'm not worried about this case. My program can cause nasal demons if that happens.

ych22 · Nov 27, 2010

CRGreathouse said:

I'm not worried about this case. My program can cause nasal demons if that happens.

Oh, fair enough. Then my algorithm should work too.

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n?

CRGreathouse · Nov 27, 2010

ych22 said:

Oh, fair enough. Then my algorithm should work too.

Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

ych22 said:

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n?

Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.

ych22 · Nov 27, 2010

CRGreathouse said:

Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.

Oh oops. You are right on both counts!

Pick k (assume k < n) items from the list

Similar threads

Hot Threads

B A Little Probability Puzzle

I Need help solving this Existence Algorithm for truth

A Does this computation satisfy LTL formulas?

A Prove that points which are indistinguishable from 0 exist (using logic)

A Mathematical Connection between Cosmic Expansion and Exponential Growth

Recent Insights

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Insights Fermat's Last Theorem

Insights Why Vector Spaces Explain The World: A Historical Perspective