Pick k (assume k < n) items from the list

  • Thread starter bob j
  • Start date
  • Tags
    List
In summary: If n is small then k can be small too. If n is large then k can be large too.Oh oops. You are right on both counts!
  • #1
bob j
22
0
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks
 
Physics news on Phys.org
  • #2


bob j said:
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

thanks

I don't understand how you can have a list of n items and not know n.

Besides:
Are you picking a subset from a set?
Set from a list?
List from a set?
List from a list?

I think there is a difference.
 
  • #3


Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...
 
  • #4


bob j said:
Let's say there is a file composed of n lines. I do not know the number of lines in advance. I wish to pick k random lines from the file, with one read only (without storing the lines in memory and then doing the rest)...

I think the best you can do is to pick k random distinct integers from {0,...,m} where m is an arbitary number bigger than k. Order the k integers in a list L.

During your one pass through, process the i-th line if i is an element in L, skip the rest.

Obviously you will not be able to read k lines if the actual value of n > m. But this is the best I can think of.
 
  • #5


I assume you mean k/n, not 1/n.

bob j said:
I have a list of n items. My goal is to pick k (assume k < n) items from the list such that each one of them as probability 1/n of being chosen. Would it be possible to accomplish this without knowing n, in one reading?

Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.
 
  • #6


CRGreathouse said:
I assume you mean k/n, not 1/n.



Yes -- but it won't be efficient. Store the first k items as your preliminary picks. Start x as k + 1 and increase it by 1 until you reach the end of the file. For each x, select x with probability k/x. If selected, choose a random number r from 1 to k and replace preliminary pick r with x.

What happens if k > n? That to me is the problem...
 
  • #7


ych22 said:
What happens if k > n? That to me is the problem...

Since the OP said

bob j said:
(assume k < n)

I'm not worried about this case. My program can cause nasal demons if that happens.
 
  • #8


CRGreathouse said:
I'm not worried about this case. My program can cause nasal demons if that happens.

Oh, fair enough. Then my algorithm should work too.

However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n? :eek:
 
  • #9


ych22 said:
Oh, fair enough. Then my algorithm should work too.

Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.

ych22 said:
However this k<n assumption is still pretty suspicious. How can one claim not to know n yet make the claim that k<n? :eek:

Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.
 
  • #10


CRGreathouse said:
Well, yours doesn't pick them with the desired probability. I think that mine does, though I haven't proved it.



Perhaps k is small and n is very large. You might know that n is between a million and ten million, and that k is 10.

Oh oops. You are right on both counts!
 

1. What is the purpose of picking k items from a list?

The purpose of picking k items from a list is to select a specific number of items from a larger set in order to perform some kind of analysis or study. This can help to simplify a complex dataset or to focus on a particular subset of data.

2. How is k determined when picking items from a list?

The value of k is typically determined based on the specific research question or objective. It can also depend on the size of the original list and the amount of information needed to accurately represent the data.

3. What factors should be considered when selecting k items from a list?

When selecting k items from a list, it is important to consider the purpose of the analysis, the size and diversity of the original list, and the specific characteristics or attributes of the items that are being chosen. It may also be helpful to consult with colleagues or experts in the field for additional input.

4. Can k items be randomly selected from a list?

Yes, k items can be randomly selected from a list. Random selection can be useful for creating a representative sample of the larger dataset, as long as the selection process is unbiased and the sample size is appropriate for the research question.

5. How can the results of picking k items from a list be applied to the entire dataset?

The results of picking k items from a list can be applied to the entire dataset through statistical methods such as extrapolation and generalization. These methods use the information gathered from the selected items to make inferences about the entire list, assuming that the selected items are representative of the larger set. However, it is important to note that these methods have limitations and the results should be interpreted with caution.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
20
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
14
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
907
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
913
  • Set Theory, Logic, Probability, Statistics
Replies
6
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
1K
Back
Top