Random Sampling: Set Inclusion

In summary: The probability of choosing exactly one element in common from two subsets A and B of size |A| and |B| is given by:p = (|A| * |B| * (n - |A|)! * (n - |B|)!) / ((n - |A| - |B| + 1)! * n!)This can be simplified to:p = (|A| * |B| * (n - |A|) * (n - |B|)!) / ((n - |A| - |B| + 1) * n!)And further simplified to:p = (|A| * |B| * (n - |A|) * (n
  • #1
Monte_Carlo
72
0
Hello,

I don't really know if this is considered a challenging problem but this is not for homework:

You're given a set of numbers S of size n. From S, you draw a random sample A, |A| < n. From S, you draw a random sample B, |B| < n. Sampling doesn't remove items from S. What is the probability that intersection of A and B has size 1?

Monte
 
Physics news on Phys.org
  • #2
Monte_Carlo said:
Hello,

I don't really know if this is considered a challenging problem but this is not for homework:

You're given a set of numbers S of size n. From S, you draw a random sample A, |A| < n. From S, you draw a random sample B, |B| < n. Sampling doesn't remove items from S. What is the probability that intersection of A and B has size 1?

Monte

I didn't finish the problem but I made a good start. I think the probability is

[ 2^(n-1) * (2^n - 2) + n ] / 2^(n+1)

The general idea is to create a multiplication table with all the subsets as row and column headers; then fill in the intersections. When you do this, you can then count the singletons by row. By convention you order the subsets as the empty set first, then all the singletons, etc. That part doesn't matter except that I want the first and last rows of the multiplication table to be the empty set and the entire set, respectively, for the following discussion.

There are 2^n subsets, so we have a 2^n X 2^n table whose grids are filled in with the intersection of the row and column header.

With the convention that the empty set is the top row, the top row of the table contains zero singletons; since every set intersected with the empty set is empty.

The bottom row is the entire set S, which acts as an identity for the operation of set intersection. So the bottom row contains exactly n singletons, one for each element of the set S.

There are 2^n - 2 remaining rows. (There are 2^n total rows, minus the top and bottom).

Claim/guess/conjecture: Each of these remaining rows (not the top, not the bottom) contains exactly 2^(n-1) singletons. This part has to be proved. In other words for each row, exactly half of the intersections are singletons.

If this is true then the total number of singletons is the number of non-top/bottom rows, which is 2^n - 2; times the number of singletons for each of those rows, which is 2^(n-1); plus the singletons in the bottom row, which is n. The sum of those three terms is the total number of singletons; and to get the probability you just divide by the total number of possible pairings of 2^n subsets, which is 2^(n+1).

Perhaps something I wrote will inspire someone to see through to the end of this.

(edit) My conjecture's wrong, but there's a pattern to how many singletons there are per row. It amounts to asking: for a given subset of a set, how many other subsets contain exactly one of the elements of the first set and no others. For example for some S with more than 3 elements, if you pick the subset {1,2,3}, you are asking: how many other subsets of S contain exactly one of either 1, 2, or 3. This sounds like it must be a well-known combinatorial problem.
 
Last edited:
  • #3
Here is a combinatorial approach. We will assume [itex]|A| + |B| \leq n + 1[/itex].

There are [tex]\binom{n}{A} \binom{n}{B}[/tex] ways to select the two subsets, all of which we assume are equally likely. We want to count the number of arrangements in which A and B have exactly one element in common. There are n ways to pick the common element. Then there are [tex]\binom{n-1}{|A|-1}[/tex] ways to select the remaining elements of A. Except for the one common element, A and B are disjoint, so there are [tex]\binom{n-|A|}{|B|-1}[/tex] ways to pick the remaining elements of B. So there are
[tex]n \binom{n-1}{|A|-1} \binom{n-|A|}{|B|-1}[/tex]
ways to pick the elements of A and B, and the probability that A and B will have exactly one element in common is
[tex]p = \frac{n \binom{n-1}{|A|-1} \binom{n-|A|}{|B|-1}}{\binom{n}{A} \binom{n}{B}}[/tex]
which simplifies to
[tex]p = \frac{|A| \; |B| \; (n-|A|)! \; (n - |B|)!} {(n - |A| - |B| + 1)! \; n!}[/tex]
 
  • #4
Could you check your logic with the actual computation? I know the answer is correct, but I'm not quite sure it follows from your calculation.
 
  • #5
Never mind, I got it.
 

1. What is random sampling?

Random sampling is a statistical technique where a subset of individuals or objects is selected from a larger population in a way that each member of the population has an equal chance of being chosen. This allows researchers to make inferences about the entire population based on the sample.

2. How is set inclusion used in random sampling?

Set inclusion refers to the process of ensuring that every member of a population has an equal chance of being included in the sample. This is achieved by using a random selection method, such as simple random sampling or stratified random sampling, where each member of the population is assigned a number and then a random number generator is used to select the sample.

3. What is the purpose of using random sampling?

The purpose of using random sampling is to obtain a representative sample that accurately reflects the characteristics of the entire population. This allows researchers to make generalizations and draw conclusions about the population based on the sample data.

4. What are the advantages of using random sampling?

Random sampling is advantageous because it eliminates bias and ensures that the sample is representative of the entire population. It also allows for the use of statistical methods to make inferences about the population, as the sample is selected in a way that is independent and unbiased.

5. Are there any limitations to random sampling?

While random sampling is a commonly used and effective method, it does have some limitations. For example, it can be time-consuming and expensive to obtain a truly random sample. In addition, it may not always be feasible to include all members of the population in the sample, especially if the population is large and geographically dispersed.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
454
  • Set Theory, Logic, Probability, Statistics
Replies
6
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
865
  • Set Theory, Logic, Probability, Statistics
Replies
9
Views
519
  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
13
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
441
  • Set Theory, Logic, Probability, Statistics
Replies
6
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
30
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
908
Back
Top