Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Random Sampling: Set Inclusion

  1. Feb 12, 2012 #1

    I don't really know if this is considered a challenging problem but this is not for homework:

    You're given a set of numbers S of size n. From S, you draw a random sample A, |A| < n. From S, you draw a random sample B, |B| < n. Sampling doesn't remove items from S. What is the probability that intersection of A and B has size 1?

  2. jcsd
  3. Feb 12, 2012 #2
    I didn't finish the problem but I made a good start. I think the probability is

    [ 2^(n-1) * (2^n - 2) + n ] / 2^(n+1)

    The general idea is to create a multiplication table with all the subsets as row and column headers; then fill in the intersections. When you do this, you can then count the singletons by row. By convention you order the subsets as the empty set first, then all the singletons, etc. That part doesn't matter except that I want the first and last rows of the multiplication table to be the empty set and the entire set, respectively, for the following discussion.

    There are 2^n subsets, so we have a 2^n X 2^n table whose grids are filled in with the intersection of the row and column header.

    With the convention that the empty set is the top row, the top row of the table contains zero singletons; since every set intersected with the empty set is empty.

    The bottom row is the entire set S, which acts as an identity for the operation of set intersection. So the bottom row contains exactly n singletons, one for each element of the set S.

    There are 2^n - 2 remaining rows. (There are 2^n total rows, minus the top and bottom).

    Claim/guess/conjecture: Each of these remaining rows (not the top, not the bottom) contains exactly 2^(n-1) singletons. This part has to be proved. In other words for each row, exactly half of the intersections are singletons.

    If this is true then the total number of singletons is the number of non-top/bottom rows, which is 2^n - 2; times the number of singletons for each of those rows, which is 2^(n-1); plus the singletons in the bottom row, which is n. The sum of those three terms is the total number of singletons; and to get the probability you just divide by the total number of possible pairings of 2^n subsets, which is 2^(n+1).

    Perhaps something I wrote will inspire someone to see through to the end of this.

    (edit) My conjecture's wrong, but there's a pattern to how many singletons there are per row. It amounts to asking: for a given subset of a set, how many other subsets contain exactly one of the elements of the first set and no others. For example for some S with more than 3 elements, if you pick the subset {1,2,3}, you are asking: how many other subsets of S contain exactly one of either 1, 2, or 3. This sounds like it must be a well-known combinatorial problem.
    Last edited: Feb 12, 2012
  4. Feb 13, 2012 #3
    Here is a combinatorial approach. We will assume [itex]|A| + |B| \leq n + 1[/itex].

    There are [tex]\binom{n}{A} \binom{n}{B}[/tex] ways to select the two subsets, all of which we assume are equally likely. We want to count the number of arrangements in which A and B have exactly one element in common. There are n ways to pick the common element. Then there are [tex]\binom{n-1}{|A|-1}[/tex] ways to select the remaining elements of A. Except for the one common element, A and B are disjoint, so there are [tex]\binom{n-|A|}{|B|-1}[/tex] ways to pick the remaining elements of B. So there are
    [tex]n \binom{n-1}{|A|-1} \binom{n-|A|}{|B|-1}[/tex]
    ways to pick the elements of A and B, and the probability that A and B will have exactly one element in common is
    [tex]p = \frac{n \binom{n-1}{|A|-1} \binom{n-|A|}{|B|-1}}{\binom{n}{A} \binom{n}{B}}[/tex]
    which simplifies to
    [tex]p = \frac{|A| \; |B| \; (n-|A|)! \; (n - |B|)!} {(n - |A| - |B| + 1)! \; n!}[/tex]
  5. Nov 14, 2012 #4
    Could you check your logic with the actual computation? I know the answer is correct, but I'm not quite sure it follows from your calculation.
  6. Nov 14, 2012 #5
    Never mind, I got it.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook