Question about probabilistic combinatorial maximization

  1. Jul 28, 2011 #1

    I have a question about probabilistic combinatorial maximization, as follows:

    Let X = {X_1, ..., X_n} be a set of i.i.d. positive random variables,
    S = {s_i} be the set of all combinations of m r.v.'s chosen from X, and
    Y(s_i) = the sum of r.v.'s in the combination s_i .
    I would like to evaluate the expectation of max_{s_i \in S} Y(s_i) or find out its distribution.

    Since I am not sure what this type of problem is called, I have not been able to search for the solution. Any help or pointer is appreciated. Thank you so much.
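    For concreteness, here is a quick Monte Carlo sketch of the quantity being asked about (Python; the function names and the choice of Exponential(1) for the X_i are mine, not from the question):

    ```python
    import itertools
    import random

    def max_subset_sum_bruteforce(xs, m):
        # Enumerate every size-m combination s_i and return max_{s_i} Y(s_i).
        return max(sum(c) for c in itertools.combinations(xs, m))

    def estimate_expectation(n=6, m=3, trials=20000, seed=0):
        # Monte Carlo estimate of E[ max_{s_i in S} Y(s_i) ]
        # with X_1, ..., X_n i.i.d. Exponential(1) as an illustrative choice.
        rng = random.Random(seed)
        total = 0.0
        for _ in range(trials):
            xs = [rng.expovariate(1.0) for _ in range(n)]
            total += max_subset_sum_bruteforce(xs, m)
        return total / trials
    ```

    The brute-force enumeration is only feasible for small n, but it makes the definition of the problem unambiguous and gives a baseline to check any closed-form answer against.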
    Last edited: Jul 28, 2011
  3. Jul 28, 2011 #2
    Hmm -- I've never seen anything like this before, either.

    First, for any given sample, the s_i that maximizes Y is going to be the one that includes all positive X's and leaves out all the negative ones. The number of positive X's will just be a binomial with p = Pr[X > 0] and N=n. Let's call that k. Then, those k X's will be IID, with the distribution being that of the original X, conditioned on X>0. If X has a PDF f(x), the PDF of the positive X's will be

    f^+(x) = \frac{1}{N} \begin{cases} 0 & x < 0 \\ f(x) & x \geq 0 \end{cases}

    where

    N = \int_0^{\infty} f(x) \, dx

    So your problem reduces to the sum of k IID variables, with k binomially distributed, and each variable having a distribution corresponding to the positive part of the distribution of the X's.

    Still not a simple problem, but it begins to look tractable.
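    The observation above (for subsets of unrestricted size) can be checked directly on a sample: the best subset is exactly the set of positive X's, so the maximum subset sum equals the sum of the positive parts. A small sketch (Python, function names mine):

    ```python
    import itertools

    def best_subset_sum(xs):
        # Maximize the sum over all subsets, including the empty subset (sum 0).
        best = 0.0
        for r in range(1, len(xs) + 1):
            for c in itertools.combinations(xs, r):
                best = max(best, sum(c))
        return best

    def sum_of_positives(xs):
        # The claimed maximizer: keep every positive X, drop the rest.
        return sum(x for x in xs if x > 0)
    ```

    For any sample the two functions agree, which is the reduction described above: only the count k of positive X's (binomial) and their conditional distribution matter.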
  4. Jul 28, 2011 #3
    Thanks pmrw3. But I should have added that each X_i is a positive r.v. only.
  5. Jul 28, 2011 #4
    Well, in that case it's trivial. You obviously maximize the sum by summing up all the X's. So you just have the problem of finding the distribution of the sum of IID random variables.
  6. Jul 28, 2011 #5
    Here Z = max_{s_i} Y(s_i) is simply the sum of the m largest values among X_1, ..., X_n, i.e. the upper order statistics. Writing X_{1:n} <= ... <= X_{n:n} for the order statistics, Z = X_{n-m+1:n} + ... + X_{n:n} and E[Z] = E[X_{n-m+1:n}] + ... + E[X_{n:n}].

    If the X_k are continuous then E[X_{k:n}] can be calculated as

    E[X_{k:n}] = \int_{-\infty}^{\infty} x \cdot \frac{n!}{(k-1)!(n-k)!} F(x)^{k-1} (1-F(x))^{n-k} f(x) \, dx
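    As a sanity check of this integral, for X ~ Uniform(0,1) the order-statistic mean has the known closed form E[X_{k:n}] = k/(n+1), so a direct numerical integration of the formula should reproduce it. A sketch (Python, function name mine, midpoint-rule integration):

    ```python
    import math

    def order_stat_mean_uniform(k, n, steps=100000):
        # Numerically integrate
        #   E[X_{k:n}] = \int x * n!/((k-1)!(n-k)!) * F(x)^{k-1} (1-F(x))^{n-k} f(x) dx
        # for X ~ Uniform(0,1), where F(x) = x and f(x) = 1 on [0, 1].
        coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
        h = 1.0 / steps
        total = 0.0
        for i in range(steps):
            x = (i + 0.5) * h  # midpoint of the i-th subinterval
            total += x * coef * x ** (k - 1) * (1.0 - x) ** (n - k) * h
        return total
    ```

    Summing this over k = n-m+1, ..., n then gives E[Z] for the uniform case, e.g. for n = 5, m = 2 one gets (4 + 5)/6 = 1.5.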
  7. Jul 28, 2011 #6
    Thanks both pmrw3 and bpet for the replies. I think what bpet had is essentially the answer.

    In addition, does anyone know if there is a limiting result? For example, how does Z scale as n -> infinity, or as n, m -> infinity with the ratio m/n fixed? Thank you.
  8. Jul 29, 2011 #7
    The empirical cdf converges to the distribution cdf as n -> inf, so that implies Z/m -> E[X | F(X) > 1 - m/n]. If m is fixed then Z/m -> x_F (the right end-point of the distribution), or if m/n is fixed then Z/m -> E[X | X > F^{-1}(1 - m/n)], and I guess if either of these is finite you could apply a CLT argument as well.
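    The fixed-ratio limit is easy to check numerically. For X ~ Uniform(0,1) and ratio r = m/n, F^{-1}(1-r) = 1-r and E[X | X > 1-r] = 1 - r/2, so with r = 0.25 the average of the top quarter should approach 0.875. A sketch (Python; the function name and the uniform example are mine):

    ```python
    import random

    def top_m_mean(n, m, rng):
        # Z/m: the average of the m largest of n i.i.d. Uniform(0,1) draws.
        xs = sorted(rng.random() for _ in range(n))
        return sum(xs[n - m:]) / m

    rng = random.Random(1)
    r = 0.25
    n = 200000
    m = int(r * n)
    # Predicted limit for Uniform(0,1): E[X | X > 1 - r] = 1 - r/2
    print(top_m_mean(n, m, rng))
    ```

    With n this large the simulated Z/m sits within about a percent of 1 - r/2, consistent with the empirical-cdf argument above.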