# Help! - Combinatorics cause heavy computing

## Main Question or Discussion Point

I have n comparable sets of data, several thousand numbered values in each set. A certain number can be calculated by choosing m of these sets, let's call it S_m. Now, the value of S_m depends on which sets I choose, so it can be thought of as a function of m variables which can only take discrete values, and no two the same. (corresponding to the sets chosen)

My problem is to for a certain m find the combination/s which give/s me the max/min of S_m.
It can be done by calculating S_m for every combination, but this quickly amounts to a huge number of calculations, the most of course for m=n/2. I therefore probably need some algorithm for reaching the right combination.

Any ideas?

Hurkyl
Staff Emeritus
Gold Member
Any ideas?
With the information you've given us, any (correct) algorithm requires you to try every combination. To do better than that, you will need to use knowledge of the specific function you're trying to maximize.

Ok, the sets are columns of values, and when we choose m sets we put the columns together in a matrix.

Now, S_m = a/b, where a is the largest row-sum and b is the sum of the maxima of each column.

CRGreathouse
Homework Helper
How is the row sum defined? $$S_2(\{1,2\},\{3,4\})$$ can involve

$$\left[\begin{array}{ccc} 1&3\\ 2&4\end{array}\right]$$ or $$\left[\begin{array}{ccc} 2&3\\ 1&4\end{array}\right]$$

which have maximum row sums 6 and 5, respectively. Since sets (by definition!) don't have order, how do you choose?

I guess I was a little mathematically unprecise. (Not a mathematician, almost a physicist)
Lists is probably more precise than sets, the values are in a time sequence and therefore ordered.
By row sum I just mean the sum of the elements in a row, and the maximum row sum is just the largest row sum.

Physical interpretation of the problem: values are recorded in n sites in a time sequence.
Choosing m of the n sites, I need to find out which combination gives me the highest/lowest ratio between (the highest peak of the sum of values) and (the sum of (the highest peak in each site))

CRGreathouse
Homework Helper
I guess I was a little mathematically unprecise. (Not a mathematician, almost a physicist)
Lists is probably more precise than sets, the values are in a time sequence and therefore ordered.
In that case m = n/2 is no longer your worst case; your worst case is m = n (or n - 1). This nearly squares the number of cases!

Can you give a small worked out example so I'm sure I properly understand? For example, with sets {1, 2} and {2, 3}, the matrices for $$S_2$$ could be

$$\left[\begin{array}{cc} 1&2\\ 2&3 \end{array}\right]$$ $$\left[\begin{array}{cc} 1&3\\ 2&2 \end{array}\right]$$ $$\left[\begin{array}{cc} 2&2\\ 1&3 \end{array}\right]$$ $$\left[\begin{array}{cc} 2&3\\ 1&2 \end{array}\right]$$

and the matrices for $$S_1$$

[1 2] [1 3] [2 2] [2 3]

right?

I think you are making this more complicated than it is:

I have a table of values, each column (n columns) represents a value recording site, each row all values recorded at a specific time. I pick m of these columns and calculate mentioned S_m for those. Which combination of m columns give me the highest/lowest S_m?

Example:

1 & 2 &2 \\
2 & 4 & 3 \\
3 & 1 & 1 \\

Let's calculate S_2 for all 3 combinations:
Column 1 and 2:
Maximum row sum is 6, sum of maximum of columns is 7, so S_2=6/7.

columns 1 and 3:
Maximum row sum is 5, sum of maximum of columns is 6, so S_2=5/6.

Columns 2 and 3:
Mrs = 7 , sum of max of col is 7, so S_2=1.

This should clear it up. Btw, how do I show LaTeX?

CRGreathouse
Homework Helper
(LaTeX: do $...$ as [ tex]...[/tex] and $...$ as [ itex]...[/itex] without the spaces.)

Okay, that's yet another case (glad I asked for an example!). The bad case there is m = n/3, give or take. I'll have to think about this. What restrictions are there on the values? All positive? Integers or reals? Will they tend to be close in value?

All right,
the values are positive reals, and high values tend to happen closely in time over all columns.(I.e. in rows close to each other) Values can differ in 1-2 magnitudes of order.

It's a tough one, but I really have to find some other method than brute-force. Taking only 10 thousand combinations for each m, having n=50, calculating $S_m$ for all combinations and all m up to 49 takes a few hours. And that doesn't even come close to solving my problem, finding the max/min of $S_m$ for each m. This is partly caused by the vast number of rows that need to be summed up, about 35 thousand.

CRGreathouse
Homework Helper
At the moment you're talking about a problem that (with n = 50) would take in the neighborhood of 1e51 years, or worst-case (n = 10,000) 1e16122 years. The age of the universe, for comparison, is estimated around 1e10 to 3e10 years.

Would a good estimate be worthwhile? Simulated annealing might be a viable approach if that's the case.

I started looking around yesterday for search algorithms and stumbled upon simulated annealing, and it does look slightly promising.

I might try that, but I suspect that to truly solve this in an elegant way I need to come up with something based on how $$S_m$$ is calculated.

CRGreathouse