MHB Find the best function grouping with an iterative method

MathBoard001 · Apr 3, 2018

Hello to everyone!
I'm having a problem, banal in form, but perhaps complex in practical application.
Let's suppose we have 300 functions $$y = f (x)$$ of random trend and we want to group these functions into 10 subgroups (each group consisting of 30 functions).
However, this grouping must not be random, but such that the sum of the areas subtended by these functions is the 'minimum' possible in each group (there should be a sort of balance between the 10 soubgroups, preferably the best balance).
The ideal solution is to have $$Area = 0 $$ for each subgroup!

According to you which iterative method, it would be preferable for a problem like this?

Comparing the functions one by one is definitely unthinkable but of course I would like to use a robust algorithm that programatically would be able to subgroup in a good (not necessary perfect) way.

Do you have any precious suggestions?

ps. Sorry for my English, maybe not perfect!

Ackbach · Apr 3, 2018

That's a very interesting problem! Thanks for posting! Here are a few ideas:

1. Calculate the area for each function, call it $A_i$, for $1\le i\le 300$. We'll denote each function by $f_i$.
2. You have ten groups, and your goal is to assign thirty functions to each group so as to get the sum of the areas as close to zero as possible. This would suggest a cost function of the form
$$C=\sum_{\text{groups}}\left[\,\sum_{f_i\in\;\text{group}}A_i\right]^2.$$
You have to square the inner sum, or take its magnitude, or else the algorithm will happily try to make the sums as negative as possible. This cost function is what you want to minimize. Now, your search space is absolutely gargantuan:
$$\frac{300!}{(30!)^{10}}\approx 1.78\times 10^{290}.$$
An exhaustive search is not possible, as you've already mentioned.

I would probably use an evolutionary algorithm, maybe with simulated annealing, to solve this one. But there's certainly no guarantee you'll find the absolute best possible solution.

Thankfully, this is not a 2SAT or 3SAT kind of problem, where whether one assignment of T/F to the variables gives you zero feedback on whether your next guess is better or worse.

MathBoard001 · Apr 3, 2018

That's really interesting. I've deepened the 'simulated annealing' and I think that it could work if we start with a reasonable starting condition. I really appreciated your idea.
I would like to deepen the problem in this way:
Let's suppose we need to add a further constraint in this method. The constraint is that the functions should be as much opposite to X axis as possible between each other.
For example if we consider 4 functions (instead of 30) in the subgroup, one ideal combination is this one:

View attachment 8041
In this case, as you can see:

1. Total Area = 0 [positive area - negative Area = 0]
2. Function 1 is opposite to Function 2 and Function 3 is opposite to Function 4.

Then there is a perfect match between the 4 functions!
I would like something similar to that (not necessary a perfect opposition!). Just to be clear, if we have group of two functions; the positive maximum of the first function should correspond to the negative minimum of the second function and so on!

I was thinking to split the domain in more parts and perform a check in the single domains, but maybe this is a massive operation.

Do you have any suggestion?

Ackbach · Apr 3, 2018

What's the bigger context of this problem? I think what you're asking is going to increase the complexity of the problem a great deal, because now you have a lot more areas to calculate in advance. You'd have to something like define the $i,j$ area:
$$A_{ij}=\int_{x\,\in\,\mathcal{D}(f_i)\cap\mathcal{D}(f_j)}[f_i(x)-f_j(x)]\,dx,$$
so instead of three hundred areas, you now have $44700$ areas to compute! (You can divide the $300\times 300$ by two because $A_{ij}=-A_{ji}$, and then subtract $300$ because $A_{ii}=0$.) Maybe that's ok, but that's a lot of computation. You could re-cast your cost function like this:
$$C=\sum_{\text{groups}}\;\sum_{\begin{array}{c}f_i,\, f_j\;\in\;\text{group}\\ i<j\end{array}}A_{ij}^2.$$
I'm moving the square inside, because we don't want one of the $A_{ij}$ to cancel with another one. Another option is to define
$$A_{ij}=\int_{x\,\in\,\mathcal{D}(f_i)\cap\mathcal{D}(f_j)}[f_i(x)-f_j(x)]^2\,dx,$$
and then not bother squaring in the cost function. This approach will more closely line up your functions across the $x$ axis, but it will be more computationally expensive, as I've pointed out.

MHB Find the best function grouping with an iterative method

Attachments

Thread 'Area of Overlapping Squares'

Similar threads

Undergrad Trigonometry problem of interest

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Undergrad Geometry problem of interest with a 3-4-5 triangle

High School Can Six Pencils Be Arranged to Each Touch the Other Five?

High School Excel: converting a 3-ish week count into a monthly count

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers