Reducing computation for large power sets

In summary, the problem involves finding the convex hull of the set of points formed by summing every possible subset of a collection of vectors. The number of subsets grows exponentially, making exhaustive computation infeasible, so a different approach, such as random sampling, is needed to reduce the number of points while still accurately capturing the outline shape of the distribution.
  • #1
volican
To illustrate my problem say I have the following table:

Option, x, y
A, 25, 30
B, 5, 12
C, 3, 9
D, 12, 13

I want to create a graph of every possible combination in the set, where the x values are added and the y values are added. For example, for the combination AB it would be (x = 25 + 5, y = 30 + 12), which becomes one point on the graph with coordinates (30, 42). I am working on the assumption that this is a power set problem. I have written code to generate the binary table, and this works fine when I have fewer than 15 entries.

In reality I will have 50 options, and this will give 2^50 combinations. The MS Access database that I am writing this to has a maximum capacity of 2GB, and it is slow to run. There is definitely scope to improve the efficiency of my code, but I am thinking that a fundamental change of tack might be in order, and I am not sure how. Any pointers or advice would be much appreciated.

I am really only interested in the outline shape of the scatter of all the plotted points, and in where one particular point sits within that outline. Is there a way to work out the shape without running through every possible calculation, i.e. without computing all the points in the middle? If not, any ideas for alternative strategies that might work better?

Thank you for your time and help :)
 
  • #2
Your description is a little confusing. If you have 50 x values and 50 y values then you have 2500 possible pairs. What are you describing to get [tex]2^{50}[/tex]?
 
  • #3
Apologies. 2^50 is the total number of possible combinations when I have 50 options. Some examples, for the simplified case above: AB, BC, ABC, CD, ADC ... In the case above there would be 2^4 possible combinations of A, B, C & D.
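The 2^4 count can be seen directly by enumerating subsets as binary masks, which is presumably roughly what the OP's binary-table code does. A minimal Python sketch (illustrative only, not the OP's Access code), using the example data from post #1:

```python
# Enumerate every subset of the four example options via binary masks.
# Bit i of the mask says whether option i is included in the sum.
options = {"A": (25, 30), "B": (5, 12), "C": (3, 9), "D": (12, 13)}

names = list(options)
points = []
for mask in range(2 ** len(names)):  # 2^4 = 16 subsets, including the empty one
    x = sum(options[n][0] for i, n in enumerate(names) if mask >> i & 1)
    y = sum(options[n][1] for i, n in enumerate(names) if mask >> i & 1)
    points.append((x, y))

print(len(points))  # 16 subset sums
print(max(points))  # the all-options sum: (45, 64)
```

With 50 options the same loop would run 2^50 times, which is exactly the problem being discussed.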
 
  • #4
volican said:
In reality I will have 50 options and this I will give 2^50 combinations. My MS Access database that I am writing this to has a maximum capacity of 2GB and it is slow to run. There is definitely scope to improve the efficiency of my code but I am thinking that a fundamental change of tact might be in order but I am not sure how. Any pointers or advice would be much appreciated.

I also found it rather hard to figure out what exactly you are trying to accomplish with this.

Let me be clear: you must find a different approach. Individual computers tap out on problem sizes in the neighborhood of ##2^{25}##, which I will generously round up to ##2^{30}##. A naive implementation of your problem would thus require ##2^{20}## computers working in parallel to solve it. You need a serious rethink on this.
 
  • #5
Any thoughts on how to reduce the number of points? What I am really aiming for is to determine the outline shape of the distribution just from the initial data. Do you think it would be useful to run the program for 0 options all the way up to 20 options, save the outline points for each run, and then feed the results into Sci-Kit learn to see if it can find any patterns or relationships that give a good guess at what 2^50 would look like?
 
  • #6
The typical approach is to just use random sampling: in most settings, if the underlying population is large (perhaps uncountably so, if you are trying to approximate an integral) but you do a 'good enough' job with random sampling, you get to the essence very nicely.

I'm not understanding the motivation for this problem though, and kind of thinking it needs a lot more care.
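The random-sampling idea can be sketched in Python. This is a hypothetical illustration, not anyone's actual code: each sample includes each vector independently with probability 1/2 (a uniform random subset), and a standard monotone-chain convex hull is run over the sampled sums.

```python
import random

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def build(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = build(pts)
    upper = build(list(reversed(pts)))
    return lower[:-1] + upper[:-1]

def sample_subset_sums(vectors, n_samples, seed=0):
    """Each sample is a random subset sum: every vector is included with probability 1/2."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_samples):
        x = y = 0
        for vx, vy in vectors:
            if rng.random() < 0.5:
                x, y = x + vx, y + vy
        sums.append((x, y))
    return sums

# Toy data from post #1; with 50 real vectors you might draw on the order of 10^6 samples.
vectors = [(25, 30), (5, 12), (3, 9), (12, 13)]
hull = convex_hull(sample_subset_sums(vectors, 5000))
```

Repeating the sampling with different seeds and comparing the resulting hulls is one way to judge whether the sample size is 'good enough' for the outline.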
 
  • #7
Thanks, I will definitely give that a try with a convex hull algorithm. It's for comparing different options.
 
  • #8
volican said:
Appologies. 2^50 is the total number of possible combinations for when I have 50 options. Some examples for the simplified case above being e.g. AB, BC, ABC, CD, ADC ... In the case above there would be 2^4 possible combinations of A,B,C, &D

You mention "ADC". Is that a typo, or do you allow taking 3 of the letters at a time?

"AB" produces the same point as "BA". You need to consider such things in trying to count the total number of distinct points that will result.

volican said:
Thanks, I will definitely give that a try with a convex hull algorithm. It's for comparing different options.

What is the bottom line of the problem? Are you trying to find the convex hull of the set of points formed by adding all possible combinations of two of the vectors A,B,C...?
 
  • #9
Yes, all numbers of unique combinations (order does not matter) are possible, and duplicates are not included. To generate the set I've just used binary numbers. The problem is that at 50 options the number of possible combinations is huge. I have not had a chance to try it yet, but I think it does make total sense to try random sampling up to what the computer can handle. The program runs with around a 1 min wait time at 2^20 combinations. I am actually quite interested to try random sampling of 2^20 combinations from the 2^50 set and running this several times to see how much the outline shape actually changes. I am hoping that it isn't much and that I can indeed reduce the number of points needed much further - that would be great, and I would indeed owe StoneTemplePython a beer in that case.
 
  • #10
volican said:
Yes all number of unique combinations (order does not matter) are possible. Duplicates are not included.
If you were replying to my questions, you didn't answer them.
1) Are combinations of more than one letter considered? Are combinations of a letter with itself considered (e.g. AA, ABABA )?
2) What is the goal of plotting these combinations? Are you trying to calculate the convex hull of the resulting set of points?
 
  • #11
Yes it is for all unique combinations - so ABC is included - otherwise the number of combinations wouldn’t be so large.

It is the outline shape I am really interested in. Any thoughts on how to get that without having to plot the points at all? At the moment I am thinking of plotting the points and using a convex hull to get the outline. If I also wanted to get an equation for the outline shape, what do you think is the best way to do that? Convex hull and then Lagrange interpolation?

The goal is that I can then pick any particular combination and see where it is in relation to a particular inflexion point that I am expecting to appear on the outline shape. This has some physical relevance but I am just wondering about how best to do it from a maths point of view.
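One observation on the "equation for the outline" question: a convex hull is a convex polygon, so its natural description is piecewise linear, one line equation per edge, rather than a single interpolating polynomial. A sketch of turning hull vertices (from any hull algorithm; the vertices below are made up for illustration) into edge-line equations:

```python
def edge_equations(hull):
    """Given hull vertices in order, return (a, b, c) per edge with a*x + b*y = c."""
    eqs = []
    n = len(hull)
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        a, b = y2 - y1, x1 - x2  # (a, b) is a normal to the edge direction
        c = a * x1 + b * y1
        eqs.append((a, b, c))
    return eqs

# Hypothetical hull vertices, listed counter-clockwise:
hull = [(0, 0), (40, 10), (45, 64), (5, 50)]
for a, b, c in edge_equations(hull):
    print(f"{a}*x + {b}*y = {c}")
```

Comparing a*x + b*y against c for a candidate point then tells you which side of each edge it lies on, which is one way to locate a particular combination relative to the outline.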
 
  • #12
Are all the (x,y) components of the data non-negative?
 
  • #13
Yes, they will be non-negative.
 
  • #14
volican said:
Yes it is for all unique combinations

The phrase "unique combinations" isn't specific. We haven't settled whether a datum like A can be repeated.
Can a point be generated by AAABBBCC?

What is the maximum number of (x,y) pairs that can be added to create a point? If you have 50 pairs of (x,y) data, what is the most pairs you can add to create a point? Can you add all 50 pairs to create a point?
 
  • #15
There are 50 vectors that are each included 0 or 1 times in a vector sum, so it seems you would need to consider 2^50 different vector sums.
Since you want to graph all the points, the numbers involved are probably integers that aren't too large - let's say in the range 0..999.
In that case there are only 10^6 different possible outcomes. One method already mentioned is to generate random vector sums. If you generated 10^7 random sums, it seems pretty likely that you would hit nearly all the possible outcomes. Of course, there could still be points that only one single combination can produce, and you might have to produce more than 2^50 random sums to get them all.

It's also possible to eliminate all the duplicates during the enumeration of all the points.
A depth first enumeration would look like this:

Code:
search(int i, vector startv) {
    if (i <= maxv) {
        search(i + 1, startv);         // do not use vector i
        plot(startv + v[i]);
        search(i + 1, startv + v[i]);  // use vector i
    }
}

You call this with search(1, (0,0)); v[1]..v[maxv] is an array of two-element vectors.

This can be made radically faster if you have an array with one element for each possible outcome, in which you store whether you've already produced that outcome and at which depth.

The array should be initialized with values larger than maxv.

Code:
search(int i, vector startv) {
    if (i <= maxv) {
        if (array[startv] > i) {
            array[startv] = i;
            search(i + 1, startv);
            plot(startv + v[i]);
            search(i + 1, startv + v[i]);
        }
    }
}

If you've ever visited a point before at a search depth smaller than or equal to the current depth, you can stop, because all the vectors that you might still add to the current vector have already been tried.
The absolute maximum running time would be (search depth) × (number of possible outcomes).

This doesn't seem to be a very good project for MS Access. I suppose you could use a points table with (x, y) as a unique key, start with only (0,0) in it, and then run 50 insert queries, each inserting (x + v.x, y + v.y).
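The 50-insert-queries idea, i.e. growing the set of distinct reachable points one vector at a time, is easy to prototype outside Access. A hedged Python sketch of the same approach:

```python
def reachable_points(vectors):
    """Set of all distinct subset sums, built one vector at a time.

    Mirrors the suggested insert-query approach: for each vector v,
    add point + v for every point already in the set.  Work is bounded
    by (number of vectors) * (number of distinct points), not by 2^n.
    """
    points = {(0, 0)}
    for vx, vy in vectors:
        points |= {(x + vx, y + vy) for (x, y) in points}
    return points

# Toy data from post #1; real data with 50 vectors would work the same way.
pts = reachable_points([(25, 30), (5, 12), (3, 9), (12, 13)])
print(len(pts))  # at most 2^4 = 16; fewer if different subsets happen to collide
```

The set union performs the same deduplication the unique key would in the database, so heavily colliding data stays cheap even for many vectors.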
 
  • #16
The MS Access method might actually be feasible. After using twelve random vectors I get 1775 points in the table.
Export the table to Excel and make an xy graph.
[attached image: xy scatter plot of the 1775 points]

A primitive fractal. Of course if you use many more vectors it will start to look more like a blob.
 


