Reducing computation for large power sets

In summary, the problem involves finding the convex hull of the set of points formed by summing every possible subset of a collection of vectors. The number of subsets grows exponentially, making exhaustive computation infeasible, so a different approach, such as random sampling, is needed to reduce the number of points while still accurately capturing the outline shape of the distribution.
  • #1
volican
To illustrate my problem say I have the following table:

Option, x, y
A, 25, 30
B, 5, 12
C, 3, 9
D, 12, 13

I want to create a graph of every possible combination in the set, where the x values are added and the y values are added. For example, for the combination AB it would be (x = 25 + 5, y = 30 + 12), which becomes one point on the graph with coordinates (30, 42). I am working on the assumption that this is a power set problem. I have written code to generate the binary table, and this works fine when I have fewer than 15 entries.

In reality I will have 50 options, and this will give 2^50 combinations. The MS Access database that I am writing this to has a maximum capacity of 2GB, and it is slow to run. There is definitely scope to improve the efficiency of my code, but I am thinking that a fundamental change of tack might be in order, and I am not sure how. Any pointers or advice would be much appreciated.

I am really only interested in the outline shape of the scatter of all the plotted points, and in where one particular point sits within that outline. Is there a way to work out the shape without running through every possible calculation, i.e. without computing all the points in the middle? If not, any ideas for alternative strategies that might work better?

Thank you for your time and help :)
 
  • #2
Your description is a little confusing. If you have 50 x values and 50 y values then you have 2500 possible pairs. What are you describing to get [tex]2^{50}[/tex]?
 
  • #3
Apologies. 2^50 is the total number of possible combinations when I have 50 options. Some examples, for the simplified case above: AB, BC, ABC, CD, ADC ... In the case above there would be 2^4 possible combinations of A, B, C & D.
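The 2^4 count can be seen directly by enumerating subsets as binary masks, which is presumably roughly what the OP's binary-table code does. A minimal Python sketch (illustrative only, not the OP's Access code), using the example data from post #1:

```python
# Enumerate every subset of the four example options via binary masks.
# Bit i of the mask says whether option i is included in the sum.
options = {"A": (25, 30), "B": (5, 12), "C": (3, 9), "D": (12, 13)}

names = list(options)
points = []
for mask in range(2 ** len(names)):  # 2^4 = 16 subsets, including the empty one
    x = sum(options[n][0] for i, n in enumerate(names) if mask >> i & 1)
    y = sum(options[n][1] for i, n in enumerate(names) if mask >> i & 1)
    points.append((x, y))

print(len(points))  # 16 subset sums
print(max(points))  # the all-options sum: (45, 64)
```

With 50 options the same loop would run 2^50 times, which is exactly the problem being discussed.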
 
  • #4
volican said:
In reality I will have 50 options and this I will give 2^50 combinations. My MS Access database that I am writing this to has a maximum capacity of 2GB and it is slow to run. There is definitely scope to improve the efficiency of my code but I am thinking that a fundamental change of tact might be in order but I am not sure how. Any pointers or advice would be much appreciated.

I also found it rather hard to figure out what exactly you are trying to accomplish with this.

Let me be clear: you must find a different approach. Individual computers tap out on problem sizes in the neighborhood of ##2^{25}##, which I will generously round up to ##2^{30}##. A naive implementation of your problem would thus require ##2^{20}## computers working in parallel to solve it. You need a serious rethink on this.
 
  • #5
Any thoughts on how to reduce the number of points? What I am really aiming for is to determine the outline shape of the distribution just from the initial data. Do you think it would be useful to run the program for 0 options all the way up to 20 options, save the outline points for each run, and then feed the results into Sci-Kit learn to see if it can find any patterns or relationships that give a good guess at what 2^50 would look like?
 
  • #6
The typical approach is to just use random sampling: in most settings, if the underlying population is large (perhaps uncountably so, if you are trying to approximate an integral) but you do a 'good enough' job with random sampling, you get to the essence very nicely.

I'm not understanding the motivation for this problem though, and kind of thinking it needs a lot more care.
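The random-sampling idea can be sketched in Python. This is a hypothetical illustration, not anyone's actual code: each sample includes each vector independently with probability 1/2 (a uniform random subset), and a standard monotone-chain convex hull is run over the sampled sums.

```python
import random

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def build(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = build(pts)
    upper = build(list(reversed(pts)))
    return lower[:-1] + upper[:-1]

def sample_subset_sums(vectors, n_samples, seed=0):
    """Each sample is a random subset sum: every vector is included with probability 1/2."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_samples):
        x = y = 0
        for vx, vy in vectors:
            if rng.random() < 0.5:
                x, y = x + vx, y + vy
        sums.append((x, y))
    return sums

# Toy data from post #1; with 50 real vectors you might draw on the order of 10^6 samples.
vectors = [(25, 30), (5, 12), (3, 9), (12, 13)]
hull = convex_hull(sample_subset_sums(vectors, 5000))
```

Repeating the sampling with different seeds and comparing the resulting hulls is one way to judge whether the sample size is 'good enough' for the outline.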
 
  • #7
Thanks, I will definitely give that a try with a convex hull algorithm. It's for comparing different options.
 
  • #8
volican said:
Appologies. 2^50 is the total number of possible combinations for when I have 50 options. Some examples for the simplified case above being e.g. AB, BC, ABC, CD, ADC ... In the case above there would be 2^4 possible combinations of A,B,C, &D

You mention "ADC". Is that a typo, or do you allow taking 3 of the letters at a time?

"AB" produces the same point as "BA". You need to consider such things in trying to count the total number of distinct points that will result.

volican said:
Thanks, I will definitely give that a try with a convex hull algorithm. It's for comparing different options.

What is the bottom line of the problem? Are you trying to find the convex hull of the set of points formed by adding all possible combinations of two of the vectors A,B,C...?
 
  • #9
Yes, all numbers of unique combinations (order does not matter) are possible, and duplicates are not included. To generate the set I've just used binary numbers. The problem is that at 50 options the number of possible combinations is huge. I have not had a chance to try it yet, but I think it does make total sense to try random sampling up to what the computer can handle. The program runs with around a 1 min wait time at 2^20 combinations. I am actually quite interested to try random sampling of 2^20 combinations from the 2^50 set and running this several times to see how much the outline shape actually changes. I am hoping that it isn't much and that I can indeed reduce the number of points needed much further - that would be great, and I would indeed owe StoneTemplePython a beer in that case.
 
  • #10
volican said:
Yes all number of unique combinations (order does not matter) are possible. Duplicates are not included.
If you were replying to my questions, you didn't answer them.
1) Are combinations of more than one letter considered? Are combinations of a letter with itself considered (e.g. AA, ABABA )?
2) What is the goal of plotting these combinations? Are you trying to calculate the convex hull of the resulting set of points?
 
  • #11
Yes it is for all unique combinations - so ABC is included - otherwise the number of combinations wouldn’t be so large.

It is the outline shape I am really interested in. Any thoughts on how to get that without having to plot the points at all? At the moment I am thinking of plotting the points and using a convex hull to get the outline. If I also wanted to get an equation for the outline shape, what do you think is the best way to do that? Convex hull and then Lagrange interpolation?

The goal is that I can then pick any particular combination and see where it is in relation to a particular inflexion point that I am expecting to appear on the outline shape. This has some physical relevance but I am just wondering about how best to do it from a maths point of view.
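One observation on the "equation for the outline" question: a convex hull is a convex polygon, so its natural description is piecewise linear, one line equation per edge, rather than a single interpolating polynomial. A sketch of turning hull vertices (from any hull algorithm; the vertices below are made up for illustration) into edge-line equations:

```python
def edge_equations(hull):
    """Given hull vertices in order, return (a, b, c) per edge with a*x + b*y = c."""
    eqs = []
    n = len(hull)
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        a, b = y2 - y1, x1 - x2  # (a, b) is a normal to the edge direction
        c = a * x1 + b * y1
        eqs.append((a, b, c))
    return eqs

# Hypothetical hull vertices, listed counter-clockwise:
hull = [(0, 0), (40, 10), (45, 64), (5, 50)]
for a, b, c in edge_equations(hull):
    print(f"{a}*x + {b}*y = {c}")
```

Comparing a*x + b*y against c for a candidate point then tells you which side of each edge it lies on, which is one way to locate a particular combination relative to the outline.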
 
  • #12
Are all the (x,y) components of the data non-negative?
 
  • #13
Yes, they will be non-negative.
 
  • #14
volican said:
Yes it is for all unique combinations

The phrase "unique combinations" isn't specific. We haven't settled whether a datum like A can be repeated.
Can a point be generated by AAABBBCC?

What is the maximum number of (x,y) pairs that can be added to create a point? If you have 50 pairs of (x,y) data, what is the most pairs you can add to create a point? Can you add all 50 pairs to create a point?
 
  • #15
There are 50 vectors that are each included 0 or 1 times in a vector sum, so it seems you would need to consider 2^50 different vector sums.
Since you want to graph all the points, the numbers involved are probably integers that aren't too large - let's say in the range 0..999.
In that case there are only 10^6 different possible outcomes. One method already mentioned is to generate random vector sums. If you generated 10^7 random sums, it seems pretty likely that you would hit nearly all the possible outcomes. Of course, there could still be points that only one single combination can produce, and you might have to produce more than 2^50 random sums to get them all.

It's also possible to eliminate all the duplicates during the enumeration of all the points.
A depth first enumeration would look like this:

Code:
search(int i, vector startv) {
    if (i <= maxv) {
        search(i + 1, startv);         // do not use vector i
        plot(startv + v[i]);
        search(i + 1, startv + v[i]);  // use vector i
    }
}

You call this with search(1, (0,0)); v[1]..v[maxv] is an array of two-element vectors.

This can be made radically faster if you have an array with one element for each possible outcome, in which you store whether you've already produced that outcome and at which depth.

The array should be initialized with values larger than maxv.

Code:
search(int i, vector startv) {
    if (i <= maxv) {
        if (array[startv] > i) {
            array[startv] = i;
            search(i + 1, startv);
            plot(startv + v[i]);
            search(i + 1, startv + v[i]);
        }
    }
}

If you've ever visited a point before at a search depth smaller than or equal to the current depth, you can stop, because all the vectors that you might still add to the current vector have already been tried.
The absolute maximum running time would be (search depth) × (number of possible outcomes).

This doesn't seem to be a very good project for MS Access. I suppose you could use a points table with (x, y) as a unique key, start with only (0,0) in it, and then run 50 insert queries, each inserting (x + v.x, y + v.y).
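The 50-insert-queries idea, i.e. growing the set of distinct reachable points one vector at a time, is easy to prototype outside Access. A hedged Python sketch of the same approach:

```python
def reachable_points(vectors):
    """Set of all distinct subset sums, built one vector at a time.

    Mirrors the suggested insert-query approach: for each vector v,
    add point + v for every point already in the set.  Work is bounded
    by (number of vectors) * (number of distinct points), not by 2^n.
    """
    points = {(0, 0)}
    for vx, vy in vectors:
        points |= {(x + vx, y + vy) for (x, y) in points}
    return points

# Toy data from post #1; real data with 50 vectors would work the same way.
pts = reachable_points([(25, 30), (5, 12), (3, 9), (12, 13)])
print(len(pts))  # at most 2^4 = 16; fewer if different subsets happen to collide
```

The set union performs the same deduplication the unique key would in the database, so heavily colliding data stays cheap even for many vectors.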
 
  • #16
The MS Access method might actually be feasible. After using twelve random vectors I get 1775 points in the table.
Export the table to Excel and make an xy graph.
[attached image: xy scatter plot of the 1775 points]

A primitive fractal. Of course if you use many more vectors it will start to look more like a blob.
 


