I Find Smallest Subtree of G w/ Nodes from S

  • I
  • Thread starter Thread starter Sneaky6666
  • Start date Start date
  • Tags Tags
    Graph Graph theory
AI Thread Summary
The discussion revolves around finding the smallest subtree in a directed graph G that contains all nodes from a subset S. The Chu-Liu/Edmonds algorithm is mentioned but deemed unsuitable for this specific subset problem. The proposed approach involves using BFS to determine the shortest paths from a starting node in S, storing distances and references to construct the subtree. The complexity of the problem is acknowledged, with suggestions for handling unavoidable paths and independent target nodes through recursive methods. The need for ideal solutions is emphasized, and the potential computational challenges are noted, indicating that no guarantees exist regarding processing time due to the problem's NP-completeness.
Sneaky6666
Messages
14
Reaction score
0
I have a unique problem and looking for a known fast algorithm for it. I have an unweighted but directed graph `G`. I then have a subset collection of nodes `S` existing in `G`. I want to find the minimum sub tree in `G` such that it contains all the nodes in `S`.

I so far found Chu-Liu/Edmonds algorithm but it is for finding the MST for all nodes in the graph.

Does anyone know how to do this for a subset of nodes?

Example
Untitled.png

If the picture is G. And S is the node set {1, 6}. Then the solution (smallest sub tree) is the subset graph of {1, 3, 4, 6} since it includes the nodes of S. It wouldn't be {1, 3, 2, 5, 6} (even though it includes the nodes of S) because that is larger.

Thanks
 
Last edited:
Mathematics news on Phys.org
The ideal solution to this would be very similar to the traveling salesman problem - which is NP complete.
I very much suspect this is also NP complete.

My exact attack on this problem would depend very much on other specifics such as:
Is only an ideal solution good enough?
What is the cost of non-ideal solutions?
What is the range of node set sizes you will be working with?
What kind of processor power is available and how quickly do you need an answer?
 
I am not familiar with Chu-Liu/Edmonds, but if you start with the nodes M=S is it then not enough to "simply" find shortest path (Dijkstra's algorithm) between each node in M that ends with a different node in M (possibly handling cycles if S is not a tree) and then add those to M?

1 min later: well, I guess its possible to make an S that such that the set M as constructed above will not be the minimal solution. Right, disregard my question please.
 
I was thinking about this, and what I had in mind is first we pick a random node in S, and do a BFS traversal from that node, basically storing 2 pieces of data in every node. First is the smallest distance from the current node to the starting node, and a reference of the previous node to the current node (for the shortest distance path).
So while doing the BFS, if it checks a node and sees the distance for it is smaller than a previous calculated distance, then it overwrites the data stored on that node with the better one.

In the end, every node knows its shortest distance, and a reference to the previous node, which allows a path to form from the current node to the starting node. If we take all the paths and combine them, we can construct a sub tree.

What do you all think of this?
 
.Scott said:
The ideal solution to this would be very similar to the traveling salesman problem - which is NP complete.
I very much suspect this is also NP complete.

My exact attack on this problem would depend very much on other specifics such as:
Is only an ideal solution good enough?
What is the cost of non-ideal solutions?
What is the range of node set sizes you will be working with?
What kind of processor power is available and how quickly do you need an answer?
I think only ideal, exact solutions are enough. Range of S, could be anything, from 1 node to all nodes in G. We can take our time doing any calculations, even if its 5 seconds and have a very strong processor.
 
I would start by identifying all "unavoidable" paths - the ones that, if eliminated, would result in some of the target nodes being isolated from the home node.
Then, each of those unavoidable paths becomes a separate problem (think recursion) and all the target nodes from that unavoidable path can be treated in the parent problem as a single "target node".

Once you are left with a problem (either the original problem or a child problem from the recursion) with no "unavoidable paths", then you need to attempt to find "independent" target nodes or groups of target nodes, node groups which cannot share a path home with any node outside of the group. Those independent groups can be identified by listing all node path segments then following all paths home for each target node (think tree search) and adding the target node id to the list of nodes for each path segment used by that node. So for every node path segment (the line between adjacent nodes), you have a list of target nodes that might find that path segment useful. Then, starting at the home position, work out through the network and look for nodes that have down-tree path segments (DTPSs) (or sets of DTPSs ) with no target node overlaps. The most straight forward way of finding such groups is brute force. If there are 7 DTPSs, go through values 1 to 126 expressed in binary - each of the 7 bits representing whether the corresponding DTPS is part of candidate group 0 or candidate group 1. Then tally the target node coverage for each DTPS group - if there is clean split of the target nodes, then they are independent and can be treated as separate problems (recursion). Once a group split has occurred, each of the two smaller groups should also be tested to see if there aren't three or more independent DTPS groups.

That DTSP group processing can be done in concert with the "unavoidable path" checking. Depending of the characteristics of you problem set, identifying the "unavoidable" before the "independents" may be more or less optimal.

Once you have a problem with no "unavoidables" or "independents", you're probably at the NP point. The most brute force method would simply be to do elimination trials. Pick a path to eliminate then attempt to solve the problem without it. Since all "unavoidables" have already been sidelined, at least one solution will be found. Note its score, restore it, and disable the next one. This scoring process will also be implemented recursively. It will be be similar to tree search where you are looking for the best score.

So this will give you only ideal solutions.
Since you have placed no size limits on S or G and no topology limits on G, there is no guarentee that your "strong processor" will complete this task within 5 minutes or 5 centuries.
 
Thread 'Video on imaginary numbers and some queries'
Hi, I was watching the following video. I found some points confusing. Could you please help me to understand the gaps? Thanks, in advance! Question 1: Around 4:22, the video says the following. So for those mathematicians, negative numbers didn't exist. You could subtract, that is find the difference between two positive quantities, but you couldn't have a negative answer or negative coefficients. Mathematicians were so averse to negative numbers that there was no single quadratic...
Thread 'Unit Circle Double Angle Derivations'
Here I made a terrible mistake of assuming this to be an equilateral triangle and set 2sinx=1 => x=pi/6. Although this did derive the double angle formulas it also led into a terrible mess trying to find all the combinations of sides. I must have been tired and just assumed 6x=180 and 2sinx=1. By that time, I was so mindset that I nearly scolded a person for even saying 90-x. I wonder if this is a case of biased observation that seeks to dis credit me like Jesus of Nazareth since in reality...
Thread 'Imaginary Pythagoras'
I posted this in the Lame Math thread, but it's got me thinking. Is there any validity to this? Or is it really just a mathematical trick? Naively, I see that i2 + plus 12 does equal zero2. But does this have a meaning? I know one can treat the imaginary number line as just another axis like the reals, but does that mean this does represent a triangle in the complex plane with a hypotenuse of length zero? Ibix offered a rendering of the diagram using what I assume is matrix* notation...
Back
Top