Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Programming Challenge

  1. Nov 29, 2006 #1
    Disclaimer : This thread, and its solution, by no means suggests how to launch a network attack in real life. The purpose of this challenge is to learn how to decompose graphs and analyze them as efficiently as possible. There is no knocking down of network connections, or even speculations as to how to do so. Any solutions to this exercise, and the challenge itself, is not analogous to hacking!​

    Just for fun, I thought I'd post this up here, in case any of you are bored and want some extra practice in whichever language you are learning.

    The reason I posted it here, is because the knowledge required to complete this program is quite limited. All you need to know is recursion, File I/O, and arrays. But! The challenge's difficulty comes from efficiency. I'll explain near the end of this post, how greatly the choice of code will influence the speed.

    Here's the challenge:

    There are up to 26 different network terminals. They are labelled 'A' - 'Z' inclusive. A data file contains the information for how they are wired together. For example, 'AC' means terminal 'A' is wired to terminal 'C'. When two terminals are wired together, data can be transferred in both directions.

    You are a computer hacker, hired by a network company to test the integrity of their network. They only need to ensure that 'A' and 'B' will always be able to transfer data to eachother. They believe that in a worst case scenario, a hacker, like yourself could knock down one connection.

    Given the map to how all of the network terminals are connected, you must decide which one connection should be knocked down, in order to prevent 'A' from transmitting data to 'B'. In some situations, there may be more than one answer, and in some, no answers. Your algorithm must permit any of these possibilities, and output a response accordingly.

    All network terminals, in their original condition, can be reached directly or indirectly through one another. In other words, if terminals 'A', 'B' and 'C' exist, then 'C' will be reachable through 'A', directly or indirectly. However, if the connection between 'A' and 'B' are disabled, then 'A' might not be able to connect to 'C' any more... if a route to 'C' no longer exists through 'A'.

    The map for all of the connections between terminals will be in the following format. Each connection is separated by a newline, and the last line is '**' (since EOF is different between Windows and Linux).

    Code (Text):

    The solution to that data set is 'BC', since without 'BC', 'A' can no longer find a route to 'B'.

    The reason this challenge varies dramatically in programming abilities, is from your choice of algorithm. If you plan on doing a basic recursive algorithm that analyzes what each connection is doing, your program will be O(n^3). But that is fairly simple to accomplish! It is at approximately the level of grade 12 computer science.

    However, if you want to do it the "best" way, in linear time, that requires third-year University Algorithms. Or... in my case, too much time on your hands!!

    I won't tell you guys how to do it that way. I might give hints, but try to figure it out on your own, as I did.

    Post back your thoughts and attempts here, I'll be glad to help anyone out. Remember, this is just for fun!
    Last edited: Nov 29, 2006
  2. jcsd
  3. Nov 29, 2006 #2


    User Avatar
    Science Advisor

    First, what is n? Normally it's the size of the input, so I'll assume n is the number of edges in the graph. However you'll have to allow more than 26 terminals if you want to get an asymptotic complexity greater than 0, since a simple lookup table that stores all possible graphs on 26 vertices would solve the problem in constant time, though the space requirements would be too large. So if you don't mind, I'd like to restate the problem as follows:

    Find an algorithm that, given a graph G = (V, E) and two vertices v1, v2 in V, determines all edges whose removal separates G into two components, one of which contains v1 and the other of which contains v2. This algorithm should be efficient in |E|.

    I'll think about this more later.
  4. Nov 29, 2006 #3
    N being the number of vertices. Where, the best possible setup (discluding start and end points), contains N-1 edges. In which case, the best algorithm will run in O(n) time.

    Don't consider the constrains on N an influence on the time complexity of an algorithm. What happens when we want to run this one billion times? For a linear algorithm, running N=26, you would get the same result as one run where N ~= 26000000000. Sounds like enough for an asymptopic sample to me.

    By the way, don't think about it too much! The contest I got it from gives you about 25 minutes to think about it and program it.
    Last edited: Nov 29, 2006
  5. Nov 29, 2006 #4


    User Avatar
    Science Advisor

    My point about time complexity is this: in the statement of the problem, you stated a constant 26 vertices. That means it can be solved in constant time, regardless of how inefficient your algorithm is.

    Your statement "in the best possible setup, the best algorithm will run in O(n) time" doesn't mean much. Do you mean [tex]\Theta[/tex](n)? And if you do, what is the worst case complexity of your algorithm? It would surprise me if you really got O(n) time since even Dijkstra's algorithm can't be done quite that fast. In fact, it's technically impossible in this case given the presentation of the edges--you have to at least read the edges which is O(|E|) time.

    One efficient algorithm you could do would pare down the graph into its component structure. Starting at A, repeatedly reduce regions down into vertices that are adjacent to all the nodes that the region was adjacent to, and delete pendant vertices. When you're done, you should have a single path from A to B consisting of vertices of degree 2, and the edges still standing in that path are the ones that can be deleted to disconnect A and B.

    I'm not totally sure of the time complexity of that, but I believe it would be O(|E|).
    Last edited: Nov 29, 2006
  6. Nov 29, 2006 #5
    Then go ahead. Build that database. Certainly, you'll have constant time. But how much time will it take to make this database, and how many hard drives to store all of the information?

    Has anyone ever told you that you read too deeply into things?

    I'm not including reading the data file, or the representation. Only the algorithm itself. If you're incrementing the clock on every visit to a node, then the best algorithm's clock will never extend past n, where n is the number of nodes.

    That method seems to go against the algorithm this problem lends itself to, but certainly go for it. I wonder if it's faster than what I was thinking of.
    Last edited: Nov 29, 2006
  7. Nov 29, 2006 #6


    User Avatar
    Science Advisor

    The database was just a preposterous example of how a ridiculously bad algorithm would run in O(1) time here. Any algorithm, including more reasonable ones, will run in O(1) time on this problem if you restrict the number of vertices to 26.

    You increment the clock every time you do some atomic operation. If, for example, each node may connect to as many as n-1 other nodes, then looking at all the edges of that node is an O(n) operation in itself.

    So, what were you thinking of? Do you have a reason not to say your algorithm?
  8. Nov 29, 2006 #7
    Yes, I'm glad we agree that is pretty ridiculous. :approve:

    Considering that, it is [itex]\Theta(n)[/itex]. But the next node can be determined, essentially, in constant time. If you can test to see if a node has been visited, with one operation (a lookup into a hash table/array), and on average a node has only one back-edge (an adjacent node that has already been visited).

    Since it would be uncommon to have a data file where each node has two or more back-edges, and exponentially rarer to have even more back-edges per node, therefore the algorithm asymptopically approaches time-complexity of n. Worst-case scenario, it's doing just under O(n^2).

    That's why I consider it linear, but this all seems to be splitting hairs anyways.

    Two reasons. Firstly, I don't want to influence how someone might approach the problem. You demonstrated this nicely, through how you may have thought of a better algorithm, after I never said the foundation of mine. Now that I think of it, I think your idea might be roughly the same as mine, just attacked in a different manner.

    Secondly, it's a challenge. I don't want to ruin it for anyone by telling the answer, or making it seem more difficult than it could be (it can be done quite easily in cubic time).
    Last edited: Nov 29, 2006
  9. Nov 30, 2006 #8


    User Avatar
    Science Advisor

    You can't consider an algorithm as linear if it does, in the worst case, non-linear time, even if it is linear 99% of the time. For every problem you can formulate there is an algorithm solving that problem in constant time P% of the time.
  10. Nov 30, 2006 #9
    I don't know what your sources are, but the time complexity is always expressed by the average circumstance. For example, Library sort operates in a worst case scenario of O(n^2), but best case O(n). The average, and the time complexity of the algorithm is therefore O(n log n).

    If you want to consider only worst case scenario, the best algorithm for this question will still be less than linear, if n is the number of edges from each node.
  11. Nov 30, 2006 #10


    User Avatar
    Science Advisor

    Average case complexity is often used, but it's hard to calculate and anyway doesn't put a firm upper bound on the time the algorithm can take. Worst case complexity is the most common measure since it is simple in many cases to derive from an algorithm, and provides a clear upper bound on the time. If someone says something is an O(f(n)) algorithm, they are usually referring to worst-case complexity.
  12. Nov 30, 2006 #11
    Common is a relative term. Different ages are taught differently. As it was for me, the time everyone wants to hear is the time that will happen most often.

    Here's some trial times as a bench-mark for you guys to compare against. If anyone ever tries.

    Trials : 1 Million
    Language : C
    Specs : 3.00 GhZ, 512 MB
    Notes :
    • Parsing time was discluded.
    • Variable initiation was included.
    • N is the number of edges per node (both directions).
    0.672 sec, where N=20
    1.078 sec, where N=34
    2.297 sec, where N=72

    Average : 0.03 seconds per N per million trials.​

    You can distinctly see a linear relationship between N and the times.

    Interpolating between (N=20 -> N=34) and (N=34 -> N=72) gives roughly the same time. I should be able to improve it by a few more factors.
    Last edited: Nov 30, 2006
  13. Nov 30, 2006 #12


    User Avatar
    Science Advisor

    The big O notation defines the class of the algorithm, and that's why it's important. It can be misleading because, while an algorithm may run mostly in O(n), an actual algorithm running in O(n) may be impossible to produce.
    If you're not referring to worst-case performance then you shouldn't use big O notation, it's misleading.
    Anyway, getting back on topic. In O(|E|) time we can determine if there is at least one path from A to B. So a naive algorithm involves removing paths one by one and checking if a path from A to B still exists. This runs in O(|E|^2). We can bring it down to O(|V||E|) with a similar approach. The only algorithm i can think of in O(|E|) does not guarantee a correct answer everytime. Orthodontist if you don't mind please post a clearer version of your algorithm, because either i'm misunderstanding or it doesn't work.
    Last edited: Nov 30, 2006
  14. Nov 30, 2006 #13
    Yes, I'm glad you mentioned that. This is the obvious solution that was expected by the challenge. This is how I would expect any student, rightfully so, to accomplish the task. Even though it is brutally slower than the optimum algorithm, it is easy to prove, and very correct! Given limited time, this is the solution that should be chosen.

    This is also why I said the challenge has a great variety in skill levels. Anyone with basic experience in programming should be able to complete it with this "naive algorithm".

    Try drawing a picture; rethink your strategy. You'll see that analyzing every edge with a DFS does tell you everything you need to know about the structure of the graph. Figure out the solution by analyzing some random sample with a DFS, by hand, then convert this analysis into a general algorithm.

    Once you see it, it is quite obvious. However, all keys look alike, but only one will open the right lock. You'll notice that when you start coding, that the first analysis that took place may only have worked for that one situation. You must somehow compliment the property that the order of branching will take place in an arbitrary fashion. I won't say how it ends up, but a very general realisation must take place.

    I hope you take a shot at it. It's a great exercise, and good luck if you do.
    Last edited: Nov 30, 2006
  15. Dec 1, 2006 #14


    User Avatar
    Science Advisor

    Try it with N = 10,000 for a few thousand trials--that should be enough to roughly distinguish between a linear or greater than linear average behavior. So here, N is the number of nodes? So how are you generating your edges--randomly choosing each edge to be present or not present? With what probability do you choose that? What method do you use to enforce the guarantee that A and B must be connected by a path?

    By the way--a DFS is time O(|V| + |E|) which is not linear in the number of nodes. So unless you have a way that will stop short before the full DFS completes, your algorithm is at least linear in the number of edges.
    Last edited: Dec 1, 2006
  16. Dec 1, 2006 #15
    I can't, the most each vertex can have is 25 edges, through 26 different verticies. A maximum N of 676. If I were to make the node labels extend past the alphabet, then I could do that, but I don't have any samples that are this complicated.

    I said under the "Notes" that N was the number of edges for each node (both directions).

    I'm using three random maps that have been divised to test this problem.
  17. Dec 1, 2006 #16


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    I'm going to have to echo the criticism; your presentation really is quite imprecise. :frown:

    A correction for Orthodontist: for this problem, we only need to search the component of the graph containing one of the target nodes. Thus, a depth first search runs in O(E) time.

    My first thought, which runs in O(E) time, is essentially the minimal spanning tree algorithm. (a.k.a. a cycle-finding algorithm) I'll describe a DFS version, because it's technically simpler. (because our working set is always connected)

    You iterate through the edges, arranging them into a rooted tree. A given edge would break the tree property if and only if both endpoints already appear in our tree.

    In that case, the edge we just found must be part of a cycle. We walk up the tree to determine all of the vertices in the cycle, and then we merge all of those nodes into one single node. (It is impossible to separate any of these vertices by knocking out a network connected, so we can treat them as one big vertex)

    When this algorithm completes, we have either

    (1) Identified our two target nodes, thus indicating that they lie in a cycle and cannot be separated.

    (2) Discovered the unique path between our target nodes (by starting at each and walking up the tree), and have thus found the desired set of edges.

    (note: I'm assuming we can compute hash codes for nodes, and test them for equality, in O(1) time. Realistically, these operations should take [itex]\Omega(ln(V))[/itex] time.)
    Last edited: Dec 1, 2006
  18. Dec 1, 2006 #17
    What's difficult to understand, or what doesn't make sense?

    Oh no! You got it! Merge was exactly the word I was looking for. One of the functions in my code is called "mergeBranches", who's job it is to merge two nodes together.

    For anyone who was still trying to figure it out, pretend you didn't read that.
  19. Dec 1, 2006 #18


    User Avatar
    Science Advisor

    That's basically what I meant in my second post, though you are more detailed and precise.

    Edit: also you don't need to use a hash table for node equality, you can just number all of the nodes and use a bit array to indicate those you have visited, which really is O(1).
    Last edited: Dec 1, 2006
  20. Dec 1, 2006 #19


    User Avatar
    Science Advisor

    Ah yes, that does work, i should have given it a little more thought.
  21. Dec 2, 2006 #20


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    It still takes [itex]\Omega(\mathop{\text{lg}} V)[/itex] time to write down an index into the array. :tongue: (And unless the vertices come pre-numbered, you still have to do |V| hashcode computations and at least |V| equality comparisons!)

    Of course, we are getting into the range where such pedantry outlives its practical usefulness, because for any realistic problem, indices are going to fit into one computer word.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook