Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

I've reached a dead end in my algorithm

  1. Mar 25, 2014 #1
    I'm trying to write a function that takes as parameters two strings, textStart and textTarget, and prints out the steps necessary to transform transform the first into the second, with the allowed steps being:

    (1) Add a character somewhere in textStart
    (2) Remove a character somewhere textStart

    I've reached the point in the algorithm where I've made maps for the characters that need removed and added. For instance, if textStart="dude" and textTarget="deck", then my procedure gives needAdded['c']=1, needAdded['k']=1, needRemoved['u']=1, and needRemoved['d']=1.

    Now I need to use this information to figure out the proper removal and insertion steps. Let's deal with the removal steps first.

    In the example above, 1 'd' needs removed -- the 2nd one of 2 'd's. But how can I design an algorithm to figure out which of the 2 'd's to remove?

    For now, I'm just gonna go ahead and write an algorithm that removes the first occurrences of letters that need removed, and adds at the beginning the letters that need added, before finally sorting everything.

    (I understand the way of doing this with a tree and branch-and-bound algorithm, but I'm not using it. Maybe some other time.)
     
  2. jcsd
  3. Mar 25, 2014 #2

    jbunniii

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    In addition to the considerations you mentioned, what if you need to add or remove more than one instance of a letter? E.g. to transform "dude" to "cake", you will need to remove both 'd's. How do you represent that? Is it needRemoved['d'] = 2?

    Also, how do you represent a situation where the letters are all OK but need to be rearranged? E.g. transform "meat" to "team".

    Finally, you need a better specification of what you are trying to achieve. One way to transform "dude" to "cake" is to remove 'd', 'u', 'd', 'e', and add 'c', 'a', 'k', 'e'. If that is not an acceptable solution, then define what is acceptable.
     
  4. Mar 25, 2014 #3
     
    Last edited: Mar 25, 2014
  5. Mar 25, 2014 #4
    How are these items stored?

    The easiest solution would be to set these up as a matrix so that a1=d, a2=u, a3=d, a4=e
    Then, you can set your target as a matrix: b1=d, b2=e, b3=c, b4=k

    If b1=a1, then leave a1 alone and print "current word: a1 a2 a3 a4", else set a1 = b1 and print "substitute (value)a1 with (value)b1" then print, "current word: a1 a2 a3 a4"
    b1=a1, therefore print "current word: a1 a2 a3 a4" (at this point, it will still read D U D E"

    If b2=a2, then leave a2 alone and print "current word: a1 a2 a3 a4", else set a2 = b2 and print "substitute (value)a2 with (value)b2" then print, "current word: a1 a2 a3 a4"
    b2 =/= a2, therefore set a2 = b2 and print "substitute (value)a2 with (value)b2" then print, "current word: a1 a2 a3 a4"
    (at this point the word will be "D E D E"

    and so on
     
  6. Mar 25, 2014 #5
    Right, but if the words are of different length then some adding and/or subtracting of characters has to take place at some point.
     
  7. Mar 25, 2014 #6
    Ok, well I don't know what programming language you are using (I probably couldn't help with the specific syntax if I did) but the way I see it would be:

    You still use the matrix idea, but you make a base matrix of, say, 20 characters. a1, a2,...a20.
    Set the values of each node to their respective letters and set the others to zero.
    run the algorithm as normal and have in the loop a call out that says, "if An=Bn=0, stop the loop and print the resulting word
     
  8. Mar 26, 2014 #7
    Part of the problem is that, if the letters aren't in the same order, they don't count.

    Here's my shot at it.
    For example:
    A: "different"
    B: "traffic"

    1) Create four "Remnant lists" Ar, Br, As, and Bs. Each element in these lists is able to hold a letter and a position.
    2) Create a "Match list" M. Each element in this list is able to carry a letter and two letter positions.
    3) Copy the letter and its position of every letter in word A that matches a letter in word B into Ar.

    Result: Ar: i2 f3 f4 r6 t9

    4) Copy the letter and its position of every letter in word B that matches a letter in word A into Br.

    Result: Br: t1 r2 f3 f4 i5

    5) Create two indices iA and iB, one into "Ar" and one into "Br" and set both of them to 1 - representing the first letter in each list.
    6) Copy Ar[iA++] to the end of As (since As is empty, it will be the only element).

    Current State:
    iA:2; iB: 1; As: i2, Bs: empty
    Note that iA is 20% used and iB is 0% used.

    7) Judging from (iA-1)/sizeof(Ar) and (iB-1)/sizeof(Br), pick the list that is least used and copy the next character and position into the corresponding As or Bs list. If both are equally used, the choice is arbitrary - so move from Ar to As.

    On the first time through, the state will be:
    iA:2; iB: 2; As: i2, Bs: t1
    Note that now both iA and iB are 20% used.

    8) Compare the letter just moved in step 7, to each of the letters in the other As or Bs list.
    9) If no match is found, and either Ar or Br still has unused letters, loop to step 7.
    10) If a match is found, copy the information from the match from As and Bs to M.

    On the first time through, the state will be:
    Ar: i2 f3 f4 r6 t9
    Br: t1 r2 f3 f4 i5
    iA:4; iB: 4
    As: i2 f3 f4
    Bs: t1 r2 f3
    M: (f,3,3)

    11) Now remove the matching letters and all letters that preceded them from both As and Bs.

    On the first time through, the state will be:
    As: f4
    Bs: empty
    M: (f,3,3)

    12) If there are still unused characters in Ar or Br, loop to step 7.

    So, with "different" and "traffic" you get:
    Ar: i2 f3 f4 r6 t9
    Br: t1 r2 f3 f4 i5
    Which is processed into:
    M: (f,3,3), (f,4,4)

    Although there are other matches, they are too far out of order to be used.

    Since you're only keeping three letter from the original work, your first steps will be to remove letter. I would remove them from last to first:
    remove #9: differen
    remove #8: differe
    remove #7: differ
    remove #6: diffe
    remove #5: diff
    remove #2: dff
    remove #1: ff

    Now you would just build up the word "traffic":
    add "t" to 1: tff
    add "r" to 2: trff
    add "a" to 3: traff
    add "i" to 6: traffi
    add "c" to 7: traffic
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook