Unsolvable python code bug? (finding the difference between two input strings)

  • #1
Sucvicbil
5
0
TL;DR Summary
Finding differences between two bodies of text, difference output in order
All day I have been trying to find a method of finding the difference between two input (body of text), and get the difference but in order. I have tried breaking down the text into separate lines, then try to find the difference between both lines, but somehow this doesn't solve the problem and the code seems to keep outputting the difference but in the wrong order, can anybody help? this is my current code, it keeps outputting the difference but in the wrong order:
Python:
def compare_word_groups(group1, group2):
    words1 = group1.split()
    words2 = group2.split()

    removed_words = []
    i = j = 0
    while i < len(words1) and j < len(words2):
        if words1[ i].lower() == words2[j].lower():
            i += 1
            j += 1
        else:
            removed_words.append(words1[ i])
            i += 1
   
    # Add any remaining words from group1
    removed_words.extend(words1[ i:])

    print(f"[{', '.join(removed_words)}]")

Example usage

group1 = """should have known, like it was a red flag because for one he was asking to use my car. what's up y'all? welcome back to my channel. it's payden, and today i'm back with another video. today's video is going to be a very long overdue gardening wing stop mukbang. I've been meaning to do another Wingstop Mukbang for a long time. y'all, Something just told me: let me come and give y'all a story time while I eat some Wingstop. Today's video is going to be my story time on my worst situation shift, literally ever in my entire life. Y'all, I had to come and tell y'all this story time just because I went through so much and like y'all might already know what this video is going to be about, who this video is going to be about, i'm not going to say any names, but- and i would prefer if no one comments any names or mentions any names in the comment section- got this wing stop. i got my wing stop order. i'm so freaking hungry right now. y'all, it's about to be two in the morning, but i didn't let that stop me from coming and making this video for y'all. i did not want to do this video because the person who was involved with this story time is actually going to be really happy that i did this. this person is very happy when it comes to getting attention and throughout our relationship- which i'll explain a little bit more about the relationship that we had- i could just tell that they were trying to use me. i'm just going to tell you guys all the crazy stuff that happened just as quick as it started. it ended quickly as well. i'm also going to be getting lit in today's video. okay, i have this little pre-roll. let's pop my little pre-roll off camera. make sure you guys are subscribed on patreon. I don't want them to age restrict this video. They might age restrict it anyway. I always see you guys in the comments asking why I have to like bleep everything out and do all this extra stuff. It's basically just because YouTube is really strict. you guys Sorry that I can't show everything. If you guys want to, y'all can subscribe to Patreon if you can. If y'all want to see the extended version, just go on Patreon. Let's go ahead and get started. And by the time I come back, the food is going to be warmed up, And I warmed all my food up. Now let me tell you guys what I ordered. I ordered a lot of Parmesan cinders and honey hot tenders. i've never tried honey hot before. i actually hate trying new things, but i decided to try something new today. i'm mad that i even tried honey hot, because i didn't know it was dry rub. i thought it was going to be wet. if i would have not, i would have just got buffalo. this is buffalo and garlic parmesan- all flats, extra fried, crispy bone and wings. this is the honey hot and garlic parmesan tenders. and then i got some voodoo fries, but they didn't put no rant. i'ma put the ranch. they honestly pissed me off when i went to go and get this. and of course, y'all can't go wrong with the ranch and the honey mustard. i'm definitely not. as so y'all this story time. it goes back to like last year."""

group2 = """should have known, like it was a red flag because for one he was asking to use my car. it's payden, and today i'm back with another video. today's video is going to be a very long overdue gardening wing stop mukbang. I've been meaning to do another Wingstop Mukbang for a long time. y'all, Something just told me: let me come and give y'all a story time while I eat some Wingstop. Today's video is going to be my story time on my worst situation shift, literally ever in my entire life. Y'all, I had to come and tell y'all this story time just because I went through so much and like y'all might already know what this video is going to be about, who this video is going to be about, i'm not going to say any names, but- and i would prefer if no one comments any names or mentions any names in the comment section- got this wing stop. y'all, it's about to be two in the morning, but i didn't let that stop me from coming and making this video for y'all. i did not want to do this video because the person who was involved with this story time is actually going to be really happy that i did this. this person is very happy when it comes to getting attention and throughout our relationship- which i'll explain a little bit more about the relationship that we had- i could just tell that they were trying to use me. i'm just going to tell you guys all the crazy stuff that happened just as quick as it started. i'm also going to be getting lit in today's video. let's pop my little pre-roll off camera. They might age restrict it anyway. I always see you guys in the comments asking why I have to like bleep everything out and do all this extra stuff. It's basically just because YouTube is really strict. you guys Sorry that I can't show everything. If y'all want to see the extended version, just go on Patreon. Now let me tell you guys what I ordered. i actually hate trying new things, but i decided to try something new today. i thought it was going to be wet. this is the honey hot and garlic parmesan tenders. and then i got some voodoo fries, but they didn't put no rant. they honestly pissed me off when i went to go and get this. and of course, y'all can't go wrong with the ranch and the honey mustard. i'm definitely not. as so y'all this story time. it goes back to like last year."""

compare_word_groups(group1, group2)
 
Last edited by a moderator:
Technology news on Phys.org
  • #2
This is an exciting and somewhat complex subject.

Have you used the Linux diff command, i.e., placed the two texts in separate files and used the diff command to compare them?

https://en.wikipedia.org/wiki/Diff

The diff command uses the Hunt-Szymanski algorithm:

https://en.wikipedia.org/wiki/Hunt–Szymanski_algorithm

One common usage is in visual diff displays, as can be found in some programmer IDE tools. These tools display line-by-line differences, showing lines in one file, not in the other, or vice-versa.

---

I implemented something like this many years ago using two queues, one for each file. I would read the lines into their respective queues and compare the A and B queues, looking for two lines that matched.

I used a two-line comparison for alignment as a single-line comparison came up with less meaningful differences.

On no match, I would read a line in from each file and add to its respective queue.

On a match of two lines, I would print the non-matching lines of both queues and remove them from the queues.

It's akin to a left-lookahead scheme for diffing.

---

One thing that can make for a more helpful diff command is discovering when people have replicated a few lines repeatedly in the second text, as happens when using a text editor.

You might decide to add logging statements to some code and so you enter one line, copy it and then paste in several spots. To the diff algorithm they are just inserted lines not present in the older file. However with an improved diff, you could mark the lines as copies of a common line.
 
  • Like
Likes Swamp Thing and berkeman
  • #3
Don’t forget you don’t need Linux, difflib is a standard Python3 library.


The reason your algorithm won’t work is consider s1=“I threw the book at the dumb cat” s2= “I threw the book (about the boy at play with the childless cat lady) at the dumb cat” notice how your algorithm will match things inside the parens to outside in the other string. That’s why the longest matching substrings matter. It’s a super interesting space though, with ties to how search engines index, and LLMs if you go deep enough. If you want to play with it, a useful data structure is two linked lists in a tree. The trees will serve as indexes for each word, recording their instances of appearance. The linked list will tell what words come after or before with a word instance. Your job is to make a one to one map. So in the example above, “at” will occur once in s1, twice in s2. You can then see that the first occurrence in s2 is followed by “play” unlike in s1, but the second is followed by “the cat” like in s1, so that’s the one that should be mapped to. The thing is everything having to do with this has applications in a million other places.
 
  • Like
Likes PhDeezNutz and jedishrfu

Similar threads

Back
Top