- #1
- 14,981
- 26
I have two very long strings of length s and t. I want to find every k-byte substring that occurs in both of them. What good ways are there to do that?
My current algorithm is to store each k-byte substring of the first in a hash table, then look up every k-byte substring of the second. A reasonable algorithm, but the O(k*s) memory requirement is getting to be bothersome.
On the other hand, I simply cannot afford the O(k*s*t) performance of the naive method.
For some reason, I'm sure there has to be a good algorithm for doing exactly this, but I can't remember it.
Now, I have thought that I might be able to build a finite state machine that recognizes k-byte substrings of s, and will probably only take up O(s) memory, and might even be faster than the hash table. But, I haven't worked out a way to build the state machine. (It defeats my purposes if I need more than O(s) memory to build the state machine!)
I'm sorry if any of these are easy questions... I'm fairly exhausted today. =)
My current algorithm is to store each k-byte substring of the first in a hash table, then look up every k-byte substring of the second. A reasonable algorithm, but the O(k*s) memory requirement is getting to be bothersome.
On the other hand, I simply cannot afford the O(k*s*t) performance of the naive method.
For some reason, I'm sure there has to be a good algorithm for doing exactly this, but I can't remember it.
Now, I have thought that I might be able to build a finite state machine that recognizes k-byte substrings of s, and will probably only take up O(s) memory, and might even be faster than the hash table. But, I haven't worked out a way to build the state machine. (It defeats my purposes if I need more than O(s) memory to build the state machine!)
I'm sorry if any of these are easy questions... I'm fairly exhausted today. =)