Need help designing an algorithm for similarities between words

Jamin2112
For fun, I'm trying to make a C++ program that takes a word from the user and comes up with an ordered list of suggested words, similar to the kind of thing you have on your cell phone when sending SMS messages.

So far I have:
  • An std::vector<std::string> of the 5000 most common English words
  • An std::map<std::pair<char,char>,int> of the pairwise distances of keys on a standard computer keyboard. For instance, (A,A) → 0, (A,Q) → 1, (W,C) → 3, (Z,M) → 6.
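
For illustration, here is one way such a map could be built. This is only a sketch under an assumption that is not stated in the post: the keyboard is treated as a plain grid of three letter rows, and the distance is the Manhattan distance (row difference plus column difference). That assumption happens to reproduce the four example values above, but the real map may have been built differently. The function name buildKeyDistances is hypothetical.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a key-distance map like the one described above.
// ASSUMPTION: keys sit on an unstaggered grid and distance is Manhattan distance.
std::map<std::pair<char, char>, int> buildKeyDistances() {
    const std::vector<std::string> rows = {"QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"};

    struct Key { char letter; int row; int col; };
    std::vector<Key> keys;
    for (int r = 0; r < static_cast<int>(rows.size()); ++r) {
        for (int c = 0; c < static_cast<int>(rows[r].size()); ++c) {
            keys.push_back(Key{rows[r][c], r, c});
        }
    }

    // Store every ordered pair, so lookups never need to swap the two letters.
    std::map<std::pair<char, char>, int> dist;
    for (const Key& a : keys) {
        for (const Key& b : keys) {
            dist[std::make_pair(a.letter, b.letter)] =
                std::abs(a.row - b.row) + std::abs(a.col - b.col);
        }
    }
    return dist;
}
```

With this grid, (A,Q) comes out as 1, (W,C) as 3 and (Z,M) as 6, matching the examples. A physically staggered layout would give somewhat different numbers.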


The idea is that when the user types in a word, it is checked against every one of the 5000 most common English words using some algorithm that uses my map. That map is supposed to help detect if a user hit a wrong key or two when typing in a word. Any ideas?
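
Whatever distance ends up being used, the "check against every word" step could look roughly like the sketch below. It takes the distance as a parameter, since the thread has not settled on one. The names suggest and maxResults are hypothetical, and keeping dictionary order for ties is an assumption (useful if the word list is sorted by frequency).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: scores every dictionary word against the input and
// returns the closest ones, smallest distance first.
std::vector<std::string> suggest(
    const std::string& input,
    const std::vector<std::string>& dictionary,
    const std::function<int(const std::string&, const std::string&)>& distance,
    std::size_t maxResults = 5) {
    typedef std::pair<int, std::string> Scored;
    std::vector<Scored> scored;
    scored.reserve(dictionary.size());
    for (const std::string& word : dictionary) {
        scored.push_back(Scored(distance(input, word), word));
    }

    // stable_sort keeps the original dictionary order among equal scores.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& x, const Scored& y) { return x.first < y.first; });

    std::vector<std::string> result;
    for (std::size_t i = 0; i < scored.size() && i < maxResults; ++i) {
        result.push_back(scored[i].second);
    }
    return result;
}
```

For 5000 words a full scan and sort like this is cheap, so no index structure is needed at this scale.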
 
What do you do with missing or additional letters?

The Levenshtein distance could be interesting; you just have to apply weights to the character substitutions in some way.
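
A minimal sketch of that idea: the usual Levenshtein dynamic program, with the substitution cost supplied by the caller (for example, a lookup in a keyboard-distance map) and a fixed cost for each inserted or deleted letter. The name weightedLevenshtein and the gapCost parameter are assumptions made for illustration; how large gapCost should be relative to key distances is a tuning choice the thread does not settle.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Weighted Levenshtein distance between a and b.
// substitutionCost(x, y): cost of typing x where y was meant (should be 0 if x == y).
// gapCost: cost of one missing or extra letter (an assumed constant).
int weightedLevenshtein(const std::string& a, const std::string& b,
                        const std::function<int(char, char)>& substitutionCost,
                        int gapCost) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    // dp[i][j] = cheapest way to turn the first i letters of a into the first j of b.
    std::vector<std::vector<int> > dp(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) dp[i][0] = static_cast<int>(i) * gapCost;
    for (std::size_t j = 1; j <= m; ++j) dp[0][j] = static_cast<int>(j) * gapCost;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int substitute = dp[i - 1][j - 1] + substitutionCost(a[i - 1], b[j - 1]);
            const int removeLetter = dp[i - 1][j] + gapCost;  // extra letter in a
            const int insertLetter = dp[i][j - 1] + gapCost;  // letter missing from a
            dp[i][j] = std::min({substitute, removeLetter, insertLetter});
        }
    }
    return dp[n][m];
}
```

With a substitution cost of 1 for every mismatch and gapCost = 1, this reduces to the ordinary Levenshtein distance. With key distances as substitution costs, a slip onto a neighbouring key is cheap while a missing or extra letter costs gapCost, which answers the question about missing or additional letters in one formula.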
 
mfb said:
What do you do with missing or additional letters?

That's my problem -- I don't know!

An easy distance formula for comparing words of the same length is to sum up the keyboard distances between letters at corresponding indices. For instance, ("rhe","the")→1+0+0=1, so "the" is going to be one of the top suggestions. But yeah, I'm having trouble thinking of a general formula that also handles words of different lengths.
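
The same-length formula can be sketched as follows, using the std::map<std::pair<char,char>,int> type from the first post. The function name sameLengthDistance, the -1 return value for "not comparable", and the fallback to the swapped pair are assumptions for illustration. The letters passed in must use the same case as the keys stored in the map.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::pair<char, char>, int> KeyDistances;

// Sums the key distances between letters at corresponding indices.
// Returns -1 if the words differ in length or a needed pair is not in the map.
int sameLengthDistance(const std::string& a, const std::string& b,
                       const KeyDistances& dist) {
    if (a.size() != b.size()) return -1;

    int total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i]) continue;  // identical letters contribute 0

        KeyDistances::const_iterator it = dist.find(std::make_pair(a[i], b[i]));
        if (it == dist.end()) {
            // ASSUMPTION: distances are symmetric, so try the swapped pair.
            it = dist.find(std::make_pair(b[i], a[i]));
        }
        if (it == dist.end()) return -1;
        total += it->second;
    }
    return total;
}
```

Given a map containing (r,t) → 1, sameLengthDistance("rhe", "the", dist) returns 1, as in the example. The limitation is exactly the one raised in the thread: "te" and "the" cannot be compared at all.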
 