Need help designing an algorithm for similarities between words

In summary, the conversation discusses the creation of a C++ program that generates a list of suggested words based on a user-inputted word. The program uses an std::vector<std::string> of the 5000 most common English words and an std::map<std::pair<char,char>,int> of pairwise distances on a standard computer keyboard to check for potential typos or errors. The use of the Levenshtein distance and a general formula for comparing words of the same length are also mentioned as potential ideas for improving the program. The issue of handling missing or additional letters in the user's input is also brought up.
  • #1
Jamin2112
986
12
For fun, I'm trying to make a C++ program that takes a word from the user and comes up with an ordered list of suggested words, similar to the kind of thing you have on your cell phone when sending SMS messages.

So far I have:
  • An std::vector<std::string> of the 5000 most common English words
  • An std::map<std::pair<char,char>,int> of the pairwise distances of keys on a standard computer keyboard. For instance, (A,A) → 0, (A,Q) → 1, (W,C) → 3, (Z,M) → 6.

The idea is that when the user types in a word, it is checked against every one of the 5000 most common English words using some algorithm that uses my map. That map is supposed to help detect if a user hit a wrong key or two when typing in a word. Any ideas?
 
Technology news on Phys.org
  • #2
What do you do with missing or additional letters?

The Levenshtein distance could be interesting, you just have to apply weights to the character substitutions in some way.
 
  • #3
mfb said:
What do you do with missing or additional letters?

That's my problem -- I don't know!

An easy distance formula for comparing words of the same length is to sum up the distances between letters at corresponding indices. For instance, ("rhe","the")→1+0+0=1, so "the" is going to be one of the top suggestions. But yea, I'm having trouble thinking of a general formula.
 

1. What is an algorithm?

An algorithm is a step-by-step procedure or formula used to solve a problem or complete a task. In the context of comparing similarities between words, an algorithm would be a set of instructions for determining the level of similarity between two words.

2. How is an algorithm for similarities between words designed?

An algorithm for similarities between words is designed by breaking down the problem into smaller, more manageable steps and then coming up with a logical sequence of instructions to perform those steps. This process involves analyzing the problem, identifying the key variables and relationships, and then translating those into a series of steps that a computer can follow.

3. What factors should be considered when designing an algorithm for similarities between words?

When designing an algorithm for similarities between words, it is important to consider factors such as the definition of "similarity", the type of data being compared (e.g. words, phrases, sentences), the size of the dataset, and the desired level of accuracy. Additionally, the algorithm should be designed to handle different types of input and account for potential errors or exceptions.

4. Are there different types of algorithms for comparing similarities between words?

Yes, there are different types of algorithms for comparing similarities between words. Some common approaches include cosine similarity, Jaccard similarity, and Levenshtein distance. Each of these algorithms uses a different method for measuring similarity and may be more suitable for certain types of data or applications.

5. How can I evaluate the effectiveness of my algorithm for similarities between words?

The effectiveness of an algorithm for similarities between words can be evaluated by testing it on a variety of different datasets and comparing the results to a known standard. Additionally, you can also evaluate the efficiency of the algorithm by measuring the time and resources it takes to complete the task. It is important to continuously assess and refine the algorithm to improve its performance.

Similar threads

  • Programming and Computer Science
3
Replies
75
Views
4K
  • Programming and Computer Science
Replies
3
Views
3K
  • Engineering and Comp Sci Homework Help
Replies
8
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
5
Views
2K
  • DIY Projects
Replies
8
Views
245
  • Programming and Computer Science
Replies
3
Views
318
  • Engineering and Comp Sci Homework Help
Replies
17
Views
5K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
626
  • Programming and Computer Science
Replies
1
Views
2K
  • Programming and Computer Science
Replies
29
Views
3K
Back
Top