Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Need help designing an algorithm for similarities between words

  1. Feb 25, 2014 #1
    For fun, I'm trying to make a C++ program that takes a word from the user and comes up with an ordered list of suggested words, similar to the kind of thing you have on your cell phone when sending SMS messages.

    So far I have:
    • An std::vector<std::string> of the 5000 most common English words
    • An std::map<std::pair<char,char>,int> of the pairwise distances of keys on a standard computer keyboard. For instance, (A,A) → 0, (A,Q) → 1, (W,C) → 3, (Z,M) → 6.

    The idea is that when the user types in a word, it is checked against every one of the 5000 most common English words using some algorithm that uses my map. That map is supposed to help detect if a user hit a wrong key or two when typing in a word. Any ideas?
  2. jcsd
  3. Feb 25, 2014 #2


    User Avatar
    2017 Award

    Staff: Mentor

    What do you do with missing or additional letters?

    The Levenshtein distance could be interesting, you just have to apply weights to the character substitutions in some way.
  4. Feb 25, 2014 #3
    That's my problem -- I don't know!

    An easy distance formula for comparing words of the same length is to sum up the distances between letters at corresponding indices. For instance, ("rhe","the")→1+0+0=1, so "the" is going to be one of the top suggestions. But yea, I'm having trouble thinking of a general formula.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook