
I need to write a function that reads words from a text file and

  1. Apr 15, 2007 #1
    I need to write a function that reads words from a text file and writes them to an output text file, removing all duplicate words.



    void remove_dup(string filename)

    {
    vector <string> a;
    string word;
    ifstream infile(filename.c_str());
    if(infile.fail())
    exit(0);
    while(!infile.eof(0))
    {
    infile >> word;

    I'm stuck.

    Should I use an if statement?

    if( a.at(i) == a.at(i) )
    a.erase(i,i); ?
    else

    {
     
  3. Apr 19, 2007 #2

    mezarashi

    Homework Helper

    Dealing with text is always a complicated task in programming, especially for beginners. How are the words formatted in the text file? Are they separated by a delimiter of some sort (e.g. comma, semicolon), or is it one word per line?

    First, make sure that your words are actually getting into the vector.

    A straightforward algorithm would be, using my own rough pseudo-code:

    Code (Text):

    WhileNot (EndOfFile)
      TempWord = GetNextWordFromTextFile()
      DuplicateWordTest = false      // reset the flag for each new word
      ForEach (Word in Vector)
        If (TempWord == Word)
          DuplicateWordTest = true
          BreakForLoop
        EndIf
      EndFor

      If (DuplicateWordTest == false)
        Vector.AddWord(TempWord)
      EndIf
    EndWhile
     
    Now you will have a vector that is free of duplicates, which you can then write to the output file.
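
    For reference, here is a minimal C++ sketch of that vector-based approach. It assumes the words are separated by whitespace; the function names unique_words and write_words are made up for illustration and are not from the original post.

    ```cpp
    #include <algorithm>
    #include <istream>
    #include <ostream>
    #include <string>
    #include <vector>

    // Hypothetical helper: reads whitespace-separated words from `in` and
    // returns them in first-seen order with duplicates dropped.
    std::vector<std::string> unique_words(std::istream& in) {
        std::vector<std::string> seen;
        std::string word;
        // Looping on the extraction itself stops cleanly at end of input.
        while (in >> word) {
            // Linear scan over the words kept so far.
            if (std::find(seen.begin(), seen.end(), word) == seen.end()) {
                seen.push_back(word);
            }
        }
        return seen;
    }

    // Hypothetical helper: writes one word per line to `out`.
    void write_words(const std::vector<std::string>& words, std::ostream& out) {
        for (const std::string& w : words) {
            out << w << '\n';
        }
    }
    ```

    To use it with files, pass a std::ifstream opened on the input file to unique_words and a std::ofstream opened on the output file to write_words. Each new word is compared against every word kept so far, so this is fine for small files but slows down as the number of distinct words grows.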
     
  4. Apr 19, 2007 #3

    -Job-

    Science Advisor

    Instead of a vector, you could use a hash table (dictionary). This gives better performance because you don't have to compare each new word against every previous word; a lookup takes roughly constant time on average. For example, in C#:
    Code (Text):

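    // 'text' is assumed to hold the full contents of the input file.
    string[] words = text.Split(new char[] { ' ', '\t', '\r', '\n' },
                                StringSplitOptions.RemoveEmptyEntries);
    Dictionary<string, bool> index = new Dictionary<string, bool>(words.Length);

    foreach (string word in words) {
        // Print a word only the first time it is seen.
        if (!index.ContainsKey(word)) {
            index[word] = true;
            Console.Write(word + " ");
        }
    }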
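
    Since the original question is in C++, the same idea could be sketched with std::unordered_set (C++11). This is only an illustration, assuming whitespace-separated words; the function name copy_unique_words is hypothetical.

    ```cpp
    #include <cstddef>
    #include <istream>
    #include <ostream>
    #include <string>
    #include <unordered_set>

    // Hypothetical helper: copies words from `in` to `out`, one per line,
    // skipping any word that has already been written.
    // Returns the number of words written.
    std::size_t copy_unique_words(std::istream& in, std::ostream& out) {
        std::unordered_set<std::string> seen;
        std::string word;
        std::size_t written = 0;
        while (in >> word) {
            // insert() returns a pair; .second is true only for a new word.
            if (seen.insert(word).second) {
                out << word << '\n';
                ++written;
            }
        }
        return written;
    }
    ```

    Pass a std::ifstream and a std::ofstream to work directly on files. The set only answers "have I seen this word before?", so the words reach the output in the order they first appear in the input.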
     
     