I need to write a function that reads words from a text file and

AI Thread Summary
To create a function that reads words from a text file and writes them to an output file while removing duplicates, it's essential to first ensure the words are correctly extracted from the input file. The discussion emphasizes the importance of understanding the format of the text file, such as whether words are separated by spaces, commas, or are listed one per line. A straightforward approach involves using a loop to read each word and checking against previously stored words to identify duplicates. Using a vector is one option, but it may be inefficient due to the need to compare each new word with all existing words. Instead, utilizing a Hashtable or dictionary can significantly enhance performance, allowing for faster lookups to determine if a word has already been encountered. This method involves splitting the text into an array of words and then iterating through them while maintaining a dictionary to track unique entries. This approach ensures that the final output contains only unique words, effectively addressing the problem of duplicate removal in text processing.
Nusc
I need to write a function that reads words from a text file and writes the words to an output text file while removing all duplicate words.



void remove_dup(string filename)
{
    vector <string> a;
    string word;
    ifstream infile(filename.c_str());
    if(infile.fail())
        exit(0);
    while(!infile.eof())
    {
        infile >> word;

I'm stuck.

Should I use if statement?

if( a.at(i) == a.at(i) )
a.erase(i,i); ?
else

{
 
Dealing with text is always a complicated task in programming, especially for beginners. How are the words formatted in the text file? Are they separated by a delimiter of some sort (e.g. comma, semicolon)? Or is it one word per line?

First, you need to make sure that you are actually getting your words into the vector.

A straightforward algorithm would be, using my own rough pseudo-code:

Code:
WhileNot (EndofFile)
  TempWord = GetNextWordFromTextFile()
  DuplicateWordTest = false
  ForEach (Word in Vector)
    If TempWord == Word
      DuplicateWordTest = true
      BreakForLoop
    EndIf
  EndFor

  If (DuplicateWordTest == false)
    Vector.AddWord(TempWord)
  EndIf
EndWhile

Now you will have a vector that is free from any duplicates.
 
Instead of a vector you should use a Hashtable or dictionary. This will give better performance, because for each word you don't have to check every previous word. For example, in C#:
Code:
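// Assumes `text` already holds the contents of the input file and that
// System and System.Collections.Generic are imported.
string[] words = text.Split(new char[]{ ' ' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, bool> index = new Dictionary<string, bool>(words.Length);

foreach(string word in words){
    if(index.ContainsKey(word)) continue;  // already seen: skip the duplicate
    index[word] = true;                    // remember the word
    Console.Write(" " + word);             // output only the first occurrence
}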
 