C program to mimic wc command in UNIX

  • Thread starter: Gagan A
  • Tags: Program, Unix
SUMMARY

The forum discussion centers on a C program designed to mimic the UNIX 'wc' command by counting the characters, lines, and words in a text file. The initial implementation counts characters and lines correctly but usually reports more words than 'wc', because it treats only spaces and newlines as word delimiters and ignores other whitespace characters such as tabs and carriage returns. The suggested solution is to use the isspace() function from ctype.h to identify word delimiters. A specific test case, a word followed by a space and a newline, also shows the custom program counting three words where 'wc' counts two.

PREREQUISITES
  • Understanding of C programming language
  • Familiarity with file handling in C
  • Knowledge of character classification functions in ctype.h
  • Basic concepts of word counting algorithms
NEXT STEPS
  • Use the isspace() function from ctype.h in the C program to improve word counting accuracy
  • Explore additional test cases to validate the program against various whitespace scenarios
  • Study the UNIX 'wc' command to understand its word counting logic
  • Learn about other character classification functions in ctype.h for enhanced text processing
USEFUL FOR

C programmers, software developers, and anyone interested in text processing and command-line utilities will benefit from this discussion.

Gagan A
I did it the following way. The character and line counts come out fine, but the word count is usually higher than the one wc reports.

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int words = 0, chars = 0, lines = 0;
    char prev = '\0', curr; // prev is included to exclude multiple spaces; initialized so the first comparison is well defined.

    fp = fopen("input.txt", "r");
    if (fp == NULL) {
        perror("input.txt");
        return 1;
    }
    while (fscanf(fp, "%c", &curr) == 1) {
        chars++;
        if (curr == '\n')
            lines++;
        // prev comes into play here: if the current char is a space and the
        // previous was also a space, then it should not be counted.
        if ((curr == ' ' && prev != ' ') || (curr == '\n' && prev != '\n'))
            words++;
        prev = curr;
    }
    fclose(fp);
    printf("%d %d %d\n", chars, lines, words);
    return 0;
}
 
Use isspace() declared in ctype.h

HiHo!

The mistake is that you have not included the other whitespace characters
(e.g., '\t' and '\r') as characters that delimit a word.
So, instead of (curr==' ' && prev!=' ') || (curr=='\n' && prev!='\n'), you
should use the is* classification functions (e.g., isspace()) declared in ctype.h.

Regards,
Eus
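
As a rough illustration of the isspace() suggestion above, here is a minimal sketch in C. The names struct counts and count_stream are hypothetical helpers, not code from the thread, and counting a word at each transition from whitespace to non-whitespace is an assumption about how to apply isspace(), not a statement of how wc itself is implemented.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper type (not from the thread): the three tallies. */
struct counts {
    long chars;
    long lines;
    long words;
};

/* Hypothetical helper: reads fp to end of file and tallies characters,
 * newlines, and words. A word is counted when a non-whitespace character
 * follows whitespace (or the start of the stream). */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int ch;          /* int rather than char, so EOF stays distinguishable */
    int in_word = 0; /* nonzero while the previous char was non-whitespace */

    while ((ch = fgetc(fp)) != EOF) {
        c.chars++;
        if (ch == '\n')
            c.lines++;
        /* fgetc returns the byte as an unsigned char value, which is the
         * range isspace() expects. */
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            c.words++;
        }
    }
    return c;
}
```

Because isspace() covers space, '\t', '\n', '\v', '\f', and '\r' in the default "C" locale, tabs and carriage returns no longer get folded into neighboring words.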
 
Flawed algorithm

HiHo!

Oh, one more thing, you have not considered a test case like this one below.
people<SPACE><ENTER>people.
wc will count that as two words but yours will count that as three words.

Regards,
Eus
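
The test case above can be checked with a small sketch that runs both counting rules on the same string. The function names are hypothetical, count_words_original only restates the condition from the first post, and the input is assumed to end with a newline (which is what produces the count of three).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical restatement of the first post's rule: count a word whenever
 * a space or newline does not repeat the previous character. */
int count_words_original(const char *s)
{
    int words = 0;
    char prev = '\0';

    for (size_t i = 0; s[i] != '\0'; i++) {
        char curr = s[i];
        if ((curr == ' ' && prev != ' ') || (curr == '\n' && prev != '\n'))
            words++;
        prev = curr;
    }
    return words;
}

/* Hypothetical alternative: count a word at each transition from
 * whitespace (or start of input) to non-whitespace. */
int count_words_by_starts(const char *s)
{
    int words = 0;
    int in_word = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Cast to unsigned char: passing a negative char to isspace()
         * is undefined behavior. */
        if (isspace((unsigned char)s[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

For the input "people \npeople\n", count_words_original returns 3, because the space and the newline that follow the first word are different characters and each one is counted, while count_words_by_starts returns 2.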
 
