A question from my data structure project

  • Thread starter: kant
  • Tags: Data Project
SUMMARY

The discussion focuses on string parsing in C, specifically how to extract words from a text input while ignoring symbols. The user is attempting to implement a solution using the `scanf` function and a custom `AllocateName` function. Key functions mentioned include `strpbrk`, `strcspn`, and `strtok`, which are essential for effective string manipulation. The consensus is that string parsing is inherently complex and requires a thorough understanding of available string functions.

PREREQUISITES
  • Proficiency in C programming language
  • Understanding of string manipulation functions in C
  • Familiarity with memory allocation using malloc
  • Knowledge of regular expressions for advanced parsing
NEXT STEPS
  • Research the `strpbrk` function for character searching in strings
  • Explore the `strcspn` function for finding the length of a substring
  • Learn about the `strtok` function and its limitations in tokenizing strings
  • Investigate regular expressions in C for more sophisticated string parsing
USEFUL FOR

Software developers, particularly those working with C programming and string manipulation, as well as anyone involved in data processing and text analysis.

kant
I have this input file that contains words. I am supposed to scan this text and save each word into some data structure (not part of my question). My question is: how do I get the words but ignore the symbols? The text that is given contains symbols like - , : ; ( ) _ - + - ... etc. Here is what I have got so far:

while (fscanf(fpname, "%s", stun) == 1)
{
******
****
***
pname= AllocateName( stun);
*******
*****
***
*

}
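/* here is my allocate name function */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Strip one trailing '.', ',' or ':' from stun, then return a heap copy. */
char *AllocateName(char *stun)
{
    char *name;
    size_t num = strlen(stun);

    /* Only look at the last character if the string is not empty. */
    if (num > 0 && (stun[num - 1] == '.' || stun[num - 1] == ',' || stun[num - 1] == ':'))
    {
        stun[num - 1] = '\0';   /* end the string one character earlier */
    }
    /* Room for the characters plus the terminating '\0'. */
    if (!(name = malloc(strlen(stun) + 1)))
    {
        printf("problem allocating name\n");
        exit(2);
    }
    strcpy(name, stun);   /* copy the cleaned word into the new block */

    return name;
}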

Yes, yes... It only takes care of words that end with a punctuation mark like a comma or a period.
So it is not a perfect solution. There can be words like:

(log)(base4)(12) <-- considered one word

but not this:

I have the utter-most-hatred-this-lab, where "utter", "most", "hatred", "this", "lab" are considered individual words, without the god damn '-'. In other words, if I save "utter-most-hatred-this-lab" as a string in stun, then it must be broken up into some unknown number of pieces, each individually allocated on the heap! Unless I am to use [^!@#$%^&*()]. What is an elegant way to god damn do this??
 
Well, to be elegant you have to first stop swearing. :-p

Do you know the strpbrk function? Or strcspn? There's also strtok, but it has issues.
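
For what it's worth, here is a minimal sketch of how strcspn could do the splitting asked about above. It also leans on strspn, a standard function the thread does not mention, to skip over runs of delimiters. The next_word helper and the DELIMS set are made up for illustration; which characters count as symbols is an assumption, and parentheses are left out on purpose so that "(log)(base4)(12)" stays one word.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical delimiter set: adjust it to whatever the assignment treats
   as a symbol. Parentheses are deliberately not listed. */
static const char DELIMS[] = " \t\n-,.:;_+";

/* Copy the next word found at *cursor into fresh heap memory and advance
   *cursor past it. Returns NULL when no words remain (or if malloc fails).
   The caller is responsible for freeing the returned string. */
char *next_word(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);

    /* Skip any leading delimiters. */
    const char *start = *cursor + strspn(*cursor, DELIMS);

    /* Count characters up to the next delimiter or the end of the string. */
    size_t len = strcspn(start, DELIMS);
    if (len == 0) {
        *cursor = start;
        return NULL;            /* only the terminating '\0' is left */
    }

    char *word = malloc(len + 1);
    if (word == NULL)
        return NULL;
    memcpy(word, start, len);
    word[len] = '\0';

    *cursor = start + len;      /* next call resumes after this word */
    return word;
}
```

Calling next_word repeatedly on "utter-most-hatred-this-lab" hands back "utter", "most", "hatred", "this" and "lab" in turn, then NULL. The input string is never modified, so the same approach works on a string literal.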

There's a whole slew of useful string manipulation routines -- you should find a chapter in a book on them and just read it, or scan through all the manpages for them (they usually contain references to each other) to learn what they all do.
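
The reply does not say which issues it has in mind for strtok, but two well-known ones are that it overwrites the delimiters in the buffer you pass it, and that it keeps its position in hidden internal state, so two tokenizing loops cannot be interleaved. A small sketch of the first point follows; count_words_destructive is a hypothetical helper name, not something from the thread.

```c
#include <assert.h>
#include <string.h>

/* Count the words in buf using strtok. buf must be a writable array:
   strtok replaces the delimiter after each token with '\0', so the
   original text is chopped up in place. */
size_t count_words_destructive(char *buf, const char *delims)
{
    assert(buf != NULL && delims != NULL);

    size_t n = 0;
    /* First call passes the buffer; later calls pass NULL so strtok
       continues from its internally remembered position. */
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

After running this on a writable buffer holding "utter-most-hatred", the buffer reads as just "utter", because the first '-' has been overwritten with a terminator. On POSIX systems strtok_r avoids the hidden-state problem, though it still modifies the buffer.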

You can even do a google search for "man strpbrk" :biggrin:


But, the whole lesson is: string parsing is not elegant. Deal with it. :smile:
 