How to Sort Word Frequencies in a C Program?

  • Thread starter: lynk26
  • Tags: Array, Sorting
AI Thread Summary
The discussion focuses on developing a C program to count word frequencies from a text file and output the results to another file. Key tasks include reading the input file, counting occurrences of each word, and sorting the results by frequency using the `qsort` function. Participants express challenges with implementing the sorting mechanism and correctly outputting the sorted data to the specified file. A comparison function is suggested to facilitate sorting based on word frequency, ensuring that higher frequencies appear first in the output. The thread emphasizes the need for clarity in using `qsort` and managing file operations effectively.
lynk26

Homework Statement



Develop a C program to count how many times each word appears in a large text file. Your program must read words from a file and output the number of times each word appears.
1. Implement an algorithm to count how many times each word appears in a large text file.
2. Ask for the name of the file whose words are to be counted.
3. The output of the program is a file. The program must ask for the name of this file.
4. The output file should be ordered by word frequency.

Homework Equations



N/A

The Attempt at a Solution



Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h> /* for bool, true, false */

typedef struct WORD
{
    char wordname[50];
    int count;
}word_t;

word_t new_word(char* words)
{
    word_t new_w;
    strcpy (new_w.wordname, words);
    return new_w;
}

int main(int argc, char *argv[])
{
    FILE *fp;
    FILE* pInput;
    FILE* pOutput;
    char buffer[50];
    char buffer1;
    char filename[50];
    char output[50];
    char array1[1000][30];
    char* tempword;
    int i = 0;
    int j = 0;
    int numcount = 0;
    int i1 = 0;
    int j1 = 0;
    int tempcount;
    int char_count = 0;
    int word_count = 1;
    bool check = false;
    bool inlist;

    printf("Enter the name of the file to open (include the .txt): ");
    gets(filename);
    printf("Enter the name of the output file (include the .txt): ");
    gets(output);

    pInput = fopen(filename, "r");
    pOutput = fopen("temporary.txt","w");
    fp = fopen(output, "w");
    system("cls");

    while (buffer1 != EOF)
    {
        buffer1 = fgetc(pInput);
        if (buffer1 != ' ')
        {
            array1[j1][i1] = buffer1;
            fprintf(pOutput, "%c", array1[j1][i1]);
            i1++;
            if (buffer1 != '\n' && buffer1 != '\0')
            {
                char_count++;
            }
            if (buffer1 == '\n' || buffer1 == '\0')
            {
                word_count++;
            }
        }
        else
        {
            if (buffer1 == ' ')
            {
                word_count++;
            }
            j1++;
            i1 = 0;
            if (buffer1 == '\0')
                fprintf(pOutput, '\0');
            else
                fprintf(pOutput, "\n");
        }
    }
    fclose(pInput);
    fclose(pOutput);

    fp = fopen("temporary.txt", "r");

    word_t words[word_count];

    for (i = 0; i < word_count; i++)
    {
        fgets (buffer, 50, fp);
        words[i] = new_word(buffer);
        printf("words[%d] is %s", i, words[i].wordname);
    }
    fclose (fp);
    printf("\n");
    tempcount = word_count;
    for (j = 0; j < tempcount; j++)
    {
        tempword = words[j].wordname;
        for (i = 0; i < word_count; i++)
        {
            if (strcmp(tempword, words[i].wordname) == 0)
            {
                numcount++;
            }
        }

        words[j].count = numcount;
        numcount = 0;
    }
    printf ("\nPrinting results...\n\n");
    printf("\n#\tWord\n");
    printf("-\t----\t\n");

    for (j = 0; j < tempcount; j++)
    {
        for (i = 0; i < j; i++)
        {
            if (strcmp(words[j].wordname, words[i].wordname) == 0)
            {
                inlist = true;
                break;
            }
            else
            {
                inlist = false;
            }
        }
        if (inlist == false)
        {
            printf ("%d\t%s", words[j].count, words[j].wordname);
            fprintf(fp, "%d\t%s", words[j].count, words[j].wordname);
        }
    }
    printf("\n");
    return 0;
}

I need help sorting the structs and writing the sorted structs to a file.

Here's the sample text file that I've been working with:

in the beginning god created the heaven and the earth
and the Earth was without form and void and darkness was upon the face
of the deep and the spirit of god moved upon the face of the waters
and god said let there be light and there was light
and god saw the light that it was good and god divided the light from
the darkness
and god called the light day and the darkness he called night and the
evening and the

I can read the file and count the number of times each word appears, but I'm not sure how to use the qsort function to sort an array of structs, and the output does not end up in the specified file. I've looked all over the internet but still can't understand how qsort works. Can anyone show me how to at least get the sorting done?
 
For the sorting part, qsort() is set up to sort an arbitrary array. One of its parameters is a pointer to a comparison function that you supply. The comparison function needs to decide whether the frequency of the word in words[i].wordname is "less than" the frequency of the word in words[j].wordname. At this level, all you need to do is compare words[i].count with words[j].count. I would think the word with the higher frequency should go before the word with the lower frequency.

Once you have the array sorted by frequency, print (using fprintf) the words and their frequencies to the output file.
 
