How to Sort Word Frequencies in a C Program?

  • Thread starter: lynk26
  • Tags: Array, Sorting
SUMMARY

This discussion focuses on developing a C program to count word frequencies from a text file and output the results to another file. The program reads the input file, counts occurrences of each word using a custom structure, and requires sorting the results by frequency before writing them to the specified output file. Key functions mentioned include fopen, fgets, and qsort for file handling and sorting operations.

PREREQUISITES
  • Understanding of C programming language
  • Familiarity with file I/O operations in C
  • Knowledge of structures and arrays in C
  • Basic understanding of sorting algorithms, specifically qsort
NEXT STEPS
  • Implement the qsort function to sort the array of word structures by frequency
  • Research how to create a comparison function for qsort that compares count fields
  • Learn about dynamic memory allocation in C for handling larger datasets
  • Explore error handling techniques for file operations in C
USEFUL FOR

C programmers, students working on data processing assignments, and anyone interested in text analysis and frequency counting algorithms.

lynk26

Homework Statement



Develop a C program to count how many times each word appears in a large text file. Your program must read words from a file and output the number of times each word in the file appears.
1. Implement an algorithm to count how many times each word appears in a large text file.
2. Ask the user for the name of the file whose words are to be counted.
3. The output of the program is a file. The program must ask for the name of this file.
4. The output file should be ordered by word frequency.

Homework Equations



N/A

The Attempt at a Solution



Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h> /* needed for bool, true and false when compiled as C */

typedef struct WORD
{
    char wordname[50];
    int count;
}word_t;

word_t new_word(char* words)
{
    word_t new_w;
    strcpy (new_w.wordname, words);
    return new_w;
}

int main(int argc, char *argv[])
{
    FILE *fp;
    FILE* pInput;
    FILE* pOutput;
    char buffer[50];
    char buffer1;
    char filename[50];
    char output[50];
    char array1[1000][30];
    char* tempword;
    int i = 0;
    int j = 0;
    int numcount = 0;
    int i1 = 0;
    int j1 = 0;
    int tempcount;
    int char_count = 0;
    int word_count = 1;
    bool check = false;
    bool inlist;

    printf("Enter the name of the file to open (include the .txt): ");
    gets(filename);
    printf("Enter the name of the output file (include the .txt): ");
    gets(output);

    pInput = fopen(filename, "r");
    pOutput = fopen("temporary.txt","w");
    fp = fopen(output, "w");
    system("cls");

    while (buffer1 != EOF)
    {
        buffer1 = fgetc(pInput);
        if (buffer1 != ' ')
        {
            array1[j1][i1] = buffer1;
            fprintf(pOutput, "%c", array1[j1][i1]);
            i1++;
            if (buffer1 != '\n' && buffer1 != '\0')
            {
                char_count++;
            }
            if (buffer1 == '\n' || buffer1 == '\0')
            {
                word_count++;
            }
        }
        else
        {
            if (buffer1 == ' ')
            {
                word_count++;
            }
            j1++;
            i1 = 0;
            if (buffer1 == '\0')
                fprintf(pOutput, '\0');
            else
                fprintf(pOutput, "\n");
        }
    }
    fclose(pInput);
    fclose(pOutput);

    fp = fopen("temporary.txt", "r");

    word_t words[word_count];

    for (i = 0; i < word_count; i++)
    {
        fgets (buffer, 50, fp);
        words[i] = new_word(buffer);
        printf("words[%d] is %s", i, words[i].wordname);
    }
    fclose (fp);
    printf("\n");
    tempcount = word_count;
    for (j = 0; j < tempcount; j++)
    {
        tempword = words[j].wordname;
        for (i = 0; i < word_count; i++)
        {
            if (strcmp(tempword, words[i].wordname) == 0)
            {
                numcount++;
            }
        }

        words[j].count = numcount;
        numcount = 0;
    }
    printf ("\nPrinting results...\n\n");
    printf("\n#\tWord\n");
    printf("-\t----\t\n");

    for (j = 0; j < tempcount; j++)
    {
        for (i = 0; i < j; i++)
        {
            if (strcmp(words[j].wordname, words[i].wordname) == 0)
            {
                inlist = true;
                break;
            }
            else
            {
                inlist = false;
            }
        }
        if (inlist == false)
        {
            printf ("%d\t%s", words[j].count, words[j].wordname);
            fprintf(fp, "%d\t%s", words[j].count, words[j].wordname);
        }
    }
    printf("\n");
    return 0;
}

I need help with the sorting of the structs, and outputting the sorted structs to a file.

Here's the sample text file that I've been working with:

in the beginning god created the heaven and the earth
and the Earth was without form and void and darkness was upon the face
of the deep and the spirit of god moved upon the face of the waters
and god said let there be light and there was light
and god saw the light that it was good and god divided the light from
the darkness
and god called the light day and the darkness he called night and the
evening and the

I can read the file, can count the number of times each word appears, but I'm not quite sure how to use the qsort function to sort an array of structs, nor does the output want to go into the specified file. I've looked all over the internet but still can't seem to understand how it works. Can anyone show me how to at least get the sorting done?
 
For the sorting part, qsort() is set up to sort an arbitrary array. One of the parameters is a pointer to a comparison function that you supply. The comparison function will need to decide whether the frequency of the word in words[i].wordname is "less than" the frequency of the word in words[j].wordname. At this level, all you need to do is compare words[i].count with words[j].count. The word with the higher frequency should go before the word with the lower frequency, I would think.
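
As a rough sketch of what such a comparison function might look like, assuming the word_t struct from the original post: qsort() passes the callback two const void pointers to array elements, and the callback returns a negative, zero, or positive value. The name compare_by_count_desc and the alphabetical tie-break are my own choices for illustration, not something the assignment requires.

```c
#include <stdlib.h>
#include <string.h>

typedef struct WORD
{
    char wordname[50];
    int count;
} word_t;

/* Hypothetical comparison callback for qsort().
 * Orders entries by count, highest first; entries with equal
 * counts are ordered alphabetically by wordname. */
int compare_by_count_desc(const void *a, const void *b)
{
    const word_t *wa = (const word_t *)a;
    const word_t *wb = (const word_t *)b;

    if (wa->count != wb->count)
        return (wa->count < wb->count) ? 1 : -1; /* larger count sorts earlier */

    return strcmp(wa->wordname, wb->wordname);   /* tie: alphabetical order */
}

/* Typical call, where n is the number of filled entries in words:
 *
 *     qsort(words, n, sizeof words[0], compare_by_count_desc);
 */
```

Comparing the counts explicitly, rather than returning wb->count - wa->count, avoids any chance of integer overflow. The tie-break on wordname also puts repeated entries for the same word next to each other, which is convenient if the array still holds one entry per occurrence.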

Once you have the array sorted by frequency, print (using fprintf) the words and their frequencies to the output file.
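
On the output not reaching the requested file: in the posted code, fp is opened on the output name, then reassigned to temporary.txt and closed before the final fprintf calls, so those writes go to a closed stream. A minimal sketch of the writing step follows, with some assumptions of mine: the helper name write_frequencies is hypothetical, the array was sorted by a comparison function that also breaks ties on the word (so repeated entries are adjacent), and each wordname has had its trailing newline removed.

```c
#include <stdio.h>
#include <string.h>

typedef struct WORD
{
    char wordname[50];
    int count;
} word_t;

/* Hypothetical helper: writes one "count<TAB>word" line per distinct word
 * to the file named by path. Returns the number of lines written, or -1
 * if the file could not be opened. */
int write_frequencies(const char *path, const word_t *words, size_t n)
{
    FILE *out = fopen(path, "w");
    int written = 0;

    if (out == NULL)
        return -1; /* report the failure instead of writing to a bad stream */

    for (size_t i = 0; i < n; i++)
    {
        /* Skip an entry that repeats the previous word; this relies on the
         * assumption that duplicates are adjacent after sorting. */
        if (i > 0 && strcmp(words[i].wordname, words[i - 1].wordname) == 0)
            continue;

        fprintf(out, "%d\t%s\n", words[i].count, words[i].wordname);
        written++;
    }

    fclose(out); /* close only after all lines have been written */
    return written;
}
```

Giving the output file its own FILE pointer, opened right before writing and closed right after, keeps it from being confused with the temporary file's stream.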
 
