Character array command to find # of specific word in a paragraph

  • Thread starter: gfd43tg
AI Thread Summary
The discussion focuses on how to count the occurrences of specific words in a character array representing the Gettysburg Address. Users are guided to use the command `strfind` to locate the starting indices of the word 'that' within the array. One participant initially struggles with the concept but eventually realizes that the output from `strfind` indicates the starting positions of the word, leading to the conclusion that 'that' appears 13 times. The conversation emphasizes understanding the output of commands in MATLAB for effective word counting. Overall, the thread illustrates the process of using MATLAB functions to analyze text data.
gfd43tg

Homework Statement


Download the mat file Gettysburg.mat and load it into the workspace using the commands

clear;
load Gettysburg;

This loads a 1-by-1452 char array GA that is the Gettysburg Address in English. Display the first 100 or so characters for yourself to see this.

In this array, how many times does the char array 'that' occur?

Hint: You do not need to count the occurrences of 'that' by hand. What command can be used to find the starting index of a specified string (char array) in a larger string (char array)?

How many times does the char array 'for' occur?

How many times does the char array 'we' occur?

How many times does the char array 'We' occur?

Homework Equations


The Attempt at a Solution


I don't know what command to use to find the number of words in the character array, and the hint went over my head, so I am unsure what to do with it.

I did
Code:
 char([GA])

to get the text to display, but to find how many times 'that' appears in it I keep trying variations of

Code:
 char([GA, 'that'])

The command
Code:
 numel([GA, 'that'])
gives me the number of characters, 1456. I guess a word is not an element, but a character is. So I don't know where to go from here.
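
For readers wondering where 1456 comes from: square brackets concatenate char arrays, so `[GA, 'that']` is the 1452 characters of GA with 4 more appended, and `numel` counts all of them. A minimal sketch, using a short stand-in string (`GAdemo` is a hypothetical name, not the real GA from Gettysburg.mat):

```matlab
% GAdemo is a hypothetical stand-in for GA; the real array comes from Gettysburg.mat.
GAdemo = 'Four score and seven years ago';

% Square brackets join the two char arrays end to end; nothing is searched.
joined = [GAdemo, 'that'];

nOriginal = numel(GAdemo);   % characters in the original text
nJoined   = numel(joined);   % original length plus the 4 characters of 'that'

fprintf('original: %d, joined: %d\n', nOriginal, nJoined);
```

The same arithmetic applied to GA gives 1452 + 4 = 1456, so that count says nothing about how often 'that' occurs.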

EDIT: I found the command

Code:
 strfind(GA, 'that')

and it gives me

Code:
 ans =

  Columns 1 through 6

         145         234         346         394         467         472

  Columns 7 through 12

         532        1096        1155        1220        1248        1292

  Column 13

        1359

and I don't know how to interpret this.

EDIT2: I guess the number of columns is the number of times the word shows up. Is each number supposed to be the index of the character where the word begins?
 
It makes more sense to say that the word 'that' starts at character position 145, then again at 234, again at 346, and so on.

Since the result has 13 elements, you can conclude that 'that' appears 13 times.
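
The counting step can be sketched by wrapping `strfind` in `numel`. The snippet below is only an illustration on a short made-up string (`txt` and the other variable names are assumptions, not part of the assignment), since Gettysburg.mat is not reproduced here:

```matlab
% txt is a made-up stand-in for GA, used only to illustrate the idea.
txt = 'We say that we know that we are here for We the people.';

% strfind returns the starting index of every match as a row vector.
idxThat = strfind(txt, 'that');

% The number of matches is the number of indices returned.
nThat = numel(idxThat);

% strfind is case sensitive, so 'we' and 'We' are counted separately.
nWeLower = numel(strfind(txt, 'we'));
nWeUpper = numel(strfind(txt, 'We'));

fprintf('that: %d, we: %d, We: %d\n', nThat, nWeLower, nWeUpper);
```

With GA loaded, the same pattern would be `numel(strfind(GA, 'that'))`. Note that `strfind` matches character sequences rather than whole words, so a search for 'for' also counts the 'for' inside a word such as 'before'. That appears to be what the question asks for, since it speaks of occurrences of a char array rather than of a word.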
 