Character array command to find # of specific word in a paragraph

  • Thread starter: gfd43tg
  • Tags: Array, Specific
SUMMARY

The discussion focuses on using MATLAB to count the occurrences of specific words within the Gettysburg Address, stored as a character array in the variable GA. The command strfind(GA, 'that') is identified as the method to find the starting indices of the word 'that', revealing that it appears 13 times in the text. Participants also explore similar queries for the words 'for', 'we', and 'We', emphasizing the utility of string manipulation functions in MATLAB for text analysis.

PREREQUISITES
  • Basic understanding of MATLAB syntax and commands
  • Familiarity with character arrays in MATLAB
  • Knowledge of string manipulation functions in MATLAB
  • Ability to interpret output from MATLAB commands
NEXT STEPS
  • Learn how to use strfind for different string searches in MATLAB
  • Explore the count function in MATLAB for counting occurrences of substrings
  • Investigate regular expressions in MATLAB for advanced text searching
  • Practice manipulating character arrays and strings in MATLAB with various examples
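
As a rough illustration of the next steps listed above, the sketch below compares count with a regular-expression search. The variable txt is a made-up stand-in sentence, not the Gettysburg Address, and count assumes MATLAB R2016b or later.

```matlab
% Hypothetical stand-in text; the thread's real data is GA from Gettysburg.mat.
txt = 'We know that we said that for now, and We meant that.';

% count returns the number of occurrences of a substring directly.
nThat = count(txt, 'that');

% regexp returns starting indices by default; \< and \> anchor the pattern
% to word boundaries, so only the standalone word 'we' is matched.
nWholeWe = numel(regexp(txt, '\<we\>'));

% The 'ignorecase' option treats 'we' and 'We' as the same word.
nAnyCaseWe = numel(regexp(txt, '\<we\>', 'ignorecase'));
```

Whether substring matches or whole-word matches are wanted depends on how the question is worded; the homework above asks about char arrays, which suggests plain substring matching.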
USEFUL FOR

Students, educators, and researchers working with MATLAB who need to analyze text data, particularly those interested in string manipulation and text processing techniques.

gfd43tg

Homework Statement


Download the mat file Gettysburg.mat and load it into the workspace using the commands

clear;
load Gettysburg;

This loads a 1-by-1452 char array GA that is the Gettysburg Address in English. Display the first 100 or so characters for yourself to see this.

In this array, how many times does the char array 'that' occur?

Hint: You do not need to count the occurrences of 'that' by hand. What command can be used to find the starting index of a specified string (char array) in a larger string (char array)?

How many times does the char array 'for' occur?

How many times does the char array 'we' occur?

How many times does the char array 'We' occur?

Homework Equations


The Attempt at a Solution


I don't know what command to use to find the number of words in the character array, and the hint went over my head, so I am unsure what to do with it.

I did
Code:
 char([GA])

to get the text to display, but to find how many times 'that' appears in it I keep trying variations of

Code:
 Char([GA, 'that')]

The command
Code:
 numel([GA, 'that'])
gives me the number of characters, 1456. I guess a word is not an element, but a character is. So I don't know where to go from here.

EDIT: I found the command

Code:
 strfind(GA, 'that')

and it gives me

Code:
 ans =

  Columns 1 through 6

         145         234         346         394         467         472

  Columns 7 through 12

         532        1096        1155        1220        1248        1292

  Column 13

        1359

and I don't know how to interpret this.

EDIT2: I guess the number of columns is the number of times the word shows up. Is each number supposed to be the index of the character where that occurrence of the word begins?
 
It makes more sense to say that the word 'that' starts at character position 145, again at 234, again at 346, and so on.

Since the result has 13 elements, you can conclude that 'that' appears 13 times.
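
To make that interpretation concrete, here is a minimal sketch of counting matches with strfind. It uses a short made-up sentence in a variable called txt (an assumption for illustration) so it runs without Gettysburg.mat; with the real data you would pass GA instead.

```matlab
% Hypothetical stand-in text; in the thread, GA is loaded from Gettysburg.mat.
txt = 'We know that we said that for now, and We meant that.';

% strfind returns the starting index of every occurrence of the pattern.
idx = strfind(txt, 'that');

% The number of indices is the number of occurrences.
nThat = numel(idx);

% Each index points at the first character of a match, so slicing
% four characters from the first index recovers the word itself.
firstMatch = txt(idx(1):idx(1) + 3);

% strfind is case sensitive, so 'we' and 'We' are counted separately.
nLowerWe = numel(strfind(txt, 'we'));
nUpperWe = numel(strfind(txt, 'We'));
nFor = numel(strfind(txt, 'for'));
```

Note that strfind matches substrings, not whole words, so a pattern such as 'for' is also counted when it occurs inside a longer word.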
 