How can I write PHP code to generate word pairs from a given text?

In summary, the conversation discusses a PHP page that lists all words in a given text alphabetically, along with their counts. It then moves on to creating word pairs (bigrams) using regular expressions or loops, and mentions potential memory and time constraints when dealing with a large number of words and bigrams.
  • #1
nanoWatt
I developed a PHP page that lets you paste in text and lists all the words alphabetically with their counts in the text, tab delimited.

http://www.cnetworksllc.com/word_lister

For instance, if I type "the quick red fox jumps over the lazy brown dog"

I get:

brown 1
dog 1
fox 1
jumps 1
lazy 1
over 1
quick 1
red 1
the 2


I am wondering about pseudo-code to give me word pairs like:

the quick 1
quick red 1
red fox 1
fox jumps 1
jumps over 1
over the 1
the lazy 1
lazy brown 1
brown dog 1

PHP provides an explode function that converts a string to an array using a delimiter. I could probably loop backward from the word count, concatenating two words at a time.
 
  • #2
What about using a regular expression that matches any two words and applying a function like preg_match_all?
 
  • #3
I actually don't need matching. I want to group the words. So I think I can loop through them, and concatenate word n and word n+1 with a space separator.
 
  • #4
Looping through should work fine. One technique might be to use an array to store the counts using text indices as you go through the sentence in a for loop.

For example:
$arr['the']['quick']++;
$arr['quick']['red']++;
$arr['red']['fox']++;

Then just loop through printing out the counts.
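A minimal sketch of that idea, assuming the words are already in an array. The function name countPairsNested is made up for illustration. A bare ++ on an index that does not exist yet raises an "undefined index" notice in PHP, so the sketch initialises each entry with ?? first (PHP 7 or later).

```php
<?php
// Hypothetical helper: count adjacent word pairs as $counts[first][second].
function countPairsNested(array $words): array
{
    $counts = [];
    // Stop one short of the end so $words[$i + 1] always exists.
    for ($i = 0; $i < count($words) - 1; $i++) {
        $first = $words[$i];
        $second = $words[$i + 1];
        // Treat a missing entry as 0 so the increment never reads an undefined index.
        $counts[$first][$second] = ($counts[$first][$second] ?? 0) + 1;
    }
    return $counts;
}

$words = explode(' ', 'the quick red fox jumps over the lazy brown dog');
foreach (countPairsNested($words) as $first => $followers) {
    foreach ($followers as $second => $n) {
        echo "$first $second\t$n\n"; // tab delimited
    }
}
```

The nested layout makes it easy to ask "which words follow X?", at the cost of a slightly more awkward print loop than a flat list of pair strings.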
 
  • #5
Be careful with memory (and time!), because with n words you have [itex]\mathcal{O}(n^2)[/itex] bigrams.
 
  • #6
The # of bigrams won't be squared. I am only using neighboring words. So for the 10-word example, I had 9 bigrams.

I'm using this to process content off of my own websites, so it will most likely have < 1000 words.

The 2D array might not work for what I need.

This is what I'm going for:
$arr[0] = "the quick"
$arr[1] = "quick red"
$arr[2] = "red fox"

and so on.

My original string, $myString, contains the text. I copy it to a temporary string, lowercase it, replace newlines and tabs with spaces, and remove every non-alphanumeric character except apostrophes and spaces (which strips the punctuation).

Explode creates the array, using space as a delimiter.
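The clean-up and explode steps could be sketched roughly like this. The function name tokenize and the sample sentence are made up for illustration, and the character class assumes plain ASCII text, so accented letters would be dropped.

```php
<?php
// Hypothetical helper: lowercase, strip punctuation, split on spaces.
function tokenize(string $text): array
{
    $tmp = strtolower($text);
    // Newlines and tabs become spaces.
    $tmp = str_replace(["\r", "\n", "\t"], ' ', $tmp);
    // Drop everything except ASCII letters, digits, apostrophes and spaces.
    $tmp = preg_replace("/[^a-z0-9' ]/", '', $tmp);
    // explode() yields empty strings between repeated spaces, so filter them out.
    $words = explode(' ', $tmp);
    return array_values(array_filter($words, function ($w) {
        return $w !== '';
    }));
}

print_r(tokenize("The quick, red fox\njumps over the lazy dog's bed."));
```

Filtering out the empty strings matters for the pairing step, since otherwise two spaces in a row would produce pairs that contain an empty word.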

Then I call sort on the array, and array_unique to remove the duplicates.

I can then loop through the array (using foreach), and do substr_count to get the # of times the word-pair is used, with each iteration of the array's value.
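One thing to watch with substr_count on the whole string is that it also matches inside longer words ("the lazy" would be found in "bathe lazy"). As an alternative sketch, not the approach described above, the pairs can be built from the exploded array and counted with array_count_values. The function name countWordPairs is made up for illustration.

```php
<?php
// Hypothetical helper: build "word n" . " " . "word n+1" strings, then count them.
function countWordPairs(array $words): array
{
    $pairs = [];
    // Stop one short of the end so $words[$i + 1] always exists.
    for ($i = 0; $i < count($words) - 1; $i++) {
        $pairs[] = $words[$i] . ' ' . $words[$i + 1];
    }
    $counts = array_count_values($pairs); // pair => number of occurrences
    ksort($counts, SORT_STRING);          // alphabetical by pair
    return $counts;
}

$words = explode(' ', 'the quick red fox jumps over the lazy brown dog');
foreach (countWordPairs($words) as $pair => $n) {
    echo "$pair\t$n\n"; // tab delimited, like the single-word list
}
```

This makes a single pass over the words, so no separate sort and array_unique step is needed before counting.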
 
  • #7
So you could do something like
Code:
$string = "The quick brown fox jumps over the lazy dog";
$words = explode(' ', $string);
// sort() sorts in place and returns a bool, so don't assign its result
$singleWords = array_unique($words); // and whatever
sort($singleWords);
$wordPairs = array();
// stop one short of the end so $words[$i + 1] always exists
for ($i = 0; $i < count($words) - 1; $i++) {
  $wordPairs[] = $words[$i] . " " . $words[$i + 1];
}
sort($wordPairs); // and array_unique, etc.

Perhaps not the most optimized code, but should work.
 
  • #8
nanoWatt said:
The # of bigrams won't be squared. I am only using neighboring words. So for the 10-word example, I had 9 bigrams.

What I mean is that

"apple bear cow apple bear apple cow bear apple bear cow apple"

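has 3 unique words, 6 = 3(3-1) unique bigrams, and 6 = 3(3-1)(3-2) unique trigrams made of three distinct words (8 if "apple bear apple" and "bear apple bear" are counted too). For a large corpus you might have a few million words, 1 million unique words, and hundreds of billions of possible bigrams.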
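The bigram count for this example can be checked with a short sketch. The function name uniqueBigrams is made up for illustration, and it only looks at neighboring words.

```php
<?php
// Hypothetical helper: return the distinct adjacent word pairs in a word list.
function uniqueBigrams(array $words): array
{
    $seen = [];
    for ($i = 0; $i < count($words) - 1; $i++) {
        // Using the pair as an array key removes duplicates automatically.
        $seen[$words[$i] . ' ' . $words[$i + 1]] = true;
    }
    return array_keys($seen);
}

$words = explode(' ', 'apple bear cow apple bear apple cow bear apple bear cow apple');
echo count(array_unique($words)), " unique words\n";
echo count($words) - 1, " adjacent pairs in total\n";
echo count(uniqueBigrams($words)), " unique bigrams\n";
```

For this 12-word string it reports 3 unique words, 11 adjacent pairs and 6 unique bigrams. The number of unique bigrams can never exceed the number of adjacent pairs, so both bounds apply: at most n-1 for a text of n words, and at most the square of the vocabulary size.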
 

1. What are combinations of word pairs?

Combinations of word pairs refer to the different ways two words can be combined or paired together to create new phrases or concepts.

2. Why are combinations of word pairs important?

Combinations of word pairs can help expand vocabulary and improve language skills. They can also be useful in creative writing, problem-solving, and brainstorming ideas.

3. How are combinations of word pairs created?

Combinations of word pairs can be created by combining words with similar meanings, opposite meanings, or words that are unrelated but still make sense together.

4. What are some examples of combinations of word pairs?

Some examples of combinations of word pairs include "dark chocolate," "hot coffee," "fast car," "happy marriage," and "green apple."

5. How can combinations of word pairs be used in research or experiments?

Combinations of word pairs can be used in research or experiments to study the effects of language and word associations on human cognition and behavior. They can also be used to test hypotheses or generate new ideas.
