- #1
Gnophos
I hope a few of you are avid perl programmers. There seems to be a surprising paucity of decent perl-centric boards out there, so I thought I'd try this one since so many smart people come here :-)
I myself am new to perl but have plenty of experience in C++/Objective-C and other languages. I am finding perl very easy to learn and experiment with but am having the darndest time with this one module, Text::ParseWords. It's supposed to, well, parse a line of text into words. I thought it was working until I started checking its math. Here, run this perl script (I apologize for the lack of spaces or tabs, this old browser doesn't support CODE tags):
----------------------
use Text::ParseWords;
open(INFILE, "/someplace/test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
# here are four words
#here are three
#This word's invisible
#And the end.
$word = 0;
$wc = 0;
while (<INFILE>)
{
#@words = &shellwords($_);
@words = &quotewords('\s+', 1, $_);
$i = 0;
foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}
print("Total of ", $wc, " words\n");
close INFILE;
---------------------
There, type or paste that in and run it. See the results? Notice that the total word count is right but the first line's word count is totally off and the third line isn't counted at all. The code that prints the 2nd word of each line is proof that the 3rd line is invisible. This reveals at least two bugs:
- leading spaces on lines seem to get counted even though no logical word parser would work that way (it still doesn't make sense that line 1 has "six" words, you would expect "five")
- apostrophes, aka "single-quotes" as far as computers are concerned, wreak havoc; that's the only way I can put it
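One way to see where the odd counts come from is to print every field that quotewords() hands back, wrapped in brackets, so that empty or undefined fields become visible. The following is only a diagnostic sketch: show_fields is a made-up helper name, and the four lines are held in memory instead of being read from test.txt. As far as I can tell from the module source, a quote character with no matching close makes quotewords() return an empty list, which would account for the "invisible" third line, though the exact fields may differ between module versions.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Same four lines as test.txt, held in memory (note the leading space on line 1).
my @lines = (
    " here are four words\n",
    "here are three\n",
    "This word's invisible\n",
    "And the end.\n",
);

# Hypothetical helper: return the fields from quotewords() as one string,
# each field in brackets, so empty and undefined fields show up.
sub show_fields {
    my ($line) = @_;
    my @fields = quotewords('\s+', 1, $line);
    return join ' ', map { defined $_ ? "[$_]" : '[undef]' } @fields;
}

for my $line (@lines) {
    (my $shown = $line) =~ s/\n\z//;    # drop the newline for display only
    printf "%-25s => %s\n", "'$shown'", show_fields($line);
}
```

Any field printed as [] or [undef] is being counted as a word by the loop in the script above, even though it contains no text.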
Unless I can get the word parsing module to handle leading spaces and apostrophes I won't be able to build this utility I'm working on. I know there must be a way!
Btw, replace that call to quotewords() with the call to shellwords() that I commented out above it to see another possible way to handle it, which also fails miserably.
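If the utility only needs whitespace-separated words, with apostrophes treated as ordinary characters, Perl's built-in split may be enough without Text::ParseWords at all. This is a minimal sketch under that assumption (no quote grouping needed); words_in is a hypothetical helper name, and the sample lines are inlined instead of read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into whitespace-separated words.
# split with the single-space string ' ' skips leading whitespace and
# splits on any run of whitespace, so no empty leading field appears.
sub words_in {
    my ($line) = @_;
    return split ' ', $line;
}

# Same four lines as test.txt (note the leading space on line 1).
my @lines = (
    " here are four words\n",
    "here are three\n",
    "This word's invisible\n",
    "And the end.\n",
);

my $wc = 0;
for my $line (@lines) {
    my @words = words_in($line);
    $wc += @words;    # an array in numeric context gives its element count
    my $second = defined $words[1] ? $words[1] : '';
    print "2nd word is: '$second' and this line makes $wc words so far\n";
}
print "Total of $wc words\n";
```

The trade-off is that quoted phrases such as "two words" are no longer kept together as one field, which is the feature quotewords() and shellwords() exist to provide.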
Any help you guys can give would be much appreciated. And please don't reply with code in CODE tags, I might not be able to read it.