How can I fix the bugs in my Perl script using Text::ParseWords?

Gnophos · Apr 17, 2005

I hope a few of you are avid perl programmers. There seems to be a surprising paucity of decent perl-centric boards out there, so I thought I'd try this one since so many smart people come here :-)

I myself am new to perl but have plenty of experience in C++/Objective-C and other languages. I am finding perl very easy to learn and experiment with but am having the darndest time with this one module, Text::ParseWords. It's supposed to, well, parse a line of text into words. I thought it was working until I started checking its math. Here, run this perl script (I apologize for the lack of spaces or tabs, this old browser doesn't support CODE tags):

----------------------
use Text::ParseWords;

open(INFILE, "/someplace/test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
# here are four words
#here are three
#This word's invisible
#And the end.

$word = 0;
$wc = 0;

while (<INFILE>)
{
#@words = &shellwords($_);
@words = &quotewords('\s+', 1, $_);
$i = 0;

foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}

print("Total of ", $wc, " words\n");

close INFILE;
---------------------

There, type or paste that in and run it. See the results? Notice that the total word count is right but the first line's word count is totally off and the third line isn't counted at all. The code that prints the 2nd word of each line is proof that the 3rd line is invisible. This reveals at least two bugs:

- leading spaces on lines seem to get counted even though no logical word parser would work that way (it still doesn't make sense that line 1 has "six" words, you would expect "five")

- apostrophes, aka "single-quotes" as far as computers are concerned, wreak havoc; that's the only way I can put it

Unless I can get the word parsing module to handle leading spaces and apostrophes I won't be able to build this utility I'm working on. I know there must be a way!

Btw, replace that call to quotewords() with the call to shellwords() that I commented out above it to see another possible way to handle it, which also fails miserably.

Any help you guys can give would be much appreciated. And please don't reply with code in CODE tags, I might not be able to read it.
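
For reference, both symptoms can be reproduced without the input file. This is only a minimal sketch: it assumes the same `quotewords('\s+', 1, ...)` call as the script, feeds it the first and third sample lines with the newline removed, and uses a hypothetical helper `fields_for` that is not part of the module.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: return the fields quotewords() produces for one line.
sub fields_for {
    my ($line) = @_;
    return quotewords('\s+', 1, $line);
}

# Leading whitespace matches the delimiter, so an empty first field is returned.
my @first = fields_for(" here are four words");
print scalar(@first), " fields, first is '", $first[0], "'\n";

# A lone apostrophe is an unbalanced quote, so the line yields no fields at all.
my @third = fields_for("This word's invisible");
print scalar(@third), " fields\n";
```

The empty first field explains the "five" count. The newline that `<INFILE>` leaves on each line is whitespace too and is likely matched as one more delimiter, which would account for the sixth.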

abhishek · Apr 17, 2005

This is real hackish, but it seems to work. All I did was add two regex substitution commands to replace " with \" and ' with \'.

There were a couple other typos in your code in the original post, too. I presume you have a correct version already.

-----------------
use Text::ParseWords;

open(INFILE, "test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
## here are four words
##here are three
##This word's invisible
##And the end.

$word = 0;
$wc = 0;

while (<INFILE>)
{
s/\"/\\"/g; s/\'/\\'/g; #escape double and single quote characters
@words = &shellwords($_);
#@words = quotewords('\s+', 1, $_);
$i = 0;

foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}

print("Total of ", $wc, " words\n");

close INFILE;
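
A compact way to see what the escaping step does, as a sketch under the same assumptions as the script above. The helper name `words_escaped` is hypothetical, and the two substitutions are folded into one. Note that a line which already contains backslashes would need more care than this.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: escape both quote characters, then split with shellwords().
sub words_escaped {
    my ($line) = @_;
    $line =~ s/(["'])/\\$1/g;   # put a backslash before each " and '
    return shellwords($line);
}

# The escaped apostrophe is treated as an ordinary character and the
# backslash is removed again in the returned word.
my @words = words_escaped("This word's invisible");
print scalar(@words), " words: @words\n";
```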

chroot · Apr 17, 2005

This all seems incredibly convoluted.

Why not just use this:

#!/usr/bin/perl

while (<>)
{
s/^\s+//; # remove leading
s/\s+$//; # and trailing whitespace

@words = split(/\s+/, $_); # split on runs of whitespace so double spaces don't create empty words
print scalar(@words) . "\n";
}

- Warren
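
A quick check of this approach against the four sample lines from the first post, as a sketch only. It uses `split ' '`, Perl's special-case pattern that skips leading whitespace and splits on runs of whitespace, so no separate trimming is needed. `count_words` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words in one line.
sub count_words {
    my ($line) = @_;
    my @words = split ' ', $line;   # ' ' skips leading whitespace and splits on runs
    return scalar @words;
}

my @lines = (" here are four words\n", "here are three\n",
             "This word's invisible\n", "And the end.\n");
my $total = 0;
for my $line (@lines) {
    my $n = count_words($line);
    $total += $n;
    print "$n words, $total so far\n";
}
```

Apostrophes need no special handling here because `split` gives quote characters no meaning.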

Gnophos · Apr 19, 2005

Thank you very much!

By golly, why *don't* I just do that? :)

I had thought about removing single-quotes and leading whitespaces from the text, but didn't know how. Plus I figured I just wasn't using the ParseWords functions properly.

Regarding my typos, I notice at least one that seems to be a glitch in the post. Something ate a few characters. I'm glad it didn't throw you guys.

Thanks to both chroot and abhishek for your responses. One of you gave me a way to work around quote characters, and the other gave me a way to remove spaces, which were the two obstacles I was facing. Of course I hadn't told either of you what I was working on and whether it was okay to alter the source text by removing characters, so you each came up with your own solution.

Incidentally, it's no secret project. I'm just writing a command to give the total size of -- and number of items in -- a directory, an obviously useful function which my bash shell strangely does not seem to offer. I figured writing it would be instructive in learning the CLI and Perl. The catch was that I was saving the contents of a dir (find [...] > output.txt) and analyzing it with a perl script to get the total file size and count, but as you saw, ParseWords wasn't handling a couple things properly.

I can post it when it's finished if anyone has a use for it, but it's written specifically for Mac OS X, i.e., it handles .apps properly. The whole catch is that on a Mac an application is actually a folder full of (sometimes) thousands of resource files, so using Unix's find command returns all those files within the apps, which shouldn't count as files. So the whole project became surprisingly complicated, what with the finding and the grepping and the perling (word?).
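
One way the .app rule might look in Perl, using the core File::Find module instead of a saved `find` listing. This is a sketch under assumptions that are not from the thread: a directory whose name ends in `.app` counts as a single item, the files inside it are not counted but their sizes are still added, and ordinary directories are not counted. `tally_with_bundles` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: count items under $root, treating each *.app
# directory as one item while still summing the sizes of the files inside it.
sub tally_with_bundles {
    my ($root) = @_;
    my ($count, $size) = (0, 0);
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            my $inside_bundle = $path =~ m{\.app/};   # some parent is a bundle
            if (-d $path) {
                # Count a top-level bundle once; skip ordinary directories.
                $count++ if $path =~ /\.app\z/ && !$inside_bundle;
                return;
            }
            return unless -f $path;
            $size += -s _;                    # reuse the stat from the -f test
            $count++ unless $inside_bundle;   # files inside a bundle are not items
        },
    }, $root);
    return ($count, $size);
}

my ($count, $size) = tally_with_bundles(@ARGV ? $ARGV[0] : ".");
print "$count items, $size bytes\n";
```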

Anyway, maybe I'll post it at some point; not like you guys couldn't write such a command yourselves, of course, and do it better than me, but maybe someone else will find it useful.

chroot · Apr 19, 2005

Gnophos said:

I'm just writing a command to give the total size of -- and number of items in -- a directory, an obviously useful function which my bash shell strangely does not seem to offer.

The bash shell is not in the business of listing files. Use the programs ls and du to perform those functions. Use `man ls` and `man du` to get more information about these programs.

To count the number of files (not including directories): `ls -l | grep -v -c "^d"`

To count the number of files (including directories): `ls -l | wc -l`

Both of these counts include the `total` line that `ls -l` normally prints first, so subtract one.

To get the total size of all the files in the current directory (not including subdirectories): `du -Ssh .`

To get the total size of all the files in the current directory (including subdirectories): `du -sh .`
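
For comparison, the non-recursive counts can also be sketched in Perl. This is an illustration under assumptions, not a replacement for `ls` and `du`: `dir_summary` is a hypothetical helper, hidden entries are skipped the way plain `ls` skips them, and the size is the sum of file lengths in bytes, whereas `du` reports disk usage.

```perl
use strict;
use warnings;

# Hypothetical helper: non-recursive summary of one directory.
# Returns (non-directory entries, all entries, bytes in regular files).
sub dir_summary {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @entries = grep { !/^\./ } readdir $dh;   # skip hidden entries, like plain ls
    closedir $dh;

    my ($non_dirs, $bytes) = (0, 0);
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        $non_dirs++ unless -d $path;
        $bytes += -s $path if -f $path;   # only regular files contribute to the size
    }
    return ($non_dirs, scalar @entries, $bytes);
}

my ($files, $all, $bytes) = dir_summary(@ARGV ? $ARGV[0] : ".");
print "$files files, $all entries, $bytes bytes in regular files\n";
```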

I figured writing it would be instructive in learning the CLI and Perl.

This is an entertaining exercise, for sure, even though it is sort of reinventing the wheel.

Here's a simple script I whipped up that will count files; you should be able to edit it pretty easily to handle the .apps folders as you'd prefer. (I didn't quite understand the behavior you're looking for, so I didn't attempt to code it.)

#!/usr/bin/perl

($count, $size) = tally(shift || ".");

if ($size > (1024 * 1024 * 1024))
{
$size = sprintf "%.2f GB", $size / (1024 * 1024 * 1024);
}
elsif ($size > (1024 * 1024))
{
$size = sprintf "%.2f MB", $size / (1024 * 1024);
}
elsif ($size > 1024)
{
$size = sprintf "%.2f kB", $size / 1024;
}
else
{
$size = $size . " b";
}

print "$count files, total size $size\n";

sub tally
{
my $thing = shift;
my ($count, $size) = (0, 0); # start at zero so an empty directory still reports numbers
my ($subcount, $subsize, $entry);

if (-f $thing)
{
return (1, -s $thing);
}
elsif (-d $thing)
{
# Uncomment to count directories, too
# $count++;

opendir(DIR, $thing) || die "Can't open directory: $thing: $!";
my @contents = grep { !/^\.$/ && !/^\.\.$/ } readdir DIR; # Read all files, even hidden ones
# my @contents = grep { !/^\./ } readdir DIR; # Read only files that are not hidden
closedir DIR;

foreach $entry (@contents)
{
($subcount, $subsize) = tally("$thing/$entry");
$count += $subcount;
$size += $subsize;
}
}

return ($count, $size);
}

- Warren
