How can I fix the bugs in my Perl script using Text::ParseWords?

AI Thread Summary
The discussion revolves around challenges faced while using the Perl module Text::ParseWords for parsing text files. A user, new to Perl but experienced in other programming languages, encounters issues with word counting, particularly with leading spaces and apostrophes affecting the output. The initial code provided fails to accurately count words in lines with leading spaces and does not recognize words with apostrophes. Suggestions are made to use regex substitutions to handle quote characters and to remove leading/trailing whitespace. An alternative approach using a simpler split function is proposed, which effectively counts words without the complications of the ParseWords module. The user is working on a project to calculate the total size and number of items in a directory, specifically for Mac OS X, which adds complexity due to the nature of application folders. The conversation highlights the importance of correctly handling text parsing in Perl and offers insights into alternative methods for achieving the desired outcomes.
Gnophos
I hope a few of you are avid perl programmers. There seems to be a surprising paucity of decent perl-centric boards out there, so I thought I'd try this one since so many smart people come here :-)

I myself am new to perl but have plenty of experience in C++/Objective-C and other languages. I am finding perl very easy to learn and experiment with but am having the darndest time with this one module, Text::ParseWords. It's supposed to, well, parse a line of text into words. I thought it was working until I started checking its math. Here, run this perl script (I apologize for the lack of spaces or tabs, this old browser doesn't support CODE tags):

----------------------
use Text::ParseWords;

open(INFILE, "/someplace/test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
# here are four words
#here are three
#This word's invisible
#And the end.

$word = 0;
$wc = 0;

while (<INFILE>)
{
#@words = &shellwords($_);
@words = &quotewords('\s+', 1, $_);
$i = 0;

foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}

print("Total of ", $wc, " words\n");

close INFILE;
---------------------

There, type or paste that in and run it. See the results? Notice that the total word count is right but the first line's word count is totally off and the third line isn't counted at all. The code that prints the 2nd word of each line is proof that the 3rd line is invisible. This reveals at least two bugs:

- leading spaces on lines seem to get counted even though no logical word parser would work that way (it still doesn't make sense that line 1 has "six" words, you would expect "five")

- apostrophes, aka "single-quotes" as far as computers are concerned, wreak havoc; that's the only way I can put it

Unless I can get the word parsing module to handle leading spaces and apostrophes I won't be able to build this utility I'm working on. I know there must be a way!

Btw, replace that call to quotewords() with the call to shellwords() that I commented out above it to see another possible way to handle it, which also fails miserably.

Any help you guys can give would be much appreciated. And please don't reply with code in CODE tags, I might not be able to read it.
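
The two symptoms described above can be reproduced in isolation. This is a minimal sketch, assuming the same delimiter and keep flag as the script above; the sample strings are taken from test.txt with the trailing newline dropped, and the exact behavior may vary between versions of Text::ParseWords.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Same delimiter ('\s+') and $keep flag (1) as in the script above.
my @leading    = quotewords('\s+', 1, " here are four words");
my @apostrophe = quotewords('\s+', 1, "This word's invisible");

# A delimiter at the very start of the line yields an empty first field,
# which the loop in the script counts as a word.
print "first field is empty\n" if @leading && $leading[0] eq '';

# An unbalanced quote makes the parser give up and return an empty list,
# so nothing from that line is counted.
print "apostrophe line returned ", scalar(@apostrophe), " fields\n";
```

So neither symptom looks like a miscount as such: the module treats the apostrophe as an opening quote that never closes, and treats the leading space as a separator with nothing before it.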
 
This is real hackish, but it seems to work. All I did was add two regex substitution commands to replace " with \" and ' with \'. :smile:

There were a couple other typos in your code in the original post, too. I presume you have a correct version already. :cool:

-----------------
use Text::ParseWords;

open(INFILE, "test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
## here are four words
##here are three
##This word's invisible
##And the end.

$word = 0;
$wc = 0;

while (<INFILE>)
{
s/\"/\\"/g; s/\'/\\'/g; #escape double and single quote characters
@words = &shellwords($_);
#@words = quotewords('\s+', 1, $_);
$i = 0;

foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}

print("Total of ", $wc, " words\n");

close INFILE;
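
The escaping idea can also be wrapped in a small helper. This is only a sketch: `words_in_line` is a hypothetical name, not part of Text::ParseWords, and it additionally trims whitespace and escapes backslashes, which the script above does not do.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Hypothetical helper (not part of Text::ParseWords): returns the words on one line.
sub words_in_line {
    my ($line) = @_;
    $line =~ s/^\s+//;               # drop leading whitespace
    $line =~ s/\s+$//;               # drop trailing whitespace, including the newline
    $line =~ s/([\\"'])/\\$1/g;      # backslash-escape quotes and backslashes
    # Discard any undefined or empty fields the parser might return.
    return grep { defined($_) && length($_) } shellwords($line);
}

my @words = words_in_line("This word's invisible\n");
print scalar(@words), " words, second is '$words[1]'\n";
```

Because shellwords() is called with quote processing on, it removes the backslashes again, so the words come back as they appeared in the file.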
 
This all seems incredibly convoluted.

Why not just use this:

#!/usr/bin/perl

while (<>)
{
    s/^\s+//; # remove leading
    s/\s+$//; # and trailing whitespace

    @words = split(/\s+/, $_); # split on runs of whitespace so repeated spaces don't create empty words
    print scalar(@words) . "\n";
}

- Warren
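
For comparison with the original script, here is a sketch of the same split-based idea that also keeps the running total and the second word of each line. It assumes the four sample lines from test.txt are held in an array rather than read from a file, and `split_words` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into words on runs of whitespace.
# split ' ' (a single-space string, not a regex) skips leading whitespace
# and treats any run of whitespace as one separator.
sub split_words {
    my ($line) = @_;
    return split ' ', $line;
}

# The four sample lines from test.txt.
my @lines = (
    " here are four words\n",
    "here are three\n",
    "This word's invisible\n",
    "And the end.\n",
);

my $total = 0;
for my $line (@lines) {
    my @words = split_words($line);
    $total += @words;    # an array in numeric context gives its element count
    print "2nd word is: '$words[1]' and this line makes $total words so far\n";
}
print "Total of $total words\n";
```

With the string form of split there is no need to trim the line first, and apostrophes are just ordinary characters.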
 
Domo arigato!

By golly, why *don't* I just do that? :)

I had thought about removing single-quotes and leading whitespaces from the text, but didn't know how. Plus I figured I just wasn't using the ParseWords functions properly.

Regarding my typos, I notice at least one that seems to be a glitch in the post. Something ate a few characters. I'm glad it didn't throw you guys.

Thanks to both chroot and abhishek for your responses. One of you gave me a way to work around quote characters, and the other gave me a way to remove spaces, which were the two obstacles I was facing. Of course I hadn't told either of you what I was working on and whether it was okay to alter the source text by removing characters, so you each came up with your own solution.

Incidentally, it's no secret project. I'm just writing a command to give the total size of -- and number of items in -- a directory, an obviously useful function which my bash shell strangely does not seem to offer. I figured writing it would be instructive in learning the CLI and Perl. The catch was that I was saving the contents of a dir (find [...] > output.txt) and analyzing it with a perl script to get the total file size and count, but as you saw, ParseWords wasn't handling a couple things properly.

I can post it when it's finished if anyone has a use for it, but it's written specifically for Mac OS X, i.e., it handles .apps properly. The whole catch is that on a Mac an application is actually a folder full of (sometimes) thousands of resource files, so using Unix's find command returns all those files within the apps, which shouldn't count as files. So the whole project became surprisingly complicated, what with the finding and the grepping and the perling (word?).

Anyway, maybe I'll post it at some point; not like you guys couldn't write such a command yourselves, of course, and do it better than me, but maybe someone else will find it useful.
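
One way the .app handling could look is sketched below with the core File::Find module. It assumes that "handling .apps properly" means each outermost *.app bundle counts as a single item while the files inside it still add to the total size; `tally_bundles` is a hypothetical name, and this is not the script described above.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: count items under $dir, treating each outermost
# *.app bundle as one item, and sum the sizes of all regular files.
sub tally_bundles {
    my ($dir) = @_;
    my ($count, $size) = (0, 0);
    find(sub {
        # True when the directory currently being visited is a bundle or lies inside one.
        my $inside_bundle = $File::Find::dir =~ m{\.app(?:/|$)};
        if (-f $_) {
            $size += -s _;                     # every regular file adds to the size
            $count++ unless $inside_bundle;    # only files outside bundles count as items
        }
        elsif ((-d _) && /\.app$/ && !$inside_bundle) {
            $count++;                          # an outermost bundle is one item
        }
    }, $dir);
    return ($count, $size);
}

my ($count, $size) = tally_bundles(shift || ".");
print "$count items, $size bytes\n";
```

Ordinary directories are not counted as items in this sketch, and the check relies only on the name ending in .app.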
 
Gnophos said:
I'm just writing a command to give the total size of -- and number of items in -- a directory, an obviously useful function which my bash shell strangely does not seem to offer.
The bash shell is not in the business of listing files. Use the programs ls and du to perform those functions. Use `man ls` and `man du` to get more information about these programs.

To count the number of files (not including directories, and skipping the "total" line that `ls -l` prints first): `ls -l | grep -v -c -e "^d" -e "^total"`

To count the number of files (including directories): `ls | wc -l`

To get the total size of all the files in the current directory (not including subdirectories): `du -Ssh .`

To get the total size of all the files in the current directory (including subdirectories): `du -sh .`
Gnophos said:
I figured writing it would be instructive in learning the CLI and Perl.
This is an entertaining exercise, for sure, even though it is sort of reinventing the wheel.

Here's a simple script I whipped up that will count files; you should be able to edit it pretty easily to handle the .apps folders as you'd prefer. (I didn't quite understand the behavior you're looking for, so I didn't attempt to code it.)

#!/usr/bin/perl

($count, $size) = tally(shift || ".");

if ($size > (1024 * 1024 * 1024))
{
    $size = sprintf "%.2f GB", $size / (1024 * 1024 * 1024);
}
elsif ($size > (1024 * 1024))
{
    $size = sprintf "%.2f MB", $size / (1024 * 1024);
}
elsif ($size > 1024)
{
    $size = sprintf "%.2f kB", $size / 1024;
}
else
{
    $size = $size . " b";
}

print "$count files, total size $size\n";

sub tally
{
    my $thing = shift;
    my ($count, $size, $subcount, $subsize, $entry);

    if (-f $thing)
    {
        return (1, -s $thing);
    }
    elsif (-d $thing)
    {
        # Uncomment to count directories, too
        # $count++;

        opendir(DIR, $thing) || die "Can't open directory: $thing";
        my @contents = grep { !/^\.$/ && !/^\.\.$/ } readdir DIR; # Read all files, even hidden ones
        # my @contents = grep { !/^\./ } readdir DIR; # Read only files that are not hidden
        closedir DIR;

        foreach $entry (@contents)
        {
            ($subcount, $subsize) = tally("$thing/$entry");
            $count += $subcount;
            $size += $subsize;
        }
    }

    return ($count, $size);
}

- Warren
 