How can I fix the bugs in my Perl script using Text::ParseWords?

  • Thread starter Gnophos
In summary, Gnophos reports that Text::ParseWords miscounts words on lines with leading whitespace and returns nothing for a line containing an apostrophe. The replies from chroot and abhishek offer two workarounds: escaping quote characters before calling shellwords(), or stripping leading and trailing whitespace and splitting the line with split() instead. A later reply adds a recursive script that counts files and totals their sizes.
  • #1
Gnophos
I hope a few of you are avid perl programmers. There seems to be a surprising paucity of decent perl-centric boards out there, so I thought I'd try this one since so many smart people come here :-)

I myself am new to perl but have plenty of experience in C++/Objective-C and other languages. I am finding perl very easy to learn and experiment with but am having the darndest time with this one module, Text::ParseWords. It's supposed to, well, parse a line of text into words. I thought it was working until I started checking its math. Here, run this perl script (I apologize for the lack of spaces or tabs, this old browser doesn't support CODE tags):

----------------------
use Text::ParseWords;

open(INFILE, "/someplace/test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
# here are four words
#here are three
#This word's invisible
#And the end.

$word = 0;
$wc = 0;

while (<INFILE>)
{
#@words = &shellwords($_);
@words = &quotewords('\s+', 1, $_);
$i = 0;

foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}

print("Total of ", $wc, " words\n");

close INFILE;
---------------------

There, type or paste that in and run it. See the results? Notice that the total word count is right but the first line's word count is totally off and the third line isn't counted at all. The code that prints the 2nd word of each line is proof that the 3rd line is invisible. This reveals at least two bugs:

- leading spaces on lines seem to get counted even though no logical word parser would work that way (it still doesn't make sense that line 1 has "six" words, you would expect "five")

- apostrophes, aka "single-quotes" as far as computers are concerned, wreak havoc; that's the only way I can put it

Unless I can get the word parsing module to handle leading spaces and apostrophes I won't be able to build this utility I'm working on. I know there must be a way!

Btw, replace that call to quotewords() with the call to shellwords() that I commented out above it to see another possible way to handle it, which also fails miserably.

Any help you guys can give would be much appreciated. And please don't reply with code in CODE tags, I might not be able to read it.
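For anyone reproducing this, the apostrophe symptom can be isolated in a few lines. This is only a minimal sketch, not part of the original script, and `fields_for` is a hypothetical helper name. It chomps each line first, because with a '\s+' delimiter the trailing newline is itself treated as a delimiter. As far as quotewords() is concerned an apostrophe opens a single-quoted string, and when the closing quote never arrives the function returns an empty list, which is why the third line seems to vanish. The leading space on the first line is reported as an extra empty field in front of the real words, at least in the versions of the module I am aware of.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: return the fields quotewords() produces for one line,
# after removing the trailing newline so it is not treated as a delimiter.
sub fields_for {
    my ($line) = @_;
    chomp $line;
    return quotewords('\s+', 0, $line);
}

# Show what the parser returns for a line with a leading space and for a
# line with an unbalanced apostrophe.
for my $line (" here are four words\n", "This word's invisible\n") {
    my @fields = fields_for($line);
    printf "%d field(s): %s\n", scalar(@fields),
        join(", ", map { "[$_]" } @fields);
}
```

The second argument is 0 here (drop quotes and backslashes) rather than the 1 used in the script above; the empty-list result for an unbalanced quote is the same either way.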
 
  • #2
This is real hackish, but it seems to work. All I did was add two regex substitution commands to replace " with \" and ' with \'. :smile:

There were a couple other typos in your code in the original post, too. I presume you have a correct version already. :cool:

-----------------
use Text::ParseWords;

open(INFILE, "test.txt") or die "Can't read!";
#Contents of test.txt are following four lines:
## here are four words
##here are three
##This word's invisible
##And the end.

$word = 0;
$wc = 0;

while (<INFILE>)
{
s/\"/\\"/g; s/\'/\\'/g; #escape double and single quote characters
@words = &shellwords($_);
#@words = quotewords('\s+', 1, $_);
$i = 0;

foreach (@words)
{
if ($i == 1) # for each line's second word...
{
$word = $_;
}
$wc = $wc + 1;
$i++;
}
print "2nd word is: '", $word, "' and this line makes ", $wc, " words so far\n";
}

print("Total of ", $wc, " words\n");

close INFILE;
-----------------
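The same escaping idea can be wrapped in a small function. This is a sketch under a few assumptions: `words_literal_quotes` is a made-up name, the line is trimmed at both ends before parsing, and backslashes are escaped along with the quote characters so that a backslash already present in the text is not swallowed by the parser.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: trim the line, escape backslashes and quote characters
# so the parser treats them literally, then split on whitespace.
sub words_literal_quotes {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;       # drop leading/trailing whitespace and newline
    $line =~ s/([\\"'])/\\$1/g;    # \ -> \\   " -> \"   ' -> \'
    return shellwords($line);
}

my @words = words_literal_quotes(" This word's invisible\n");
print scalar(@words), " words: @words\n";
```

Because shellwords() removes the escaping backslashes again, the words come back with their original apostrophes intact.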
 
  • #3
This all seems incredibly convoluted.

Why not just use this:

#!/usr/bin/perl

while (<>)
{
s/^\s+//; # remove leading
s/\s+$//; # and trailing whitespace

@words = split(/\s+/, $_); # split on runs of whitespace, so double spaces don't create empty words
print scalar(@words) . "\n";
}

- Warren
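A slightly shorter variant of the same approach, offered as a sketch: when split is given the special pattern ' ' (a single space in quotes), it skips leading whitespace and splits on runs of whitespace, so the two substitutions are not needed. `count_words` is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Count whitespace-separated words in one line. The special pattern ' '
# makes split skip leading whitespace and split on runs of whitespace;
# trailing empty fields (such as the one after the newline) are dropped.
sub count_words {
    my ($line) = @_;
    my @words = split ' ', $line;
    return scalar @words;
}

print count_words(" here are four words\n"), "\n";   # prints 4
```

Apostrophes are ordinary characters to split, so "word's" counts as one word.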
 
  • #4
Domo arigato!

By golly, why *don't* I just do that? :)

I had thought about removing single-quotes and leading whitespaces from the text, but didn't know how. Plus I figured I just wasn't using the ParseWords functions properly.

Regarding my typos, I notice at least one that seems to be a glitch in the post. Something ate a few characters. I'm glad it didn't throw you guys.

Thanks to both chroot and abhishek for your responses. One of you gave me a way to work around quote characters, and the other gave me a way to remove spaces, which were the two obstacles I was facing. Of course I hadn't told either of you what I was working on and whether it was okay to alter the source text by removing characters, so you each came up with your own solution.

Incidentally, it's no secret project. I'm just writing a command to give the total size of -- and number of items in -- a directory, an obviously useful function which my bash shell strangely does not seem to offer. I figured writing it would be instructive in learning the CLI and Perl. The catch was that I was saving the contents of a dir (find [...] > output.txt) and analyzing it with a perl script to get the total file size and count, but as you saw, ParseWords wasn't handling a couple things properly.

I can post it when it's finished if anyone has a use for it, but it's written specifically for Mac OS X, i.e., it handles .apps properly. The whole catch is that on a Mac an application is actually a folder full of (sometimes) thousands of resource files, so using Unix's find command returns all those files within the apps, which shouldn't count as files. So the whole project became surprisingly complicated, what with the finding and the grepping and the perling (word?).

Anyway, maybe I'll post it at some point; not like you guys couldn't write such a command yourselves, of course, and do it better than me, but maybe someone else will find it useful.
 
  • #5
Gnophos said:
I'm just writing a command to give the total size of -- and number of items in -- a directory, an obviously useful function which my bash shell strangely does not seem to offer.
The bash shell is not in the business of listing files. Use the programs ls and du to perform those functions. Use `man ls` and `man du` to get more information about these programs.

To count the number of regular files (not including directories): `ls -l | grep -c "^-"`

To count the number of files (including directories): `ls | wc -l` (plain `ls -l` would add one for its "total" line)

To get the total size of all the files in the current directory (not including subdirectories): `du -Ssh .`

To get the total size of all the files in the current directory (including subdirectories): `du -sh .`
Gnophos said:
I figured writing it would be instructive in learning the CLI and Perl.
This is an entertaining exercise, for sure, even though it is sort of reinventing the wheel.

Here's a simple script I whipped up that will count files; you should be able to edit it pretty easily to handle the .apps folders as you'd prefer. (I didn't quite understand the behavior you're looking for, so I didn't attempt to code it.)

#!/usr/bin/perl

($count, $size) = tally(shift || ".");

if ($size > (1024 * 1024 * 1024))
{
$size = sprintf "%.2f GB", $size / (1024 * 1024 * 1024);
}
elsif ($size > (1024 * 1024))
{
$size = sprintf "%.2f MB", $size / (1024 * 1024);
}
elsif ($size > 1024)
{
$size = sprintf "%.2f kB", $size / 1024;
}
else
{
$size = $size . " b";
}

print "$count files, total size $size\n";

sub tally
{
my $thing = shift;
my ($count, $size, $subcount, $subsize, $entry) = (0, 0); # start totals at zero so an empty directory reports 0

if (-f $thing)
{
return (1, -s $thing);
}
elsif (-d $thing)
{
# Uncomment to count directories, too
# $count++;

opendir(DIR, $thing) || die "Can't open directory: $thing";
my @contents = grep { !/^\.$/ && !/^\.\.$/ } readdir DIR; # Read all files, even hidden ones
# my @contents = grep { !/^\./ } readdir DIR; # Read only files that are not hidden
closedir DIR;

foreach $entry (@contents)
{
($subcount, $subsize) = tally("$thing/$entry");
$count += $subcount;
$size += $subsize;
}
}

return ($count, $size);
}

- Warren
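For the .app case mentioned earlier in the thread, one possible approach is to let the core File::Find module do the walking and prune any directory whose name ends in .app. This is a sketch under assumptions, since the exact behavior wanted was never spelled out: `tally_with_bundles` is a hypothetical name, each bundle counts as one item, and the files inside a bundle are left out of both the count and the size total.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: count items under $root, treating each *.app directory
# as a single item and not descending into it.
# Returns (item count, total bytes of the plain files that were counted).
sub tally_with_bundles {
    my ($root) = @_;
    my ($count, $size) = (0, 0);

    find(sub {
        if (-d $_ && /\.app\z/) {
            $count++;                  # the bundle counts as one item
            $File::Find::prune = 1;    # do not descend into it
        }
        elsif (-f $_) {
            $count++;
            $size += -s _;             # reuse the stat from the -f test
        }
    }, $root);

    return ($count, $size);
}
```

Calling `my ($count, $size) = tally_with_bundles(shift || ".");` and printing the two values gives the same kind of report as the script above. If the bundle's contents should still contribute to the size total, the prune line would have to be replaced with a flag that suppresses only the counting.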
 

