Text Processing Language for a computational physicist


Discussion Overview

The discussion centers around the selection of a text processing language suitable for computational physicists, particularly in the context of automating post-simulation analysis. Participants explore various options, including AWK, Perl, Python, Ruby, and Groovy, considering factors such as ease of learning, efficiency, and availability across platforms.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant suggests using AWK due to its availability on Unix systems and its relatively simple C-like syntax, while noting its limitations in error reporting.
  • Another participant recommends Perl for its power and quick usability, and mentions Python as a more comprehensive programming language that requires more effort.
  • One participant expresses surprise that AWK is still in use and prefers Perl or Python, while another points to Ruby and Groovy as options with object-oriented features.
  • There is a mention of the classic Aho, Kernighan, and Weinberger book as a resource for learning AWK, alongside an online reference.
  • One participant highlights the usefulness of AWK for specific tasks like file searching and log scanning, emphasizing its associative arrays feature.
  • Concerns are raised about the readability of Perl and AWK, with a suggestion that many capabilities are now available in languages like Java.

Areas of Agreement / Disagreement

Participants express differing opinions on the best text processing language to use, with no consensus reached. Some favor AWK for its simplicity and availability, while others advocate for Perl or Python for their power and versatility.

Contextual Notes

Participants mention various programming languages and their features without resolving the debate over which is superior for text processing tasks. There is also an acknowledgment of the varying levels of familiarity and comfort with different languages among users.

Who May Find This Useful

This discussion may be useful for computational physicists and other STEM professionals looking to automate text processing tasks and seeking recommendations on programming languages suited for such purposes.

So I have been using a combination of Linux shell and Fortran to process big output files from my simulations. But I realized that it would save me a lot of time and effort if I learned a text processing language. I got different recommendations, and it seems that AWK and Perl are at the top of the list. Ideally I need something that I can learn fast but that is also efficient.
Any tips would be appreciated.
 
I'd go with AWK. It's on every Unix distro and is available on Windows too.

It doesn't have all the special character classes of Perl but is perhaps easier to learn as it follows a C-like syntax.

Code:
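#!/usr/bin/awk -f

BEGIN {
    # runs once, before the first input line is read: initialize variables here
}

/pattern/ {
    # runs for every input line that matches the regular expression /pattern/
}

# ... more "pattern { action }" rules ...

END {
    # runs once, after the last input line: print totals and summaries here
}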

The program format is pretty straightforward: a BEGIN block for initialization, an END block for finalization, and a bunch of pattern-matching rules in between.
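As a rough illustration (not from the original post), here is the same BEGIN / pattern-rule / END structure written in Python, one of the other languages suggested in this thread. The task (summing one whitespace-separated column over lines that match a regex) and the name `sum_matching_column` are hypothetical, and the sketch assumes every matching line actually has that column.

```python
import re
from typing import Iterable, Tuple


def sum_matching_column(lines: Iterable[str], pattern: str, column: int) -> Tuple[int, float]:
    """Hypothetical helper: rough Python analogue of the AWK program

        BEGIN     { n = 0; total = 0 }
        /pattern/ { n++; total += $column }
        END       { print n, total }

    `column` is 1-based, like AWK's $1, $2, ...
    """
    # BEGIN block: runs once, before any input is read
    regex = re.compile(pattern)
    n, total = 0, 0.0

    # Pattern-action rule: runs for every line the regex matches
    for line in lines:
        if regex.search(line):
            fields = line.split()  # like AWK's default whitespace field splitting
            n += 1
            total += float(fields[column - 1])  # assumes the column exists

    # END block: runs once, after all input has been consumed
    return n, total


if __name__ == "__main__":
    sample = ["step 1 0.5", "# comment line", "step 2 1.5"]
    print(sum_matching_column(sample, r"^step", 3))  # (2, 2.0)
```

Unlike AWK, which silently treats a missing field as 0 here, this version raises an `IndexError`, which is arguably easier to debug.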

If you have a specific text format to parse, I can help you.

One caveat: when it fails, it may not be clear which line the error is on, though I haven't run into that much. I use awk a lot for finding files, creating text-based menu selections, scanning log4j output, and simplifying other arcane programming tasks.
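Scanning log output is a typical job for these tools. Here is a minimal Python sketch, with one loud assumption: the regex `LOG_LINE` assumes a hypothetical log4j layout of the form `<date> <time> <LEVEL> [<thread>] <logger> - <message>`. Real log4j lines look however your PatternLayout configures them, so the pattern would need adjusting. `count_levels` is a made-up name.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION (hypothetical layout): "<date> <time> <LEVEL> [<thread>] <logger> - <message>".
# Real log4j output depends on the configured PatternLayout; adjust the regex to match yours.
LOG_LINE = re.compile(r"^\S+ \S+ (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ")


def count_levels(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count records per level, skipping non-matching lines (e.g. stack traces)."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    log = [
        "2024-01-01 12:00:00,123 INFO [main] sim.Driver - run started",
        "2024-01-01 12:00:05,456 ERROR [main] sim.Driver - solver diverged",
        "    at sim.Solver.step(Solver.java:42)",
    ]
    counts = count_levels(log)
    print(counts["ERROR"], counts["INFO"], counts["WARN"])  # 1 1 0
```

Indented stack-trace lines never match because the regex requires a non-blank first character, so only real log records are counted.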
 
Thank you, jedishrfu! I actually don't have anything I want to parse right now, but in the next few months I need to automate a lot of post-simulation analysis to save myself time. I will definitely come back to the forums if I have questions. Thanks a lot for offering help!

Just one more question: do you recommend a particular book for learning AWK?
 
Thank you very much! I think I can borrow a copy of the book from the library.
 
I would suggest perl if you want something that's quick to use and powerful, python if you want something more like a real programming language (but which requires a little more effort to do stuff). I honestly didn't know people still use awk...
 
Coin said:
I would suggest perl if you want something that's quick to use and powerful, python if you want something more like a real programming language (but which requires a little more effort to do stuff). I honestly didn't know people still use awk...

Both are good, but AWK is classic. Others to consider are Ruby and Groovy, which provide OO features. Groovy is especially nice because of its close connection to Java, so close that you can copy Java source into your script and, with very few changes, make it work as is.

The trouble with these is that they sometimes aren't installed on the *nix distro you're using, but AWK is always present. AWK is also used a lot in one-liners inside shell scripts to get things done that the shell language itself just can't.
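To make the one-liner point concrete: a command such as `awk -F, '{ print $1, $3 }' file.csv` prints the first and third comma-separated fields of each line. The hypothetical Python helpers below approximate that behavior, including AWK's habit of treating a missing field as an empty string. Regex field separators and other FS subtleties are deliberately left out of this sketch.

```python
from typing import Iterable, List, Optional, Sequence


def awk_fields(record: str, fs: Optional[str] = None) -> List[str]:
    """Hypothetical helper: split a record roughly the way AWK does.

    fs=None (or " ") mimics AWK's default FS: split on runs of whitespace and
    ignore leading/trailing blanks. A single character such as "," splits on
    exactly that character. Regex separators are not handled here.
    """
    if fs is None or fs == " ":
        return record.split()
    return record.rstrip("\n").split(fs)


def select_columns(lines: Iterable[str], cols: Sequence[int], fs: Optional[str] = None) -> List[str]:
    """Hypothetical helper: rough equivalent of  awk -F<fs> '{ print $c1, $c2, ... }'."""
    out: List[str] = []
    for line in lines:
        fields = awk_fields(line, fs)
        # AWK yields "" for a field past the end of the record instead of failing
        picked = [fields[c - 1] if 0 < c <= len(fields) else "" for c in cols]
        out.append(" ".join(picked))  # AWK's default output separator (OFS) is one space
    return out


if __name__ == "__main__":
    for row in select_columns(["a,b,c", "d,e"], [1, 3], fs=","):
        print(repr(row))  # 'a c' then 'd '
```

The trailing space in `'d '` matches what AWK prints when `$3` is empty.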

Biologists use Perl, Python and sometimes Ruby a lot for data manipulation.

In my own work I use AWK for developer tools and Groovy for Java source code parsing when I need to migrate large amounts of code and need to play with the Java syntax a lot or work with XML files. The regular expressions used in AWK are largely the same as in many other languages, so you can't go wrong learning them first with AWK.
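As an example of regex knowledge carrying over: the pattern below sticks to POSIX extended regular expression syntax (no `\d`, no non-capturing groups), so essentially the same expression should also work inside an AWK `/.../` rule. The wrapper `extract_numbers` and the sample line are hypothetical. Note that the pattern also picks up digits embedded in words such as `x2`.

```python
import re
from typing import List

# Signed decimal number with an optional exponent, e.g. 42, -3.5, .25, 1.2345E+02.
# Uses only POSIX ERE constructs: [], ?, +, *, | and plain (...) groups.
NUMBER = re.compile(r"[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?")


def extract_numbers(line: str) -> List[float]:
    """Hypothetical helper: return every number found in a line of simulation output."""
    # finditer + group(0) gives the whole match; findall would return the groups instead
    return [float(m.group(0)) for m in NUMBER.finditer(line)]


if __name__ == "__main__":
    print(extract_numbers("Energy = -1.2345E+02 at t = 0.5"))  # [-123.45, 0.5]
```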

For me the best features of AWK were its closeness to C syntax, its ability to run other commands and parse their output, and its associative arrays, which I used for a lot of table-lookup tasks. Groovy has many of the same features, with associative arrays replaced by properties objects.
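The table-lookup use of associative arrays is commonly written in AWK as the two-file idiom `awk 'NR == FNR { val[$1] = $2; next } { print $0, val[$1] }' table.txt data.txt`. A Python dictionary plays the same role. In this sketch the function names, the run IDs, and the `NA` placeholder for missing keys are assumptions (AWK itself would print an empty string for a missing key).

```python
from typing import Dict, Iterable, List


def build_lookup(table_lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper, first pass: map column 1 -> column 2 (AWK: val[$1] = $2)."""
    lookup: Dict[str, str] = {}
    for line in table_lines:
        fields = line.split()
        if len(fields) >= 2:  # ignore malformed rows
            lookup[fields[0]] = fields[1]
    return lookup


def annotate(data_lines: Iterable[str], lookup: Dict[str, str], missing: str = "NA") -> List[str]:
    """Hypothetical helper, second pass: append the looked-up value (AWK: print $0, val[$1])."""
    out: List[str] = []
    for line in data_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines (AWK would still print them)
        out.append(f"{line.rstrip()} {lookup.get(fields[0], missing)}")
    return out


if __name__ == "__main__":
    params = ["run1 0.10", "run2 0.25"]  # hypothetical run ID -> parameter table
    results = ["run1 -3.2", "run2 -2.9", "run3 -1.0"]
    for row in annotate(results, build_lookup(params)):
        print(row)  # run1 -3.2 0.10 / run2 -2.9 0.25 / run3 -1.0 NA
```

Using `dict.get` with an explicit default, rather than AWK's silent empty string, makes unmatched keys easy to spot in the output.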
 
Most modern programming languages have regular expressions that are quite powerful. So unless you are having severe performance issues, use a "good" language that you already know, and just learn the regular expressions.

Perl and awk are not my favorites... people in here will probably scream, but they can be rather difficult to read. I won't argue with their power, but many of the same capabilities now exist in, say, Java.
 
