Text Processing Language for a computational physicist

AI Thread Summary
The discussion focuses on the need for a text processing language to efficiently handle large output files from simulations. AWK and Perl are the top recommendations, with AWK being favored for its simplicity and availability across Unix systems. Users highlight AWK's C-like syntax, ease of learning, and effectiveness for tasks like file searching and log scanning. While Perl is noted for its power, some participants suggest Python for a more comprehensive programming experience. Overall, AWK is recommended for its accessibility and utility in post-simulation analysis.
So I have been using a combination of Linux shell and Fortran to process big output files from my simulations. But I realized that it would save me a lot of time and effort if I learned a text processing language. I got different recommendations, and it seems that AWK and Perl are at the top of the list. Ideally I need something that I can learn fast but that is also efficient.
Any tips would be appreciated.
 
I'd go with AWK. It's on every Unix distribution, and it's available on Windows too.

It doesn't have all the special character classes of Perl but is perhaps easier to learn as it follows a C-like syntax.

Code:
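#!/usr/bin/awk -f

BEGIN {
    # runs once, before any input is read: initialize variables here
}

/pattern/ {
    # runs for every input line matching the regular expression /pattern/
}

# ... more pattern { action } rules ...

END {
    # runs once, after the last input line: print totals and summaries
}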

The program format is pretty straightforward: an initializing BEGIN block, a finalizing END block, and a bunch of pattern-matching rules in between.
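To make the skeleton concrete, here is a minimal sketch that fills in all three parts. It assumes, purely for illustration, that the simulation writes lines like `Energy: -1.2345` with the value in the second whitespace-separated field; the function name `avg_energy` is hypothetical. The AWK program is wrapped in a shell function so it can be called from a script.

```shell
#!/bin/sh
# avg_energy (hypothetical name): mean of the value on lines that start
# with "Energy", e.g. "Energy: -1.2345" (assumed output format).
# Reads the files given as arguments, or standard input if there are none.
avg_energy() {
    awk '
    BEGIN { n = 0; sum = 0 }          # initialization: runs once
    /^Energy/ { sum += $2; n++ }      # rule: runs on every matching line
    END {                             # finalization: runs once at the end
        if (n > 0)
            printf "%d lines, mean energy = %.4f\n", n, sum / n
        else
            print "no Energy lines found"
    }' "$@"
}
```

Usage, with a made-up file name: `avg_energy run01.out`.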

If you have a specific text format to parse, I can help you.

One caveat is that when it fails, it may not be clear which line the error is on. I haven't run into that much, though. I use awk a lot for finding files, creating text-based menu selections, scanning log4j output, and simplifying other arcane programmer tasks.
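As a sketch of the log-scanning use, the function below counts log lines per level with an associative array. log4j's output layout is configurable, so the format here is an assumption: a pattern that puts date, time, and level in the first three fields, e.g. `2024-05-01 12:00:01,234 ERROR [main] Foo - message`. The name `count_levels` is hypothetical; adjust `$3` to match your own layout.

```shell
#!/bin/sh
# count_levels (hypothetical name): tally log lines by level.
# Assumed layout: date time LEVEL [thread] logger - message
count_levels() {
    awk '
    $3 ~ /^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)$/ { count[$3]++ }
    END { for (lvl in count) print lvl, count[lvl] }
    ' "$@" | sort
}
```

AWK does not define the order in which `for (k in arr)` visits keys, which is why the output is piped through `sort`.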
 
Thank you, jedishrfu! I don't actually have anything I want to parse right now, but in the next few months I need to automate a lot of post-simulation analysis to save myself time. I will definitely come back to the forums if I have questions. Thanks a lot for offering help!

Just one more question: do you recommend a particular book for learning AWK?
 
Thank you very much! I think I can borrow a copy of the book from the library.
 
I would suggest perl if you want something that's quick to use and powerful, python if you want something more like a real programming language (but which requires a little more effort to do stuff). I honestly didn't know people still use awk...
 
Coin said:
I would suggest perl if you want something that's quick to use and powerful, python if you want something more like a real programming language (but which requires a little more effort to do stuff). I honestly didn't know people still use awk...

Both are good, but AWK is a classic. Others to consider are Ruby and Groovy, which provide object-oriented features. Groovy is especially nice because of its close connection to Java, so close that you can copy Java source into your script and make it work with very few changes.

The trouble with these is that they sometimes aren't installed on the *nix distro you're using, but AWK is always present. AWK is also used a lot in one-liners inside shell scripts to do things the shell language itself can't.
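One common case of "things the shell can't do": POSIX shell arithmetic, `$(( ))`, works only on integers, while AWK does floating point. A minimal sketch of the one-liner style follows; the helper name `fdiv` and the time-step numbers are illustrative only.

```shell
#!/bin/sh
# POSIX shell arithmetic, $(( )), handles integers only.
# A one-line awk call fills the gap for floating-point math.
# fdiv is a hypothetical helper name: prints a / b to 6 decimal places.
fdiv() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.6f\n", a / b }'
}

# Example: total simulated time from a time step and a step count.
dt=0.002
nsteps=1500
t_total=$(awk -v dt="$dt" -v n="$nsteps" 'BEGIN { print dt * n }')
echo "total time: $t_total"
```

A program made only of a BEGIN block never reads input, which is what makes this pattern convenient as a calculator inside scripts.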

Biologists use Perl, Python and sometimes Ruby a lot for data manipulation.

In my own work I use AWK for developer tools and Groovy for Java source code parsing, when I need to migrate large amounts of code, play with the Java syntax a lot, or work with XML files. The basic regular-expression syntax used in AWK carries over to many other languages, so you can't go wrong learning it with AWK first.
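As an illustration of regex use that a physicist might need, this sketch pulls every number written in scientific notation (e.g. `-1.25e-03`) out of each input line using AWK's `match()`, `RSTART`, and `RLENGTH`. The pattern sticks to syntax that POSIX extended regular expressions share with Perl, Python, and Java regexes (bracket classes, `?`, `+`, grouping, escaped dot), so the same pattern string should work there too. The function name `sci_numbers` is hypothetical, and the pattern deliberately ignores plain decimals without an exponent.

```shell
#!/bin/sh
# sci_numbers (hypothetical name): print each number written like
# 1.23e-05 or -4.5E+02 found on each input line, one per output line.
sci_numbers() {
    awk '{
        line = $0
        # match() sets RSTART/RLENGTH to the leftmost-longest match
        while (match(line, /[-+]?[0-9]+(\.[0-9]*)?[eE][-+]?[0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)   # continue after it
        }
    }' "$@"
}
```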

For me the best features of AWK were its closeness to C syntax, its ability to run other commands and parse their output, and its associative arrays, which I used for a lot of table-lookup tasks. Groovy has many of the same features, with associative arrays replaced by properties objects.
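A typical table-lookup task with associative arrays is joining a data file against a small key/value table. The sketch below uses the common `NR == FNR` idiom: while the first file is read, it fills the array; for the second file, it appends the looked-up value. The function name `lookup`, the two-column table format, and the atomic-number example are all assumptions for illustration.

```shell
#!/bin/sh
# lookup (hypothetical name): join a data file against a lookup table.
#   table file: "key value" pairs, e.g. "8 O" (atomic number -> symbol)
#   data file:  lines whose first field is a key
# Each data line is printed with the matching value (or UNKNOWN) appended.
lookup() {
    awk '
    NR == FNR { table[$1] = $2; next }   # first file: build the table
    {                                    # second file: look each key up
        print $0, (($1 in table) ? table[$1] : "UNKNOWN")
    }' "$1" "$2"
}
```

Usage, with made-up file names: `lookup elements.txt atoms.dat`. Note that `NR == FNR` holds only while the first file is being read, so the idiom misbehaves if the table file is empty.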
 
Most modern programming languages have regular expressions that are quite powerful. So unless you are having severe performance issues, use a "good" language that you already know, and just learn the regular expressions.

Perl and awk are not my favorites (people in here will probably scream), but they can be rather difficult to read. I won't argue with their power, but many of the same capabilities now exist in, say, Java.
 