Text Processing Language for a computational physicist

  1. Mar 9, 2013 #1
    So I have been using a combination of the Linux shell and Fortran to process big output files from my simulations. But I realized that it would save me a lot of time and effort if I learned a text processing language. I got different recommendations, and it seems that AWK and Perl are at the top of the list. Ideally I need something that I can learn fast but that is also efficient.
    Any tips would be appreciated.
  3. Mar 9, 2013 #2


    Staff: Mentor

    I'd go with AWK. It's on every distro of Unix, and is available on Windows too.

    It doesn't have all the special character classes of Perl but is perhaps easier to learn as it follows a C-like syntax.

    Code (Text):

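    #!/bin/awk -f

    BEGIN {
        # initialization: runs once, before any input is read
    }

    /pattern/ { action }    # placeholders: one such rule per pattern to match

    END {
        # finalization: runs once, after all input has been read
    }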

    The program format is pretty straightforward: an initializing BEGIN block, a finalizing END block, and a bunch of pattern-matching rules in between.
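
    As a minimal sketch of that layout, the shell snippet below feeds a few lines to an AWK program that averages the second column. The input format is made up for illustration (a "#" comment line followed by step/energy pairs), and the names mean, sum and n are arbitrary.

    ```sh
    # Hypothetical simulation output: a "#" header line, then "step energy" pairs.
    mean=$(printf '# step energy\n1 2.0\n2 4.0\n3 6.0\n' | awk '
        BEGIN   { n = 0; sum = 0 }      # runs once, before any input
        /^#/    { next }                # rule: skip comment lines
        NF == 2 { sum += $2; n++ }      # rule: accumulate column 2 of data lines
        END     { if (n > 0) printf "%.2f\n", sum / n }   # runs once, at the end
    ')
    echo "mean energy: $mean"
    ```

    The same program text could instead be saved to a file and run with awk -f against a real output file.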

    If you have a specific text format to parse, I can help you.

    One caveat is that when a script fails, it may not be clear which line the error is on. I haven't run into that much though. I use AWK a lot for finding files, creating text-based menu selections, scanning log4j output and simplifying other arcane programmer tasks.
  4. Mar 10, 2013 #3
    Thank you, jedishrfu! I actually do not have anything that I want to parse right now, but in the next few months I need to automate a lot of post-simulation analysis to save myself time. I will definitely get back to the forums if I have questions. Thanks a lot for offering help!

    Just one more question: do you recommend a particular book for learning AWK?
  5. Mar 10, 2013 #4


    Staff: Mentor

    Last edited by a moderator: May 6, 2017
  6. Mar 12, 2013 #5
    Thank you very much! I think I can borrow a copy of the book from the library.
  7. Mar 12, 2013 #6
    I would suggest Perl if you want something that's quick to use and powerful, or Python if you want something more like a real programming language (though it takes a little more effort to get things done). I honestly didn't know people still use AWK...
  8. Mar 13, 2013 #7


    Staff: Mentor

    Both are good, but AWK is classic. Others to consider are Ruby and Groovy, which provide OO features. Groovy is especially nice because of its close connection to Java, so close that you can copy Java source into your script and, with very few changes, make it work as is.

    The trouble with these is that they sometimes aren't installed on the *nix distro that you're using, whereas AWK is always present. AWK is also used a lot in one-liners inside shell scripts to get things done that the script language just can't.
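
    To give a rough idea of what such one-liners look like, here is a sketch that pulls values out of a solver log. The log lines, the tolerance and the variable names are all invented for the example.

    ```sh
    # Hypothetical solver log, one line per iteration report.
    log=$(printf 'iter 10 residual 1.0e-03\niter 20 residual 5.0e-05\niter 30 residual 2.0e-07\n')

    # One-liner 1: remember field 4 on every matching line, print the last one.
    last=$(printf '%s\n' "$log" | awk '/residual/ { r = $4 } END { print r }')

    # One-liner 2: pass a shell value in with -v and print the first iteration
    # whose residual drops below it.
    tol=0.0001
    first=$(printf '%s\n' "$log" | awk -v tol="$tol" '$4 < tol { print $2; exit }')

    echo "last residual: $last, first iteration below $tol: $first"
    ```

    In a real script the printf would normally be replaced by a file name argument or by the output of another command.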

    Biologists use Perl, Python and sometimes Ruby a lot for data manipulation.

    In my own work I use AWK for developer tools and Groovy for Java source code parsing when I need to migrate large amounts of code and need to play with the Java syntax a lot or work with XML files. The regular expressions used in AWK are largely the same as in many other languages, so you can't go wrong learning them first with AWK.

    For me the best features of AWK were its closeness to C syntax, its ability to run other commands and parse their output, and its associative arrays, which I used for a lot of table-lookup tasks. Groovy has many of the same features, with associative arrays being replaced by properties objects.
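
    A small sketch of those two features together, under made-up assumptions: the command "echo H H O" stands in for whatever real command you would run, and the table of atomic masses is just an example lookup table.

    ```sh
    total=$(awk '
        BEGIN {
            # Lookup table: element symbol -> standard atomic mass (u).
            mass["H"] = 1.008
            mass["C"] = 12.011
            mass["O"] = 15.999

            # Run an external command and read one line of its output.
            # "echo H H O" is a stand-in for a real command.
            cmd = "echo H H O"
            cmd | getline line
            close(cmd)

            # Split the line into symbols and sum the masses by table lookup.
            n = split(line, atoms, " ")
            for (i = 1; i <= n; i++)
                if (atoms[i] in mass)
                    sum += mass[atoms[i]]
            printf "%.3f\n", sum
        }
    ')
    echo "molar mass: $total"
    ```

    Because everything happens in the BEGIN block, this program reads no input file at all; the array is indexed directly by the strings that come back from the command.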
    Last edited: Mar 13, 2013
  9. Mar 16, 2013 #8


    Gold Member

    Most modern programming languages have regular expressions that are quite powerful. So unless you are having severe performance issues, use a "good" language that you already know, and just learn the regular expressions.

    Perl and AWK are not my favorites... people in here will probably scream, but they can be rather difficult to read. I won't argue with their power, but many of the same capabilities now exist in, say, Java.