Reading in ASCII Data in IDL with Format Codes

  1. Feb 12, 2009 #1
    Hi all,

    I have a tab-delimited ASCII file (output from IRAF). I need to read it into IDL, but it has a timestamp column (in UT, hh:mm:ss). I assume I need to read it in with format codes, but I'm not familiar with Fortran or C (or IDL, to be honest), and it seems the format code system is borrowed from both Fortran and C.

    The first few lines look like this:

    00:05:54 1.741252 1 788.818 38.071 22525.62 19.316 0.009
    00:05:54 1.741252 2 434.052 49.973 5557.795 20.836 0.032
    00:05:54 1.741252 3 461.841 66.111 6695.824 20.633 0.026
    00:05:54 1.741252 4 390.721 105.630 4991.148 20.952 0.035
    00:05:54 1.741252 5 61.415 119.133 5739.358 20.801 0.030

    Another problem I can see is that the values in a given column do not all have exactly the same format (e.g. differing numbers of characters before the decimal point). The column of integers also goes from single-digit to double-digit values.

    So far I've tried this:

    PRO READONELINE
    infile = 'LCgen/20090127/fbexp062_2.dump'
    OPENR,lun,infile,/GET_LUN
    data = fltarr(10)
    READF,lun,format = '(I0,":",I0,":",I0,F5,Q)',data
    print,data
    close,lun
    END

    Can anyone help me?
     
  3. Feb 12, 2009 #2
    You want regular expressions. I don't know what language you're using, but you definitely want to look up its documentation and see how it implements regexps.

    Your timestamp matches /\d{2}:\d{2}:\d{2}/, and your data fields match /\d+(\.\d+)?/.
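
    As a rough illustration of those two patterns, here is a minimal Python sketch. Python is used only because it is easy to try out; the function name parse_line is made up for this example, and the sample row is copied from the first post. It is a sketch of the idea, not an IDL solution.

```python
import re

# The two patterns quoted above: hh:mm:ss and an unsigned decimal number.
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}")
NUMBER = re.compile(r"\d+(?:\.\d+)?")  # non-capturing group so findall returns whole matches


def parse_line(line):
    """Return (timestamp_string, list_of_floats) for one row (hypothetical helper)."""
    m = TIMESTAMP.match(line)
    if m is None:
        raise ValueError("line does not start with hh:mm:ss")
    # Everything after the timestamp is treated as numeric fields.
    values = [float(tok) for tok in NUMBER.findall(line[m.end():])]
    return m.group(0), values


sample = "00:05:54 1.741252 1 788.818 38.071 22525.62 19.316 0.009"
timestamp, values = parse_line(sample)
print(timestamp, values)
```

    Note that the number pattern as written does not handle signs or exponents, which is fine for the sample rows shown in this thread but may not hold for other files.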

    Or you could go for a low-tech solution, splitting the data fields on spaces. In most languages there is a string 'split' method to do this:

    Code (Text):
    >>> "00:05:54 1.741252 1 788.818 38.071 22525.62 19.316 0.009".split(" ")
    ['00:05:54', '1.741252', '1', '788.818', '38.071', '22525.62', '19.316', '0.009']
     
    You can quickly replicate this in any low-level language, I think.
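
    Since the original file is described as tab-delimited, splitting on a single space may not be enough. The sketch below, again in Python, splits on any run of whitespace and also converts the timestamp to seconds since midnight. The name split_row and the choice of seconds since midnight are assumptions made for this example, not something the thread asks for.

```python
def split_row(line):
    """Split one row on whitespace; return (seconds_since_midnight, floats)."""
    fields = line.split()  # no argument: splits on any run of spaces or tabs
    hh, mm, ss = (int(part) for part in fields[0].split(":"))
    seconds = hh * 3600 + mm * 60 + ss
    return seconds, [float(tok) for tok in fields[1:]]


# Same first row as in the original post, written here with tab separators.
row = "00:05:54\t1.741252\t1\t788.818\t38.071\t22525.62\t19.316\t0.009"
seconds, values = split_row(row)
print(seconds, values)
```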
     
  4. Feb 12, 2009 #3
    Hi signerror, thanks for the help. I've never come across regular expressions before. Are you suggesting them as a solution because they can overcome the issue of needing different format codes for each row (to account for the changing number of characters)?
     
  5. Feb 12, 2009 #4
    If you use C, use fscanf from <stdio.h>. I don't know what the Fortran equivalent is.

    http://en.wikipedia.org/wiki/Scanf

    The variable string lengths of the numeric representations should not make a difference in any reading function. The different sizes don't matter: they are all parsed equally as floating-point numbers.
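
    To illustrate the point about field widths, here is a small Python sketch using rows 4 and 5 from the first post, where the fourth column is 7 characters wide in one row and 6 in the other. The variable names are made up for the example; the same idea applies to fscanf with %f or %lf, which also skips leading whitespace and does not need a fixed width.

```python
rows = [
    "00:05:54 1.741252 4 390.721 105.630 4991.148 20.952 0.035",
    "00:05:54 1.741252 5 61.415 119.133 5739.358 20.801 0.030",
]

# Drop the timestamp (first token) and look at the remaining numeric tokens.
tokens = [row.split()[1:] for row in rows]
widths = [[len(tok) for tok in toks] for toks in tokens]   # character widths differ
parsed = [[float(tok) for tok in toks] for toks in tokens]  # parsing is unaffected

print(widths)
print(parsed)
```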

    I take back what I said: because you are working in C and Fortran, you do not have native support for regular expressions.
     
    Last edited: Feb 12, 2009
  6. Feb 12, 2009 #5
    I'm actually working in IDL, which apparently has many similarities to C and Fortran...

    I currently have this:

    PRO READONELINE
    infile = 'LCgen/20090127/fbexp062_2.dump'
    OPENR,lun,infile,/GET_LUN
    rows = FILE_LINES(infile)

    data = fltarr(10,rows)
    WHILE NOT EOF(lun) DO BEGIN
    READF,lun,format = '(I2,1x,I2,1x,I2,2x,F0,2x,I0,5F0,/)',data
    ENDWHILE
    close,lun
    free_lun,lun
    print,data
    END

    Which is reading the following data:

    01:34:48 1.398525 1 782.147 31.966 22948.04 19.296 0.008
    01:34:48 1.398525 2 427.381 43.868 6471.269 20.670 0.025
    01:34:48 1.398525 3 455.170 60.006 6762.172 20.623 0.024
    01:34:48 1.398525 4 384.050 99.521 5200.847 20.908 0.031
    01:34:48 1.398525 5 54.744 113.028 5752.683 20.798 0.029

    in like this (output of a print,data command):

    1.00000 34.0000 48.0000 1.39852 1.00000 782.147
    31.9660 22948.0 19.2960 0.00800000
    1.00000 34.0000 48.0000 1.39852 3.00000 455.170
    60.0060 6762.17 20.6230 0.0240000
    1.00000 34.0000 48.0000 1.39852 5.00000 54.7440
    113.028 5752.68 20.7980 0.0290000

    (This is a different file from the one above, but the data format is the same.) I can't stop the read from apparently running over onto a new line, so it ends up picking up only alternate data lines. Hence my WHILE NOT EOF loop, whose length is based on the number of rows in the file, is always cut short with:

    % Procedure was compiled while active: READONELINE. Returning.
    % Compiled module: READONELINE.
    % READF: End of file encountered. Unit: 107, File: LCgen/20090127/fbexp062_2.dump
    % Execution halted at: READONELINE 8 /Users/stefan/IDLWorkspace/LCgen/20090127/readoneline.pro
    % $MAIN$



    I wish I had taken any sort of programming course during my undergrad days =(
     