Reading in ASCII Data in IDL with Format Codes

In summary, regular expressions are not supported natively in C or Fortran, but can be implemented with a library.
  • #1
polystethylene
Hi all,

I have a tab-delimited ASCII file (an output from IRAF). I need to read it into IDL, but it has a timestamp column (in UT, hh:mm:ss). I assume I need to read it in with format codes, but I'm not familiar with Fortran or C (or IDL, to be honest), and the format code system seems to be borrowed from both Fortran and C.

The first few lines look like this:

00:05:54 1.741252 1 788.818 38.071 22525.62 19.316 0.009
00:05:54 1.741252 2 434.052 49.973 5557.795 20.836 0.032
00:05:54 1.741252 3 461.841 66.111 6695.824 20.633 0.026
00:05:54 1.741252 4 390.721 105.630 4991.148 20.952 0.035
00:05:54 1.741252 5 61.415 119.133 5739.358 20.801 0.030

Another problem I can see is that the values in each column don't all have exactly the same format (e.g., differing numbers of digits before the decimal point). The column of integers also goes from single-digit to double-digit.

So far I've tried this:

PRO READONELINE
infile = 'LCgen/20090127/fbexp062_2.dump'
OPENR,lun,infile,/GET_LUN
data = fltarr(10)
READF,lun,format = '(I0,":",I0,":",I0,F5,Q)',data
print,data
close,lun
END

Can anyone help me?
 
  • #2
You want regular expressions. I don't know what language you're using, but you definitely want to look up its documentation and see how it implements regexps.

Your timestamp matches /\d{2}:\d{2}:\d{2}/, and your data fields match /\d+(\.\d+)?/.
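A minimal Python sketch combining the two patterns to parse one line (Python because the split example in this reply is Python). The function name `parse_line_regex` is hypothetical. One change from the pattern as written: the decimal part is made non-capturing, `(?:\.\d+)`, because Python's `findall` returns only the captured group when a pattern contains one. Like the original patterns, it does not handle signs or exponents.

```python
import re

# The two patterns given in this reply. The decimal part is a non-capturing
# group, (?:...), so that findall() returns whole numbers rather than the group.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_line_regex(line):
    """Return (hours, minutes, seconds, [data fields]) for one line of the dump."""
    m = TIMESTAMP.match(line)
    if m is None:
        raise ValueError(f"no hh:mm:ss timestamp at start of line: {line!r}")
    hours, minutes, seconds = (int(g) for g in m.groups())
    # Search only after the timestamp so its digits are not read as data fields.
    fields = [float(x) for x in NUMBER.findall(line, m.end())]
    return hours, minutes, seconds, fields
```

Because the regex only looks for digits, it does not care whether the separators are spaces or tabs, or how wide each number is.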

Or you could go for a low-tech solution: splitting the data fields on whitespace. Most languages have a string 'split' method for this:

Code:
>>> "00:05:54 1.741252 1 788.818 38.071 22525.62 19.316 0.009".split()
['00:05:54', '1.741252', '1', '788.818', '38.071', '22525.62', '19.316', '0.009']

Which you can quickly replicate in any low level language, I think.
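A slightly fuller Python sketch of the split approach, turning each line into 10 numbers (hours, minutes, seconds, then the seven data fields), the same layout as the `fltarr(10, ...)` arrays in the IDL attempts. The function names are hypothetical. Called with no argument, `str.split()` splits on any run of whitespace, spaces or tabs alike, which matters because the file is described as tab-delimited.

```python
def parse_line_split(line):
    """Split one line on whitespace; return [h, m, s, f1, ..., f7]."""
    fields = line.split()  # no argument: any run of spaces/tabs is one separator
    h, m, s = (int(part) for part in fields[0].split(":"))
    return [h, m, s] + [float(x) for x in fields[1:]]


def read_rows(lines):
    """Parse every non-blank line from any iterable of strings (e.g. an open file)."""
    return [parse_line_split(line) for line in lines if line.strip()]
```

For a file, pass the open file object, e.g. `with open('LCgen/20090127/fbexp062_2.dump') as f: rows = read_rows(f)`.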
 
  • #3
Hi signerror, thanks for the help. I've never come across regular expressions before. Are you suggesting them because they can overcome the issue of needing different format codes for each row (to account for the changing number of characters)?
 
  • #4
If you use C, use fscanf from <stdio.h>. I don't know what the Fortran equivalent is.

http://en.wikipedia.org/wiki/Scanf

The varying string lengths of the numbers should not make a difference to a free-format reading function such as fscanf: each field is parsed as a floating-point number, however many characters it has.

I take back what I said: because you are working in C and Fortran, you do not have native support for regular expressions.
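To see why free-format reading copes with varying widths while fixed column positions do not, here is a small Python comparison on rows 1 and 5 of the sample data from the first post. The fixed-width rule (the fifth field always occupies characters 28 to 33) is hypothetical, chosen only because it fits row 1.

```python
# Rows 1 and 5 of the sample in the first post.
ROW1 = "00:05:54 1.741252 1 788.818 38.071 22525.62 19.316 0.009"
ROW5 = "00:05:54 1.741252 5 61.415 119.133 5739.358 20.801 0.030"


def fifth_field_fixed(line):
    """Hypothetical fixed-width rule: the fifth field is characters 28-33.

    This only holds while the fourth field is exactly 7 characters wide.
    """
    return float(line[28:34])


def fifth_field_free(line):
    """Free-format rule: fields are whatever lies between runs of whitespace."""
    return float(line.split()[4])


print(fifth_field_fixed(ROW1), fifth_field_free(ROW1))  # 38.071 38.071
print(fifth_field_fixed(ROW5), fifth_field_free(ROW5))  # 19.133 119.133
```

In row 5 the fourth field (61.415) is one character shorter, so every later field shifts left and the fixed slice silently reads 19.133 instead of 119.133. Splitting on whitespace is unaffected.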
 
  • #5
I'm actually working in IDL, which apparently has many similarities to C and Fortran...

I currently have this:

PRO READONELINE
infile = 'LCgen/20090127/fbexp062_2.dump'
OPENR,lun,infile,/GET_LUN
rows = FILE_LINES(infile)

data = fltarr(10,rows)
WHILE NOT EOF(lun) DO BEGIN
READF,lun,format = '(I2,1x,I2,1x,I2,2x,F0,2x,I0,5F0,/)',data
ENDWHILE
close,lun
free_lun,lun
print,data
END

Which is reading the following data:

01:34:48 1.398525 1 782.147 31.966 22948.04 19.296 0.008
01:34:48 1.398525 2 427.381 43.868 6471.269 20.670 0.025
01:34:48 1.398525 3 455.170 60.006 6762.172 20.623 0.024
01:34:48 1.398525 4 384.050 99.521 5200.847 20.908 0.031
01:34:48 1.398525 5 54.744 113.028 5752.683 20.798 0.029

in as thus (upon doing a print,data command):

1.00000 34.0000 48.0000 1.39852 1.00000 782.147
31.9660 22948.0 19.2960 0.00800000
1.00000 34.0000 48.0000 1.39852 3.00000 455.170
60.0060 6762.17 20.6230 0.0240000
1.00000 34.0000 48.0000 1.39852 5.00000 54.7440
113.028 5752.68 20.7980 0.0290000

(This is a different file from the one above, but with the same data format.) I can't stop it from apparently running over onto a new line, so it reads only every other data line. Hence my WHILE NOT EOF loop, with the data array sized from the number of rows in the file, is always cut short with:

% Procedure was compiled while active: READONELINE. Returning.
% Compiled module: READONELINE.
% READF: End of file encountered. Unit: 107, File: LCgen/20090127/fbexp062_2.dump
% Execution halted at: READONELINE 8 /Users/stefan/IDLWorkspace/LCgen/20090127/readoneline.pro
% $MAIN$
I wish I had taken any sort of programming course during my undergrad days =(
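In IDL, a / in a format string ends the current record and moves on to the next one, so the trailing / in '(I2,1x,I2,1x,I2,2x,F0,2x,I0,5F0,/)' consumes a second line for every 10 values read. That matches the printed output, which shows only stars 1, 3 and 5, and it explains the end-of-file error: the array expects one row per line, but each row uses up two lines. A likely fix (a suggestion, not verified against this file) is to drop the trailing / and fill the whole array with a single READF, without the WHILE loop. The Python sketch below only mimics the two behaviors on the sample lines; `to_row`, `read_records` and `records_per_read` are hypothetical names.

```python
# The second sample from this post.
SAMPLE = """01:34:48 1.398525 1 782.147 31.966 22948.04 19.296 0.008
01:34:48 1.398525 2 427.381 43.868 6471.269 20.670 0.025
01:34:48 1.398525 3 455.170 60.006 6762.172 20.623 0.024
01:34:48 1.398525 4 384.050 99.521 5200.847 20.908 0.031
01:34:48 1.398525 5 54.744 113.028 5752.683 20.798 0.029"""


def to_row(line):
    """One record -> 10 floats: hours, minutes, seconds, then the 7 data fields."""
    fields = line.split()
    return [float(x) for x in fields[0].split(":")] + [float(x) for x in fields[1:]]


def read_records(lines, records_per_read=1):
    """Build one row per read, consuming `records_per_read` lines each time.

    records_per_read=2 mimics a format ending in '/': after the values are read,
    the next line is consumed as well and its values are never used.
    """
    rows = []
    i = 0
    while i < len(lines):
        rows.append(to_row(lines[i]))
        i += records_per_read
    return rows


lines = SAMPLE.splitlines()
good = read_records(lines)                         # 5 rows, one per line
skipped = read_records(lines, records_per_read=2)  # rows from lines 1, 3, 5 only
print([row[4] for row in skipped])                 # [1.0, 3.0, 5.0]
```

Here the skipping version simply stops early; in IDL the array still wants one row per line of the file, so READF runs past the last line instead.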
 

1. What is ASCII data and why is it important in scientific research?

ASCII (American Standard Code for Information Interchange) is a character encoding standard used for representing text in computers. It is important in scientific research because it allows data to be stored and transmitted in a readable format that can be easily manipulated and analyzed by different programs and systems.

2. How do I read ASCII data in IDL?

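To read ASCII data in IDL, you can use the READ_ASCII function. It takes the file name plus optional keywords (for example TEMPLATE, a structure describing the columns that you can build interactively with ASCII_TEMPLATE, as well as DATA_START and DELIMITER) and returns the data in an IDL structure, which can be easily accessed and manipulated. READ_ASCII does not take format codes; to read with explicit format codes, open the file with OPENR and use READF with the FORMAT keyword.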

3. What are format codes and how do I use them in IDL?

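Format codes are special symbols that specify the data type and field width of each value in a line of the ASCII file. In IDL they are passed to READF (or READS, PRINT and STRING) through the FORMAT keyword; READ_ASCII does not use them. IDL's format codes are Fortran-style: for example, I for integers (I2 reads a two-character integer), F for floating-point numbers (F8.6), A for strings, and nX to skip n characters. IDL also accepts C printf-style codes such as %d, %f and %s inside a quoted-string format.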
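As an illustration of how width-based format codes consume characters, here is a toy Python reader for a few Fortran-style codes. It is not IDL's parser: it ignores the d in Fw.d, and it does not handle parentheses or repeat counts such as 5F0. The function name `read_fixed` is hypothetical.

```python
import re


def read_fixed(line, fmt):
    """Toy reader for Iw, Fw.d, Aw and nX codes, e.g. "I2,1X,F8.6".

    Each I/F/A code consumes exactly w characters; nX skips n characters.
    """
    values = []
    pos = 0
    for code in fmt.split(","):
        code = code.strip()
        skip = re.fullmatch(r"(\d*)X", code, re.IGNORECASE)
        if skip:
            pos += int(skip.group(1) or 1)  # a bare X skips one character
            continue
        m = re.fullmatch(r"([IFA])(\d+)(?:\.\d+)?", code, re.IGNORECASE)
        if m is None:
            raise ValueError(f"unsupported code: {code}")
        kind, width = m.group(1).upper(), int(m.group(2))
        field = line[pos:pos + width]  # exactly `width` characters, no more
        pos += width
        if kind == "I":
            values.append(int(field))
        elif kind == "F":
            values.append(float(field))
        else:
            values.append(field)
    return values


FMT = "I2,1X,I2,1X,I2,1X,F8.6,1X,I1,1X,F7.3"
print(read_fixed("00:05:54 1.741252 1 788.818", FMT))  # [0, 5, 54, 1.741252, 1, 788.818]
```

The same format silently misreads a line whose integer column has two digits: for "00:05:54 1.741252 12 788.818" it returns 1 and 788.81 for the last two values, because I1 takes only the "1" and 1X skips the "2". This is the kind of mismatch a fixed width causes when the number of digits varies.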

4. Can I specify different format codes for different columns in my ASCII data file?

Yes, you can specify different format codes for different columns in your ASCII data file. This allows you to read in data of different types and formats, such as integers, floating-point numbers, and strings, all in the same file.

5. Are there any common errors when reading ASCII data in IDL with format codes?

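One common error when reading ASCII data in IDL is using format codes that do not match the data in your file. This can result in incorrect values being read or an error message. Fixed field widths are a frequent culprit when the number of digits in a column varies, and a stray / in a format string advances to the next record, so lines get skipped. It is important to double-check that the format codes you are using match the data in your file to ensure accurate results.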
