# FORTRAN: How to read a non-uniformly formatted text file


## Main Question or Discussion Point

Hi all!

I am fairly new to FORTRAN, so I apologize if the title is confusing; I couldn't come up with anything better.

Here is a snippet of the file I am working with:

Code:
```
*>>>>>>>>CHARMM22 All-Hydrogen Topology File for Proteins <<<<<<<
*>>>>>>>>>>>>>>>>>>>> and Nucleic Acids <<<<<<<<<<<<<<<<<<<<<<<<<
*>>>>> Includes phi, psi cross term map (CMAP) correction <<<<<<<
*>>>>>>>>>>>>>>>>>>>>>>   July, 2004    <<<<<<<<<<<<<<<<<<<<<<<<<<
* All comments to ADM jr. via the CHARMM web site: www.charmm.org
*               parameter set discussion forum
*
31  1

! references
!
!PROTEINS
!
!MacKerell, A.D., Jr,. Feig, M., Brooks, C.L., III, Extending the
!treatment of backbone energetics in protein force fields: limitations
!of gas-phase quantum mechanics in reproducing protein conformational
!distributions in molecular dynamics simulations, Journal of
!Computational Chemistry, 25: 1400-1415, 2004.
!
!MacKerell, Jr., A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr., R.L.;
!Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.;
!Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.T.K.; Mattos,
!C.; Michnick, S.; Ngo, T.; Nguyen, D.T.; Prodhom, B.; Reiher, III,
!W.E.; Roux, B.; Schlenkrich, M.; Smith, J.C.; Stote, R.; Straub, J.;
!Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.  All-atom
!empirical potential for molecular modeling and dynamics Studies of
!proteins.  Journal of Physical Chemistry B, 1998, 102, 3586-3616.
!
!IONS (see lipid and nucleic acid topology and parameter files for
!
!ZINC
!
!Roland H. Stote and Martin Karplus, Zinc Binding in Proteins and
!Solution: A Simple but Accurate Nonbonded Representation, PROTEINS:
!Structure, Function, and Genetics 23:12-31 (1995)
!
!NUCLEIC ACIDS
!
!Foloppe, N. and MacKerell, Jr., A.D. "All-Atom Empirical Force Field for
!Nucleic Acids: 2) Parameter Optimization Based on Small Molecule and
!Condensed Phase Macromolecular Target Data. 2000, 21: 86-104.
!
!and
!
!MacKerell, Jr., A.D. and Banavali, N. "All-Atom Empirical Force Field for
!Nucleic Acids: 2) Application to Molecular Dynamics Simulations of DNA
!and RNA in Solution. 2000, 21: 105-120.
!

MASS     1 H      1.00800 H ! polar H
MASS     2 HC     1.00800 H ! N-ter H
MASS     3 HA     1.00800 H ! nonpolar H
MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
.
.
.
MASS    95 F3    18.99800 F ! Fluorine, trifluoro (see toppar_all22_prot_fluoro_alkanes.str)
MASS    99 DUM    0.00000 H ! dummy atom
!see NA section
!MASS   100 SOD  22.989770 NA ! Sodium Ion
!MASS   101 MG   24.305000 MG ! Magnesium Ion
!MASS   102 POT  39.102000 K  ! Potassium Ion! check masses
!MASS   103 CES 132.900000 CS ! Cesium Ion
!MASS   104 CAL  40.080000 CA ! Calcium Ion
!MASS   105 CLA  35.450000 CL ! Chloride Ion
!MASS   106 ZN   65.370000 ZN ! zinc (II) cation
!NA section
!MASS 101    HT    1.008000 H ! TIPS3P WATER HYDROGEN
```
Notice that there are comment lines between the MASS records as well.
From this file I need to read the records that start with "MASS" (they end just before the records that start with "DECL") and build a list from them, ignoring all other lines. What is the best way to do this?

My approach:
Code:
```fortran
line_code = '    '
do while (line_code .ne. 'DECL')
   read(1000,*) line_code, temp_int1, temp_cname, temp_dbl1   !! ERROR IN THIS LINE, MOST PROBABLY
   if (line_code .eq. 'MASS') then
      atom_type_info(tot_atom_types)%type_code  = temp_int1
      atom_type_info(tot_atom_types)%type_cname = temp_cname
      atom_type_info(tot_atom_types)%mass       = temp_dbl1
      tot_atom_types = tot_atom_types + 1
   endif
enddo
```
The problem is that, because of the comment lines in the file, the `read(1000,*)` statement tries to read a string into an integer, which causes an I/O error.
I thought of another approach, but it is too much work: first scan the lines with `read(1000,*) line_code` while keeping a line counter; when a MASS record turns up, REWIND and `read(1000,*) some_temp` up to line_counter-1, then read the MASS record. All this is because we can't go back just one line; we can only REWIND to the beginning of the file (right?).
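For reference: standard Fortran does let you step back a single record on a sequential file. The BACKSPACE statement positions the file before the record that was just read, so the full REWIND-and-skip pass is not strictly needed. Below is a minimal sketch, assuming the file is already open on unit `u` for sequential formatted input. The module and routine names (`backspace_demo`, `count_mass_records`) are hypothetical, and to keep it short the routine only counts the MASS records and sums their masses:

```fortran
! Sketch: peek at a record's keyword, then BACKSPACE and re-read the whole record.
! Assumes unit u is already open for sequential, formatted input.
module backspace_demo
  implicit none
contains
  ! Hypothetical helper: counts the MASS records before the first DECL line
  ! and sums their masses.
  subroutine count_mass_records(u, nmass, total_mass)
    integer, intent(in)  :: u
    integer, intent(out) :: nmass
    real,    intent(out) :: total_mass
    character(len=4) :: key
    character(len=8) :: name
    integer :: num, ios
    real    :: mass

    nmass = 0
    total_mass = 0.0
    do
      read(u, '(a4)', iostat=ios) key     ! first 4 characters only; rest of the record is skipped
      if (ios /= 0) exit                   ! end of file or read error
      if (key == 'DECL') exit              ! end of the MASS section
      if (key /= 'MASS') cycle             ! comment, blank or header line
      backspace(u)                         ! step back to the start of the record just read
      read(u, *, iostat=ios) key, num, name, mass   ! re-read it list-directed
      if (ios /= 0) exit
      nmass = nmass + 1
      total_mass = total_mass + mass
    end do
  end subroutine count_mass_records
end module backspace_demo
```

Reading with `'(a4)'` does not fail on comment or blank lines, because the A edit descriptor accepts any characters; only records already identified as MASS lines reach the list-directed read.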

Is there a better way? For example, reading the whole line (tabs and spaces included) into a character string, then reading the first word from that string: if it is "MASS", read the rest of the values from the stored line; otherwise ignore it. This is easy in C/C++ with getline and sscanf, but is there a way to do it in FORTRAN?

EDIT:
I found that ADVANCE='NO' in a READ statement lets me read part of a line without advancing to the next one, but now the problem is that when I read everything as characters and an integer comes along, I get an error.
Even if someone could just tell me how to read a file in FORTRAN line by line, treating each line as a string (even if it contains numbers), that would be a big help.
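On the ADVANCE='NO' route: one way to make it work is to read the keyword with an explicit A format (list-directed input cannot be non-advancing), then read the remainder of the same record as plain text and parse it with an internal read. A sketch under those assumptions; `scan_mass` is a hypothetical name. Records shorter than four characters raise an end-of-record condition, which the code treats as "skip this line":

```fortran
! Sketch: non-advancing read of the keyword, then the rest of the same record as text.
module nonadvance_demo
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor
  implicit none
contains
  ! Hypothetical helper: reads records from unit u until DECL; returns how many
  ! MASS records were found and the mass on the last one.
  subroutine scan_mass(u, nmass, last_mass)
    integer, intent(in)  :: u
    integer, intent(out) :: nmass
    real,    intent(out) :: last_mass
    character(len=4)   :: key
    character(len=256) :: rest
    character(len=8)   :: name
    integer :: num, ios

    nmass = 0
    last_mass = 0.0
    do
      ! Take the first 4 characters and stay on the same record.
      read(u, '(a4)', advance='no', iostat=ios) key
      if (ios == iostat_end) exit
      if (ios == iostat_eor) cycle          ! record shorter than 4 chars: file is already past it
      if (ios /= 0) exit
      ! Read whatever is left of the record as text; this advances to the next record.
      read(u, '(a)', iostat=ios) rest
      if (ios /= 0) exit
      if (key == 'DECL') exit
      if (key == 'MASS') then
        read(rest, *, iostat=ios) num, name, last_mass   ! internal read from the remainder
        if (ios /= 0) exit
        nmass = nmass + 1
      end if
    end do
  end subroutine scan_mass
end module nonadvance_demo
```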

Never mind, got it.
For anyone else: you can read each line of a text file as a string in Fortran as follows:

```fortran
do
   read(1,'(a)',END=10) line   !! END= gives the label to jump to once all the lines have been read
   ! ... work with line here ...
enddo
10 continue
```
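A variant of the same loop that uses IOSTAT= and the `iostat_end` constant from the intrinsic module `iso_fortran_env` (Fortran 2003) instead of an END= label. This is only a sketch: the 256-character buffer is an assumption (longer records are silently truncated), and `count_lines` is a made-up name for a function that just counts the records it reads:

```fortran
! Sketch: read every record of unit u as a plain string, stopping cleanly at end of file.
module line_reader
  use, intrinsic :: iso_fortran_env, only: iostat_end
  implicit none
contains
  ! Hypothetical helper: counts the records on unit u.
  integer function count_lines(u) result(n)
    integer, intent(in) :: u
    character(len=256) :: line     ! assumed long enough for one record; longer ones are truncated
    integer :: ios

    n = 0
    do
      read(u, '(a)', iostat=ios) line   ! '(a)' takes the whole record, blanks and digits included
      if (ios == iostat_end) exit       ! plays the role of END=10: leave the loop at end of file
      if (ios /= 0) error stop 'read error'
      n = n + 1
    end do
  end function count_lines
end module line_reader
```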

Also, once you have the line as a string, you can do a formatted read from it: just put the name of the string where you would normally put the unit number in the READ statement, e.g. `read(line,*) line_code, temp_int1, temp_cname, temp_dbl1`.
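A minimal sketch of such an internal read, assuming the line has already been read into a string. `parse_mass_line` is a hypothetical helper; IOSTAT= on the internal read lets a malformed line be rejected instead of stopping the program:

```fortran
! Sketch: parse one line that is already held in a character variable (an internal read).
module mass_parse
  implicit none
contains
  ! Hypothetical helper: ok is .false. if the line is not a MASS record
  ! or its fields do not parse.
  subroutine parse_mass_line(line, num, name, mass, ok)
    character(len=*), intent(in)  :: line
    integer,          intent(out) :: num
    character(len=*), intent(out) :: name
    real,             intent(out) :: mass
    logical,          intent(out) :: ok
    character(len=4) :: key
    integer :: ios

    ok = .false.
    num = 0
    name = ' '
    mass = 0.0
    if (len(line) < 4) return
    if (line(1:4) /= 'MASS') return                 ! comment, blank or other record
    read(line, *, iostat=ios) key, num, name, mass  ! the string takes the place of the unit number
    ok = (ios == 0)
  end subroutine parse_mass_line
end module mass_parse
```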

Yes, the way to do it is to first read the line as a character string and, THEN, do what is called an internal read.

The one trick, though, is to read the line with a format long enough to hold everything you want to extract from it. Alternatively, you can use a format (and a string) long enough to always read the entire line.

Because spaces act as separators in Fortran list-directed input (`read(*,*)`), you need an explicit format to read a string that is not enclosed in quotes and that contains spaces. So the most important line in the code that follows is:
Code:
```fortran
read(*,'(A26)') line
```
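To see why the explicit format matters, here is a small hypothetical helper (`compare_reads`) that reads the same text twice from a character variable, once list-directed and once with `'(a)'`:

```fortran
! Sketch: list-directed input stops at the first blank; the A edit descriptor keeps blanks.
module read_modes
  implicit none
contains
  subroutine compare_reads(text, first_word, whole)
    character(len=*), intent(in)  :: text
    character(len=*), intent(out) :: first_word, whole
    read(text, *)     first_word   ! list-directed: the blank acts as a separator
    read(text, '(a)') whole        ! A format: blanks are ordinary characters
  end subroutine compare_reads
end module read_modes
```

With `text = 'polar hydrogen atom'`, `first_word` comes back as `polar`, while `whole` holds the full phrase.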
The following code does the trick. You can test it by compiling it and running it from the command line with input redirection (`mass < inputfile`):

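Code:
```fortran
program mass
  implicit none
  character line*26
  character code*4, temp_cname*4
  integer temp_int1, ios
  real temp_dbl1

  line(1:4) = '    '
  do while (line(1:4) .ne. 'DECL')
     read(*,'(A26)',iostat=ios) line   ! first 26 characters of the next line (blank-padded if shorter)
     if (ios .ne. 0) exit              ! end of file (or read error): stop looping
     if (line(1:4) .eq. 'MASS') then
        read(line,*) code, temp_int1, temp_cname, temp_dbl1   ! internal read from the string
        write(*,*) code, temp_int1, temp_cname, temp_dbl1     ! temporary line
        ! assign read values to pointer
     endif
  enddo
end
```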
The code above declares `line` just long enough (26 characters) to reach the last value you are interested in, the mass (`temp_dbl1`). Alternatively, you could read the entire line every time by declaring `line` as something like 130 characters long instead; that is safer in case the layout of your input file changes a bit.
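Putting it together with the longer-buffer option, here is a sketch of a reader that fills a growable array until the first DECL line. The derived type `atom_type` and the routine `read_mass_records` are hypothetical, loosely modeled on the `atom_type_info` fields from the question. The 256-character buffer and the `line(1:4)` keyword test (keyword in column 1) are assumptions carried over from the code above:

```fortran
! Sketch: read whole lines into a long buffer, keep only MASS records, stop at DECL.
module mass_table
  use, intrinsic :: iso_fortran_env, only: iostat_end
  implicit none

  ! Hypothetical record type, modeled on the atom_type_info fields in the question.
  type :: atom_type
    integer          :: type_code = 0
    character(len=8) :: type_cname = ' '
    real             :: mass = 0.0
  end type atom_type

contains

  ! Reads MASS records from unit u until the first DECL line (or end of file).
  subroutine read_mass_records(u, atoms, n)
    integer,                      intent(in)  :: u
    type(atom_type), allocatable, intent(out) :: atoms(:)
    integer,                      intent(out) :: n
    character(len=256) :: line                 ! long buffer: reads the entire record
    character(len=4)   :: key
    type(atom_type)    :: rec
    type(atom_type), allocatable :: tmp(:)
    integer :: ios

    allocate(atoms(16))
    n = 0
    do
      read(u, '(a)', iostat=ios) line
      if (ios == iostat_end) exit
      if (ios /= 0) error stop 'read error'
      if (line(1:4) == 'DECL') exit
      if (line(1:4) /= 'MASS') cycle           ! skip comments, blank lines, headers
      read(line, *, iostat=ios) key, rec%type_code, rec%type_cname, rec%mass
      if (ios /= 0) cycle                      ! malformed MASS line: skip it
      if (n == size(atoms)) then               ! full: grow the array by doubling
        allocate(tmp(2 * size(atoms)))
        tmp(1:n) = atoms(1:n)
        call move_alloc(tmp, atoms)
      end if
      n = n + 1
      atoms(n) = rec
    end do
  end subroutine read_mass_records
end module mass_table
```

On return only `atoms(1:n)` hold data; because the array grows by doubling with `move_alloc`, its size can exceed `n`.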