FORTRAN: How to read a non-uniformly formatted text file

  1. Jun 18, 2012 #1
    Hi all !!

    I am fairly new to FORTRAN, and I am sorry if the title is confusing, but I couldn't come up with anything better.

    I have a file, a snippet of which is shown below:

    Code (Text):

    *>>>>>>>>CHARMM22 All-Hydrogen Topology File for Proteins <<<<<<<
    *>>>>>>>>>>>>>>>>>>>> and Nucleic Acids <<<<<<<<<<<<<<<<<<<<<<<<<
    *>>>>> Includes phi, psi cross term map (CMAP) correction <<<<<<<
    *>>>>>>>>>>>>>>>>>>>>>>   July, 2004    <<<<<<<<<<<<<<<<<<<<<<<<<<
    * All comments to ADM jr. via the CHARMM web site: www.charmm.org
    *               parameter set discussion forum
    *
    31  1

    ! references
    !
    !PROTEINS
    !
    !MacKerell, A.D., Jr,. Feig, M., Brooks, C.L., III, Extending the
    !treatment of backbone energetics in protein force fields: limitations
    !of gas-phase quantum mechanics in reproducing protein conformational
    !distributions in molecular dynamics simulations, Journal of
    !Computational Chemistry, 25: 1400-1415, 2004.
    !
    !MacKerell, Jr., A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr., R.L.;
    !Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.;
    !Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.T.K.; Mattos,
    !C.; Michnick, S.; Ngo, T.; Nguyen, D.T.; Prodhom, B.; Reiher, III,
    !W.E.; Roux, B.; Schlenkrich, M.; Smith, J.C.; Stote, R.; Straub, J.;
    !Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.  All-atom
    !empirical potential for molecular modeling and dynamics Studies of
    !proteins.  Journal of Physical Chemistry B, 1998, 102, 3586-3616.
    !
    !IONS (see lipid and nucleic acid topology and parameter files for
    !additional ions
    !
    !ZINC
    !
    !Roland H. Stote and Martin Karplus, Zinc Binding in Proteins and
    !Solution: A Simple but Accurate Nonbonded Representation, PROTEINS:
    !Structure, Function, and Genetics 23:12-31 (1995)
    !
    !NUCLEIC ACIDS
    !
    !Foloppe, N. and MacKerell, Jr., A.D. "All-Atom Empirical Force Field for
    !Nucleic Acids: 2) Parameter Optimization Based on Small Molecule and
    !Condensed Phase Macromolecular Target Data. 2000, 21: 86-104.
    !
    !and
    !
    !MacKerell, Jr., A.D. and Banavali, N. "All-Atom Empirical Force Field for
    !Nucleic Acids: 2) Application to Molecular Dynamics Simulations of DNA
    !and RNA in Solution. 2000, 21: 105-120.
    !

    MASS     1 H      1.00800 H ! polar H
    MASS     2 HC     1.00800 H ! N-ter H
    MASS     3 HA     1.00800 H ! nonpolar H
    MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
    .
    .
    .
    MASS    95 F3    18.99800 F ! Fluorine, trifluoro (see toppar_all22_prot_fluoro_alkanes.str)
    MASS    99 DUM    0.00000 H ! dummy atom
    !see NA section          --------------NOTICE THERE ARE COMMENTS B/W MASS RECORDS ALSO
    !MASS   100 SOD  22.989770 NA ! Sodium Ion
    !MASS   101 MG   24.305000 MG ! Magnesium Ion
    !MASS   102 POT  39.102000 K  ! Potassium Ion! check masses
    !MASS   103 CES 132.900000 CS ! Cesium Ion
    !MASS   104 CAL  40.080000 CA ! Calcium Ion
    !MASS   105 CLA  35.450000 CL ! Chloride Ion
    !MASS   106 ZN   65.370000 ZN ! zinc (II) cation
    !NA section
    !MASS 101    HT    1.008000 H ! TIPS3P WATER HYDROGEN

     
    From this file, I have to read the records starting with "MASS" (these end just before the records starting with "DECL") and make a list out of them. I have to ignore all other lines. What is the best way to do this?

    My approach:
    Code (Text):

            line_code='    '
        do while(line_code .ne. 'DECL')
            read(1000,*) line_code, temp_int1, temp_cname,temp_dbl1  !! ERROR IN THIS LINE, MOST PROBABLY
            if(line_code .eq. 'MASS')then
                atom_type_info(tot_atom_types)%type_code=temp_int1
                atom_type_info(tot_atom_types)%type_cname=temp_cname
                atom_type_info(tot_atom_types)%mass=temp_dbl1
                tot_atom_types=tot_atom_types+1
            endif
               
        enddo
     
    The problem is in the read(1000,*) line: because of the comments in the file, the program tries to read a string into an integer variable, which gives an I/O error.
    I thought of another way, but it's too much work: first scan the lines with read(1000,*) line_code and keep a line_counter. When you encounter a MASS record, REWIND and read(1000,*) some_temp until line_counter-1, then start reading the MASS record. All this is because we can't REWIND by one line, we can only REWIND to the beginning of the file (right??)

    Is there a better way? For example, reading the whole line into a character array (with tabs and spaces), then reading the first string from that array. If that is "MASS", go on and read the rest of the values from that line (stored as a character array); otherwise ignore it. This is easy in C/C++ with getline and sscanf, but is there such a way in FORTRAN?

    EDIT:
    I found a way to read a line without advancing the read pointer to the next line, using ADVANCE='NO' in the READ statement, but now the problem is that when I read everything as characters and an integer comes in between, it gives an error.
    Even if someone could just tell me how to read a file in Fortran line by line, treating each line as a string (even if it is a number), that would be a lot of help.

    Thank you in advance !!
     
    Last edited: Jun 18, 2012
  3. Jun 18, 2012 #2
    Never mind, got it.
    For anyone else with the same problem: you can read each line of a text file as a string in Fortran as follows:

    do
       read(1,'(a)',END=10) line   ! END= gives the label to jump to once the end of the file is reached
    enddo
    10 continue   ! your next statement goes here

    You can then read list-directed (or formatted) input from that string. Just put the name of the string where you would normally put the unit number in the read statement, as below:
    read(line,*) line_code, temp_int,.....
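
    As a minimal sketch of that internal read on its own (roughly what sscanf does in C), the program below parses one record copied from the snippet in the first post. The program name, the variable names, and the 80-character buffer are assumptions made for illustration, not part of the original code:

    ```fortran
    program internal_read_demo
       implicit none
       character(len=80) :: line
       character(len=4)  :: line_code, cname
       integer           :: type_code, ios
       real              :: mass

       ! A sample record copied from the topology file in the first post.
       line = 'MASS     3 HA     1.00800 H ! nonpolar H'

       ! Internal read: the character variable takes the place of the unit number.
       ! List-directed input stops once the four items are filled, so the
       ! trailing element symbol and comment are never looked at.
       read(line, *, iostat=ios) line_code, type_code, cname, mass

       if (ios == 0) then
          print *, line_code, type_code, ' ', cname, mass
       else
          print *, 'could not parse: ', trim(line)
       end if
    end program internal_read_demo
    ```

    Using iostat= here means a line that does not match the expected fields sets ios to a nonzero value instead of aborting the program.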
     
  4. Jun 18, 2012 #3
    Yes, the way to do it is to first read the line into a character string and then do what is called an internal read.

    The one trick that makes this work is to read the line with a format long enough to cover everything you want to extract from it. Alternatively, you can specify a very long format that always reads the entire line.

    Because blanks act as separators in list-directed input, you need to specify an explicit format in order to read a string that is not enclosed in quotes and that contains blanks. So the most important line in the code that follows is:
    Code (Text):

    read(*,'(A26)') line
     
    The following code does the trick; you can test it by compiling and running from the command line using re-direction (mass < inputfile) :

    Code (Text):

    program mass
       implicit none
       character line*26
       character code*4, temp_cname*4
       integer temp_int1, ios
       real temp_dbl1

       line(1:4) = '    '
       do while (line(1:4) .ne. 'DECL')
          read(*,'(A26)',iostat=ios) line
          if (ios .ne. 0) exit   ! stop at end of file if no DECL record is found
          if (line(1:4) .eq. "MASS") then
             ! internal read: parse the fields out of the stored line
             read(line,*) code, temp_int1, temp_cname, temp_dbl1
             write(*,*) code, temp_int1, temp_cname, temp_dbl1 ! temporary line
             ! assign read values to pointer
          endif
       enddo
    end
     
    The code above declares the variable "line" to be just long enough (26 characters) to reach the last value you are interested in, the mass read into temp_dbl1. Alternatively, you could read the entire line every time by declaring "line" to be something like 130 characters long and widening the format to match; that is safer in case the format of your input file changes a bit.
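
    A sketch of that longer-buffer variant is below. It assumes a 130-character line is enough, stores the records in fixed-size arrays, and also stops cleanly at end of file. The array names and the max_types bound are hypothetical, chosen only for this example:

    ```fortran
    program read_mass_records
       implicit none
       integer, parameter :: max_types = 200   ! assumed upper bound on MASS records
       character(len=130) :: line
       character(len=4)   :: code, cname(max_types)
       integer            :: type_code(max_types), n, ios
       real               :: mass(max_types)

       n = 0
       do
          read(*, '(A)', iostat=ios) line
          if (ios /= 0) exit                 ! end of file or read error
          line = adjustl(line)               ! tolerate leading blanks
          if (line(1:4) == 'DECL') exit      ! MASS records end before DECL
          if (line(1:4) /= 'MASS') cycle     ! skip comments, blank lines, and the rest
          if (n == max_types) exit           ! arrays are full
          ! Internal read of the four fields; count the record only if it parsed.
          read(line, *, iostat=ios) code, type_code(n+1), cname(n+1), mass(n+1)
          if (ios == 0) n = n + 1
       end do

       print *, 'MASS records read: ', n
    end program read_mass_records
    ```

    It can be run the same way as the program above, with the topology file redirected to standard input. Commented-out records such as "!MASS   100 SOD ..." are skipped because their first four characters are not "MASS".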
     
  5. Jun 18, 2012 #4
    Thank you for your reply !!

    My own code is very similar to yours.


    Cheers !
     