Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Text file that is read into an array

  1. Dec 2, 2015 #1
    I have a program "probe.f90" that reads in a text file. It runs. I am having trouble reading in the second text file (when prompted by user).

    How could I turn this bit of code into a function? I was thinking about a function that strips the header from the fasta file. Then read into an array. Then compute amino acid frequency (last)!

    I looked at some examples of character functions in fortran 90, but I see mostly integer functions.
    Mod note: Edited the code so that the indentation matches the program structure.
    Code (Fortran):

        PROGRAM PROBE
        implicit none
        character(len=200) :: string
        character, allocatable :: baseq(:), tmpstr(:), newstr(:)
        integer :: nbase, errstat, tmplen, i

        nbase = 0
        errstat = 0

        allocate( baseq(0) )

        open(unit=101, file="TitinFastaFormat.txt", form="formatted", status="old")
        do
           read (101,*,iostat=errstat) string
           if (errstat .ne. 0) exit
           if ( string(1:1) .ne. ">" ) then
                tmplen = len_trim(string)
                nbase = nbase + tmplen
                allocate ( tmpstr( tmplen ) )
                do i=1,tmplen
                   tmpstr(i) = string(i:i)
                end do
                allocate( newstr( nbase ))
                newstr(1:size(baseq)) = baseq
                call move_alloc(newstr,baseq)
                baseq(nbase-tmplen+1:nbase) = tmpstr
                deallocate ( tmpstr )
           end if
        end do
        close(101)
        write (*,"(A6, tr1, 999999(A1))"),baseq
        !write (*,"(A6, tr1, i0)") "NBASE:", nbase
        deallocate (baseq)

    end program probe
     
     

    Attached Files:

    Last edited by a moderator: Dec 3, 2015
  2. jcsd
  3. Dec 3, 2015 #2

    Mark44

    Staff: Mentor

    I don't recommend doing this. A function (or subroutine) should essentially do one thing, and the name should be chosen so that the one thing it does is obvious from the name. A better strategy, I believe, would be to write one subroutine that strips the header from the file, and a second subroutine that reads data from the file into an array, and finally, a third subroutine that does the amino acid frequency calculations.

    I see all of these as subroutines, rather than functions, because each one performs some operation, but doesn't return a value, as a function should do. Each sub performs an action, so the name of each should be a verb, such as StripHeader for the first, and ReadData for the second, and maybe CalcAminoAcidFreq for the third.

    If you decide to go this route, you'll then need to figure out what parameters each sub needs so that it can perform its task efficiently and effectively.
     
  4. Dec 3, 2015 #3
    Three subroutines makes more sense. Question: could I have two more subroutines that add the characters to an array. What would be the best way to sort through the parameters?
     
  5. Dec 3, 2015 #4

    Mark44

    Staff: Mentor

    My suggestion of three subs came from you description of what seemed to me to be three separate tasks, although, after taking another look at your file, I don't think it would hurt to combine the header stripping and reading into an array into one sub, especially since Fortran doesn't really treat files as first-class objects. By that, I mean, I don't think there's an elegant way to pass the file into and out of a subroutine. Many other programming languages have that ability

    By "sorting through the parameters" I don't think you really mean actually sorting through them -- instead, just figuring out what they are. In each case, just identify what information the sub needs to do its job.

    So far, the main tasks seem to be 1) pulling the data out of the file and storing it in an array, and 2) calculating the amino acid frequencies.
    For the first task, one parameter (in) would be a string that contains the name of the file, and another would be the array (out) into which the data would be stored (omitting the header, which seems to be the first line of the text file). For the second task, the subroutine would need the array (in) , and possibly another array (out) with the frequencies.
     
  6. Dec 3, 2015 #5
    Thank you!
     
  7. Dec 3, 2015 #6

    Mark44

    Staff: Mentor

    Is there some reason you're using Fortran for this problem? For working with strings, many other languages are better choices. That's not to say you can't work with strings in Fortran, but it's easier to do this kind of work in many other languages.

    Out of curiosity, what are you planning to do with the text file? I've copied the first four lines of the file you provided. What sort of analysis is your program supposed to do with the alphabet soup of the 2nd and following lines? (I'm assuming that the first line is the header.)

    Code (Text):

    >sp|Q8WZ42|TITIN_HUMAN Titin OS=Homo sapiens GN=TTN PE=1 SV=4
    MTTQAPTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGVQISFSD
    GRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLVKAETAPPNFVQRLQSMTVRQGSQ
    VRLQVRVTGIPTPVVKFYRDGAEIQSSLDFQISQEGDLYSLLIAEAYPEDSGTYSVNATN
     
     
  8. Dec 3, 2015 #7
    I am using fortran for the first time. Typically, I would do this type of problem in Python or BioPython. As you can see, the first line is a header. That is the first 4 lines of the FASTA file of the human gene Titin. I have another FASTA file (with a probe). I would like to read these "sequences" into an array. Find the frequency and also find the number of matches of the probe for the Titan gene. Technically, the "alphabet soup" is an amino acid sequence that codes for the Titin gene.
     
  9. Dec 3, 2015 #8
    I am having some difficulty taking my first program and converting that to a subroutine. I was only working the text file. I was trying to get the sequence to print to the screen.

    Code (Fortran):
     
            program aa
       
            character:: filename*20
            character, allocatable :: baseq(:)
            call readfile("TitinFastaFormat.txt")
           
        end program aa

          subroutine readfile(filename)
              integer ::eof, tempLength, nBase, i
              character, allocatable :: baseq(:), tempString(:), newString(:)
              character:: filename*20
            character :: stringFile*200
         
          eof=0
              open (102, file=filename, Form="formatted", status="old" )
              Read(102,*)
              do
                  !Read(102,*  iostate = eof) stringFile  ! <---------------------
              if (eof .ne. 0) exit
              if(stringFile(1:1) .ne. ">") then
                  tempLength = len_trim(stringFile)
                  nBase = nBase + tempLength
              allocate ( tempString(tempLength))
         
              do i=1, tempLength
                  tempString(i) = stringFile(i:i)
              end do
         
              allocate (newString (nBase))
              newString (1:size(baseq)) = baseq
              call move_alloc(newString, baseq)
              baseq(nbase - tempLength+1:nBase) = tempString
              deallocate(tempString)
         
              end if
            end do
            close(101)
           
            write (*,"(A6, tr1, 999999(A1))") "BASEQ:", baseq
            write (*,"(A6, tr1, i0)") "NBASE:", nbase

            deallocate (baseq)
       
        end
     
     
  10. Dec 3, 2015 #9
    I know the subroutine is long. How do I manipulate the intent to get my array to print?



    I fixed this command
    Code (Fortran):

    end program aa
    .
    .
    .
    Read(102,  iostat = eof) stringFile
     
     
  11. Dec 3, 2015 #10

    Mark44

    Staff: Mentor

    I think Python would be a good choice. I don't know anything about BioPython, so don't have any knowledge of its capabilities.

    I'll take a look at the latest code you included, and make some comments.
     
  12. Dec 3, 2015 #11

    Mark44

    Staff: Mentor

    It's not that long...
    Let me address your readfile subroutine first, and then I'll come back to this.

    Code (Fortran):

    subroutine readfile(filename)
        integer ::eof, tempLength, nBase, i
       character, allocatable :: baseq(:), tempString(:), newString(:)
       character:: filename*20
       character :: stringFile*200   ! *** stringFile is declared to hold 200 characters. That's definitely not enough to hold the entire contents of the file I saw.
         
       eof=0
       open (102, file=filename, Form="formatted", status="old" )
       Read(102,*)    ! ***  ??? This doesn't do anything -- there's no variable to read into
       do
              !Read(102,*  iostate = eof) stringFile  ! <---------------------
              if (eof .ne. 0) exit
              if(stringFile(1:1) .ne. ">") then
                  tempLength = len_trim(stringFile)
                  nBase = nBase + tempLength
                  allocate ( tempString(tempLength))
       
                  do i=1, tempLength
                      tempString(i) = stringFile(i:i)
                  end do
       
                  allocate (newString (nBase))
                  newString (1:size(baseq)) = baseq
                  call move_alloc(newString, baseq)
                  baseq(nbase - tempLength+1:nBase) = tempString
                  deallocate(tempString)
              end if
        end do
       close(101)
           
       write (*,"(A6, tr1, 999999(A1))") "BASEQ:", baseq
       write (*,"(A6, tr1, i0)") "NBASE:", nbase

       deallocate (baseq)

    end  ! *** Needs to be end subroutine
     
    I added a few comments, the ones that start with ! ***

    The subroutine should have two parameters: a string to hold the filename, and an array that will hold the contents of the file. That array would be declared in your main program. When the readfile sub finishes, the array will be initialized appropriately.

    I don't know how big the array should be, but the file you attached is about 36 KB in size. I don't know whether you program needs to be able to hold the entire contents of this textfile or not.
     
  13. Dec 4, 2015 #12

    Mark44

    Staff: Mentor

    Code (Python):

    def ReadFile(file_name):
        infile = open(file_name, "r")
        while (True):
           line = infile.readline()
           if line[0:1] == '>': continue  # Skip the header
           if line == "":
               infile.close()
               return lines
               break
           lines.append(line)

    def PrintFile(lines):
        for line in lines:
           print (line)


    inFile = "TitinFasta.txt"
    lines = []
    lines = ReadFile(inFile)
    PrintFle(lines)
    The "main" part of this Python program is the four lines at the bottom. The lines variable is a list (like an array, but not exactly the same) will hold the lines of text from the input file.

    The ReadFile() function takes as its parameter the name of the file to open, and returns the filled-in list. This function opens the file in read mode, as a text file, and then reads lines of text from the input file. It skips the entire first line. Each line that is read is appended to the lists variable. When the final line is read (when line == "", or empty string), the function returns the lines list.

    The PrintFile() function has a list parameter (lines). For each line of text in the lines list, it prints that line.

    I'm hopeful that you'll be able to follow this code to see what your Fortran program needs to do (or even abandon your attempt to use Fortran to do this...)

    The code above is short and sweet, and seems to work well.
     
  14. Dec 4, 2015 #13
    I was under the assumption that Fortran read files line by line. The empty Read(102,*) was to skip that line of the file. I may be wrong.
     
  15. Dec 4, 2015 #14

    Mark44

    Staff: Mentor

    I could be wrong (it's been many years since I had a Fortran compiler), but I don't believe that the statement above does anything if there's no variable at the end of the read statement. You could have this
    read(102, *) str
    where str is declared as a CHARACTER string variable that is large enough to hold the longest line in your file. You don't need to do anything with the str variable, so you're effectively discarding the first line.
     
  16. Dec 4, 2015 #15

    jbriggs444

    User Avatar
    Science Advisor

    My years-old recollection is that the iolist can be empty. That matches the Fortran77 documentation that I can quickly google up.

    "Input List
    iolist can be empty or can contain input items or implied DO lists"

    https://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vn79/index.html

    A Fortran READ will generally read as many lines as it takes to fill in the variables. But it will always read at least one line. [I found myself often reading a line and then using internal reads to parse the result]
     
  17. Dec 5, 2015 #16

    Mark44

    Staff: Mentor

    Good to know. Thanks for setting me straight @jbriggs444...
     
  18. Dec 5, 2015 #17
    I'm not sure how to use the subroutine to take in the file and put the characters in an array. I may have some problems declaring the array in the subroutine and in the main program
     
  19. Dec 5, 2015 #18

    Mark44

    Staff: Mentor

    Something like this...
    Note that the names of the actual arguments (the variables in main and used in the call to the subroutine) can be the same as the names of the formal parameters in the subroutine, but don't have to be.
    Code (Fortran):

    ! Main program
    ! Declare variables
    CHARACTER * 20 :: fileName
    CHARACTER * 10000 :: data  ! Can contain 10,000 characters, maybe not enough for your application
    .
    .
    .
    ! Get a value for fileName
    .
    .
    .
    CALL ReadFile(fileName, data)
    .
    .
    .
    END PROGRAM

    SUBROUTINE ReadFile(fileName, data)
    IMPLICIT NONE
    CHARACTER * 20, INTENT(IN) :: fileName
    CHARACTER * 10000, INTENT(OUT) :: data

       ! Open file
       ! Strip off header line
       ! Read data from file into array

    END SUBROUTINE ReadFile
     
     
  20. Dec 5, 2015 #19
    ok. Let me see. I have been going in circles.
     
  21. Dec 5, 2015 #20
    I'm stuck

    Code (Fortran):

       
        program aa
       
            character*20:: filename  = "TitinFastaFormat.txt"
            character*10000 :: baseq
            call readfile(filename, baseq)
            !deallocate
       
        end program aa
       
          subroutine readfile(filename, baseq)
          implicit none
              integer ::eof,  nBase, i, tempLength
              character, allocatable ::  tempString(:), newString(:)
              character, intent(in):: filename*20
              character, intent(out):: baseq*10000
            character  :: stringFile*10000
         
              eof=0
              allocate( baseq)
              open (102, file=filename, Form="formatted", status="old" )
              Read(102,*) stringFile
              !Read(102,  iostate = eof) stringFile
              do
              !Read(102,  iostate = eof) stringFile  
              if (eof .ne. 0) exit
              if(stringFile(1:1) .ne. ">") then
                  tempLength = len_trim(string(1:1))
                  nBase = nBase + tempLength
                  allocate ( tempString(tempLength))
       
                  do i=1, tempLength
                      tempString(i) = stringFile(:)
                  end do
       
                  allocate (newString (nBase))
                  newString (1:size(baseq)) = baseq
                  call move_alloc(newString, baseq)
                  baseq(nbase - tempLength+1:nBase) = tempString
                  deallocate(tempString)
              end if
        end do
       close(102)
           
       !write (*,"(A6, tr1, 999999(A1))") "BASEQ:", baseq
       !write (*,"(A6, tr1, i0)") "NBASE:", nbase

       deallocate (baseq)

    end  subroutine read file
     
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Text file that is read into an array
Loading...