Help about Fortran

  1. Sep 17, 2012 #1
    Hi everyone,
    I am new to Fortran. I am trying to write a program that reads a .txt file with 24480 rows and about 6000 columns.
    Each row is one individual, and its genotypes are coded as 1 and 2. For example, if row one has 204 genotypes, the first half of them (102) belongs to the individual's sire and the second half to the individual's dam. In addition, the rows do not all have the same number of values. So how can I tell Fortran to read this file row by row, divide each row into two halves, and put each element i beside element n/2 + i (its counterpart in the second half)?

    For example, here are two (shortened) rows from my file:
    row1: 112122121112122111112121111211122121111121
    row2: 211121111121122221211121212111212212121211111211121212

    And so on.
    Thanks in advance for any help.
  3. Sep 17, 2012 #2
    Typically people first take a shot of their own...THEN we help.

    I am just going to say a couple of things.

    1.- Fortran has what is called column-oriented (fixed-width, formatted) reading, e.g. with the I1 edit descriptor, which could potentially read all those numbers straight into integers or an integer array, even though they are all run together like that. The thing is that, without a test program, I am not sure whether that can be used for variable-length lines.

    2.- So, I would probably first try to read each line into a very long character variable, and then transfer the values to integers via an internal read... As long as you declare your character variable long enough, it will read the entire line and necessarily stop at the end of the line.
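A minimal sketch of point 2, assuming free-form Fortran 2008 (for the unlimited format `'(*(i1))'`) and rows no longer than 10000 characters; the module and procedure names are hypothetical. Each record is read into a long character variable with `'(a)'`, and an internal READ then converts one digit per `I1` field into an integer array sized with `len_trim`, so rows of different lengths need no special handling.

```fortran
! Hypothetical module: read one marker row as text, then convert it
! to integers with an internal READ (point 2 above).
module marker_parse
  implicit none
  private
  public :: parse_markers, read_marker_line
contains
  ! Convert a string such as "1121221" into the integers 1,1,2,1,2,2,1.
  function parse_markers(line) result(markers)
    character(len=*), intent(in) :: line
    integer, allocatable :: markers(:)
    integer :: n
    n = len_trim(line)            ! characters up to the last non-blank one
    allocate(markers(n))
    if (n > 0) read(line(1:n), '(*(i1))') markers   ! internal read, 1 digit per I1
  end function parse_markers

  ! Read the next record from UNIT; IOS is nonzero at end of file.
  subroutine read_marker_line(unit, markers, ios)
    integer, intent(in) :: unit
    integer, allocatable, intent(out) :: markers(:)
    integer, intent(out) :: ios
    character(len=10000) :: line  ! assumed longer than any row
    read(unit, '(a)', iostat=ios) line
    if (ios /= 0) return
    markers = parse_markers(line) ! allocated on assignment
  end subroutine read_marker_line
end module marker_parse
```

With a unit opened by something like `open(newunit=u, file='markers.txt', status='old', action='read')` (file name hypothetical), calling `read_marker_line(u, markers, ios)` in a loop until `ios /= 0` yields one integer array per row.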
  4. Sep 17, 2012 #3
    OK, yes, you are right; I should describe it in more detail. I have a very big data set of markers (genes or genotypes) coded as 1 and 2; the data set contains only 1s and 2s. It has m rows for m individuals and n columns for n markers, and each row has a different number of markers. I need Fortran to read each row and treat it as the chromosome of one individual, inherited from its mom (dam) and dad (sire): the row is divided into two, with the first half belonging to the dad and the second half to the mom. After reading this data set into Fortran, I want to be able to choose interesting markers in each row and interesting individuals in the whole data set, and print them to a new file.

    So suppose the lines below are the first two lines of my data set. Row 1: 12111221112121 contains 14 markers; the first seven (1211122) belong to his dad and the last seven (1112121) belong to his mom. Row 2: 1112221212121112212122112111121212212211122221 contains 46 markers, split in the same way into 23 and 23. Thank you for your patience.

    Here is part of my data set as an example; please see this link: 4shared.com/photo/O4fWnt4w/marker_sample.html

    Please ignore the blue part of the data set; those columns are, respectively, the individual's ID, live status (1 alive, 0 dead) and chromosome number.
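Once a row is held in an integer array (for example the 14 markers of row 1 above), the sire and dam halves are plain array sections. A minimal sketch, assuming every row has an even number of markers; the module and subroutine names are hypothetical:

```fortran
! Hypothetical module: split one individual's markers into the
! sire half (first half of the row) and the dam half (second half).
module split_parents_mod
  implicit none
  private
  public :: split_parents
contains
  subroutine split_parents(markers, sire, dam)
    integer, intent(in) :: markers(:)
    integer, allocatable, intent(out) :: sire(:), dam(:)
    integer :: half
    half = size(markers) / 2        ! assumes an even number of markers
    sire = markers(1:half)          ! allocated on assignment (Fortran 2003)
    dam  = markers(half+1:2*half)
  end subroutine split_parents
end module split_parents_mod
```

Interesting markers can then be picked with a vector subscript; for example, `sire([2, 5])` returns the markers at the (hypothetical) positions 2 and 5.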
  5. Sep 17, 2012 #4
    I understood what you meant the first time around...that's not the problem.

    Did YOU understand what I said? I am not going to write the first line of code...you need to take a shot at it by yourself first, and THEN we can help. I did give you a hint on how to go about it. Read up on that and follow it.
  6. Sep 26, 2012 #5
    Hi guys
    Thank you for your earlier help. Using

    DO row = 1, max_rows
    PRINT*, "size A:",size(a)
    PRINT*, "size B:",size(b)
    READ(line, '(1000i1)')a(1:len(row)/2),b(((len(row)/2)+1):len(row))


    With the code above I could split each line into two equal halves and print them. Now I want to combine these two sequences so that their elements alternate. For example, suppose:
    seq1: 11111111111111111111
    seq2: 22222222222222222222

    Then I want a vector like this: 1212121212121212121212121212121212121212

    To get that order of elements I tried the DO loops below, but they do not work.

    DO ilocus=1, nlocus
    WRITE(*, '(2i1)', ADVANCE='NO') seq(1:2, ilocus)
    END DO
    END DO
    WRITE(genotype, *)

    Also, could you please guide me on how to tell Fortran to repeat this DO loop for every 30 consecutive lines and put all of those lines into a new matrix?

    Thank you in advance
  7. Sep 27, 2012 #6
    You've got it a bit backwards.

    You have initialized two arrays (a and b) that are supposed to be indexed from 1 to len(row)/2, yet, in your read statement, you read into b with index starting at 1+len(row)/2...that's wrong...such index is on line, not on b:
    Code (Text):

    read(line(            1 : len(row)/2 ),'(1000i1)') a
    read(line( 1+len(row)/2 : len(row)   ),'(1000i1)') b

    For as long as you have allocated a and b to exactly what they need to be, I think you can simply do:
    Code (Text):

    read(line,'(1000i1)') a, b
    and I think fortran is intelligent enough to read enough data to fill in a and then b. Give it a try.
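A small sketch of the single READ suggested above, assuming `a` and `b` are allocatable and sized to exactly half the line each; the module and subroutine names are hypothetical. It uses the Fortran 2008 unlimited format `'(*(i1))'` instead of `'(1000i1)'`: with `(1000i1)`, a line of more than 1000 digits would make the READ revert to the start of the format and begin a new record, which an internal file made of one string does not have.

```fortran
! Hypothetical module: fill A with the first half of LINE and B with
! the second half, using one formatted internal READ.
module read_halves_mod
  implicit none
  private
  public :: read_halves
contains
  subroutine read_halves(line, a, b)
    character(len=*), intent(in) :: line
    ! intent(out) allocatables are deallocated on entry, so each call
    ! can size them for a line of a different length
    integer, allocatable, intent(out) :: a(:), b(:)
    integer :: half
    half = len_trim(line) / 2       ! assumes an even number of digits
    allocate(a(half), b(half))
    ! The I/O list a, b is filled in order: a(1..half), then b(1..half)
    read(line, '(*(i1))') a, b
  end subroutine read_halves
end module read_halves_mod
```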
  8. Sep 28, 2012 #7
    Thanks a lot. But when I compare the printed array "b" with the original line in the data set, I can see that its elements do not follow the order of the original, although array "a" is in the same order as the original.

    thank you for your patience
  9. Sep 28, 2012 #8
    Go ahead and post your latest source code (use code tags, please) and a sample data file.
  10. Oct 1, 2012 #9

    OK, as you can see, I put a line of my data above; all of its values are on one line in the data set. This line consists of 204 values, and I must split it into two new, separate lines and then merge those into a new line.
    OK, I wrote the first 5 consecutive values (13579) and the 103rd-107th values (24680) myself, to check whether my read and print statements work properly; in the real data, 13579 and 24680 are 11111 and 11111. So the second half of the line starts at the 103rd value, which is 2. When my program reads this line as a string, it can split it into two separate and correct lines, but when I store it in an array (:), it does not work properly from the 103rd value to the end. Could you please put the line above in a text file and test the program below?

    DO row = 1, max_rows
    PRINT*, "size A:",size(a)
    PRINT*, "size B:",size(b)
    READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))

    WRITE(*,'(a5,200i1)') "a", a(1:(len(row)/2))
    WRITE(*,'(a5,200i1)') "b", b(((len(row)/2)+1):len(row))


    WRITE(*,'(a8,200i1)') "seq1", seq(1,:)
    WRITE(*,'(a8,200i1)') "seq2" , seq(2,:)

    END DO

    On screen, a, b and seq1 are written correctly, but seq2 is not printed properly.

    However, once this issue is solved, I should be able to merge the elements of seq1 and seq2 as below:
    By the way, I intentionally put odd numbers at the beginning and even numbers in the middle to check whether the program works.

    Thank you
  11. Oct 1, 2012 #10
    You have not implemented the correction suggested in post #6... please read it and correct your code.

    Also, if the data lines vary in length... you need to deallocate a and b and allocate them again within the loop.
  12. Oct 2, 2012 #11
    Thank you, yes, you are right. After implementing the suggestion from #6, read(line,'(1000i1)') a, b, Fortran correctly filled a and b and generated seq1(:) and seq2(:). So I have a new file in which the number of lines is doubled (original line = 111111..111111222222..222222).

    In this new file I have to merge the values of each pair of consecutive lines. For example:

    I want to end up with a new file containing the lines below:

    I have tried several programs, but none of them worked.

    I promise that this will be my final question.

    Thanks a lot
  13. Oct 2, 2012 #12
    are you alternating as you glue seq1 and seq2?
    Code (Text):

    do i = 1, (len_trim(line)/2)
      seq1seq2( 2*i - 1 ) = seq1(i)
      seq1seq2( 2*i     ) = seq2(i)
    end do
    write(*,'(1000i1)') seq1seq2
    something along those lines...I did not test or anything
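A variant of the loop above, written as a function and using strided array sections instead of an explicit DO loop: `merged(1::2)` are the odd positions and `merged(2::2)` the even ones. It assumes `seq1` and `seq2` have the same length; the module and function names are hypothetical.

```fortran
! Hypothetical module: interleave two equal-length sequences as
! seq1(1), seq2(1), seq1(2), seq2(2), ...
module interleave_mod
  implicit none
  private
  public :: interleave
contains
  function interleave(seq1, seq2) result(merged)
    integer, intent(in) :: seq1(:), seq2(:)   ! assumed to have equal sizes
    integer, allocatable :: merged(:)
    allocate(merged(2 * size(seq1)))
    merged(1::2) = seq1   ! odd positions: first sequence
    merged(2::2) = seq2   ! even positions: second sequence
  end function interleave
end module interleave_mod
```

`write(*, '(*(i1))') interleave(a, b)` then prints the merged line without separators, e.g. `121212` for `a = [1,1,1]` and `b = [2,2,2]`.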
  14. Oct 2, 2012 #13
    Hi, thank you. Your program worked, and I really appreciate it. Here is the program:

    PROGRAM gsal

    CHARACTER*1000:: line

    !Open ADAM output file for markers

    !Create a new .txt file in which paternal and maternal alleles are next to each other.

    DO row = 1, max_rows
    PRINT*,"row", row
    PRINT*, "length of row:", len_trim(line)
    PRINT*, "Paternal Size:",size(a)
    PRINT*, "Maternal Size:",size(b)
    !READ(line, '(1000i1)')a(1:len(row)/2),b(((len(row)/2)+1):len(row))
    !READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))

    READ(line,'(1000i1)') a, b

    ! WRITE(*,'(a5,200i1)') "a", a!( 1:(len(row)/2))
    ! WRITE(*,'(a5,200i1)') "b", b!(((1+len(row)/2)):len(row))

    ! WRITE(unit=12,fmt='(200i1)') a
    ! WRITE(unit=12,fmt='(200i1)') b

    ! PRINT*

    ! WRITE(*,'(a5,200i1)') "seq1", seq(1,:)
    ! WRITE(*,'(a5,200i1)') "seq2" , seq(2,:)



    !A DO loop in which the separated paternal and maternal sequences are merged.
    DO i = 1, (len_trim(line)/2)
    seq1seq23( 2*i - 1 )=a(i)
    seq1seq23( 2*i )=b(i)
    END DO

    WRITE(unit=12,fmt='(1000i1)') seq1seq23

    END DO

    END PROGRAM gsal
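The program above relies on declarations, OPEN statements and a READ of `line` that were left out of the post. Below is a self-contained sketch of the same read, split and merge loop, written as a subroutine over already-opened units. It allocates `a`, `b` and the merged array again for every line, as suggested in post #10, so lines of different lengths work. The module name, subroutine name and the 10000-character limit are hypothetical, and it assumes Fortran 2008 and an even number of markers per line.

```fortran
! Hypothetical module: for every line of IN_UNIT, split the markers into
! paternal (first half) and maternal (second half) alleles and write them
! interleaved to OUT_UNIT.
module gsal_sketch
  implicit none
  private
  public :: merge_file
  integer, parameter :: max_len = 10000   ! assumed upper bound on line length
contains
  subroutine merge_file(in_unit, out_unit)
    integer, intent(in) :: in_unit, out_unit
    character(len=max_len) :: line
    integer, allocatable :: a(:), b(:), merged(:)
    integer :: n, half, ios
    do
      read(in_unit, '(a)', iostat=ios) line
      if (ios /= 0) exit                  ! end of file (or a read error)
      n = len_trim(line)
      if (n == 0) cycle                   ! skip empty lines
      half = n / 2                        ! assumes an even number of markers
      ! line lengths vary, so release the previous arrays and size them anew
      if (allocated(a)) deallocate(a, b, merged)
      allocate(a(half), b(half), merged(2*half))
      read(line(1:n), '(*(i1))') a, b     ! first half -> a, second half -> b
      merged(1::2) = a                    ! paternal alleles at odd positions
      merged(2::2) = b                    ! maternal alleles at even positions
      write(out_unit, '(*(i1))') merged
    end do
  end subroutine merge_file
end module gsal_sketch
```

A driver would open the files, e.g. `open(newunit=uin, file='markers.txt', status='old', action='read')` and `open(newunit=uout, file='merged.txt', status='replace', action='write')` (file names hypothetical), and then `call merge_file(uin, uout)`.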
  15. Oct 3, 2012 #14

    With the previous program I now have a new file in which every block of 30 consecutive lines must be joined into a single line. For example, here are 4 lines from a small part of my big file:

    So I want to join all of the lines above into just one line, as below:

    Any help will be appreciated.
  16. Oct 3, 2012 #15
    Hey!...back in posting #11 you promised it would be the final question!
  17. Oct 4, 2012 #16
    Yes, I am sorry about that, but I need your help or someone else's. I promise, and will keep my promise, that this will be my final question.
  18. Oct 4, 2012 #17
    Well, if you know, somehow, that there are going to be 30 lines, simply declare a very long character string and keep concatenating to it every line that you read in...then, with such a long line, do the same thing you are already doing.

    do 1 to 30
    read line
    long_line = long_line // trim(line)

    needless to say, the steps above are NOT fortran...just trying to illustrate to you what I mean.
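One way to make the concatenation idea work in real Fortran (2003 or later) is a deferred-length allocatable string, which is reallocated to the new length on every assignment. With a fixed-length `long_line`, `long_line = long_line // trim(line)` would not grow the text: the right-hand side starts with all of `long_line`, including its trailing blanks, and the assignment truncates it back to the original length, so the appended part is lost. A minimal sketch, with hypothetical names and an assumed 10000-character limit per line:

```fortran
! Hypothetical module: join the next NLINES records of UNIT into one string.
module join_block_mod
  implicit none
  private
  public :: join_block
contains
  subroutine join_block(unit, nlines, long_line, ios)
    integer, intent(in) :: unit, nlines
    character(len=:), allocatable, intent(out) :: long_line  ! deferred length
    integer, intent(out) :: ios          ! nonzero if the file ended early
    character(len=10000) :: line         ! assumed longer than any record
    integer :: i
    long_line = ''
    ios = 0
    do i = 1, nlines
      read(unit, '(a)', iostat=ios) line
      if (ios /= 0) exit
      long_line = long_line // trim(line)   ! reallocated to the new length
    end do
  end subroutine join_block
end module join_block_mod
```

Calling `join_block(u, 30, long_line, ios)` repeatedly gives one joined string per block of 30 lines, which can then go through the same split and merge steps as before.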

    If the above does not work for some reason, you may have to count the length of each line and keep moving the index in long_line instead:

    do 1 to 30
    read line
    length= len_trim(line)
    long_line[next:next+length-1] = trim(line)
    next = next+length
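The index-based variant can be written with a fixed-length buffer and a substring assignment; `buffer(next:next+length-1)` is the Fortran spelling of the `long_line[next:next+length-1]` step above. A sketch with hypothetical names, which stops with an error if the buffer is too short:

```fortran
! Hypothetical module: copy LINE (without trailing blanks) into BUFFER
! at position NEXT, then move NEXT just past the copied characters.
module append_at_mod
  implicit none
  private
  public :: append_at
contains
  subroutine append_at(buffer, next, line)
    character(len=*), intent(inout) :: buffer
    integer, intent(inout) :: next          ! start with next = 1
    character(len=*), intent(in) :: line
    integer :: length
    length = len_trim(line)
    if (next + length - 1 > len(buffer)) error stop 'buffer too short'
    buffer(next:next+length-1) = line(1:length)
    next = next + length
  end subroutine append_at
end module append_at_mod
```

After 30 calls, `buffer(1:next-1)` holds the joined line.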

    programming is about getting creative and trying a few things, no harm done. Worst case, think of what you as a human would do to do what you need to do...and simply automate that.

    hope this helps