by Davoodk
Tags: fortran
P: 11

Hi everyone,

I am new to Fortran. I am trying to write a program to read a .txt file that has 24480 rows and about 6000 columns. Each row is one individual, and its genotypes are coded as 1 and 2. If row one has, for example, 204 genotypes, the first half (102) belongs to the individual's sire and the second half belongs to the individual's dam. In addition, the rows do not all have the same number of values.

So, how can I tell Fortran to read this file row by row, divide each row in two, and put each element i beside element (half + i)? For example, here are two rows of my file, in brief:

row1: 112122121112122111112121111211122121111121
row2: 21112111112112222121112121211121221212121111121112 1212

And so on. Any help would be appreciated.
P: 838

Typically people first take a shot of their own... THEN we help. I am just going to say a couple of things.

1. Fortran has what is called column-oriented reading (fixed-width edit descriptors such as i1), which could read all those numbers straight into integers or an integer array, even though they are run together like that. The thing is that, without a test program, I am not sure if that can be used for variable-length lines.

2. So, I would probably read each line into a very long character variable first, and then transfer the values to integers via an internal read. As long as you declare your character variable long enough, it will read the entire line and necessarily stop at the end of the line.
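
A minimal sketch of point 2, assuming a line never exceeds 10000 characters. The program name, the buffer size, and the hard-coded sample row (row1 from the question) are illustrative assumptions; in a real program the line would come from `READ(unit,'(a)') line`.

```fortran
program read_digits
  implicit none
  character(len=10000) :: line      ! assumed upper bound on line length
  integer, allocatable :: g(:)      ! one integer per marker
  integer :: n, ios

  ! Stand-in for READ(unit,'(a)') line on the real data file
  line = '112122121112122111112121111211122121111121'

  n = len_trim(line)                ! number of markers actually present
  allocate(g(n))

  ! Internal read: the character variable is used as the "file",
  ! and each i1 field consumes exactly one character.
  read(line(1:n), '(10000i1)', iostat=ios) g
  if (ios /= 0) stop 'line contains something other than digits'

  print '(a,i0)', 'markers read: ', n
  print '(10000i1)', g
end program read_digits
```

Because the array is sized from `len_trim(line)`, the same code handles lines of different lengths without knowing them in advance.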
P: 11

OK, yes, you are right. I have to describe more.

I have a very big data set of markers (genes or genotypes) which are coded as 1 and 2; the data set contains only 1 and 2. It has m rows, one per individual, and n columns, one per marker. In addition, each row has a different number of markers.

I need to read each row with Fortran and treat it as a chromosome of one individual, inherited from mom (dam) and dad (sire). Then I need to divide each row in two, so that the first half of the row belongs to dad and the other half belongs to mom. After getting this data set into Fortran, I want to be able to choose and print interesting markers in each row, and interesting individuals from the whole data set, to a new file.

So suppose the lines below are the first two lines in my data set:

row 1: 12111221112121
which has 14 markers; the first seven (1211122) belong to his dad and the second seven belong to his mom.

row 2: 1112221212121112212122112111121212212211122221
which has 46 markers.

Thank you for your patience. I would like to share a part of my data set as an example, so please see this link: 4shared.com/photo/O4fWnt4w/marker_sample.html

Please ignore the blue part of the data set. Those columns are the ID of the individual, live status (1 alive and 0 dead), and chromosome number, respectively.
P: 838

I understood what you meant the first time around...that's not the problem.

Did YOU understand what I said? I am not going to write the first line of code...you need to take a shot at it by yourself, first and THEN we can help. I did give you a hint on how to go about it. Read up and follow on that.
P: 11

Hi guys,

Thank you for your earlier help. Using

    DO row = 1, max_rows
      READ(15,'(a)')line
      len(row)=len_trim(line)
      ALLOCATE(a(len_trim(line)/2))
      ALLOCATE(b(len_trim(line)/2))
      ALLOCATE(seq(2,len_trim(line)/2))
      PRINT*, "size A:",size(a)
      PRINT*, "size B:",size(b)
      READ(line, '(1000i1)')a(1:len(row)/2),b(((len(row)/2)+1):len(row))
      seq(1,:)=a(1:len(row)/2)
      seq(2,:)=b(((len(row)/2)+1):len(row))

I could divide each line into two equal halves and print them with the program above. Now I want to combine these two 'seq' rows so that the elements of the two rows alternate. For example, suppose:

seq1: 11111111111111111111
seq2: 22222222222222222222

Then I want to have a vector like this:

12121212121212121212121212

To get the elements in that order I used the DO loops below, but they are not working.

    DO
      DO ilocus=1, nlocus
        WRITE(*, '(2i1)', ADVANCE='NO') seq(1:2, ilocus)
      END DO
    END DO
    WRITE(genotype, *)

Also, would you please guide me on how to tell Fortran to repeat this DO loop for each 30 subsequent lines and put all of these lines in a new matrix?

Thank you in advance.
P: 838

You've got it a bit backwards. You have allocated two arrays (a and b) that are each supposed to be indexed from 1 to len(row)/2, yet in your read statement you read into b with an index starting at 1+len(row)/2. That's wrong: that index belongs on line, not on b:

    read(line( 1 : len(row)/2 ),'(1000i1)') a
    read(line( 1+len(row)/2 : len(row) ),'(1000i1)') b

OR

As long as you have allocated a and b to exactly the sizes they need to be, I think you can simply do:

    read(line,'(1000i1)') a, b

and I think Fortran is intelligent enough to read enough data to fill in a and then b. Give it a try.
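
The first variant can be sketched as a small stand-alone program. It uses row 1 from the thread (14 markers) as hard-coded input; the program name and the variables `n` and `half` are illustrative, not from the original code. Note that with an odd line length the last character would be ignored, since `half` is an integer division.

```fortran
program split_halves
  implicit none
  character(len=1000) :: line
  integer, allocatable :: a(:), b(:)
  integer :: n, half

  line = '12111221112121'            ! row 1 from the thread: 14 markers
  n    = len_trim(line)
  half = n / 2
  allocate(a(half), b(half))         ! both arrays are indexed 1..half

  ! The substring bounds go on line; a and b are read whole.
  read(line(1:half),   '(1000i1)') a
  read(line(half+1:n), '(1000i1)') b

  write(*, '(a,1000i1)') 'sire: ', a   ! prints sire: 1211122
  write(*, '(a,1000i1)') 'dam:  ', b   ! prints dam:  1112121
end program split_halves
```

Writing `b(half+1:n)` instead would address elements beyond the allocated bounds of `b`, which is why the second half came out wrong in the earlier attempt.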
P: 11

Thanks a lot. But when I check the printed contents of array "b" against the original line in the data set, I can see that the elements of this array do not follow the order of the original, although array "a" shows the same order as the original.

Thank you for your patience.
 P: 838 go ahead and post your latest source code (use tags, please) and sample data file.
P: 11

    13579111111111111111111111111111111111111111111111111111111111111111111
    11111111111111112111111111111112468011111111111111111111111111111111111
    11111111111111111111111111111111111111111111111111111111111111.

OK, as you can see, I put a line of my data above; in the data set all of these values are on just one line. This line consists of 204 values. I must split it into two new separate lines, and then merge those lines into a new line.

I wrote the first 5 consecutive values (13579) and the 103rd to 107th values (24680) myself, to check whether my read and print statements are working properly. In fact, 13579 and 24680 are 11111 and 11111. So the middle of the line above is the 103rd value, which is 2. When my program reads this line as a string, it can split it into two separate, correct lines, but when I define it as an array (:), it does not work properly from the 103rd value to the end. So, would you please put the line above in a text file and test the program below?

    DO row = 1, max_rows
      READ(15,'(a)')line
      len(row)=len_trim(line)
      ALLOCATE(a(len_trim(line)/2))
      ALLOCATE(b(len_trim(line)/2))
      ALLOCATE(seq(2,len_trim(line)/2))
      PRINT*, "size A:",size(a)
      PRINT*, "size B:",size(b)
      READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))
      WRITE(*,'(a5,200i1)') "a", a(1:(len(row)/2))
      WRITE(*,'(a5,200i1)') "b", b(((len(row)/2)+1):len(row))
      seq(1,:)=a(1:len(row)/2)
      seq(2,:)=b((len(row)/2+1):len(row))
      WRITE(*,'(a8,200i1)') "seq1", seq(1,:)
      WRITE(*,'(a8,200i1)') "seq2" , seq(2,:)
    END DO

On screen, a, b and seq1 are written correctly, but seq2 is not printed properly. After solving this issue, I should be able to merge the elements of seq1 and seq2 as below:

123456789011111.....11.

By the way, I intentionally put odd numbers at the beginning and even numbers at the middle in order to see whether the program works or not.

Thank you.
 P: 838 you have not implemented the correction suggested in post #6....please read that and correct your code. also, if the data lines vary in length...you need to deallocate a and b and allocate them again within the loop, then.
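
A minimal sketch of the allocate/deallocate-per-line pattern described above. It writes the two short rows quoted earlier in the thread to a scratch file so that it runs on its own; the program name, the scratch file, and the `iostat` exit test are illustrative additions, not part of the original program.

```fortran
program per_line_alloc
  implicit none
  character(len=1000) :: line
  integer, allocatable :: a(:), b(:)
  integer :: half, ios

  ! Scratch file standing in for the real data file (assumed content:
  ! row 1 and row 2 as quoted earlier in the thread).
  open(unit=15, status='scratch')
  write(15, '(a)') '12111221112121'
  write(15, '(a)') '1112221212121112212122112111121212212211122221'
  rewind(15)

  do
    read(15, '(a)', iostat=ios) line
    if (ios /= 0) exit                  ! end of file or read error
    half = len_trim(line) / 2
    allocate(a(half), b(half))          ! sized for this line only
    read(line, '(1000i1)') a, b         ! fills a first, then b
    print '(a,i0)', 'markers per parent: ', half
    deallocate(a, b)                    ! release before the next line
  end do
  close(15)
end program per_line_alloc
```

Testing `iostat` also lets the loop stop at the end of the file instead of relying on a fixed row count.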
P: 11

Thank you. Yes, you are right. After implementing the suggestion from #6, ***read(line,'(1000i1)') a, b***, Fortran wisely recognized a and b and generated seq1(:) and seq2(:). So I now have a new file in which the number of lines has doubled (original line = 111111..111111222222..222222).

In this new file I have to merge the values of each two consecutive lines. For example, with seq1=[11111111....111] and seq2=[22222222....222], I want a new file with lines like this: seq1seq2=[1212121212...121212].

I have tried several programs, but none of them worked. I promise that this will be my final question.

Thanks a lot.
P: 838

Are you alternating as you glue seq1 and seq2?

    do i = 1, (len_trim(line)/2)
      seq1seq2( 2*i - 1 ) = seq1(i)
      seq1seq2( 2*i ) = seq2(i)
    end do
    write(*,'(1000i1)') seq1seq2

Something along those lines... I did not test or anything.
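
The alternating loop can be tried on its own with small, made-up inputs. In this sketch the array size `half = 5` and the name `merged` are assumptions for illustration; the second half shows an equivalent formulation with strided array sections, which is a swapped-in alternative rather than what the thread used.

```fortran
program interleave
  implicit none
  integer, parameter :: half = 5          ! illustrative size
  integer :: seq1(half), seq2(half), merged(2*half), i

  seq1 = 1                                ! 11111
  seq2 = 2                                ! 22222

  ! Element i of seq1 goes to the odd slot, element i of seq2 to the even slot.
  do i = 1, half
    merged(2*i - 1) = seq1(i)
    merged(2*i)     = seq2(i)
  end do
  write(*, '(1000i1)') merged             ! prints 1212121212

  ! Same result without a loop, using stride-2 array sections.
  merged(1::2) = seq1
  merged(2::2) = seq2
  write(*, '(1000i1)') merged             ! prints 1212121212
end program interleave
```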
P: 11

Hi,

Thank you. Your program worked, and I highly appreciate it. This is the program:

    PROGRAM gsal
      IMPLICIT NONE
      INTEGER::OPENSTATUS, count, i
      INTEGER::row
      INTEGER,ALLOCATABLE::len(:)
      INTEGER::max_rows=734400
      CHARACTER*1000:: line
      INTEGER,ALLOCATABLE::a(:),b(:)
      INTEGER,ALLOCATABLE::seq(:,:)
      INTEGER,ALLOCATABLE::seq1(:)
      INTEGER,ALLOCATABLE::seq2(:)
      INTEGER,ALLOCATABLE::seq1seq23(:)
      ALLOCATE(len(734400))
      !Open ADAM output file for markers
      OPEN(UNIT=15, FILE="3.txt", STATUS="OLD", IOSTAT=OPENSTATUS)
      IF (OPENSTATUS>0) STOP "***CANNOT OPEN FILE***"
      !Create a new .txt file in which paternal and maternal alleles are close to each other.
      OPEN(UNIT=12, FILE="a.txt", STATUS="REPLACE", ACTION="WRITE")
      DO row = 1, max_rows
        PRINT*,"row", row
        READ(15,'(a)')line
        len(row)=len_trim(line)
        PRINT*, "length of row:", len_trim(line)
        ALLOCATE(a(len_trim(line)/2))
        ALLOCATE(b(len_trim(line)/2))
        ALLOCATE(seq(2,len_trim(line)/2))
        ALLOCATE(seq1(size(a)))
        ALLOCATE(seq2(size(b)))
        ALLOCATE(seq1seq23(len_trim(line)))
        PRINT*, "Paternal Size:",size(a)
        PRINT*, "Maternal Size:",size(b)
        !READ(line, '(1000i1)')a(1:len(row)/2),b(((len(row)/2)+1):len(row))
        !READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))
        READ(line,'(1000i1)') a, b
        ! WRITE(*,'(a5,200i1)') "a", a!( 1:(len(row)/2))
        ! WRITE(*,'(a5,200i1)') "b", b!(((1+len(row)/2)):len(row))
        ! WRITE(unit=12,fmt='(200i1)') a
        ! WRITE(unit=12,fmt='(200i1)') b
        ! PRINT*
        ! WRITE(*,'(a5,200i1)') "seq1", seq(1,:)
        ! WRITE(*,'(a5,200i1)') "seq2" , seq(2,:)
        seq1(:)=a(:)!(1:len(row)/2)
        seq2(:)=b(:)!((len(row)/2+1):len(row))
        !A DO loop in which the separated paternal and maternal sequences are merged.
        DO i = 1, (len_trim(line)/2)
          seq1seq23( 2*i - 1 )=a(i)
          seq1seq23( 2*i )=b(i)
        END DO
        WRITE(unit=12,fmt='(1000i1)') seq1seq23
        DEALLOCATE(a)
        DEALLOCATE(b)
        DEALLOCATE(seq)
        DEALLOCATE(seq1)
        DEALLOCATE(seq2)
        DEALLOCATE(seq1seq23)
      END DO
    END PROGRAM gsal
P: 11

Hi,

With the previous program I now have a new file in which each group of 30 lines must be joined into just one line. For example, here are 4 lines as a small sample of my big file:

    121212121212121212121212121212121212
    343434343434343434343434343434343434343434343434343434
    56565656565656565656565656
    78787878787878787878787878787878787878787878

I want to blend all of the lines above into just one line, as below:

    121212...12123434343434...34345656565656...5656...787878....78

Any help will be appreciated.
 P: 838 Hey!...back in posting #11 you promised it would be the final question!
 P: 11 Yes, I am sorry about that, but I need your help or someone else's. So I promise, and will keep the promise, that this will be my final question.
P: 838

Well, if you know, somehow, that there are going to be 30 lines, simply declare a very long character string and keep concatenating to it every line that you read in. Then, with that long line, do the same thing you are already doing.

    do 1 to 30
      read line
      long_line = long_line // trim(line)
    enddo

Needless to say, the steps above are NOT Fortran... just trying to illustrate what I mean.

If the above does not work, for some reason, you may have to count the length of each line and keep moving the index in long_line instead:

    do 1 to 30
      read line
      length = len_trim(line)
      long_line[next:next+length-1] = trim(line)
      next = next+length
    enddo

Programming is about getting creative and trying a few things; no harm done. Worst case, think of what you as a human would do to accomplish what you need... and simply automate that.

Hope this helps.
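
The second outline above can be turned into real Fortran roughly as follows. This is only a sketch: it joins 4 short made-up lines from a scratch file instead of 30 real ones, and the program name, `nblock`, and the 30000-character buffer are assumptions. The index-moving variant is used because a fixed-length `long_line` is padded with blanks, so `long_line // trim(line)` would place the new text beyond the end of the variable and lose it; `trim(long_line) // trim(line)` would be needed for the first outline to work.

```fortran
program join_lines
  implicit none
  integer, parameter :: nblock = 4         ! 30 in the thread; 4 here for brevity
  character(len=1000)  :: line
  character(len=30000) :: long_line        ! assumed large enough for one block
  integer :: k, next, length, ios

  ! Scratch file with made-up lines standing in for the real input
  open(unit=15, status='scratch')
  write(15, '(a)') '1212'
  write(15, '(a)') '343434'
  write(15, '(a)') '56'
  write(15, '(a)') '7878'
  rewind(15)

  long_line = ' '
  next = 1                                 ! next free position in long_line
  do k = 1, nblock
    read(15, '(a)', iostat=ios) line
    if (ios /= 0) exit                     ! fewer lines than expected
    length = len_trim(line)
    long_line(next:next+length-1) = line(1:length)
    next = next + length
  end do
  close(15)

  write(*, '(a)') long_line(1:next-1)      ! prints 1212343434567878
end program join_lines
```

For the full file, this block loop would sit inside an outer loop that repeats until the end of the file, writing one joined line per group of 30.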
