Help with Fortran Programming for .txt File Reading

Davoodk · Sep 17, 2012

Hi everyone,
I am new to Fortran and am trying to write a program to read a .txt file that has 24480 rows and about 6000 columns.
Each row is an individual, and its genotypes are coded as 1 and 2. If, for example, row one has 204 genotypes, the first half of these genotypes (102) belongs to the individual's sire and the second half belongs to the individual's dam. In addition, the rows do not all have the same number of values. So, how can I tell Fortran to read this file row by row, divide each row in two, and place each element i of the first half beside element i of the second half (that is, element midpoint+i of the row)?

For example, here are two abbreviated rows of my file:
row1: 112122121112122111112121111211122121111121
row2: 21112111112112222121112121211121221212121111121112 1212

And so on.
Any help would be appreciated. Thanks in advance.

gsal · Sep 17, 2012

Typically people first take a shot on their own...THEN we help.

I am just going to say a couple of things.

1.- Fortran has what is called column-oriented reading, which potentially could read all those numbers straight into integers or an integer array, even though they are all together like that. The thing is that, without a test program, I am not sure if that can be used for variable-length lines.

2.- So, I would probably try to read each line into a very long character variable first, and then transfer the values to integers via internal reading...as long as you declare your character variable long enough, it will read the entire line and necessarily stop at the end of the line.
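
A minimal sketch of the second hint, not a tested solution for the real data set. It assumes no line is longer than 1000 characters, and it writes two sample rows to a scratch file so that it runs on its own; the program name and variable names are made up for illustration.

```fortran
program read_rows_sketch
  implicit none
  integer, parameter :: max_len = 1000   ! assumed upper bound on line length
  character(len=max_len) :: line
  integer :: vals(max_len)               ! one integer per marker
  integer :: u, ios, n, row

  ! Build a small scratch file so the sketch is self-contained.
  open(newunit=u, status='scratch', action='readwrite')
  write(u, '(a)') '12111221112121'
  write(u, '(a)') '1112221212121112212122112111121212212211122221'
  rewind(u)

  row = 0
  do
    ! Read the whole line as text; stop at end of file or on error.
    read(u, '(a)', iostat=ios) line
    if (ios /= 0) exit
    row = row + 1
    n = len_trim(line)                   ! number of markers on this line

    ! Internal read: convert the n digit characters to n integers.
    read(line(1:n), '(1000i1)') vals(1:n)

    print '(a,i0,a,i0,a)', 'row ', row, ' has ', n, ' markers'
  end do
  close(u)
end program read_rows_sketch
```

The iostat check is what lets the loop end cleanly when the file runs out, so the number of rows does not have to be known in advance.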

Davoodk · Sep 17, 2012

OK, yes, you are right. I have to describe more. I have a very big data set of markers (genes or genotypes), which are coded as 1 and 2; the data set contains only 1s and 2s. It has m rows, one per individual, and n columns representing n markers. In addition, each row has a different number of markers. I need to read each row with Fortran and treat each row as one chromosome of an individual, inherited from mom (dam) and dad (sire). Then I need to divide each row in two, so that the first half of the row belongs to the dad and the other half belongs to the mom. After reading this data set into Fortran, I want to be able to choose interesting markers in each row, and interesting individuals in the whole data set, and print them to a new file.

So suppose the lines below are the first two lines of my data set:
row 1: 12111221112121
which includes 14 markers; the first seven markers (1211122) belong to the dad and the second seven belong to the mom.
row 2: 1112221212121112212122112111121212212211122221
which includes 46 markers.
Thank you for your patience.

I would like to share a part of my data set as an example, so please see this link: 4shared.com/photo/O4fWnt4w/marker_sample.html

Please ignore the blue part of the data set. Those columns are the individual's ID, live status (1 alive, 0 dead), and chromosome number, respectively.

gsal · Sep 17, 2012

I understood what you meant the first time around...that's not the problem.

Did YOU understand what I said? I am not going to write the first line of code...you need to take a shot at it by yourself, first and THEN we can help. I did give you a hint on how to go about it. Read up and follow on that.

Davoodk · Sep 26, 2012

Hi guys,
Thank you for your earlier help. Using:

DO row = 1, max_rows
READ(15,'(a)')line
len(row)=len_trim(line)
ALLOCATE(a(len_trim(line)/2))
ALLOCATE(b(len_trim(line)/2))
ALLOCATE(seq(2,len_trim(line)/2))
PRINT*, "size A:",size(a)
PRINT*, "size B:",size(b)
READ(line, '(1000i1)')a(1:len(row)/2),b(((len(row)/2)+1):len(row))

seq(1,:)=a(1:len(row)/2)
seq(2,:)=b(((len(row)/2)+1):len(row))

I could divide each line into two equal halves and print them with the program above. Now I want to combine these two 'seq' rows so that the elements of the two rows stand next to each other. For example, suppose:
seq1: 11111111111111111111
seq2: 22222222222222222222

So I want to have a vector like this: 12121212121212121212121212

To get the elements in that order, I used the DO loops below, but they are not working.

DO
DO ilocus=1, nlocus
WRITE(*, '(2i1)', ADVANCE='NO') seq(1:2, ilocus)
END DO
END DO
WRITE(genotype, *)

Also, would you please show me how to tell Fortran to repeat this DO loop for each 30 subsequent lines and put all of these lines in a new matrix?

Thank you in advance

gsal · Sep 27, 2012

You've got it a bit backwards.

You have allocated two arrays (a and b) that are indexed from 1 to len(row)/2, yet in your read statement you read into b with an index starting at 1+len(row)/2...that's wrong...that index applies to line, not to b:

Code:

read(line(            1 : len(row)/2 ),'(1000i1)') a
read(line( 1+len(row)/2 : len(row)   ),'(1000i1)') b

OR

As long as you have allocated a and b to exactly what they need to be, I think you can simply do:

Code:

read(line,'(1000i1)') a, b

and I think fortran is intelligent enough to read enough data to fill in a and then b. Give it a try.
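
A short sketch to check the two variants above against each other, using the 14-marker sample row from earlier in the thread. The names half, a2 and b2 are introduced here only for the comparison. Items in a READ list are filled in the order they are written, so the single read fills a first and then b.

```fortran
program split_halves_sketch
  implicit none
  character(len=1000) :: line
  integer, allocatable :: a(:), b(:), a2(:), b2(:)
  integer :: n, half

  line = '12111221112121'            ! sample row from earlier in the thread
  n = len_trim(line)
  half = n / 2
  allocate(a(half), b(half), a2(half), b2(half))

  ! Variant 1: one internal read; a is filled first, then b.
  read(line, '(1000i1)') a, b

  ! Variant 2: index the character variable, not the integer arrays.
  read(line(1:half),        '(1000i1)') a2
  read(line(half+1:2*half), '(1000i1)') b2

  ! Both variants should give identical halves.
  if (any(a /= a2) .or. any(b /= b2)) error stop 'halves differ'

  print '(a,1000i1)', 'sire: ', a
  print '(a,1000i1)', 'dam:  ', b
end program split_halves_sketch
```

Note that a and b are always indexed from 1; only the substring of line moves to the second half.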

Davoodk · Sep 28, 2012

Thanks a lot. But when I check the printed array "b" against the original line in the data set, I can see that its elements do not follow the order in the original line, although array "a" shows the correct order.

Thank you for your patience.

gsal · Sep 28, 2012

go ahead and post your latest source code (use tags, please) and sample data file.

Davoodk · Oct 1, 2012

135791111111111111111111111111111111111111111111111111111111111111111111111111111111111211111111111111246801111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111.

OK, as you can see, I put one line of my data above; all of these values are on a single line in the data set. This line consists of 204 values, which I must split into two new separate lines and then merge into a new line.
I wrote the first 5 consecutive values (13579) and the 103rd to 107th values (24680) myself to check whether my read and print statements are working properly. In the real data, 13579 and 24680 are 11111 and 11111. So the middle of the line above is the 103rd value, which is 2. When my program reads this line as a string, it can split it into two separate, correct lines, but when I want to define it as an array (:), it does not work properly from the 103rd value to the end. So would you please put the line above in a text file and test the program below?

DO row = 1, max_rows
READ(15,'(a)')line
len(row)=len_trim(line)
ALLOCATE(a(len_trim(line)/2))
ALLOCATE(b(len_trim(line)/2))
ALLOCATE(seq(2,len_trim(line)/2))
PRINT*, "size A:",size(a)
PRINT*, "size B:",size(b)
READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))

WRITE(*,'(a5,200i1)') "a", a(1:(len(row)/2))
WRITE(*,'(a5,200i1)') "b", b(((len(row)/2)+1):len(row))

seq(1,:)=a(1:len(row)/2)
seq(2,:)=b((len(row)/2+1):len(row))

WRITE(*,'(a8,200i1)') "seq1", seq(1,:)
WRITE(*,'(a8,200i1)') "seq2" , seq(2,:)

END DO

On screen, a, b, and seq1 are written correctly, but seq2 is not printed properly.

However, after solving this issue, I should be able to merge the elements of seq1 and seq2 as below:
123456789011111...11.
By the way, I intentionally put odd numbers at the beginning and even numbers in the middle to see whether the program works or not.

Thank you

gsal · Oct 1, 2012

you have not implemented the correction suggested in post #6...please read that and correct your code.

also, if the data lines vary in length...you need to deallocate a and b and allocate them again within the loop, then.
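
A small sketch of the deallocate-and-allocate pattern inside the loop, assuming two sample rows held in a made-up array named rows instead of a file. Executing ALLOCATE on an array that is already allocated is an error, which is why the arrays are freed first.

```fortran
program realloc_sketch
  implicit none
  character(len=1000) :: line
  character(len=60)   :: rows(2)     ! stand-in for lines read from the file
  integer, allocatable :: a(:), b(:)
  integer :: row, half

  rows(1) = '12111221112121'
  rows(2) = '1112221212121112212122112111121212212211122221'

  do row = 1, size(rows)
    line = rows(row)
    half = len_trim(line) / 2

    ! Free the arrays left over from the previous row, then size them
    ! for the current row.
    if (allocated(a)) deallocate(a)
    if (allocated(b)) deallocate(b)
    allocate(a(half), b(half))

    read(line, '(1000i1)') a, b
    print '(a,i0,a,i0)', 'row ', row, ': half length = ', size(a)
  end do
end program realloc_sketch
```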

Davoodk · Oct 2, 2012

Thank you. Yes, you are right. After implementing the fix from #6, read(line,'(1000i1)') a, b, Fortran correctly filled a and b and generated seq1(:) and seq2(:). So I now have a new file in which the number of lines has doubled (original line = 111111..111111222222..222222).

In this new file I have to merge the values of each two consecutive lines. For example:
seq1=[11111111...111]
seq2=[22222222...222].

I want to have a new file with lines like this:
seq1seq2=[1212121212...121212].

I have tried several programs, but none of them worked.

I promise that this will be my final question.

Thanks a lot

gsal · Oct 2, 2012

are you alternating as you glue seq1 and seq2?

Code:

do i = 1, (len_trim(line)/2)
  seq1seq2( 2*i - 1 ) = seq1(i)
  seq1seq2( 2*i     ) = seq2(i)
end do
write(*,'(1000i1)') seq1seq2

something along those lines...I did not test or anything
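
A self-contained sketch of that loop, with fixed-size arrays of 10 elements assumed for the demo. It also shows an equivalent form using strided array sections (merged2 is a name made up here) and checks that both give the same result.

```fortran
program interleave_sketch
  implicit none
  integer, parameter :: n = 10       ! demo size; real rows vary in length
  integer :: seq1(n), seq2(n)
  integer :: merged(2*n), merged2(2*n)
  integer :: i

  seq1 = 1                           ! paternal half, all 1s for the demo
  seq2 = 2                           ! maternal half, all 2s for the demo

  ! Loop form: odd positions take seq1, even positions take seq2.
  do i = 1, n
    merged(2*i - 1) = seq1(i)
    merged(2*i)     = seq2(i)
  end do

  ! Same result with strided array sections.
  merged2(1::2) = seq1
  merged2(2::2) = seq2

  if (any(merged /= merged2)) error stop 'loop and section forms differ'

  write(*, '(1000i1)') merged
end program interleave_sketch
```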

Davoodk · Oct 2, 2012

Hi, thank you. Your code worked, and I really appreciate it. This is the program:

PROGRAM gsal

IMPLICIT NONE
INTEGER::OPENSTATUS, count, i
INTEGER::row
INTEGER,ALLOCATABLE::len(:)
INTEGER::max_rows=734400
CHARACTER*1000:: line
INTEGER,ALLOCATABLE::a(:),b(:)
INTEGER,ALLOCATABLE::seq(:,:)
INTEGER,ALLOCATABLE::seq1(:)
INTEGER,ALLOCATABLE::seq2(:)
INTEGER,ALLOCATABLE::seq1seq23(:)
ALLOCATE(len(734400))

!Open ADAM output file for markers
OPEN(UNIT=15, FILE="3.txt", STATUS="OLD", IOSTAT=OPENSTATUS)
IF (OPENSTATUS>0) STOP "***CANNOT OPEN FILE***"

!Create a new .txt file in which paternal and maternal alleles are next to each other.
OPEN(UNIT=12, FILE="a.txt", STATUS="REPLACE", ACTION="WRITE")

DO row = 1, max_rows
   PRINT*,"row", row
   READ(15,'(a)')line
   len(row)=len_trim(line)
   PRINT*, "length of row:", len_trim(line)
   ALLOCATE(a(len_trim(line)/2))
   ALLOCATE(b(len_trim(line)/2))
   ALLOCATE(seq(2,len_trim(line)/2))
   ALLOCATE(seq1(size(a)))
   ALLOCATE(seq2(size(b)))
   ALLOCATE(seq1seq23(len_trim(line)))
   PRINT*, "Paternal Size:",size(a)
   PRINT*, "Maternal Size:",size(b)
   !READ(line, '(1000i1)')a(1:len(row)/2),b(((len(row)/2)+1):len(row))
   !READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))

   READ(line,'(1000i1)') a, b

   ! WRITE(*,'(a5,200i1)') "a", a!( 1:(len(row)/2))
   ! WRITE(*,'(a5,200i1)') "b", b!(((1+len(row)/2)):len(row))

   ! WRITE(unit=12,fmt='(200i1)') a
   ! WRITE(unit=12,fmt='(200i1)') b

   ! PRINT*

   ! WRITE(*,'(a5,200i1)') "seq1", seq(1,:)
   ! WRITE(*,'(a5,200i1)') "seq2" , seq(2,:)

   seq1(:)=a(:)!(1:len(row)/2)

   seq2(:)=b(:)!((len(row)/2+1):len(row))

   !A DO loop in which the separated paternal and maternal sequences are merged.
   DO i = 1, (len_trim(line)/2)
      seq1seq23( 2*i - 1 )=a(i)
      seq1seq23( 2*i )=b(i)
   END DO

   WRITE(unit=12,fmt='(1000i1)') seq1seq23

   DEALLOCATE(a)
   DEALLOCATE(b)
   DEALLOCATE(seq)
   DEALLOCATE(seq1)
   DEALLOCATE(seq2)
   DEALLOCATE(seq1seq23)
END DO

END PROGRAM gsal

Davoodk · Oct 3, 2012

Hi,

With the previous program, I now have a new file in which every 30 consecutive lines must be joined into just one line. For example, here are 4 lines as a small sample of my big file:
121212121212121212121212121212121212
343434343434343434343434343434343434343434343434343434
56565656565656565656565656
78787878787878787878787878787878787878787878

So I want to join all of the lines above into just one line, as below:
121212...12123434343434...34345656565656...5656...787878...78

any help will be appreciated.

gsal · Oct 3, 2012

Hey!...back in posting #11 you promised it would be the final question!

Davoodk · Oct 4, 2012

Yes, I am sorry about that, but I need your help or someone else's. I promise, and will keep the promise, that this will be the final question.

gsal · Oct 4, 2012

Well, if you know, somehow, that they are going to be 30 lines, simply declare a very long character string and keep concatenating to it every line that you read in...then, with such long line, do the same thing you are already doing.

do 1 to 30
read line
long_line = long_line // trim(line)
enddo

needless to say, the steps above are NOT fortran...just trying to illustrate to you what I mean.
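
One way the first idea could look in actual Fortran, as a sketch only. It assumes 4 short sample lines held in a made-up array named rows in place of the 30 lines read from the file, and a 1000-character long_line. One detail worth knowing: a fixed-length character variable is padded with blanks, so long_line has to be trimmed too before appending to it.

```fortran
program concat_sketch
  implicit none
  integer, parameter :: n_lines = 4   ! the thread uses 30; 4 keeps the demo short
  character(len=100)  :: line
  character(len=1000) :: long_line    ! assumed long enough for all lines
  character(len=100)  :: rows(n_lines)
  integer :: i

  rows(1) = '1212'
  rows(2) = '343434'
  rows(3) = '56'
  rows(4) = '7878'

  long_line = ''
  do i = 1, n_lines
    line = rows(i)                    ! stands in for READ(unit,'(a)') line
    ! Trim both sides: without trim(long_line), the new text would be
    ! appended after the blank padding and cut off on assignment.
    long_line = trim(long_line) // trim(line)
  end do

  print '(a)', trim(long_line)
end program concat_sketch
```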

If the above does not work, for some reason, you may have to count the length of line and keep moving the index in long_line, instead

do 1 to 30
read line
length= len_trim(line)
long_line[next:next+length-1] = trim(line)
next = next+length
enddo
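
The index-tracking variant could be sketched like this in Fortran, again with 4 made-up sample lines in place of 30 and an assumed 1000-character long_line. Fortran uses parentheses for substrings, so the assignment becomes long_line(next:next+length-1). The last step converts the joined line to integers with the same internal read used earlier in the thread; markers is a name introduced here.

```fortran
program append_by_index_sketch
  implicit none
  character(len=100)  :: line
  character(len=1000) :: long_line    ! assumed long enough for all lines
  character(len=100)  :: rows(4)      ! stand-in for lines read from the file
  integer, allocatable :: markers(:)
  integer :: i, next, length

  rows(1) = '1212'
  rows(2) = '343434'
  rows(3) = '56'
  rows(4) = '7878'

  long_line = ''
  next = 1                            ! next free position in long_line
  do i = 1, size(rows)
    line = rows(i)                    ! stands in for READ(unit,'(a)') line
    length = len_trim(line)
    if (next + length - 1 > len(long_line)) error stop 'long_line too short'
    long_line(next:next+length-1) = line(1:length)
    next = next + length
  end do

  ! Convert the joined digits to integers, one per marker.
  allocate(markers(next - 1))
  read(long_line(1:next-1), '(1000i1)') markers

  print '(a,i0)', 'total markers: ', size(markers)
  print '(a)', long_line(1:next-1)
end program append_by_index_sketch
```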

programming is about getting creative and trying a few things, no harm done. Worst case, think of what you as a human would do to do what you need to do...and simply automate that.

hope this helps
