• Fortran


Hi everyone,
I am new to Fortran. I am trying to write a program to read a .txt file that has 24480 rows and ~6000 columns.
Each row is one individual, and its genotypes are coded as 1 and 2. For example, if row one has 204 genotypes, the first half (102) belong to the individual's sire and the second half belong to its dam. In addition, the rows do not all have the same length. So how can I tell Fortran to read this file row by row, split each row into two halves, and place each element i of the first half next to element n/2 + i (the matching element of the second half)?

For example, here are two of my rows (shortened):
row1: 112122121112122111112121111211122121111121
row2: 21112111112112222121112121211121221212121111121112 1212

And so on.
Any help would be appreciated. Thanks in advance.

Typically people first take a shot of their own...THEN we help.

I am just going to say a couple of things.

1.- Fortran has what is called column-oriented reading (fixed-width formatted input, e.g. the I1 edit descriptor), which could potentially read all those numbers straight into integers or an integer array, even though they are all run together like that. The thing is that, without a test program, I am not sure whether that works for variable-length lines.

2.- So I would probably first read each line into a very long character variable, and then transfer the values to integers via an internal read. As long as you declare your character variable long enough, it will read the entire line and necessarily stop at the end of the line.
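A minimal sketch of point 2, assuming every row contains only the single-digit codes 1 and 2 with no separators (any leading ID/status columns would have to be skipped first). The module `marker_io` and subroutine `line_to_digits` are hypothetical names. The row is read elsewhere into a long character variable; an internal READ then treats that variable as the input record, and each I1 edit descriptor consumes exactly one column. `'(*(i1))'` is the Fortran 2008 unlimited-repeat form of `'(1000i1)'`, so the format does not run out on rows longer than 1000 markers.

```fortran
module marker_io
  implicit none
contains
  ! Convert a string of single-digit markers, e.g. '1211122',
  ! into an integer array with one element per character.
  subroutine line_to_digits(line, digits)
    character(len=*), intent(in) :: line
    integer, allocatable, intent(out) :: digits(:)

    ! One integer per character up to the last non-blank one
    allocate(digits(len_trim(line)))
    ! Internal read: the character variable is the input record,
    ! and each I1 descriptor consumes exactly one column
    read(line, '(*(i1))') digits
  end subroutine line_to_digits
end module marker_io
```

In the main program you would read each record with `read(unit, '(a)', iostat=ios) line`, leave the loop when `ios /= 0`, and pass `line` to the subroutine; with rows of ~6000 columns, `line` has to be declared at least that long (e.g. `character(len=10000)`).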

OK, yes, you are right, I have to describe it more. I have a very big data set of markers (genes or genotypes) coded as 1 and 2; the data set contains only 1s and 2s. It has m rows for m individuals and n columns for n markers, although each row has a different number of markers. I need Fortran to read each row and treat it as a chromosome of that individual, inherited from its mom (dam) and dad (sire). Then each row should be divided in two, with the first half belonging to the dad and the second half to the mom. After reading this data set into Fortran, I want to be able to choose interesting markers in each row, and interesting individuals across the whole data set, and print them to a new file.

So suppose these are the first two lines of my data set: row 1: 12111221112121, which has 14 markers; the first seven (1211122) belong to his dad and the second seven to his mom. Row 2: 1112221212121112212122112111121212212211122221, which has 46 markers. Thank you for your patience.

I would like to share part of my data set as an example. Please see this link: 4shared.com/photo/O4fWnt4w/marker_sample.html

Please ignore the blue part of the data set. Those columns are the individual's ID, live status (1 = alive, 0 = dead), and chromosome number, respectively.

I understood what you meant the first time around...that's not the problem.

Did YOU understand what I said? I am not going to write the first line of code...you need to take a shot at it by yourself, first and THEN we can help. I did give you a hint on how to go about it. Read up and follow on that.

Hi guys,
Thank you for your earlier help. Using:

DO row = 1, max_rows
len(row)=len_trim(line)
ALLOCATE(a(len_trim(line)/2))
ALLOCATE(b(len_trim(line)/2))
ALLOCATE(seq(2,len_trim(line)/2))
PRINT*, "size A:",size(a)
PRINT*, "size B:",size(b)

seq(1,:)=a(1:len(row)/2)
seq(2,:)=b(((len(row)/2)+1):len(row))

I was able to split each line into two equal halves and print them with the code above. Now I want to combine the two rows of 'seq' so that their elements stand next to each other. For example, suppose:
seq1: 11111111111111111111
seq2: 22222222222222222222

Then I want a vector like this: 12121212121212121212121212

To get this ordering I used the DO loops below, but they do not work.

DO
DO ilocus=1, nlocus
WRITE(*, '(2i1)', ADVANCE='NO') seq(1:2, ilocus)
END DO
END DO
WRITE(genotype, *)

Also, could you please tell me how to make Fortran repeat this DO loop for every 30 consecutive lines and put all of those lines into a new matrix?

Thank you in advance
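The inner loop in the post above already writes the two alleles of each locus side by side; the bare outer `DO` (no loop control and no `EXIT`) simply repeats it forever. A hedged sketch without the outer loop, assuming `seq` is the `(2, nlocus)` array from the earlier post and that output goes to an already-open external unit (non-advancing output is not allowed on internal files). The module and subroutine names are hypothetical.

```fortran
module marker_write
  implicit none
contains
  ! Write seq(1,i) and seq(2,i) side by side for every locus i,
  ! producing one output record such as 121212...
  subroutine write_interleaved(unit, seq)
    integer, intent(in) :: unit
    integer, intent(in) :: seq(:, :)   ! shape (2, nlocus)
    integer :: ilocus

    do ilocus = 1, size(seq, 2)
      ! Non-advancing: stay on the same record after each pair
      write(unit, '(2i1)', advance='no') seq(1:2, ilocus)
    end do
    ! An empty advancing write ends the record
    write(unit, '(a)') ''
  end subroutine write_interleaved
end module marker_write
```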

You've got it a bit backwards.

You have allocated two arrays (a and b) that are supposed to be indexed from 1 to len(row)/2, yet in your read statement you read into b with indices starting at 1+len(row)/2... that's wrong: that index belongs on line, not on b:
Code:
read(line(            1 : len(row)/2 ),'(1000i1)') a
read(line( 1+len(row)/2 : len(row)   ),'(1000i1)') b
OR

As long as you have allocated a and b to exactly the sizes they need, I think you can simply do:
Code:
read(line,'(1000i1)') a, b
and I think Fortran is smart enough to read enough data to fill a and then b. Give it a try.
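The one-statement read can be wrapped like this; a sketch, not tested against the real data file, with hypothetical names `marker_split` and `split_parents`. Both halves are indexed from 1, and the offset len/2 + 1 is handled implicitly because the read keeps consuming columns of `line` once `sire` is full. Declaring the arrays `allocatable, intent(out)` means they are deallocated automatically on each call, so rows of different lengths need no explicit DEALLOCATE.

```fortran
module marker_split
  implicit none
contains
  ! Split one row of markers into its paternal (sire) and maternal (dam)
  ! halves. Assumes an even row length; an odd trailing marker is not read.
  subroutine split_parents(line, sire, dam)
    character(len=*), intent(in) :: line
    integer, allocatable, intent(out) :: sire(:), dam(:)
    integer :: half

    half = len_trim(line) / 2
    ! Both arrays are sized to exactly half the row ...
    allocate(sire(half), dam(half))
    ! ... so one internal read fills sire first, then continues into dam.
    ! dam is indexed 1..half; the offset half+1 applies to line only.
    read(line, '(*(i1))') sire, dam
  end subroutine split_parents
end module marker_split
```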

Thanks a lot. But when I check the printed array "b" against the original line in the data set, its elements do not follow the original order, although array "a" shows the correct order, matching the original.

Thank you for your patience.

Go ahead and post your latest source code (use code tags, please) and a sample data file.

135791111111111111111111111111111111111111111111111111111111111111111111111111111111111211111111111111246801111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111.

OK, above is one line of my data; in the data set all of these values are on a single line. The line has 204 values, which I must split into two separate lines and then merge into one new line.
The first five values (13579) and values 103-107 (24680) were typed in by me to check whether my read and print statements work properly; in the real data these are 11111 and 11111. So the second half of the line starts at value 103, which is the 2. When my program reads this line as a string, it splits it into two correct lines, but when I read it into an integer array (:), the values from 103 to the end come out wrong. Could you please put the line above in a text file and test the program below?

DO row = 1, max_rows
len(row)=len_trim(line)
ALLOCATE(a(len_trim(line)/2))
ALLOCATE(b(len_trim(line)/2))
ALLOCATE(seq(2,len_trim(line)/2))
PRINT*, "size A:",size(a)
PRINT*, "size B:",size(b)
READ(line, '(1000i1)')a (1 : len(row)/2), b(( len(row)/2)+1: len(row))

WRITE(*,'(a5,200i1)') "a", a(1:(len(row)/2))
WRITE(*,'(a5,200i1)') "b", b(((len(row)/2)+1):len(row))

seq(1,:)=a(1:len(row)/2)
seq(2,:)=b((len(row)/2+1):len(row))

WRITE(*,'(a8,200i1)') "seq1", seq(1,:)
WRITE(*,'(a8,200i1)') "seq2" , seq(2,:)

END DO

On screen, a, b and seq1 are written correctly, but seq2 is not printed properly.

However, once this issue is solved, I need to merge each element of seq1 and seq2 as below:
123456789011111.....11.
By the way, I intentionally put odd numbers at the beginning and even numbers in the middle to see whether the program works.

Thank you

You have not implemented the correction suggested in post #6... please read it and correct your code.

Also, if the data lines vary in length, you need to deallocate a and b and allocate them again inside the loop.
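A sketch of that loop, assuming one individual per record, an even number of markers per row, and no row longer than 10000 characters (all assumptions; the names `marker_rows` and `process_rows` are hypothetical). Each pass reads a record, frees the previous row's arrays, allocates them to the new row's size, splits the row with one internal read, and writes the two halves interleaved.

```fortran
module marker_rows
  implicit none
contains
  ! Read every row from in_unit, split it into paternal (a) and maternal (b)
  ! halves, and write the halves interleaved to out_unit.
  ! Rows may differ in length, so a, b and ab are re-allocated per row.
  subroutine process_rows(in_unit, out_unit)
    integer, intent(in) :: in_unit, out_unit
    character(len=10000) :: line        ! assumed long enough for any row
    integer, allocatable :: a(:), b(:), ab(:)
    integer :: ios, n, i

    do
      read(in_unit, '(a)', iostat=ios) line
      if (ios /= 0) exit                 ! end of file (or read error)

      n = len_trim(line) / 2
      ! Free the previous row's arrays before sizing them for this row
      if (allocated(a)) deallocate(a, b, ab)
      allocate(a(n), b(n), ab(2*n))

      read(line, '(*(i1))') a, b         ! fills a, then b
      do i = 1, n
        ab(2*i - 1) = a(i)
        ab(2*i)     = b(i)
      end do
      write(out_unit, '(*(i1))') ab
    end do
  end subroutine process_rows
end module marker_rows
```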

Thank you, yes, you are right. After implementing post #6 (read(line,'(1000i1)') a, b), Fortran correctly recognized a and b and generated seq1(:) and seq2(:). So now I have a new file in which each original line has become two lines (original line = 111111..111111222222..222222).

In this new file I have to merge the values of every two consecutive lines. For example:
seq1=[11111111....111]
seq2=[22222222....222].

I want a new file with lines like this:
seq1seq2=[1212121212...121212].

I have tried several programs, but none of them worked.

I promise that this will be my final question.

Thanks a lot

Are you alternating as you glue seq1 and seq2 together?
Code:
do i = 1, (len_trim(line)/2)
seq1seq2( 2*i - 1 ) = seq1(i)
seq1seq2( 2*i     ) = seq2(i)
end do
write(*,'(1000i1)') seq1seq2
Something along those lines... I did not test it or anything.
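A tested variant of that loop, plus an array-based alternative; `marker_merge`, `interleave` and `interleave_seq` are hypothetical names. The second function relies on Fortran's array element order being column-major: if `seq(1,:)` holds the paternal and `seq(2,:)` the maternal alleles, the elements in that order are already seq(1,1), seq(2,1), seq(1,2), seq(2,2), ..., so RESHAPE to a rank-1 array yields the interleaved vector without an explicit loop.

```fortran
module marker_merge
  implicit none
contains
  ! Loop version: ab = a(1), b(1), a(2), b(2), ...
  function interleave(a, b) result(ab)
    integer, intent(in) :: a(:), b(:)    ! assumed to have equal size
    integer :: ab(2*size(a))
    integer :: i

    do i = 1, size(a)
      ab(2*i - 1) = a(i)
      ab(2*i)     = b(i)
    end do
  end function interleave

  ! Array version: with seq(1,:) = a and seq(2,:) = b, column-major
  ! element order is a(1), b(1), a(2), b(2), ..., so flattening seq
  ! with RESHAPE gives the same result as the loop.
  function interleave_seq(seq) result(ab)
    integer, intent(in) :: seq(:, :)     ! shape (2, n)
    integer :: ab(size(seq))

    ab = reshape(seq, [size(seq)])
  end function interleave_seq
end module marker_merge
```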

Hi, thank you. Your code worked and I really appreciate it. Here is the program:

PROGRAM gsal

IMPLICIT NONE
INTEGER::OPENSTATUS, READSTATUS, i, n
INTEGER::row
INTEGER,PARAMETER::max_rows=734400
!Length of each row (named row_len so the LEN intrinsic is not hidden)
INTEGER,ALLOCATABLE::row_len(:)
CHARACTER(LEN=1000):: line
INTEGER,ALLOCATABLE::a(:),b(:)
INTEGER,ALLOCATABLE::seq1seq23(:)
ALLOCATE(row_len(max_rows))

!Open ADAM output file for markers
OPEN(UNIT=15, FILE="3.txt", STATUS="OLD", ACTION="READ", IOSTAT=OPENSTATUS)
IF (OPENSTATUS/=0) STOP "***CANNOT OPEN FILE***"

!Create a new .txt file in which paternal and maternal alleles are next to each other.
OPEN(UNIT=12, FILE="a.txt", STATUS="REPLACE", ACTION="WRITE")

DO row = 1, max_rows
!Read the next row of markers from the input file; stop at end of file
READ(15, '(A)', IOSTAT=READSTATUS) line
IF (READSTATUS/=0) EXIT

row_len(row)=len_trim(line)
n=row_len(row)/2
PRINT*,"row", row
PRINT*, "length of row:", row_len(row)

!a holds the paternal half and b the maternal half, each of size n
ALLOCATE(a(n))
ALLOCATE(b(n))
ALLOCATE(seq1seq23(2*n))
PRINT*, "Paternal Size:",size(a)
PRINT*, "Maternal Size:",size(b)

!One internal read fills a first, then continues into b (post #6)
READ(line, '(1000i1)') a, b

!A DO loop in which the separated paternal and maternal sequences are merged.
DO i = 1, n
seq1seq23( 2*i - 1 )=a(i)
seq1seq23( 2*i )=b(i)
END DO

WRITE(unit=12,fmt='(1000i1)') seq1seq23

!Rows can differ in length, so free the arrays before the next row
DEALLOCATE(a)
DEALLOCATE(b)
DEALLOCATE(seq1seq23)
END DO

CLOSE(15)
CLOSE(12)
END PROGRAM gsal

Hi,

With the previous program I now have a new file in which every group of 30 consecutive lines must be joined into a single line. For example, here are 4 lines from a small part of my big file:
121212121212121212121212121212121212
343434343434343434343434343434343434343434343434343434
56565656565656565656565656
78787878787878787878787878787878787878787878

So I want to join all of the lines above into just one line, as below:
121212...12123434343434...34345656565656...5656...787878....78

Any help will be appreciated.

Hey!...back in posting #11 you promised it would be the final question!

Yes, I am sorry about that, but I need your help or someone else's. I promise, and will keep my word, that this will be my final question.

Well, if you somehow know that there are going to be 30 lines, simply declare a very long character string and keep concatenating every line you read in onto it... then, with such a long line, do the same thing you are already doing.

do 1 to 30
long_line = long_line // trim(line)
enddo

Needless to say, the steps above are NOT Fortran... just trying to illustrate what I mean.
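One way to write the concatenation in real Fortran is a deferred-length allocatable character variable (Fortran 2003), which is re-allocated to the right length on every assignment. With a fixed-length `long_line`, `long_line = long_line // trim(line)` would append the new text after the trailing blanks and then truncate the result back to the declared length, losing it; `trim(long_line) // trim(line)` also avoids that. A minimal sketch, assuming each input line fits in 10000 characters and that a final group with fewer lines should still be written; `marker_join` and `join_lines` are hypothetical names.

```fortran
module marker_join
  implicit none
contains
  ! Join every group_size consecutive lines of in_unit into one output
  ! line on out_unit; a final, shorter group is written as well.
  subroutine join_lines(in_unit, out_unit, group_size)
    integer, intent(in) :: in_unit, out_unit, group_size
    character(len=10000) :: line            ! assumed long enough for one row
    character(len=:), allocatable :: long_line
    integer :: ios, nread

    long_line = ''
    nread = 0
    do
      read(in_unit, '(a)', iostat=ios) line
      if (ios /= 0) exit                     ! end of file
      ! Deferred-length assignment re-allocates long_line to fit
      long_line = long_line // trim(line)
      nread = nread + 1
      if (nread == group_size) then
        write(out_unit, '(a)') long_line
        long_line = ''
        nread = 0
      end if
    end do
    ! Flush a last group with fewer than group_size lines
    if (nread > 0) write(out_unit, '(a)') long_line
  end subroutine join_lines
end module marker_join
```

For the file produced by the program above, you would call it with `group_size = 30`.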

If the above does not work for some reason, you may have to count the length of each line and keep moving the index in long_line instead:

do 1 to 30