Text file that is read into an array

  • Thread starter: Sue Parks
  • Tags: Array, File, Text

Discussion Overview

The discussion revolves around reading a text file in Fortran, specifically a FASTA file, into an array and processing the data. Participants explore how to structure the code into functions or subroutines for better organization and functionality, including stripping headers and calculating amino acid frequencies.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks to convert a segment of code into a function that strips headers from a FASTA file and reads data into an array, followed by computing amino acid frequency.
  • Another participant suggests using three subroutines instead of a single function, emphasizing that each subroutine should perform a distinct task, such as stripping headers, reading data, and calculating frequencies.
  • A later reply questions the need for additional subroutines to add characters to an array and discusses the parameters needed for each subroutine.
  • Some participants express uncertainty about the best way to handle file parameters in Fortran, noting that it may not be as straightforward as in other programming languages.
  • One participant mentions their inexperience with Fortran and contrasts it with their usual use of Python or BioPython for similar tasks.
  • Another participant inquires about the intended analysis of the text file and the significance of the data being processed.
  • A participant shares their struggle with converting their initial program into a subroutine and seeks advice on reading the sequence from the file.

Areas of Agreement / Disagreement

Participants generally agree that structuring the code into multiple subroutines makes sense, but there is no consensus on the exact implementation details or the necessity of combining certain tasks into fewer subroutines.

Contextual Notes

Participants discuss the challenges of handling file input in Fortran, particularly regarding the treatment of files as first-class objects and the complexity of passing file parameters between subroutines. There is also a mention of the need to identify the specific parameters required for each subroutine to function effectively.
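
One way to avoid passing a file between subroutines is to pass only the file name and keep the unit number local to the reader. The following is a minimal sketch under that assumption, not code from the thread: the names `fasta_reader` and `read_fasta`, the 1024-character line buffer, and the truncation behaviour when the caller's buffer fills up are all hypothetical choices made for illustration.

```fortran
module fasta_reader
  implicit none
contains
  ! Reads a FASTA file, skips header lines (those starting with '>'),
  ! and concatenates the remaining lines into seq.
  ! n   : number of characters stored in seq
  ! ios : nonzero if the file could not be opened, zero otherwise
  subroutine read_fasta(filename, seq, n, ios)
    character(len=*), intent(in)  :: filename
    character(len=*), intent(out) :: seq
    integer, intent(out) :: n, ios
    character(len=1024) :: line      ! assumed maximum line length
    integer :: u, m, rstat

    seq = ''
    n = 0
    open(newunit=u, file=filename, status='old', action='read', iostat=ios)
    if (ios /= 0) return

    do
      read(u, '(A)', iostat=rstat) line
      if (rstat /= 0) exit             ! end of file or read error
      if (line(1:1) == '>') cycle      ! strip FASTA header lines
      m = len_trim(line)
      if (n + m > len(seq)) m = len(seq) - n   ! truncate if seq is full
      if (m > 0) seq(n+1:n+m) = line(1:m)
      n = n + m
    end do
    close(u)
  end subroutine read_fasta
end module fasta_reader
```

A caller would declare a sufficiently long `character` variable, call `read_fasta` with the file name, and then work with `seq(1:n)`. Because line breaks are dropped while reading, no newline characters reach the counting step.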

Who May Find This Useful

This discussion may be useful for individuals interested in programming in Fortran, particularly those working with biological data formats like FASTA files, as well as those looking to improve their code organization through the use of subroutines.

  • #31
In post #24 you said this:
Sue Parks said:
! Here is my Data (list of ALL possible amino acids
DATA /'A,C,D,E,F,G,H,I,K,L,M,N,P,Q,E,A,T,V,W,Y'/
Might have been a copy/paste mistake, but the list above has duplicates for E and A, and is missing R and S. The data file that you attached at the beginning of this thread contains numerous R's and S's. The output from my Python code in post #28 doesn't have entries for R and S, which I presume are valid amino acids. I didn't include these two because you didn't list them above.
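
For reference, a corrected list could be stored as a single character constant holding the one-letter codes of the 20 standard amino acids, with a small check that catches the kind of duplicate described above. This is only a sketch; the names `amino_acid_codes`, `amino_codes`, and `has_duplicates` are made up for the example.

```fortran
module amino_acid_codes
  implicit none
  ! One-letter codes of the 20 standard amino acids, each listed once
  ! (includes R and S, with no repeated E or A).
  character(len=20), parameter :: amino_codes = 'ACDEFGHIKLMNPQRSTVWY'
contains
  ! Returns .true. if any character occurs more than once in codes.
  logical function has_duplicates(codes)
    character(len=*), intent(in) :: codes
    integer :: i
    has_duplicates = .false.
    do i = 2, len(codes)
      ! Look for the i-th character among the characters before it
      if (index(codes(1:i-1), codes(i:i)) > 0) then
        has_duplicates = .true.
        return
      end if
    end do
  end function has_duplicates
end module amino_acid_codes
```

Calling `has_duplicates` on the quoted list with the commas removed, `'ACDEFGHIKLMNPQEATVWY'`, returns `.true.`, while the corrected constant passes the check.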

I have to ask: Is there some reason you're doing this with Fortran? To me, using Fortran in the context of this problem is something like trying to make fine furniture using only a hammer.
 
  • #32
Adding R and S as possible amino acids, and tweaking the output format slightly, this is what I'm now getting. The 'Misc' category now consists only of the newline characters that were in your input text file.
Code:
Amino acid   Count   Proportion
  E           3193    9.143 %
  N           1111    3.181 %
  H            478    1.369 %
  M            398    1.140 %
  W            466    1.334 %
  I           2062    5.905 %
  Q            942    2.697 %
  A           2084    5.968 %
  K           2943    8.427 %
  Y            999    2.861 %
  D           1720    4.925 %
  L           2117    6.062 %
  C            513    1.469 %
  S           2463    7.053 %
  Misc         572    1.638 %
  G           2066    5.916 %
  T           2546    7.291 %
  F            908    2.600 %
  P           2517    7.207 %
  V           3184    9.117 %
  R           1640    4.696 %
Cumulative total percentages: 100.00%
Characters processed:  34922
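
A report in roughly this layout could also be printed from Fortran. The sketch below assumes the counts are already available in an integer array with a matching array of labels; the names `aa_report`, `percent_of`, and `print_report` are hypothetical, and the column widths are only an approximation of the output above.

```fortran
module aa_report
  implicit none
contains
  ! Percentage of total represented by n, computed in real arithmetic
  ! so that integer division does not truncate the result to zero.
  pure function percent_of(n, total) result(pct)
    integer, intent(in) :: n, total
    real :: pct
    if (total > 0) then
      pct = 100.0 * real(n) / real(total)
    else
      pct = 0.0
    end if
  end function percent_of

  ! Prints one row per label, then the total number of characters.
  subroutine print_report(labels, counts)
    character(len=*), intent(in) :: labels(:)
    integer, intent(in) :: counts(:)
    integer :: k, total

    total = sum(counts)
    print '(A)', 'Amino acid   Count   Proportion'
    do k = 1, size(counts)
      print '(2X,A,6X,I6,3X,F7.3,A)', labels(k), counts(k), &
            percent_of(counts(k), total), ' %'
    end do
    print '(A,I0)', 'Characters processed: ', total
  end subroutine print_report
end module aa_report
```

With the figures above, `percent_of(3193, 34922)` evaluates to about 9.143, matching the row for E.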
 
  • #33
This is a practice simulation in Fortran. I have a good foundation in Python. We (YOU & I) know Fortran is not the best way to go about solving this problem, but it can be done.
 
  • #34
Sue Parks said:
This is a practice simulation in Fortran. I have a good foundation in Python. We (YOU & I) know Fortran is not the best way to go about solving this problem, but it can be done.
Sure.

Here's what you posted earlier:
Fortran:
subroutine AAFrequencyTitin

Integer:: countA  , averageA  , countC , countD, average D
do i =1, len(String)
    if ( String == 'A')
        countA = countA + 1
        averageA = countA/sumAminoAcids
       
    else if (String == 'D')
        countD = countD + 1
        averageD = countD/sumAminoAcids
    
end do
end subroutine AAFrequencyTitin
Something like this will work, but it needs some work.
1. The subroutine should have at least one parameter: the string (CHARACTER*xxxxx) that was read earlier in another subroutine.
2. You could use countA, countC, countD, etc. to store the counts of the various amino acids, but you DON'T need separate variables for the relative frequencies. Just keep track of the total number of amino acids, and display countA/totalCount for the relative proportion of A, and so on. Convert the counts to REAL before dividing; with INTEGER variables the quotient truncates to zero.
3. The string (passed as a parameter) can be read one character at a time with the substring baseq(i:i). Get the character in the i-th position and run it through a chain of IF ... ELSE IF ... ELSE IF ... statements, incrementing the appropriate countX when a condition matches.
Once you have cycled through the string and all of the countX variables are set, you could store these values in a one-dimensional array with one cell per amino acid. That array could be an INTENT(OUT) argument of your subroutine, to be used by some other subroutine, similar to what I did in my Python code.
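
The points above can be sketched roughly as follows. To keep the example short, this version swaps the IF ... ELSE IF chain for an INDEX lookup into a string of one-letter codes, so the position of a match doubles as the array subscript. The names `aa_frequency`, `count_amino_acids`, and `n_other` are hypothetical; they do not come from the code posted in this thread.

```fortran
module aa_frequency
  implicit none
  ! One-letter codes of the 20 standard amino acids
  character(len=20), parameter :: amino_codes = 'ACDEFGHIKLMNPQRSTVWY'
  integer, parameter :: n_codes = 20
contains
  ! Counts how often each amino acid code occurs in seq.
  ! counts(k) receives the count for amino_codes(k:k); n_other receives
  ! the number of characters that match no code (newlines, for example).
  subroutine count_amino_acids(seq, counts, n_other)
    character(len=*), intent(in) :: seq
    integer, intent(out) :: counts(n_codes)
    integer, intent(out) :: n_other
    integer :: i, k

    counts = 0
    n_other = 0
    do i = 1, len(seq)
      ! seq(i:i) is the single character at position i
      k = index(amino_codes, seq(i:i))
      if (k > 0) then
        counts(k) = counts(k) + 1
      else
        n_other = n_other + 1
      end if
    end do
  end subroutine count_amino_acids
end module aa_frequency
```

The relative proportion of the k-th amino acid is then `real(counts(k)) / real(sum(counts) + n_other)`, so no separate averageX variables are needed.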
 
