Mathematica Slows Down with Repeated Evaluation

  • Thread starter: michaelp7
  • Tags: Mathematica
AI Thread Summary
The discussion centers on analyzing a large collection of PDB files to identify cation-pi interactions by parsing each file into a structured table of atom positions. The initial code implementation successfully reads the files but experiences significant slowdowns after multiple runs, increasing from under 0.2 seconds to over 25 seconds by the fifth run. The slowdown is attributed to not closing input streams, which accumulate and degrade performance. Suggestions include using the Streams[] command to identify and close all open streams, except for standard output and error streams, to maintain efficiency. It is recommended to open, read, and close each file individually to enhance processing speed and prevent the buildup of open streams. This approach is expected to improve the overall performance of the code.
michaelp7
I'm trying to analyze a fairly large (order 10^3) collection of PDB files to look for cation-pi interactions for a class. This requires me to parse each PDB file into a table which gives the position of each atom in the file. To do this, I've written the following code:

Timing[
 OpenRead[AllFiles[[2]]]; (*AllFiles is a list of the filenames in the directory. I intend to replace the 2 with an iteration index when the code is working*)
 Lines = Import[AllFiles[[2]], "Lines"];
 FullTable = {};
 Do[
  LineStream = StringToStream[Lines[[j]]];
  QATOM = Read[LineStream, Word];
  If[QATOM == "ATOM", (*This condition looks for lines that actually describe atoms, instead of other information*)
   ThisLine = Read[LineStream, {Number, Word, Word, Word, Number, {Number, Number, Number}}];
   If[Or[StringLength[ThisLine[[3]]] == 3, StringTake[ThisLine[[3]], 1] == "A"], (*This condition eliminates duplicate listings*)
    FullTable = Append[FullTable, ThisLine]]
   ],
  {j, 1, Length[Lines]}]
 ]

The code does what it's supposed to, but it slows down significantly each time I run it. The first run takes less than 0.2 seconds, but by the fifth run it's already above 25 seconds to parse the same file. Quitting the kernel session solves the speed problem, but of course that deletes all my data. CleanSlate, ClearAll, and adjusting $HistoryLength all had no effect. I haven't come across a solution on this forum yet, so I would appreciate any suggestions.
 
Update: I think the problem is that I need to close all these input streams. This post seems to address the same issue:

http://groups.google.com/group/comp...a/browse_thread/thread/5dc2bf7e4793418d?pli=1

I get some improvements when I close the streams in a modified version of this program, but when I try it on the original code, it still slows down. I think I'm missing some streams. Is there a command that would let me close ALL open streams?
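
For reference, closing each string stream as soon as its line is parsed looks roughly like this (my reconstruction of the fix, not the exact code from the linked post):

Do[
 LineStream = StringToStream[Lines[[j]]];
 QATOM = Read[LineStream, Word];
 If[QATOM == "ATOM",
  ThisLine = Read[LineStream, {Number, Word, Word, Word, Number, {Number, Number, Number}}];
  If[Or[StringLength[ThisLine[[3]]] == 3, StringTake[ThisLine[[3]], 1] == "A"],
   FullTable = Append[FullTable, ThisLine]]
  ];
 Close[LineStream], (*close the stream before moving on to the next line*)
 {j, 1, Length[Lines]}]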
 
If you evaluate

s = Streams[]

you will see that s is the list of all currently open streams.

Unless you are fiddling around with stdout and stderr, it looks like

Map[Close, Drop[s, 2]]

will close everything except the first two, stdout and stderr.

BUT I would be very cautious with that. You might want to do more processing on the result from Streams[] to make sure you aren't closing something you didn't want to.
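
One safer refinement along those lines (my suggestion, not from the post above): stdout and stderr are OutputStream objects, while file and string streams are InputStream objects, so you can close every input stream without counting positions in the list:

(*close every open InputStream; stdout and stderr are OutputStreams and are left alone*)
Close /@ Select[Streams[], Head[#] === InputStream &]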
 
Thanks! That sped it right up.
 
You would likely find your code faster if you did

OpenRead[]
Read[]
Close[]

for each individual file, rather than opening thousands of streams, crunching the data, and then closing thousands of streams.
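
As a concrete sketch of that pattern applied to the original loop (ParseFile, the $Failed guard, and the reliance on Import, which opens and closes the file itself, are my additions; I'm assuming AllFiles still holds the list of file names):

ParseFile[file_] := Module[{stream, tag, rec, table = {}},
  Do[
   stream = StringToStream[line];
   tag = Read[stream, Word];
   If[tag === "ATOM",
    rec = Read[stream, {Number, Word, Word, Word, Number, {Number, Number, Number}}];
    If[rec =!= $Failed &&
      (StringLength[rec[[3]]] == 3 || StringTake[rec[[3]], 1] == "A"),
     AppendTo[table, rec]]];
   Close[stream], (*each string stream is closed before the next line is read*)
   {line, Import[file, "Lines"]}];
  table]

AllTables = ParseFile /@ AllFiles; (*one file fully processed, and its streams released, at a time*)

One caveat: repeated AppendTo is quadratic over a long file; Sow inside the loop with a Reap around it would scale better, but the version above stays close to the original loop.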
 
