Mathematica Slows Down with Repeated Evaluation

  • Context: Mathematica
  • Thread starter: michaelp7
  • Tags: Mathematica

Discussion Overview

The discussion revolves around performance issues encountered when repeatedly evaluating a Mathematica script designed to parse PDB files for cation-pi interactions. Participants explore potential causes of the slowdown and suggest various solutions related to stream management and code structure.

Discussion Character

  • Technical explanation
  • Exploratory
  • Debate/contested

Main Points Raised

  • One participant describes a Mathematica code that parses PDB files but experiences significant slowdowns after multiple evaluations.
  • Another participant suggests that the slowdown may be due to not closing input streams properly, referencing an external discussion for context.
  • A third participant provides a method to list open streams and suggests a cautious approach to closing them, specifically recommending to exclude stdout and stderr.
  • A later reply indicates that closing streams improved performance but still did not fully resolve the issue in the original code.
  • Another participant proposes that the code could be optimized by opening, reading, and closing streams for each individual file rather than handling thousands of streams at once.

Areas of Agreement / Disagreement

Participants generally agree that stream management is a key factor in the performance issues, but there is no consensus on the best approach to fully resolve the slowdown. Multiple suggestions are presented, indicating varying opinions on the optimal coding strategy.

Contextual Notes

Participants express uncertainty regarding the completeness of stream closure and the potential impact of handling multiple streams simultaneously. There are also unresolved questions about the efficiency of different coding practices in this context.

michaelp7
I'm trying to analyze a fairly large (order 10^3) collection of PDB files to look for cation-pi interactions for a class. This requires me to parse each PDB file into a table which gives the position of each atom in the file. To do this, I've written the following code:

Timing[
 Open[AllFiles[[2]]]; (*AllFiles is a list of the filenames in the directory. I intend to replace the 2 with an iteration index when the code is working.*)
 Lines = Import[AllFiles[[2]], "Lines"];
 FullTable = {};
 Do[
  LineStream = StringToStream[Lines[[j]]];
  QATOM = Read[LineStream, Word];
  If[QATOM == "ATOM", (*This condition looks for lines that actually describe atoms, instead of other information.*)
   ThisLine =
    Read[LineStream, {Number, Word, Word, Word,
      Number, {Number, Number, Number}}];
   If[Or[StringLength[ThisLine[[3]]] == 3,
      StringTake[ThisLine[[3]], 1] == "A"], (*This condition eliminates duplicate listings.*)
    FullTable = Append[FullTable, ThisLine]]
   ],
  {j, 1, Length[Lines]}]
 ]

The code does what it's supposed to, but it slows down significantly each time I run it. The first run takes less than 0.2 seconds, but by the fifth run it's already above 25 seconds to parse the same file. Quitting the kernel session solves the speed problem, but of course that deletes all my data. CleanSlate, ClearAll, and adjusting $HistoryLength all had no effect. I haven't come across a solution on this forum yet, so I would appreciate any suggestions.
 
Update-- I think the problem is that I need to close all these input streams. This post seems to address the same issue:

http://groups.google.com/group/comp...a/browse_thread/thread/5dc2bf7e4793418d?pli=1

I get some improvements when I close the streams in a modified version of this program, but when I try it on the original code, it still slows down. I think I'm missing some streams. Is there a command that would let me close ALL open streams?
 
If you evaluate

s = Streams[]

you will see that s is the list of the open streams you have.

It looks like, unless you are fiddling around with stdout and stderr, that

Map[Close, Drop[s, 2]]

will close everything except the first two, stdout and stderr.

BUT I would be very cautious with that. You might want to do more processing on the result from Streams[] to make sure you weren't closing something you didn't want to.
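One way to be more careful than Drop[s, 2] is to match streams by name rather than by position in the list. This is only a sketch, and the assumption that "stdout" and "stderr" are the only streams worth keeping is mine; adjust the keep-list to your session:

```mathematica
(* Close every open stream whose name is not stdout or stderr.
   First[stream] extracts the stream's name from the
   InputStream[...] or OutputStream[...] expression. *)
Close /@ Select[Streams[], ! MemberQ[{"stdout", "stderr"}, First[#]] &]
```

Matching by name is robust even if the order of Streams[] ever differs from what Drop[s, 2] assumes.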
 
Thanks! That sped it right up.
 
You would likely find your code faster if you followed the pattern

Open[]
Read[]
Close[]

for each individual file, rather than opening thousands of streams, crunching the data, and then closing the thousands of streams.
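Applied to the inner loop of the original code, that open/read/close pattern would look something like the sketch below (variable names follow the code posted above; the key change is the Close call, which is my addition):

```mathematica
(* Sketch: close each line's stream as soon as it has been read,
   so streams do not accumulate across evaluations. *)
Do[
 LineStream = StringToStream[Lines[[j]]];
 QATOM = Read[LineStream, Word];
 If[QATOM == "ATOM",
  ThisLine =
   Read[LineStream, {Number, Word, Word, Word,
     Number, {Number, Number, Number}}];
  If[StringLength[ThisLine[[3]]] == 3 ||
     StringTake[ThisLine[[3]], 1] == "A",
   AppendTo[FullTable, ThisLine]]
  ];
 Close[LineStream], (* the step missing from the original loop *)
 {j, 1, Length[Lines]}]
```

With the stream closed on every iteration, each run of the script starts from a clean slate instead of inheriting thousands of open streams from earlier runs.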
 
