quetzalcoatl9 said:
the real power of parallelization comes with higher dimensional diffusion. in which case, a more efficient way of calculating diffusion is through Monte Carlo integration (since the error scales as 1/sqrt(N) rather than with the dimensionality), which indeed parallelizes trivially.
quetzalcoatl9,
That Monte Carlo parallelizes trivially is only trivial when one doesn't think about it too much.
It would be trivial to parallelize Monte Carlo if one could devote a processor to tracking the entire
lifecycle of a simulation particle - from birth to death. Unfortunately, that is seldom the case.
Since a simulation particle can theoretically go anywhere in the problem domain, for a processor
to track a simulation particle from birth to death, it must have a complete description of the
entire problem geometry. On a machine like Blue Gene/L, which has 131,000 processors, full
utilization of the machine would require 131,000 duplicate representations of the geometry to
reside in memory, since each processor must hold the complete geometry.
That would be wasteful of computer memory. No - large Monte Carlo simulations are run on large
parallel machines using a technique called "domain decomposition". In essence, each processor
"owns" only a piece of the entire problem. When a simulation particle reaches the boundary of the
domain on a given processor, it transits into a domain on another processor, and the particle
must be "handed off" to the processor that owns the neighboring domain.
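A minimal sketch of that hand-off in one dimension (all names here - Domain slabs, owner_of, advance - are illustrative, not from any real transport code; in production the "handoff" would be an MPI send to the neighboring rank):

```python
import random

NUM_PROCS = 4
DOMAIN_LEN = 1.0
SLAB = DOMAIN_LEN / NUM_PROCS  # each "processor" owns one slab of space

def owner_of(x):
    """Rank that owns position x (clamped to the domain)."""
    x = min(max(x, 0.0), DOMAIN_LEN - 1e-12)
    return int(x / SLAB)

def advance(x, rank):
    """Random-walk a particle until it leaves this rank's slab or the domain."""
    while 0.0 <= x < DOMAIN_LEN:
        x += random.uniform(-0.05, 0.05)   # one random-walk step
        new_rank = owner_of(x)
        if new_rank != rank:
            # Real code: buffer the particle for an MPI send to new_rank.
            return ("handoff", x, new_rank)
    return ("escaped", x, rank)

random.seed(1)
status, x, rank = advance(0.3, owner_of(0.3))
print(status, rank)
```

The point of the sketch is that the tracking rank cannot finish the particle's history itself; it must surrender the particle the moment it crosses a slab boundary.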
Once processors have to communicate particles back and forth, one has the same problems
that one has with a deterministic system. Once the "cradle to grave on a single processor" paradigm
is broken, one loses the easy parallelism.
In fact, the parallelism problems are WORSE for Monte Carlo. In a deterministic code, if each
processor has an equivalently sized piece of the problem, then the processors have roughly equivalent
work. The amount of work a given processor has to do depends on how much of the problem it
"owns"; the amount of work does not depend on the solution.
Therefore, if the domains are parcelled out roughly equally to all processors, then all the processors
should arrive at the "sync points" where they need to communicate at roughly equivalent times. This
makes a "synchronous communication" model viable: all processors arrive at the communication
point at roughly the same time, wait until every processor has arrived, and then exchange the data
that is needed across processor boundaries. Additionally, the amount of data
being shared is symmetric. If processor "i" borders processor "j" at "n" faces, then "i" needs to send
"j" "n" values for each variable being shared; and vice versa - "j" needs to send "i" the equivalent
amount. Each processor needs to know 1 value per face per variable in a 1st order approximation;
for a 2nd order approximation, each processor would have to reach 2 values into the adjacent domain...
Regardless of order, there's always a symmetric exchange of data.
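The symmetry can be made concrete with a toy count of the halo exchange (the function and numbers below are assumptions for illustration): the number of values exchanged depends only on the shared geometry, the variable count, and the order of the scheme - never on the solution itself.

```python
# Halo-exchange size in a deterministic mesh code: rank i must send rank j
# one value per shared face, per variable, per ghost layer - and vice versa.
def halo_count(shared_faces, variables, order):
    """Values one rank sends its neighbor across a shared boundary."""
    return shared_faces * variables * order  # 'order' ghost layers deep

n_faces, n_vars = 16, 5                      # illustrative sizes
send_i_to_j = halo_count(n_faces, n_vars, order=2)
send_j_to_i = halo_count(n_faces, n_vars, order=2)
print(send_i_to_j, send_j_to_i)              # → 160 160
assert send_i_to_j == send_j_to_i            # symmetric by construction
```

Because both sides know these counts in advance, each rank can post matching sends and receives and simply wait at the sync point.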
That isn't the case in Monte Carlo. In Monte Carlo, a very active region may border a
very inactive region. The amount of data that needs to flow is not symmetric. Because the processors
don't have equal amounts of work to do - given the varying numbers of simulation particles - one
can't assume that they will all arrive at a "sync point" nearly simultaneously.
Therefore, one can't use a "synchronous" communication paradigm unless one is willing to have
processors waiting a large portion of the time. In this case, an "asynchronous" communication
model is used. Processors exchange data not in a determinate way, but in an indeterminate way:
processors "buffer" particles as needed, and neighboring processors flush these buffers when more
particles are needed to process.
The asynchronous communication model also opens the door to potential deadlock conditions if one
is not careful.
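A hypothetical sketch of that buffering (the Rank class and BUFFER_LIMIT are inventions for illustration; a real code would use nonblocking MPI sends and periodically probe for incoming buffers, and the final drain step is exactly the kind of detail that, if forgotten, strands particles or deadlocks ranks waiting on each other):

```python
from collections import defaultdict, deque

BUFFER_LIMIT = 3   # flush to a neighbor once this many particles are staged

class Rank:
    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.outbox = defaultdict(list)   # neighbor id -> buffered particles
        self.inbox = deque()              # particles received from neighbors

    def buffer_particle(self, particle, neighbor):
        """Stage a particle bound for a neighboring domain."""
        self.outbox[neighbor.rank_id].append(particle)
        if len(self.outbox[neighbor.rank_id]) >= BUFFER_LIMIT:
            self.flush(neighbor)

    def flush(self, neighbor):
        """Deliver buffered particles (stand-in for a nonblocking send)."""
        neighbor.inbox.extend(self.outbox.pop(neighbor.rank_id, []))

a, b = Rank(0), Rank(1)
for p in range(5):
    a.buffer_particle(p, b)   # particles 0-2 flush automatically at the limit
a.flush(b)                    # final drain so particles 3-4 aren't stranded
print(len(b.inbox))           # → 5
```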
Additionally, unlike the deterministic case, where processors have roughly equal amounts of work
to do since they are solving the same equations on roughly equal numbers of zones, Monte Carlo
simulations are inherently not "load balanced". The workload may not be
shared equally among the processors.
One could even out the workload and "load balance" by redistributing the regions of space that a
given processor "owns". As regions of the problem are transferred between processors in order to
balance the load, the simulation particles that currently occupy those regions must also be
transferred. This extensive communication may offset much of the gain to be had from balancing
the load. Therefore, one should only load balance when it will yield a net improvement in computer
efficiency. Some of my colleagues have figured out how to do that, and have presented papers
on the subject. For example, one that was presented in Avignon, France last year is available from
the Dept. of Energy at:
http://www.osti.gov/bridge/purl.cover.jsp?purl=/878214-gj95NC/
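The "only when it pays" criterion can be captured by a toy cost model (the function and the numbers are assumptions for illustration, not taken from the paper above): rebalance only if the work saved per cycle, accumulated over the cycles remaining, exceeds the one-time cost of migrating the zones and the particles that occupy them.

```python
# Toy rebalancing criterion: compare cumulative savings against the
# one-time cost of migrating zones plus their resident particles.
def should_rebalance(imbalance_cost_per_cycle, cycles_left, migration_cost):
    """True if redistributing domains yields a net efficiency gain."""
    return imbalance_cost_per_cycle * cycles_left > migration_cost

print(should_rebalance(2.0, 100, 50.0))  # → True: long run amortizes the move
print(should_rebalance(2.0, 10, 50.0))   # → False: migration outweighs gain
```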
One property that one wishes a parallel simulation to have is that the answers not depend on the
number of processors on which the simulation was run. If processors "owned" the random number
seeds, then the history of a given particle would depend on the set of processors that it visited and
when it did so. Clearly, this would impart a dependence of the simulation results on the number of
processors involved. In order to sidestep this issue, it is useful for each simulation particle to carry
its own random number seed. This gives each simulation particle its own pseudo-random number
sequence, and it won't matter which processor calculates the next number in the sequence.
However, at the beginning of the problem, the initialization of the random number seeds has to be
coordinated among the processors in order to avoid duplication.
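A small sketch of particle-carried random state (the seeding scheme here - a simple offset from a global seed - is an assumption for illustration only, not a production-quality scheme): because each particle owns its own generator, its history is identical no matter which processor advances it.

```python
import random

def make_particle(particle_id):
    # Coordinated seeding at birth so no two particles share a stream.
    # (Illustrative only; real codes use more careful stream separation.)
    return {"id": particle_id, "rng": random.Random(1234 + particle_id)}

def next_step(particle):
    """Draw the particle's next random number - same on any processor."""
    return particle["rng"].random()

# Two independently created copies of particle 7, as if advanced on two
# different processors, reproduce exactly the same history:
p1, p2 = make_particle(7), make_particle(7)
history1 = [next_step(p1) for _ in range(5)]
history2 = [next_step(p2) for _ in range(5)]
print(history1 == history2)   # → True: results don't depend on processor count
```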
All in all, I have coded both deterministic methods and Monte Carlo methods. The parallelism
of the Monte Carlo methods is at least one or two orders of magnitude more complex than that
of the deterministic methods.
Dr. Gregory Greenman
Physicist