## Developing a simple program to solve the diffusion equation

Hi there,
I'm looking for an algorithm that describes the numerical solution of the diffusion equation (1D or 2D).
I've taken a look at some textbooks, such as Hamilton's and Henry's, but I didn't find a simple solution.
Does anybody know about it? Where can I find what I'm looking for?
Thanks.

Recognitions:
Science Advisor
 Quote by libertad Hi there, I'm looking for an algorithm which describes the numeric solution to solve the diffusion equation (1D or 2D). I've taken a look on some text book such as Hamilton and Henry ones but I didn't find a simple solution. anybody knows about it? Where can I find what I'm looking for? Thanks.
libertad,

It's trivial to write an algorithm to solve the diffusion equation. In 1-D, the diffusion equation will
couple together 3 adjacent zones - the diffusion term giving the leakage out the left face of the
center zone involves a difference between the flux in the zone to the left and the flux in the center
zone.

Likewise, the diffusion term giving leakage out the right side involves the flux to the right, and the
flux in the center zone. Therefore, the total balance equation couples 3 adjacent zones, left, center,
right for each center zone. If you write the equations for all the zones in matrix form; you get a
tri-diagonal matrix. You solve that with a simple Gaussian reduction - two sweeps - a forward
elimination of the lower diagonal, and a backward substitution that solves the equation.

If you have 2-D; then you couple together 5 zones - left, right, front, back, and center. Again in
matrix form; this yields a penta-diagonal matrix. Two forward elimination sweeps - one for each of
the now 2 lower diagonals; followed by a backward substitution.
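A minimal Python sketch of the 1-D scheme described above - set up the tri-diagonal system, then one forward elimination and one backward substitution (the Thomas algorithm). The mesh size, cross section, and source values here are made up purely for illustration:

```python
import numpy as np

def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm: forward elimination of the lower diagonal,
    then backward substitution. sub[0] and sup[-1] are unused."""
    n = len(diag)
    sup_p = np.zeros(n)
    rhs_p = np.zeros(n)
    sup_p[0] = sup[0] / diag[0]
    rhs_p[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * sup_p[i - 1]
        sup_p[i] = sup[i] / m
        rhs_p[i] = (rhs[i] - sub[i] * rhs_p[i - 1]) / m
    phi = np.zeros(n)
    phi[-1] = rhs_p[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = rhs_p[i] - sup_p[i] * phi[i + 1]
    return phi

# Steady-state 1-D diffusion, -D*phi'' + sigma_a*phi = S, on a uniform mesh.
# Each row couples the left, center, and right zones, as described above.
n, h = 50, 0.1                              # zones, mesh spacing (illustrative)
D, sigma_a, S = 1.0, 0.1, 1.0               # diffusion coeff., absorption, source
sub = np.full(n, -D / h**2)                 # leakage coupling to the left zone
sup = np.full(n, -D / h**2)                 # leakage coupling to the right zone
diag = np.full(n, 2 * D / h**2 + sigma_a)   # center-zone balance
rhs = np.full(n, S)
phi = solve_tridiagonal(sub, diag, sup, rhs)
```

The boundary treatment here (implicit zero flux outside the mesh) is the simplest possible choice; a real code would build the boundary rows from the actual boundary conditions.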

Duck soup!!

Dr. Gregory Greenman
Physicist
 It's slightly more complicated than Morbius says. I recently had to write a 1-D program and it took me about a week.

The way to attack the problem is to divide it into parts. First determine the composition and geometry of the system you're looking at; this will let you set up the matrix for the calculations you need to do. Duderstadt and Hamilton should have a section on numerical solutions - I know Stacey's book does, since I'm using it - and those will show you how to set the matrix up. After that, you can go on and do the calculations Morbius mentioned above. It's pretty straightforward, but if you're like me and not great at programming, it will take a while.

Incidentally, I'm now working on a 2-D code. I'm teaching myself FORTRAN from the basics.


 Quote by theCandyman It's slightly more complicated than Morbius says. I recently had to do a 1-D program and it took me about a week.
Candyman,

Actually it is EXACTLY as easy as I stated. If you can program the matrix setup, and the Gaussian
reduction - that's all there really is to it. You don't need to go looking in books for the solution strategy;
it's really easy to figure it out.

Dr. Gregory Greenman
Physicist

 Quote by Morbius Candyman, Actually it is EXACTLY as easy as I stated. If you can program the matrix setup, and the Gaussian reduction - that's all there really is to it. You don't need to go looking in books for the solution strategy; it's really easy to figure it out. Dr. Gregory Greenman Physicist
Hi Dr. Gregory,
I actually want to test some of my parallel computing techniques by coding a program to solve the diffusion equation.
I'm a fairly experienced professional developer, so if you ever have any problems developing software applications, don't hesitate to contact me.

 Quote by libertad Hi Dr. Gregory, I actually wanna test some of my parallel computing techniques in coding a program to solve the diffusion equations. I'm a comparatively professional developer. If you have any problems in developing software applications don't hesitate to contact me.
The real power of parallelization comes with higher-dimensional diffusion, in which case a more efficient way of calculating diffusion is Monte Carlo integration (since the error scales as 1/sqrt(N) rather than with the dimensionality), which indeed parallelizes trivially. For a 1-D case I can't see any reason why you would parallelize.

if you still want to solve it via matrix methods, then look at the ESSL/BLAS/LAPACK libraries and MPI.
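The dimension-independent 1/sqrt(N) scaling is easy to see in a toy Python sketch (the integrand and sample counts here are purely illustrative):

```python
import random

def mc_integrate(f, dim, n, rng):
    """Monte Carlo estimate of the integral of f over the unit hypercube
    [0,1]^dim. The statistical error shrinks like 1/sqrt(n) no matter how
    large dim is - which is why MC wins in high dimensions."""
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        total += f(x)
    return total / n

# The integral of (x1 + ... + x5) over [0,1]^5 is exactly 2.5.
rng = random.Random(42)
estimate = mc_integrate(lambda x: sum(x), dim=5, n=100_000, rng=rng)
```

A grid-based quadrature on the same 5-D cube with m points per axis would need m^5 evaluations; the MC estimate needs no grid at all, and each sample is independent, which is what makes the method embarrassingly parallel.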

 Quote by libertad Hi Dr. Gregory, I actually wanna test some of my parallel computing techniques in coding a program to solve the diffusion equations.
libertad,

If you want something that runs in parallel [ the Gaussian elimination is not a parallel algorithm ], then
I would suggest some form of Krylov-subspace solver; e.g. like a conjugate gradient method.

Methods such as Krylov-subspace solvers can be made to run in parallel. Much of the work is
doing the matrix multiplies.

However, there will be one or two "dot products" of the solution and/or correction vectors.

These dot products require a "sync point" - basically an operation called an "all reduce".

Each processor computes its own portion of the dot product - a "partial sum" for the part of the
vector that the processor "owns". Then all the processors send their partial sums to the "Boss"
processor, which adds them up. The Boss processor then broadcasts the completed dot product to all the workers.

The "step length" in the Krylov solve is typically the quotient of two dot products.
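A serial Python sketch of a plain conjugate gradient solve, with comments marking where the dot-product "sync points" would fall in a distributed run (the test matrix and source are illustrative, not from any particular code):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive-definite A.
    In a distributed-memory run, each np.dot below is a sync point: every
    processor forms a partial sum over the slice of the vectors it owns,
    and an all-reduce combines the partial sums and broadcasts the result."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    p = r.copy()
    rs_old = np.dot(r, r)                  # dot product -> all-reduce
    for _ in range(max_iter):
        Ap = A @ p                         # matrix multiply: bulk of the local work
        alpha = rs_old / np.dot(p, Ap)     # step length: quotient of two dot products
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.dot(r, r)              # dot product -> all-reduce
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# model 1-D diffusion matrix (tri-diagonal, SPD) and a uniform source
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
```

In an actual MPI code, the two `np.dot` calls would become local partial sums followed by `MPI_Allreduce`, and `A @ p` would only touch locally owned rows plus halo data from neighbors.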

 I'm a comparatively professional developer. If you have any problems in developing software applications don't hesitate to contact me.
That's my profession also. I'm a "computational physicist". I write physics simulation programs
for Lawrence Livermore National Laboratory. All of our work is done in parallel since we have
computers with as many as 131,000 processors:

https://asc.llnl.gov/computing_resources/bluegenel/

LLNL's Center for Advanced Scientific Computing just won one of R&D Magazine's
"Top 100" Awards in 2007 for the "Hypre" solver library:

http://www.rdmag.com/ShowPR.aspx?PUB...&CommonCount=0

http://www.llnl.gov/str/Oct07/pdfs/10_07.4.pdf

http://www.llnl.gov/CASC/linear_solvers/

I thought you might be interested in reading about Hypre.

Dr. Gregory Greenman
Physicist
 Dr. Gregory Greenman,

Thanks for your recommendations. As I understood from your note, you and your team are doing great parallelization based on distributed processors. What I intend to do is parallelization on a local computer with a single processor: I want to divide my computation into 2 or more separate threads running simultaneously. After doing so, I will do some parallelization on a dual-core processor. Because I'm a nuclear engineer, I intended to test my approach on the diffusion equation.

Nevertheless, it's my honor to get to know more about you and your projects in parallelization. Thanks.
 So what does one typically use as a scattering function? (I'm assuming that we're talking about solving the Boltzmann-Poisson eqn for neutron diffusion?)

 Quote by quetzalcoatl9 the real power of parallelization comes with higher dimensional diffusion. in which case, a more efficient way of calculating diffusion is through Monte Carlo integration (since the error scales as 1/sqrt(N) rather than with the dimensionality), which indeed parallelizes trivially..
quetzalcoatal,

First - 1 / sqrt(N) is how the STANDARD DEVIATION of the estimate scales - it is NOT the error
itself; and that's for a 0-dimensional quantity like the k-effective.

The question as to whether it is more efficient to use Monte Carlo or deterministic methods depends
on what you want to calculate - and for a typical reactor problem, deterministic methods beat
Monte Carlo HANDS DOWN!!!

This was covered in a session at the American Nuclear Society Topical Meeting in Computational
Methods held last April in Monterey, California. The session was hosted by Prof. William Martin,
Chairman of the Nuclear Engineering Program at the University of Michigan.

The question under discussion was a conjecture at a previous meeting by Dr. Kord Smith of
Studsvik Scanpower, Idaho Falls, ID. The most widely used codes for reactor analysis are the suite from
Studsvik - SIMULATE, CASMO, ... - authored by Dr. Smith and his colleagues. [ Kord and I both
did our graduate studies together with Professor Alan F. Henry at MIT in the 1970s. ]

Kord's conjecture is that it will be about 15-20 years before one can use Monte Carlo as the basis
for reactor reload calculations. The consensus of the session hosted by Professor Martin is that
Kord's conjecture is about right; maybe 10-15 years at best.

The reason that deterministic methods win over Monte Carlo has to do with what I said above about
it depending on what you want to calculate. For something simple like a k-eff calculation, or some
other 0-dimensional quantity, the STANDARD DEVIATION scales as 1 / sqrt(N)!!

However, if one is going to do something like a reactor reload calculation, then the quantity of interest
is the peak heat flux or some other limit that the calculation has to show to be within the tech specs
for the reactor.

The peak heat flux occurs at a point in space which is NOT known a priori. Therefore, although the
statistical uncertainty scales with the inverse root of the number of particles, only a very small
fraction of the neutron histories ever contribute to the peak heat flux point - far too few to give
anything approaching an acceptable error.

That's why the 1 / sqrt(N) is NOT the error. That only gives the scaling; NOT the error.
There is another coefficient out front of that inverse root N scaling. The scaling with N is
only half the story.

For something like a point power, or a point heat flux; that coefficient is BIG. So even though
you can decrease the error by sqrt(N) by making N - the number of neutron histories large - it will take
you an absolutely HUGE number of histories before you make a dent in that large coefficient.

The problem is that the quantity of interest - a peak power or peak heat flux - is a very small part
of the phase space. Zero-dimensional quantities like k-effective which are integrals over the entire
problem space converge quite fast with Monte Carlo.
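The contrast between whole-domain and point-like tallies can be illustrated with a toy Python experiment (purely illustrative - a uniform "flux" on [0,1), not a reactor calculation). Both estimates shrink like 1/sqrt(N), but the point-like tally's prefactor is far larger:

```python
import random, math

def batch_estimates(n_batches, n_per_batch, seed=0):
    """Run independent MC batches; return per-batch (global, local) estimates.
    Global tally: mean of x over [0,1) - a whole-domain integral, like k-eff.
    Local tally: fraction of samples landing in one small cell - a point-like
    quantity, like a peak heat flux."""
    rng = random.Random(seed)
    globals_, locals_ = [], []
    for _ in range(n_batches):
        g, l = 0.0, 0
        for _ in range(n_per_batch):
            x = rng.random()
            g += x                         # every history scores here
            if 0.495 <= x < 0.505:
                l += 1                     # almost no history scores here
        globals_.append(g / n_per_batch)
        locals_.append(l / n_per_batch)
    return globals_, locals_

def rel_spread(vals):
    """Relative batch-to-batch spread: sample std. deviation / mean."""
    m = sum(vals) / len(vals)
    var = sum((v - m) ** 2 for v in vals) / (len(vals) - 1)
    return math.sqrt(var) / m

g, l = batch_estimates(n_batches=200, n_per_batch=5_000)
# Both spreads scale as 1/sqrt(N), but the local tally's relative spread is
# roughly an order of magnitude larger at the same N.
```

The same N histories give a far noisier answer for the small cell simply because almost none of them visit it - which is the prefactor argument in miniature.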

However, point quantities DO NOT. If one knew the point in question a priori - that is - if the
question was "How many neutrons will be absorbed by a detector at location (x,y,z)?" - then such a
question can be solved readily by Adjoint Monte Carlo.

One uses the Monte Carlo technique to solve the Adjoint Boltzmann Transport Equation. When
you do that - the roles of sources and sinks of neutrons reverse. That is the number of neutrons that
will be absorbed by a detector in the forward problem; is equal to an integral over the whole problem
from "adjoint neutrons" sourced at the detector location. Thus in adjoint space, one again is solving
for an integral over a large volume of the problem.

However, one has to know the location of the detector a priori, because one has to source
"adjoint neutrons" in at that location. In the reactor problem, one doesn't know that a priori.

Therefore, even with the fastest computers; Monte Carlo loses very badly to deterministic methods
even if the deterministic methods don't take advantage of parallelism. If the deterministic methods
are coded to take advantage of what parallelism they can; Monte Carlo loses even more.

I am currently working on a Monte Carlo code; so if anything, my "bias" should be in favor of the
Monte Carlo. However, facts are facts; for a reactor calculation, deterministic methods beat
Monte Carlo hands down, and will continue to do so for many years to come.

Dr. Gregory Greenman
Physicist

 Quote by Morbius That's why the 1 / sqrt(N) is NOT the error. That only gives the scaling; NOT the error. There is another coefficient out front of that inverse root N scaling. The scaling with N is only half the story.
Dr. Greenman,

Ok, so you are saying that for this particular problem there is a large scaling prefactor in front of the 1/sqrt(N), and this currently makes non-stochastic methods desirable. (However, since you say that you are developing a stochastic code, I take this to mean that the high levels of parallelism available cheaply now allow you to "make a dent" in that prefactor - otherwise you wouldn't bother.)

Unfortunately, we are not always so fortunate... for example, in Quantum Monte Carlo we have no hope of diagonalizing the Hamiltonian, and so we must learn to live with the sampling issues inherent in fermionic systems (due to the antisymmetry of the wavefunction, we run into a "sign problem" whereby calculating averages from the density matrix requires a delicate cancellation of two very large quantities - a situation that is numerically unstable). Ceperley and others have developed methods of nodal sampling (essentially umbrella sampling) that make this somewhat more tractable, but not much.

Take path integral MC, for example: it truly is tantalizing, since we have a small scaling prefactor, the problem factorizes linearly, and the MC algorithm is trivially parallel. Unfortunately, as I said above, the sampling space is so large (and the method of summation is inherently unstable) that unless we can focus around the nodes we are doomed - but if you already know where the nodes are (where all of the interesting chemistry happens), then why bother!

Q

P.S. - I have used the Blue Gene at SDSC extensively, but given the SUs for their slower hardware I found it somewhat unattractive for what I am doing. I understand the purpose for which the BG was supposedly designed (FLOPS/kW and FLOPS/m^2), but (a) neither of these things really helps me, and (b) power efficiency doesn't seem to be stopping anyone else from building highly parallel systems either - i.e. look at TACC's Ranger (which at 504 TFlops will displace LLNL's BG/L from the #1 spot on the Top500). I think this spells the end of future Blue Gene systems. The Opteron-based Cray XT3 is also an excellent system. I have nothing against the BG per se, but PPC is largely a failure (i.e. no 64-bit) and IBM is trying to spin it a different way.

http://www.tacc.utexas.edu/resources/hpcsystems/

 Quote by quetzalcoatl9 ok, so you are saying that for this particular problem there is a large scaling prefactor infront of the 1/sqrt(N), and so this currently makes non-stochastic methods desirable. (However, since you say that you are developing a stochastic code I take this to mean that the high levels of parallelism available cheaply now allow you to "make a dent" in that prefactor otherwise you wouldn't bother).
quetzalcoatal,

You assume that our reason for developing the Monte Carlo code is to take advantage of
high levels of parallelism. That's incorrect.

The reason for using the Monte Carlo code is that for the applications that we are interested in;
there are certain effects that are more easily implemented in a Monte Carlo code. It's not the
parallelism that gives us an advantage over deterministic methods - it's that certain effects are
more difficult to implement in the deterministic code.

Additionally, for our application - unlike the reactor application - the important constraint is not a
local one.

Dr. Gregory Greenman
Physicist

 Quote by quetzalcoatl9 i.e. look at TACC's Ranger (which at 504 TFlops will displace LLNL's BG/L from the #1 spot on the top500). i think this spells the end of future bluegene systems.
quetzalcoatl,

The next occupant of the #1 spot on the Top500 is another BlueGene/L class machine currently in
the pipeline called "PetaFlop".

"PetaFlop" will be a 1,000 Teraflop machine - i.e. 1 Petaflop.
 Quote by quetzalcoatl9 i understand the purpose for which the BG was supposedly designed (FLOPS/kW and FLOPS/m^2)
That wasn't the purpose of BG/L at all. Power consumption and floor space are NOT a concern.

Blue Gene/L takes up about 1/3rd of one of the two big computer rooms in LLNL's Terascale Simulation
Facility - the TSF. There was no need to scrimp on floor space; LLNL has PLENTY of room for
computers.

Power is not a factor either. In the other big computer room, neighboring Blue Gene/L's room in the
TSF, is another computer called ASC Purple. ASC Purple is currently #6 on the Top 500 list, down
from its debut at #3, and it consumes about 15 Megawatts of power.

Dr. Gregory Greenman
Physicist


 Quote by Morbius quetzalcoatal, The reason for using the Monte Carlo code is that for the applications that we are interested in; there are certain effects that are more easily implemented in a Monte Carlo code. It's not the parallelism that gives us an advantage over deterministic methods - it's that certain effects are more difficult to implement in the deterministic code.
Morbius,

you and i have something in common then: I also have implemented a Monte Carlo code in order to avoid doing many-body molecular dynamics with a potential energy function whose derivative is numerically ill-behaved! (Although, I do not get away completely unscathed since the systems that I am studying via this method exhibit long correlation times, which means lots of "wasted" CPU time - but at least the many-body problem is now tractable!)

Q

 Quote by Morbius That wasn't the purpose of BG/L at all. Power consumption and floor space is NOT a concern. Blue Gene/L takes up about 1/3rd of one of the two big computer rooms in LLNL's Terascale Simulation Facility - the TSF. There was no need to scrimp on floor space; LLNL has PLENTY of room for computers. Power is not a factor either. In the other big computer room that neighbors the one that Blue Gene/L's room in the TSF is another computer called ASC Purple. ASC Purple is currently #6 on the Top 500 list; down from when it debuted at #3. ASC Purple consumes about 15 Megawatts of power. Power consumption is not a factor either.
I kindly disagree. If you read the BG white papers those are EXACTLY the reasons that they have used to achieve the high parallelism.

At 440 MHz with 512 MB of memory (BG/L), the motive was certainly NOT performance: it was to achieve a low power consumption footprint, thus allowing higher levels of parallelism.

From IBM's architectural white-paper:

 The reasons why we chose an SoC design point included a high level of integration, low power, and low design cost. We chose a processor with modest performance because of the clear performance/power advantage of such a core. Low-power design is the key enabler to the Blue Gene family.
Please see the following for an overview of the blue gene architecture:

http://www.research.ibm.com/journal/rd/492/gara.html

Q
P.S. - note that a BG node consumes only about 20 W; that is not by accident.

 Quote by Morbius The next occupant of the #1 spot on the Top500 is another BlueGene/L class machine currently in the pipeline called "PetaFlop".
The first BG/P is not expected to be operational until Spring 2008 (and at only 111 TFlops at that; petascale systems will follow later) - Ranger will be up and in use sometime in December, making it #1 at least for a little while :)

Q

P.S. - I do not work for TACC, I only like their systems :)

 Quote by quetzalcoatl9 the real power of parallelization comes with higher dimensional diffusion. in which case, a more efficient way of calculating diffusion is through Monte Carlo integration (since the error scales as 1/sqrt(N) rather than with the dimensionality), which indeed parallelizes trivially..
quetzalcoatal,

The claim that Monte Carlo parallelizes trivially only looks trivial when one doesn't think about it too much.

It would be trivial to parallelize Monte Carlo if one can devote a processor to tracking the entire
lifecycle of a simulation particle - from birth to death. Unfortunately, that is seldom the case.

Since a simulation particle can theoretically go anywhere in the problem domain, in order for a
processor to track a simulation particle from birth to death; it must have a complete description of
the entire problem geometry. In a machine like Blue Gene/L which has 131,000 processors; full
utilization of the machine would require 131,000 duplicate representations to reside in memory;
since each processor must have the complete geometry.

This would be wasteful of computer memory. No - large Monte Carlo simulations are run on large
parallel machines using a technique called "domain decomposition". In essence, each processor
"owns" only a piece of the entire problem. As a simulation particle reaches the boundary of the domain
on a given processor, it transits into a neighboring domain and must be "handed off" to the processor
that owns that domain.

Once processors have to communicate particles back and forth - then one has the same problems
that one has with a deterministic system. Once the "cradle to grave on a single processor" paradigm
is broken, one loses the easy parallelism.

In fact the parallelism problems are WORSE for Monte Carlo. In a deterministic code, if each
processor has equivalent sized pieces of the problem; then the processors have roughly equivalent
work. The amount of work a given processor has to do depends on how much of the problem it
"owns"; and the amount of work does not depend on the solution.

Therefore, if the domains are parcelled out roughly equally to all processors, then all the processors
should arrive at the "sync points" where they need to communicate at roughly equivalent times. This
makes a "synchronous communication" model viable. That is, all processors arrive at the communication
point at roughly the same time; they wait until all processors have arrived, and can then
exchange the data that is needed across processor boundaries. Additionally, the amount of data
being shared is symmetric. If processor "i" borders processor "j" at "n" faces, then "i" needs to send
"j" "n" values for each variable being shared, and vice versa - "j" needs to send "i" the equivalent
amount. Each processor needs to know 1 value per face per variable in a 1st-order approximation;
each processor would have to go 2 values into the adjacent domain for a 2nd-order approximation, and so on.
Regardless of order, there's always a symmetric exchange of data.

That isn't the case in Monte Carlo. In Monte Carlo, there may be a very active region bordering a
very inactive region. The amount of data that needs to flow is not symmetric. Because processors
don't have an equal amount of work to do - given varying numbers of simulation particles - one
can't assume that they will all arrive at a "sync point" nearly simultaneously.

Therefore, one can't use a "synchronous" communication paradigm unless one is willing to have
processors waiting a large portion of the time. In this case, an "asynchronous" communication
model is used. Processors exchange data, not in a determinate way; but in an indeterminate way.
Processors "buffer" particles as needed, and neighboring processors flush these buffers when more
particles are needed to process.

The asynchronous communication model also opens the door to potential deadlock conditions if one
is not careful.

Additionally, unlike the deterministic case, where processors have roughly equal amounts of work
to do since they are solving the same equations on roughly equal numbers of zones, Monte Carlo
simulations are inherently not "load balanced". The workload among the processors may not be
shared equally.

One could even out the workload and "load balance" by redistributing the regions of space that a
given processor "owns". As regions of the problem are transferred between processors in order to
balance the load, the simulation particles that currently occupy those regions must also be
transferred. This extensive communication may offset much of the gain to be had from balancing
the load. Therefore, one should only load balance when it will yield a net improvement in computer
efficiency. Some of my colleagues have figured out how to do that, and have presented papers
on the subject. For example, one that was presented in Avignon, France last year is available from
the Dept. of Energy at:

http://www.osti.gov/bridge/purl.cove...878214-gj95NC/

One property that one wishes a parallel simulation to have, is that the answers not depend on the
number of processors on which the simulation was run. If processors "owned" the random number
seeds, then the history of a given particle would depend on the set of processors that it visited and
when it did so. Clearly, this would impart a dependence of the simulation results on the number of
processors involved. In order to sidestep this issue, it is useful for each simulation particle to carry
its own random number seed. This gives each simulation particle its own pseudo-random number
sequence, and it won't matter which processor calculates the next number in the sequence.

However, at the beginning of the problem, the initialization of the random number seeds has to be
coordinated among the processors in order to avoid duplication.
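A minimal Python sketch of the particle-carried-seed idea (the seeding scheme here is a made-up illustration, not how any production Monte Carlo code derives its streams):

```python
import random

class Particle:
    """Each simulation particle carries its own random number stream, so its
    history is reproducible no matter which processor advances it."""
    def __init__(self, uid, master_seed):
        # Coordinated per-particle seeding: derive a distinct integer seed
        # from one master seed plus the particle id, so no two particles
        # duplicate a stream. (Illustrative scheme only.)
        self.rng = random.Random(master_seed * 1_000_003 + uid)
        self.position = 0.0

    def step(self):
        # A toy "flight": the next pseudo-random number comes from the
        # particle's own stream, never from a processor-owned stream.
        self.position += self.rng.random() - 0.5
        return self.position

# The same particle reproduces the same history regardless of which
# "processor" (here: which Particle object) advances it.
a = Particle(uid=7, master_seed=12345)
b = Particle(uid=7, master_seed=12345)
path_a = [a.step() for _ in range(10)]
path_b = [b.step() for _ in range(10)]
```

Because the stream travels with the particle, handing it between domains (processors) cannot change its history - which is exactly the processor-count independence described above.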

All in all, I have coded both deterministic methods and Monte Carlo methods. The parallelism
of the Monte Carlo methods is at least one or two orders of magnitude more complex than that
of the deterministic methods.

Dr. Gregory Greenman
Physicist
