Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Developing a simple program to solve the diffusion equation

  1. Oct 16, 2007 #1
    Hi there,
    I'm looking for an algorithm which describes the numeric solution to solve the diffusion equation (1D or 2D).
    I've taken a look on some text book such as Hamilton and Henry ones but I didn't find a simple solution.
    anybody knows about it? Where can I find what I'm looking for?
    Last edited: Oct 17, 2007
  2. jcsd
  3. Oct 17, 2007 #2


    User Avatar
    Science Advisor


    It's trivial to write an algorithm to solve the diffusion equation. In 1-D, the diffusion equation will
    couple together 3 adjacent zones - the diffusion term giving the leakage out the left face of the
    center zone involves a difference between the flux in the zone to the left and the flux in the center

    Likewise, the diffusion term giving leakage out the right side involves the flux to the right, and the
    flux in the center zone. Therefore, the total balance equation couples 3 adjacent zones, left, center,
    right for each center zone. If you write the equations for all the zones in matrix form; you get a
    tri-diagonal matrix. You solve that with a simple Gaussian reduction - two sweeps - a forward
    elimination of the lower diagonal, and a backward substitution that solves the equation.

    If you have 2-D; then you couple together 5 zones - left, right, front, back, and center. Again in
    matrix form; this yields a penta-diagonal matrix. Two forward elimination sweeps - one for each of
    the now 2 lower diagonals; followed by a backward substitution.

    Duck soup!!

    Dr. Gregory Greenman
  4. Oct 17, 2007 #3
    It's slightly more complicated than Morbius says. I recently had to do a 1-D program and it took me about a week.

    The way to attack the problem of making a program is to divide it into parts. First determine the composition and geometery of the system you're looking at, this will allow you to set up the matrix for the calculations you need to do. Duderstalt and Hamiltion should have a section on numerical solutions, I know Stacey's does since I'm using his; those will show you how to set the matrix up. After you have this, you can go on and do the calculations Morbius mentioned above. It is pretty straight foward, but if you're like me and not great at programming, it will take a while.

    Incidentally, I'm now working on a 2-D code. I'm teaching myself FORTRAN from the basics.
  5. Oct 18, 2007 #4


    User Avatar
    Science Advisor


    Actually it is EXACTLY as easy as I stated. If you can program the matrix setup, and the Gaussian
    reduction - that's all there really is to it. You don't need to go looking in books for the solution strategy;
    it's really easy to figure it out.

    Dr. Gregory Greenman
  6. Oct 19, 2007 #5
    Hi Dr. Gregory,
    I actually wanna test some of my parallel computing techniques in coding a program to solve the diffusion equations.
    I'm a comparatively professional developer. If you have any problems in developing software applications don't hesitate to contact me.
  7. Oct 19, 2007 #6
    the real power of parallelization comes with higher dimensional diffusion. in which case, a more efficient way of calculating diffusion is through Monte Carlo integration (since the error scales as 1/sqrt(N) rather than with the dimensionality), which indeed parallelizes trivially. for a 1D case i can't see any reason why you would parallelize.

    if you still want to solve it via matrix methods, then look at the ESSL/BLAS/LAPACK libraries and MPI.
  8. Oct 19, 2007 #7


    User Avatar
    Science Advisor


    If you want something that runs in parallel [ the Gaussian elimination is not a parallel algorithm ], then
    I would suggest some form of Krylov-subspace solver; e.g. like a conjugate gradient method.

    Mehtods such as Krylov-subspace solvers can be made to run in parallel. Much of the work is
    doing the matrix multiplies.

    However, there will be one or two "dot products" of the solution and/or correction vectors.

    These dot products require a "sync point" - basically an operation called an "all reduce".

    Each processor computes its own portion of the dot product, a "partial sum" for the part of the
    vector that the processor "owns". Then all the processors have to sum all their partial sums to
    the "Boss" processor. The Boss processor then has to broadcast the dot product to all the workers.

    The "step length" in the Krylov solve is typically the quotient of two dot products.

    That's my profession also. I'm a "computational physicist". I write physics simulation programs
    for Lawrence Livermore National Laboratory. All of our work is done in parallel since we have
    computers with as many as 131,000 processors:


    LLNL's Center for Advanced Scientific Computing just won one of R&D Magazine's
    "Top 100" Awards in 2007 for the "Hypre" solver library:

    http://www.rdmag.com/ShowPR.aspx?PUBCODE=014&ACCT=1400000100&ISSUE=0707&RELTYPE=PR&ORIGRELTYPE=FE&PRODCODE=00000000&PRODLETT=G&CommonCount=0 [Broken]



    I thought you might be interested in reading about Hypre.

    Dr. Gregory Greenman
    Last edited by a moderator: May 3, 2017
  9. Oct 20, 2007 #8
    Dr. Gregory Greenman

    Thanks for your recommendations. As I got from your note you and your team are doing great paralellization based on distributed processors.
    What I intend to do is paralellization in a local computer with a single processor. I wanna divide my computation tasks to 2 or more seperated threads which are running simultaneously. After doing so I will doing some paralellization on a dual core processor.
    Because of the fact that I'm a nuclear engineer, I intended to test my task on the diffusion equation.
    Nevertheless, It's my honor to get to know more about you and your projects in paralellization.

  10. Oct 20, 2007 #9
    so what does one typically use as a scattering function? (im assuming that we're talking about solving the Boltzmann-Poisson eqn for neutron diffusion?)
  11. Oct 20, 2007 #10


    User Avatar
    Science Advisor


    First - the VARIANCE scales as 1 / sqrt(N) - NOT the error; and that's for a 0-dimensional quantity
    like the k-effective.

    The question as to whether it is more efficient to use Monte Carlo or deterministic methods depends
    on what you want to calculate - and for a typical reactor problem, deterministic methods beat
    Monte Carlo HANDS DOWN!!!

    This was covered in a session at the American Nuclear Society Topical Meeting in Computational
    Methods held last April in Monterey, California. The session was hosted by Prof. William Martin,
    Chairman of the Nuclear Engineering Program at the University of Michigan.

    The question under discussion was a conjecture at a previous meeting by Dr. Kord Smith of
    Studsvik Scanpower, Idaho Falls, ID. The most used codes for reactor analysis is the suite from
    Studsvik; SIMULATE, CASMO,... authored by Dr. Smith and his colleagues. [ Kord and I both
    did our graduate studies together with Professor Alan F. Henry at MIT in the 1970s ]

    Kord's conjecture is that it will be about 15-20 years before one can use Monte Carlo as the basis
    for reactor reload calculations. The consensus of the session hosted by Professor Martin is that
    Kord's conjecture is about right; maybe 10-15 years at best.

    The reason that deterministic methods win over Monte Carlo has to do with what I said above about
    it depending on what you want to calculate. For something simple like a k-eff calculation; or some
    other 0-dimensional quantity - the scaling is 1 / sqrt(N) for the VARIANCE!!

    However, if one is going to do something like a reactor reload calculation, then the quantity of interest
    is the peak heat flux or some other limit that the calculation has to show to be within the tech specs
    for the reactor.

    The peak heat flux occurs at point in space; which is NOT known a priori. Therefore, although the
    variance scales with the inverse root of the number of particles; there is a very low probability that
    the neutron histories that include the peak heat flux point in any number large enough to give one
    anything approaching an acceptable error.

    That's why the 1 / sqrt(N) is NOT the error. That only gives the scaling; NOT the error.
    There is another coefficient out front of that inverse root N scaling. The scaling with N is
    only half the story.

    For something like a point power, or a point heat flux; that coefficient is BIG. So even though
    you can decrease the error by sqrt(N) by making N - the number of neutron histories large - it will take
    you an absolutely HUGE number of histories before you make a dent in that large coefficient.

    The problem is that the quantity of interest - a peak power or peak heat flux - is a very small part
    of the phase space. Zero-dimensional quantities like k-effective which are integrals over the entire
    problem space converge quite fast with Monte Carlo.

    However, point quantities DO NOT. If one knew the point in question a priori - that is - if the
    question was "How many neutrons will be absorbed by a detector at location (x,y,z)?" - then such a
    question can be solved readily by Adjoint Monte Carlo.

    One uses the Monte Carlo technique to solve the Adjoint Boltzmann Transport Equation. When
    you do that - the roles of sources and sinks of neutrons reverse. That is the number of neutrons that
    will be absorbed by a detector in the forward problem; is equal to an integral over the whole problem
    from "adjoint neutrons" sourced at the detector location. Thus in adjoint space, one again is solving
    for an integral over a large volume of the problem.

    However, one has to know the location of the detector a priori, because one has to source
    "adjoint neutrons" in at that location. In the reactor problem, one doesn't know that a priori.

    Therefore, even with the fastest computers; Monte Carlo loses very badly to deterministic methods
    even if the deterministic methods don't take advantage of parallelism. If the deterministic methods
    are coded to take advantage of what parallelism they can; Monte Carlo loses even more.

    I am currently working on a Monte Carlo code; so if anything, my "bias" should be in favor of the
    Monte Carlo. However, facts are facts; for a reactor calculation, deterministic methods beat
    Monte Carlo hands down, and will continue to do so for many years to come.

    Dr. Gregory Greenman
  12. Oct 21, 2007 #11
    Dr. Greenman,

    ok, so you are saying that for this particular problem there is a large scaling prefactor infront of the 1/sqrt(N), and so this currently makes non-stochastic methods desirable. (However, since you say that you are developing a stochastic code I take this to mean that the high levels of parallelism available cheaply now allow you to "make a dent" in that prefactor otherwise you wouldn't bother).

    Unfortunately, we are not always so fortunate...for example, in Quantum Monte Carlo we have no hope of diagonalizing the Hamiltonian, and so we must learn to live with the sampling issues inherent in fermionic systems (due to the antisymmetry of the wavefunction, we run into a "sign problem" whereby calculating averages from the density matrix requires a delicate cancellation of two very large quantities - a situation that is numerically unstable). Ceperely and others have developed methods of nodal sampling (essentially umbrella sampling) that makes this somewhat more tractable, but not much.

    Take path integral MC for example, it truly is taunting since we have a small scaling prefactor, the problem factorizes linearly, and the MC algorithm is trivially parallel. Unfortunately, as i said above, the sampling space is so large (and method of summation is inherently unstable) that unless we can focus around the nodes we are doomed - but if you know where the nodes are (where all of the interesting chemistry happens) then why bother!


    P.S. - i have used the blue gene at SDSC extensively, but given the SUs for their slower hardware i found it somewhat unattractive for what i am doing. i understand the purpose for which the BG was supposedly designed (FLOPS/kW and FLOPS/m^2) but (a) neither of these things really help me and (b) power efficiency doesn't seem to be stopping anyone else from building highly parallel systems either, i.e. look at TACC's Ranger (which at 504 TFlops will displace LLNL's BG/L from the #1 spot on the top500). i think this spells the end of future bluegene systems. the opteron-based Cray XT3 is also an excellent system. i have nothing against the BG per se, but PPC is largely a failure (i.e. no 64-bit) and IBM is trying to spin it a different way.

    http://www.tacc.utexas.edu/resources/hpcsystems/ [Broken]
    Last edited by a moderator: May 3, 2017
  13. Oct 21, 2007 #12


    User Avatar
    Science Advisor


    Your assumption that our reason for developing the Monte Carlo code is to take advantage of
    high levels of parallelism. That's incorrect.

    The reason for using the Monte Carlo code is that for the applications that we are interested in;
    there are certain effects that are more easily implemented in a Monte Carlo code. It's not the
    parallelism that gives us an advantage over deterministic methods - it's that certain effects are
    more difficult to implement in the deterministic code.

    Additionally, for our application; unlike the reactor application; the important constraint is not a
    local one; unlike the reactor problem

    Dr. Gregory Greenman
  14. Oct 21, 2007 #13


    User Avatar
    Science Advisor

    Last edited: Oct 21, 2007
  15. Oct 21, 2007 #14

    you and i have something in common then: I also have implemented a Monte Carlo code in order to avoid doing many-body molecular dynamics with a potential energy function whose derivative is numerically ill-behaved! (Although, I do not get away completely unscathed since the systems that I am studying via this method exhibit long correlation times, which means lots of "wasted" CPU time - but at least the many-body problem is now tractable!)

  16. Oct 21, 2007 #15
    I kindly disagree. If you read the BG white papers those are EXACTLY the reasons that they have used to achieve the high parallelism.

    At 440 Mhz with 512 MB memory (BG/L), the motive was certainly NOT performance: it was to achieve a low power consumption footprint thus allowing higher levels of parallelism.

    From IBM's architectural white-paper:

    Please see the following for an overview of the blue gene architecture:

    http://www.research.ibm.com/journal/rd/492/gara.html [Broken]

    P.S. - note that the BG node consume only 20 W, that is note by accident.
    Last edited by a moderator: May 3, 2017
  17. Oct 21, 2007 #16
    the first BG/P is not expected to be operational until Spring 2008 (and only 111 TFlops at that, petascale systems will follow later) - Ranger will be up and in use sometime in December, making it #1 at least for a little while :)


    P.S. - i do not work for TACC, Ionly like their systems :)
  18. Oct 21, 2007 #17


    User Avatar
    Science Advisor


    That Monte Carlo parallelizes trivially is only trivial when one doesn't think about it too much.

    It would be trivial to parallelize Monte Carlo if one can devote a processor to tracking the entire
    lifecycle of a simulation particle - from birth to death. Unfortunately, that is seldom the case.

    Since a simulation particle can theoretically go anywhere in the problem domain, in order for a
    processor to track a simulation particle from birth to death; it must have a complete description of
    the entire problem geometry. In a machine like Blue Gene/L which has 131,000 processors; full
    utilization of the machine would require 131,000 duplicate representations to reside in memory;
    since each processor must have the complete geometry.

    This would be wasteful of computer memory. No - large Monte Carlo simulations are run on large
    parallel machines using a technique called "domain decomposition". In essence, each processor
    "owns" only a piece of the entire problem. As simulation particles reach the boundary of the domain
    on a given processor, it transits to another domain on another processor, and the simulation particle
    must be "handed off" to the processor that owns the neighboring domain.

    Once processors have to communicate particles back and forth - then one has the same problems
    that one has with a deterministic system. Once the "cradle to grave on a single processor" paradigm
    is broken, one loses the easy parallelism.

    In fact the parallelism problems are WORSE for Monte Carlo. In a deterministic code, if each
    processor has equivalent sized pieces of the problem; then the processors have roughly equivalent
    work. The amount of work a given processor has to do depends on how much of the problem it
    "owns"; and the amount of work does not depend on the solution.

    Therefore, if the domains are parcelled out roughly equally to all processors; then the all the processors
    should arrive at the "sync points" where they need to communicate at roughly equivalent times. This
    makes a "sychronous communication" model viable. That is all processors arrive at the communication
    point at roughly the same time. Therefore, they wait until all processors have arrived, and can then
    exchange the data that is needed across processor boundaries. Additionally, the amount of data
    being shared is symmetric. If processor "i" borders processor "j" at "n" faces; then "i" needs to send
    "j" "n" values for each variable being shared; and vise-versa - "j" needs to send "i" the equivalent
    amount. Each processor needs to know 1 value per face per variable in a 1st order approximation.
    Each processor would have to go 2 values into the adjacent domain for a 2nd order approximation...
    Regardless of order, there's always a symmetric exchange of data.

    That isn't the case in Monte Carlo. In Monte Carlo, there may be a very active region bordering a
    very inactive region. The amount of data that needs to flow is not symmetric. Because processors
    don't have an equal amount of work to do - given varying number of simulation particles; then one
    can't assume that they will all arrrive at a "sync point" nearly simultaneously.

    Therefore, one can't use a "synchronous" communication paradigm unless one is willing to have
    processors waiting a large portion of the time. In this case, an "asynchronous" communication
    model is used. Processors exchange data, not in a determinate way; but in an indeterminate way.
    Processors "buffer" particles as needed and neighoring processors flush these buffers when more
    particles are needed to process.

    The asynchronous communication model also opens the door to potential deadlock conditions if one
    is not careful.

    Additionally, unlike the deterministic case, where processors have roughly equal amounts of work
    to do since the are solving the same equations on roughly equal numbers of zones; Monte Carlo
    simulations inherently are not "load balanced". The workload among the processors may not be
    shared equally.

    One could even the workload and "load balance" by redistributing the regions of space for which a
    given processor "owns". As regions of the problem are transferred between processors in order to
    balance the load; so too the simulation praticles that currently occupy those regions must also be
    transferred. This extensive communication may offset much of the gains to be had from balancing
    the load. Therefore, one should only load balance when it will yield a net improvement in computer
    efficiency. Some of my collegues have figured out how to do that; and have written presented papers
    on the subject. For example, one that was presented in Avignon, France last year is available from
    the Dept. of Energy at:


    One property that one wishes a parallel simulation to have, is that the answers not depend on the
    number of processors on which the simulation was run. If processors "owned" the random number
    seeds, then the history of a given particle would depend on the set of processors that it visited and
    when it did so. Clearly, this would impart a depedence of the simulation results to the number of
    processors involved. In order to sidestep this issue, it is useful for each simulation particle to carry
    its own random number seed. This gives each simulation particle its own pseudo-random number
    sequence, and it won't matter which processor calculates the next number in the sequence.

    However, at the beginning of the problem, the initialization of the random number seeds has to be
    coordinated among the processors in order to avoid duplication.

    All in all, I have coded both deterministic methods, as well as Monte Carlo methods. The complexity
    of the parallelism of the Monte Carlo methods is at least one or two orders of magnitude more complex
    than the deterministic methods.

    Dr. Gregory Greenman
    Last edited: Oct 21, 2007
  19. Oct 21, 2007 #18


    User Avatar
    Science Advisor


    That's for the UNCLASSIFED systems. You don't know what the CLASSIFED ones are doing.

    Dr. Gregory Greenman
  20. Oct 21, 2007 #19

    Yes, I am familiar with domain decomposition methods - infact I utilize decompositions in solving for the many-bodied polarization states (iterative gauss-seidel solver with preconditioning) of my MC code.

    It sounds like you are focusing on a very particular type of problem - neutron diffusion in some fissile material. However, keep in mind that if one was, for example, solving for a statistical mechanical observable, then doing the trivial parallelization where each processor handles a seperate (indepedent) sample is appropriate - infact, desireable since the goal is to sample as many possible states from the ensemble and this method scales linearly; one could hardly do better for this type of problem! The same is true in path integral MC since one wishes to calculate the density matrix accurately - far from being wasteful, starting independent configurations on many processors separately is the best way to do this!

    Molecular dynamics is more of the case that you describe, since we are propagating the entire system along a trajectory in phase space - in which case the forces/potential are decomposed by domain..there are very effective libraries for doing so (CHARM++ being one) that can take advantage of the various interconnect topologies.

    Keep in mind that "Monte Carlo" does not refer to neutron diffusion - it is nothing more than an efficient way of numerically doing integrals!


  21. Oct 21, 2007 #20

    I do not know anything about classified systems - I would never have guessed that such large systems would even be classified, since the current top #1 BG at LLNL is certainly not classified (actually, #1, #3 and #6, all NNSA are online at top500.org).

Share this great discussion with others via Reddit, Google+, Twitter, or Facebook