When solving differential equations numerically with finite difference methods, textbooks get to the point of solving A psi_{n+1} = B psi_n (with A, B matrices, typically complex conjugates of each other) and advise using LU decomposition to do so. My question is, why not solve it as psi_{n+1} = A^{-1} B psi_n, which, by the way, shows that the method is explicit: you compute psi at time n+1 from psi at time n. The alleged deep difference between the Euler method and, say, Crank-Nicolson, is mystifying to me; if you rewrite A^{-1} B = C, they are formally identical, just with different matrix elements for C. Not really a big deal. What am I missing?
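To make the question concrete, here is a minimal NumPy/SciPy sketch (grid size, diffusion number, and initial profile are made up) of the two routes for one Crank-Nicolson step of the 1-D heat equation; they agree to rounding error:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

N, r = 50, 0.4                      # grid points and diffusion number (made up)
T = np.diag(-2*np.ones(N)) + np.diag(np.ones(N-1), 1) + np.diag(np.ones(N-1), -1)
A = np.eye(N) - 0.5*r*T             # Crank-Nicolson: A psi_{n+1} = B psi_n
B = np.eye(N) + 0.5*r*T

psi = np.exp(-((np.arange(N) - N/2)**2) / 10.0)   # some initial profile

# (1) the textbook route: factor A once, then back-substitute each step
lu = lu_factor(A)
psi_lu = lu_solve(lu, B @ psi)

# (2) the question's route: precompute C = A^{-1} B, then one matrix-vector product
C = np.linalg.inv(A) @ B
psi_inv = C @ psi

print(np.max(np.abs(psi_lu - psi_inv)))   # tiny: both routes give the same step
```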
Welcome to Physics Forums. In general, explicit methods aren't stable for all values of the parameters. However, one can often construct implicit methods that are stable for all parameter values, or at least for a larger range than the explicit method. Using your example above, the explicit approach would also fail outright for values where A is singular.
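A toy illustration of the stability point (not from the thread; sizes and the diffusion number are made up): forward Euler for the 1-D heat equation blows up once r exceeds its stability limit of 1/2, while Crank-Nicolson stays bounded for the same r:

```python
import numpy as np

N = 50
T = np.diag(-2*np.ones(N)) + np.diag(np.ones(N-1), 1) + np.diag(np.ones(N-1), -1)
r = 1.0                              # deliberately above the explicit limit r <= 1/2
psi0 = np.random.default_rng(0).standard_normal(N)

# forward Euler: psi_{n+1} = (I + r T) psi_n  -- explicit, conditionally stable
E = np.eye(N) + r*T
# Crank-Nicolson: (I - r/2 T) psi_{n+1} = (I + r/2 T) psi_n -- unconditionally stable
C = np.linalg.solve(np.eye(N) - 0.5*r*T, np.eye(N) + 0.5*r*T)

psi_e, psi_c = psi0.copy(), psi0.copy()
for _ in range(200):
    psi_e = E @ psi_e
    psi_c = C @ psi_c

print(np.max(np.abs(psi_e)))   # grows without bound
print(np.max(np.abs(psi_c)))   # stays bounded and decays
```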
Algorithms care about things like running time and numerical accuracy. Matrix inversion is relatively slow and often fairly inaccurate, so it's often good to avoid it.
For large problems the matrix A is usually very sparse and often has a regular structure of zero and non-zero terms. The LU decomposition usually retains most of the sparseness in L and U (especially if you use modern methods like symbolic decomposition to re-order the rows and columns of A). On the other hand, A^{-1} will most likely be a full matrix. For a problem with many degrees of freedom, solving using the LU decomposition may be several orders of magnitude faster (i.e. 1,000 or even 100,000 times faster) than explicitly finding A^{-1}. I think you are confusing "explicit vs implicit finite difference methods" with "direct vs iterative solvers". The designation of "explicit" or "implicit" depends on the difference equations themselves, not on the numerical method you use to solve them.
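The fill-in point can be seen directly (a sketch with a made-up tridiagonal matrix, as a 1-D difference scheme would produce): the sparse LU factors keep O(N) nonzeros, while the inverse is essentially a full N-by-N matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

N = 1000
# tridiagonal A, diagonally dominant (coefficients made up for illustration)
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(N, N), format='csc')

lu = splu(A)                    # sparse LU (SuperLU, with fill-reducing ordering)
nnz_lu = lu.L.nnz + lu.U.nnz    # L and U stay sparse: O(N) nonzeros
nnz_inv = np.count_nonzero(np.linalg.inv(A.toarray()))  # the inverse is dense

print(nnz_lu, nnz_inv)          # a few thousand vs roughly N^2
```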
Dear all, thanks for your replies.

To Hootenanny & Hurkyl: are you saying it is not equivalent to solve A psi_{n+1} = B psi_n and psi_{n+1} = A^{-1} B psi_n? (I've never met the case where A is singular.) If so, in which sense is it not? I compared both and I see no discrepancy in computation time or result (solving the heat equation on large grids).

In more detail, to AlephZero:

> For a problem with many degrees of freedom, solving using the LU decomposition may be several orders of magnitude faster (i.e. 1,000 or even 100,000 times faster) than explicitly finding A^{-1}.

That's not my experience: comparing LU decomposition and matrix inversion, I see no difference in time and only negligible numerical deviation on grids of 2^10 in space and time. In fact, matrix inversion algorithms are so good that even in high-level languages like Mathematica it takes a fraction of a second for such large spaces. Actually, if you use a multi-step procedure where A_s psi_{n+s/L} = B_s psi_{n+(s-1)/L} invokes the LU decomposition L times, then I find it much faster to compute once and for all M = A_1^{-1} B_1 A_2^{-1} B_2 ... A_L^{-1} B_L and then iterate psi_{n+1} = M psi_n than to LU-decompose L times in a row. So why do people always present the scheme as repeated applications of the LU decomposition? Taking into account that:
* they do so even if M is not sparse (e.g., from 2r-1 Padé approximants that fill the matrices considerably),
* they do so even if L repeated applications of the LU are needed (with L typically of the order of 10), rather than a single product.
It is also much easier to code direct explicit products than a chain of LU solvers. The other point is less important for me, but still:

> The designation of "explicit" or "implicit" depends on the difference equations themselves, not on the numerical method you use to solve them.
If you take, say, Wikipedia's definition of implicit & explicit — "Explicit methods calculate the state of a system at a later time from the state of the system at the current time, while implicit methods find a solution by solving an equation involving both the current state of the system and the later one" — and if you translate this to say that psi_{n+1} = M psi_n is explicit, in that it doesn't need the values of psi_{n+1} (later time) but only those of psi_n (current time), then Crank-Nicolson is explicit [unless A is singular, which is usually not the case]! As you all said, the scheme is nevertheless known to be implicit, with characteristic features like stability. Still, I only need psi_n to compute psi_{n+1}. Note that I'm not defending my understanding; the fact is, I don't understand. This all goes against what I read (like, CN is stable because it is implicit... but it seems explicit to me, in that I compute the next time step from the previous one and only the previous one).
I assert that you can't really understand numerical calculation until you have fully digested and internalized this fact: [itex]x + (y + z) \neq (x + y) + z[/itex] for most x, y, and z
Go to Matlab and type
Code (Text):
A = eye(4000);
lu(A);
inv(A);
and compare the timings (I get 3.8 seconds vs 22.2 seconds using tic/toc).
You're right that LU decomposition is faster than matrix inversion, but as the inversion has to be done only once, and takes seconds on larger spaces than used in practice, the gain is not compelling, given that it's much easier to write code that produces psi_{n+1} in a single computation C psi_n than to write repeated applications of the LU decomposition. So is time the only reason? Is there no degradation of numerical accuracy as well from matrix inversion? (On grids of 2^10 I see no appreciable time difference, and there is some small numerical variation, although it's at the level of noise.) I mean, if the gain is only quantitative (and small) in time, it's odd to me that this appears as a sacred step, with people starting from the Cayley form of the propagator and, even if they are not going to take any numerical step, just for the purpose of illustration, right away transforming it into a form for LU decomposition. To Hurkyl: I appreciate your attempts to be as cryptic as possible; you are indeed making your comments completely incomprehensible. I don't see your point about non-associative algebra and "numerical calculation" (just in case the confusion is on your side, my matrices all commute).
Maybe by this you mean that inverting a matrix propagates numerical error in a way LU doesn't? But if that is so, why isn't it documented? Why is it not spelt out that LU is required rather than matrix inversion (and thus that time is not the relevant parameter, but rather numerical stability or error propagation)? Also, actual numerical integration shows very small discrepancies (though it might become worse in other cases).
The point is that floating point calculations are non-associative -- even floating-point addition is non-associative. For example (calculated in Python 2.6): Code (Text): >>> (1.0 + -1.0) + 0.001 0.001 >>> 1.0 + (-1.0 + 0.001) 9.9999999999988987e-05 (I had assumed that you had seen facts like this before and just hadn't really thought through the implications, rather than this being effectively a totally new idea to digest.) Similarly, matrices that would have commuted when doing computations exactly rarely commute when doing computations numerically; they are only approximately commutative. If you're doing a small calculation with well-conditioned matrices and without a great need for precision, then issues of numerical stability probably aren't very relevant. Of course, issues of time probably aren't very relevant either. Many algorithms (I don't know if the algorithms you're specifically asking about are of this type) work by finding a way to measure the error of a putative solution and refining it into a better solution. For these, the end result will generally be similar no matter how accurate the individual steps are (assuming they are accurate enough for the algorithm to actually finish) -- but if those steps have better accuracy, it means that the algorithm will finish more quickly. As an aside, many languages (such as Mathematica) are quite inefficient at doing small calculations -- the computer spends more time reading and interpreting the program than it does actually executing the individual steps. In these languages, it is usually counter-productive to break a small computation into several tiny computations, even if that would result in less computational work (because it results in more work being spent reading and interpreting the program).
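The same effect shows up with matrices (a sketch with a made-up well-conditioned system): solving A x = b via LU and via the explicit inverse gives answers that agree only up to rounding error, never exactly:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
N = 200
A = rng.standard_normal((N, N)) + N*np.eye(N)   # diagonally dominant, well conditioned
b = rng.standard_normal(N)

x_lu  = lu_solve(lu_factor(A), b)   # solve A x = b via LU factorization
x_inv = np.linalg.inv(A) @ b        # solve via the explicit inverse

# different rounding paths, so the answers differ at the noise level
print(np.max(np.abs(x_lu - x_inv)))
```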
I don't follow exactly what you are saying about multistep or multigrid methods, but consider a 1-dimensional heat transfer/diffusion finite difference scheme using Crank-Nicolson. To get rid of any complications from boundary conditions, assume the temperatures of both end points are prescribed functions of time. The matrix equations are tri-diagonal. If there are N grid points, doing the LU decomposition takes O(N) operations, and solving the equations at each time step also takes O(N) operations. On the other hand, your M matrix is fully populated, and the matrix multiply at each step takes O(N^{2}) operations. For N = 1000, it would be quite reasonable to expect the LU decomposition to run 100 times faster than your method. Your LU decomposition and equation solving code must be doing something inefficient. Possibly it is treating all the matrices as full; ignoring the sparsity, it would waste most of its run time calculating that 0*0 + 0 = 0. For a 2D (NxN) or 3D (NxNxN) mesh, if you use what is probably the simplest sparse matrix option (just treat the system matrix as a banded matrix), again you will get a speed-up of about N times. But you can do much better than that, because the matrix is much more sparse: there are only 5 or 7 non-zero terms in each row, independent of N, for the standard 2D or 3D difference schemes. For large N, a really efficient solver based on LU decomposition will get close to being O(N^{2}) times faster than your matrix-multiply method for 2D, and O(N^{3}) times faster for 3D.
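The O(N) solve is what a banded routine exploits: it never stores or touches the zeros. A sketch (the tridiagonal coefficients are made up) using SciPy's banded solver, which stores the tri-diagonal matrix as just three rows:

```python
import numpy as np
from scipy.linalg import solve_banded

N = 100_000
# tridiagonal system stored in banded form: 3 rows instead of an N x N matrix
ab = np.zeros((3, N))
ab[0, 1:]  = -0.2     # superdiagonal
ab[1, :]   =  1.4     # main diagonal
ab[2, :-1] = -0.2     # subdiagonal

rhs = np.ones(N)
psi = solve_banded((1, 1), ab, rhs)   # O(N) work and O(N) storage

print(psi[:3])
```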
Dear AlephZero and Hurkyl,

> [...] 1-dimensional heat transfer/diffusion finite difference scheme using Crank-Nicolson. To get rid of any complications from boundary conditions, assume the temperatures of both end points are prescribed functions of time.

Okay, the general idea is now clear; in principle, you are right. In practice, I am not using CN but something that is L-diagonal with L of 10 or more, with boundary conditions, and with multi-steps, so that you don't calculate psi_{n+1} directly but through S intermediate time steps (S is also typically 10 or so in the literature). If you compute the propagating matrix M = A_1^{-1} B_1 A_2^{-1} B_2 ... A_S^{-1} B_S only once (it typically takes less than a minute for 2^9 or so, which is big enough to provide a smooth function indistinguishable by eye from the actual solution), then you just have to iterate psi_{n+1} = M psi_n, whereas with LU you have to repeat the procedure S times: A_1 psi_{n+1/S} = B_1 psi_n, ..., A_S psi_{n+1} = B_S psi_{n+(S-1)/S}, and this turns out to be possibly longer than the previous approach, not counting that it's more difficult to code. The LU decomposition algorithm is Mathematica's, which I assume is quite efficient. I wouldn't call 2^9 or 2^10 grids "small" problems, and time doesn't really enter here, but I understand that in principle, if I push to 2^12, 2^20, etc., the difference could become exponentially in favour of LU. You could be right; I will check. Hurkyl seems to make a more important point, one that I worry more about, namely performing operations that are not numerically fragile, where errors get cancelled, etc. My entire question could reduce to whether repeated LU decompositions are numerically better than the single-step version, which indeed involves a fully occupied matrix, but for which one runs only a single product per step.
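For concreteness, the two routes for one multi-step advance look like this (a sketch with small, made-up random matrices standing in for the A_s and B_s; the product is accumulated so that the A_1 sub-step acts first, matching the chain of solves):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
N, S = 64, 4                         # grid size and number of sub-steps (made up)
As = [np.eye(N) + 0.05*rng.standard_normal((N, N)) for _ in range(S)]
Bs = [np.eye(N) + 0.05*rng.standard_normal((N, N)) for _ in range(S)]
psi = rng.standard_normal(N)

# route 1: S chained LU solves per time step
psi_lu = psi.copy()
for A, B in zip(As, Bs):
    psi_lu = lu_solve(lu_factor(A), B @ psi_lu)

# route 2: precompute the full propagator M once, then one product per time step
M = np.eye(N)
for A, B in zip(As, Bs):
    M = np.linalg.inv(A) @ B @ M     # later sub-steps multiply on the left
psi_M = M @ psi

print(np.max(np.abs(psi_lu - psi_M)))   # the two routes agree up to rounding error
```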
Again, without systematic investigation: both procedures (S LU solves in a row & the single product) yield different results, but both differ from the analytical solution, and both at the level of noise, so it's not clear either that one is superior to the other in this respect. My point is, it looks more straightforward to me to get psi_{n+1} directly in a single operation than to follow a long series of steps, and comparison of both shows no notable difference in decent cases (large grids and long times). At the same time, all numerical methods take considerable care to put the problem in LU form, as if it were very bad to do otherwise. So the whole thing looks a bit mystifying to me (nobody says: "and now we put it in LU form because (x+y)+z != x+(y+z)"; they just do it as if it were the only thing to do). I haven't made systematic tests on larger grids, but I will, and if something notable comes out I will post it here. A last thing is Hurkyl's comment that Mathematica is not good at small problems, which is quite notable because usually people say the opposite, that it's not good at large problems. More or less, you have clarified the situation: in principle, on large problems, LU is much faster and looks numerically more stable, so it is the favoured solution.
Most systems of equations from finite difference schemes on regular (structured) grids are well conditioned, so unless you are trying to solve a large system using 32-bit floating point rather than 64-bit, I wouldn't lose much sleep worrying about that. An error and stability analysis of the finite difference scheme itself (for example what size time steps are a good trade off between high accuracy and short run times) is usually more important than numerical rounding errors.
Okay, but then I am completely confused about the time gain with LU. Using the highest Padé expansion & five-time-step method from this text [http://goo.gl/nLB6X], I find, for a 2^8 real-space grid: computation of the propagator, 0.26s; calculation of 2^9 time steps, 0.1s. That is, completely negligible. Then, by repeated applications of the LU decomposition: 140s. Both methods give the same result up to numerical noise, namely, they both differ by order 10^-9 from the exact solution (particle in a quadratic potential), while differing by order 10^-13 from each other, which proves you right that rounding errors are negligible compared to those of the finite difference method itself. There has to be something wrong with the way I do the LU. I use Mathematica's LinearSolve, whose notes on the technical implementation specify: "LUDecomposition, Inverse, RowReduce and Det use Gaussian elimination with partial pivoting. LinearSolve uses the same methods, together with adaptive iterative improvement for high-precision numbers." Still, if the numerical stability of the inverse is not bad, we come back to my first question: why not just use the single-step product? Actual runs on large grids (I can use larger grids; I took 2^8 only because LU becomes slower still) give me no hint whatsoever why all finite difference schemes are specifically brought away from the single-iteration, natural form psi_{n+1} = M psi_n into the LU form.
Sorry, but I can't read the paper at your link without paying for it. If the answers are the same to 13 figures, the maths of the algorithm must be right (or you got VERY lucky!). Your description of the Mathematica solver seems complete overkill for this: you shouldn't need any pivoting, iterative improvement, etc. Try using code from a library like LINPACK or LAPACK, and pick the routines that best match the form of your equations (i.e. if you have banded symmetric positive definite equations, don't use the general-purpose full matrix solver). I already answered that: operation counts. Actually, Office Shredder gave a nice example of the pitfalls of using "packaged" algorithms to write efficient code with the Matlab timings he posted above. Clearly Matlab didn't figure out that it was decomposing and inverting a diagonal matrix, otherwise it would have taken microseconds, not seconds! Note, I'm not complaining that it didn't figure out it was a UNIT matrix, only that it didn't figure out it was a (very) SPARSE matrix.
It's just a generalization of CN, with the effect of:
* having L-diagonal matrices (rather than tri-diagonal for CN),
* having s time-step iterations (rather than 1 for CN).
The latter they write as repeated applications of the LU, so you get psi_{n+i/s} for i = 1, ..., s as intermediate steps. By computing the propagator once, you step directly to psi_{n+1}. I fail to see the advantage of the first way, so I came here and asked. I don't think it's luck, because it works with different potentials, different boundary conditions, different wavefunctions, etc. That's the method Mathematica uses; it's based on LU, because that's what is most efficient. I copied this bit so that the answer doesn't come out as "Mathematica just uses completely inadequate algorithms in one case". It's using LU. I'm not trying to get the thing working; I've got it running to the accuracy I want. The problem I have is that I see a complete contradiction with the literature, which insists on doing it in a given way, by repeated applications of LU decompositions, whereas a naive approach (easier to write down) turns out to be much more efficient. So I am puzzled why people insist on one way and not the other. You tell me it's not numerical accuracy (which I don't see as a problem either), and that in principle, from the complexity of the algorithm, LU is orders of magnitude better than inverting the matrix. The only logical explanation is that Mathematica turns out to be very inefficient at the LU (which I find very spooky; their linear algebra routines are really good, they automatically detect sparse matrices and so on), and that if I did it in a low-level language, I would go from 140s to 0.001s and get a better time with LU than with matrix inversion, which would stay at 0.1s or so. I'll have to take that on faith, or test more with low-level code like C or Python.
Still, it's perplexing that LU is systematically presented as *the* way to do it, when in some cases at least it turns out to be orders of magnitude slower for the same accuracy. If I am correct, which I won't take for granted yet, people should say: "in some cases, such as doing this in a high-level language like Mathematica, it turns out to be better to compute the propagator once and merely iterate it, rather than going through a series of LU decompositions".