Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Different types of matrices

  1. May 26, 2006 #1
    I have been doing a lot of research and using a lot of various programs and in order to try and understand how these programs work, I've been reading about various solvers.

    And in the documentation, they mention a series of different types of solvers and matrices and I was wondering if someone would be able to explain them to me?

    - LU
    - Sparse

    (to name a few).

    (in particular in Ansys CFX, and COMSOL)
  2. jcsd
  3. May 27, 2006 #2
    For computational/modeling tools like Ansys CFX, everything boils down to how fast can you solve a linear system Ax = b, where A is a known NxN matrix, b is a known Nx1 vector, and x an unknown Nx1 vector that you're solving for.

    The first way to solve this you learn in Linear Algebra class is Gaussian elimination, and one implementation of this is LU factorization. In this method, you factor the matrix A into L and U such that A = L*U where L is a lower triangular matrix and U is upper triangular. Now if you plug L*U into Ax=b and define y = U*x, you get two linear systems
    Ax=b = L*U*x = b --> Ux = y , Ly = b
    and first solve Ly=b for y, and then solve Ux=y for x. And since L and U are triangular, they are very cheap to solve (back substitution). So now the systems are really cheap to solve, but factoring A is really expensive - if A is dense (non-zeroes in almost every spot), the cost is O(N^3). However, if A is sparse (mostly filled with zeroes), the cost of factoring drops almost to O(N) (I think it can be as low as O(N^1.2)). So if A is sparse, LU is a great method. Also, if you are solving Ax=b many times for the same A but different b, you only have to factor A once, and then just solve Ly=b and Ux=y for different b, which is trivially cheap. So in that case, this method is super. One final benefit of this method is that the storage is relatively cheap. It's possible to factor A "in place", meaning if A is huge (N=100,000) and you only have enough RAM to store one copy of A, you can perform the factorization directly on A and store both L and U in what used to be the A matrix without having to store another matrix in the process of factoring.

    A similar method for solving Ax=b is QR factorization. In this case you factor A such that A = Q*R where R is upper triangular and Q is orthogonal. Again, you get two systems to solve
    Ax = Q*R*x = Qy = b
    So first you solve Qy=b for y, then use y to solve Rx=y. So what's the benefit of this factorization? Since Q is orthogonal, Q*Q' = I, meaning Q' = inverse(Q) ==> y = Q'*b (where ' denotes transpose). So solving the first equation for y is trivial beause it's just a matrix vector product, and then solving Rx=y is easy because R is triangular. Unlike LU you can't do QR in place so it requires more storage, but QR has the ability to solve singular systems without freaking out, which LU can't do. But QR also stinks for dense systems.

    So what do you do for dense systems? You turn to iterative methods (GCR, GMRES, MINRES,...) . In these methods you are trying to solve Ax = b by guessing x, computing Ax, and then checking if it is equal to b. If not, repeat. So the non-technical description would be: you start by guessing x0, compute Ax0, check if Ax0-b=0, and if not, guess x1, compute Ax1, check (Ax1-b), guess x2, ... and so on. The clever thing here is that you can use information in (Ax0-b) to cleverly pick x1, and then use information in (Ax1-b) to cleverly pick x2, and so on, and if you do it right, it turns out that your guesses converge to the solution x quite quickly. How fast you find the solution depends on the clustering of the eigenvalues of A. Solving a dense system using this method is O(kN^2), where k depends on the eigenvalue distribution and in the worst case k=N. In these methods all of the cost comes from computing matrix-vector products, and if you have a clever/fast way to do matrix-vector products, you can drop the cost to O(k Nlog(N)).

    If your A matrix is symmetric, there's other things you can do make things even more efficient.

    Anyway, the bottom line is, what most modeling tools do is look at the sparsity and structure of A, and then pick some method to efficiently solve Ax=b for that A.
  4. Jun 2, 2006 #3
    Thank you for your very helpful reply.

    I am actually starting to get SOME clue/understanding as to what goes on behind these programs.


    Does A have to be a square matrix (i.e. dimensions M x M) in order for either LU or QR to work?

    What does it mean: "O(N^3)"? what's O?

    Would that also mean that LU and QR are direct solvers?

    Is there such a thing as Gaussian factorization? (possibly used the FEA solver in CATIA)

    What's eigenvalue?

    What's multigrid? (be it algebraic multigrid or geometric multigrid, use in Ansys CFX as couple algebraic multigrid)

    Once again, thank you for all your help.
  5. Jun 8, 2006 #4
    Well, as in standard linear algebra stuff, if A is MxM and non-singular, then there is a unique solution and the algorithm will find it. If A is a tall rectangle (more equations than unknowns), then the system is overdetermined and there is probably no solution, but the aglorithms can be modified to find you some sort of least-squares best approximation to the solution. I believe the iterative methods don't care if b lies in the span of the columns of A, it will always return the best solution it can find. If A is a wide recatangle (more unknowns than equations), then the system is under-determined and the algorithms can be easily modified to return you one solution of the many.

    O(N^3) is read as "order N cubed", where N is the size of the system. This means N^3 is the dominating term in the cost of the algorithm. And the cost is usually measure in multiplications. So if you wanted to compute y in y = Ax where A is NxN and x is Nx1, then the first component of y requires N multiplications and adds (it's the first row of A inner producted with x), and the second component of y needs N multiplies to compute, ... and there are N entries in y, so it requires N^2 multiplications to compute y. But if it actually cost N^2+25*N + 7000, we would stil say it's O(N^2), because cost is only important when the system gets really large (N = 10,000), and in that case N^2 >> 25*N + 7000. Usually if you have the algorithms written out in algorithm-speak, you can find the order by counting the number of "for" loops. If you have 3 nested loops which each go from 1 to N, then you will have N^3 operations going on.


    LU decomposition is what is normally implied by Gaussian factorization, since you factor A into L*U, and you use triangularity to eliminate the unknowns (a la Gaussian elimination).

    Just some important stuff about the matrix A. Not important at the moment if you never encountered them before. Although I think they do affect the condition number of the system. Ok, they define the condition number of the system (largest e-value / smallest e-value), which is a number that basically tells you how hard it will be to solve the system (from a numerical standpoint). One big problem comes from numerical errors, because in LU you are always dividing rows by numbers and adding them to other numbers. And if you aren't careful you can lose lots of information in rounding. There are all sorts of schemes for LU that try to prevent this by keeping the biggest terms on the diagonal so you never have to divide by a tiny number and add it to a small number (because to the computer 10^20 + 15 = 10^20, but that may mess up your system). But I'm rambling..

    The multigrid I am familiar with is a technique in either Finite Element or finite difference methods (or maybe both, I can't remember). When you are trying to solve an ODE on some domain (like find the stress on an airplane wing in a wind tunnel), you have to discretize the ODE and the domain (the airplane wing). But when you do this, it turns out that you get this weird error that has a frequency based on the spacing of the grid domain. So to make the method converge faster, you change the grid size (to a coarser discretization I think) and iterate on the new grid to eliminate the error frequencies from the other grid, then go back to the finer grid and do it again. If none of that made any sense, don't worry because it's probably all wrong. I never use FEM but I learned about multigrid once, and it had something to do with switching between a coarse and a fine mesh to eliminate some part of the error, and this supposedly helps you converge to the actual solution faster.

    I hope this was helpful for you.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook