Couple of days ago I downloaded a book on numerical optimization, hoping to get clearer picture on some techniques. But, I'm surprised that some of the concepts were not precisely separated from one another. Namely, in the part of "coordinate descent methods" (cdm), I found that, in the case the contribution of each point to the function in question is known, one applied the Gauss-Southwell approach to select the point to be updated. In case the update is done is a sequence from 1 to n, the the approach is referred to as the "cyclic descent method" (cydm). However, on some other places I've found that Gauss-Seidl method's principle is close to cydm's. Is Gauss-Seidl actually another name for cydm? I hope someone will clarify this for me. Another question is regarding the Jacobi method, which is a "hold on" version of Gauss-Seidl, in my understanding (regarding the point update at the end of iteration). Given certain function f(x) which is quadratic in x, in order to obtain its minimal, I set the derivative to 0, and obtain the following linear system Ax=b which I decide to solve by Jacobi iteration. Given that A is strictly diagonally dominant, is it true that the value of f(x1), where x1 is obtained by a single Jacobi iteration on some arbitrary x0 (but not the stationary point) satisfies f(x1)<f(x0)?
Let me clarify what I'm looking for : given a function [tex] f(x)=\frac{1}{2}x^TAx - b^Tx, [/tex] and the update [tex] x_1=D^{-1}(b-Rx_0) [/tex] where A=D+R, with the being the diagonal matrix with diagonal entries of A (and R containing the off-diagonal entries of A), how could I show [tex] f(x_0)>f(x_1) [/tex]
What you want to show is false, in general. For example take ##A = \begin{bmatrix}1 & 2 \\ 2 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad x_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}##
Sorry, I was reading post #2 and missed "strictly diagonal dominant" from post #1. One way to prove this is to show the eigenvalues of ##D^{-1}R## have modulus < 1, using the http://en.wikipedia.org/wiki/Gershgorin_circle_theorem.
Many thanks for the suggestion. Note that following: [tex] D^{-1}R=D^{-1}(V-D)=D^{-1}V-I [/tex] In my choice of V, the matrix [tex]D^{-1}V[/tex] reduces to a symmetric matrix with the only non-negative elements on the diagonal begin equal to 1: [tex] D^{-1}V=\left(\begin{array}{ccccc} 1 & & & & \\ & \ddots & & -c_{ij} \\ & & \ddots & & \\ & -c_{ij} & & \ddots & \\ & & & & 1 \end{array}\right) \in\mathbb{R}^{n\times n} [/tex] such that the sum of the absolute values of all the off-diagonal elements in each row and column sum to 1, which is the corresponding diagonal element. By matrix difference $D^{-1}V-I$, this implies that the diagonal elements of the matrix $D^{-1}R$ are zero, and in terms of the proposed Gershgorin circle theorem, this would correspond to n disks D(0, 1), meaning that eigenvalues of [tex]eigs(D^{-1}R)\leq 1[/tex] (correct me if I'm wrong). How to proceed with the proof further?
You can get rid of the vector b by shifting the origin so the solution vector is zero, i.e ##y = x - \bar x## where ##A \bar x - b = 0##. Then use the fact that any vector y is a linear combination of the eigenvectors of ##D^{-1}R##.