I'm under the impression that the singular value decomposition is generally superior to Gaussian elimination for numerical work such as computing rank over the reals or complexes, but the algorithm I've seen simply won't work over a finite field.
Still, I suspect there ought to be a good way to do it over a finite field.
The algorithm I saw relies on a subprocedure called "House", which constructs a Householder matrix (a reflection of the form [itex]I - \beta v v^T[/itex]).
The algorithm for House divides by a quantity that is always positive over the reals; when I naively applied it over a finite field, it required a division by zero.
I think House, given a vector x, is supposed to return β and v such that [itex](I - \beta v v^T) x = \|x\| e_1[/itex]. The algorithm blew up on the computation of β. I tried assuming it had v right, and found that no such β can exist.
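For reference, here is a minimal real-arithmetic sketch of a House-style routine, modeled on the `house` function in Golub and Van Loan's Matrix Computations (an assumption about which version of the algorithm is meant). The names `house` and `apply_reflector` are hypothetical. It makes visible where positivity is used: `mu` is a square root, and β is obtained by dividing by `sigma + v0**2`, which is positive over the reals whenever `sigma > 0`.

```python
import math

def house(x):
    """Hypothetical helper: return (v, beta) with v[0] == 1 such that
    (I - beta v v^T) x = ||x|| e_1, for a real vector x.

    Real arithmetic only: needs a square root and relies on
    sigma + v0**2 > 0, which holds over the reals when sigma > 0.
    """
    sigma = sum(xi * xi for xi in x[1:])
    if sigma == 0.0:
        # x is already a multiple of e_1; reflect e_1 only if x[0] < 0
        v = [1.0] + [0.0] * (len(x) - 1)
        return v, (0.0 if x[0] >= 0 else 2.0)
    mu = math.sqrt(x[0] * x[0] + sigma)  # mu = ||x||
    # v0 = x0 - mu, rewritten to avoid cancellation when x0 > 0
    v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    beta = 2.0 * v0 * v0 / (sigma + v0 * v0)  # the division that needs positivity
    v = [1.0] + [xi / v0 for xi in x[1:]]     # normalize so v[0] == 1
    return v, beta

def apply_reflector(v, beta, x):
    """Compute (I - beta v v^T) x = x - beta (v . x) v."""
    s = beta * sum(vi * xi for vi, xi in zip(v, x))
    return [xi - s * vi for xi, vi in zip(x, v)]
```

For example, `house([3.0, 4.0])` gives v = [1, -2] and β = 0.4, and applying that reflector to [3, 4] yields (up to rounding) [5, 0].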
Furthermore, a vector space over a finite field has no genuine norm: many of the interesting nonzero vectors have zero inner product with themselves (over GF(5), for instance, (1, 2)·(1, 2) = 1 + 4 = 5 = 0). So even if House could be suitably tweaked, I doubt it would give interesting results at all!
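To see the obstruction concretely, the sketch below lists the nonzero self-orthogonal vectors of GF(p)^n under the standard dot product, and tries to form the real-style β = 2 / (v·v) in GF(p). The helper names are hypothetical, p is assumed to be an odd prime (over GF(2) the constant 2 is itself zero), and `pow(q, -1, p)` needs Python 3.8 or later.

```python
from itertools import product

def dot_mod(u, w, p):
    """Standard bilinear form u . w over GF(p), p prime."""
    return sum(a * b for a, b in zip(u, w)) % p

def isotropic_vectors(p, n):
    """All nonzero v in GF(p)^n with v . v == 0 (mod p)."""
    return [v for v in product(range(p), repeat=n)
            if any(v) and dot_mod(v, v, p) == 0]

def householder_beta(v, p):
    """beta = 2 / (v . v) in GF(p) for an odd prime p,
    or None when v . v == 0 and no such beta exists."""
    q = dot_mod(v, v, p)
    if q == 0:
        return None
    return (2 * pow(q, -1, p)) % p  # modular inverse of q
```

Over GF(5) in two dimensions there are eight such vectors, starting with (1, 2) and (1, 3), and `householder_beta((1, 2), 5)` returns `None`. Some reflections still exist: for v = (1, 1), β = 1 and [itex]I - v v^T[/itex] is [[0, 4], [4, 0]], which squares to the identity mod 5. What fails is the guarantee that every nonzero v has v·v ≠ 0.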
There is an efficient parallel algorithm for computing the rank of a matrix over an arbitrary field (parallel in the sense that it uses many processors but very little time on each), which in that setting performs significantly better than row reduction or Gaussian elimination. The result is due to Ketan Mulmuley and was published in 1986: http://portal.acm.org/citation.cfm?id=12164
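For comparison, the sequential baseline mentioned above is straightforward over GF(p): row reduction is exact there, so the stability concerns that favor the SVD over the reals do not arise. A minimal sketch, assuming p is prime and the matrix is given as a list of integer rows; `rank_mod_p` is a hypothetical name.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), p prime, by Gaussian elimination."""
    a = [[x % p for x in row] for row in rows]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        # find a pivot at or below row `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], -1, p)  # exact inverse in GF(p), no rounding
        a[rank] = [(x * inv) % p for x in a[rank]]
        # clear this column in every other row
        for r in range(n_rows):
            if r != rank and a[r][col] != 0:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank
```

For example, [[1, 2], [3, 4]] has rank 2 over GF(5) but rank 1 over GF(2), since its determinant is -2. This takes on the order of n^3 field operations carried out one pivot after another, which is the sequential cost the parallel algorithm improves on in the parallel-time sense.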
It is known that the determinant can be computed efficiently in parallel, and similar methods extend to all the coefficients of the characteristic polynomial of the matrix. The determinant is zero exactly when the characteristic polynomial has zero constant term. Can this be generalized to rank? That is, is the rank of an n-by-n matrix exactly n - k, where [itex]x^k[/itex] is the highest power of x dividing the characteristic polynomial? Not in general; counterexamples are easy to construct. For instance, the 2-by-2 matrix with a single 1 in the upper-right corner has characteristic polynomial [itex]x^2[/itex], so n - k = 0, yet its rank is 1.
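The counterexample can be checked mechanically. The sketch below computes the characteristic polynomial det(xI - A) over the rationals using the Faddeev-LeVerrier recurrence, a swapped-in technique chosen for brevity: it divides by 1, ..., n, so it is a characteristic-zero illustration and not the division-free parallel method alluded to above. `char_poly` and `x_multiplicity` are hypothetical names.

```python
from fractions import Fraction

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(xI - A) over Q (Faddeev-LeVerrier)."""
    n = len(a)
    A = [[Fraction(x) for x in row] for row in a]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        # c_{n-k} = -trace(A M_k) / k  (the division by k needs characteristic 0)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

def x_multiplicity(c):
    """Largest k such that x^k divides the polynomial with coefficients c."""
    k = 0
    while k < len(c) and c[k] == 0:
        k += 1
    return k
```

For the nilpotent matrix [[0, 1], [0, 0]], `char_poly` returns the coefficients of [itex]x^2[/itex], so k = 2 and n - k = 0 although the rank is 1. For the diagonal matrix [[0, 0], [0, 1]], k = 1 and n - k = 1 matches the rank, as it does for any matrix whose zero eigenvalue is semisimple.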
Mulmuley constructs a family of matrices M for which the above generalization does hold. He then shows that every matrix A can be transformed into a matrix B in M in such a way that the rank of A is easily recovered from the rank of B. Since the rank of B can be read off its characteristic polynomial, which is computable in parallel, this gives the parallel algorithm.