This is a good question (perhaps better than you realize). You really have to view it as two questions: 1) why GTOs are used in wave function methods, and 2) why GTOs are used in DFT methods.
For 1), DrDu's answer is right: using Gaussians greatly simplifies the integrals. (See for instance Appendix A of Szabo and Ostlund, where the analytical expressions for all the Hartree-Fock integrals over s-type orbitals are derived.)
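To see why, here is a minimal sketch (Python with NumPy; the exponents and centers are just illustrative values I picked): the product of two s-type Gaussians on different centers is itself a single Gaussian, so e.g. their overlap integral has a simple closed form, which we can verify against brute-force quadrature.

```python
import numpy as np

def overlap_s(alpha, A, beta, B):
    """Analytic overlap of two unnormalized s-type Gaussians
    exp(-alpha*|r-A|^2) and exp(-beta*|r-B|^2); the closed form
    follows from the Gaussian product theorem."""
    p = alpha + beta
    AB2 = np.dot(A - B, A - B)
    return (np.pi / p) ** 1.5 * np.exp(-alpha * beta / p * AB2)

def overlap_numeric(alpha, A, beta, B, n=4001, L=15.0):
    """Brute-force check: for s-type Gaussians the 3D integral
    factorizes into three 1D integrals."""
    val = 1.0
    for Ai, Bi in zip(A, B):
        t = np.linspace(-L, L, n)
        val *= np.trapz(np.exp(-alpha * (t - Ai)**2 - beta * (t - Bi)**2), t)
    return val

A = np.array([0.0, 0.0, 0.0])
B = np.array([0.0, 0.0, 1.4])   # ~ the H2 bond length in bohr
print(overlap_s(0.5, A, 0.8, B), overlap_numeric(0.5, A, 0.8, B))  # same number
```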
Showing that they form a complete set is fairly simple. Hand-wavingly, you could just remember that the spherical harmonics (which are used for the angular part) form a complete set, and that the eigenfunctions of the harmonic oscillator (Gaussians multiplied by Hermite polynomials) also form a complete set. (Or arrive at the same result by solving the three-dimensional harmonic oscillator directly.) Of course, in practice a basis set is truncated and so does not form a complete set, but there are schemes for extrapolating to the complete-basis-set (CBS) limit, such as those used in composite methods like G2. It's not a major issue anyway; the larger 'ordinary' basis sets are usually quite enough to bring the (relative) basis set error below the error of the method itself.

Note also that your basis is typically not orthonormal, so you have to enforce orthonormality through some orthogonalization procedure (e.g. canonical, symmetric, or Gram-Schmidt).
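To illustrate that last step, here is a small sketch of the symmetric (Löwdin) variant, using a made-up overlap matrix: transforming with $X = S^{-1/2}$ renders the basis orthonormal.

```python
import numpy as np

# Made-up overlap matrix S for three non-orthogonal basis functions
S = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.3],
              [0.1, 0.3, 1.0]])

# Symmetric (Loewdin) orthogonalization: X = S^(-1/2) via eigendecomposition.
# (Canonical orthogonalization instead uses X = U s^(-1/2) and drops columns
# with near-zero eigenvalues to handle linear dependence.)
s, U = np.linalg.eigh(S)
X = U @ np.diag(s ** -0.5) @ U.T

# In the transformed basis the overlap matrix becomes the identity
print(np.allclose(X.T @ S @ X, np.eye(3)))  # True
```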
I'm not sure what cgk means by "nothing more efficient is known". I agree nothing more computationally efficient is known, but GTOs are certainly not more efficient 'mathematically', i.e. in terms of the number of functions required for a given accuracy. For instance, the Slater-type orbitals that Gaussians replaced were more efficient in that respect (e.g. for a HF calculation on helium, a single STO already gets you close to the basis-set limit). It's just that the increased number of basis functions with Gaussians was more than compensated for by the faster evaluation of the integrals. From the number-of-functions standpoint, Gaussians are a stupid choice: they are smooth at $r = 0$ (zero slope, so no cusp), and they decay too quickly at large $r$ (as $e^{-\alpha r^2}$ rather than $e^{-\zeta r}$). So they fail to satisfy the few exact properties we know the true wave function/density must have. This means it always takes a relatively large number of Gaussians to get a good approximation, especially around the nuclear cusp, since many smooth functions are needed to approximate a derivative discontinuity.
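To make the cusp problem concrete, here is a sketch comparing a 1s Slater function with the standard STO-3G fit to it (exponents and contraction coefficients as tabulated in Szabo and Ostlund; I'm quoting them from memory, so treat the digits as approximate): the Slater function has a finite negative slope at the nucleus, while any combination of Gaussians has slope zero there.

```python
import numpy as np

# STO-3G expansion of a normalized 1s Slater function (zeta = 1);
# the contraction coefficients multiply *normalized* primitive Gaussians.
alphas = np.array([2.227660, 0.405771, 0.109818])
coeffs = np.array([0.154329, 0.535328, 0.444635])

def sto(r):
    """Normalized 1s Slater-type orbital, zeta = 1."""
    return np.pi ** -0.5 * np.exp(-r)

def sto3g(r):
    """STO-3G approximation: contraction of three normalized Gaussians."""
    norms = (2.0 * alphas / np.pi) ** 0.75
    return np.exp(-alphas * np.atleast_1d(r)[:, None] ** 2) @ (coeffs * norms)

h = 1e-5  # finite-difference step for the slope at the nucleus
print("slope at r=0, STO:   ", (sto(h) - sto(0.0)) / h)            # ~ -0.564
print("slope at r=0, STO-3G:", (sto3g(h)[0] - sto3g(0.0)[0]) / h)  # ~ 0
```

The STO-3G contraction also undershoots the Slater function at $r = 0$, which is why a good description of the cusp region requires piling on tight (large-exponent) Gaussians.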
2) Now, as for DFT, the situation is rather different. The function you are approximating (the density) is very similar in character to the true wave function that the basis sets were created to approximate, so in that respect GTOs remain a good choice. And since existing ab initio QC programs already had HF/SCF methods implemented, along with the basis sets, they had much of the code required to do DFT that way once it started getting popular; to begin with, it was simply the most convenient choice.
But the rationale about the integrals no longer holds as well. It applies just as well to the Kohn-Sham one- and two-electron integrals as it does to Hartree-Fock, but you also need to evaluate the exchange-correlation functional, which means integrals of the form
$$\int f(\rho_\alpha, \rho_\beta)\,\mathrm{d}V \qquad \text{and} \qquad \int f(\rho_\alpha, \rho_\beta, \nabla\rho_\alpha, \nabla\rho_\beta)\,\mathrm{d}V$$
for LSDA- and GGA-type functionals, respectively, where the functional $f$ typically depends on fractional powers such as $\rho^{4/3}$. These integrals cannot be evaluated analytically, so all DFT codes (to the best of my knowledge) have to do some amount of numerical quadrature; a toy sketch of that step follows below. At the same time you don't (always) have to evaluate the exchange integrals used in Hartree-Fock, which allows for different approaches to the Coulomb integrals. So the rationale for using GTOs in wave function methods doesn't really carry over to DFT. Which isn't to say GTOs don't work - they work very well, better even than in wave function methods, since DFT is more resistant to basis set errors. But the field is more open to other approaches. The immediately obvious idea is to bring back STOs, which was done in the ADF (Amsterdam Density Functional) program. Another is to go in the opposite direction and move to fully numerical, FEM-type basis sets, as done in e.g. DMol.
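As the promised toy example of the quadrature step (real codes use atom-centered grids, e.g. Becke's scheme; here I just use a naive radial grid and the hydrogen 1s density, for which the integral happens to have a closed form to check against), take the LDA (Dirac) exchange energy $E_x = -C_x \int \rho^{4/3}\,\mathrm{d}V$ with $C_x = \frac{3}{4}(3/\pi)^{1/3}$, evaluated as a weighted sum over grid points:

```python
import numpy as np

C_x = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)   # Dirac exchange constant

# Naive radial quadrature grid; real codes use atom-centered 3D grids
r = np.linspace(1e-6, 20.0, 20001)
w = 4.0 * np.pi * r**2 * (r[1] - r[0])      # weights incl. spherical volume element

rho = np.exp(-2.0 * r) / np.pi              # hydrogen 1s density
E_x = -C_x * np.sum(w * rho ** (4.0 / 3.0))

# For this density, \int rho^{4/3} dV = (27/64) * pi^(-1/3) exactly
E_x_exact = -C_x * (27.0 / 64.0) * np.pi ** (-1.0 / 3.0)
print(E_x, E_x_exact)                       # agree to several decimals
```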
I haven't looked at any recent benchmarks, so I can't comment on the success of these approaches, but at the very least, they're competitive. Bearing in mind that molecular DFT is relatively young (becoming practical circa 1990, I'd say) and that a lot more research effort has been spent on GTO basis sets, I wouldn't categorically state that GTOs will remain the best choice for DFT. On the other hand, absent any remarkable new developments, I think they will remain the standard for wave function methods for the foreseeable future.