First question:
Why is g large? Why is g in QED small? When you start computing these diagrams, you end up with integrals that look like they are infinite. But the theory never claims to be complete (valid at arbitrarily short distances), so we hope that we can cut off our integrals at some energy scale \Lambda. If the theory is to be predictive (renormalizable), we should be able to do calculations without knowing what the high-energy behaviour is. This isn't as bizarre as it sounds. We can do fluid dynamics because the equations are insensitive to the properties of atoms, save for one or two parameters like viscosity. In the same way, QCD is insensitive to short distances, except that we need to specify one or two parameters, like g. Since we don't have a more complete theory, we have to go out and make some measurements in the lab. In the same way, if you didn't know anything about atoms, you could only determine the density of a fluid from experimental measurement. If you knew about atoms, you could, in principle, calculate it.
Ok, now, the fun part. Renormalizable theories like QCD behave well when you scale \Lambda. What I mean by that is, if you change \Lambda, you end up with the same theory, but with a different value of g. This is reminiscent of fluids also. If you change the size of atoms, you still end up with fluid dynamics, but your viscosities and densities change. Therefore, since we don't actually know what value of \Lambda to use, we pick one arbitrarily, and go measure g in the lab. Any prediction you make shouldn't depend on which \Lambda you picked. This means that there is a relationship between g and \Lambda such that a change in \Lambda, accompanied by a compensating change in g, leaves predictions like scattering cross-sections unchanged.
Now, g is only a parameter in your theory, like viscosity. But you could measure the strength of an interaction between two particles in a lab, and call that strength g_{lab}. This isn't the same as g (because, in principle, it is the result of infinitely many Feynman diagrams instead of just one), and is called the renormalized coupling constant. However, it turns out that because g and \Lambda obey a certain differential equation that reflects the insensitivity of the theory to changes in \Lambda, the renormalized coupling obeys a similar differential equation that contains the energy scale of the *experiment*. This equation tells you how g_{lab} changes if you crank up the particle energies.
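Schematically, and glossing over details such as field rescalings, the statement above can be written as follows. The symbols P (some physical prediction), E (the energy of the experiment) and \beta are my own notation for this sketch, not something fixed by the discussion above.

```latex
% Cutoff independence of a physical prediction P(g, \Lambda):
\Lambda \frac{d}{d\Lambda} P\bigl(g(\Lambda), \Lambda\bigr)
  = \left( \Lambda \frac{\partial}{\partial \Lambda}
         + \Lambda \frac{d g}{d \Lambda} \frac{\partial}{\partial g} \right) P
  = 0 .

% The measured coupling obeys an equation of the same type,
% but in the experimental energy scale E:
E \frac{d g_{lab}}{d E} = \beta(g_{lab}) .
```

The sign of \beta decides whether g_{lab} grows or shrinks as E increases.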
Furthermore, it turns out that in QCD g_{lab} is small when energies are large, and large when energies are small. This is quite the opposite of QED, and the behaviour is very sensitive to the number of colours and flavours of quarks.
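To put rough numbers on this, here is a minimal sketch of the standard one-loop running of \alpha_s = g_{lab}^2 / 4\pi. The function names are mine. Note that `lambda_qcd_gev` is the QCD scale that appears in the one-loop solution, not the cutoff \Lambda used above, and the value 0.2 GeV is an illustrative assumption rather than a fit to data.

```python
import math


def b0(n_colours: int, n_flavours: int) -> float:
    """One-loop beta-function coefficient: (11/3) N_c - (2/3) n_f."""
    return 11.0 * n_colours / 3.0 - 2.0 * n_flavours / 3.0


def alpha_s(q_gev: float, lambda_qcd_gev: float = 0.2,
            n_colours: int = 3, n_flavours: int = 5) -> float:
    """One-loop running coupling: alpha_s(Q) = 2*pi / (b0 * ln(Q / Lambda_QCD)).

    Only meaningful for Q > Lambda_QCD and b0 > 0 (asymptotic freedom).
    lambda_qcd_gev = 0.2 is an assumed, illustrative value.
    """
    coefficient = b0(n_colours, n_flavours)
    if coefficient <= 0.0:
        raise ValueError("b0 <= 0: too many flavours, no asymptotic freedom")
    if q_gev <= lambda_qcd_gev:
        raise ValueError("one-loop formula breaks down for Q <= Lambda_QCD")
    return 2.0 * math.pi / (coefficient * math.log(q_gev / lambda_qcd_gev))


if __name__ == "__main__":
    for q in (1.0, 10.0, 100.0, 1000.0):
        print(f"Q = {q:7.1f} GeV   alpha_s = {alpha_s(q):.3f}")
    # The sign of b0 is what depends on colours and flavours.
    for n_f in (5, 16, 17):
        print(f"N_c = 3, n_f = {n_f:2d}   b0 = {b0(3, n_f):+.3f}")
```

With three colours, b0 stays positive (coupling shrinking at high energy) only for 16 or fewer quark flavours. In QED the corresponding coefficient has the opposite sign.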
For a (much) better explanation, I suggest you take a look at the field theory textbook by Zee.
Second question:
As has been said, Feynman diagrams are a way of enumerating terms in an expansion of something more complicated. The problem with g being large is not so much that the series of diagrams doesn't converge (it doesn't converge anyway), but that diagrams at every order contribute substantially, whereas in QED, for example, high-order diagrams contribute very little individually. In QED, it's the fact that you have infinitely many orders to sum that makes the series diverge. But as long as you only want an approximate answer, that's ok: you truncate the series after a few orders.
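A toy model makes the difference concrete. Assume, purely for illustration, that the n-th order term has size n! times coupling^n. This factorial growth is a common model of perturbative coefficients, not an actual QED or QCD calculation, and the function names are hypothetical.

```python
import math


def log10_term(n: int, coupling: float) -> float:
    """log10 of the toy n-th order term n! * coupling**n.

    Uses lgamma(n + 1) = ln(n!) so large n does not overflow.
    """
    return (math.lgamma(n + 1) + n * math.log(coupling)) / math.log(10.0)


def smallest_term_order(coupling: float, n_max: int = 400) -> int:
    """Order at which the toy terms stop shrinking and start growing."""
    return min(range(n_max + 1), key=lambda n: log10_term(n, coupling))


if __name__ == "__main__":
    for label, coupling in (("QED-like, 1/137", 1.0 / 137.0),
                            ("strong, 1", 1.0)):
        n_best = smallest_term_order(coupling)
        print(f"{label:16s} smallest term at order {n_best}, "
              f"log10(term) = {log10_term(n_best, coupling):.1f}")
```

For a coupling of 1/137 the toy terms keep shrinking until roughly order 137, so truncating early gives an excellent approximation. For a coupling of order one they never shrink at all, and truncation is useless.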
It is often possible to re-sum these series so that they give a finite answer. This is the case when the series is so-called Borel summable. I believe this isn't the case when the theory has instantons, in which case you have to take care of those separately.
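As an illustration of what Borel summation does, here is a sketch using Euler's series, the sum over n of (-1)^n n! x^n. This is a textbook toy, not a field theory calculation, and the helper names are mine. Dividing the n-th coefficient by n! gives the geometric series 1/(1 + x t), and integrating that against e^{-t} returns a finite value even though the original series diverges.

```python
import math


def partial_sum(x: float, n_terms: int) -> float:
    """Sum of the first n_terms terms of sum_n (-1)^n n! x^n."""
    total, term = 0.0, 1.0
    for n in range(n_terms):
        total += term
        term *= -(n + 1) * x  # next term: (-1)^(n+1) (n+1)! x^(n+1)
    return total


def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def borel_sum(x: float, t_max: float = 60.0, n: int = 60000) -> float:
    """Borel sum: integral of exp(-t) / (1 + x t) over t from 0 to infinity.

    The upper limit is cut at t_max, where exp(-t) is negligible.
    """
    return simpson(lambda t: math.exp(-t) / (1.0 + x * t), 0.0, t_max, n)


if __name__ == "__main__":
    x = 0.1
    resummed = borel_sum(x)
    print(f"Borel sum at x = {x}: {resummed:.6f}")
    for n_terms in (5, 10, 20, 40):
        error = abs(partial_sum(x, n_terms) - resummed)
        print(f"{n_terms:2d} terms: |partial sum - Borel sum| = {error:.3e}")
```

If all the terms had the same sign instead, the integrand would become 1/(1 - x t), with a pole on the integration path at t = 1/x. That is the kind of obstruction that spoils Borel summability.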
To get a sensible answer out of QCD, you would need to sum all the diagrams, or at least staggeringly many. This can be done by using tools in string theory, for example. Another possibility is taking the large-N limit, where the number of colours N is taken to be large while g^2 N is held fixed. Diagrams that can be drawn on a plane without any intersections are the ones that contribute most, and diagrams with more complicated topologies are suppressed by powers of 1/N^2. In two dimensions, this can be done exactly, as was done by 't Hooft I believe.
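For reference, the large-N counting is usually summarised as a sum over topologies. In this sketch h (the number of handles of the surface a diagram can be drawn on), F_h and \lambda are my notation, and the formula refers to vacuum diagrams of the pure gluon theory.

```latex
% 't Hooft coupling, held fixed as N -> infinity:
\lambda = g^2 N

% Vacuum diagrams organised by the number of handles h of the surface
% on which they can be drawn without crossings:
\log Z = \sum_{h=0}^{\infty} N^{2-2h} \, F_h(\lambda)
       = N^2 F_0(\lambda) + F_1(\lambda) + N^{-2} F_2(\lambda) + \dots
```

The h = 0 term collects the planar diagrams, and each extra handle costs a factor of 1/N^2.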