Let's take the simplest possible case. You've got a plane, a flat plane (no curvature). On the plane you have cartesian coordinates (x,y) and some more general coordinates. Rather than making them truly general, at this point we'll just use polar coordinates, r and ##\theta##.
Suppose you have two points on the plane, P and a nearby point Z. Note that P and Z are the same points whichever coordinate system we use - we've only changed the description of them, i.e. the coordinates.
Now we can create a vector that represents the generalized displacement from P to Z by subtracting the coordinates. In one case we will have ##\delta_x## and ##\delta_y## as components of the vector; in the other case we will have ##\delta_r## and ##\delta_\theta##.
There will be some linear transformation between the two descriptions of nearby points. If you remember the chain rule of calculus, you might write for instance
$$\delta_x = \frac{\partial x}{\partial r} \delta_r + \frac{\partial x}{\partial \theta} \delta_\theta \quad \delta_y = \frac{\partial y}{\partial r} \delta_r + \frac{\partial y}{\partial \theta} \delta_\theta$$
The important thing is that there is a linear relationship between the two descriptions. The particular notation I used assumes that you have some function ##x(r, \theta)## and another function ##y(r, \theta)##, whose partial derivatives give the coefficients of the linear relationship.
The linear part is more along the lines of an assumption, by the way - we are assuming that we always stick close enough to P that only the linear terms matter, and that we can ignore any second order terms that might exist.
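To make the linear relationship concrete, here's a minimal Python sketch for the polar case. It assumes the usual functions ##x = r \cos\theta## and ##y = r \sin\theta##; the function names and the sample point are my own choices for illustration. It compares the chain-rule estimate of the displacement with the exact difference of the cartesian coordinates.

```python
import math

def polar_to_cartesian(r, theta):
    """Assumed coordinate functions x(r, theta), y(r, theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def linearized_displacement(r, theta, d_r, d_theta):
    """Chain-rule (first order) estimate of (delta_x, delta_y)."""
    dx_dr, dx_dtheta = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dtheta = math.sin(theta), r * math.cos(theta)
    d_x = dx_dr * d_r + dx_dtheta * d_theta
    d_y = dy_dr * d_r + dy_dtheta * d_theta
    return d_x, d_y

def exact_displacement(r, theta, d_r, d_theta):
    """Exact difference of the cartesian coordinates of Z and P."""
    x0, y0 = polar_to_cartesian(r, theta)
    x1, y1 = polar_to_cartesian(r + d_r, theta + d_theta)
    return x1 - x0, y1 - y0

# Sample point P and a small displacement to Z (illustrative values only).
r, theta, d_r, d_theta = 2.0, 0.5, 1e-3, 1e-3
print(linearized_displacement(r, theta, d_r, d_theta))
print(exact_displacement(r, theta, d_r, d_theta))
```

The two results should agree up to terms that are second order in the displacement, which is exactly what the linearity assumption ignores.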
This part is a super-brief description of what we'd describe in abstract mathematical terms as the existence of a "vector space" near P. We've also implicitly specified a particular set of what are called "basis vectors" near P, namely the coordinate basis. I didn't introduce the necessary notation, so rather than talk around the issue, let me just illustrate a modernish non-tensor form of the notation that may not actually match any particular textbook you might use. My textbooks all use tensor notation, and I don't have any textbooks to consult for non-tensor notation, so I'm hoping that my possibly non-standard notation will make sense and get the point across.
##\delta_x## and ##\delta_y## are just numbers. To have a true vector, we write ##\delta_x \vec{x} + \delta_y \vec{y}##. The things with the arrows over them are the actual vectors, what I called the basis vectors. The numbers that multiply them are really scalars that scale the basis vectors.
There's a different set of basis vectors for every coordinate system: we have one set for our cartesian coordinates, another set for our polar coordinates.
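As an illustration, here is how the polar coordinate basis could be written in terms of the cartesian one. The symbols ##\vec{r}## and ##\vec{\theta}## are my own labels for the polar basis vectors (they are not introduced above), and I'm assuming the usual ##x = r \cos\theta##, ##y = r \sin\theta##.

```latex
\vec{r} = \cos\theta \, \vec{x} + \sin\theta \, \vec{y}
\qquad
\vec{\theta} = -r \sin\theta \, \vec{x} + r \cos\theta \, \vec{y}

\delta_r \vec{r} + \delta_\theta \vec{\theta}
  = \left( \cos\theta \, \delta_r - r \sin\theta \, \delta_\theta \right) \vec{x}
  + \left( \sin\theta \, \delta_r + r \cos\theta \, \delta_\theta \right) \vec{y}
```

The coefficients of ##\vec{x}## and ##\vec{y}## in the last line are just the chain-rule expressions for ##\delta_x## and ##\delta_y##, so both descriptions name the same vector. Note that ##\vec{\theta}## has length r rather than 1: coordinate basis vectors need not be unit vectors.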
Now on to the connection, where things get a bit more complicated.
Suppose we have a vector near P, and we want to map it into a parallel vector near Q. Because we are on a plane, there is a natural notion of parallelism, so given P and Q there is exactly one vector at Q that is parallel to, and has the same length as, the vector near P.
In cartesian coordinates this process is easy. The process of mapping a vector near P to a vector near Q while keeping them parallel and their length constant involves not allowing the components of the vector to change.
In polar coordinates, this simple prescription won't work. We are trying to provide a coordinate independent description of the physics, though, so we want a coordinate independent way of writing down the prescription for transporting vectors.
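To see why keeping the components fixed fails in polar coordinates, here's a small Python sketch. It assumes coordinate-basis components (not unit-normalized ones) and the usual ##x = r \cos\theta##, ##y = r \sin\theta##; the function name and the sample points are hypothetical. It takes one fixed cartesian vector and reads off its polar components at two different points.

```python
import math

def polar_components(v_x, v_y, r, theta):
    """Polar coordinate-basis components (v_r, v_theta), at the point
    (r, theta), of the vector with cartesian components (v_x, v_y).
    This inverts the chain-rule relation between the two descriptions."""
    v_r = math.cos(theta) * v_x + math.sin(theta) * v_y
    v_theta = (-math.sin(theta) * v_x + math.cos(theta) * v_y) / r
    return v_r, v_theta

# The same cartesian vector (1, 0), i.e. parallel vectors, at two points
# on the unit circle: P at theta = 0 and Q at theta = pi/2.
at_p = polar_components(1.0, 0.0, r=1.0, theta=0.0)
at_q = polar_components(1.0, 0.0, r=1.0, theta=math.pi / 2)
print(at_p)  # approximately (1, 0)
print(at_q)  # approximately (0, -1)
```

The cartesian components never changed, but the polar components did, so "don't let the components change" can't be the coordinate independent rule.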
What we do know is that if Q is near P, the desired result, what I will call the "output vector" near Q, will differ from the input vector (near P) by an amount that is a bi-linear function of the input vector and the displacement from P to Q, which (because we did things on a flat plane to keep things simple) is another vector, which we will call D.
It takes 8 numbers to write the most general possible bi-linear relation between the change in the vector and the two inputs, the vector near P and D. It looks like a rank 3 tensor (strictly speaking the connection coefficients don't transform as a tensor, but that won't matter here); if you are unfamiliar with tensors you can think of it as a sort-of 3 dimensional matrix. By bilinear, we mean that the relationship between the change and the input vector, holding D fixed, is linear, and the relationship between the change and D, holding the input vector fixed, is linear.
It takes 8 numbers because our simplified problem is 2 dimensional. If it were 4 dimensional, it'd take 4x4x4 = 64, as you've noted previously.
Those 8 numbers are the connection coefficients in two dimensions, which are conventionally written ##\Gamma^{i}{}_{jk}##. While the most general possible set of connection coefficients has 8 elements, in order to preserve distances and angles as we have described, given the metric, there is only one set of connection coefficients that will work (assuming, as is standard, that the coefficients are symmetric in the two lower indices). Knowing the metric specifies the connection coefficients, though I haven't explained why. Honestly, I'd have to think quite a bit about "why", but I know that it does.
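Here's a rough Python sketch of how the 8 numbers get used. I'm assuming the standard textbook values for polar coordinates (##\Gamma^{r}{}_{\theta\theta} = -r##, ##\Gamma^{\theta}{}_{r\theta} = \Gamma^{\theta}{}_{\theta r} = 1/r##, the rest zero), quoted rather than derived, and the first-order transport rule that the change in component i is ##-\Gamma^{i}{}_{jk} V^j D^k##. Function names and sample values are hypothetical.

```python
import math

def christoffel_polar(r):
    """The 2x2x2 array gamma[i][j][k], with index 0 = r and 1 = theta.
    Standard polar-coordinate values (assumed, not derived here)."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][1][1] = -r        # Gamma^r_{theta theta}
    gamma[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    gamma[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return gamma

def transport_step(v, r, d):
    """First-order parallel transport of polar components v = [v_r, v_theta]
    from a point at radius r through the small displacement d = [d_r, d_theta]."""
    gamma = christoffel_polar(r)
    return [
        v[i] - sum(gamma[i][j][k] * v[j] * d[k]
                   for j in range(2) for k in range(2))
        for i in range(2)
    ]

def polar_components(v_x, v_y, r, theta):
    """Polar coordinate-basis components of a fixed cartesian vector."""
    v_r = math.cos(theta) * v_x + math.sin(theta) * v_y
    v_theta = (-math.sin(theta) * v_x + math.cos(theta) * v_y) / r
    return [v_r, v_theta]

# Transport a vector from P = (2.0, 0.3) through D, then compare with the
# answer obtained by keeping the cartesian components fixed.
r, theta, d = 2.0, 0.3, [1e-3, 2e-3]
v_at_p = polar_components(1.0, 0.5, r, theta)
estimate = transport_step(v_at_p, r, d)
exact = polar_components(1.0, 0.5, r + d[0], theta + d[1])
print(estimate, exact)
```

The estimate and the exact answer should agree up to second-order terms in D, while simply copying the components from P to Q is off at first order.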
Onto the metric. Hopefully, this part is easy. The metric for the cartesian coordinates is just ##dx^2 + dy^2##. The metric for the polar coordinates is ##dr^2 + r^2 d\theta^2##. That's it for the metric! It seems short, but I'm not sure that more needs to be said, hopefully this was already familiar.
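As a quick sanity check that the two metrics measure the same length, here's a short Python sketch. It assumes ##x = r \cos\theta##, ##y = r \sin\theta## and converts a small polar displacement to cartesian components with the chain rule; the function names and sample values are mine.

```python
import math

def length_squared_cartesian(d_x, d_y):
    """Cartesian metric: dx^2 + dy^2."""
    return d_x**2 + d_y**2

def length_squared_polar(r, d_r, d_theta):
    """Polar metric: dr^2 + r^2 dtheta^2."""
    return d_r**2 + r**2 * d_theta**2

def to_cartesian_displacement(r, theta, d_r, d_theta):
    """Chain-rule conversion of a polar displacement at (r, theta)."""
    d_x = math.cos(theta) * d_r - r * math.sin(theta) * d_theta
    d_y = math.sin(theta) * d_r + r * math.cos(theta) * d_theta
    return d_x, d_y

# Illustrative point and displacement.
r, theta, d_r, d_theta = 3.0, 1.1, 0.01, -0.02
d_x, d_y = to_cartesian_displacement(r, theta, d_r, d_theta)
print(math.isclose(length_squared_cartesian(d_x, d_y),
                   length_squared_polar(r, d_r, d_theta),
                   rel_tol=1e-9))
```

The cross terms cancel when you square and add, which is where the ##r^2## in front of ##d\theta^2## comes from.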
If you want the formula for how to compute the Christoffel symbols from the metric, wiki has a brief though perhaps hard to follow description at
http://en.wikipedia.org/wiki/Christoffel_symbols that doesn't explain how it got the answer either :-).
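For what it's worth, here is the standard formula applied to the polar metric, as a worked sketch. I'm assuming the usual torsion-free (symmetric in the lower indices) connection, a sum over the repeated index l, and ##\partial_j## as shorthand for the partial derivative with respect to coordinate j.

```latex
\Gamma^{i}{}_{jk} = \frac{1}{2} g^{il}
  \left( \partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk} \right)

g_{rr} = 1, \quad g_{\theta\theta} = r^2, \quad
g^{rr} = 1, \quad g^{\theta\theta} = \frac{1}{r^2}

\Gamma^{r}{}_{\theta\theta}
  = \frac{1}{2} g^{rr} \left( -\partial_r g_{\theta\theta} \right) = -r

\Gamma^{\theta}{}_{r\theta} = \Gamma^{\theta}{}_{\theta r}
  = \frac{1}{2} g^{\theta\theta} \, \partial_r g_{\theta\theta} = \frac{1}{r}
```

All the other polar coefficients vanish. In cartesian coordinates the metric components are constant, so every derivative is zero and all 8 coefficients vanish, which matches the rule that the cartesian components don't change under parallel transport.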