Visually, if you look at a map of the level curves of f in the x,y plane, you can see that the direction of zero increase (to first order) is tangent to the level curve, i.e. it is the direction in which the function stays constant to first order. Hence, to get zero as the rate of increase in that direction, you must dot with a vector perpendicular to the level curve. So the gradient points either in the direction of greatest increase or in the direction of least increase (greatest decrease). Dotting a nonzero vector with itself gives a positive result, so the rate of increase in the direction of the gradient is positive. Therefore the gradient is perpendicular to the level curve and points in the direction in which the increase is (most) positive, as Delta2 says.
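Here is a quick numerical check of that picture. It uses a hypothetical example function f(x, y) = x^2 + 3y^2 (my choice, not from the thread) at the point (1, 1). The tangent to the level curve is taken from an explicit parametrisation of the ellipse, so it does not assume the perpendicularity we want to test. The sketch estimates rates of increase with central differences and scans many unit directions to find the steepest one.

```python
import math

# Hypothetical example function (not from the thread): f(x, y) = x^2 + 3y^2,
# whose level curves are ellipses centred at the origin.
def f(x, y):
    return x**2 + 3 * y**2

def grad_f(x, y):
    # Exact partial derivatives of the example function.
    return (2 * x, 6 * y)

def directional_derivative(x, y, u, h=1e-6):
    # Central-difference estimate of the rate of increase of f along unit vector u.
    ux, uy = u
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

p = (1.0, 1.0)
gx, gy = grad_f(*p)
norm = math.hypot(gx, gy)

# The level curve f = 4 through p, parametrised without using the gradient:
# x = 2 cos t, y = (2 / sqrt(3)) sin t, which passes through p at t = pi/3.
t0 = math.pi / 3
tx, ty = -2 * math.sin(t0), (2 / math.sqrt(3)) * math.cos(t0)
tlen = math.hypot(tx, ty)
tangent = (tx / tlen, ty / tlen)

rate_along_tangent = directional_derivative(*p, tangent)  # should be ~0
tangent_dot_grad = tangent[0] * gx + tangent[1] * gy       # should be ~0

# Scan 3600 unit directions and keep the one with the largest rate of increase.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
rates = [directional_derivative(*p, (math.cos(a), math.sin(a))) for a in angles]
best_rate = max(rates)
best_angle = angles[rates.index(best_rate)]

print("rate along level-curve tangent:", rate_along_tangent)
print("largest rate found:", best_rate, "vs |grad f| =", norm)
print("steepest angle:", best_angle, "vs gradient angle:", math.atan2(gy, gx))
```

The steepest direction found by brute force should agree with the direction of (∂f/∂x, ∂f/∂y) up to the scan resolution, and the largest rate should be close to the length of the gradient.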
What confuses me is that I tend to think of the gradient as defined by its coordinates (∂f/∂x, ∂f/∂y), as you emphasized in your original post. These coordinates actually have no intrinsic meaning: they change when we change the axes. It is just a fact that to know a vector, it suffices to know its projections onto any two independent axes. To understand the gradient, we need its definition as the vector giving the linear approximation f(p + h) = f(p) + ∇f(p)·h + (error small compared with |h|), which is what Delta2 focused on. This makes sense as soon as one has a good notion of length and angle (a dot product) in the space, even before choosing coordinates. One then introduces coordinates simply to do computations.
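To see that the pair of partials depends on the axes while the gradient vector does not, here is a small sketch. It uses a hypothetical function f(x, y) = x^2 y + sin(y) and an arbitrary rotation angle theta (both my assumptions, for illustration only). It computes the partials in the original axes and in rotated axes, then reassembles the rotated components into a vector.

```python
import math

# Hypothetical example function (not from the thread).
def f(x, y):
    return x**2 * y + math.sin(y)

def partials(func, a, b, h=1e-6):
    # Central-difference partial derivatives of func at (a, b), in func's own coordinates.
    da = (func(a + h, b) - func(a - h, b)) / (2 * h)
    db = (func(a, b + h) - func(a, b - h)) / (2 * h)
    return da, db

theta = 0.7  # arbitrary angle for a second, rotated coordinate system
c, s = math.cos(theta), math.sin(theta)

def to_xy(u, v):
    # Rotated axes: e_u = (c, s), e_v = (-s, c).
    return c * u - s * v, s * u + c * v

def g(u, v):
    # The same function, written in the rotated (u, v) coordinates.
    return f(*to_xy(u, v))

p_xy = (1.0, 2.0)
# Coordinates of the same point in the rotated system (inverse rotation).
p_uv = (c * p_xy[0] + s * p_xy[1], -s * p_xy[0] + c * p_xy[1])

fx, fy = partials(f, *p_xy)  # components along the x and y axes
gu, gv = partials(g, *p_uv)  # components along e_u and e_v: a different pair

# Reassemble gu * e_u + gv * e_v: it is the same vector as (fx, fy).
grad_from_uv = (gu * c - gv * s, gu * s + gv * c)

print("(f_x, f_y) =", (fx, fy))
print("(g_u, g_v) =", (gu, gv))
print("g_u e_u + g_v e_v =", grad_from_uv)
```

The two pairs of numbers differ, but each is just the list of projections of one and the same vector onto a particular pair of axes.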
As another argument that the partials alone don't necessarily tell you much about the rate of change of the function, recall that both partials can exist even for a function that has no gradient at all! The gradient is a vector with the approximation property Delta2 described. So even when the partials exist, the vector they define may not have that property.
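A concrete case is the standard textbook counterexample f(x, y) = x^2 y / (x^2 + y^2), with f(0, 0) = 0. This is my choice of example, not one from the thread. Both partials at the origin are 0, since f vanishes on both axes. The sketch below tests whether the vector (0, 0) they define has the approximation property along the diagonal h = (t, t). For a genuine gradient the ratio error / |h| would shrink to 0 as t shrinks.

```python
import math

def f(x, y):
    # Standard counterexample: both partials exist at the origin,
    # but f has no gradient (is not differentiable) there.
    if x == 0 and y == 0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

# Partials at the origin by difference quotients; f is 0 on both axes.
h = 1e-8
fx0 = (f(h, 0) - f(0, 0)) / h
fy0 = (f(0, h) - f(0, 0)) / h

def approx_error_ratio(t):
    # Error of the linear guess f(0,0) + (fx0, fy0)·(t, t), divided by |(t, t)|.
    linear_guess = f(0, 0) + fx0 * t + fy0 * t
    return abs(f(t, t) - linear_guess) / math.hypot(t, t)

ratios = [approx_error_ratio(10.0 ** -k) for k in range(1, 8)]
print("partials at origin:", fx0, fy0)
print("error / |h| along the diagonal:", ratios)
```

Along the diagonal f(t, t) = t/2, so the ratio stays near 1/(2√2) no matter how small t gets: the vector of partials fails the approximation property.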
To put it geometrically: the graph of a function f(x,y) of two variables is a surface in 3-space. Given a point on that surface and a direction in the x,y plane, the directional derivative in that direction exists iff the curve we get by slicing the surface with the vertical plane through that point in that direction has a (non-vertical) tangent line at the point. But even if this is true in all directions, there is no guarantee that all these tangent lines lie in the same plane. If not, there is no tangent plane and no gradient. If there is a tangent plane, the gradient of f is obtained by projecting onto the x,y plane the gradient of the function F(x,y,z) = f(x,y) - z, namely (∂f/∂x, ∂f/∂y, -1), which is perpendicular to the graph of f.
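The same counterexample f(x, y) = x^2 y / (x^2 + y^2), with f(0, 0) = 0 (again my choice, not from the thread), shows the "tangent lines not in one plane" situation. Every vertical slice through the origin is a straight line, so the slope exists in every direction. If the slices had a common tangent plane, the slope in direction (cos θ, sin θ) would be the linear combination cos θ · f_x + sin θ · f_y. The sketch compares the two at a few hypothetical sample angles.

```python
import math

def f(x, y):
    # Counterexample: every vertical slice through the origin is a straight line,
    # yet the slopes of those lines do not fit together into a tangent plane.
    if x == 0 and y == 0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

def slope_along(theta, t=1e-6):
    # Slope at the origin of the slice of the graph above the line
    # through the origin in direction (cos theta, sin theta).
    ux, uy = math.cos(theta), math.sin(theta)
    return (f(t * ux, t * uy) - f(0.0, 0.0)) / t

fx0 = slope_along(0.0)          # slope along the x-axis (partial in x)
fy0 = slope_along(math.pi / 2)  # slope along the y-axis (partial in y)

# With a tangent plane, each slope would equal cos(theta) * fx0 + sin(theta) * fy0.
mismatch = []
for theta in (math.pi / 4, math.pi / 3, 2.0):
    planar_prediction = math.cos(theta) * fx0 + math.sin(theta) * fy0
    mismatch.append(slope_along(theta) - planar_prediction)
    print(f"theta={theta:.3f}  slope={slope_along(theta):.6f}  "
          f"planar prediction={planar_prediction:.6f}")
```

Here the slope in direction θ works out to cos²θ · sin θ, which is not linear in (cos θ, sin θ). So the tangent lines of the slices do not lie in a common plane, and f has no gradient at the origin even though every directional derivative exists.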