My understanding was that
\partial^\mu \phi = g^{\mu\nu}\partial_\nu \phi
is used to define the contravariant derivative \nabla^{\mu} (\nabla rather than \partial since we're in GR, though as you've said it reduces to \partial in the case of a scalar function).
Given a differentiable manifold, we can consider one-form fields induced upon the manifold by scalar fields: given a scalar field \phi, we get a one-form field with components \partial_\mu \phi. We can also consider vectors induced by curves through the manifold: given a curve \gamma: \Re \rightarrow M taking each parameter value \lambda to a point p \in M, we get a tangent vector at each point of the curve, with components \frac{d x^\mu}{d \lambda}.
From this starting point, tensor products can be used to build tensor fields of higher valences (so in addition to the (0,1) fields - the one-forms - and the (1,0) fields - the vectors - we can get tensor fields of valence (m,n)). We then select some particular symmetric, non-degenerate (0,2) tensor field, and decide that this shall be our metric g (non-degeneracy is what guarantees that the inverse used below exists). g will take two vectors as arguments, and deliver a scalar. That is,
g(X,Y) = \chi
or if you prefer,
g_{\mu\nu} X^\mu Y^\nu = \chi
But that means that g_{\mu\nu}X^\mu has, in effect, an empty argument place which could be filled by a vector; i.e. it is something which will map vectors to scalars - in other words, a one-form. So the notation X_\nu is introduced as shorthand for g_{\mu\nu}X^\mu.
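As a concrete illustration of lowering an index, here is a quick worked case. It assumes flat spacetime with the Minkowski metric in signature (-,+,+,+), which is my choice for the example and not something fixed by the discussion above. With that assumption, lowering the index just flips the sign of the time component, and feeding in a second vector Y recovers the scalar g(X,Y):

```latex
g_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1), \qquad X^\mu = (X^0, X^1, X^2, X^3)

X_\nu = g_{\mu\nu} X^\mu = (-X^0, X^1, X^2, X^3)

X_\nu Y^\nu = -X^0 Y^0 + X^1 Y^1 + X^2 Y^2 + X^3 Y^3 = g_{\mu\nu} X^\mu Y^\nu = \chi
```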
The same trick, using the inverse of the metric (i.e. the (2,0) tensor field such that g_{\mu\nu}g^{\nu\rho} = \delta^{\rho}_{\mu}) will allow you to link any one-form (components p_{\mu}) with a particular vector (components p^\mu). In particular, the one-form field with components \partial_\mu \phi has an associated vector field \partial^\mu \phi, defined by
\partial^\mu \phi = g^{\mu\nu}\partial_\nu \phi
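To see that \partial^\mu \phi and \partial_\mu \phi really can have different components, here is a minimal numerical sketch. Everything specific in it is an assumption made for illustration: the flat plane in polar coordinates (r, \theta) with metric diag(1, r^2), the arbitrary scalar field \phi = r^2 \theta, and the helper names `raise_index` and `lower_index`, which are hypothetical and not taken from any library.

```python
# Illustrative assumption: flat 2D plane in polar coordinates (r, theta),
# where the metric components are g_{mu nu} = diag(1, r^2).

def metric(r):
    """Components g_{mu nu} in polar coordinates."""
    return [[1.0, 0.0], [0.0, r**2]]

def inverse_metric(r):
    """Components g^{mu nu}, satisfying g_{mu nu} g^{nu rho} = delta."""
    return [[1.0, 0.0], [0.0, 1.0 / r**2]]

def raise_index(g_inv, p_lower):
    """p^mu = g^{mu nu} p_nu, summing over nu."""
    n = len(p_lower)
    return [sum(g_inv[mu][nu] * p_lower[nu] for nu in range(n))
            for mu in range(n)]

def lower_index(g, x_upper):
    """X_nu = g_{mu nu} X^mu, summing over mu."""
    n = len(x_upper)
    return [sum(g[mu][nu] * x_upper[mu] for mu in range(n))
            for nu in range(n)]

def d_phi(r, theta):
    """One-form components (d phi/dr, d phi/dtheta) for phi = r^2 * theta."""
    return [2.0 * r * theta, r**2]

r, theta = 2.0, 0.5
grad_lower = d_phi(r, theta)                              # partial_mu phi
grad_upper = raise_index(inverse_metric(r), grad_lower)   # partial^mu phi
round_trip = lower_index(metric(r), grad_upper)           # back to partial_mu phi

print(grad_lower)   # [2.0, 4.0]
print(grad_upper)   # [2.0, 1.0]
print(round_trip)   # [2.0, 4.0]
```

The r components agree because g^{rr} = 1, while the \theta component is rescaled by 1/r^2. Lowering the index again with g_{\mu\nu} returns the original one-form components, as expected from g_{\mu\nu}g^{\nu\rho} = \delta^{\rho}_{\mu}.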