There are two different meanings of "gradient". The difference between them is glossed over when you're dealing with Cartesian coordinates, but they need to be kept distinct when you're dealing with general coordinates and coordinate changes.
Most fundamentally,
- Let ##\phi(\vec{r})## be a scalar field---a function that returns a scalar (a real or complex number) for each point in space.
- Let ##\vec{r}(t)## be a path through space as a function of a real-valued parameter ##t## (this doesn't have to be time, it could be any quantity that increases continuously along the path).
- Let ##\overrightarrow{V}(t)## be the corresponding "velocity vector", ##\overrightarrow{V} = \frac{d \vec{r}}{dt}##.
Then we can define the "directional derivative" of ##\phi## along vector ##\overrightarrow{V}## to be the rate of change of ##\phi## along the path ##\vec{r}(t)##:
Directional derivative: ##\nabla_{\overrightarrow{V}}(\phi) \equiv \frac{d}{dt} \phi(\vec{r}(t))##
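As a rough numerical illustration of this definition, the sketch below estimates ##\frac{d}{dt} \phi(\vec{r}(t))## by finite differences along two different paths that pass through the same point with the same velocity vector. The field `phi` and the paths `path_a` and `path_b` are made-up examples, not anything from the discussion above. The two rates should agree, which is why the notation ##\nabla_{\overrightarrow{V}}(\phi)## only needs to mention ##\overrightarrow{V}## and not the whole path.

```python
import math

def phi(x, y):
    # Hypothetical scalar field, chosen only for illustration
    return x**2 * y + math.sin(y)

def path_a(t):
    # Straight line through (1, 2) at t = 0 with velocity (3, -1)
    return (1.0 + 3.0 * t, 2.0 - 1.0 * t)

def path_b(t):
    # Curved path through the same point with the same velocity at t = 0
    return (1.0 + 3.0 * t + 5.0 * t**2, 2.0 - 1.0 * t - 4.0 * t**2)

def rate_along(path, t=0.0, h=1e-5):
    # Central finite-difference estimate of d/dt phi(r(t))
    x_hi, y_hi = path(t + h)
    x_lo, y_lo = path(t - h)
    return (phi(x_hi, y_hi) - phi(x_lo, y_lo)) / (2.0 * h)

rate_a = rate_along(path_a)
rate_b = rate_along(path_b)
print(rate_a, rate_b)  # the two estimates should agree to within numerical error
```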
In terms of the directional derivative, we can define two different mathematical objects that might be called the "gradient" of ##\phi##:
Covector gradient: Define ##(\nabla \phi)## to be an operator that takes a vector ##\overrightarrow{V}## and returns the directional derivative ##\nabla_{\overrightarrow{V}}(\phi)##
Vector gradient: Define ##\overrightarrow{\nabla \phi}## to be the vector such that ##\overrightarrow{\nabla \phi} \cdot \overrightarrow{V} = \nabla_{\overrightarrow{V}}(\phi)## for every vector ##\overrightarrow{V}##
The components of the covector gradient are ##(\nabla \phi)_j = \frac{\partial \phi}{\partial x^j}##. The components of the vector gradient are ##(\overrightarrow{\nabla \phi})^j = \sum_{i} g^{ij} \frac{\partial \phi}{\partial x^i}##, where ##g^{ij}## are the components of the inverse metric tensor. In Cartesian coordinates, ##g^{ij} = 0## unless ##i=j##, and ##g^{jj} = 1##, so the two gradients have identical components and the distinction is easy to overlook. Under a change of coordinates, however, the two sets of components transform differently: the covector components transform like the partial derivatives ##\frac{\partial \phi}{\partial x^j}## (covariantly), while the vector components transform like the velocity components ##V^j = \frac{dx^j}{dt}## (contravariantly).
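To see the two sets of components come apart, here is a minimal sketch in plane polar coordinates ##(r, \theta)##, assuming the standard polar metric ##g_{ij} = \mathrm{diag}(1, r^2)##, so that ##g^{ij} = \mathrm{diag}(1, 1/r^2)##. The field `phi_polar`, the sample point, and the vector `V` are arbitrary choices made for illustration. The ##\theta## components of the two gradients differ by a factor of ##r^2##, yet applying the covector to ##\overrightarrow{V}## and taking the dot product of the vector gradient with ##\overrightarrow{V}## should give the same number.

```python
import math

def phi_polar(r, theta):
    # Hypothetical scalar field written in polar coordinates: phi = r^2 sin(theta)
    return r**2 * math.sin(theta)

def partials(f, q, h=1e-6):
    # Central finite differences for the partial derivatives of f at the point q
    out = []
    for j in range(len(q)):
        hi = list(q)
        lo = list(q)
        hi[j] += h
        lo[j] -= h
        out.append((f(*hi) - f(*lo)) / (2.0 * h))
    return out

r, theta = 2.0, 0.5

# Covector gradient components: (d phi/dr, d phi/dtheta)
covector = partials(phi_polar, (r, theta))

# Assumed polar metric g_ij = diag(1, r^2) and its inverse g^ij = diag(1, 1/r^2)
g_lower = [[1.0, 0.0], [0.0, r**2]]
g_upper = [[1.0, 0.0], [0.0, 1.0 / r**2]]

# Vector gradient components: raise the index with the inverse metric
vector = [sum(g_upper[j][i] * covector[i] for i in range(2)) for j in range(2)]

# Arbitrary vector with components (dr/dt, dtheta/dt)
V = (0.3, -0.7)

# Covector applied to V: sum_j (d phi/dx^j) V^j
pairing = sum(covector[j] * V[j] for j in range(2))

# Dot product of the vector gradient with V: sum_ij g_ij (grad phi)^i V^j
dot = sum(g_lower[i][j] * vector[i] * V[j] for i in range(2) for j in range(2))

print(covector, vector)  # theta components differ by a factor of r^2
print(pairing, dot)      # both give the same directional derivative
```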