You may want to consider what the directional derivative is doing geometrically: it finds the rate of change of a function along a particular direction vector.
Usually you take derivatives along the standard basis vectors, i.e. the x, y, z directions (or e1, e2, e3), but you don't have to do it this way.
You can actually find the derivative along an arbitrary vector, and this means you have to find the "projection" of the function's gradient onto that vector.
This involves what is called an inner product, usually denoted <v,ex>, which gives the component of v along the basis vector ex.
Finding this component amounts to discarding the part perpendicular to that vector, and normalizing the vector you project onto (if it needs normalization). You don't typically normalize the tangent vector, but you do typically normalize the vector you are projecting onto.
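To make the projection step concrete, here is a minimal sketch in plain Python. The helper names (inner, normalize, component_along, split) are made up for illustration, and it assumes the ordinary Euclidean inner product.

```python
import math

def inner(a, b):
    """Euclidean inner product <a, b> of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(u):
    """Scale u to unit length (u must be nonzero)."""
    length = math.sqrt(inner(u, u))
    return [x / length for x in u]

def component_along(v, u):
    """Scalar component of v along u; only u is normalized, not v."""
    return inner(v, normalize(u))

def split(v, u):
    """Split v into its part parallel to u and its part perpendicular to u."""
    u_hat = normalize(u)
    c = inner(v, u_hat)
    parallel = [c * x for x in u_hat]
    # Whatever is left after removing the parallel part is perpendicular to u.
    perpendicular = [a - b for a, b in zip(v, parallel)]
    return parallel, perpendicular

v = [3.0, 4.0, 0.0]
u = [2.0, 0.0, 0.0]   # not a unit vector, so it gets normalized first
print(component_along(v, u))
print(split(v, u))
```

Note that the length of u does not affect the result, which is why you normalize the vector you project onto.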
After this it's a matter of taking the appropriate limit, and you will have the definition of grad_v(f), the derivative of f along v.
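Written out, the limit is grad_v(f)(p) = lim h->0 of [f(p + h v) - f(p)] / h, and for a differentiable f it equals the inner product <grad f(p), v>. The sketch below checks this numerically. The function f(x, y) = x^2 y is an arbitrary example of mine, the names are hypothetical, and I use a central difference with a small fixed h instead of the one-sided limit because it is more accurate numerically.

```python
def inner(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))

def f(p):
    """Example function f(x, y) = x^2 * y (chosen only for illustration)."""
    x, y = p
    return x * x * y

def grad_f(p):
    """Gradient of the example function, worked out by hand."""
    x, y = p
    return [2 * x * y, x * x]

def directional_derivative(func, p, v, h=1e-6):
    """Approximate the limit of (f(p + h v) - f(p)) / h with a central difference."""
    forward = [pi + h * vi for pi, vi in zip(p, v)]
    backward = [pi - h * vi for pi, vi in zip(p, v)]
    return (func(forward) - func(backward)) / (2 * h)

p = [1.0, 2.0]
v = [0.6, 0.8]   # already a unit vector: (3, 4) / 5

numeric = directional_derivative(f, p, v)
exact = inner(grad_f(p), v)   # <grad f(p), v>
print(numeric, exact)         # the two values should agree closely
```

If you pass in v = [1, 0] or v = [0, 1] you get back the ordinary partial derivatives, which is the sense in which the usual derivatives are just the special case of the standard basis directions.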
The limit involves a subtraction (a difference in f) for each component with respect to a particular variable, and what I've mentioned in this post puts that into further context.