Whenever one has a sufficiently smooth vector field, ##Y##, there is an associated "flow". One imagines dropping a leaf in a stream at the point ##p## and watching it flow downstream along a curve that passes through ##p##. This curve, ##c(t)##, is called the characteristic through ##p##, and the velocity of the leaf after a time ##t## is the vector ##Y_{c(t)}##. If one starts with a domain and watches the flow of all of the points in the domain, then for each time ##t## the domain is mapped onto another domain. One can represent this map by a function ##ψ##, where ##ψ(p,t)## is the point reached along the characteristic starting at ##p## after an elapsed time ##t##.
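To make the flow map concrete, here is a minimal numerical sketch. It assumes a vector field on the plane given as a Python function and approximates ##ψ(p,t)## with a standard fourth-order Runge-Kutta integrator. The names `flow` and `rotation_field` and the step count are illustrative choices of mine, not anything from your book.

```python
import math

def flow(field, p, t, steps=1000):
    """Approximate psi(p, t): follow `field` from the point p for time t (RK4)."""
    h = t / steps
    x = list(p)
    for _ in range(steps):
        k1 = field(x)
        k2 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return tuple(x)

def rotation_field(q):
    """Hypothetical example field Y = -y d/dx + x d/dy on the plane."""
    x, y = q
    return (-y, x)

# The exact flow of this field rotates p about the origin by the angle t,
# so a quarter turn should carry (1, 0) close to (0, 1).
end = flow(rotation_field, (1.0, 0.0), math.pi / 2)
print(end)
```

For this particular field the characteristics are circles about the origin, so the result can be compared with the exact answer.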
For two vector fields, ##X## and ##Y##, there are two mapping functions: ##φ## for the flow of ##X## and ##ψ## for the flow of ##Y##. If one moves along the characteristic through ##p## associated to ##Y## for a small time increment ##t##, one arrives at the point ##ψ(p,t)##. Then flowing from this point along the flow associated to ##X## for a time increment ##s##, one arrives at the point ##φ(ψ(p,t),s)##. This is what your book is referring to in the calculation that you posted.
If ##f## is a function, then its value at ##φ(ψ(p,t),s)## may be approximated by the first-order Taylor expansion about the point ##ψ(p,t)## along the characteristic associated to ##X## through ##ψ(p,t)##. This is
##f(φ(ψ(p,t),s)) ≈ f(ψ(p,t)) + sX_{ψ(p,t)}.f ##
- Here ##X_{ψ(p,t)}.f## is the derivative of ##f## at ##ψ(p,t)## along the characteristic through ##ψ(p,t)## that is tangent to ##X##.
Next, approximate each of these two terms to first order in ##t## along the characteristic of ##Y## through ##p## to get
##f(φ(ψ(p,t),s)) ≈ f(ψ(p,0)) + tY_{ψ(p,0)}.f + sX_{ψ(p,0)}.f + stY_{ψ(p,0)}.X_{ψ(p,t)}.f##.
- Here ##f(ψ(p,t))## and ##X_{ψ(p,t)}.f## are regarded as functions of ##t## along the characteristic through ##p## that is tangent to ##Y##. Since ##ψ(p,0)=p##, the last term is ##stY_p.(X.f)##.
Similarly, one approximates ##f(ψ(φ(p,s),t))## to get ##f(φ(p,0)) + tY_{φ(p,0)}.f + sX_{φ(p,0)}.f + stX_{φ(p,0)}.Y_{φ(p,s)}.f##. Since ##φ(p,0)=p##, the last term is ##stX_p.(Y.f)##. Note that ##ψ(φ(p,s),t)## and ##φ(ψ(p,t),s)## are generally different points.
The difference ##f(ψ(φ(p,s),t)) - f(φ(ψ(p,t),s))## between the values of ##f## at these two points is approximately
##st(X_p.(Y.f) - Y_p.(X.f))##.
This is ##st[X,Y]_{p}.f##.
I think that this is what your book means when it says that the difference in the values of the coordinate functions is approximately proportional to the bracket of the two vector fields.
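As a sanity check on the approximation ##f(ψ(φ(p,s),t)) - f(φ(ψ(p,t),s)) ≈ st[X,Y]_{p}.f##, here is a small sketch with a pair of fields whose flows can be written in closed form. The fields ##X=∂/∂x## and ##Y=x∂/∂y##, the test function `f`, and the point `p` are hypothetical examples I chose; for these fields ##[X,Y]=∂/∂y##.

```python
import math

def phi(p, s):
    """Flow of the example field X = d/dx: translate by s in x."""
    x, y = p
    return (x + s, y)

def psi(p, t):
    """Flow of the example field Y = x d/dy: shear in y at rate x."""
    x, y = p
    return (x, y + t * x)

def f(p):
    """Arbitrary smooth test function (an assumption for illustration)."""
    x, y = p
    return math.sin(y) + x * y

def bracket_f(p):
    """[X,Y].f at p; here [X,Y] = d/dy, so this is df/dy = cos(y) + x."""
    x, y = p
    return math.cos(y) + x

def commutator_ratio(p, s, t):
    """(f(psi(phi(p,s),t)) - f(phi(psi(p,t),s))) / (s*t)."""
    return (f(psi(phi(p, s), t)) - f(phi(psi(p, t), s))) / (s * t)

p = (0.3, 0.7)
for eps in (1e-1, 1e-2, 1e-3):
    # The ratio should approach [X,Y]_p.f as the time increments shrink.
    print(eps, commutator_ratio(p, eps, eps), bracket_f(p))
```

The printed ratio approaches the bracket value as `eps` shrinks, with an error of first order in `eps` for this example.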
In general, the operation ##f→X.Y.f## at a point on a manifold is not a tangent vector. If ##x_i## is a local coordinate system and ##Y=Σ_ia_i∂/∂x_i## and ##X=Σ_ib_i∂/∂x_i##, then ##X.Y.f=Σ_{i,j}(b_j(∂a_i/∂x_j)(∂f/∂x_i)+b_ja_i∂^2f/∂x_j∂x_i)##. So ##X.Y.f## involves second order partial derivatives.
However, in the operation ##[X,Y].f=X.Y.f−Y.X.f##, the second order terms cancel because mixed partial derivatives commute. One is left with ##Σ_{i,j}(b_j∂a_i/∂x_j−a_j∂b_i/∂x_j)∂f/∂x_i##, so ##[X,Y]## is the tangent vector ##Σ_{i,j}(b_j∂a_i/∂x_j−a_j∂b_i/∂x_j)∂/∂x_i##. (Note that one does not need to check whether this expression transforms consistently under a change of coordinates, because it is already defined intrinsically in terms of directional derivatives with respect to ##X## and ##Y##.)
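The coordinate formula ##Σ_{i,j}(b_j∂a_i/∂x_j−a_j∂b_i/∂x_j)∂/∂x_i## can be evaluated numerically. The sketch below is only an illustration: it assumes the fields are given as Python functions returning their components and replaces the partial derivatives with central differences. The helper name `bracket`, the step `h`, and the example fields are my own choices.

```python
def bracket(X, Y, p, h=1e-5):
    """Components of [X,Y] at p from sum_j (b_j da_i/dx_j - a_j db_i/dx_j),
    where b = X(p) and a = Y(p); partials use central differences."""
    n = len(p)
    b, a = X(p), Y(p)

    def partial(field, j):
        # Central-difference estimate of d(field components)/dx_j at p.
        up = list(p)
        dn = list(p)
        up[j] += h
        dn[j] -= h
        return [(u - d) / (2 * h) for u, d in zip(field(up), field(dn))]

    dY = [partial(Y, j) for j in range(n)]  # dY[j][i] ~ da_i/dx_j
    dX = [partial(X, j) for j in range(n)]  # dX[j][i] ~ db_i/dx_j
    return tuple(sum(b[j] * dY[j][i] - a[j] * dX[j][i] for j in range(n))
                 for i in range(n))

def X(q):
    """Hypothetical example field X = d/dx."""
    return (1.0, 0.0)

def Y(q):
    """Hypothetical example field Y = x d/dy."""
    return (0.0, q[0])

# For these fields the formula gives [X,Y] = d/dy, i.e. components near (0, 1).
print(bracket(X, Y, (0.3, 0.7)))
```

Swapping the arguments flips the sign, as the antisymmetry ##[Y,X]=−[X,Y]## requires.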
One can also see from the coordinate formula for ##[X,Y]## that it varies smoothly from point to point and so is a vector field.
* A well-written reference for the fundamentals of ordinary differential equations is Lectures on Ordinary Differential Equations by Hurewicz. Here is a link to the PDF file:
http://www.staff.science.uu.nl/~caval101/homepage/Differential_geometry_2011_files/Hurewicz.pdf