pervect said:
And in GR, we can take advantage of the metric compatibility condition that says that if we parallel transport two (or more) vectors along a curve, the angle between the vectors doesn't change.
This is true for any metric-compatible connection, since it is a direct consequence of metric compatibility, i.e. ##\nabla_Z g = 0## for all ##Z##. In particular, if ##X## and ##Y## are parallel along a curve ##\gamma##, then
$$
\frac{d(g(X,Y))}{ds} = \nabla_{\dot\gamma} \bigl(g(X,Y)\bigr) = (\nabla_{\dot\gamma} g)(X,Y) + g(\nabla_{\dot\gamma} X,Y) + g(X,\nabla_{\dot\gamma} Y)
= 0 + 0 + 0,
$$
where ##\dot\gamma## is the tangent vector of the curve. The first term vanishes due to metric compatibility and the last two because ##X## and ##Y## are parallel transported along ##\gamma##. Hence, parallel transport using a metric-compatible connection preserves the inner product between vectors, and therefore also their lengths and the angle between them. In GR we typically use the Levi-Civita connection, which apart from being metric compatible is also torsion free.
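As a numerical sanity check of the statement above, here is a minimal sketch. It assumes the unit 2-sphere with metric ##ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2## and its Levi-Civita connection, an example chosen purely for illustration and not taken from the thread. Two vectors are transported once around a circle of constant ##\theta## with a hand-written RK4 integrator, and ##g(X,Y)## is compared before and after. All function names are hypothetical.

```python
import math

def transport_rhs(theta0, v):
    """Parallel transport equations along the curve theta = theta0, phi = s.

    Uses the unit-sphere Christoffel symbols (assumed example):
      Gamma^theta_{phi phi} = -sin(theta) cos(theta)
      Gamma^phi_{theta phi} = cot(theta)
    so that dV^a/ds = -Gamma^a_{phi c} V^c.
    """
    v_theta, v_phi = v
    s, c = math.sin(theta0), math.cos(theta0)
    return (s * c * v_phi, -(c / s) * v_theta)

def parallel_transport(theta0, v, s_end, steps=2000):
    """Integrate the transport equations from s = 0 to s = s_end with RK4."""
    h = s_end / steps
    for _ in range(steps):
        k1 = transport_rhs(theta0, v)
        k2 = transport_rhs(theta0, (v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = transport_rhs(theta0, (v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = transport_rhs(theta0, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def inner(theta0, x, y):
    """g(x, y) in the coordinate basis: g_theta_theta = 1, g_phi_phi = sin^2(theta)."""
    return x[0] * y[0] + math.sin(theta0) ** 2 * x[1] * y[1]

theta0 = 1.0                       # colatitude of the circle (arbitrary choice)
X0, Y0 = (1.0, 0.5), (-0.3, 2.0)   # arbitrary initial components (V^theta, V^phi)
X1 = parallel_transport(theta0, X0, 2 * math.pi)
Y1 = parallel_transport(theta0, Y0, 2 * math.pi)
print(inner(theta0, X0, Y0), inner(theta0, X1, Y1))
```

The two printed values should agree to within the integrator's accuracy, even though ##X## and ##Y## themselves generally do not return to their initial components after going around the circle.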
pervect said:
I believe it's possible to convince oneself by drawing some diagrams to relate the sum of the exterior angle of the triangles to the amount of rotation of a vector parallel transported around the geodesic triangle.
This is true only in two dimensions. Along a geodesic, a parallel-transported vector keeps a fixed angle relative to the tangent vector. In two dimensions this determines the vector, but in more than two dimensions the vector can in addition rotate around the tangent vector of the geodesic, which will generally change the final rotation.
I also do not see how these statements, or the one in #4, give an answer to the question in the OP:
t_r_theta_phi said:
Is there a way to understand why these two derivations are related? In other words, is there an intuitive way to get to the commutator definition of the Riemann tensor directly from the idea of parallel transport?
I.e., the OP wants to know why the definition of the Riemann tensor in terms of the connection
$$
R(X,Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z
$$
is intuitively related to the change in ##Z## when it is parallel transported around a loop. Any explanation of that must start from the interpretation of ##\nabla_X Z## as the rate of change of ##Z## along ##X## measured against parallel transport: the difference between ##Z## at a nearby point and the vector you would obtain by parallel transporting ##Z## there from the original point, divided by the separation, in the limit of small separation. In addition, the definition of the Riemann tensor, and therefore also its geometrical interpretation, is completely independent of the existence of a metric, as it only involves the connection that is imposed on the manifold.
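A sketch of how this can be written out, in notation that is an assumption here and not taken from the thread: let ##\gamma## be an integral curve of ##X## with ##\gamma(0) = p##, and let ##P_t## denote parallel transport along ##\gamma## from ##\gamma(t)## back to ##p##. Then
$$
(\nabla_X Z)_p = \lim_{t \to 0} \frac{P_t Z_{\gamma(t)} - Z_p}{t}.
$$
For commuting fields, ##[X,Y] = 0## (for example coordinate basis fields), the last term in the definition of ##R(X,Y)Z## drops out. Transporting ##Z## around the small parallelogram obtained by flowing a parameter distance ##\epsilon## along ##X## and ##\delta## along ##Y## then changes it by
$$
\Delta Z = \pm\, \epsilon \delta \, R(X,Y)Z + \text{(terms of third order in } \epsilon, \delta\text{)},
$$
where the overall sign depends on the orientation in which the loop is traversed. Only ##\nabla## and the parallel transport it defines appear in these expressions. No metric is needed.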