[f(g(x))]' = f'[g(x)]·g'(x)
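As a concrete instance of the formula, take for illustration g(x) = x^2 and f(u) = sin u (these functions are my choice, not part of the argument):

```latex
\frac{d}{dx}\sin(x^2) = f'(g(x))\,g'(x) = \cos(x^2)\cdot 2x,
\qquad \text{e.g. at } x = 1:\quad 2\cos 1 \approx 1.081
```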
Setting formalities aside: on a piece of paper, draw rectangular axes. The top-right quadrant will represent g (taking, for example, positive x and positive g(x)). Sketch a smooth curve in that quadrant, with x on the horizontal axis and g(x) on the vertical axis.
Now turn the paper 90° clockwise. The g axis is still the g axis, but it is now horizontal. In the quadrant that is now top-right, draw another smooth curve; it represents f. The axis that is now vertical (think of it as a separate graph) gives the value of f at each point of the horizontal g axis, i.e. it shows f(g).
Now turn the paper back to its original orientation. Start from a point x, draw a vertical line up to the first curve, and from that meeting point draw a horizontal line across to the second curve. Where it meets the second curve you can read off f[g(x)] (turning the paper again to read it). Then do the same for a point slightly to the right of x, at x + dx. You get a thin strip of width dx (think of it as infinitesimal), and it produces a second thin strip of width g'(x)·dx, because the slope of the first curve at x is g'(x). That second strip crosses the second curve where its slope is f'[g(x)], so, rotating the paper again, you should be able to see that it corresponds to an increase in f of df = d[f(g(x))] = f'[g(x)]·g'(x)·dx. Dividing by dx gives [f(g(x))]' = f'[g(x)]·g'(x).
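The strip construction can be imitated numerically. The sketch below is only an illustration under assumed example functions (g(x) = x^2 and f(u) = sin u, with their derivatives written out by hand; none of these come from the argument itself). For a small dx it measures the strip on the g axis, dg, and the resulting change in f, df, and compares df with the linear estimate f'(g(x))·g'(x)·dx. The ratio should approach 1 as dx shrinks.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def g(x):
    return x * x

def g_prime(x):
    return 2 * x

def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def trace_strips(x, dx):
    """Follow the paper construction from x and from x + dx.

    Returns (dg, df, estimate):
      dg       -- width of the strip on the g axis, g(x + dx) - g(x)
      df       -- change in f read off the second curve
      estimate -- the chain-rule prediction f'(g(x)) * g'(x) * dx
    """
    dg = g(x + dx) - g(x)
    df = f(g(x + dx)) - f(g(x))
    estimate = f_prime(g(x)) * g_prime(x) * dx
    return dg, df, estimate

if __name__ == "__main__":
    x = 1.0
    for dx in (1e-1, 1e-2, 1e-3, 1e-4):
        dg, df, est = trace_strips(x, dx)
        # dg/dx should approach g'(x); df/est should approach 1.
        print(f"dx={dx:g}  dg/dx={dg / dx:.6f}  df/estimate={df / est:.6f}")
```

The error in the ratio shrinks roughly in proportion to dx, which is the numerical counterpart of "thin strips" becoming infinitesimal.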
This is much clearer when you actually draw it than it sounds in words.
I ought to draw a figure; maybe I will. My point is that this should be obvious: if it is only something you learn and can demonstrate by a formal procedure, then in my opinion it is not understood.