Given a function f(x), a standard way to minimize it is to set its derivative to zero and solve for x. In some cases, however, gradient descent is used instead. Compared with the first approach (call it 'method I'), which sets the derivative to zero and solves for x in a single step, gradient descent takes multiple steps.
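To make the two approaches concrete, here is a minimal sketch on a toy function, f(x) = (x - 3)^2, which I picked only for illustration. The names `f_prime` and `gradient_descent`, the step size, the starting point, and the number of steps are all my own assumptions, not part of any particular library.

```python
# Toy example (assumed for illustration): f(x) = (x - 3)^2

def f_prime(x):
    """Derivative of f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

# Method I: solve f'(x) = 0 by hand.
# 2 * (x - 3) = 0  gives  x = 3, obtained in one algebraic step.
x_closed_form = 3.0

def gradient_descent(grad, x0, step_size=0.1, num_steps=200):
    """Plain gradient descent: repeatedly step against the derivative."""
    x = x0
    for _ in range(num_steps):
        x = x - step_size * grad(x)
    return x

# Gradient descent: many small steps, starting from an arbitrary guess.
x_iterative = gradient_descent(f_prime, x0=0.0)

print(x_closed_form, x_iterative)
```

For this particular f both approaches should land on essentially the same x; the only difference is that method I needs the equation f'(x) = 0 to be solved algebraically, while gradient descent only needs f'(x) to be evaluated.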

Why can't one always use 'method I' for minimization? Could you give an example that illustrates the difficulty of applying 'method I'?

# Use of a derivative or a gradient to minimize a function

