By the way, the basic facts in the subject you are studying are these:
1) know the definition of the total derivative (differential) as a linear transformation.
2) know the definition of the directional derivative, and know that the directional derivative can exist in every direction while the function still may not have a total derivative,
3) know that if the total derivative does exist, then the directional derivative in every direction does also, and in fact the directional derivative in the direction v is just the value of the total derivative (as linear map) evaluated at v.
4) know that if the directional derivatives in the axis directions (partial derivatives) exist on a whole open neighborhood of the point, and if the partial derivatives are continuous at the point, then the total derivative also exists there and is represented by the matrix of partial derivatives.
The moral is the total derivative is the conceptual version of the derivative and the partial derivatives are a computational tool enabling you to calculate the total derivative.
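To make items 2 and 3 concrete, here is a standard sort of example (the particular function is just my choice for illustration). Every directional derivative exists at the origin, yet the function is not even continuous there, so it cannot have a total derivative.

```latex
\[
f(x,y) = \frac{x^2 y}{x^4 + y^2} \quad \text{for } (x,y) \neq (0,0), \qquad f(0,0) = 0.
\]
% Directional derivative at the origin in the direction v = (a,b):
\[
D_v f(0,0) = \lim_{t \to 0} \frac{f(ta, tb)}{t}
           = \lim_{t \to 0} \frac{a^2 b}{t^2 a^4 + b^2}
           = \begin{cases} a^2/b, & b \neq 0, \\ 0, & b = 0. \end{cases}
\]
% Along the parabola y = x^2 the function is constant and nonzero:
\[
f(x, x^2) = \frac{x^4}{x^4 + x^4} = \frac{1}{2} \quad \text{for } x \neq 0.
\]
```

So f does not tend to f(0,0) = 0 along the parabola. Note also that v -> D_v f(0,0) is not linear in v (it gives 0 on (1,0) and on (0,1), but 1 on (1,1)), which it would have to be by item 3 if the total derivative existed.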
5) The chain rule says that the total derivative of a composition of two differentiable maps is the composition of their total derivatives, as linear maps, and its matrix of partials is the matrix product of the matrices of partials of the two component mappings.
6) The gradient is nothing but the derivative in the special case of a real valued function, hence is represented by a one rowed matrix of partials.
7) In case a differentiable real valued function has a local extremum when restricted to a curve or line in its domain, the derivative will equal zero in that direction. If there is a local extremum for the function itself at the point, then all partials and all directional derivatives will be zero there, and thus the gradient matrix, or gradient vector, will equal zero there.
8) A continuous function defined on a closed bounded set will have a global maximum and minimum there, either on the boundary of the set, or at an interior point where the gradient is either zero or does not exist.
9) E.g. to find the max of a function defined on a disc, one checks the points inside the disc where the gradient is zero, and then one restricts the function to the boundary circle, reparametrizing it by polar coordinates, and again finds the points where the one variable derivative is zero.
10) Another way to approach the previous problem of examining the values on the boundary is to find the gradient of the original function and look for points of the boundary where that gradient is perpendicular to the boundary, i.e. parallel to the radius of the boundary circle.
11) More generally, the explanation for the trick mentioned in item 10 is called Lagrange's method. It is the trivial observation that the derivative of the restriction of the map is the restriction of the derivative. Thus the derivative of the restriction of f to the boundary circle is the restriction to the boundary circle of the gradient. Now the gradient acts by dot product, hence its restriction to the boundary circle is zero precisely when it is perpendicular to the boundary circle. In particular the normal, or perpendicular, vector to the surface or curve defined by g=0 is the gradient of g, since the derivative of g must be zero when restricted to a set where g is constant.
To repeat the basic fact: the gradient of g is perpendicular to the set where g is constant.
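A quick check of that basic fact on the unit circle, as a sketch (the choice of g and the parametrization c(t) are mine, for illustration only): differentiate g along a curve that stays inside the set g=0, and the chain rule does the rest.

```latex
\[
g(x,y) = x^2 + y^2 - 1, \qquad c(t) = (\cos t, \sin t), \qquad g(c(t)) = 0 \text{ for all } t.
\]
% Differentiating g(c(t)) = 0 by the chain rule:
\[
0 = \frac{d}{dt}\, g(c(t)) = \nabla g(c(t)) \cdot c'(t)
  = (2\cos t,\ 2\sin t) \cdot (-\sin t,\ \cos t)
  = -2\cos t \sin t + 2 \sin t \cos t .
\]
```

Here c'(t) is tangent to the circle, and grad g = (2x, 2y) points along the radius, so the gradient is perpendicular to the tangent direction at every point of the circle.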
12) By the previous reasoning, the method of Lagrange is often taught in the following mindless way: to maximize the restriction of a function f to the surface or curve defined by g = 0, find the points of g=0 where the gradient of f is perpendicular to the surface g=0. I.e. find the points of g=0 where the gradient of f is parallel to the gradient of g.
I.e. solve g=0 and grad f = c.grad g, for some scalar c, called the "Lagrange multiplier". This gives a name, the multiplier, to the least meaningful element of the whole procedure, namely the unimportant scalar c. It is very difficult to give good example questions for this subject, and most of them are solvable without finding the scalar c.
That's about all I recall at the moment about your question. Good luck.
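P.S. Here is a small worked example of items 9 through 12, as a sketch. The function f and the unit disc are my own choice for illustration, and the boundary is written as g = 0.

```latex
\[
f(x,y) = x^2 + 2y^2 \ \text{ on the disc } x^2 + y^2 \le 1, \qquad g(x,y) = x^2 + y^2 - 1 .
\]
% Interior: points where the gradient of f vanishes.
\[
\nabla f = (2x,\ 4y) = (0,0) \iff (x,y) = (0,0), \qquad f(0,0) = 0 .
\]
% Boundary, by polar coordinates (item 9):
\[
f(\cos\theta, \sin\theta) = 1 + \sin^2\theta, \qquad
\frac{d}{d\theta}\bigl(1 + \sin^2\theta\bigr) = \sin 2\theta = 0
\iff \theta \in \{0,\ \pi/2,\ \pi,\ 3\pi/2\} .
\]
% Boundary, by parallel gradients (items 10 and 12): grad f is parallel to grad g
% exactly when the 2x2 determinant of the two gradients vanishes.
\[
\det \begin{pmatrix} 2x & 4y \\ 2x & 2y \end{pmatrix} = -4xy = 0
\ \text{ and } \ x^2 + y^2 = 1
\iff (x,y) \in \{(\pm 1, 0),\ (0, \pm 1)\} .
\]
% Compare values at all candidates:
\[
f(0,0) = 0, \qquad f(\pm 1, 0) = 1, \qquad f(0, \pm 1) = 2 .
\]
```

So the max is 2 at (0, 1) and (0, -1), and the min is 0 at the center. Both boundary methods find the same four points, and the scalar c was never computed.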