One way to think about Lagrange multipliers is this: to find a maximum point of a function of several variables, pick some "starting point" at random, find the gradient vector of the function there, and move in the direction it points (for a minimum, move in the opposite direction). Keep doing that until the gradient equals 0 and there is no direction left to follow.
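To make the "follow the gradient" picture concrete, here is a minimal sketch in Python. The function f(x, y) = -(x - 1)^2 - (y + 2)^2, the step size, and the fixed starting point are my own illustrative choices (a fixed start rather than a random one, just so the run is repeatable); nothing here comes from the original problem.

```python
def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = -(x - 1)**2 - (y + 2)**2,
    # whose only maximum is at (1, -2).
    return (-2.0 * (x - 1.0), -2.0 * (y + 2.0))


def gradient_ascent(x, y, step=0.1, tol=1e-9, max_iter=10_000):
    """Repeatedly step in the direction of the gradient until it (nearly) vanishes."""
    for _ in range(max_iter):
        gx, gy = grad_f(x, y)
        if gx * gx + gy * gy < tol * tol:
            break  # gradient is essentially 0: no direction left to follow
        x += step * gx
        y += step * gy
    return x, y


# Hypothetical starting point, chosen arbitrarily.
x_max, y_max = gradient_ascent(5.0, 5.0)
print(x_max, y_max)
```

With these choices the iterates settle very close to (1, -2), where the gradient is 0. For a minimum you would subtract the step instead of adding it.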
If you are required to stay on a given surface, and so can't "follow" the gradient vector, take its projection onto the surface (that is, onto the tangent plane at your current point) and move in that direction. You can keep doing that until there is no projection: the gradient vector is then perpendicular to the surface and so parallel to the normal vector of the surface, which means one must be a scalar multiple of the other. That scalar is the Lagrange multiplier.
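The same idea on a surface can be sketched as follows. As an assumption purely for illustration, I take the surface to be the unit sphere x^2 + y^2 + z^2 = 1 and the function to be f(x, y, z) = x + 2y + 2z, so the gradient is the constant vector (1, 2, 2) and the unit normal at a point p on the sphere is p itself. After each step the point is rescaled back onto the sphere, which is one simple way (not the only one) of staying on the surface.

```python
import math

GRAD = (1.0, 2.0, 2.0)  # gradient of the illustrative f(x, y, z) = x + 2y + 2z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(a / length for a in v)


def constrained_ascent(p, step=0.1, tol=1e-9, max_iter=10_000):
    """Follow the projection of the gradient onto the tangent plane of the unit sphere."""
    p = normalize(p)
    for _ in range(max_iter):
        normal = p  # on the unit sphere, the unit normal at p is p itself
        g_n = dot(GRAD, normal)
        # Remove the normal component of the gradient; what is left lies in the tangent plane.
        tangent = tuple(g - g_n * n for g, n in zip(GRAD, normal))
        if math.sqrt(dot(tangent, tangent)) < tol:
            break  # no projection left: the gradient is parallel to the normal
        # Step along the projection, then rescale back onto the sphere.
        p = normalize(tuple(a + step * t for a, t in zip(p, tangent)))
    return p


# Hypothetical starting point on the sphere.
p = constrained_ascent((1.0, 0.0, 0.0))
lam = dot(GRAD, p)  # the scalar with GRAD = lam * normal at the stopping point
print(p, lam)
```

At the stopping point the gradient (1, 2, 2) is lam times the normal, with p close to (1/3, 2/3, 2/3) and lam close to 3. That is exactly the "one is a scalar multiple of the other" condition.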
The two vectors Kummer uses are exactly the gradient of the distance (squared) function and the normal vector of the surface. (If z = f(x,y), then F(x,y,z) = z - f(x,y) = 0 is a "level surface" of F(x,y,z), and the gradient of F is normal to that level surface.)
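Written out, the condition looks like this. The point (x_0, y_0, z_0) from which the distance is measured is a placeholder of mine, since I am not restating the specific numbers of the problem; D is my name for the squared distance and \lambda for the scalar multiple.

```latex
\begin{align*}
D(x,y,z) &= (x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2,
  & \nabla D &= 2\,(x-x_0,\; y-y_0,\; z-z_0),\\
F(x,y,z) &= z - f(x,y),
  & \nabla F &= (-f_x,\; -f_y,\; 1).
\end{align*}
Setting $\nabla D = \lambda\,\nabla F$ on the surface gives
\begin{align*}
2(x-x_0) &= -\lambda f_x(x,y),\\
2(y-y_0) &= -\lambda f_y(x,y),\\
2(z-z_0) &= \lambda,\\
z &= f(x,y).
\end{align*}
```

That is four equations in the four unknowns x, y, z, and \lambda.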