A more precise way of thinking about it is this: if f(x,y) is a function of two variables, then its "derivative" is NOT just the partial derivatives \partial f/\partial x and \partial f/\partial y (those partial derivatives can exist at a point while f itself is not even continuous there, much less differentiable), but the gradient \nabla f= (\partial f/\partial x)\vec{i}+ (\partial f/\partial y)\vec{j}. (Even more precisely, the derivative is the linear transformation, at each point, given by the dot product ((\partial f/\partial x)\vec{i}+ (\partial f/\partial y)\vec{j})\cdot ((x- x_0)\vec{i}+ (y- y_0)\vec{j}), but that transformation is "represented" by the gradient vector.)
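As a quick numerical sketch of that last point (the example function f(x,y)= x^2 y is my own, not from the discussion above): the gradient "is" the derivative in the sense that dotting it with a direction gives the rate of change in that direction.

```python
import numpy as np

# Hypothetical example function f(x, y) = x^2 * y and its gradient.
def f(p):
    x, y = p
    return x**2 * y

def grad_f(p):
    x, y = p
    return np.array([2*x*y, x**2])  # (df/dx, df/dy)

p = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0       # a unit direction vector
h = 1e-6

numerical = (f(p + h*u) - f(p)) / h  # finite-difference directional derivative
by_gradient = grad_f(p) @ u          # gradient dotted with the direction

print(numerical, by_gradient)        # the two agree closely
```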
In that sense the second derivative is given by the linear transformation "represented", at each point, by the matrix
\begin{bmatrix}\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\partial y} \\ \frac{\partial^2 f}{\partial y\partial x} & \frac{\partial^2 f}{\partial y^2}\end{bmatrix}
Because, as long as f has continuous second derivatives, the "mixed" second derivatives are equal, so that is a symmetric matrix and therefore has real eigenvalues and two independent eigenvectors. If we were to use the directions of those eigenvectors as coordinate lines, x' and y', the matrix representing the second derivative would be "diagonal":
\begin{bmatrix}\frac{\partial^2 f}{\partial x'^2} & 0 \\ 0 &\frac{\partial^2 f}{\partial y'^2}\end{bmatrix}
where those two derivatives (evaluated at the given point) are the "eigenvalues" of the original second derivative matrix.
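To see this diagonalization concretely (the particular matrix here is my own example, say the Hessian of f= x^2+ xy+ y^2): a symmetric matrix has real eigenvalues, and changing to eigenvector coordinates makes it diagonal.

```python
import numpy as np

# A hypothetical symmetric second derivative matrix, e.g. of f = x^2 + xy + y^2.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric because f_xy = f_yx

# eigh is NumPy's eigensolver for symmetric matrices: real eigenvalues,
# orthonormal eigenvectors (returned as columns).
eigenvalues, eigenvectors = np.linalg.eigh(H)
print(eigenvalues)           # real eigenvalues

# In the eigenvector coordinates x', y' the matrix becomes diagonal:
D = eigenvectors.T @ H @ eigenvectors
print(np.round(D, 10))       # off-diagonal entries are 0
```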
Now, at a point where the first derivatives are 0 (a critical point) and the "mixed" second derivatives are 0, as in the x', y' coordinate system, we can write f(x, y)= f(x_0, y_0)+ f_{xx}(x_0,y_0)(x- x_0)^2+ f_{yy}(x_0, y_0)(y- y_0)^2 to second degree (the usual factor of \frac{1}{2} in Taylor's theorem is omitted here since it does not affect the signs). And it is easy to see from this that:
1) if f_{xx}(x_0, y_0)= a and f_{yy}(x_0, y_0)= b are both positive, we have f(x, y)= f(x_0, y_0)+ a(x- x_0)^2+ b(y- y_0)^2 so that (x_0, y_0) is a local "minimum".
2) if f_{xx}(x_0, y_0)= -a and f_{yy}(x_0, y_0)= -b are both negative, we have f(x, y)= f(x_0, y_0)- a(x- x_0)^2- b(y- y_0)^2 so that (x_0, y_0) is a local "maximum".
3) if one is positive and the other negative, say f_{xx}(x_0, y_0)= a> 0 and f_{yy}(x_0, y_0)= -b< 0, we have f(x, y)= f(x_0, y_0)+ a(x- x_0)^2- b(y- y_0)^2 so that (x_0, y_0) is a "saddle point".
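Those three cases can be sketched as a small classifier on the diagonal second derivatives (the helper function and the sample values are mine, for illustration only):

```python
# Classify a critical point from the two diagonal second derivatives,
# i.e. after rotating to the eigenvector coordinates x', y'.
def classify(fxx, fyy):
    if fxx > 0 and fyy > 0:
        return "minimum"
    if fxx < 0 and fyy < 0:
        return "maximum"
    if fxx * fyy < 0:
        return "saddle point"
    return "inconclusive"   # one of them is 0, as in the one-variable case

print(classify(2, 2))       # e.g. f = x^2 + y^2 at (0,0)   -> minimum
print(classify(-2, -2))     # e.g. f = -x^2 - y^2 at (0,0)  -> maximum
print(classify(2, -2))      # e.g. f = x^2 - y^2 at (0,0)   -> saddle point
```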
So the question is about the eigenvalues of that two by two matrix. If both are positive, the point is a local minimum; if both are negative, a local maximum; and if they are of different signs, a saddle point (of course, just as in the one variable situation, if either is 0, this test tells us nothing). Further, the determinant is independent of the coordinate system- the two determinants:
\left|\begin{array}{cc}\frac{\partial^2 f}{\partial x'^2} & 0 \\ 0 & \frac{\partial^2 f}{\partial y'^2}\end{array}\right|= f_{x'x'}f_{y'y'}
\left|\begin{array}{cc}\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\partial y} \\ \frac{\partial^2 f}{\partial y\partial x} & \frac{\partial^2 f}{\partial y^2}\end{array}\right|= f_{xx}f_{yy}- (f_{xy})^2
are the same. Both eigenvalues- and so both second derivatives in the primed coordinates- have the same sign, giving either a minimum or a maximum, if and only if f_{xx}f_{yy}- (f_{xy})^2> 0; they have different signs, giving a saddle point, if and only if f_{xx}f_{yy}- (f_{xy})^2< 0.
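That invariance of the determinant is easy to check numerically (the particular matrix and rotation angle below are arbitrary choices of mine):

```python
import numpy as np

# A hypothetical Hessian with f_xx*f_yy - f_xy^2 = 3*2 - 1 = 5 > 0.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])

theta = 0.7                              # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

H_rot = R.T @ H @ R                      # the same second derivative in rotated axes
print(np.linalg.det(H), np.linalg.det(H_rot))  # both equal 5

# det > 0: both eigenvalues share a sign -> max or min (f_xx tells which);
# det < 0: opposite signs -> saddle point.
```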
This also shows why we do not have a similar formula for three or more variables- all the analysis up to the diagonal matrix goes through, but the determinant is then a product of three or more numbers and its sign does not determine the signs of the individual eigenvalues. If, in the three variable case, the product is positive, it might be that all three eigenvalues are positive or that one is positive and the other two negative.
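A concrete instance of that failure (the matrix is my own made-up example): a diagonal Hessian with eigenvalues 2, -1, -1 has positive determinant, yet the critical point is a saddle, not a minimum.

```python
import numpy as np

# Hypothetical three-variable Hessian: eigenvalues 2, -1, -1.
H = np.diag([2.0, -1.0, -1.0])
eigenvalues = np.linalg.eigvalsh(H)   # eigenvalues of a symmetric matrix

print(np.linalg.det(H))   # 2*(-1)*(-1) = 2 > 0, same sign as for a minimum
print(eigenvalues)        # mixed signs -> a saddle point, not a minimum
```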