This problem leads naturally to the "Lagrange multiplier" method. If we want to find the point on the surface closest to point p, we can calculate the gradient, the vector pointing in the direction of fastest increase, of the distance function. Since we want "decrease" to get closest, we would want to move in the direction opposite to the gradient. But that will point off the surface so the best we can do is move in the direction of the "projection" of the (reverse) gradient on the surface. We can do that until there is NO projection. That occurs, as you said initially, when the gradient is perpendicular to the plane. And if we write the plane in the form F(x,y,z)= constant, we know that \nabla F is normal to the surface. That is, to minimize the distance function (or its square) to a point not on the plane, the two gradients must point in the same direction- one must be a multiple of the other.
In this case, it is easier to work with distance squared, D2(x,y,z)= x^2+ y^2+ (z- 1)^2 for a any point (x,y,z) on the surface. Its gradient is 2x\vec{i}+ 2y\vec{j}+ 2(z-1)\vec{k}. We can write the surface as f(x,y,z)= xy- z= 0, a constant. Its gradient is \vec{i}+ \vec{j}- \vec{k}.
The two gradients must be parallel so one must be a multiple of the other:
2x\vec{i}+ 2y\vec{j}+ 2(z- 1)\vec{k}= \lambda(\vec{i}+ \vec{j}- \vec{k})
where \lambda, the "Lagrange multiplier" is the constant multiple.
That gives three equations: 2x= \lambda, 2y= \lambda, 2(z- 1)= -\lambda.
I find that the simplest way to eliminate \lambda, which is not relevant to the solution, and solve the equations is to divide one equation by another. For example, if we divide the first equation by the second we get 2x/2y= x/y= 1 or just y= x. If we divide the first equation by the third, we get 2x/2(z-1)= x/(z-1)= 1 so x= z- 1.
Of course, the point (x,y,z) lies on the plane so we can use the equation of the plane, xy= z together with y= x= z- 1 to find the specific values of x, y, and z.