Good questions!
1) Here's a way to think about it: remember that a function is basically a rule that transforms one number (x) into another number (y). When you minimize a function, what you get is a single number: the value of x at the minimum. The action, however, is what we call a functional - it's a rule that transforms an entire function (x(t)) into a number (S). So when you minimize a functional, you get not just a number, but an entire function that you can plug back into the functional. The principle of least action states that you need to find the function x(t) that makes the action a minimum, which is a lot more involved than just finding one number.
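If it helps to see the distinction concretely, here's a minimal sketch in Python (my own illustration, not anything standard; I've picked a free particle's action, S = ∫ (1/2) ẋ² dt, because it's the simplest one):

```python
import numpy as np

def f(x):
    # A function: one number in, one number out.
    return x**2

def action(x_of_t, t):
    # A functional: a whole function x(t) (sampled on a grid here) in,
    # one number out. This is a free particle's action, S = ∫ (1/2) ẋ² dt,
    # approximated by finite differences.
    xdot = np.gradient(x_of_t, t)
    return np.sum(0.5 * xdot**2) * (t[1] - t[0])

t = np.linspace(0.0, 1.0, 1001)
print(f(3.0))              # f eats a number: 9.0
print(action(t, t))        # S eats the whole path x(t) = t:  ~0.5
print(action(t**2, t))     # ...a different path x(t) = t² gives ~0.667
```

Both paths start at (0, 0) and end at (1, 1), and the straight line gives the smaller action, as you'd hope for a free particle.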
2) The calculus of variations is just the mathematical technique that lets you solve this problem. Think about how a regular minimization problem works:
You know that if a regular function has a minimum, at that minimum point it momentarily has a slope of zero. What this means is that if you move a teeny tiny bit away from the minimum, the change in the value of the function is negligible to first order. Mathematically, at a minimum, the following condition should be true:
f(x + \epsilon) = f(x) + \mathcal{O}(\epsilon^2)
If you compare this to the first order Taylor expansion of f,
f(x + \epsilon) = f(x) + \frac{d f}{d x} \epsilon + \mathcal{O}(\epsilon^2)
you'll see that the condition is equivalent to setting df/dx equal to zero, which is good, because if it weren't, you'd know that this method doesn't work ;-)
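You can check this numerically (my quick illustration, with an arbitrary f chosen to have its minimum at x = 2):

```python
def f(x):
    return (x - 2.0)**2   # minimum at x = 2

for eps in [1e-1, 1e-2, 1e-3]:
    away   = f(1.0 + eps) - f(1.0)   # away from the minimum: shrinks like eps
    at_min = f(2.0 + eps) - f(2.0)   # at the minimum: shrinks like eps²
    print(f"eps={eps:g}  away={away:.1e}  at_min={at_min:.1e}")
```

Each time eps shrinks by a factor of 10, the change away from the minimum shrinks by 10, but the change at the minimum shrinks by 100 - that's the \mathcal{O}(\epsilon^2) behavior.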
Now imagine generalizing this to minimize a functional, like the action. As I said above, a functional is just a rule that takes a function as input and produces a number. So let's make these changes: f becomes the functional S, x the number becomes x(t) the function, and logically the slight change \epsilon must also become a function, let's say \delta x(t).
S[x(t) + \delta x(t)] = S[x(t)] + \mathcal{O}(\delta x(t)^2) (**)
We need to impose the condition that \delta x(t) is zero at the endpoints of the time interval, because those endpoints are fixed by the problem or physical situation - you're told "a particle starts at x=(0,0,0) at t=0" or some such thing.
Now what to do? We need to figure out a way to apply Taylor expansion to a functional. The functional, of course, can be written
S[x(t) + \delta x(t)] = \int_{t_0}^{t_1} L(x + \delta x(t), \dot{x} + \delta\dot{x}(t), t) \mathrm{d}t
Here's a trick you can use: rewrite the tiny change \delta x(t) as a tiny number \epsilon times a normal, finite function: \epsilon \delta x(t).
S[x(t) + \epsilon\delta x(t)] = \int_{t_0}^{t_1} L(x + \epsilon\delta x(t), \dot{x} + \epsilon\delta\dot{x}(t), t) \mathrm{d}t
This way, the tiny parameter will be a plain old number and you can use regular Taylor expansion on the function L.
S[x(t) + \epsilon\delta x(t)] = \int_{t_0}^{t_1} \left(L(x, \dot{x}, t) + \epsilon \delta x(t) \frac{\partial L}{\partial x} + \epsilon \delta \dot{x}(t) \frac{\partial L}{\partial \dot{x}} + \mathcal{O}(\epsilon^2)\right) \mathrm{d}t
Now if you integrate the third term in the integral by parts, you get
S[x(t) + \epsilon\delta x(t)] = \left.\epsilon \delta x(t) \frac{\partial L}{\partial \dot{x}}\right|_{t_0}^{t_1} + \int_{t_0}^{t_1} \left(L(x, \dot{x}, t) + \epsilon \delta x(t) \frac{\partial L}{\partial x} - \epsilon \delta x(t) \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} + \mathcal{O}(\epsilon^2)\right) \mathrm{d}t
This is where the condition that \delta x(t) = 0 at the endpoints comes in handy: that boundary term that appears in front of the integral is just equal to zero.
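At this point you can actually verify condition (**) numerically. Here's a sketch (again my own illustration, using the free particle, whose true path between fixed endpoints is a straight line):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]

def S(x):
    # Discretized free-particle action, S = ∫ (1/2) ẋ² dt.
    xdot = np.gradient(x, t)
    return np.sum(0.5 * xdot**2) * dt

delta_x  = np.sin(np.pi * t)   # a variation with δx(t0) = δx(t1) = 0
straight = t                   # the true path from (0,0) to (1,1)
curved   = t**2                # another path with the same endpoints

for eps in [1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:g}  "
          f"true path: {S(straight + eps*delta_x) - S(straight):.2e}  "
          f"other path: {S(curved + eps*delta_x) - S(curved):.2e}")
```

The change in S along the straight line scales like \epsilon^2, exactly as (**) demands, while along the curved path it scales like \epsilon - a leftover first-order term that tells you x(t) = t^2 does not extremize the action.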
Anyway, now compare this to the original condition (**) that I said was necessary for the functional to be extremized. You'll notice that the only first-order difference is the integral of the two terms
\epsilon \delta x(t) \frac{\partial L}{\partial x} - \epsilon \delta x(t) \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}}
So just like with ordinary minimization, we need this first-order piece to vanish. But here's the key point: the integral has to be zero for any variation \delta x(t) you could choose, and the only way that can happen is if the factor multiplying \delta x(t) is itself zero at every t. That condition is, of course, the Euler-Lagrange equation
\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}} = 0
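If you want to watch this equation do its job, SymPy can carry out the Euler-Lagrange procedure symbolically via euler_equations (the harmonic-oscillator Lagrangian here is just my example):

```python
from sympy import Function, symbols
from sympy.calculus.euler import euler_equations

t, m, k = symbols('t m k', positive=True)
x = Function('x')

# Harmonic oscillator: L = (1/2) m ẋ² - (1/2) k x²
L = m * x(t).diff(t)**2 / 2 - k * x(t)**2 / 2

print(euler_equations(L, x(t), t))
# -> [Eq(-k*x(t) - m*Derivative(x(t), (t, 2)), 0)],  i.e.  m ẍ = -k x,
#    which is just Newton's second law for a mass on a spring.
```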
The reason we call this procedure "calculus of variations" is that it uses a "variation" of the function x(t) - \delta x(t) is the variation. By the same token, you could call regular old minimization an example of the "calculus of differentials" (or "differential calculus" - sound familiar?) because it uses things like dx and dy, which are called differentials.