I guess the gist of it is that when you eliminate variables using constraint equations, you are incorporating information about your system into the problem. When you then extremize the result, that information gets used during differentiation. That is, you no longer regard certain variables as having independent variations with respect to the varying parameter. Instead, you acknowledge the constraint, substitute for the variable in question, and observe the differential changes in the substituted expression.
A physical example of this would be to take the Lagrangian of a 2D pendulum system, written in Cartesian (not polar!) coordinates, solve the constraint length^2 = x^2 + y^2 for x, and substitute the result, x = ±sqrt(length^2 - y^2), into the Lagrangian. We know that this system has 1 degree of freedom, and the Lagrangian is now expressed using one coordinate, y. When we look at variations of L with respect to some parameter, we will also be looking at variations in the expression sqrt(length^2 - y^2) wherever x had appeared. Thus we are asserting the truth of both the constraint equation and L when we extremize.
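As a rough sketch of that substitution, assume a point mass m, gravitational acceleration g, the pivot at the origin, y measured upward, and V = m g y. None of these conventions are fixed by the text above, and \ell stands for length. Away from the points where x = 0, the sign of the square root drops out:

```latex
L(x, y, \dot{x}, \dot{y}) = \tfrac{1}{2} m \left( \dot{x}^2 + \dot{y}^2 \right) - m g y,
\qquad
x = \pm\sqrt{\ell^2 - y^2}
\;\Rightarrow\;
\dot{x} = \mp \frac{y \, \dot{y}}{\sqrt{\ell^2 - y^2}}

L_{\text{reduced}}(y, \dot{y}) = \frac{1}{2} \, \frac{m \ell^2 \, \dot{y}^2}{\ell^2 - y^2} - m g y

% Euler-Lagrange equation for the single remaining coordinate y:
\frac{m \ell^2 \, \ddot{y}}{\ell^2 - y^2}
+ \frac{m \ell^2 \, y \, \dot{y}^2}{\left( \ell^2 - y^2 \right)^2}
+ m g = 0
```

Under the same assumptions, putting y = -\ell \cos\theta into the last line gives back the familiar \ddot{\theta} = -(g/\ell) \sin\theta, so the constraint really is built into the one-coordinate Lagrangian.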
The method of Lagrange multipliers is a little different, procedurally and conceptually, yet it gives the same results. Returning to the 2D pendulum, this method suggests that we set the variational derivative of L - \lambda * (x^2 + y^2 - length^2) to 0, treating x, y and \lambda as independent. The reason for this is often explained geometrically: at a constrained stationary point, the gradient of the function being extremized and the gradient of the constraint should be parallel (hence the \lambda and the 0).
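For concreteness, here is what the multiplier version could look like, assuming a point mass m, gravitational acceleration g, y measured upward from the pivot, and V = m g y (conventions chosen for illustration, with \ell standing for length):

```latex
L' = \tfrac{1}{2} m \left( \dot{x}^2 + \dot{y}^2 \right) - m g y
     - \lambda \left( x^2 + y^2 - \ell^2 \right)

% Euler-Lagrange equations, with x, y and \lambda varied independently:
m \ddot{x} = -2 \lambda x
\qquad
m \ddot{y} = -m g - 2 \lambda y
\qquad
x^2 + y^2 = \ell^2
```

The extra force -2\lambda (x, y) points along the string, so with these conventions the multiplier plays the role of the tension (of magnitude 2 \lambda \ell), and the third equation is just the constraint handed back to us.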
A good way to get a sense for these equations is to break them in a controlled way. So, for example, what happens when you don't include any constraint information in the 2D Lagrangian? You get a constant momentum in the x-direction, and the point mass falls freely in y: it either hits the V=0 level with a thud (if we asserted there is a ground) or, in this odd mathematical construction, keeps falling straight through it. It is as if we snipped the string connecting the point mass to the pendulum pivot point. The first point to make is that we were successful in extremizing without the constraint equation. In return, we were given the EOMs (because these are the conditions for an extremum) of a system that had no constraints. The second point is that this result is not consistent with the pendulum system we had in mind when we began, and that is as it should be, for there was no way our Lagrangian could acquire the knowledge that "there is a string."
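To see the "snipped string" numerically, here is a minimal sketch. It assumes V = m g y with y measured upward from the pivot, and the names `free_trajectory` and `distance_from_pivot` as well as the starting values are made up for illustration. It evaluates the closed-form solution of the unconstrained EOMs (m x'' = 0, m y'' = -m g) and prints how far the mass is from the pivot.

```python
import math


def free_trajectory(x0, y0, vx0, vy0, g, t):
    """Closed-form solution of the unconstrained EOMs: x'' = 0, y'' = -g."""
    x = x0 + vx0 * t
    y = y0 + vy0 * t - 0.5 * g * t * t
    return x, y


def distance_from_pivot(x, y):
    """Distance from the pivot at the origin; a pendulum would keep this fixed."""
    return math.hypot(x, y)


# Illustrative values (assumptions, not taken from the discussion above).
length = 1.0
g = 9.81
theta0 = math.radians(30.0)  # start 30 degrees away from straight down
x0 = length * math.sin(theta0)
y0 = -length * math.cos(theta0)

# Release from rest and watch the "pendulum" ignore its string.
for t in (0.0, 0.1, 0.2, 0.5):
    x, y = free_trajectory(x0, y0, 0.0, 0.0, g, t)
    r = distance_from_pivot(x, y)
    print(f"t={t:.1f}  x={x:.3f}  y={y:.3f}  distance from pivot={r:.3f}")
```

The x value never changes (constant momentum, here zero), while the distance from the pivot starts at length and then grows, which a real pendulum would never allow.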