Happiness said:
The reason given for why we can't use Lagrange multiplier in non-holonomic constraint is that the varied paths do not satisfy the equation of constraint. I do not understand this reason. I would think that only the actual path needs to satisfy the equation of constraint and the varied path can be any path as long as it has the same end points.
Of course, you use Lagrange multipliers, but not as part of the Lagrangian but as part of the variation of the action. So let's do it again. Suppose we have given our problem in terms of generalized coordinates ##(q^k)##, ##k \in \{1,\ldots,f \}## and suppose we have only proper nonholonomic constraints. This means we have ##r<f## constraints on the virtual displacements,
$$\delta q^k f_k^{(\alpha)}(t,q)=0, \quad \alpha \in \{1,2,\ldots,r \}$$
which cannot be integrated to holonomic constraints. In other words we suppose we have already solved for all holonomic constraints in choosing our ##f## generalized coordinates. This means the functions ##f_k^{(\alpha)}## are such that
$$\partial_j f_k^{(\alpha)}-\partial_k f_j^{(\alpha)} \neq 0.$$
In this case we cannot vary the ##q^k## independently in Hamilton's principle of stationary action, but we have to fulfill the constraints. This can be done by introducing ##r## Lagrange multipliers ##\lambda_{\alpha}## and vary the ##q^k## and the ##\lambda^{\alpha}## independently. The variation of the extended action then reads
$$\delta S=\int \mathrm{d} t \delta q^k \left (\frac{\partial L}{\partial q^k}-\frac{\mathrm{d}}{\mathrm{d} t} \frac{\partial L}{\partial \dot{q}^k}+\lambda_{\alpha} f_k^{(\alpha)} \right) \stackrel{!}{=} 0.$$
The equations of motion thus read
$$\frac{\partial L}{\partial q^k}-\frac{\mathrm{d}}{\mathrm{d} t} \frac{\partial L}{\partial \dot{q}^k}+\lambda_{\alpha} f_k^{(\alpha)} =0, \quad f_k^{(\alpha)} \dot{q}^k=0.$$
Note that Einstein's summation convention is understood here, for both the summation over ##k \in \{1,\ldots,f \}## and ##\alpha \in \{1,\ldots,r \}##. Now you have ##(k+r)## functions ##q^k(t)## and ##\lambda_\alpha(t)## to solve for and also ##k## Lagrange Equations of the 1st kind and the ##r## constraint equations.