This is simply wrong! I don't know when the textbook writers started to make these wrong statements leading to the wrong vakonomic dynamics. In all traditional textbooks the d'Alembert principle and the action principle are equivalent for nonholonomic constraints. The only restriction for the action principle is that the forces should be derivable from a generalized potential.
The correct way to implement nonholonomic constraints is as follows. By assumption they are of the form
$$A_{ak}(q,t) \dot{q}^k+B_a(q,t)=0.$$
Here ##k \in \{1,\ldots,n \}## labels the generalized coordinates and ##a \in \{1,\ldots,r \}## the different constraints. The Einstein summation convention is used.
In terms of differentials the constraints read
$$A_{ak}(q,t) \mathrm{d} q^k + B_a(q,t) \mathrm{d} t=0.$$
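As a concrete illustration (a standard textbook-type example chosen here for definiteness, not part of the general argument), take ##n=3## coordinates ##(q^1,q^2,q^3)=(x,y,z)## and the single constraint
$$\dot{z}-y \dot{x}=0, \quad \text{i.e.,} \quad (A_{1k})=(-y,0,1), \quad B_1=0.$$
In differential form this reads ##\mathrm{d} z-y \, \mathrm{d} x=0##. With ##\omega=\mathrm{d} z-y \, \mathrm{d} x## one finds
$$\omega \wedge \mathrm{d} \omega=\mathrm{d} x \wedge \mathrm{d} y \wedge \mathrm{d} z \neq 0,$$
so by Frobenius's theorem there is no integrating factor, and the constraint cannot be written as ##f(x,y,z)=\text{const}##, i.e., it is truly nonholonomic.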
For the action principle we assume that
$$L=L(q,\dot{q},t),$$
and the action
$$A[q]=\int_{t_1}^{t_2} \mathrm{d} t L(q,\dot{q},t)$$
should be stationary under variations ##\delta q## with ##\delta t=0## (the latter is the important point here!). The variations ##\delta q^k## are, however, not independent but restricted by the (in general nonholonomic) constraints. Since ##\delta t=0##, these constraints read
$$A_{ak}(q,t) \delta q^k + B_a(q,t) \delta t=A_{ak}(q,t) \delta q^k=0.$$
We can treat the ##\delta q^k## as arbitrary if we take the constraints on the variations into account by introducing ##r## Lagrange multipliers ##\lambda_a##, i.e., we demand
$$\delta A+\int_{t_1}^{t_2} \mathrm{d} t \lambda_a A_{ak}(q,t) \delta q^k=0.$$
The variation of ##A## reads, using that by definition the initial and final points have to be kept fixed in Hamilton's principle, ##\delta q^k(t_1)=\delta q^k(t_2)=0##,
$$\delta A =\int_{t_1}^{t_2} \mathrm{d} t \left (\delta q^k \partial_{q^k} L + \delta \dot{q}^k \partial_{\dot{q}^k} L \right ) = \int_{t_1}^{t_2} \mathrm{d} t \delta q^k \left [\partial_{q^k} L - \mathrm{d}_t (\partial_{\dot{q}^k} L) \right ].$$
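Spelled out, the second equality is an integration by parts. The step uses ##\delta \dot{q}^k=\mathrm{d}_t \delta q^k##, so that
$$\int_{t_1}^{t_2} \mathrm{d} t \, \delta \dot{q}^k \partial_{\dot{q}^k} L = \left [\delta q^k \partial_{\dot{q}^k} L \right ]_{t_1}^{t_2} - \int_{t_1}^{t_2} \mathrm{d} t \, \delta q^k \mathrm{d}_t (\partial_{\dot{q}^k} L),$$
and the boundary term vanishes because ##\delta q^k(t_1)=\delta q^k(t_2)=0##.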
Thus the stationarity of the action functional, together with the Lagrange-multiplier term for the constraints on the variations, leads to
$$\int_{t_1}^{t_2} \mathrm{d} t \delta q^k \left [\partial_{q^k} L - \mathrm{d}_t (\partial_{\dot{q}^k} L) + \lambda_a A_{ak} \right ] \stackrel{!}{=}0.$$
Since the ##\delta q^k## can now be treated as arbitrary, this leads to the equations of motion
$$\partial_{q^k} L - \mathrm{d}_t (\partial_{\dot{q}^k} L)+\lambda_a A_{ak}=0.$$
These are the same equations of motion you get from d'Alembert's principle. They are ##n## equations for the ##n## generalized coordinates ##q^k## and the ##r## Lagrange multipliers ##\lambda_a##. The additional ##r## equations needed to solve for these ##(n+r)## unknowns are of course the constraint equations in their original form,
$$A_{ak} \dot{q}^k+B_a=0.$$
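To see how the ##(n+r)## equations work together in practice, here is a minimal numerical sketch in Python. Everything specific in it is an assumption made for illustration: a unit-mass free particle with ##L=(\dot{x}^2+\dot{y}^2+\dot{z}^2)/2##, the single constraint ##\dot{z}-y \dot{x}=0## (so ##A=(-y,0,1)##, ##B=0##), a hand-written RK4 integrator, and the function names. The equations of motion then read ##\ddot{q}^k=\lambda A_k##, and ##\lambda## is eliminated by differentiating the constraint once with respect to time.

```python
import math


def derivative(state):
    """Time derivative of (x, y, z, vx, vy, vz) for the constrained particle."""
    x, y, z, vx, vy, vz = state
    # Constraint A_k qdot^k + B = 0 with A = (-y, 0, 1) and B = 0 (assumed example).
    A = (-y, 0.0, 1.0)
    # Equations of motion for L = (vx^2 + vy^2 + vz^2)/2: qddot_k = lam * A_k.
    # Time derivative of the constraint: A_k qddot^k - vy*vx = 0,
    # hence lam * (A.A) = vy*vx.
    lam = vy * vx / sum(a * a for a in A)
    return (vx, vy, vz, lam * A[0], lam * A[1], lam * A[2])


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = derivative(state)
    k2 = derivative(shifted(state, k1, dt / 2))
    k3 = derivative(shifted(state, k2, dt / 2))
    k4 = derivative(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def constraint(state):
    """Residual of the constraint zdot - y*xdot = 0."""
    _, y, _, vx, _, vz = state
    return vz - y * vx


def energy(state):
    """Kinetic energy; conserved because the constraint force does no work for B = 0."""
    _, _, _, vx, vy, vz = state
    return 0.5 * (vx * vx + vy * vy + vz * vz)


def invariant(state):
    """vx * sqrt(1 + y^2), constant along solutions of this particular example."""
    _, y, _, vx, _, _ = state
    return vx * math.sqrt(1.0 + y * y)


def simulate(state, dt=1e-3, steps=5000):
    """Integrate the equations of motion for steps * dt units of time."""
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


# Initial data must satisfy the constraint: vz = y * vx.
initial = (0.0, 0.5, 0.0, 1.0, 1.0, 0.5)
final = simulate(initial)

print("constraint residual:", constraint(final))
print("energy drift:", energy(final) - energy(initial))
```

The constraint is only imposed through the initial data and its time derivative, so the printed residual is a useful check that the multiplier was eliminated consistently. For constraints with ##B_a \neq 0## the energy check no longer applies, since the constraint forces can then do work.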
You can find this straightforward derivation in many classical textbooks like Landau & Lifshitz vol. 1, Goldstein (2nd edition) and, once more discussing many different principles from d'Alembert to Maupertuis, Lagrange, Hamilton, Routh, etc., including a clear treatment of the nonholonomic constraints.