The problem is that the literature often uses [itex]c_{\rm{s}}^2[/itex] to mean two different things, sometimes simultaneously. Looking at things from a thermodynamic perspective, one can write [itex]P=P(\rho,S)[/itex], and then perturb to give
[tex]\delta P=\frac{\partial P}{\partial\rho}\delta \rho +\tau \delta S[/tex]
where [itex]\frac{\partial P}{\partial\rho}[/itex], taken at constant entropy, is then identified as the square of the adiabatic sound speed, i.e. the speed with which perturbations travel through the background.
Now, for a scalar field we can parametrise the pressure as [itex]P=P(X,\phi)[/itex], where [itex]X[/itex] is the kinetic term of the field. Then the square of the adiabatic sound speed, evaluated along the background evolution with overdots denoting time derivatives, can be written as
[tex]c_{\rm{s}}^2=\frac{\dot P}{\dot\rho}=\frac{\partial_X P\,\dot X+\partial_\phi P\,\dot\phi}{\partial_X\rho\,\dot X+\partial_\phi\rho\,\dot\phi}[/tex]. Written like this, it should be apparent that this is not the same as the first expression you quote. It turns out that, for a scalar field, the speed of propagation is not the adiabatic sound speed, but in fact a different speed (say, the "effective sound speed"), which is defined as
[tex]\tilde{c}_{\rm{s}}^2=\frac{\partial_X P}{\partial_X\rho}[/tex]. If you like, you can show this by deriving the Klein-Gordon equation for the perturbation of the field and reading off the coefficient of the spatial-derivative term.
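As a quick illustration of how the two speeds differ, here is a sketch for a canonical scalar field. The assumptions are mine rather than part of the argument above: [itex]P=X-V(\phi)[/itex], a homogeneous background with [itex]X=\dot\phi^2/2[/itex], the standard relation [itex]\rho=2X\partial_X P-P[/itex], and [itex]H[/itex] for the Hubble rate. With these, the energy density and effective sound speed are
[tex]\rho=X+V(\phi),\qquad \tilde{c}_{\rm{s}}^2=\frac{\partial_X P}{\partial_X\rho}=1[/tex]
while the adiabatic sound speed, using [itex]\dot X=\dot\phi\ddot\phi[/itex], is
[tex]c_{\rm{s}}^2=\frac{\dot P}{\dot\rho}=\frac{\dot X-V_{,\phi}\dot\phi}{\dot X+V_{,\phi}\dot\phi}=\frac{\ddot\phi-V_{,\phi}}{\ddot\phi+V_{,\phi}}=1+\frac{2V_{,\phi}}{3H\dot\phi}[/tex]
where the last step uses the background Klein-Gordon equation [itex]\ddot\phi+3H\dot\phi+V_{,\phi}=0[/itex]. So under these assumptions the field perturbations propagate at the speed of light, whereas the adiabatic sound speed generally differs from one, and the two coincide only when [itex]V_{,\phi}=0[/itex].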
