I already stumble over slide 4 of the first of the cited sources:
The general ket in ##\mathcal{H}_A \otimes \mathcal{H}_B## is
$$|\Psi \rangle = \sum_{ij} c_{ij} |u_i \rangle \otimes |v_j \rangle,$$
where the ##|u_i \rangle## and ##|v_j \rangle## are CONSs (complete orthonormal systems) of ##\mathcal{H}_A## and ##\mathcal{H}_B##, respectively. The notation on these slides doesn't make sense, as becomes clear if the dimensions of the two Hilbert spaces are not the same. One example is my beloved Stern-Gerlach experiment, where ##\mathcal{H}_A## is the infinite-dimensional (separable) Hilbert space realizing the Heisenberg algebra of position and momentum operators and ##\mathcal{H}_B## is the ##(2s+1)##-dimensional Hilbert space describing spin for particles with spin ##s \in \{1/2,1,\ldots \}##.
Let us stick to the definition that a pure state is non-entangled (I prefer to say it's separable) iff it can be written as a product state
$$|\Psi \rangle = |\psi \rangle \otimes |\phi \rangle.$$
Then the decomposition in the arbitrary product basis reads
$$|\Psi \rangle = \sum_{ij} \psi_i \phi_j |u_i \rangle \otimes |v_j \rangle,$$
where ##\psi_i=\langle u_i|\psi \rangle## and ##\phi_j=\langle v_j|\phi \rangle##.
That means in this sense of entanglement, a state is separable iff
$$c_{ij}=\psi_i \phi_j,$$
i.e., iff each coefficient is a product of two complex numbers, one depending only on ##i## and the other only on ##j##.
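To make the factorization criterion concrete, here is a minimal Python sketch. It assumes finite-dimensional spaces and a nonzero coefficient matrix stored as a nested list `c[i][j]`. The function name `is_product_state`, the tolerance and the example states are my own choices for illustration. It uses the fact that a nonzero matrix factorizes as ##c_{ij}=\psi_i \phi_j## exactly when all its ##2 \times 2## minors vanish.

```python
from itertools import product


def is_product_state(c, tol=1e-12):
    """Return True if c[i][j] factorizes as psi[i] * phi[j].

    A nonzero matrix factorizes this way exactly when all 2x2 minors
    vanish, i.e. c[i][j] * c[k][l] == c[i][l] * c[k][j] for all indices.
    """
    rows, cols = len(c), len(c[0])
    for i, k in product(range(rows), repeat=2):
        for j, l in product(range(cols), repeat=2):
            if abs(c[i][j] * c[k][l] - c[i][l] * c[k][j]) > tol:
                return False
    return True


# dim H_A = 3 and dim H_B = 2: unequal dimensions pose no problem.
psi = [1 / 3**0.5, 1j / 3**0.5, -1 / 3**0.5]
phi = [0.6, 0.8j]
c_product = [[a * b for b in phi] for a in psi]

# (|u_0,v_0> + |u_1,v_1>)/sqrt(2), embedded in the same 3x2 space.
s = 2**-0.5
c_entangled = [[s, 0.0], [0.0, s], [0.0, 0.0]]

print(is_product_state(c_product))
print(is_product_state(c_entangled))
```

The check never needs the two index ranges to have the same length, which is the point about unequal dimensions made above.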
A basis-independent characterization then should be: A pure state is separable iff the reduced state of either subsystem is itself a pure state again. From now on I write ##|u_i,v_j \rangle## for ##|u_i \rangle \otimes |v_j \rangle##. For a general state ##\hat{\rho}## of the composite system the reduced state of subsystem ##A## is
$$\hat{\rho}_A=\mathrm{Tr}_B \hat{\rho}=\sum_{i,j,k} \langle u_i,v_j|\hat{\rho}|u_k,v_j \rangle |u_i \rangle \langle u_k|.$$
For the above product state we indeed have
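$$\hat{\rho}_A=\sum_{i,j,k} \langle u_i,v_j|\psi,\phi \rangle \langle \psi,\phi|u_k,v_j \rangle |u_i \rangle \langle u_k| = \sum_{i,j,k} \psi_i \psi_k^{*} |\phi_j|^2 |u_i \rangle \langle u_k|=|\psi \rangle \langle \psi|,$$
where the last step uses the normalization ##\sum_j |\phi_j|^2=\langle \phi|\phi \rangle=1##. So the reduced state is pure.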
The other direction is proven in Ballentine, Quantum Mechanics: If the partial trace of a pure state of a composite system is pure, then this state is necessarily separable.
In other words: within this definition a pure state is entangled iff its partial trace (over either of the subsystems) is mixed.
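The purity criterion can be checked numerically in the same spirit. The following is only a sketch under the assumption of finite dimensions and a normalized ##|\Psi \rangle## given by its coefficients `c[i][j]`. The helper names `reduced_state_A` and `purity` and the example states are hypothetical, chosen for illustration. It uses ##(\hat{\rho}_A)_{ik}=\sum_j c_{ij} c_{kj}^*##, which follows from the partial-trace formula above for ##\hat{\rho}=|\Psi \rangle \langle \Psi|##, and ##\mathrm{Tr} \hat{\rho}_A^2=\sum_{i,k} |(\hat{\rho}_A)_{ik}|^2## for a Hermitian matrix.

```python
def reduced_state_A(c):
    """rho_A[i][k] = sum_j c[i][j] * conj(c[k][j]) for |Psi> = sum_ij c_ij |u_i,v_j>."""
    dim_a, dim_b = len(c), len(c[0])
    return [
        [sum(c[i][j] * c[k][j].conjugate() for j in range(dim_b)) for k in range(dim_a)]
        for i in range(dim_a)
    ]


def purity(rho):
    """Tr(rho^2); for a Hermitian matrix this equals sum_ik |rho_ik|^2."""
    return sum(abs(x) ** 2 for row in rho for x in row)


# Normalized product state with dim H_A = 3, dim H_B = 2.
psi = [1 / 3**0.5, 1j / 3**0.5, -1 / 3**0.5]
phi = [0.6, 0.8j]
c_product = [[a * b for b in phi] for a in psi]

# (|u_0,v_0> + |u_1,v_1>)/sqrt(2) in the same 3x2 space.
s = 2**-0.5
c_entangled = [[s, 0.0], [0.0, s], [0.0, 0.0]]

purity_product = purity(reduced_state_A(c_product))
purity_entangled = purity(reduced_state_A(c_entangled))

# A purity of 1 (up to rounding) means the reduced state is pure.
print(purity_product, purity_entangled)
```

For the product state the purity comes out as 1 up to rounding, while for the second state the reduced state is ##\mathrm{diag}(1/2,1/2,0)## with purity ##1/2##, i.e., it is mixed.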
The paper by Sasaki et al. seems to use my notion of entanglement, while their reference 3 (the RMP by Horodecki##^4##) takes the more general definition discussed above. I still think it's clearer to state entanglement with respect to a concrete measurement on subsystems. On the other hand, the more general definition is easier to state (at least for pure states).