Simon Bridge said:
... it follows that a superposition can end up with a state that is not pure. ie. not all combinations of pure states are, themselves, pure states. Am I following you? Being a superposition is not a defining characteristic of a pure state.
Let's see if I understand what you are asking...
|a> and |b> are pure states of single particle systems.
A 2-particle system is set up so it could be |a,b> or |b,a> ... the 2-particle system may be in state |S> = (1/√2)|a,b> + (1/√2)|b,a> (for instance).
You want to know what it is that distinguishes |S> from |a> and |b> in terms of being a pure or a mixed state.
Is |a,b> a pure state?
That about right?
I think this becomes a bit confused now. By definition a pure state is described by an arbitrary vector in the Hilbert space (more correct is to say it's described by a vector in Hilbert space modulo a factor, i.e., a ray in Hilbert space, but that's not so important at this level of discussion). Since the Hilbert space is a vector space, any superposition of vectors is again a vector, providing the description of another pure state. It's also arbitrary whether you say a state is in a superposition of other states or not. You can always choose a basis (complete set of orthonormal vectors) where a given state is a member of this basis (e.g., by Gram-Schmidt orthonormalization).
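To make the last point concrete, here is a minimal sketch of how one might build an orthonormal basis that contains a given state vector, using Gram-Schmidt orthonormalization in a finite-dimensional space. The function names `inner` and `basis_containing` and the two-dimensional example state are illustrative assumptions, not something from the posts above.

```python
from math import sqrt

def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def basis_containing(state, tol=1e-12):
    """Return an orthonormal basis of C^n whose first element is the normalized `state`."""
    n = len(state)
    # Candidates: the state itself, followed by the standard basis vectors.
    candidates = [list(state)] + [[1.0 if i == k else 0.0 for i in range(n)]
                                  for k in range(n)]
    basis = []
    for v in candidates:
        w = [complex(x) for x in v]
        for e in basis:
            # Remove the component of w along each vector already in the basis.
            c = inner(e, w)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = sqrt(inner(w, w).real)
        if norm > tol:
            # Keep only candidates that are linearly independent of the basis so far.
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical example: the superposition (|0> + |1>)/sqrt(2) in a two-dimensional space.
psi = [1 / sqrt(2), 1 / sqrt(2)]
basis = basis_containing(psi)
```

In the basis returned here the state `psi` is simply the first basis vector, so whether it counts as "a superposition" depends only on the basis one chooses to expand it in.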
Entanglement also doesn't so much refer to states but to observables, although of course both concepts are closely related. So quantum theory offers the possibility to prepare systems in pure states where certain observables are entangled. One example is the Stern-Gerlach experiment. For simplicity I refer to non-relativistic QT. You have an (electrically neutral) particle or composite particle-like system (in the original case it was a silver atom) with a magnetic moment and let it run through an inhomogeneous magnetic field. Then the particle is deflected due to the corresponding dipole force, and the deflection is proportional to the spin component in the direction of the magnetic field. This leads to a spatial separation for different spin states, which makes the position and spin-##z## component of the particle (assuming the magnetic field is directed in the ##z##-direction) entangled.
In non-relativistic physics, the position ##\vec{x}## and the spin-##z## component ##s_z## form a complete set of compatible observables, thus you can consider the position-spin basis, which formally is a Kronecker product (as introduced in bhobba's posting above) of position and spin eigenstates, ##|\vec{x},\sigma_z \rangle##. Now from an arbitrary not too badly localized state (think of a Gaussian wave packet as an example), after running through a Stern-Gerlach apparatus you end up in a state like
$$|\Psi \rangle=\sum_{i=-s}^s c_i |\phi_{\vec{x}_i},\sigma_z=i \rangle.$$
Here ##|\phi_{\vec{x}_i} \rangle## is the position part of the wave function, peaked around a position ##\vec{x}_i##, where a particle most probably ends up after running through the Stern-Gerlach apparatus and having a spin-##z## component with ##\sigma_z=i##. This is entanglement: If the wave packets are not overlapping too much (in practice you can build SG apparatuses that lead to practically non-overlapping partial beams), you have a one-to-one correspondence between the position of the particle and the value of its spin-##z## component, i.e., you have entanglement between position and spin-##z## component.
It becomes even more fascinating if you have a composite system, of which far distant parts can have entangled observables. The most commonly used examples are polarization-entangled two-photon states, because it's nowadays pretty easy to provide such states by a process called parametric downconversion, where you shine a laser into a birefringent crystal and get out two photons, each with lower energy than the laser photons, that are entangled with respect to their polarization. You can, e.g., have the polarization states of the two photons in the state
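As a worked illustration (a sketch for the special case ##s=1/2##, assuming idealized, normalized and non-overlapping wave packets, ##\langle \phi_{\vec{x}_i}|\phi_{\vec{x}_j} \rangle \simeq \delta_{ij}##), the state after the magnet reads
$$|\Psi \rangle=c_{+1/2} |\phi_{\vec{x}_{+1/2}},\sigma_z=+1/2 \rangle+c_{-1/2} |\phi_{\vec{x}_{-1/2}},\sigma_z=-1/2 \rangle, \quad |c_{+1/2}|^2+|c_{-1/2}|^2=1.$$
Under this assumption the particle is found near ##\vec{x}_{+1/2}## with probability ##|c_{+1/2}|^2##, and in that case its spin-##z## component is ##+1/2## with certainty; likewise for ##\vec{x}_{-1/2}## and ##-1/2##. If you ignore the position, the spin alone is described by the reduced statistical operator
$$\hat{\rho}_{\text{spin}}=|c_{+1/2}|^2 \, |{+1/2} \rangle \langle {+1/2}|+|c_{-1/2}|^2 \, |{-1/2} \rangle \langle {-1/2}|,$$
which is a mixed state whenever both coefficients are non-zero.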
$$|\Psi \rangle=\frac{1}{\sqrt{2}}( |H V \rangle-|VH \rangle).$$
You can determine in which state each of the single photons is. For entangled states that's a mixed state, and it is defined by a process called "partial tracing". The pure two-photon state can equivalently be described by the statistical operator ##\hat{\rho}_{\Psi}=|\Psi \rangle \langle \Psi |##; in general, a statistical operator describes a pure state if and only if it's a projection operator. The partial tracing leads to
$$\hat{\rho}_A=\frac{1}{2} (|H \rangle \langle H|+|V \rangle \langle V|)=\frac{1}{2} \hat{1},$$
and the same for the other photon, labeled with ##B##, which means that each single photon of the pair is completely unpolarized. However, the total two-photon state tells us that the polarizations of the photons are nevertheless strictly correlated, i.e., if Alice finds for her photon that it is ##H## polarized, then Bob's photon must be ##V## polarized and vice versa. The entanglement persists over long times if nothing disturbs the photons during their propagation. This means the detection of the photons may take place at arbitrarily far distances, and still, although the single photons themselves are completely unpolarized, the polarizations of the two photons in the pair are strictly correlated due to the entanglement.
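The partial trace above is easy to check numerically. The following is a minimal pure-Python sketch, assuming the basis ordering ##H \to 0##, ##V \to 1## for each photon and storing the two-photon amplitudes as `psi[a][b]` for ##|a \rangle_A |b \rangle_B##. The helper names `reduced_rho_A` and `purity` are my own choices for illustration.

```python
from math import sqrt

# Assumed convention for this sketch: index 0 = H, index 1 = V.
H, V = 0, 1

# Amplitudes psi[a][b] of |a>_A |b>_B for (|HV> - |VH>)/sqrt(2).
psi = [[0.0, 1 / sqrt(2)],
       [-1 / sqrt(2), 0.0]]

def reduced_rho_A(psi):
    """Trace out photon B: rho_A[a][a2] = sum_b psi[a][b] * conj(psi[a2][b])."""
    n, m = len(psi), len(psi[0])
    return [[sum(psi[a][b] * complex(psi[a2][b]).conjugate() for b in range(m))
             for a2 in range(n)] for a in range(n)]

def purity(rho):
    """Tr(rho^2): 1 for a pure state, 1/d for the maximally mixed state in d dimensions."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real

# Statistical operator of the pair, |Psi><Psi|, as a 4x4 matrix.
flat = [psi[a][b] for a in (H, V) for b in (H, V)]
rho_pair = [[x * complex(y).conjugate() for y in flat] for x in flat]

# Reduced statistical operator of Alice's photon.
rho_A = reduced_rho_A(psi)

# Joint probabilities P(a, b) for Alice finding a and Bob finding b.
joint = [[abs(psi[a][b]) ** 2 for b in (H, V)] for a in (H, V)]

print("purity of pair:", purity(rho_pair))
print("rho_A:", rho_A)
print("purity of photon A:", purity(rho_A))
print("P(H,V), P(V,H), P(H,H), P(V,V):",
      joint[H][V], joint[V][H], joint[H][H], joint[V][V])
```

Up to floating-point rounding, the pair has purity 1 (a pure state), `rho_A` is half the unit matrix with purity 1/2 (completely unpolarized), and only the outcomes ##HV## and ##VH## have non-zero probability, which is the strict correlation described above.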