This is very misleading if not even wrong ;-).
In quantum theory a pure state is represented by a ray in Hilbert space, i.e., by a unit vector in Hilbert space modulo a phase factor. It is very important to keep this subtlety in mind, because it is crucial for understanding states that they are defined by a normalized Hilbert-space vector modulo a phase. E.g., it immediately makes clear why the half-integer representations of rotations make sense (a rotation by 2\pi multiplies a spin-1/2 state vector by -1, which does not change the ray), and indeed the matter surrounding us is built from particles with spin 1/2 (nucleons and electrons).
Equivalently you can define a pure state as being represented by a projection operator
\hat{P}_{\psi}=|\psi \rangle \langle \psi |,
where |\psi \rangle is an arbitrary unit vector representing the ray that represents the pure state.
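As a small numerical illustration (a sketch in Python with NumPy, which is not part of the original discussion; the two-dimensional vector, the phase and the helper name `projector` are arbitrary, hypothetical choices), one can check that all vectors on the same ray give the same projection operator, and that this operator is a self-adjoint projector with unit trace:

```python
import numpy as np

def projector(psi):
    """Return |psi><psi| for the normalized version of the vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)             # normalize to a unit vector
    return np.outer(psi, psi.conj())            # matrix elements psi_i * conj(psi_k)

psi = np.array([1.0, 1.0j]) / np.sqrt(2)        # a unit vector in C^2
phase = np.exp(0.7j)                            # an arbitrary phase factor
P1 = projector(psi)
P2 = projector(phase * psi)                     # another representative of the same ray

print(np.allclose(P1, P2))                      # True: same ray, same projector
print(np.allclose(P1 @ P1, P1))                 # True: P^2 = P
print(np.isclose(np.trace(P1).real, 1.0))       # True: Tr P = 1
```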
A pure state encodes the most complete determination of the system possible. The delicate point of quantum theory is Born's postulate, i.e., the probabilistic interpretation of the state. One prepares a system in the most complete way, i.e., in a pure state, e.g., by a filtering process (von Neumann measurement) that determines a complete set of compatible observables. This fixes the state to be represented by a ray each of whose representatives |\psi \rangle lies in the common eigenspace of all the self-adjoint operators representing the measured observables (for a complete set of compatible observables this common eigenspace is one-dimensional).
Then these observables are determined and have the specific values found in the preparation, but other observables are in general undetermined: for any such observable you only know the probabilities to find a certain value when measuring it. Let A be the observable of interest and |a,\beta \rangle a complete orthonormal set of eigenvectors of the corresponding self-adjoint operator \hat{A}, where \beta labels the linearly independent eigenvectors belonging to the same eigenvalue a. Then the probability to find the value a when measuring the observable A on a system prepared in the pure state \hat{P}_{\psi} is given by
P(a|\psi)=\sum_{\beta} \langle a,\beta|\hat{P}_{\psi}|a,\beta \rangle=\sum_{\beta} |\langle a,\beta|\psi \rangle|^2.
This is Born's rule.
The relation with the wave-mechanical formulation is that
\psi(a,\beta)=\langle a,\beta|\psi \rangle
is the wave function in the A representation.
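Born's rule, including the sum over the degeneracy index \beta, can be sketched numerically as follows. This is a minimal illustration assuming a finite-dimensional Hilbert space, so that \hat{A} is a Hermitian matrix; the function name `born_probabilities` and the example matrix and vector are hypothetical. The components \langle a,\beta|\psi \rangle computed along the way are the wave function in the A representation.

```python
import numpy as np

def born_probabilities(A, psi, decimals=9):
    """Return {a: P(a|psi)} for the Hermitian matrix A, summing over degenerate eigenvectors."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)             # make sure psi is a unit vector
    evals, evecs = np.linalg.eigh(A)            # columns of evecs: orthonormal eigenvectors |a,beta>
    amplitudes = evecs.conj().T @ psi           # psi(a,beta) = <a,beta|psi>, wave function in the A representation
    probs = {}
    for a, amp in zip(np.round(evals, decimals), amplitudes):
        key = float(a)                          # eigenvalues equal up to rounding count as the same a
        probs[key] = probs.get(key, 0.0) + abs(amp) ** 2
    return probs

A = np.diag([1.0, 1.0, -1.0])                   # eigenvalue 1 is twofold degenerate (beta = 1, 2)
psi = np.array([1.0, 1.0j, 1.0]) / np.sqrt(3)
print(born_probabilities(A, psi))               # P(1) ≈ 2/3, P(-1) ≈ 1/3
```

Which orthonormal basis `eigh` picks inside a degenerate eigenspace is arbitrary, but the sum over \beta does not depend on that choice.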
Now in many situations you have not determined the state of the system completely, simply because it is too complex to measure a complete set of compatible observables. E.g., for a macroscopic body you would have to specify a number of observables (degrees of freedom) at least of the order of Avogadro's number, N_A \approx 6 \cdot 10^{23}, which gives the number of particles contained in 1 mole of a substance and was traditionally defined as the number of carbon atoms in 12 g of carbon-12.
Then you use a so-called mixed state. This is analogous to classical statistics, where you use, e.g., a single-particle phase-space distribution function to describe a gas as a whole. In quantum theory this is done by introducing a more general state operator that is not necessarily a projection operator as for pure states. It is a positive semidefinite self-adjoint operator \hat{R}, the statistical operator, which has unit trace,
\mathrm{Tr} \hat{R}=1.
If \hat{R} is a projection operator, i.e., \hat{R}^2=\hat{R}, then you have a pure state. To see this, note that since \hat{R} is self-adjoint you can find a complete set of orthonormalized eigenvectors |\lambda \rangle of \hat{R}, and all eigenvalues are 0 or 1, because if |\lambda \rangle is an eigenvector with eigenvalue \lambda you have
\hat{R}^2 |\lambda \rangle=\hat{R} \lambda |\lambda \rangle=\lambda^2 |\lambda \rangle.
But on the other hand we have \hat{R}^2=\hat{R}, which implies that \lambda^2=\lambda, and this equation has only the solutions 0 and 1. Now the trace is
\mathrm{Tr} \hat{R}=\sum_{\lambda} \langle \lambda|\hat{R}|\lambda \rangle = \sum_{\lambda} \lambda=1.
This means that exactly one of the eigenvalues in this sum is 1 and all others are 0. Let's denote the corresponding eigenvector by |\psi \rangle. Since the orthonormal set of eigenvectors of a self-adjoint operator is complete, this means that
\hat{R}=\sum_{\lambda} \lambda |\lambda \rangle \langle \lambda|=|\psi \rangle \langle \psi|,
but this is precisely the projection operator of the pure state represented by the ray containing the unit vector |\psi \rangle as explained above.
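The argument that \hat{R}^2=\hat{R} singles out a pure state can be checked numerically. A minimal sketch, assuming a finite-dimensional example and NumPy; the helper names are hypothetical. It tests the projector property and recovers a representative |\psi \rangle as the eigenvector with eigenvalue 1 (only up to a phase, as it must be for a ray):

```python
import numpy as np

def is_pure(R, atol=1e-10):
    """A statistical operator represents a pure state iff it is a projector, R^2 = R."""
    return bool(np.allclose(R @ R, R, atol=atol))

def state_vector_from_pure(R):
    """Return a unit vector |psi> with R = |psi><psi|; it is fixed only up to a phase."""
    evals, evecs = np.linalg.eigh(R)
    expected = np.zeros(len(evals))
    expected[-1] = 1.0                          # eigh sorts ascending: zeros first, then the single 1
    if not np.allclose(evals, expected, atol=1e-8):
        raise ValueError("R is not a pure-state projector with unit trace")
    return evecs[:, -1]                         # eigenvector belonging to the eigenvalue 1

psi = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2)
R = np.outer(psi, psi.conj())
phi = state_vector_from_pure(R)
print(is_pure(R))                               # True
print(abs(np.vdot(psi, phi)))                   # ≈ 1: phi lies on the same ray as psi
print(is_pure(np.diag([0.5, 0.5, 0.0])))        # False: a proper mixture
```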
To understand the more general case of a proper mixed state, i.e., a state represented by a statistical operator for which \hat{R}^2 \neq \hat{R}, we consider the following gedanken experiment. Suppose a physicist (say Alice) always prepares particles in one of a given set of pure states \hat{P}_{j}=|\psi_j \rangle \langle \psi_j|. She sends a lot of such particles to Bob, but doesn't tell him in which of these pure states she prepared each individual particle. The only thing she tells him is that on average she prepares a fraction P_j of the particles in the state represented by \hat{P}_j. Note that the |\psi_j \rangle are normalized but not necessarily orthogonal to each other. How should Bob then describe the probability to find a value a when measuring the observable A? This is answered by elementary probability theory (the law of total probability). First of all, if Alice sends him a particle in the state \hat{P}_j, the probability to measure this value is given by Born's rule, as explained above:
P(a|\psi_j)=\sum_{\beta} \langle a,\beta|\psi_j \rangle \langle \psi_j | a, \beta \rangle.
Since \hat{P}_j is prepared with probability P_j, then, assuming that Alice doesn't hide any correlations between her preparations, i.e., that she prepares the particles independently, the probability for Bob to measure a is given by
P(a|R)=\sum_j P_j P(a|\psi_j)=\sum_{\beta} \langle a,\beta|\left (\sum_j P_j |\psi_j \rangle \langle \psi_j | \right )|a,\beta \rangle.
Thus defining the statistical operator as
\hat{R}=\sum_j P_j |\psi_j \rangle \langle \psi_j |
we can write
P(a|R)=\sum_{\beta} \langle a,\beta|\hat{R}|a,\beta \rangle.
Thus \hat{R} plays formally the same role in Born's rule as the projection operator \hat{P}_{\psi} representing a pure state.
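Alice's and Bob's situation can be reproduced with a concrete, purely illustrative choice: spin-1/2 particles, A = \sigma_z, |\psi_1 \rangle = |\uparrow \rangle and |\psi_2 \rangle = (|\uparrow \rangle+|\downarrow \rangle)/\sqrt{2} (normalized but not orthogonal), with P_1=0.3 and P_2=0.7. The sketch below (Python/NumPy, hypothetical helper names) computes P(a|R) once as the weighted average \sum_j P_j P(a|\psi_j) and once from \hat{R}:

```python
import numpy as np

def eigenspace_projector(A, a, tol=1e-9):
    """sum_beta |a,beta><a,beta| for the eigenvalue a of the Hermitian matrix A."""
    evals, evecs = np.linalg.eigh(A)
    V = evecs[:, np.abs(evals - a) < tol]       # columns spanning the eigenspace of a
    return V @ V.conj().T

def prob_pure(A, a, psi):
    """Born's rule for a pure state: sum_beta |<a,beta|psi>|^2."""
    return float(np.real(np.vdot(psi, eigenspace_projector(A, a) @ psi)))

def prob_mixed(A, a, R):
    """Born's rule for a statistical operator: sum_beta <a,beta|R|a,beta>."""
    return float(np.real(np.trace(eigenspace_projector(A, a) @ R)))

sigma_z = np.diag([1.0, -1.0])                  # observable A with eigenvalues +1, -1
states = [np.array([1.0, 0.0], dtype=complex),                  # |psi_1>
          np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)]     # |psi_2>, not orthogonal to |psi_1>
weights = [0.3, 0.7]                            # P_1, P_2

R = sum(p * np.outer(s, s.conj()) for p, s in zip(weights, states))
direct = sum(p * prob_pure(sigma_z, 1.0, s) for p, s in zip(weights, states))
via_R = prob_mixed(sigma_z, 1.0, R)
print(direct, via_R)                            # both ≈ 0.3 * 1 + 0.7 * 0.5 = 0.65
```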
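Further, it fulfills the formal properties of a statistical operator as defined above. It is obviously self-adjoint, and it is positive semidefinite because P_j \geq 0, so that for any vector |\phi \rangle we have \langle \phi|\hat{R}|\phi \rangle=\sum_j P_j |\langle \psi_j|\phi \rangle|^2 \geq 0. Its unit trace follows from \sum_j P_j=1: To evaluate the trace, we introduce an arbitrary complete orthonormal system |n \rangle. Then we have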
\mathrm{Tr} \hat{R}=\sum_n \langle n|\hat{R}|n \rangle=\sum_{n,j} P_j \langle n|\psi_j \rangle \langle \psi_j|n \rangle.
Now we have
\sum_{n} \langle n|\psi_j \rangle \langle \psi_j|n \rangle=\sum_{n} \langle \psi_j|n \rangle\langle n|\psi_j \rangle =\langle \psi_j | \psi_j \rangle=1
and thus finally
\mathrm{Tr} \hat{R}=\sum_j P_j=1,
as it should be.
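These formal properties can also be checked numerically for a concrete mixture of two non-orthogonal spin-1/2 states (an arbitrary illustrative choice: |\uparrow \rangle with P_1=0.3 and (|\uparrow \rangle+|\downarrow \rangle)/\sqrt{2} with P_2=0.7). A minimal sketch in Python/NumPy with a hypothetical function name; besides self-adjointness, positivity and the trace, it reports \mathrm{Tr} \hat{R}^2, a standard measure ("purity") not discussed in the text. For a statistical operator it equals 1 exactly if \hat{R}^2=\hat{R}, so a value below 1 signals a proper mixed state:

```python
import numpy as np

def check_statistical_operator(R, tol=1e-10):
    """Check self-adjointness, positivity and unit trace; also return the purity Tr R^2."""
    hermitian = bool(np.allclose(R, R.conj().T, atol=tol))
    positive = bool(np.all(np.linalg.eigvalsh(R) > -tol))  # all eigenvalues >= 0 up to rounding
    unit_trace = bool(abs(np.trace(R) - 1.0) < tol)
    purity = float(np.real(np.trace(R @ R)))                # equals 1 iff R^2 = R (given unit trace)
    return hermitian, positive, unit_trace, purity

states = [np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)]
weights = [0.3, 0.7]
R = sum(p * np.outer(s, s.conj()) for p, s in zip(weights, states))
print(check_statistical_operator(R))            # (True, True, True, purity ≈ 0.79)
```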
I hope that now the formalism of general "mixed states" as being described by statistical operators has become somewhat clearer. It's quite a difficult concept and needs some time to be fully understood!
There are also two physically different kinds of probabilities involved here. The first kind is specifically quantum theoretical and refers to the probabilities for finding a specific result when measuring an observable on a system which we know to be prepared in a given pure state. Here we have the most complete knowledge about the system's state that is possible for a quantum system, but we still only know the probabilities to find a specific value when measuring the observable (except when the measured observable has a determined value due to the state preparation, i.e., if the state vector |\psi \rangle is an eigenvector of the self-adjoint operator representing the measured observable). These probabilities come into play because, according to quantum theory, not all observables have determined values even when we have complete knowledge about the system's state. According to quantum theory it's impossible to prepare a system such that all observables have determined values at once. This is the indeterministic nature of quantum theory, and it implies a radical change in our view of Nature compared to classical physics. This makes quantum theory appear "weird" to many people, but it's the best theory we have about Nature, and it has been tested very stringently since its discovery in 1925, with the result that it describes Nature to a very high degree of accuracy.
The second kind of probabilities comes into the game if we have a proper mixed state. These were the probabilities P_j in our gedanken experiment above. They come into play because Bob does not have complete knowledge about the state of the systems Alice prepares for him. Probabilities of this kind also occur in classical statistical mechanics, where we describe a classical system that is in principle deterministic with probabilities, because we have incomplete knowledge about its state (i.e., about the positions and momenta of all particles making up the classical system). In quantum theory it's the incomplete knowledge about which pure state the system is prepared in.
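The two kinds of probabilities can be made tangible by a small Monte Carlo sketch of the gedanken experiment (Python/NumPy; the states, weights, seed and sample size are illustrative assumptions, not from the text). For each particle, Alice's preparation is first drawn with the probabilities P_j (Bob's incomplete knowledge, the second kind), then the measurement result is drawn with Born's rule for that pure state (the first kind). The relative frequency Bob observes approaches \langle \uparrow|\hat{R}|\uparrow \rangle:

```python
import numpy as np

rng = np.random.default_rng(seed=1)             # fixed seed so the run is reproducible

states = [np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)]   # Alice's |psi_j>
weights = np.array([0.3, 0.7])                  # the ignorance probabilities P_j
up = np.array([1.0, 0.0])                       # eigenvector of sigma_z for the value a = +1
born_up = np.array([abs(np.vdot(up, s)) ** 2 for s in states])      # P(+1|psi_j) from Born's rule

def simulate_bob(n_particles):
    """Relative frequency of the result +1 that Bob finds for sigma_z."""
    # second kind: which pure state did Alice prepare for each particle?
    j = rng.choice(len(states), size=n_particles, p=weights)
    # first kind: Born's rule for the pure state that was actually prepared
    found_up = rng.random(n_particles) < born_up[j]
    return float(found_up.mean())

R = sum(p * np.outer(s, s.conj()) for p, s in zip(weights, states))
predicted = float(np.real(up @ R @ up))         # <up|R|up> = 0.3 * 1 + 0.7 * 0.5 = 0.65
print(simulate_bob(200_000), predicted)
```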
Of course, in practice one has to find the statistical operator by other means, based on the information one really has. One very powerful method is the information-theoretical approach to (quantum) statistics, as developed by Jaynes on the basis of Shannon's information theory and the corresponding interpretation of the entropy as a measure of the missing information, given a probability distribution. According to this principle, one has to choose the probability distribution that maximizes the entropy under the constraints given by the knowledge about the system. In the sense of the Shannon-Jaynes entropy this distribution maximizes the missing information under the constraints of the factual knowledge about the system, so that the probabilities assigned to the situation in question do not imply any prejudice beyond that knowledge. For more details of this approach, see, e.g., my manuscript on Statistical Physics on my home page:
http://fias.uni-frankfurt.de/~hees/publ/stat.pdf
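To illustrate the maximum-entropy principle in its simplest quantum form: if the only knowledge is the mean energy \mathrm{Tr}(\hat{R} \hat{H})=E, the standard result of maximizing the von Neumann entropy S=-\mathrm{Tr}(\hat{R} \ln \hat{R}) under this constraint and \mathrm{Tr} \hat{R}=1 is the canonical (Gibbs) state \hat{R}=\exp(-\beta \hat{H})/Z. The sketch below is a minimal illustration under assumptions not in the text: a finite-dimensional \hat{H} given as a matrix, bisection as the root finder for \beta with the bracket [-50, 50], and hypothetical function names.

```python
import numpy as np

def gibbs_state(H, beta):
    """R = exp(-beta H) / Tr exp(-beta H), built from the eigendecomposition of H."""
    evals, evecs = np.linalg.eigh(H)
    x = -beta * evals
    w = np.exp(x - x.max())                     # subtracting the maximum avoids overflow
    w /= w.sum()
    return (evecs * w) @ evecs.conj().T         # sum_k w_k |E_k><E_k|

def mean_energy(H, beta):
    return float(np.real(np.trace(gibbs_state(H, beta) @ H)))

def max_entropy_state(H, E, lo=-50.0, hi=50.0, iters=200):
    """Find beta with Tr(R H) = E by bisection and return (beta, R)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(H, mid) > E:
            lo = mid                            # mean energy too high: increase beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, gibbs_state(H, beta)

def von_neumann_entropy(R):
    """S = -Tr(R ln R), computed from the eigenvalues of R (0 ln 0 := 0)."""
    p = np.linalg.eigvalsh(R)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

H = np.diag([0.0, 1.0, 2.0])                    # illustrative three-level Hamiltonian
beta, R = max_entropy_state(H, E=0.5)
print(beta, von_neumann_entropy(R))
# other states with the same mean energy 0.5 have smaller entropy:
print(von_neumann_entropy(np.diag([0.5, 0.5, 0.0])), von_neumann_entropy(np.diag([0.75, 0.0, 0.25])))
```

Bisection works here because the mean energy decreases monotonically with \beta (its derivative with respect to \beta is minus the energy variance).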