The standard interpretation says that, in full generality, the state is described by a statistical operator (I'd not call it a density matrix, which is only the special case where you work in the position representation).
The statistical operator has the following properties:
Mathematical
----------------------
(a) it's a self-adjoint positive semidefinite operator
(b) its trace is 1
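To make these two properties concrete, here's a minimal numerical sketch (Python/NumPy; the particular qubit operator below is made up purely for illustration):

```python
import numpy as np

# Illustrative example: a 2x2 statistical operator for a qubit,
# rho = 0.7 |0><0| + 0.3 |1><1| (the weights are arbitrary).
rho = np.array([[0.7, 0.0],
                [0.0, 0.3]])

# (a) self-adjoint: rho equals its conjugate transpose
assert np.allclose(rho, rho.conj().T)

# (a) positive semidefinite: all eigenvalues are >= 0
eigenvalues = np.linalg.eigvalsh(rho)
assert np.all(eigenvalues >= -1e-12)

# (b) trace is 1
assert np.isclose(np.trace(rho), 1.0)
```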
In addition there are self-adjoint operators representing observables. An observable can take only the values given by the eigenvalues of the operator representing it.
A self-adjoint operator also defines a complete set of eigenvectors (in the somewhat sloppy sense of physicists, also including "generalized eigenvectors" if there are continuous eigenvalues like for position and momentum operators).
Physics
-----------
A state, represented by a statistical operator, describes how a system is initially prepared, before you make a measurement.
Now suppose you measure an observable ##A##, represented by a self-adjoint operator ##\hat{A}## with eigenvalues ##a## and orthonormalized eigenstates ##|a,\alpha \rangle##, where ##\alpha## is some parameter (or set of parameters) labeling the different eigenvectors to the same eigenvalue ##a##.
Then the meaning of the statistical operator (according to the minimal statistical interpretation, which is one flavor of the family of Copenhagen interpretations) is that the probability to get the value ##a## when measuring the observable ##A## on a system prepared in the state described by the statistical operator ##\hat{\rho}## is
$$P(a|\hat{\rho})=\sum_{\alpha} \langle a,\alpha|\hat{\rho}|a,\alpha \rangle.$$
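A minimal numerical sketch of this probability formula (Python/NumPy; the state ##\hat{\rho}## and the observable are chosen purely for illustration):

```python
import numpy as np

# Sketch: P(a|rho) = sum_alpha <a,alpha| rho |a,alpha>, illustrated for a
# qubit with A = sigma_z. The eigenvalues +1 and -1 are non-degenerate here,
# so the sum over alpha contains a single term each.
rho = np.array([[0.6, 0.2],
                [0.2, 0.4]])  # a valid statistical operator (illustrative)

up = np.array([1.0, 0.0])     # eigenvector of sigma_z to eigenvalue +1
down = np.array([0.0, 1.0])   # eigenvector of sigma_z to eigenvalue -1

P_plus = up.conj() @ rho @ up       # probability to measure a = +1
P_minus = down.conj() @ rho @ down  # probability to measure a = -1

print(P_plus, P_minus)  # the probabilities sum to 1
```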
Now ##\hat{\rho}##, as a self-adjoint operator, itself has a complete set of orthonormalized eigenvectors ##|p_k \rangle## with eigenvalues ##p_k##.
Because ##\hat{\rho}## is positive semidefinite, you have
$$p_k=\langle p_k|\hat{\rho}|p_k \rangle \geq 0,$$
and, by property (b) above,
$$\mathrm{Tr} \hat{\rho} = \sum_k \langle p_k|\hat{\rho}|p_k\rangle = \sum_k p_k=1.$$
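These two facts about the spectrum of ##\hat{\rho}## can be checked numerically (a sketch; the operator below is the same illustrative example as before):

```python
import numpy as np

# Sketch: diagonalize a statistical operator and check that its eigenvalues
# p_k are non-negative and sum to 1 (rho below is an illustrative example).
rho = np.array([[0.6, 0.2],
                [0.2, 0.4]])

p, vecs = np.linalg.eigh(rho)    # eigenvalues p_k, eigenvectors |p_k> (columns)
assert np.all(p >= -1e-12)       # positive semidefiniteness
assert np.isclose(p.sum(), 1.0)  # Tr rho = sum_k p_k = 1
print(p)
```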
There's one "extreme case", namely that exactly one eigenvalue of ##\hat{\rho}## is 1 and all others then must necessarily vanish. So say ##p_1=1## and ##p_k=0## for ##k \in \{2,3,\ldots \}##.
Due to the completeness of the eigenvectors we then have
$$\hat{\rho}=\sum_k p_k |p_k \rangle \langle p_k|=|p_1 \rangle \langle p_1|.$$
In this case the probability for finding the value ##a## when measuring the observable ##A## is
$$P(a|\hat{\rho})=\sum_{\alpha} \langle a,\alpha|p_1 \rangle \langle p_1|a,\alpha \rangle=\sum_{\alpha} |\langle a,\alpha|p_1 \rangle|^2.$$
Now ##\psi(a,\alpha):=\langle a,\alpha|p_1 \rangle## is nothing else than the wave function in the ##A## representation, and the general formalism with the statistical operator thus also covers the special case where you describe the state in terms of a wave function.
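To see this reduction numerically, here's a sketch (the qubit state below is an arbitrary illustrative choice): for a pure state ##\hat{\rho}=|\psi\rangle\langle\psi|##, the general formula gives the same probability as the Born rule ##|\langle a|\psi\rangle|^2##.

```python
import numpy as np

# Illustrative pure state |psi> = (|0> + |1>)/sqrt(2)
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())   # rho = |psi><psi|

up = np.array([1.0, 0.0])         # eigenvector of the observable to eigenvalue a

# General statistical-operator formula vs. Born rule: both agree
P_general = up.conj() @ rho @ up
P_born = abs(up.conj() @ psi) ** 2
assert np.isclose(P_general, P_born)
print(P_general)  # 0.5 for this state
```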
It should be clear that this case is very special: such a state provides the most complete description of the quantum system you can have. One therefore calls this special case a "pure state".
Now in this thread some more advanced things have also been stated. One is entropy. That's a very important concept. You may have heard about entropy as a thermodynamical quantity appearing in the first and second laws of thermodynamics. The concept is, however, much more general, and the thermodynamical entropy is a special application to equilibrium states.
The entropy is rather a measure for the missing information relative to complete knowledge. This measure of information is by definition a non-negative quantity, and it was discovered in this general sense by C. Shannon in his analysis of communication via noisy channels. The concept has since also been applied in physics, particularly in quantum theory; nowadays it's even an important field of research in its own right, called quantum information theory. The upshot is that the measure for the missing information (the entropy), given that you know the system is prepared in the state ##\hat{\rho}##, is
$$S=-\mathrm{Tr}(\hat{\rho} \ln \hat \rho)=-\sum_{k} p_k \ln p_k.$$
If there are values ##p_k=0## in the sum it has to be understood as
$$\lim_{x \rightarrow +0} x \ln x=0.$$
Now if you have a pure state, only one ##p_k=1##, and all the others are 0. Since also ##1 \ln 1=0##, this entropy (named after its discoverers: the Shannon-von Neumann-Jaynes entropy) vanishes if and only if the system is prepared in a pure state.
That makes sense: since entropy is non-negative by definition (note that ##0 \leq p_k \leq 1## and thus ##\ln p_k \leq 0##), its lowest value is 0, and a statistical operator with 0 entropy provides complete knowledge in the sense of this entropy measure for missing information. This shows that the Shannon-von Neumann-Jaynes definition of entropy makes sense for QT: according to QT the most complete information you can have about a system is that it is prepared in a pure state, and those are precisely the states for which the entropy vanishes.
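A small sketch of this entropy formula, including the ##0 \ln 0 = 0## convention (the two statistical operators below are illustrative examples):

```python
import numpy as np

# Sketch: S = -sum_k p_k ln p_k, computed from the eigenvalues of rho,
# with the convention 0 ln 0 = 0 handled by dropping zero eigenvalues.
def von_neumann_entropy(rho):
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-15]          # drop zero eigenvalues: lim_{x->0+} x ln x = 0
    return float(-np.sum(p * np.log(p)))

rho_pure = np.diag([1.0, 0.0])    # pure state: one eigenvalue 1, rest 0
rho_mixed = np.diag([0.5, 0.5])   # maximally mixed qubit

print(von_neumann_entropy(rho_pure))   # 0.0
print(von_neumann_entropy(rho_mixed))  # ln 2 ≈ 0.693
```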
The information-theoretical point of view also gives an important way to guess the best statistical operator given some (usually incomplete) information. From a scientific point of view, the best choice is the statistical description reflecting the least possible prejudice: you choose the very statistical operator that maximizes the entropy under the constraints of the information you actually have. The probabilities encoded in this "maximum-entropy statistical operator" then do not falsely mimic more knowledge than you have, since maximizing the entropy ensures it is the state of maximum missing information under the constraint of what's really known.
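As a sketch of the maximum-entropy principle in the simplest case: if you know nothing beyond normalization, the maximum-entropy statistical operator on a ##d##-level system is ##\hat{\rho}=\hat{1}/d##, with entropy ##\ln d##. A crude numerical check (random trial spectra, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 1e-15]              # convention 0 ln 0 = 0
    return -np.sum(p * np.log(p))

# With no information beyond normalization, the maximum-entropy statistical
# operator on a d-level system is rho = 1/d; no random trial spectrum beats it.
d = 3
best = entropy(np.full(d, 1.0 / d))   # entropy of rho = 1/d, i.e. ln d

for _ in range(1000):
    p = rng.random(d)
    p /= p.sum()                  # random eigenvalue spectrum of a trial rho
    assert entropy(p) <= best + 1e-12

print(best)  # ln 3 ≈ 1.0986
```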
@DarMM made an important remark in #2, and that's at the heart of the profound difference between the description of nature comparing this description by quantum vs. classical physics:
In classical physics you use statistical methods only because you can't have complete information about the system for some reason. Most obviously that's the case for macroscopic systems. Within classical mechanics everything is in principle determined: if you know the precise positions and momenta of all particles making up the macroscopic system, you know everything completely. That's the "true state" the system is in according to classical mechanics, but of course in practice it's not possible to accurately know the precise values of the positions and momenta of ##10^{24}## particles. Thus you choose some important "macroscopic observables" like the center of mass of the body and its total momentum, and maybe some more quantities of this kind, depending on the problem you want to solve. In statistical physics you describe this, e.g., by a single-particle phase-space distribution function, giving the phase-space density for finding a particle with momentum ##\vec{p}## at the position ##\vec{x}##. This ignores all kinds of correlations between particles; for those you'd need two-particle, three-particle phase-space distribution functions, etc. No matter how detailed you get, in practice you can never know the complete state of the macroscopic body, but it always is in a definite state, i.e., all the possible observables (which are functions of the phase-space variables) take determined values. In other words, the true state is just one point in phase space, but you can't know it because of the complexity of the macroscopic body, and thus you use statistics to describe the really important ("relevant") macroscopic observables as statistical averages over the microscopic degrees of freedom.
Now in quantum theory there are situations where you can have a system in a pure state (providing complete information about it in the quantum sense!) while parts of it are not in a pure state. That's the case if you have entanglement. There are well-defined systems like an electron entangled in both spin and momentum with a positron coming from a decay of a neutral pion at rest: due to the conservation laws fulfilled in electromagnetic transitions, the total ##e^+ e^-## system must be in the spin-singlet state, and due to momentum conservation their momenta are back-to-back, since the total momentum must stay 0. One can now ask what the state of the electron alone is, just ignoring the positron. As it turns out, the spin of the electron is in its maximum-entropy state, which is simply ##\hat{\rho}_{\text{spin of electron}}=\hat{1}/2##, i.e., it is totally unpolarized. Though the spin-z component of the complete system consisting of the electron and the positron is completely determined to be 0, the spin state of the electron alone is as incomplete as it can get.
By the way: The entropy of the spin state in this case is
$$S=-1/2 \ln(1/2)-1/2 \ln(1/2)=-\ln(1/2)=\ln 2.$$
Thus the "information content" (here completely missing) of a "qu-bit" (i.e., a two-level system) is ##\ln 2##.
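The singlet example above can be sketched numerically: build the pure two-spin singlet state, trace out the positron, and check that the electron's reduced statistical operator is ##\hat{1}/2## with entropy ##\ln 2##.

```python
import numpy as np

# Spin-singlet state of the e+ e- pair,
# |psi> = (|up,down> - |down,up>)/sqrt(2),
# in the basis {|uu>, |ud>, |du>, |dd>} of the two-spin Hilbert space.
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
rho_pair = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)

# Reduced statistical operator of the electron: trace out the positron index
rho_electron = np.einsum('ikjk->ij', rho_pair)
assert np.allclose(rho_electron, np.eye(2) / 2)  # totally unpolarized

# Its entropy is ln 2: maximal for a qubit, although the pair state is pure
p = np.linalg.eigvalsh(rho_electron)
S = -np.sum(p * np.log(p))
print(S)  # ln 2 ≈ 0.693
```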