I also don't see what this has to do with Bell's theorem. You can use any set of normalized kets ##|\psi_i \rangle## (it doesn't need to be a basis nor need the vectors be orthogonal to each other). Now
$$\hat{\rho}=\sum_i p_i |\psi_i \rangle \langle \psi_i|$$
is a statistical operator if it's self-adjoint (##\Rightarrow p_i \in \mathbb{R}##), positive semidefinite (##\Rightarrow p_i \geq 0##), and normalized, ##\mathrm{Tr} \hat{\rho}=1##. To see what this means for the ##p_i##, just take an arbitrary orthonormal basis ##|u_n \rangle## to evaluate the trace:
$$\mathrm{Tr} \hat{\rho} = \sum_n \langle u_n |\hat{\rho} u_n \rangle = \sum_i p_i \sum_n \langle \psi_i|u_n \rangle \langle u_n|\psi_i \rangle = \sum_i p_i \langle \psi_i|\psi_i \rangle=\sum_i p_i \stackrel{!}{=}1.$$
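If you want to check the trace argument numerically, here is a minimal NumPy sketch. The helper `random_ket`, the dimension, and the weights `p` are just illustrative choices of mine, not anything special. It uses four kets in a three-dimensional space, so they can neither form a basis nor be mutually orthogonal, and still the trace comes out as ##\sum_i p_i##.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_ket(dim, rng):
    """Return a random normalized complex vector (hypothetical helper)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

dim = 3
# Four kets in three dimensions: not a basis, not orthogonal.
kets = [random_ket(dim, rng) for _ in range(4)]
p = np.array([0.1, 0.2, 0.3, 0.4])  # real, non-negative, summing to 1

# rho = sum_i p_i |psi_i><psi_i|
rho = sum(p_i * np.outer(k, k.conj()) for p_i, k in zip(p, kets))

trace = np.trace(rho).real               # equals sum_i p_i up to rounding
eigenvalues = np.linalg.eigvalsh(rho)    # all non-negative up to rounding

print("Tr rho      =", trace)
print("eigenvalues =", eigenvalues)
```

The eigenvalues of `rho` are in general not the ##p_i## themselves, since the ##|\psi_i \rangle## are not orthogonal; only their sum is fixed by the trace.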
The heuristic meaning of this is most simply seen from the following gedanken experiment. Assume that Alice can prepare a particle in the pure states defined by the ##|\psi_i \rangle##. She sends Bob a stream of particles, and for each one she randomly chooses, with probability ##p_i##, to prepare it in the state ##|\psi_i \rangle##. That's all Bob knows about the ensemble of particles sent to him, so he'll describe its statistics with the statistical operator given above. You can also show this formally by using the Shannon-Jaynes principle of maximum entropy.
More intuitively you can argue with Bayes's theorem. Say Bob makes a complete measurement on each of the particles Alice sends him. E.g., he measures a complete minimal set of compatible observables ##A_k## (##k \in \{1,\ldots,N \}##). Then the probability to get the measurement result ##(a_1,\ldots,a_N)## should be given by
$$P(a_1,\ldots,a_N)=\langle u_{a_1,\ldots,a_N}|\hat{\rho} u_{a_1,\ldots,a_N} \rangle, \qquad (*)$$
where ##|u_{a_1,\ldots,a_N} \rangle## are the uniquely (up to an irrelevant phase factor) defined simultaneous eigenstates of the operators ##\hat{A}_k## with eigenvalues ##(a_1,\ldots,a_N)##.
To see this just consider what Alice does. She prepares particles in the states ##|\psi_i \rangle## with probability ##p_i##. Suppose a specific particle sent to Bob is prepared in the state ##|\psi_i \rangle##. Then the probability for measuring ##(a_1,\ldots,a_N)## in Bob's measurement is
$$P(a_1,\ldots,a_N|\psi_i)=|\langle u_{a_1,\ldots,a_N}|\psi_i \rangle|^2.$$
Now the probability that Alice sends a particle prepared in state ##|\psi_i \rangle## is ##p_i##. So due to Bayes (more precisely, the law of total probability) the probability to measure ##(a_1,\ldots,a_N)## for the ensemble of particles sent by Alice is
$$P(a_1,\ldots,a_N)=\sum_i p_i P(a_1,\ldots,a_N|\psi_i),$$
but this is indeed given by (*), because
$$\sum_i p_i |\langle u_{a_1,\ldots,a_N}|\psi_i \rangle|^2=\sum_i p_i \langle u_{a_1,\ldots,a_N}|\psi_i \rangle \langle \psi_i|u_{a_1,\ldots,a_N} \rangle=\langle u_{a_1,\ldots,a_N}|\hat{\rho} u_{a_1,\ldots,a_N} \rangle.$$
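The same statement can be checked numerically. The following is only a sketch under my own assumptions: a three-dimensional Hilbert space, a single label `a` standing in for the tuple ##(a_1,\ldots,a_N)##, a random orthonormal measurement basis obtained from a QR decomposition, and a hypothetical helper `random_ket`. It compares the weighted sum of the conditional probabilities with the expectation value of ##\hat{\rho}## in the measurement basis.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_ket(dim, rng):
    """Return a random normalized complex vector (hypothetical helper)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

dim = 3
kets = [random_ket(dim, rng) for _ in range(3)]  # Alice's states, not orthogonal
p = np.array([0.5, 0.3, 0.2])                    # Alice's preparation probabilities

# Bob's statistical operator rho = sum_i p_i |psi_i><psi_i|
rho = sum(p_i * np.outer(k, k.conj()) for p_i, k in zip(p, kets))

# Orthonormal measurement basis: columns of a unitary matrix from QR.
z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
basis, _ = np.linalg.qr(z)

# Conditional probabilities P(a | psi_i) = |<u_a|psi_i>|^2, shape (states, outcomes).
# np.vdot complex-conjugates its first argument.
cond = np.array([[abs(np.vdot(basis[:, a], k)) ** 2 for a in range(dim)]
                 for k in kets])

# Total probability: P(a) = sum_i p_i P(a | psi_i)
mixture = p @ cond

# Right-hand side of (*): P(a) = <u_a| rho u_a>
born = np.array([np.vdot(basis[:, a], rho @ basis[:, a]).real
                 for a in range(dim)])

print("sum_i p_i P(a|psi_i) =", mixture)
print("<u_a|rho u_a>        =", born)
```

Both arrays agree up to rounding, and each sums to 1, as it must for a complete measurement.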