It might be interesting to pinpoint precisely where the mathematical formalism of quantum mechanics differs from that of classical probability theory, in order to understand which parts have a clear interpretation and where the interpretational difficulty arises. Let's first recall the basic setting of each formalism:
- Classical probability theory:
  - The state space is a measurable space ##(\Lambda,\Sigma)## consisting of a set ##\Lambda## and a sigma-algebra ##\Sigma \subseteq \mathcal P(\Lambda)## of subsets of ##\Lambda##. It is equipped with a probability measure ##P:\Sigma\rightarrow\mathbb R## with ##P(\Lambda)=1##.
  - Observables are given by random variables (measurable functions) ##O:\Lambda\rightarrow\mathbb R##.
  - The probability of finding the value of ##O## in the Borel set ##B## is given by ##P_O(B) = \int_{O^{-1}(B)} \mathrm d P##.
  - The expectation value of ##O## is given by ##\left<O\right> = \int_\Lambda O\, \mathrm d P##.
- Quantum theory:
  - The state space is a complex Hilbert space ##\mathcal H##. It is equipped with a quantum state ##\rho##, which is given by a positive self-adjoint trace-class operator with ##\mathrm{Tr}(\rho)=1##.
  - Observables are given by densely defined self-adjoint operators ##O:\mathcal D\rightarrow \mathcal H## (##\mathcal D\subseteq\mathcal H## dense, ##O^\dagger=O##). By the spectral theorem, each has an associated projection-valued measure ##\pi_O##.
  - The probability of finding the value of ##O## in the Borel set ##B## is given by ##P_O(B) = \mathrm{Tr}(\rho \pi_O(B))##.
  - The expectation value of ##O## is given by ##\left<O\right> = \mathrm{Tr}(\rho O)##.
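To make the two settings concrete, here is a minimal finite sketch in plain Python. The fair die, the `parity` observable, the qubit state ##\rho = |+\rangle\langle +|## and the observable ##\sigma_z## are illustrative choices that do not come from the text above, and all helper names are hypothetical. Integrals reduce to finite sums and the projection-valued measure to a dictionary of projectors.

```python
from fractions import Fraction as F

# --- Classical: Lambda = faces of a fair die, Sigma = all subsets, P uniform ---
Lambda = [1, 2, 3, 4, 5, 6]
P = {lam: F(1, 6) for lam in Lambda}

def parity(lam):
    """Random variable O: Lambda -> R, +1 on even faces and -1 on odd faces."""
    return 1 if lam % 2 == 0 else -1

def classical_prob(O, B):
    """P_O(B) = P(O^{-1}(B)) for a finite set B of real values."""
    return sum(P[lam] for lam in Lambda if O(lam) in B)

def classical_mean(O):
    """<O> = integral of O dP, a finite sum here."""
    return sum(O(lam) * P[lam] for lam in Lambda)

# --- Quantum: H = C^2, operators as 2x2 nested lists ---
def matmul(A, B):
    """Matrix product of two square nested-list matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

half = F(1, 2)
rho = [[half, half], [half, half]]   # the pure state |+><+| (assumed example)
sigma_z = [[1, 0], [0, -1]]
# Projection-valued measure of sigma_z on its spectrum {+1, -1}.
pi_z = {1: [[1, 0], [0, 0]], -1: [[0, 0], [0, 1]]}

def quantum_prob(rho, pi_O, B):
    """P_O(B) = Tr(rho pi_O(B)), with pi_O(B) the sum of projectors for values in B."""
    return sum(trace(matmul(rho, proj)) for value, proj in pi_O.items() if value in B)

def quantum_mean(rho, O):
    """<O> = Tr(rho O)."""
    return trace(matmul(rho, O))

print(classical_prob(parity, {1}), classical_mean(parity))
print(quantum_prob(rho, pi_z, {1}), quantum_mean(rho, sigma_z))
```

In this particular example both observables take the value ##+1## with probability ##1/2## and have expectation value ##0##, even though the underlying state spaces are of a completely different nature.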
At first, these two formalisms look completely different, but we can find a common basis. We will construct a probability space for each (classical or quantum) observable in such a way that the full list of these probability spaces contains the same information as the big (classical or quantum) state space we started with.
- Classical probability theory: Given a random variable ##O##, we construct the probability space ##(\Lambda_O,\Sigma_O,P_O)## by setting ##\Lambda_O = O(\Lambda)##, ##\Sigma_O=\mathcal B(\Lambda_O)## (the Borel sigma algebra) and ##P_O : \Sigma_O\rightarrow\mathbb R##, ##P_O(B) = P(O^{-1}(B))##.
- Quantum theory: Given a self-adjoint operator ##O##, we construct the probability space ##(\Lambda_O,\Sigma_O,P_O)## by setting ##\Lambda_O = \mathrm{spec}(O)##, ##\Sigma_O = \mathcal B(\Lambda_O)## and ##P_O : \Sigma_O\rightarrow\mathbb R##, ##P_O(B) = \mathrm{Tr}(\rho \pi_O(B))##.
- In both cases, the observable ##O## is represented by the identity function ##\mathrm{id}_{\Lambda_O}## on ##\Lambda_O##, viewed as a random variable on the new probability space.
In both cases, we find:
- The probability of finding the value of ##O## in the Borel set ##B## is given by ##P_O(B)##.
- The expectation value of ##O## is given by ##\left<O\right> = \int_{\Lambda_O} \mathrm{id}_{\Lambda_O}\,\mathrm d P_O##.
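The reduction to per-observable probability spaces can be sketched for finite examples as follows. This is only an illustration under assumptions: the die, the qubit state ##|+\rangle\langle+|## and the observables are chosen for the example, the function names are hypothetical, and each ##P_O## is stored as a dictionary from points of ##\Lambda_O## to their probabilities.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Matrix product of two square nested-list matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def classical_space(Lambda, P, O):
    """P_O as a dict on Lambda_O = O(Lambda), with P_O({v}) = P(O^{-1}({v}))."""
    P_O = {}
    for lam in Lambda:
        P_O[O(lam)] = P_O.get(O(lam), 0) + P[lam]
    return P_O

def quantum_space(rho, pi_O):
    """P_O as a dict on Lambda_O = spec(O), with P_O({v}) = Tr(rho pi_O({v}))."""
    return {value: trace(matmul(rho, proj)) for value, proj in pi_O.items()}

def mean(P_O):
    """<O> = integral of id over Lambda_O with respect to P_O (a finite sum)."""
    return sum(value * p for value, p in P_O.items())

# Classical input: a fair die and the parity observable (assumed example).
Lambda = [1, 2, 3, 4, 5, 6]
P = {lam: F(1, 6) for lam in Lambda}
parity = lambda lam: 1 if lam % 2 == 0 else -1

# Quantum input: qubit in the state |+><+| with sigma_z and sigma_x (assumed example).
h = F(1, 2)
rho = [[h, h], [h, h]]
sigma_z, sigma_x = [[1, 0], [0, -1]], [[0, 1], [1, 0]]
pi_z = {1: [[1, 0], [0, 0]], -1: [[0, 0], [0, 1]]}
pi_x = {1: [[h, h], [h, h]], -1: [[h, -h], [-h, h]]}

P_parity = classical_space(Lambda, P, parity)
P_z = quantum_space(rho, pi_z)
P_x = quantum_space(rho, pi_x)

# The common formula reproduces the expectation value of each original formalism.
assert mean(P_parity) == sum(parity(lam) * P[lam] for lam in Lambda)
assert mean(P_z) == trace(matmul(rho, sigma_z))
assert mean(P_x) == trace(matmul(rho, sigma_x))
```

Once `P_parity`, `P_z` and `P_x` have been built, the function `mean` no longer needs to know whether a given space came from the classical or the quantum side.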
So both formalisms can be reduced to a common framework, namely the list ##(\Lambda_O,\Sigma_O,P_O)_O## of probability spaces, one for each observable ##O##. This list contains all the information that can be computed in either formalism, so having it is exactly as good as having the traditional data described at the beginning. In the quantum case, for instance, this list of probability spaces already contains everything that can be known about superpositions.
Now, since the probability spaces ##(\Lambda_O,\Sigma_O,P_O)## are just classical probability spaces, we can interpret them exactly as such: the computed probabilities are just relative frequencies. So what is the difference between the quantum formalism and classical probability theory? It is precisely that the probability spaces ##(\Lambda_O,\Sigma_O,P_O)## that arise from the classical formalism can be combined back into one huge probability space ##(\Lambda,\Sigma,P)##, while this is not possible for those that arise from the quantum formalism. This is the only difference between the two formalisms, and all interpretational questions of quantum mechanics (that go beyond those of classical probability theory) are instances of a single question: "How should we interpret the fact that the probability spaces ##(\Lambda_O,\Sigma_O,P_O)## can't be combined into one big probability space?" This is the only question to which we currently have no definite answer.
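One standard way to see this obstruction numerically, which is not spelled out above, is the CHSH inequality. The sketch below assumes that "combining" means finding a single probability space on which four ##\pm 1##-valued observables ##A_0, A_1, B_0, B_1## are random variables whose pair correlations match the quantum ones. For any such space, the quantity ##S = \left<A_0B_0\right> + \left<A_0B_1\right> + \left<A_1B_0\right> - \left<A_1B_1\right>## is an average over deterministic assignments and therefore satisfies ##|S|\le 2##. The singlet state, the measurement angles and all function names are choices made for this example.

```python
import math
from itertools import product

def matmul(A, B):
    """Matrix product of two square nested-list matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def kron(A, B):
    """Tensor (Kronecker) product of two square matrices."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def spin(theta):
    """Observable cos(theta) sigma_z + sin(theta) sigma_x, with spectrum {+1, -1}."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

# Two-qubit singlet state (|01> - |10>)/sqrt(2) as a density matrix rho.
r = 1 / math.sqrt(2)
psi = [0.0, r, -r, 0.0]
rho = [[x * y for y in psi] for x in psi]

def correlation(a, b):
    """<A B> = Tr(rho (A tensor B)) for spin observables at angles a and b."""
    return trace(matmul(rho, kron(spin(a), spin(b))))

# Measurement angles (an assumed, commonly used choice).
a0, a1 = 0.0, math.pi / 2
b0, b1 = math.pi / 4, -math.pi / 4
S_quantum = (correlation(a0, b0) + correlation(a0, b1)
             + correlation(a1, b0) - correlation(a1, b1))

# On a single probability space, S is a convex combination of the values below,
# taken over all deterministic assignments of +1/-1 to (A0, A1, B0, B1).
S_classical_max = max(abs(x0 * y0 + x0 * y1 + x1 * y0 - x1 * y1)
                      for x0, x1, y0, y1 in product([1, -1], repeat=4))

print(abs(S_quantum), S_classical_max)
```

Each pair ##(A_i, B_j)## commutes, so the quantum formalism does provide a joint distribution for it. The violation of the bound shows that, for this state and these angles, the four pair distributions do not arise as marginals of one common probability space.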