I think what happens is that when he introduces Theorem 2, he just uses the word "probability" as a label for a number that is associated with a particular basis vector in the Schmidt decomposition and follows the usual probability rules, and he shows how to compute it from symmetry under swaps. At that point this "probability" is not yet connected to the outcome of a measurement. That connection is made later, in section V, where he considers multiple memory-record states. But yes, it is rather confusing, and I'm not sure I get it.
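To make "a number associated with a basis vector in the Schmidt decomposition" concrete, here is a minimal NumPy sketch. It does not reproduce the swap-symmetry argument itself; it only computes the squared Schmidt coefficients of a two-part pure state, which, as far as I understand, are the numbers the argument ends up assigning. The function name `schmidt_weights` and the two example states are my own choices for illustration, not anything from the paper.

```python
import numpy as np

def schmidt_weights(psi, dim_s, dim_e):
    """Squared Schmidt coefficients of a pure state on a dim_s x dim_e system.

    Hypothetical helper for illustration only.
    """
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)  # normalise the state vector
    # Reshape the amplitudes into a dim_s x dim_e matrix; its singular
    # values are the Schmidt coefficients (returned in descending order).
    coeffs = np.linalg.svd(psi.reshape(dim_s, dim_e), compute_uv=False)
    return coeffs ** 2

# Equal coefficients: (|00> + |11>) / sqrt(2)
weights_equal = schmidt_weights([1, 0, 0, 1], 2, 2)

# Unequal coefficients: sqrt(2/3)|00> + sqrt(1/3)|11>
weights_unequal = schmidt_weights([np.sqrt(2 / 3), 0, 0, np.sqrt(1 / 3)], 2, 2)

print(weights_equal, weights_unequal)
```

The weights are non-negative and sum to one, so they obey the usual probability rules as pure bookkeeping, before anything is said about measurement outcomes.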
I'm not sure why you say that. Zurek certainly stresses the "no-collapse" assumption and refers to Everett's Relative States often enough. He may have been avoiding explicitly mentioning DeWitt's "branching" or "splitting" because these expressions [STRIKE]are a can of worms[/STRIKE] do not describe what happens accurately enough.
To me, an interpretation is (assumes, implies) MWI if all "branches" of a wavefunction in superposition are treated on an equal footing. This is what you get by default. To make an interpretation non-MWI, one has to somehow suppress all branches but one. Different ways to do it are:
- Postulate objective collapse (out of fashion)
- Tag only one branch with particle trajectories in configuration space
- Invoke the anthropic principle, down to outright solipsism
- Just ignore them, they are not worth talking about (I'm sort of ok with this one)
- etc.
But the main reason I look favourably at MWI is the mind-boggling hugeness of the Hilbert space. It just feels too big compared to the size of the configuration space for a single world, but probably just the right size for the entire multiverse.

I mean, nature has to compute the wavefunction for the entire multiverse anyway as a side effect of running our world. It seems a shame to throw most of it away.
DK
PS My favourite interpretation of QM is http://hitchhikers.wikia.com/wiki/Whole_Sort_of_General_Mish_Mash