For me the "quantum eraser experiment" is the best example of how collapse ideas are misleading at best. Like all of quantum mechanics, it is best understood in terms of the minimal statistical interpretation and the fact that, for all practical purposes (FAPP), the frequentist (statistical) interpretation of probabilities is the only reliable one. Everything else is metaphysics and hinders the understanding of the basic concepts of quantum theory. In addition, the quantum-eraser experiment is a good example of something that cannot even be described adequately without the mathematics underlying quantum theory. However, the description can be simplified a great deal by considering only the polarization state of the photons.
The polarization of a photon is described by a two-dimensional complex vector space. We work in a fixed reference frame, given by a Cartesian coordinate system. We assume that the photons enter the double slit from the ##z## direction. The space of polarization states is then spanned by the two linear-polarization states ##|\hat{x} \rangle## and ##|\hat{y} \rangle##. In the following we'll also need the left- and right-circular polarization states,
$$| L \rangle=\frac{1}{\sqrt{2}}(|\hat{x} \rangle + \mathrm{i} |\hat{y} \rangle), \quad|R \rangle=\frac{1}{\sqrt{2}}(|\hat{x} \rangle - \mathrm{i} |\hat{y} \rangle).$$
Here and in the following we assume that all these vectors are normalized to 1, e.g., ##\langle \hat{x} | \hat{x} \rangle=1##. Both the linear-polarization and the circular-polarization states then build an orthonormal basis of the two-dimensional vector space of polarization states.
What's now the physical (and only physical!) meaning of these vectors? According to the minimal interpretation, they answer the question: what is the probability to measure a photon with a certain polarization (assuming ideal polarization filters and photon detectors)? If the photon is prepared in an arbitrary normalized polarization state ##|\psi \rangle##, then the probability to find it to be, say, polarized in the ##x## direction is given by Born's rule:
$$P(\hat{x}|\psi)=|\langle \hat{x}|\psi \rangle|^2.$$
From a physicist's point of view there is nothing more to be associated with the state, and nothing more is needed to apply the formalism of quantum theory to real-world experiments. According to this minimal interpretation, what does an experiment have to look like if I want to test the above statement about the probabilities? The experimentalist has to prepare a lot of single photons, independently of each other, in the state ##|\psi \rangle##. This is done by a practical procedure: e.g., one generates single photons of arbitrary polarization and lets each of them run through a polarization foil that lets through only photons with a given linear polarization, e.g., one in the direction given by the angle ##\phi## with respect to the ##x## axis,
$$|\hat{\phi} \rangle=\cos \phi |\hat{x} \rangle+\sin \phi |\hat{y} \rangle.$$
Then you always get either a photon in this polarization state or no photon at all, and you consider only the cases where you have a photon. That's called the preparation procedure. These photons, now prepared in a well-defined polarization state, you let run through a second polarization foil, which lets through photons linearly polarized in the ##x## direction, and you count how many of the photons polarized in the ##\phi## direction get through. Divide this number by the total number of prepared photons, and you get a relative frequency, which should converge (in the weak sense) to the predicted probability if quantum theory is right. According to Born's rule, this probability is
$$P(\hat{x}|\hat{\phi})=|\langle \hat{x}|\hat{\phi} \rangle|^2=\cos^2 \phi.$$
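The counting procedure just described can be mimicked numerically. The following is only a minimal sketch using the Python standard library; the function names, the angle ##\phi=\pi/6##, the number of photons, and the random seed are arbitrary choices made for illustration, and ideal filters and detectors are assumed.

```python
import math
import random

def ket_linear(phi):
    """Amplitudes (x, y) of the ket cos(phi)|x> + sin(phi)|y>."""
    return (complex(math.cos(phi)), complex(math.sin(phi)))

def inner(a, b):
    """Scalar product <a|b>, antilinear in the first argument."""
    return sum(ai.conjugate() * bi for ai, bi in zip(a, b))

def born_probability(analyzer, state):
    """Born's rule: P(analyzer|state) = |<analyzer|state>|^2."""
    return abs(inner(analyzer, state)) ** 2

def relative_frequency(p, n_photons, seed=0):
    """Fraction of n_photons independent photons that pass a filter
    when each one passes with probability p."""
    rng = random.Random(seed)
    passed = sum(1 for _ in range(n_photons) if rng.random() < p)
    return passed / n_photons

X = ket_linear(0.0)            # analyzer: linear polarization along x
phi = math.pi / 6              # illustrative preparation angle
p = born_probability(X, ket_linear(phi))
freq = relative_frequency(p, 100_000)

print(f"Born's rule:        {p:.4f}")
print(f"cos^2(phi):         {math.cos(phi) ** 2:.4f}")
print(f"relative frequency: {freq:.4f}")
```

The relative frequency fluctuates around the Born probability with a statistical error of order ##1/\sqrt{n}##, which is exactly what the usual lab-course error analysis has to take into account.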
Now you can check this hypothesis, using the usual rules for the statistical evaluation of data, taught in the basic lab courses at the beginning of any physics curriculum. There's no need for complicated philosophical debates about "collapse" or the "meaning of probabilities" or anything else metaphysical. It's simply this very practical way to measure the polarization. Of course, the predictions of quantum theory (QED in this case) about the polarization of photons have been tested very thoroughly. According to all the measurements of quantum opticians, the predictions have always been found to be correct, and quantum optics is among the fields with the highest accuracy ever reached in modern experimental physics!
Now let's discuss the single-photon double-slit experiment in a quite qualitative way. Letting a single photon with a well-enough defined momentum (in the ##z## direction; the double slit is located in a plane parallel to the ##xy## plane) run through a double slit leads to an unpredictable single dot on a screen placed far away from the double slit, also parallel to the ##xy## plane. Preparing very many such photons, all these single spots build up an interference pattern as expected from classical wave theory (Maxwell electrodynamics). This, however, is only true if the slits are indistinguishable for the photons, i.e., if you cannot know (even in principle) through which slit each single photon has gone. If the initial photon's polarization state is ##|\psi \rangle##, the photon distribution behind the slits is described by
$$|\psi' \rangle=N [1+\exp(\mathrm{i} \varphi(x))] |\psi \rangle, \quad \varphi(x) \simeq 2 \pi \frac{x d}{l \lambda},$$
where ##x## is the position on the screen, ##d## the distance between the slits, ##l## the distance between the slits and the screen, ##\lambda## the wavelength of the photon, and ##N## an appropriate normalization factor. This you can read in any elementary textbook on optics (it doesn't even need to be a quantum textbook, because the interference pattern is of course described by classical optics in this approximation).
According to Born's rule, the detection probability that builds up the interference pattern when a large number of photons is sent through the slits is in this case
$$P(x)=|N|^2 |1+\exp(\mathrm{i} \varphi(x))|^2=2|N|^2 [1+\cos \varphi(x)].$$
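To see what this formula gives for concrete numbers, here is a small sketch in plain Python. The slit distance, screen distance, and wavelength are made-up illustrative values (not taken from any particular experiment), and ##N=1## is chosen for convenience.

```python
import cmath
import math

def varphi(x, d, l, lam):
    """Relative phase between the two paths, varphi(x) = 2 pi x d / (l lam)."""
    return 2.0 * math.pi * x * d / (l * lam)

def detection_probability(x, d, l, lam, N=1.0):
    """P(x) = |N|^2 |1 + exp(i varphi(x))|^2 for unmarked slits."""
    amplitude = N * (1.0 + cmath.exp(1j * varphi(x, d, l, lam)))
    return abs(amplitude) ** 2

# Illustrative (assumed) geometry: 0.2 mm slit distance, 1 m to the screen, 700 nm light.
d, l, lam = 0.2e-3, 1.0, 700e-9
fringe_spacing = l * lam / d   # distance between neighbouring maxima

for k in range(5):
    x = k * fringe_spacing / 4.0
    closed_form = 2.0 * (1.0 + math.cos(varphi(x, d, l, lam)))
    print(f"x = {x * 1e3:6.3f} mm   P = {detection_probability(x, d, l, lam):.4f}"
          f"   2(1+cos) = {closed_form:.4f}")
```

The maxima sit at multiples of ##l \lambda/d## and the minima halfway in between, where the probability drops to zero.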
How can one gain such which-way information? The only way is to mark the slits somehow, and for this one can use quarter-wave plates. These are made of a birefringent crystal, which has different indices of refraction for photons polarized in two mutually perpendicular directions. The plate is made just thick enough that a photon polarized in the ##\phi## direction suffers a phase shift that differs by ##\pi/2## (a quarter of a full period) from the phase shift of a photon polarized perpendicular to it. That's why it's called a quarter-wave plate. Now we put into one of the slits a quarter-wave plate oriented such that ##\phi_1=+\pi/4## and into the other a plate oriented perpendicular to it, ##\phi_2=-\pi/4##. Again we prepare photons polarized in the ##\hat{x}## direction and do the double-slit experiment. Without the quarter-wave plates we'd find a pattern showing interference effects when looking at the result for a large ensemble of such photons. Now, with the wave plates, we mark the photons according to the slit they go through, because those running through slit 1 become left-circularly polarized and those running through slit 2 become right-circularly polarized. Thus you only need to measure whether a photon behind the slits is left- or right-circularly polarized. Even if you don't do this measurement but simply let the photons run through this "manipulated" double slit, the interference pattern is gone. It appears in the "non-manipulated" experiment because of the relative phase shift due to the different path lengths for a photon going through the one or the other slit: one superimposes the two probability amplitudes and takes the modulus squared. Since the photons coming through the one or the other slit are still both ##x## polarized, an interference term is present in the probability for the place at which the photons hit the screen.
With the quarter-wave plates put into the slits, a photon coming through slit 1 is in the polarization state ##|L \rangle## and one coming through slit 2 is in the polarization state ##|R \rangle##. Thus in this case the state of a photon hitting the screen at position ##x## is given by
$$|\psi' \rangle= \frac{N}{\sqrt{2}} [|L \rangle + \exp(\mathrm{i} \varphi(x)) |R \rangle].$$
Since now the two kets in this superposition are perpendicular to each other we find for the detection probability
$$P'(x)=|N|^2,$$
which is ##x## independent, i.e., the interference pattern is completely gone.
It's important to note that we don't need to take notice of the slit through which the photons have gone. It is only important that the photons now carry this information and that it is possible to gain this information completely by making a measurement. The reason why we can gain the complete which-way information in the setup with the quarter-wave plates is that the polarization state of a photon going through the one slit is exactly orthogonal to that of a photon going through the other, and that's decided by the very fact that we have placed the quarter-wave plates with a relative orientation of precisely ##\pi/2##. If you distort this relative orientation, you cannot gain complete which-way information; you can only say with a certain probability that the photon has gone through slit 1 or slit 2. Then you have only partial which-way information, and the interference pattern is still partially present (but with lower contrast). In this sense which-way information and the contrast of the interference pattern are complementary: the more completely the photons running through the double slit are prepared with the one, the less they show of the other.
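The trade-off between which-way information and contrast can be made quantitative with a few lines of Python. This is only a sketch: the two polarization kets behind the slits are passed in as amplitude pairs, and the "partially marked" state ##\cos \theta |R \rangle + \sin \theta |L \rangle## with ##\theta=\pi/6## is a hypothetical stand-in for imperfectly aligned plates, not a model of a specific misalignment.

```python
import cmath
import math

S = 1.0 / math.sqrt(2.0)
X = (1.0 + 0j, 0j)      # |x>
L = (S, 1j * S)         # |L> = (|x> + i|y>)/sqrt(2)
R = (S, -1j * S)        # |R> = (|x> - i|y>)/sqrt(2)

def pattern(chi1, chi2, varphi, N=1.0):
    """Squared norm of N/sqrt(2) * (chi1 + exp(i varphi) chi2), where chi1 and
    chi2 are the polarization kets behind slit 1 and slit 2."""
    amps = [N * S * (a + cmath.exp(1j * varphi) * b) for a, b in zip(chi1, chi2)]
    return sum(abs(c) ** 2 for c in amps)

def visibility(chi1, chi2, samples=2000):
    """Fringe contrast (max - min)/(max + min), sampled over one period."""
    values = [pattern(chi1, chi2, 2.0 * math.pi * k / samples) for k in range(samples)]
    return (max(values) - min(values)) / (max(values) + min(values))

theta = math.pi / 6     # hypothetical "imperfect marking" parameter
partly_marked = tuple(math.cos(theta) * r + math.sin(theta) * lc for r, lc in zip(R, L))

print(f"no marking (x, x):       V = {visibility(X, X):.3f}")
print(f"full marking (L, R):     V = {visibility(L, R):.3f}")
print(f"partial marking (L, mix): V = {visibility(L, partly_marked):.3f}")
```

In this simple model the contrast equals the modulus of the overlap of the two polarization states: it is 1 for identical states, 0 for orthogonal ones, and in between otherwise.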
Now comes the really difficult part :-), but it's difficult only in the sense that we are not used to quantum entanglement in everyday life. The formalism is pretty straightforward. The point is that nowadays quantum opticians can quite easily create photon pairs with entangled polarizations. This was not easy some decades ago, when A. Aspect made his groundbreaking experiment with such biphoton states, demonstrating the violation of Bell's inequality for the first time during his PhD work:
http://en.wikipedia.org/wiki/Alain_Aspect
The modern way of creating such biphotons is to shine laser light on certain types of birefringent crystals, leading to the emission of a pair of photons that are entangled with respect to their polarization states. A composite system's state vector is described by tensor products of the single-component state vectors and (very important!) by superpositions of such direct-product states. Again, we only note the polarization state of the photon pair and look at the case where "singlet states" are prepared, i.e.,
$$|\Psi \rangle = \frac{1}{\sqrt{2}} (|\hat{x} \rangle \otimes |\hat{y} \rangle - |\hat{y} \rangle \otimes |\hat{x} \rangle).$$
We have not noted the other degrees of freedom (e.g., position). The funny thing, however, is that the two photons can travel far apart from each other without the polarization state being disturbed, i.e., all the time from their creation on they are in this state, as long as nobody disturbs (measures!) their polarization. So it's possible to detect one photon at one place A ("Alice" measuring the photon), letting it run through a polarization filter to measure its polarization, and the other one at a far distant place B ("Bob" measuring the photon).
The first question now is: what would Alice and Bob find when measuring their photon's polarization? This is answered by "tracing out" the unmeasured degrees of freedom. Although the biphoton state is a pure state, the state of a part of such a composite system (here the polarization of, say, Alice's photon) is usually a mixture, and the corresponding statistical operator is given by
$$\hat{\rho}_A=\mathrm{Tr}_B |\Psi \rangle \langle \Psi|=\frac{1}{2} (|\hat{x} \rangle \langle \hat{x} |+|\hat{y} \rangle \langle \hat{y} |).$$
This means that before the measurement Alice does not know anything about her photon's polarization. It's a state of minimal knowledge (maximal entropy). What does that mean? Again, to make sense of all this probabilistic content of the state, one has to prepare an ensemble of equally prepared biphotons and repeat the experiment very often to check statistically whether the prediction of QT is correct. Here ##\hat{\rho}_A## simply says that an ensemble prepared in this way leads to totally unpolarized photons for Alice. The same holds for Bob's photons.
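The partial trace is easy to check by hand, but a tiny numerical sketch may help. Here the biphoton polarization state is stored as a ##2 \times 2## array of amplitudes `c[a][b]` (first index Alice, second index Bob, with 0 for ##x## and 1 for ##y##); this representation and the helper names are choices made for the illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Singlet (|x>|y> - |y>|x>)/sqrt(2): c[a][b] is the amplitude of |a> (Alice) x |b> (Bob).
SINGLET = [[0.0, S],
           [-S, 0.0]]

def reduced_alice(c):
    """rho_A = Tr_B |Psi><Psi|, i.e. (rho_A)_{ij} = sum_b c[i][b] * conj(c[j][b])."""
    return [[sum(complex(c[i][b]) * complex(c[j][b]).conjugate() for b in range(2))
             for j in range(2)] for i in range(2)]

def reduced_bob(c):
    """rho_B = Tr_A |Psi><Psi|, i.e. (rho_B)_{ij} = sum_a c[a][i] * conj(c[a][j])."""
    return [[sum(complex(c[a][i]) * complex(c[a][j]).conjugate() for a in range(2))
             for j in range(2)] for i in range(2)]

def purity(rho):
    """Tr(rho^2): 1 for a pure state, 1/2 for the maximally mixed qubit state."""
    return sum(rho[i][k] * rho[k][i] for i in range(2) for k in range(2)).real

rho_A = reduced_alice(SINGLET)
rho_B = reduced_bob(SINGLET)
print("rho_A =", rho_A)
print("rho_B =", rho_B)
print("purity of rho_A =", purity(rho_A))
```

Both reduced statistical operators come out as one half times the unit matrix, with purity 1/2, i.e., each photon taken by itself is completely unpolarized.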
But now comes the very "quantic" point about such entangled states! The trick is that Alice and Bob both measure the polarization of their photons in the ##x## direction and each note the outcome together with the time when their photon arrived. Because the photons were sent from a common place, the time stamps make sure that one compares the two photons that belong together from the very beginning, as they were prepared by parametric down-conversion in the entangled state ##|\Psi \rangle##. At the end of the experiment, Alice and Bob can meet and compare their results. Now they can ask the following: is there a correlation between the outcomes of Bob's and Alice's measurements of the polarization state of their photons? Quantum theory predicts the following: if Alice finds that her photon is polarized in the ##x## direction (which happens in 50% of all cases), then Bob's photon is described by the corresponding projection,
$$\hat{\rho}_{B|A \hat{x}} =\mathrm{Tr}_A[(|\hat{x} \rangle \langle \hat{x}| \otimes \hat{1}) |\Psi \rangle \langle \Psi|]=\frac{1}{2} |\hat{y} \rangle \langle \hat{y}|.$$
This says: if Alice has measured her photon to be ##x## polarized, Bob's photon must necessarily be ##y## polarized. If Alice finds her photon to be ##y## polarized, Bob's photon must be ##x## polarized, and each case happens in 50% of all measurements.
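These correlations can be read off from the joint probabilities ##|\langle \hat{\alpha} \otimes \hat{\beta}|\Psi \rangle|^2## for Alice's and Bob's polarizer angles ##\alpha## and ##\beta##. A minimal sketch, again with the amplitude-array representation `c[a][b]` as an assumed convention and restricted to linear polarizers (real analyzer amplitudes):

```python
import math

S = 1.0 / math.sqrt(2.0)
SINGLET = [[0.0, S], [-S, 0.0]]   # c[a][b]: Alice index a, Bob index b (0 = x, 1 = y)

def linear(angle):
    """Real amplitudes of the linear-polarization ket at `angle` to the x axis."""
    return (math.cos(angle), math.sin(angle))

def joint_probability(alpha, beta, c=SINGLET):
    """P(Alice passes polarizer alpha AND Bob passes polarizer beta)."""
    a, b = linear(alpha), linear(beta)
    amplitude = sum(a[i] * b[j] * c[i][j] for i in range(2) for j in range(2))
    return abs(amplitude) ** 2

def conditional_probability(beta, alpha, c=SINGLET):
    """P(Bob passes beta | Alice passed alpha); Alice's marginal is the sum
    over Bob's two orthogonal outcomes beta and beta + pi/2."""
    alice_passes = (joint_probability(alpha, beta, c)
                    + joint_probability(alpha, beta + math.pi / 2.0, c))
    return joint_probability(alpha, beta, c) / alice_passes

print("P(A: x, B: x)  =", round(joint_probability(0.0, 0.0), 6))
print("P(A: x, B: y)  =", round(joint_probability(0.0, math.pi / 2.0), 6))
print("P(B: y | A: x) =", round(conditional_probability(math.pi / 2.0, 0.0), 6))
```

For the singlet state the joint probability is ##\frac{1}{2} \sin^2(\alpha-\beta)##, so the perfect anticorrelation holds for any common polarizer direction, not only for ##x## and ##y##.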
Although each experimenter's ensemble of photons as a whole shows totally unpolarized photons, there is this 100% correlation between them. It also doesn't matter whether Alice or Bob measures their photon first or whether they do it at the same time. The outcome is always the same. The 100% correlation is thus not due to the measurement of one photon but, as is also clear from the above description, due to the common preparation of the two photons by parametric down-conversion. Also, neither Alice nor Bob can know about the correlation just from their own measurement; they have to compare their measurement protocols afterwards to find the correlation. Thus it's not possible to send any signal via the correlation by manipulating one of the photons and measuring the other: the entanglement does not admit a measurement protocol that would enable us to communicate with signals faster than the speed of light in vacuum. So everything is consistent with the relativistic space-time and causality framework. We mention in passing that such measurements can check Bell's inequality. You only have to measure Alice's photon polarization in another cleverly chosen direction relative to Bob's. But let's come back to the quantum eraser experiment now.
Now we use a pair of photons created by parametric down-conversion as described above. Alice's photon is sent through the double slit with the quarter-wave plates, again in the ##+\pi/4## and ##-\pi/4## orientations. First we have to check what the quarter-wave plates do to Alice's photon within the biphoton state. To that end we note that the unitary operators describing the action of the two quarter-wave plates on an arbitrary single-photon polarization state are given (up to a common overall phase) by
$$\hat{Q}_{+}=|L \rangle \langle \hat{x} | + \mathrm{i} |R \rangle \langle \hat{y}|, \quad \hat{Q}_{-}=|R \rangle \langle \hat{x} | - \mathrm{i} |L \rangle \langle \hat{y}|,$$
so that an ##x##-polarized photon becomes left-circularly polarized in slit 1 (##+##) and right-circularly polarized in slit 2 (##-##), as before. Thus, after Alice's photon has gone through slit 1 or slit 2, the corresponding biphoton states are given by
$$|\Psi_{+}' \rangle = (\hat{Q}_{+} \otimes \hat{1}) |\Psi \rangle= \frac{1}{\sqrt{2}} (|L \rangle \otimes |\hat{y} \rangle - \mathrm{i} |R \rangle \otimes |\hat{x} \rangle),$$
$$|\Psi_{-}' \rangle = (\hat{Q}_{-} \otimes \hat{1}) |\Psi \rangle= \frac{1}{\sqrt{2}} (|R \rangle \otimes |\hat{y} \rangle + \mathrm{i} |L \rangle \otimes |\hat{x} \rangle),$$
and the probability distribution of Alice's photons at the screen is given by the biphoton state
$$|\Psi_1' \rangle=N(|\Psi_+' \rangle+\exp(\mathrm{i} \varphi(x)) |\Psi_-' \rangle).$$
The interference pattern is gone again, because ##|\Psi_+' \rangle## and ##|\Psi_-' \rangle## are orthogonal to each other. Obviously we can gain which-way information with 100% accuracy by measuring both photons' polarization states. If Alice measures an ##L##-polarized photon and Bob an ##x##-polarized one, the photon must have gone through the ##-## slit, and analogously for the three other possible cases. One should note that we can put Bob so far away from Alice and the double slit that his measurement of the photon-polarization state cannot affect Alice's photon in any way.
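The orthogonality and the which-way assignment can be verified with a short computation. The sketch below assumes the wave-plate operators in the form ##\hat{Q}_+=|L \rangle \langle \hat{x}|+\mathrm{i} |R \rangle \langle \hat{y}|## and ##\hat{Q}_-=|R \rangle \langle \hat{x}|-\mathrm{i} |L \rangle \langle \hat{y}|##, i.e., the choice for which an ##x##-polarized photon leaves slit 1 left-circular and slit 2 right-circular, and it stores biphoton states as amplitude arrays `c[a][b]` (Alice index first, 0 for ##x##, 1 for ##y##).

```python
import cmath
import math

S = 1.0 / math.sqrt(2.0)
X, Y = (1.0, 0.0), (0.0, 1.0)
L, R = (S, 1j * S), (S, -1j * S)

# Matrices of Q_+ and Q_- in the (x, y) basis (assumed convention, see text above).
Q_PLUS = [[S, 1j * S], [1j * S, S]]
Q_MINUS = [[S, -1j * S], [-1j * S, S]]
SINGLET = [[0.0, S], [-S, 0.0]]

def apply_to_alice(Q, c):
    """(Q x 1)|Psi>: act with Q on Alice's (first) index only."""
    return [[sum(Q[i][k] * c[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def overlap(c1, c2):
    """Scalar product <Psi_1|Psi_2> of two biphoton states."""
    return sum(complex(c1[i][j]).conjugate() * c2[i][j]
               for i in range(2) for j in range(2))

def joint_probability(alice_ket, bob_ket, c):
    """P(Alice finds alice_ket AND Bob finds bob_ket) in the state c."""
    amp = sum(complex(alice_ket[i]).conjugate() * complex(bob_ket[j]).conjugate() * c[i][j]
              for i in range(2) for j in range(2))
    return abs(amp) ** 2

psi_plus = apply_to_alice(Q_PLUS, SINGLET)
psi_minus = apply_to_alice(Q_MINUS, SINGLET)

def screen_probability(varphi, N=1.0):
    """Squared norm of N(|Psi_+'> + exp(i varphi)|Psi_-'>): all of Alice's photons."""
    total = [[N * (psi_plus[i][j] + cmath.exp(1j * varphi) * psi_minus[i][j])
              for j in range(2)] for i in range(2)]
    return overlap(total, total).real

print("|<Psi_+'|Psi_-'>| =", abs(overlap(psi_plus, psi_minus)))
print("P(L, x) via slit +:", round(joint_probability(L, X, psi_plus), 6))
print("P(L, x) via slit -:", round(joint_probability(L, X, psi_minus), 6))
print("P(x) at a few phases:", [round(screen_probability(v), 6) for v in (0.0, 1.0, 2.0, 3.0)])
```

The pair of outcomes (Alice ##L##, Bob ##x##) never occurs for the ##+## slit, so finding it pins down the ##-## slit, and the total distribution on Alice's screen does not depend on ##\varphi(x)##.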
Now we alter the experimental setup only at Bob's place. We let him orient his polarization filter at an angle ##\alpha## relative to the ##x## axis and consider only the cases in which Bob's photon passes this filter, which happens in 50% of all cases (Bob's photons are of course totally unpolarized, just like Alice's, as discussed above). The corresponding sub-ensemble is then found by projecting out all other states. The projection operator for Bob's single-photon polarization state is given by
$$\hat{P}_{\alpha}=|\hat{\alpha} \rangle \langle \hat{\alpha}|, \quad |\hat{\alpha} \rangle=\cos \alpha |\hat{x} \rangle + \sin \alpha |\hat{y} \rangle.$$
Thus, setting ##\alpha=\pi/4##, the interference pattern for the sub-ensemble is given by the (unnormalized!) state ket
$$(\hat{1} \otimes \hat{P}_{\pi/4}) |\Psi_1' \rangle=\frac{N}{2} [1+\mathrm{i} \exp(\mathrm{i} \varphi(x))] (|L \rangle - \mathrm{i} |R \rangle) \otimes |\widehat{\pi/4} \rangle,$$
whose squared norm is ##|N|^2 [1-\sin \varphi(x)]##.
Now the interference pattern is found again at full contrast, but the overall brightness is reduced by a factor 1/2, because we have only looked at a sub-ensemble. Again we stress that we can make this interference pattern visible only after bringing Alice's and Bob's measurements together. To be able to identify the photons belonging to the same biphoton, we must make sure that Alice and Bob note the times at which they detect their photons (Alice measures the position of her photon hitting the screen, and Bob notes whether his photon has passed his polarization filter oriented at angle ##\pi/4## relative to the ##x## axis). Then, thanks to the 100% correlation encoded in the polarization-entangled photons, we can filter out all photons at Alice's screen whose partner photon at Bob's place was found to have the ##\pi/4## polarization, and this is done long after all photons are gone, using just Alice's and Bob's measurement protocols.
The total ensemble of Alice's photons does not make up an interference pattern in any case, because it's possible in principle to gain which-way information, but for this Bob would have to make a polarization measurement with his polarizer oriented exactly in the ##x## (or equivalently the ##y##) direction. Orienting Bob's polarizer in the direction ##\alpha=\pi/4## does not enable us to gain which-way information at all. The which-way information is completely gone for the sub-ensemble obtained in this way, i.e., again the appearance of an interference pattern at full contrast is only possible if Bob makes a measurement that prevents us from gaining any information about the way Alice's photon has taken through the double slit. The reappearance of the interference pattern is thus due to "post-selection", i.e., it is done long after all photons are gone. Thus it's indeed not Bob's measurement that causes the reappearance of the pattern. Rather, the reappearance through post-selecting the appropriate sub-ensemble is due to the original preparation of the photon pair in the entangled state at the very beginning, which describes the (long-range) correlations between the polarization states of the two photons in the biphoton state.
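The post-selection can be played through numerically. The following sketch assumes the same wave-plate convention as in the earlier check, ##\hat{Q}_+=|L \rangle \langle \hat{x}|+\mathrm{i} |R \rangle \langle \hat{y}|## and ##\hat{Q}_-=|R \rangle \langle \hat{x}|-\mathrm{i} |L \rangle \langle \hat{y}|##, with ##N=1## and biphoton amplitudes `c[a][b]` (Alice index first). With these phase conventions the fringes of the sub-ensembles come out as ##1 \mp \sin \varphi##, i.e., shifted by a quarter period relative to the unmarked pattern; what matters for the argument is only their contrast.

```python
import cmath
import math

S = 1.0 / math.sqrt(2.0)
Q_PLUS = [[S, 1j * S], [1j * S, S]]      # assumed matrix of Q_+ in the (x, y) basis
Q_MINUS = [[S, -1j * S], [-1j * S, S]]   # assumed matrix of Q_- in the (x, y) basis
SINGLET = [[0.0, S], [-S, 0.0]]

def apply_to_alice(Q, c):
    """(Q x 1)|Psi>: act with Q on Alice's (first) index only."""
    return [[sum(Q[i][k] * c[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

PSI_PLUS = apply_to_alice(Q_PLUS, SINGLET)
PSI_MINUS = apply_to_alice(Q_MINUS, SINGLET)

def postselected_pattern(alpha, varphi, N=1.0):
    """Probability that Alice's photon arrives at phase varphi AND Bob's photon
    passes a linear polarizer at angle alpha (squared norm of the projected ket)."""
    bob = (math.cos(alpha), math.sin(alpha))
    alice_ket = [sum(bob[j] * N * (PSI_PLUS[i][j] + cmath.exp(1j * varphi) * PSI_MINUS[i][j])
                     for j in range(2)) for i in range(2)]
    return sum(abs(c) ** 2 for c in alice_ket)

def contrast(alpha, samples=2000):
    """Fringe contrast (max - min)/(max + min), sampled over one period."""
    values = [postselected_pattern(alpha, 2.0 * math.pi * k / samples) for k in range(samples)]
    return (max(values) - min(values)) / (max(values) + min(values))

for name, alpha in (("x", 0.0), ("+pi/4", math.pi / 4.0), ("-pi/4", -math.pi / 4.0)):
    print(f"Bob's polarizer at {name:>5}: contrast = {contrast(alpha):.3f}")
```

The two complementary sub-ensembles ##\alpha=+\pi/4## and ##\alpha=-\pi/4## show fringes and anti-fringes that add up to the flat distribution of the total ensemble, while the sub-ensemble with Bob's polarizer along ##x## shows no fringes at all.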
Thus, no collapse assumption whatsoever is needed to describe the reappearance of the interference pattern through "erasing the which-way information" by Bob's ##\alpha=\pi/4## polarization measurement, and that measurement has no effect on Alice's photon whatsoever. Such an effect would indeed violate Einstein causality: one can choose the distances between the biphoton source, Alice's double-slit experiment, and Bob's polarization measurement such that Alice's photon hits the screen before Bob has measured the polarization state of his photon, and then the cause (collapse through Bob's measurement) would come after the effect (reappearance of Alice's interference pattern for the appropriate sub-ensemble). Thus there is not only no need for a collapse assumption, but this assumption would violate the very reason why physics is possible at all, namely the validity of causality!
That's why I stick to the minimal-ensemble interpretation, which is fully satisfactory from a physicist's point of view. Nature doesn't ask whether we like how she behaves or whether her behavior is consistent with our metaphysical or philosophical prejudices; she just behaves as she does. That's it. Case closed!