Here is a simplified explanation of the different patterns in the DCQE experiment as performed by Kim et al., which I posted earlier somewhere on this forum. It is a bit long, but maybe it helps you understand (and I am too lazy to write a completely new description ;) ).
You can treat this experiment as if every photon originates from either point A or point B (following the pictures in the DCQE paper by Kim). The setting is as follows:
a) In entangled photon experiments each photon on its own behaves like incoherent light. This means there is no, or at least not much, correlation between the phases (\phi_1 and \phi_2) of the fields originating from points A and B at the same moment. For the sake of simplicity I will now treat the phases of these fields as completely random and the amplitudes as constant and equal.
b) The two-photon state has a well-defined phase. This means that the fields of both paths (signal and idler) which originate from the same point (A or B) have a fixed phase relationship. For the sake of simplicity I will now assume that the initial phase is the same for signal and idler.
Let's first consider what happens at the detector D0, which scans the x axis. Just like in a usual double slit experiment, you will not detect any photons where there is destructive interference and will detect a large number of photons where there is constructive interference. Each point P on the x axis which the detector scans corresponds to a certain path difference between the paths from A to P and from B to P, which can be expressed in terms of an additional phase difference \Delta \phi. So you will have constructive interference at a point if \Delta \phi +(\phi_2 -\phi_1) = 2 \pi (or an integer multiple of it). As \Delta \phi is a constant for each point, a detection at a certain point P means that the phase difference between the fields at points A and B had a fixed value \phi_2 -\phi_1 shortly before. So in fact, scanning the x axis means scanning the phase difference of the fields. As the fields change completely randomly and independently of each other, every phase difference will be realized and there will be no interference pattern at D0 alone.
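The averaging argument for D0 can be checked numerically. What follows is only a minimal sketch: it assumes unit amplitudes for both fields, so that the intensity at a point is 2 + 2 cos(\Delta \phi + \phi_2 - \phi_1), and it replaces the random phase difference by a uniform grid of values. The function names are my own and do not come from the Kim paper.

```python
import math

def intensity_d0(delta_phi, source_phase_diff):
    """Intensity of two unit-amplitude fields at a point P on the x axis.

    delta_phi: phase difference from the path difference A->P vs. B->P.
    source_phase_diff: phi_2 - phi_1 at the sources (assumed notation).
    """
    return 2.0 + 2.0 * math.cos(delta_phi + source_phase_diff)

def averaged_intensity_d0(delta_phi, n_phases=360):
    """Average the intensity over phi_2 - phi_1 spread evenly over [0, 2*pi)."""
    total = 0.0
    for k in range(n_phases):
        total += intensity_d0(delta_phi, 2.0 * math.pi * k / n_phases)
    return total / n_phases

# Scan a few positions of D0, each expressed through its delta_phi.
scan = [2.0 * math.pi * i / 8 for i in range(8)]

# With one fixed source phase difference there are fringes along the scan ...
fixed_phase = [intensity_d0(d, 0.0) for d in scan]

# ... but averaged over all source phase differences the scan is flat.
random_phase = [averaged_intensity_d0(d) for d in scan]
```

Here fixed_phase swings between dark and bright along the scan, while random_phase has the same value at every position, which is the "no pattern at D0 alone" statement.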
Now let's have a look at the other side. There are two detectors (D1 and D2) which both fields can reach, and two detectors (D3 and D4) which can each be reached by only one field. Let me explain D1 first. I will assume that the distances from A and B to each of the detectors are equal. Before the field originating from A reaches the detector, it crosses 2 beam splitters (no reflection) and a mirror. This influences the phase of the field. Assuming that the beam splitters are 50-50, each transmission changes the phase by \frac{\pi}{2} and each reflection changes the phase by \pi. So, summarizing, the phase of the field originating from A will be \phi_1+\pi (reflection at the mirror) + 2*\frac{\pi}{2} (transmission at the beam splitters). The field reaching D1 from point B is reflected twice and transmitted once, so its phase will be \phi_2 + 2*\pi+\frac{\pi}{2}, and the phase difference at the detector will be \Delta \phi=\phi_2+2*\pi+\frac{\pi}{2}-(\phi_1+\pi+2*\frac{\pi}{2})=\phi_2-\phi_1+\frac{\pi}{2}. Of course a detection again implies that there is no destructive interference, and most detections will occur if \phi_2-\phi_1+\frac{\pi}{2}=2 \pi (or an integer multiple of it).
Now let's check the other detector, D2. Here the field originating from A is reflected twice and transmitted once and the field originating from B is transmitted twice and reflected once, which leads to a phase relationship of \Delta \phi = \phi_2 + \pi + 2* \frac{\pi}{2}-(\phi_1+ \frac{\pi}{2}+2*\pi)=\phi_2-\phi_1-\frac{\pi}{2}. So the \Delta \phi at the two detectors are exactly \pi out of phase. This means that constructive interference at one detector for a certain phase difference automatically implies destructive interference at the other. So each detector selects a set of phase differences. Let me stress once again that the phases are completely random, so there will be no interference pattern on these detectors either. The detectors D3 and D4 are simpler. As there is only one field present at each of them, there will be no interference and the phase does not matter. The detections will be independent of \phi_1 and \phi_2.
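The phase bookkeeping for the two detectors reached by both fields can be written down as a short sketch. It uses only the rules assumed in the text (\frac{\pi}{2} per transmission, \pi per reflection, equal path lengths, unit amplitudes); the table of reflection and transmission counts and all names are illustrative choices of mine.

```python
import math

TRANSMISSION = math.pi / 2  # phase shift per beam splitter transmission (rule from the text)
REFLECTION = math.pi        # phase shift per reflection (rule from the text)

# (number of reflections, number of transmissions) from each source to each detector
PATHS = {
    "D1": {"A": (1, 2), "B": (2, 1)},
    "D2": {"A": (2, 1), "B": (1, 2)},
}

def accumulated_phase(n_reflections, n_transmissions):
    """Phase picked up on the way to the detector."""
    return n_reflections * REFLECTION + n_transmissions * TRANSMISSION

def detector_offset(detector):
    """Phase of field B relative to field A at the detector, beyond phi_2 - phi_1."""
    phase_a = accumulated_phase(*PATHS[detector]["A"])
    phase_b = accumulated_phase(*PATHS[detector]["B"])
    return phase_b - phase_a

def intensity(detector, source_phase_diff):
    """Intensity of the two unit-amplitude fields at D1 or D2."""
    return 2.0 + 2.0 * math.cos(source_phase_diff + detector_offset(detector))

offset_d1 = detector_offset("D1")  # +pi/2
offset_d2 = detector_offset("D2")  # -pi/2
```

Because the two offsets differ by \pi, intensity("D1", s) + intensity("D2", s) is the same for every source phase difference s: whatever one detector loses to destructive interference, the other one gains.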
Now we're almost done. We still have to consider the two-photon state, where the relative phases are not random anymore. As I stated before, a certain spot on the x axis of D0 corresponds to a certain phase difference \phi_2 - \phi_1. This very same phase difference will also correspond to a certain amount of constructive (or destructive) interference at D1 and consequently also (due to the different geometries concerning transmission and reflection mentioned above) to an equivalent amount of destructive (or constructive) interference at D2. So you will see this interference pattern in the coincidence counts of D0/D1 and D0/D2 due to the fixed phase difference of the two-photon state. Now it is also clear why there is no interference pattern if you have which-way information. If you have which-way information, there is just one field present at the idler detector, and it has a random phase. No interference pattern is present there which corresponds to the phase difference present at D0, and therefore no interference pattern can show up in the coincidence counts.
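The selection of phase differences by the coincidence counts can be illustrated with a toy calculation. This is a sketch of the simplified classical-field picture used in this post only, not of the full two-photon treatment: it assumes that for a given \phi_2 - \phi_1 the detection rates at D0, D1 and D2 are proportional to 1 + cos of the respective phase difference, that D3 sees a constant rate, and that the coincidence rate is the product of the two rates averaged over all source phase differences. All names are hypothetical, and the fringe contrast this toy model produces should not be read as a prediction for the real experiment.

```python
import math

N_PHASES = 360  # grid standing in for the random source phase difference

def rate_d0(delta_phi, s):
    """Relative detection rate at D0 at the position with geometric phase delta_phi."""
    return 1.0 + math.cos(delta_phi + s)

def rate_idler(detector, s):
    """Relative detection rate at an idler detector for source phase difference s."""
    if detector == "D1":
        return 1.0 + math.cos(s + math.pi / 2)
    if detector == "D2":
        return 1.0 + math.cos(s - math.pi / 2)
    return 1.0  # D3 or D4: only one field arrives, so the phase does not matter

def coincidence(delta_phi, detector):
    """Coincidence rate D0/detector, averaged over the source phase difference."""
    total = 0.0
    for k in range(N_PHASES):
        s = 2.0 * math.pi * k / N_PHASES
        total += rate_d0(delta_phi, s) * rate_idler(detector, s)
    return total / N_PHASES

scan = [2.0 * math.pi * i / 16 for i in range(16)]
d0_d1 = [coincidence(d, "D1") for d in scan]  # fringes
d0_d2 = [coincidence(d, "D2") for d in scan]  # fringes, shifted by pi
d0_d3 = [coincidence(d, "D3") for d in scan]  # flat
d0_sum = [a + b for a, b in zip(d0_d1, d0_d2)]  # flat again
```

In this sketch d0_d1 and d0_d2 vary along the scan with opposite sign, d0_d3 stays constant, and adding d0_d1 and d0_d2 gives back a flat distribution, in line with there being no pattern at D0 alone.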
I hope this simplified scheme shows why the choice between the interference pattern and the which-way information can be made after the signal photon has already been detected, why it does not depend on whether we have a look at the data or not, and why there are no problems with causality.