Sorry, but HOW would you know it "will work fine" with the "classical model"? You haven't seen the full data, nor have you seen what kind of subtraction has to be made.
All the PDC source (as well as laser & thermal light) can produce is semi-classical phenomena. You need to read Glauber's derivation (see [4]) of his G functions (called "correlation" functions) and his detection model and see the assumptions being made. In particular, any detection is modelled as quantized EM field-atomic dipole interaction (which is Ok for his purpose) and expanded perturbatively (to 1st order for single detector, n-th order for n-detectors). The essential points for Glauber's n-point G coincidences are:
a) All terms with vacuum produced photons are dropped (that results in the normally ordered products of creation and annihilation EM field operators). This formal procedure corresponds to the operational procedure of subtracting the accidentals and unpaired singles. For a single detector that subtraction is a non-controversial local procedure and it is built into the design, i.e. the detector doesn't trigger if no external light is incident on its cathode. {This of course is only approximate, since the vacuum and the signal fields are superposed when they interact with electrons, so you can't subtract the vacuum part accurately from just knowing the average square amplitude of vacuum alone (which is 1/2 hv per EM mode on average) and the square amplitude of the superposition (the vector sum), i.e. knowing only V^2 and (V+S)^2 (V and S are vectors), you can't deduce what S^2, the pure signal intensity, is. Hence, the detector by its design effectively subtracts some average for all possible vectorial additions, which it gets slightly wrong on both sides -- the existence of subtractions which are too small results in (vacuum) shot noise, and of subtractions which are too large results in missing some of the signal (absolute efficiency less than 1; note that the conventional QE definition already includes background subtractions, so QE could be 1, but with a very large dark rate, see e.g.
QE=83% detector, with noise comparable to the signal photon counts). }
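The point about V^2 and (V+S)^2 not fixing S^2 can be illustrated with a minimal sketch. It assumes plain two-dimensional vectors with an unknown relative angle between V and S; the function names and the sample numbers are made up for illustration and are not taken from any detector model.

```python
import math

def superposed_intensity(v_amp, s_amp, phase):
    """Return |V + S|^2 for vectors of lengths v_amp and s_amp at relative angle phase."""
    return v_amp**2 + s_amp**2 + 2.0 * v_amp * s_amp * math.cos(phase)

def signal_amplitudes_consistent_with(v_amp, total_intensity, phases):
    """For each assumed relative angle, solve |V + S|^2 = total_intensity for |S| >= 0.

    The equation is s^2 + 2*v*cos(phase)*s + (v^2 - I) = 0; only the
    non-negative root is kept. Angles with no real solution are skipped.
    """
    solutions = []
    for phase in phases:
        b = 2.0 * v_amp * math.cos(phase)
        c = v_amp**2 - total_intensity
        disc = b * b - 4.0 * c
        if disc < 0.0:
            continue
        s_amp = (-b + math.sqrt(disc)) / 2.0
        if s_amp >= 0.0:
            solutions.append((phase, s_amp))
    return solutions

# Illustrative numbers only: V^2 = 1 and (V+S)^2 = 2 are both "known",
# yet the implied S^2 depends on the unknown relative angle.
results = signal_amplitudes_consistent_with(1.0, 2.0, [0.0, math.pi / 2, math.pi])
for phase, s_amp in results:
    print(f"angle={phase:.3f}  S^2={s_amp**2:.3f}")
```

The same pair of known quantities is compatible with a whole range of S^2 values, which is why any fixed subtraction is right only on average.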
But for multiple detectors, the vacuum removal procedure built into Glauber's "correlations" is non-local -- you cannot design a "Glauber" detector, not even in principle, which can subtract locally the accidental coincidences or unpaired singles. And these are the "coincidences" predicted by Glauber's Gn() functions -- they predict, by definition, the coincidences modified by the QO subtractions. All their nonlocality comes from a non-local formal operation (dropping of terms with absorptions of spacelike vacuum photons), or operationally, from inherently non-local subtractions (you need to gather data from all detectors and only then can you subtract accidentals and unpaired singles). Of course, when you add this same nonlocal subtraction procedure to semi-classical correlation predictions, they have no problem replicating Glauber's Gn() functions. The non-locality of Gn() "correlations" comes exclusively from non-local subtractions, and it has nothing to do with the particle-like indivisible photon (that assumption never entered Glauber's derivation; it is a metaphysical add-on, with no empirical consequences or a formal counterpart with such properties, grafted on top of the theory for its mnemonic and heuristic value).
b) Glauber's detector (the physical model behind the Gn() or g2=0, eq.8 of AJP paper) produces 1 count if and only if it absorbs the "whole photon" (which is a shorthand for the quantized EM field mode; e.g. an approximately localized single photon |Psi.1> is a superposition of vectors Ck(t)|1k>, where Ck(t) are complex functions of time and 4-vector k=(w,kx,ky,kz), and |1k> are Fock states in some base, which depend on k as well; note that the hv quantum for photon energy is an idealization applicable to an infinite plane wave). This absorption process in Glauber's perturbative derivation of Gn() is a purely dynamical process, i.e. the |Psi.1> is absorbed as an interaction of the EM field with an atomic dipole, all being purely local EM field-matter field dynamics treated perturbatively (in 2nd quantized formalism). Thus, to absorb the "full photon" the Glauber detector has to interact with all of the photon's field. There is no magic collapse of anything in this dynamics (von Neumann's classical-quantum boundary is moved one layer above the detector) -- the fields merely follow their local dynamics (as captured in the 2nd quantized perturbative approximation).
Now, what does g2=0 (where g2 is eq AJP.8) mean for the single photon incident field? It means that T and R are fields of that photon and the Glauber detector which absorbs and counts this photon, and to which g2() of eq (AJP.8) applies, must interact with the entire mode, which is |T> + |R>, which means the Glauber detector which counts 1 for this single photon is spread out to interact with/capture both T and R beams. By its definition this Glauber detector leaves EM vacuum as the result of the absorption, thus any second Glauber detector (which would have to be somewhere else, e.g. behind it in space, thus no signal EM field would reach it) would absorb and count 0, just the vacuum. That of course is the trivial kind of "anticorrelation" predicted by the so-called "quantum" g2=0 (g2 of eq AJP.8). There is no great mystery about it, it is simply a way to label T and R detectors as one Glauber detector for single photon |Psi.1>=|T>+|R> and then declare it has count 1 when one or both of DT and DR trigger and declare it has count 0 if neither DT nor DR triggers. It is a trivial prediction. You could do the same (as is often done) with photo-electron counts on a single detector: declare 1 when 1 or more photo-electrons are emitted, 0 if none is emitted. The only difference is that here you would have this G detector spread out to capture distant T and R packets (and you don't differentiate their photoelectrons regarding declared counts of the G detector).
The actual setup for the AJP experiment has two separate detectors. Neither of them is the Glauber detector for the single mode superposed of T and R vectors, since they don't interact with the "whole mode", but only with a part of it. The QED model of the detector (which is not Glauber's detector any more) for this situation doesn't predict g2=0 but g2>=1, the same as the semiclassical model does: each detector triggers on average half the time, independently of the other (within the gate G window when both T and R have the same intensity PDC pulse). Since each detector here gets just half of the "signal photon" field, the "missing" energy needed for its trigger is just the vacuum field which is superposed on the signal photon (on its field) at the beam splitter (cf. eq. AJP.10, the a_v operators). If you were to split each of T and R further into halves, and then these 1/4 beams into further halves, and so on for L levels, with N=2^L final beams and N detectors, the number of triggers for each G event would follow a Binomial distribution (provided the detection time is short enough that the signal field is roughly constant; otherwise you get a compound Binomial, which is super-Poissonian), i.e. the probability of exactly k detectors triggering is p(k,N)=C(N,k)*p^k*(1-p)^(N-k), where p=p0/N, and p0 is the probability of a trigger of a single detector capturing the whole incident photon (which may be defined as 1 for an ideal detector and an ideal 1 photon state). If N is large enough, you can approximate the Binomial distribution p(k,N) with the Poissonian distribution p(k)=a^k exp(-a)/k!, where a=N*p=p0, i.e. you get the same result as the Poissonian distribution of the photoelectrons (i.e. these N detectors behave as N electrons of a cathode of a single detector).
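The Binomial-to-Poissonian limit stated above can be checked numerically with a short sketch. It assumes exactly the model in the text (N=2^L independent detectors, each triggering with probability p=p0/N); the function names and the choice of L values are illustrative only.

```python
import math

def binomial_pmf(k, n, p):
    """p(k,N) = C(N,k) * p^k * (1-p)^(N-k): exactly k of n detectors trigger."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, a):
    """p(k) = a^k * exp(-a) / k!"""
    return a**k * math.exp(-a) / math.factorial(k)

def max_abs_difference(levels, p0=1.0, k_max=10):
    """Largest pointwise gap between Binomial(N, p0/N) and Poisson(p0), N = 2^levels."""
    n = 2**levels
    p = p0 / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, p0))
        for k in range(min(k_max, n) + 1)
    )

# Compare the two distributions as the beam is split into more and more parts.
for levels in (1, 4, 10):
    print(f"L={levels:2d}  N={2**levels:5d}  max|binomial - poisson| = "
          f"{max_abs_difference(levels):.2e}")
```

For L=1 (the two-detector case with p0=1) the Binomial gives probability 1/4 for both detectors triggering and 1/4 for neither, which is the independent-trigger picture described above; the gap to the Poissonian shrinks as L grows.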
In conclusion, there is no anticorrelation effect "discovered" by the AJP authors for the setup they had. They imagined it and faked the experiment to prove it (compare the AJP claim to the much more cautious claim of the Chiao & Kwiat preprint cited earlier). The g2 of their eq (8) which does have g2=0 applies to the trivial case of a single Glauber detector absorbing the whole EM field of the "single photon" |T>+|R>. No theory predicts nonclassical anticorrelation, much less anything non-local, for their setup with the two independent detectors. It is a pure fiction resulting from an operational misinterpretation of QED for optical phenomena in some circles of Quantum Opticians and QM popularizers.
Have you ever performed a dead-count measurement on a photodetector and looked at the energy spectrum from a dark count? This is a well-studied area and we do know how to distinguish between these and the actual count rate.
Engineering S/N enhancements, while certainly important, e.g. when dealing with TV signals, have no place in these types of experiments. In these experiments (be it this kind of beam splitter "collapse" or Bell inequality tests) the "impossibility" is essentially of enumerative kind, like a pigeonhole principle. Namely, if you can't violate classical inequalities on raw data, you can't possibly claim a violation on subtracted data since any such subtraction can be added to classical model, it is a perfectly classical extra operation.
The only way the "violation" is claimed is to adjust the data and then compare it to the original classical prediction that didn't perform the same subtractions on its predictions. That kind of term-of-art "violation" is the only kind that exists so far (the "loophole" euphemisms aside).
-- Ref
4. R. J. Glauber, "Optical coherence and photon statistics" in Quantum Optics and Electronics, ed. C. DeWitt-Morette, A. Blandin, and C. Cohen-Tannoudji (Gordon and Breach, New York, 1965), pp. 63–185.