I can add to mfb's already nice answer.
So for the matrix element, you are literally calculating the process ij -> W + k l m (where i,j,k,l,m are quarks or gluons - this would contribute to part of the W+3jet process if you applied the jet algorithm to it).
At the level of the matrix element, it has poles/Infrared divergences when the k l m partons are emitted soft or collinear with respect to one of the other partons in the process. The easiest way to see this would be to calculate the matrix element, which is a function of invariants like p_i . p_j, p_k . p_l ... etc. (in principle there will be a minimal subset of invariants, for 2->2 it is the usual S,T,U parameters, for 2->3 there are 5 minimal invariants etc. so on).
The function for this matrix element will involve combinations like:
p_i . p_k / (p_l . p_m)... etc.
And the divergences are when the stuff in the denominator goes to zero. To see these as divergences in Delta R, (eta,phi), I think you should write the distance between the four-vectors in eta and phi.
If for this process we are considering ( 2 -> W +3 ), we require that these 3 final state partons are well separated in Delta R all possible combinations of these invariants A / (p_l . p_m) ... will kinematically not contain any divergences. As such, you can compute the process and have no problems.
If instead, two of your 3 final state partons were very close. In terms of jets, the two partons would be reconstructed as one. The final state would then be W+2 jets. The IR divergences of this 3 partons final state (2 jet final state) are then canceled by a virtual correction to the 2 parton final state (where these two partons are well separated and form two jets).
Let me summarise the main points for W+n jets. Firstly, the W+1 parton final state is part of the NLO QCD correction to W+0 parton production. If we require this 1 parton to be well separated from other partons (that is to say the incoming partons) it counts as a jet. If the parton were collinear/soft with respect to the incoming partons then I would need the IR divergences from the virtual correction to cancel it. This argument then holds for the calculation of W+2 partons... bla bla bla.
The cancelation of these divergences which is performed is a consequence of KLN-theorem. Notice here we are only talking about Infrared divergences (not UV divergences).
Does this help, or is something not so clear?
%%%%%
There are only a very few number of processes which have been computed at NNLO in such a way that they can be interfaced with a parton shower (NNLO+PS see for example
http://arxiv.org/abs/1309.0017 and subsequent publications). However, NLO+PS has been automated now. This is one of the most important experimental tools, since it combines NLO accuracy for some observables (total cross section etc.), but has the benefit that it includes parton shower effects (so lots of gluon/quark splittings resummed to all orders at collinear leading log accuracy), as well as many non-perturbative effects which have been modeled/fit to data.
Of course, as mfb says, sometimes it is appropriate to compare to a calculation which is truly fixed-order (no parton shower). For example, comparisons of top quark differential measurements at NNLO fixed order accuracy (which are available at NLO+PS production and decay accuracy).
####
Edit: finally, as an after thought. It's probably better to look at these divergences at the level of the squared matrix element. Collinear divergences factorise at the level of the matrix element squared (not at the amplitude level like soft divergences). So I think it's better to analyse them not at the level of a matrix element. Not totally sure here though.