I don't know your background, but here is how I would explain it from the beginning. The presence of edge states for TIs/other QH systems is actually a very subtle thing. I have a considerable amount of background in this field (well for a beginning grad student I guess) but I still learn something new every time I read about the QH, FQH, QSHE, or 3D TIs. There are some great review articles written about TIs, one by Kane and Hassan, and it is always great to go back to the original papers. I also like Eduardo Fradkin's book a lot. Bernevig and Hughes also wrote a book just on TIs and TSCs, but I personally like Fradkin's style. Another thing about the QSHE is that it is very intimately related to the other topics I mentioned, so you should really read about the QHE and QAHE to really understand the subject. The original papers by Halperin and Laughlin among others are very nice.
As you probably know, in time reversal symmetric systems with an odd number of electrons you have Kramers degeneracy. For s=1/2 systems, time reversal can be written as isigma^y K where K is complex conjugation and T^2=-1. Time reversal identifies states at k and -k, so when you are at a time reversal symmetric point for some state |a>, then T|a> is also an eigenstate with the same energy as a. Time reversal also flips the spin.
TR polarization refers to looking at the polarization (the integral or the Berry connection) of one member of a Kramer's pair along a cycle. When doing this, you must separate the two halves of the brillouin zone as in equation 3.13 of the paper you are referring to and then relate them by TR. You then can use the relation of the states from time reversal to go to equation 3.14. The reason you need to separate the Brillouin zone is because if the state is topologically nontrivial, there will be a singularity. Equation 3.14 comes from the fact that you have initially picked a gauge. This is represented in the sewing matrix <u_n(-k)|T |u(k)>, which is antisymmetric. Note that for an antisymmetric 2*2 matrix, the pfaffian they refer to is just the 12 component. It is basically an off diagonal matrix of the phases for the states coming from the gauge choice
The log of this phase must be included in the berry connection since the berry connection is like the vector potential in momentum space and the Log of the pfaffian is basically some gauge transformation. You can think of the switch of partners as being like a magnetic monopole singularity. In this case you cannot globally define a vector potential because of the singularity at the origin. In the QSHE the state does not return to itself which is a singularity corresponding to a zero in the pfaffian. The way to overcome this is to define potentials in two sectors and relate them by some gauge transformation. This gauge transformation will be restricted because the wf must be single valued and lead to a topological invariant. For the QHE this is the hall conductance. But for the QSHE it is only modulo two since if there are inversions at two TR symmetric points, the state is trivial.
Now going back to your original question about edge states, the QSHE is actually just a copy of the Haldane model for each spin (the effective B field is from SO). For Haldane's model, you have broken TR and two Weyl fermions with different masses and an onsite potential M with opposite signs on the two sites as well as a staggered flux from NNN (t2). In Haldane's paper, there is a term in m=M-C t2 sin φ (forgot the constant) proportional to sigma^z which can be identified as a mass m in the relativistic energy. The two solutions for m are m=M-C t2 sin φ and m=M+Ct2sin φ. At M=C t2 sin φ you go through a transition where the gap closes and reopens. The hall conductance changes sign here. You can also go through the opposite transition (coming from -infinity) meaning that in between two values you have a nontrivial hall conductance which must mean you have broken TR. If you put the Haldane model next to an insulator with TR (or equivalently, the vacuum), it is like a domain wall where the hall conductance changes by -1. You can then solve the Dirac equation with a weyl spinor and mass proportional to sigma^z and a function of the length of the system. The mass changes sign at the domain wall and if you solve this equation, you see you get some state at the domain wall that decays exponentially into the bulk. You can see a linear dispersion for these states and that they traverse the gap. In the QSHE, the transition occurs from tuning SO. This means the spin up and spin down electrons have gaps of opposite sign. We get an odd number of pairs of edge states since even pairs can scatter and gap each other out. Also, spin is no longer a valid quantum number so the hall conductance is not quantized for this state.
This edge excitations in the QSHE are similar to the ones in the QHE and AQHE, but they go in different directions for different spins. Actually, the chiral Dirac fermion theory for the edges of the IQHE and QAHE are identical. For a physical picture of QHE edge states, think of Laughlin's flux threading argument for the cylinder periodic in y (with an edge at the end of the cylinder and the flux inserted adiabatically). The flux translates the system along ky (think Faraday's law) so you go along the pumping cycl .When you apply an E field, the Landau levels are no longer degenerate (perturbation theory).The bulk is not effected by the potential since the states are filled. However, the edge is affected by this potential and the Landau levels are “lifted”. Now these states near the Fermi energy can move at the Fermi velocity along the edge. This velocity is Ec/B, the drift velocity for e- in perpendicular E and B fields. There is also a hydrodynamical way to think of the edge states as deformations of an incompressible fluid. This causes the excitations of the edge, which are local changes in density.
Another more formal way to see the edge states is from Chern-Simons theory. You of course calculate the hall conductance from the current you obtain in this theory. However, when you look at the action on a system with a boundary, you must separate the two since you can’t just set the boundary term to zero. When you do this, you find both violate individually violate gauge invariance but cancel each other out. This means the surface term needs to be there.
So I hope this helps, please correct me if I misstated anything.