This is a very interesting question, and primarily from the point of view of human psychology. More about that in a moment.
The original poster is correct, the only valid explanation for spin-orbit coupling comes from the Dirac equation. A full derivation of the term can be found for example in "Relativistic Wave Mechanics" by Corinaldesi and Strocchi, Chap VIII, Non-Relativistic Limit of the Dirac Equation. What's especially interesting is not so much that it's a relativistic effect (Of course it is! Electron spin is already a relativistic effect) but that it's an effect which is second order. The first order nonrelativistic approximation to the Dirac equation is the Schrodinger-Pauli equation. The energy from the spin-orbit term is a small correction to this, smaller than the first-order atomic binding energy by a factor of e2/ħc, the fine structure constant. (Which of course is why we call it that.)
Not only is it a relativistic effect, the derivation is quite mathematical. So people start saying to themselves, "Wait, isn't there a simpler approach? Some way we can understand this intuitively?" Well no, there isn't. It just has to be swallowed. But this is where the psychology comes in.
Of course, Physics is full of models, in which you simplify things by making an assumption. But to mean anything the assumption must be an approximation, i.e. it can't just be blatantly false, it must be valid in some limit. But sometimes we forget that, and make assumptions which are so intuitively appealing that they take on a life of their own, even though they contradict physics. Examples are the hole model of antiparticles, the vector model of angular momentum, and so on.
One of these ideas was in the first suggestion above, that an electron is in some ways like a classical extended object, a charged sphere, rotating on its axis perhaps. Another one, more common in the spin-orbit context, is to imagine that the electron stands still and the nucleus revolves about it. The nucleus then produces a B field which the electron spin reacts to, causing the spin-orbit term. Now if you know any physics at all, you know that it is not valid to take over the Dirac equation to a noninertial frame. Nevertheless, this idea is in many textbooks, and has been handed down from generation to generation of students because it is so intuitively appealing. Accompanying this is usually mention of an additional correction due to the Thomas precession.
[By the way, non-commutativity of Lorentz transformations is an equally pseudo way of deriving the Thomas precession, but that's another topic!]
Any such scheme (I can't call it a model) will result in an answer that's pretty close, maybe off by a factor of 2 or 3, but otherwise very satisfying. This reinforces the feeling that we now "understand" the effect much better. Why is this? Because by dimensionality arguments, almost anything that produces the right units can be at most off by a small numerical factor, even if what it's based on is complete horseradish.