There are probably some errors in this, but all the important ideas are there and I don't have time to make sure it's exactly right. You might want to go over it and test each step yourself, since that's pretty informative.
Basically, you start with the Klein-Gordon equation (in units where c = \hbar = 1): (\partial^2 + m^2) \phi = 0
You get this equation by starting with the relativistic energy-momentum relation E^2 = p^2 + m^2 and replacing E with the energy operator and p with the momentum operator. Because this is second order in time, you can't reproduce the result that probability is conserved like with the Schrödinger equation (the conserved density it gives you isn't guaranteed to be positive, so it can't be read as a probability). So Dirac set out to find a first-order equation which would "square" to the Klein-Gordon equation, so that it would both be relativistic and preserve probability.
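To spell out the substitution step, here is a short sketch. It assumes the usual conventions E -> i\partial_t and p -> -i\nabla, and a metric signature (+,-,-,-) so that \partial^2 = \partial_t^2 - \nabla^2; none of these are stated explicitly above, so treat them as my assumptions.

```latex
\begin{align*}
E^2 &= p^2 + m^2 \\
(i\,\partial_t)^2\,\phi &= \left[(-i\nabla)^2 + m^2\right]\phi \\
-\partial_t^2\,\phi &= \left(-\nabla^2 + m^2\right)\phi \\
\left(\partial_t^2 - \nabla^2 + m^2\right)\phi &= \left(\partial^2 + m^2\right)\phi = 0
\end{align*}
```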
To state the problem mathematically, you have some operator D, such that:
D \psi = 0 and D^2 \psi = (\partial^2 + m^2) \psi = 0
So you try to write down a first order operator for D, and solve for coefficients:
D = (A\partial - Bm), D^2 = (A\partial - Bm)(A\partial - Bm) = A^2\partial^2 + m^2B^2 - m(AB + BA)\partial = \partial^2 + m^2
So A^2 and B^2 have to be one, and AB + BA has to be zero. That is impossible as long as A and B are ordinary (real or complex) numbers, because then AB + BA = 2AB, which only vanishes if A or B is zero. Dirac's great idea was that A and B can be matrices, and by finding a set of matrices with those properties you have derived the Dirac equation.
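If you want to test that matrices really can do this, here is a minimal sketch in plain Python. The choice of A and B (two of the Pauli matrices) and the helper names are mine, just for illustration. This only checks the schematic one-derivative version written above; the real Dirac equation in 3+1 dimensions has one A for each spacetime direction and ends up needing 4x4 matrices.

```python
# Hypothetical helpers for small square matrices stored as nested lists.

def matmul(X, Y):
    """Return the matrix product X * Y."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(X, Y):
    """Return the entrywise sum X + Y."""
    return [[x + y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]

IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# Assumed choice for illustration: Pauli matrices sigma_x and sigma_z.
A = [[0, 1], [1, 0]]
B = [[1, 0], [0, -1]]

A_squared = matmul(A, A)                              # should be the identity
B_squared = matmul(B, B)                              # should be the identity
anticommutator = matadd(matmul(A, B), matmul(B, A))   # AB + BA, should be zero

print("A^2 == 1:", A_squared == IDENTITY)
print("B^2 == 1:", B_squared == IDENTITY)
print("AB + BA == 0:", anticommutator == ZERO)
```

All three conditions that failed for ordinary numbers hold here, because AB and BA come out with opposite signs.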
That's roughly what the derivation is, and how it's linked to special relativity (because it's closely related to the Klein-Gordon equation). Antimatter is predicted from solutions of the Dirac equation, but I don't remember how to derive that. Maybe someone else can help you out.