The four-momentum transformations are exact and don't depend on distance, so the formula derived from them should be considered exact. Why would it be identical to Einstein's formula if Einstein's formula was just an approximation?

No, actually the formula printed in Einstein's paper is identical to the exact formula and it is a far-field approximation of another formula.
But semantics aside, I get what you mean.

However, this indicates that Einstein's Doppler model is inaccurate. Einstein's model uses spherical Doppler wavefronts, why is this model inaccurate?

I understand that he uses plane waves since this is the far-field approximation.

But, if Einstein took all the spherical wavefronts into account and came up with an exact formula based upon that, that formula would differ from the printed formula, which means it would differ from the formula derived from the four-momentum transformations.

So if we consider the Doppler model exact, or if we consider the four-momenta exact, we arrive at different results. Which should not be. Maybe there's a catch somewhere, but I don't know where it could lie.

The 4 momentum is for a 'particle of light'. That is, the geometric optics approximation. It is exact for a classical massless particle, and not exact for classical EM radiation.

Then I don't understand your question. If you understand that he is approximating a spherical wave as a plane wave then what are you confused about? With that approximation Wikipedia and Einstein match.

But he didn't, so this supposition it isn't relevant.

The spherical Doppler is approximately a plane wave (per Einstein's approximation), which is exactly the four momentum.

Does this mean that the further we are from the Doppler source, the more the EM radiation behaves like classical massless particles, and the closer the 2-index tensor approximates the 4-vector?

It might help to focus on the angle in that formula. And I think Einstein's wording leaves something to be desired about what the angle represents.

Here's how I'd put it:

A source emits an idealized unidirectional light wave. Think of it as a laser pulse or a photon if you'd like, or as a wave front sufficiently far away from the source to be regarded as unidirectional (that's what Einstein is getting at). In the source's rest frame, the light wave has frequency ##f##. Now consider an observer moving with some arbitrary constant velocity ##\vec \beta = \vec v / c## relative to the source. Say that ##\vec \beta## forms an angle ##\theta## with the direction of propagation of the emitted light wave (as measured in the source's rest frame). Einstein's Doppler formula tells us that in the moving observer's rest frame, the light wave has frequency

where ##\beta = |\vec \beta|## and ##\gamma = (1 - \beta^2)^{-1/2}##. Also note that ##\beta \cos \theta## is simply the component of ##\vec \beta## that's parallel to the direction of propagation of the light wave (as measured in the source's rest frame).

If the light wave were spherical, the formula would still work, but the ##\theta## would need to be updated for each "part" of the wave.

No. It only means that for one particular purpose, calculating the Doppler shift of the radiation, using a 4-vector is a good enough approximation far away from the source. There are still aspects of the radiation that are not captured in this approximation, and which can be probed by other measurements besides those that measure the Doppler shift. (For example, you can't explain the signal that comes through your radio antenna by describing the incoming radiation as a 4-vector, even if you are far away from the source.)

Doesn't the four-momentum derivation get us precisely the same formula, with the same domain of applicability?

So we start with the general four-momentum:

##\vec P = (E, \vec p c)##.

Since it's a four-vector, its components obey the Lorentz transformation. Specifically (with standard configuration):

##E^\prime = \gamma(E - \beta \, p_x c)##.

Too lazy to do the trigonometry right now (maybe someone else will?), so let's just simplify this and say that all the momentum is in the x-direction: ##p = p_x##. Now the Lorentz transformation for energy can be written:

##E^\prime = \gamma(E - \beta \, pc)##.

For our idealized unidirectional monochromatic light wave, classical EM gives us ##E = pc##. So:

It is irrelevant because his equations and claims are for plane waves, not spherical waves, and the Wikipedia derivation is also for plane waves. Spherical waves are not considered in either source, so they are not relevant to the comparison of the two sources. It is like reading two travel books on Singapore, getting confused, and then trying to resolve it by asking a question about Hong Kong. Yes, your plane may have flown through Hong Kong to get Singapore, but once there you expect every book to show the same streets.

The four momentum is for a plane wave. Einstein points out that the field far from a source is approximately a plane wave. Einstein uses the plane wave approximation to derive a formula for plane waves which is identical to the plane wave formula used by Wikipedia in another way. Then you are surprised by the fact that the Einstein plane wave equation matches the Wikipedia plane wave equation. This should not be surprising.

Here seems to be a lot of unnecessary confusion. Of course, electrodynamics can be written in manifest covariant form. It's a Poincare invariant local classical field theory and as such can be formulated with Minkowski tensors (I restrict myself to special relativity here). To see this, we start with the usual non-covariant formalism (although it can be made also covariant, using the Silberstein-vector representation, but that's an advanced topic) 1+3-D formalism in a fixed inertial reference. I use Heaviside-Lorentz units, which are the most convenient and natural ones in the context of relativistic electromagnetics. The homogeneous Maxwell equations read
$$\vec{\nabla} \cdot \vec{B}=0, \quad \vec{\nabla} \times \vec{E}+\frac{1}{c} \partial_t \vec{B}=0.$$
From the first equation we see that (at least locally) there exists a vector potential, ##\vec{A}## such that
$$\vec{B}=\vec{\nabla} \times \vec{A}.$$
Plugging this in the 2nd equation gives
$$\vec{\nabla} \times \left ( \vec{E}+\frac{1}{c} \partial_t \vec{A} \right)=0,$$
and thus the field in the parenthesis is (at least locally) a gradient field
$$\vec{E}=-\vec{\nabla} A^0 - \frac{1}{c} \partial_t \vec{A}.$$
Now I've already written ##A^0## for the scalar potential, which is suggestive for putting ##(A^0,\vec{A})## together as components of a Minkowski four-vector field, ##A^{\mu}##. Now we have to check whether the inhomogeneous Maxwell equations are compatible with that.

We first note that the charge density and current density can be put into a four-vector field,
$$(j^{\mu})=(c \rho,\vec{j}).$$
Indeed we can write for a charged-particle flow ##\vec{v}##,
$$(j^{\mu})= \rho (c,\vec{v})=\frac{\rho}{\gamma} c (\gamma,\gamma \vec{\beta})=\rho_0 c (u^{\mu}),$$
where
$$\vec{\beta}=\frac{\vec{v}}{c}, \quad \gamma=\frac{1}{\sqrt{1-\vec{\beta}^2}}.$$
Now ##\rho_0## is the density in the (local) restframe of the fluid cell and as such a scalar, while the four-velocity
$$(u^{\mu})=\gamma (1,\vec{\beta})$$
is a four-vector field, and thus ##(j^{\mu})## is indeed a four-vector field.

Now let's see what the inhomogeneous Maxwell equations tell. In (1+3) form they read
$$\vec{\nabla} \cdot \vec{E}=\rho=\frac{1}{c} j^0, \quad -\frac{1}{c} \partial_t \vec{E}+\vec{\nabla} \times \vec{B}=\frac{1}{c} \vec{j}.$$
To see that these are in fact Poincare-covariant equations, we plug in the potentials and use ##x^0=ct##,
$$\vec{\nabla} \cdot (\vec{\nabla} A^0+\partial_0 \vec{A} )=(\Delta A^0+\partial_0 \partial_i A^i)=-\frac{1}{c} j^0.$$
Now
$$\Delta = \sum_i \partial_i \partial_i=-\partial^i \partial_i,$$
and thus
$$(\partial_i (\partial^i A^0-\partial^0 A^i)=+\frac{1}{c} j^0.$$
This suggests to introduce the antisymmetric Faraday tensor,
$$F^{\mu \nu}=\partial^{\mu} A^{\nu}-\partial^{\nu} A^{\mu} \; \Rightarrow \; \partial_i (\partial^i A^0-\partial^0 A^i) = \partial_{\mu} F^{0 \mu}=\frac{1}{c} j^0. \qquad (*)$$
Now it's clear that we must get also the Maxwell-Ampere law in terms of ##F^{\mu \nu}##. Indeed plugging in the potentials gives
$$-\partial_0 (-\partial_0 \vec{A}-\vec{\nabla} A^0)+\vec{\nabla} \times (\vec{\nabla} \times \vec{A})=\frac{1}{c} \vec{j}.$$
Using
$$\vec{\nabla} \times(\vec{\nabla} \times \vec{A})=\vec{\nabla} (\vec{\nabla} \cdot \vec{A})-\Delta \vec{A},$$
written in terms of components we get
$$\partial_0 (\partial^0 A^i - \partial^i A^0) + \partial_j (-\partial^i A^j+\partial^{j}A^i)=\frac{1}{c} j^i$$
or with the Faraday-tensor components
$$\partial_0 F^{0i}+\partial_j F^{ji}=\partial_{\mu} F^{\mu i} =\frac{1}{c} j^i.$$
Together with (*) we thus have also the inhomoegenous Maxwell equations covariant form
$$\partial_{\mu} F^{\mu \nu}=\Box A^{\nu} -\partial^{\nu} \partial_{\mu} A^{\mu}=\frac{1}{c} j^{\nu} \quad \text{with} \quad \Box = \partial_{\mu} \partial^{\mu}.$$
You can use gauge invariance to simplify the equation for the four-vector potential by imposing the Lorenz gauge condition,
$$\partial_{\mu} A^{\mu}=0,$$
leading to
$$\Box A^{\mu}=\frac{1}{c} j^{\mu}.$$
For a free field, ##j^{\mu}=0##, the plane-wave solution reads
$$A^{\mu}=a^{\mu} \exp(-\mathrm{i} k \cdot x)$$,
where ##a^{\mu}=\text{const}## and ##k=(k^{\mu})=\text{const}##. From ##\Box A^{\mu}=0## one finds
$$k_{\mu} k^{\mu}=0$$
and from the Lorentz-gauge condition
$$k_{\mu} a^{\mu}=k \cdot a=0.$$
The gauge is still not completely fixed, because you can change to another vector potential by
$$A_{\mu}'=A_{\mu} +\partial_{\mu} \chi$$
with an arbitrary scalar field ##\chi## which obeys the wave equation ##\Box \chi=0## without violating the Lorenz-gauge condition. This can be used to arbitrarily set ##a^0=0## ("radiation gauge"), and you are left with two independent components for ##a^{\mu}=(0,\vec{a})## subject to the transversality condition ##\vec{k} \cdot \vec{a}=0##.

Now as any other vector field ##A^{\mu}## tansforms under LT's as
$$\bar{A}^{\mu}(\bar{x})={\Lambda^{\mu}}_{\rho} A^{\rho}(x).$$
Thus we have
$$\bar{k}=\hat{\Lambda} k,$$
which includes the Doppler and aberration effect in compact form.