Photons can only be understood by relativistic quantum field theory, and to get the possible field equations, you have to consider the symmetry group of Minkowski spacetime. In quantum theory this symmetry group is represented by unitary representations, and the most simple ones are the irreducible interpretations.
Considering those symmetry transformations that are smoothly connected with the identity, you have the proper orthocrhonous Poincare group, which can be built from the translations of space-time, rotations of space, and Lorentz boosts. The generators of these transformations are then representing the corresponding conserved quantities from Noether's theory (translations in time and space are related to energy and momentum, rotations to angular momentum, and the invariance under Lorentz boosts leads to the theorem that the center of energy of any closed system moves with constant velocity).
Now you can find the unitary irreducible representations of the Poincare group (as was first derived by Wigner in 1939). The upshot is that there are two major classes of such representations, which also admit a socalled local realization of the Poincare group and a local relativistic QFT, where "local" means that there are local observables and all interactions are local, i.e., the Hamilton density ##\mathcal{H}(x)## commutes with all other local observables at space-like separation of the arguments, i.e., ##[\mathcal{H}(x),\mathcal{A}(y)]=0## for all four-vectors ##(x-y)## being spacelike: These two major classes are characterized by the Casimir operator ##p_{\mu} p^{\mu}=m^2## (using natural units with ##\hbar=c=1##) with ##m^2>0## (leading to QFTs describing particles with non-zero invariant mass) or ##m^2## (leading to QFTs describing "particles" (or rather "quanta") with 0 invariant mass).
Since now ##p^0=E## is the energy of a particle you have ##E^2-\vec{p}^2=m^2##. For ##m=0## you have ##E=|\vec{p}|##, and the speed of these particles is ##|\vec{v}|=|\vec{p}|/E=1##, i.e., particles with zero invariant mass move with the speed of light with respect to any inertial frame of reference, which makes the different from the massive particles.
Only for massive particles, you can express ##\vec{p}## in terms of ##\vec{v}=\vec{p}/E## since only then ##|\vec{v}|<1## and only then you can write
$$E^2-\vec{p}^2=E^2(1-\vec{v}^2)=m^2 \; \Rightarrow \; E=\frac{m}{\sqrt{1-\vec{v}^2}}$$
and
$$\vec{p}=E \vec{v}=\frac{m \vec{v}}{\sqrt{1-\vec{v}^2}}.$$
That's why massive particles have always a speed less than the speed of light with respect to any inertial frame of reference, and that's why for massless particles you cannot express ##E## and ##\vec{p}## in terms of the speed, because it's always 1 (i.e., it's always moving with the speed of light).
It turns also out that in another respect it's not so trivial to derive the properties of massless particles from those of massive particles letting the mass go to zero: Besides having a momentum, quantum particles also have a spin. For massive relativistic particles the spin is pretty similar to the spin within non-relativstic quantum mechanics. It's only no longer commuting with ##\vec{p}##, and thus it becomes a bit more complicated to handle.
This drastically changes for massless particles. Massless particles indeed do not make sense in non-relativistic QM, because one can show that massless realizations of the Galilei group do not lead to a dynamical quantum description, which can be in any sense interpreted physically.
In relativisic QFT it makes, however, sense to have massless particles, but they behave a bit different than you naively think. If the spin of a massless particle is ##\geq 1##, and photons are described, by quantum fields leading to massless particles, with spin 1 (i.e., massless vector fields). These, are however special compared to the analogous case of massive vector fields.
First of all it turns out that there are only 2 instead of three spin-degrees of freedom, i.e., electromagnetic fields and their quanta, the photons, have only 2 and not three polarization degrees of freedom, which can be characterized as the total angular momentum component in direction of their momenta, i.e., the helicities, which can only take the values ##\pm 1## but not the value ##0## (as is the case for massive particles). Also for massless particles the helicity is an invariant under arbitrary Lorentz boosts, while for massive particles you can flip the sign of the helicity by transforming from one inertial reference frame to another. This is, because you can always move faster than the massive particle in the original frame, and in the corresponding more frame the helicity looks flipped in sign. This is impossible for massless particles, which move always with the speed of light, and thus you cannot overtake them with any other inertial reference frame, i.e., the helicities for massless particles don't change under Poincare transformations.
Finally, in this approach to relativistic QT via symmetry principles, you have to construct the position observable out of this formalism. As it turns out, this is no problem for massive particles (it's also no problem in non-relativistic QT to derive the position operators, which is no surprise, since in non-relativistic QT the particles must necessarily have a mass, as stated already above), but for massless particles with a spin ##\geq 1##, you cannot find such a position observable, and this implies that it doesn't make any sense at all to think about photons as simple massless classical point-particles, because photons by construction cannot in any clear physical way be localized at all.