Argh, one more book I thought useful that one can no longer recommend. The irony is that the work Einstein got his Nobel Prize for (the photoelectric effect) is the only part of his work that is completely obsolete today: it belongs to what is known as "the old quantum theory", with all its mysterious notions like wave-particle dualism etc. His (classical) work on relativity and statistical physics is, of course, still valid today. It's amusing that the official letter informing Einstein of the prize explicitly stated that it was not (!) awarded for relativity. This was due to a philosophical dispute dominated by the then-famous philosopher Henri Bergson, who thought that the relativistic notion of time was flawed ;-)).
Today we exclusively use "modern quantum theory", which was discovered almost simultaneously (1925-1926) in three formulations: "matrix mechanics" (suggested by Heisenberg, worked out by Born, Jordan, and Heisenberg in 1925), "wave mechanics" (Schrödinger 1926), and "transformation theory" (Dirac 1926). As early as 1926, Born and Jordan also had the idea of field quantization, applied to the electromagnetic field, but this was overlooked for some time and had to be rediscovered by Dirac in 1927.
Unfortunately, the introductory chapters of textbooks still use the old wave-particle dualism and the wrong picture of photons as "light particles" from the old quantum theory, which regularly leads to confusion. The problem with this approach is that you have to unlearn these claims again when you go on to study the now-established modern quantum theory.
Photons are the least particle-like entities of the entire business. The reason is that they must be described as massless spin-1 fields within relativistic quantum field theory. This implies that they do not even admit the definition of a proper position operator, i.e., from very basic principles it makes no sense to talk about photons as "point particles". Thus photons are not only non-localizable; one cannot even define what "localizable" should mean for them in a strict sense.
A much better intuitive picture is to think of photons in terms of electromagnetic waves, familiar from classical electrodynamics since the work of Maxwell and Hertz in the 19th century. The only "particle-like aspect" is that a single photon must be understood as a specific state of the quantized electromagnetic field in which exactly one quantum of this field is present. This implies that at a given place and time you can detect only the "entire photon" or "nothing", where the place is determined by the location of the photon detector (e.g., a CCD camera or a photographic plate). From the fundamental theory of the interaction of the electromagnetic field with charged particles you find that the detection probability at a given place and time is proportional to the energy density of the electromagnetic field, and that's all you can know about the photon (plus the probability for a given polarization state, if this is also measured).
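The statement that the detection probability is proportional to the energy density, u = (ε0/2)(E² + c²B²) in SI units, can be illustrated numerically. The following Python sketch is only an illustration under strong simplifying assumptions that are not part of the argument above: an idealized two-slit setup with a scalar, monochromatic wave of equal amplitude from each slit, ignoring polarization and the 1/r fall-off. For such a field the time-averaged energy density is (approximately) proportional to |E|², so normalizing |E|² over a row of detector positions gives the relative probability that the single photon is registered at each of them. The function name detection_probabilities and all parameter values are hypothetical.

```python
import cmath
import math


def detection_probabilities(xs, wavelength, slit_sep, screen_dist):
    """Hypothetical helper: relative photon-detection probabilities at the
    detector positions xs for an idealized two-slit setup.

    Assumptions: scalar monochromatic wave, equal unit amplitudes from both
    slits, no 1/r fall-off, no polarization.
    """
    k = 2 * math.pi / wavelength  # wave number
    weights = []
    for x in xs:
        # Path lengths from the slits at +slit_sep/2 and -slit_sep/2
        # to the detector position x on a screen at distance screen_dist.
        r1 = math.hypot(screen_dist, x - slit_sep / 2)
        r2 = math.hypot(screen_dist, x + slit_sep / 2)
        # Superpose the two partial waves (complex amplitudes).
        field = cmath.exp(1j * k * r1) + cmath.exp(1j * k * r2)
        # Time-averaged energy density of a monochromatic field ~ |E|^2.
        weights.append(abs(field) ** 2)
    total = sum(weights)
    # Normalize so the probabilities over all detector positions sum to 1.
    return [w / total for w in weights]


if __name__ == "__main__":
    # Detector positions from -5 mm to +5 mm in steps of 0.1 mm.
    xs = [i * 1e-4 for i in range(-50, 51)]
    probs = detection_probabilities(xs, wavelength=500e-9,
                                    slit_sep=1e-4, screen_dist=1.0)
    for x, p in zip(xs[::10], probs[::10]):
        print(f"x = {x * 1e3:+.1f} mm   P = {p:.4f}")
```

With these example numbers (500 nm wavelength, 0.1 mm slit separation, screen 1 m away) the probabilities show the familiar fringe pattern, with dark fringes near x = ±λL/(2d) = ±2.5 mm. Each individual photon is still registered as a whole at exactly one position; the pattern only emerges from many detections.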