It was based on his desire to make even accelerating frames "relative" - because of that he developed (or expanded on) the equivalence principle. If that principle is correct, then the same phenomena must be observed in a homogeneous gravitational field (although that hardly exists) as in a constantly accelerated frame. I'm pretty sure that he clearly described that in 1911, but now I can't find that paper so I must tell this from memory (sorry).
So, light that is sent in the direction of acceleration (thus, "up" inside an accelerating rocket in space) is observed to be Doppler shifted (red shifted) when it is received at the "top".
From that he predicted that similarly, light that is emitted from lower to higher gravitational potential will also be red shifted.
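To make that step concrete, here is a first-order sketch of the argument. The symbols are my own choice, not necessarily Einstein's notation: $h$ is the height between emitter and receiver, $a$ the rocket's acceleration, $g$ the gravitational acceleration, and $\Delta\Phi = g h$ the potential difference. It assumes $v \ll c$ and neglects the change in distance during the light's transit.

```latex
\begin{align}
t &\approx \frac{h}{c} && \text{(light travel time over the height } h\text{)} \\
v &\approx a\,t = \frac{a h}{c} && \text{(speed gained by the receiver meanwhile)} \\
\frac{\Delta\nu}{\nu} &\approx -\frac{v}{c} = -\frac{a h}{c^2} && \text{(first-order Doppler redshift)} \\
\frac{\Delta\nu}{\nu} &\approx -\frac{g h}{c^2} = -\frac{\Delta\Phi}{c^2} && \text{(equivalence principle, } a \to g\text{)}
\end{align}
```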
Relying, as he did for SR, on the wave model of light, he reasoned that the number of wave crests in transit cannot change (conservation of cycles).
For example, if a 1 kHz radio wave is emitted for 1 second, 1000 cycles have been sent, and, at least in vacuum, it is not possible that fewer than 1000 cycles arrive. Cycles are also conserved in an accelerating rocket: the 1000 cycles simply arrive over an interval of more than 1 s.
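As a rough numerical illustration of the 1 kHz example, the sketch below uses the first-order fractional shift a·h/c² for a rocket. The acceleration (9.81 m/s²), the height (1000 m), and the function names are my assumptions for illustration only. None of them come from Einstein's paper.

```python
C = 299_792_458.0  # speed of light in m/s (exact by definition)


def first_order_shift(accel, height):
    """Fractional frequency shift a*h/c^2, valid only to first order in v/c."""
    return accel * height / C**2


def received_signal(freq_hz, duration_s, accel, height):
    """Return (cycles, extra reception time in s, received frequency in Hz)."""
    cycles = freq_hz * duration_s            # cycles sent; none are lost in transit
    shift = first_order_shift(accel, height)
    extra_time = duration_s * shift          # reception lasts this much longer
    received_freq = freq_hz * (1.0 - shift)  # same cycles over a longer interval (first order)
    return cycles, extra_time, received_freq


# Assumed values: 1 kHz sent for 1 s, a = 9.81 m/s^2, h = 1000 m
cycles, extra_time, received_freq = received_signal(1000.0, 1.0, 9.81, 1000.0)
print(cycles, extra_time, received_freq)
```

The number of cycles stays at 1000. Only the interval over which they arrive, and hence the received frequency, changes, and for these values the effect is of order 10⁻¹³.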
Apparently (I don't recall how he motivated it), he held that in the equivalent case in a gravitational field, the emitter and the receiver are not magically accelerating without energy input (disclaimer: that's just one of several arguments that I can come up with). With a Doppler shift from relative motion ruled out, the only remaining possibility was that a clock second is longer near heavy masses.
Thus he predicted not only redshift but also gravitational time dilation.