TIL about SIGSALY.
https://en.wikipedia.org/wiki/SIGSALY
This was a voice encryption system used in WWII for conversations between the highest levels of the US and its Allies.
It follows the method that was familiar to me (and in common use today, I think): digitize the voice, then modify those values with a pseudo-random number generator before transmitting. On the receiving end, the complementary arithmetic is applied with an identical pseudo-random number generator.
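A minimal sketch of that add-then-subtract idea, assuming each digitized sample takes one of 6 values and that "modify" means wrap-around (modular) addition. The sample and key values are made up for illustration.

```python
LEVELS = 6  # assumption: each digitized sample takes one of 6 values (0..5)

def encrypt(samples, key):
    """Add the key value to each sample, wrapping around at LEVELS."""
    return [(s + k) % LEVELS for s, k in zip(samples, key)]

def decrypt(cipher, key):
    """Subtract the same key values to undo the addition."""
    return [(c - k) % LEVELS for c, k in zip(cipher, key)]

voice = [0, 3, 5, 2, 1, 4]  # made-up quantized voice samples
key = [4, 4, 1, 5, 0, 2]    # made-up key values, identical at both ends

sent = encrypt(voice, key)
received = decrypt(sent, key)

print(sent)               # [4, 1, 0, 1, 1, 0]
print(received == voice)  # True
```

Without the key, the transmitted values say nothing useful about the voice samples, as long as the key values are random and never reused.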
Today, digitizing speech is easy, cheap and common at 8 to 16 bits of precision.
Today, creating a pseudo-random number stream is easy, cheap and common. Start both generators (a few lines of code?) with the same 'key', and you will get an identical, random-appearing number stream.
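The "few lines of code" could look something like this. It is only a sketch: the `keystream` helper and the key string are made up, and Python's `random` module (a Mersenne Twister) is fine for a demo but is not suitable for real cryptography.

```python
import random

def keystream(key, count, levels=6):
    """Return `count` pseudo-random values in 0..levels-1, seeded by `key`."""
    rng = random.Random(key)  # same seed gives the same sequence
    return [rng.randrange(levels) for _ in range(count)]

# Both ends start their generator with the same (hypothetical) key.
sender = keystream("shared-secret", 10)
receiver = keystream("shared-secret", 10)

print(sender == receiver)  # True
```

The whole scheme rests on keeping the key secret, since anyone who has it can regenerate the same stream.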
But in those days, neither was so easy.
To digitize the voice while limiting the amount of data required, they used a 10-band vocoder with each band quantized to 6 values (not 6 bits!). They also measured the frequency (pitch) of the voice (6 coarse and 6 fine steps, 36 values) and set a bit for 'voiced' or 'unvoiced' (a pitch with harmonics was used on the receiver to simulate a voice, white noise to simulate consonants like "s", "t", etc.). All of this was sampled every 20 msec, i.e. 50 times per second. If I did this right, that is about 32 bits per frame, or a data rate of roughly 1,600 bits per second. Take that, mp3! Though I wonder what bit rate mp3 would need for that level of fidelity (to be fair, mp3 is not optimized for speech).
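Here is a back-of-envelope check of the data rate, counting a 6-value channel as log2(6) bits rather than 6 bits. The frame rate is an assumption: 50 frames per second (one every 20 msec), which is the figure usually quoted for SIGSALY. Treat the result as a rough cross-check of the information content, not an authoritative number.

```python
import math

BANDS = 10              # vocoder bands, from the description above
LEVELS_PER_BAND = 6     # each band quantized to 6 values
PITCH_VALUES = 36       # 6 coarse x 6 fine
VOICED_BITS = 1         # voiced / unvoiced flag
FRAMES_PER_SECOND = 50  # assumption: one frame every 20 msec

bits_per_frame = (
    BANDS * math.log2(LEVELS_PER_BAND)  # about 2.58 bits per band
    + math.log2(PITCH_VALUES)           # about 5.17 bits for pitch
    + VOICED_BITS
)
bits_per_second = bits_per_frame * FRAMES_PER_SECOND

print(f"{bits_per_frame:.2f} bits per frame")      # 32.02 bits per frame
print(f"{bits_per_second:.0f} bits per second")    # 1601 bits per second
```

Changing `FRAMES_PER_SECOND` to 20 (one frame every 50 msec) drops the estimate to about 640 bits per second, so the frame rate dominates the result.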
For test purposes, a big relay-based device was used as the programmable pseudo-random number generator.
Here's the most amazing part to me: to provide a matching pseudo-random number stream at the receiving end, they recorded an analog white noise source onto phonograph records. One record was used at each end. They had to carefully sync the records (within 50 msec!) at each end to decode the transmission. Only two copies of any particular recording were made.
Getting the exact speed would not be hard, though; a synchronous motor does that. And if you got close with the start times, you could 'hunt' a bit with a test pattern, which would jump from white noise to intelligible speech once you got it synced. But you'd be eating into the 12 minutes of recorded noise.
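The 'hunting' idea can be modeled in a few lines. This is a toy, discrete version of my own speculation, not the real procedure: the noise record is a seeded list of values 0..5, the test pattern is made up, and the receiver slides its start position until the decoded output matches the pattern.

```python
import random

LEVELS = 6
rng = random.Random(1943)  # stand-in for the noise record; the seed is arbitrary
record = [rng.randrange(LEVELS) for _ in range(2000)]

TEST_PATTERN = [0, 1, 2, 3, 4, 5] * 5  # hypothetical known test signal

def encrypt(samples, record, start):
    """Add record values, beginning at position `start`, modulo LEVELS."""
    return [(s + record[start + i]) % LEVELS for i, s in enumerate(samples)]

def decrypt(cipher, record, start):
    """Subtract record values, beginning at position `start`, modulo LEVELS."""
    return [(c - record[start + i]) % LEVELS for i, c in enumerate(cipher)]

def hunt(cipher, record, guess, max_slip):
    """Try start positions around `guess` until the test pattern appears."""
    for slip in range(max_slip + 1):
        for start in (guess + slip, guess - slip):
            if start < 0 or start + len(cipher) > len(record):
                continue  # this position would run off the record
            if decrypt(cipher, record, start) == TEST_PATTERN:
                return start
    return None  # no sync found within the allowed slip

true_start = 100  # where the sender's record really started
cipher = encrypt(TEST_PATTERN, record, true_start)
found = hunt(cipher, record, guess=93, max_slip=20)

print(found)  # 100
```

At any wrong position the decoded values are just more noise, which is why a known test pattern makes the correct position easy to recognize.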