# One-Time Pad and Frequency Analysis

Summary:
frequency analysis as a weakness of one-time pad
My lecturer said that the one-time pad cryptosystem has a weakness: it is subject to frequency analysis. But after he tried to explain why that is a weakness of this system, I am still unable to see why. The frequency of letters in the ciphertext tells you nothing about the structure of the actual message, right? Since a row of 10 a's could correspond to 10 different letters, doesn't that imply that frequency analysis offers no assistance when trying to crack a one-time pad?

Office_Shredder
Staff Emeritus
Gold Member
I agree with you on this.

Staff Emeritus
If the one-time pad is shorter than the message, it is vulnerable to frequency analysis. For the same reason, if a pad is reused, even in part, it is vulnerable to frequency analysis. Otherwise it is not.
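A minimal sketch of why reuse is a problem, assuming letters are mapped to A=0, ..., Z=25 and combined by addition mod 26. The two messages and the helper names (`to_nums`, `encrypt`, `difference`) are made up for illustration. Subtracting two ciphertexts that share a key cancels the key, leaving only the difference of the two plaintexts, which inherits the statistics of English:

```python
import secrets
import string

ALPHABET = string.ascii_uppercase

def to_nums(text):
    # Map each letter to its position: A=0, ..., Z=25.
    return [ALPHABET.index(ch) for ch in text]

def encrypt(message, key):
    # Add each key value to the matching message letter, mod 26.
    return [(m + k) % 26 for m, k in zip(to_nums(message), key)]

def difference(a, b):
    # Position-wise subtraction mod 26.
    return [(x - y) % 26 for x, y in zip(a, b)]

# Two hypothetical messages of equal length, encrypted with the SAME pad.
msg1 = "ATTACKATDAWN"
msg2 = "RETREATATTEN"
key = [secrets.randbelow(26) for _ in range(len(msg1))]

c1 = encrypt(msg1, key)
c2 = encrypt(msg2, key)

# The key drops out: (m1 + k) - (m2 + k) = m1 - m2 (mod 26).
cipher_diff = difference(c1, c2)
plain_diff = difference(to_nums(msg1), to_nums(msg2))
print(cipher_diff == plain_diff)
```

The attacker never sees the key, yet obtains a quantity that depends only on the two plaintexts.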

This is what my lecturer said:

"You know 'E' is one of the most common letters in the English language, and the frequency of each letter is known. This refers to the frequency occurring in both the message to be encoded and the secret key. You are right in that the letter 'E' in the message could be encoded using any letter in the secret key, but it is most likely to be encoded using the letter 'E'. So the most frequent occurrence in the cryptotext would be 'E' encoded using 'E'. Similarly, you can compute the frequency that any letter is encoded using any other letter, and use this to get an estimate of the total frequency of any letter occurring in the cryptotext. Of course, you would need a large message to do this effectively, but the general idea still holds."

I don't think this implies that the one-time pad is vulnerable to frequency analysis, though, right?

This is another explanation I got; can someone please explain this to me?

"The idea is that you can calculate the frequency of any letter appearing in the ciphertext. For example, the letter A could have been encoded as A+A, or B+Z, or C+Y, or D+X, or E+W and so on. So, based on the frequency of letters in English, you can calculate the probability of each of these cases occurring and hence the total frequency of A appearing in the ciphertext.

If you do then get A appearing a lot in the ciphertext (more than expected), then you could assume some proportion of these are coming from the encoding E+W (or whichever pair has a high frequency of occurring). It's pretty fiddly, but the idea is that you can compute these frequencies and with sufficiently long messages you can use some form of trial and error to estimate with high probability which letters are appearing in the message/secret key (and this is much better than just blind guessing)."
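The calculation described in this explanation can be sketched as follows, under the explanation's own assumption that the key is itself English text. The frequency table holds approximate, commonly cited percentages for English and is an assumption, as are the names `ENGLISH_PERCENT` and `ciphertext_distribution`. Each ciphertext letter's probability is the sum over all (message, key) pairs that add up to it mod 26:

```python
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent (assumed, rounded values).
ENGLISH_PERCENT = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}

# Normalise so the probabilities sum to exactly 1.
total = sum(ENGLISH_PERCENT.values())
english = [ENGLISH_PERCENT[ch] / total for ch in ALPHABET]

def ciphertext_distribution(message_dist, key_dist):
    # P(cipher = c) = sum of P(message = m) * P(key = k) over m + k = c (mod 26).
    dist = [0.0] * 26
    for m, pm in enumerate(message_dist):
        for k, pk in enumerate(key_dist):
            dist[(m + k) % 26] += pm * pk
    return dist

# Message AND key both follow English frequencies (the lecturer's model).
running_key = ciphertext_distribution(english, english)

# Show the five most likely ciphertext letters under this model.
ranked = sorted(zip(ALPHABET, running_key), key=lambda pair: -pair[1])
for letter, p in ranked[:5]:
    print(letter, round(p, 4))
```

Under this model the ciphertext distribution is visibly uneven, which is what gives frequency analysis something to work with. It says nothing about a key drawn uniformly at random.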

Update: It is vulnerable to frequency analysis because each combination of letters has some probability, E+E being the greatest. The same goes for the rest of the possible combinations, so you could expect to see some patterns in the long run. This is how I understand it now; is this correct?

Office_Shredder
Staff Emeritus
Gold Member
No, E+E does not have the greatest probability of appearing. If the one-time pad is done correctly, E+A,...,E+Z are all equally likely transformations of the character E.

If the one-time pad is, like, an actual English paragraph, then yes, it is vulnerable to attack. But a proper one-time pad is a purely random string of characters.
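A small exact check of this claim, as a sketch: with a uniformly random key, the ciphertext letter distribution comes out perfectly flat no matter how skewed the message letters are. The skewed message distribution below is arbitrary and purely illustrative, and `ciphertext_distribution` is a hypothetical helper name. `Fraction` is used so the comparison is exact rather than subject to floating-point rounding:

```python
from fractions import Fraction

def ciphertext_distribution(message_dist, key_dist):
    # P(cipher = c) = sum of P(message = m) * P(key = k) over m + k = c (mod 26).
    dist = [Fraction(0)] * 26
    for m, pm in enumerate(message_dist):
        for k, pk in enumerate(key_dist):
            dist[(m + k) % 26] += pm * pk
    return dist

# An arbitrary, deliberately uneven message distribution (illustrative only).
weights = [i + 1 for i in range(26)]
message_dist = [Fraction(w, sum(weights)) for w in weights]

# A proper one-time pad: every key letter has probability exactly 1/26.
uniform_key = [Fraction(1, 26)] * 26

result = ciphertext_distribution(message_dist, uniform_key)
print(all(p == Fraction(1, 26) for p in result))
```

Every ciphertext letter ends up with probability 1/26, so counting letters in the ciphertext reveals nothing about the message.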

> No, E+E does not have the greatest probability of appearing. If the one-time pad is done correctly, E+A,...,E+Z are all equally likely transformations of the character E.
>
> If the one-time pad is, like, an actual English paragraph, then yes, it is vulnerable to attack. But a proper one-time pad is a purely random string of characters.

In my course they say that a book is a good example of the key, but that would not be a random string of characters, would it? Is the key meant to be a completely randomly generated string of letters? Then you sum the corresponding numerical values together mod 26, and that gives you a number which you then encode as a letter, correct?

Office_Shredder
Staff Emeritus
Gold Member
Yes, a book is actually a bad example of a one-time pad. I agree that if you use a book, then you are subject to a frequency analysis attack.

The best-in-class implementation generates completely random keys, then after you use a key once you throw it away. The hard part here is distributing the keys without anyone intercepting them, since if you had a way of securely transmitting the key, you would already have a way of securely transmitting your message.
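A minimal sketch of the scheme as discussed in this thread, assuming uppercase A-Z messages with A=0, ..., Z=25 and addition mod 26. The function names (`generate_pad`, `encrypt`, `decrypt`) and the sample message are hypothetical. Python's `secrets` module is used here as a stand-in for a source of unpredictable random letters:

```python
import secrets
import string

ALPHABET = string.ascii_uppercase

def generate_pad(length):
    # Draw each pad letter independently and uniformly from A-Z.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def encrypt(message, pad):
    # The pad must cover the whole message; a shorter pad would force reuse.
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    return "".join(
        ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % 26]
        for m, k in zip(message, pad)
    )

def decrypt(ciphertext, pad):
    # Undo the addition by subtracting the same pad letters mod 26.
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext, pad)
    )

message = "MEETATNOON"
pad = generate_pad(len(message))  # used for this one message only
ciphertext = encrypt(message, pad)
recovered = decrypt(ciphertext, pad)
print(recovered == message)
```

The sketch covers only the arithmetic. Distributing the pad and destroying it after use are the hard parts and are outside what code like this can address.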

Throwing away the key might also be challenging, since you will need to destroy it in a way that it cannot be recovered.

Also, I think there's at least one real-world example where one-time pads were not generated sufficiently randomly and hence were cracked.

Nugatory
Mentor
> This is what my lecturer said:
>
> "You know 'E' is one of the most common letters in the English language, and the frequency of each letter is known. This refers to the frequency occurring in both the message to be encoded and the secret key."

This is absolute nonsense. In a one-time-pad cryptosystem, the keys are generated randomly, not taken from some snippet of English-language text. There is some possibility that you have misunderstood your lecturer, and they were trying to explain why anything less than completely random key generation will lead to a vulnerability.

> This is absolute nonsense. In a one-time-pad cryptosystem, the keys are generated randomly, not taken from some snippet of English-language text. There is some possibility that you have misunderstood your lecturer, and they were trying to explain why anything less than completely random key generation will lead to a vulnerability.

Nah, for some reason in this course they have taught it as if the keys are books or something similar. That is where the confusion was; I didn't realise that they taught it as if the key was taken to be some English-language text.

f95toli
Gold Member
> Nah, for some reason in this course they have taught it as if the keys are books or something similar. That is where the confusion was; I didn't realise that they taught it as if the key was taken to be some English-language text.

Well, from a purely practical point of view, using a book is probably not a bad solution if your message is short enough. As long as it is a popular book (say "Moby Dick") that is widely available in English (or some other language), it nicely solves the problem of how to share the key.
Hence, I suspect using a book is one of the more common implementations of one-time pad crypto. It is just not very secure if you have a long message.

Office_Shredder
Staff Emeritus
Gold Member
> Well, from a purely practical point of view, using a book is probably not a bad solution if your message is short enough. As long as it is a popular book (say "Moby Dick") that is widely available in English (or some other language), it nicely solves the problem of how to share the key.
> Hence, I suspect using a book is one of the more common implementations of one-time pad crypto. It is just not very secure if you have a long message.

Common implementations between whom? Casual friends sending OTP-encrypted messages for fun? Any professional organization using a one-time pad is not going to do this.

f95toli
Gold Member
> Common implementations between whom? Casual friends sending OTP-encrypted messages for fun? Any professional organization using a one-time pad is not going to do this.

Well, I was mainly thinking of the former.
I don't know if there are any actual examples of "professional" organisations using an OTP based on books, although it must at least have been considered during, say, WW2 or the early part of the Cold War.

Truly "random" OTPs were (are?) used for, e.g., the numbers stations, but the problem is of course that you still need to share the OTP somehow, and being in possession of an OTP would be almost impossible to explain if you are caught.

Office_Shredder
Staff Emeritus
Gold Member