Cracking a monoalphabetic substitution cipher

In summary, the conversation is about breaking a ciphertext encrypted with a monoalphabetic substitution cipher. The ciphertext is 244 characters long and consists only of uppercase letters. The poster has tallied the letter frequencies and tentatively identified a few letters of the key. The thread also covers the use of a chi-square test for efficient solving, with a reminder to consider the context of the problem and the fact that language is Markovian.
  • #1
Bipolarity
I am trying to break a harmless ciphertext that uses a monoalphabetic substitution cipher.
The ciphertext is exactly 244 characters long, without any spaces between words. It consists only of uppercase letters.

ciphertext = "JGRMQOYGHMVBJWRWQFPWHGFFDQGFPFZRKBEEBJIZQQOCIBZKLFAFGQVFZFWWEOGWOPFGFHWOLPHLRLOLFDMFGQWBLWBWQOLKFWBYLBLYLFSFLJGRMQBOLWJVFPFWQVHQWFFPQOQVFPQOCFPOGFWFJIGFQVHLHLROQVFGWJVFPFOLFHGQVQVFILEOGQILHQFQGIQVVOSFAFGBWQVHQWIJVWJVFPFWHGFIWIHZZRQGBABHZQOCGFHX"

I have tallied the frequency distribution of the letters. The ratio is the percentage of the ciphertext made up of that letter (tally / 244 × 100).

Letter: F Tally: 37 Ratio: 15.163934426229508
Letter: Q Tally: 26 Ratio: 10.655737704918032
Letter: W Tally: 21 Ratio: 8.60655737704918
Letter: G Tally: 19 Ratio: 7.786885245901639
Letter: L Tally: 17 Ratio: 6.967213114754098
Letter: O Tally: 16 Ratio: 6.557377049180328
Letter: V Tally: 15 Ratio: 6.147540983606557
Letter: H Tally: 14 Ratio: 5.737704918032787
Letter: B Tally: 12 Ratio: 4.918032786885246
Letter: P Tally: 10 Ratio: 4.098360655737705
Letter: I Tally: 9 Ratio: 3.6885245901639343
Letter: J Tally: 9 Ratio: 3.6885245901639343
Letter: R Tally: 7 Ratio: 2.8688524590163933
Letter: Z Tally: 7 Ratio: 2.8688524590163933
Letter: E Tally: 4 Ratio: 1.639344262295082
Letter: M Tally: 4 Ratio: 1.639344262295082
Letter: A Tally: 3 Ratio: 1.2295081967213115
Letter: C Tally: 3 Ratio: 1.2295081967213115
Letter: K Tally: 3 Ratio: 1.2295081967213115
Letter: Y Tally: 3 Ratio: 1.2295081967213115
Letter: D Tally: 2 Ratio: 0.819672131147541
Letter: S Tally: 2 Ratio: 0.819672131147541
Letter: X Tally: 1 Ratio: 0.4098360655737705
Letter: N Tally: 0 Ratio: 0.0
Letter: U Tally: 0 Ratio: 0.0
Letter: T Tally: 0 Ratio: 0.0
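
A tally like the one above can be reproduced with a few lines of Python. This is only a minimal sketch, not the program the poster actually used; the function name `letter_frequencies` and the 24-letter `sample` (the opening of the ciphertext) are chosen here for illustration.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Return (letter, tally, percent) rows for A-Z, most frequent first."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    rows = []
    for letter in string.ascii_uppercase:
        tally = counts.get(letter, 0)
        percent = 100.0 * tally / total if total else 0.0
        rows.append((letter, tally, percent))
    # Sort by descending tally, breaking ties alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Hypothetical demo input: the first 24 letters of the ciphertext.
sample = "JGRMQOYGHMVBJWRWQFPWHGFF"
for letter, tally, percent in letter_frequencies(sample)[:5]:
    print(f"Letter: {letter} Tally: {tally} Ratio: {percent:.2f}")
```

Passing the full ciphertext string instead of `sample` should give a table in the same format as the one above.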

Since I haven't taken much statistics, I'm not sure how to set up a chi-square test for this problem, but my cryptanalysis text says that a program using a chi-square test is essential for solving it as efficiently as possible.

Perhaps someone could help me with the chi-square? Or perhaps someone could help me with a few letters of the key using their knowledge of English?

I don't necessarily care about solving the problem efficiently; I would just like to know what the ciphertext decrypts to.

Progress so far (plaintext --> ciphertext --> plaintext):
E --> F --> E
T --> Q --> T
H --> G --> H
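
One way to check guesses like these is to substitute them into the ciphertext and see whether plausible English fragments appear. The sketch below assumes the three mappings listed above, written as cipher letter to plain letter; they are tentative guesses, not confirmed key entries, and the helper name `apply_partial_key` is made up for this example.

```python
def apply_partial_key(ciphertext, key, unknown="."):
    """Decrypt with a partial cipher->plain mapping.

    Letters that have no entry in `key` are shown as `unknown`.
    """
    return "".join(key.get(c, unknown) for c in ciphertext)

# Tentative guesses from the progress list (cipher letter -> plain letter).
guesses = {"F": "E", "Q": "T", "G": "H"}

# Hypothetical demo input: the first 24 letters of the ciphertext.
prefix = "JGRMQOYGHMVBJWRWQFPWHGFF"
print(prefix)
print(apply_partial_key(prefix, guesses))
```

If a guess produces letter combinations that are rare in English, that mapping is worth revisiting before adding more letters to the key.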

Thanks!

BiP
 
  • #2
Hey BiPolarity.

For a chi-square test, you have an expected distribution (here, the expected letter frequencies of English, the kind of statistic Claude Shannon was looking at) and an observed distribution (the letter counts in your text).

You calculate the test statistic by summing (O_i - E_i)^2 / E_i over all letters, where O_i is the observed count and E_i the expected count of letter i. You then find the p-value from where this statistic lies on a chi-squared distribution with n - 1 degrees of freedom, where n is the number of categories (so in this case n = 26, giving 25 degrees of freedom).

If the p-value is too small (usually we test at a level of 0.05), then we reject the hypothesis that the observed counts follow the expected distribution. However, you need to take into account the context of your problem.
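
The statistic described above can be sketched in Python as follows. The English percentages in `ENGLISH_PERCENT` are commonly quoted approximate values and should be treated as an assumption, since different corpora give slightly different numbers; the function name `chi_square` and the demo strings are also just illustrative. For a substitution cipher, the natural use is to score a candidate decryption: a lower value means the letter counts look more like English.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed values;
# they vary a little from corpus to corpus).
ENGLISH_PERCENT = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97, "N": 6.75,
    "S": 6.33, "H": 6.09, "R": 5.99, "D": 4.25, "L": 4.03, "C": 2.78,
    "U": 2.76, "M": 2.41, "W": 2.36, "F": 2.23, "G": 2.02, "Y": 1.97,
    "P": 1.93, "B": 1.49, "V": 0.98, "K": 0.77, "J": 0.15, "X": 0.15,
    "Q": 0.10, "Z": 0.07,
}

def chi_square(text, expected_percent=ENGLISH_PERCENT):
    """Sum of (O_i - E_i)^2 / E_i over every letter in expected_percent."""
    letters = [c for c in text.upper() if c in expected_percent]
    n = len(letters)
    observed = Counter(letters)
    scale = sum(expected_percent.values())  # normalise so weights sum to 1
    stat = 0.0
    for letter, weight in expected_percent.items():
        expected = n * weight / scale       # expected count E_i
        stat += (observed.get(letter, 0) - expected) ** 2 / expected
    return stat

# Hypothetical demo: an English sentence versus the ciphertext's first 24 letters.
print(chi_square("IT WAS THE BEST OF TIMES IT WAS THE WORST OF TIMES"))
print(chi_square("JGRMQOYGHMVBJWRWQFPWHGFF"))
```

To turn the statistic into a p-value, it would be compared against a chi-squared distribution with 25 degrees of freedom, for example with `scipy.stats.chi2.sf(stat, 25)` if SciPy is available.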

The other thing is that language is by its nature Markovian, which means that conditional probabilities (how likely a letter is given the letters before it) matter just as much as, if not more than, the unconditional letter frequencies.
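
In practice, the Markov property is usually exploited by counting pairs and triples of adjacent letters rather than single letters. The sketch below is one possible way to do that; the helper names `ngram_counts` and `conditional_probability` are invented for this example. Since THE is generally the most common trigram in English, the most frequent trigram in the ciphertext is a reasonable candidate for it.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count overlapping runs of n adjacent letters."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

def conditional_probability(text, prev, nxt):
    """Estimate P(next letter = nxt | current letter = prev) from bigram counts."""
    bigrams = ngram_counts(text, 2)
    starts_with_prev = sum(c for pair, c in bigrams.items() if pair[0] == prev)
    if starts_with_prev == 0:
        return 0.0
    return bigrams.get(prev + nxt, 0) / starts_with_prev

# Hypothetical demo input: the first 24 letters of the ciphertext.
sample = "JGRMQOYGHMVBJWRWQFPWHGFF"
print(ngram_counts(sample, 2).most_common(3))
print(ngram_counts(sample, 3).most_common(3))
```

Running `ngram_counts` with n = 3 on the full ciphertext and comparing the top results with common English trigrams is a quick check on single-letter guesses.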