Cracking a monoalphabetic substitution cipher

  • Thread starter: Bipolarity
  • Tags: Substitution
SUMMARY

This discussion centers on breaking a monoalphabetic substitution cipher using a ciphertext of 244 uppercase characters. The user has calculated the frequency distribution of letters, identifying 'F' as the most common with a tally of 37. To enhance the decryption process, the user seeks assistance with implementing a chi-square test to compare observed and expected letter frequencies, a method suggested by cryptanalysis literature. Progress has been made in establishing some letter mappings, specifically E to F and T to Q.

PREREQUISITES
  • Understanding of monoalphabetic substitution ciphers
  • Familiarity with letter frequency analysis
  • Basic knowledge of statistical methods, specifically chi-square tests
  • Experience with cryptanalysis techniques
NEXT STEPS
  • Learn how to perform a chi-square test for letter frequency analysis
  • Study the principles of Markov chains in language processing
  • Explore tools for automating frequency analysis in cryptography
  • Investigate historical methods used in breaking monoalphabetic ciphers
USEFUL FOR

This discussion is beneficial for cryptographers, students of cryptanalysis, and anyone interested in the statistical methods used to decode substitution ciphers.

Bipolarity
I am trying to break a harmless ciphertext that uses a monoalphabetic substitution cipher.
The ciphertext is exactly 244 characters long, without any spaces between words. It consists only of uppercase letters.

ciphertext = "JGRMQOYGHMVBJWRWQFPWHGFFDQGFPFZRKBEEBJIZQQOCIBZKLFAFGQVFZFWWEOGWOPFGFHWOLPHLRLOLFDMFGQWBLWBWQOLKFWBYLBLYLFSFLJGRMQBOLWJVFPFWQVHQWFFPQOQVFPQOCFPOGFWFJIGFQVHLHLROQVFGWJVFPFOLFHGQVQVFILEOGQILHQFQGIQVVOSFAFGBWQVHQWIJVWJVFPFWHGFIWIHZZRQGBABHZQOCGFHX"

I have worked out the frequency distribution of the letters. The ratio is each letter's relative frequency, as a percentage of the 244 characters.

Letter: F Tally: 37 Ratio: 15.163934426229508
Letter: Q Tally: 26 Ratio: 10.655737704918032
Letter: W Tally: 21 Ratio: 8.60655737704918
Letter: G Tally: 19 Ratio: 7.786885245901639
Letter: L Tally: 17 Ratio: 6.967213114754098
Letter: O Tally: 16 Ratio: 6.557377049180328
Letter: V Tally: 15 Ratio: 6.147540983606557
Letter: H Tally: 14 Ratio: 5.737704918032787
Letter: B Tally: 12 Ratio: 4.918032786885246
Letter: P Tally: 10 Ratio: 4.098360655737705
Letter: I Tally: 9 Ratio: 3.6885245901639343
Letter: J Tally: 9 Ratio: 3.6885245901639343
Letter: R Tally: 7 Ratio: 2.8688524590163933
Letter: Z Tally: 7 Ratio: 2.8688524590163933
Letter: E Tally: 4 Ratio: 1.639344262295082
Letter: M Tally: 4 Ratio: 1.639344262295082
Letter: A Tally: 3 Ratio: 1.2295081967213115
Letter: C Tally: 3 Ratio: 1.2295081967213115
Letter: K Tally: 3 Ratio: 1.2295081967213115
Letter: Y Tally: 3 Ratio: 1.2295081967213115
Letter: D Tally: 2 Ratio: 0.819672131147541
Letter: S Tally: 2 Ratio: 0.819672131147541
Letter: X Tally: 1 Ratio: 0.4098360655737705
Letter: N Tally: 0 Ratio: 0.0
Letter: U Tally: 0 Ratio: 0.0
Letter: T Tally: 0 Ratio: 0.0
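
For anyone who wants to reproduce a tally like the one above, here is a minimal sketch. The function name `letter_tally` and the short sample string are made up for illustration; paste the full ciphertext in place of the sample to get the real table.

```python
from collections import Counter
import string

def letter_tally(text):
    """Return (letter, count, percent) rows for A-Z, most frequent first."""
    counts = Counter(ch for ch in text if ch in string.ascii_uppercase)
    total = sum(counts.values())
    rows = []
    for ch in string.ascii_uppercase:
        # Guard against an empty input so we never divide by zero.
        pct = 100.0 * counts[ch] / total if total else 0.0
        rows.append((ch, counts[ch], pct))
    # Sort by descending count, breaking ties alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Short made-up sample, NOT the real ciphertext.
sample = "FQFWFGQ"
for letter, count, pct in letter_tally(sample)[:4]:
    print(f"Letter: {letter} Tally: {count} Ratio: {pct:.2f}")
```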

I haven't taken much statistics, so I'm not sure how to set up a chi-square test for this problem. My cryptanalysis text says that a program using a chi-square test is essential for solving it in the most efficient way possible.

Perhaps someone could help me with the chi-square? Or perhaps someone could help me with a few letters of the key using their knowledge of English?

I don't necessarily care about solving the problem efficiently, I would just like to know what the cipher text comes to.

Progress so far:
E --> F --> E
T --> Q --> T
H --> G --> H
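
A quick way to see what guesses like these give is to substitute them into the ciphertext and leave everything else untouched. The sketch below assumes the guesses are read as cipher letter to plaintext letter (F is E, Q is T, G is H); the helper name `apply_partial_key` is hypothetical. Known letters come out in lowercase and unknown cipher letters stay in uppercase, so a wrong guess tends to show up as an unpronounceable fragment.

```python
def apply_partial_key(ciphertext, cipher_to_plain):
    """Substitute the guessed letters; leave unknown cipher letters as they are."""
    return "".join(cipher_to_plain.get(ch, ch) for ch in ciphertext)

# Current guesses, written as cipher letter -> plaintext letter (lowercase).
guesses = {"F": "e", "Q": "t", "G": "h"}

# First 24 letters of the ciphertext from the post.
prefix = "JGRMQOYGHMVBJWRWQFPWHGFF"
print(apply_partial_key(prefix, guesses))
```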

Thanks!

BiP
 
Hey BiPolarity.

For a chi-square test, you need an expected distribution and an observed one. Here the expected distribution is the typical letter-frequency distribution of English (the kind of statistic Claude Shannon was looking at), and the observed one is your tally.

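You calculate the test statistic by summing (O_i - E_i)^2 / E_i over all letters i, where O_i is the observed count of letter i and E_i is its expected count (the expected proportion times the total number of letters, 244 here). You then find the probability value corresponding to where this statistic lies on a chi-square distribution with n - 1 degrees of freedom, where n is the number of entries in the distribution (so in this case n = 26).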
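
As a rough sketch of the statistic described above, the code below scores two hypotheses against the tallies from the first post. The English percentages are approximate values of the commonly quoted kind and are an assumption (published tables differ a little by corpus). The names `chi_square` and `rank_matched` are made up for illustration, and 37.652 is the tabulated 0.05 critical value for 25 degrees of freedom.

```python
import string

# Approximate English letter frequencies in percent (assumption: corpus-dependent).
ENGLISH_PCT = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}

# Tallies copied from the table in the first post; missing letters count as 0.
CIPHER_TALLY = {
    "F": 37, "Q": 26, "W": 21, "G": 19, "L": 17, "O": 16, "V": 15, "H": 14,
    "B": 12, "P": 10, "I": 9, "J": 9, "R": 7, "Z": 7, "E": 4, "M": 4,
    "A": 3, "C": 3, "K": 3, "Y": 3, "D": 2, "S": 2, "X": 1,
}

def chi_square(observed, expected_pct):
    """Sum of (O_i - E_i)^2 / E_i over the 26 letters."""
    letters = string.ascii_uppercase
    n = sum(observed.get(ch, 0) for ch in letters)
    pct_total = sum(expected_pct[ch] for ch in letters)
    stat = 0.0
    for ch in letters:
        expected = n * expected_pct[ch] / pct_total  # expected count E_i
        stat += (observed.get(ch, 0) - expected) ** 2 / expected
    return stat

def rank_matched(tally, expected_pct):
    """Relabel tallies: k-th most common cipher letter -> k-th most common English letter."""
    letters = string.ascii_uppercase
    cipher_order = sorted(letters, key=lambda ch: -tally.get(ch, 0))
    english_order = sorted(letters, key=lambda ch: -expected_pct[ch])
    return {plain: tally.get(cipher, 0)
            for plain, cipher in zip(english_order, cipher_order)}

CRITICAL_05_DF25 = 37.652  # chi-square critical value, alpha = 0.05, 25 d.o.f.

# Hypothesis 1: the text is not enciphered at all (cipher letter = plain letter).
identity_stat = chi_square(CIPHER_TALLY, ENGLISH_PCT)
# Hypothesis 2: cipher letters relabelled purely by frequency rank.
ranked_stat = chi_square(rank_matched(CIPHER_TALLY, ENGLISH_PCT), ENGLISH_PCT)

print(f"identity:     {identity_stat:.1f} (reject: {identity_stat > CRITICAL_05_DF25})")
print(f"rank-matched: {ranked_stat:.1f} (reject: {ranked_stat > CRITICAL_05_DF25})")
```

The rank-matched score is low by construction, so it only shows that the shape of the tally is consistent with English. It does not show that the rank-based key is correct. Several expected counts are also below 5 with only 244 letters, so the chi-square approximation is rough here.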

If that probability (the p-value) is too small, typically below a significance level of 0.05, we reject the hypothesis that the observed counts come from the expected distribution. However, you need to take the context of your problem into account.

The other thing is that language is by its nature Markovian: the probability of a letter depends on the letters that precede it. This means that conditional probabilities matter at least as much as (if not more than) the unconditional letter frequencies.
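
To make the conditional-probability point concrete, here is a minimal sketch that estimates P(next letter | current letter) from adjacent pairs. The function name `conditional_probs` and the sample string are made up for illustration; run it on the full ciphertext for real estimates.

```python
from collections import Counter, defaultdict

def conditional_probs(text):
    """Estimate P(next letter | current letter) from adjacent letter pairs."""
    pair_counts = Counter(zip(text, text[1:]))
    # Every letter except the last one starts exactly one pair.
    first_counts = Counter(text[:-1])
    probs = defaultdict(dict)
    for (current, nxt), count in pair_counts.items():
        probs[current][nxt] = count / first_counts[current]
    return probs

# Short made-up sample, NOT the real ciphertext.
sample = "QVFQVFQVH"
probs = conditional_probs(sample)
for current in sorted(probs):
    print(current, probs[current])
```

Under the guesses E → F and T → Q, for instance, a cipher letter that often follows Q and often precedes F would be a natural candidate for H, since THE is such a common English trigram.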
 