Cracking a monoalphabetic substitution cipher

  1. Aug 12, 2012 #1
    I am trying to break a harmless ciphertext that uses a monoalphabetic substitution cipher.
    The ciphertext is exactly 244 characters long, without any spaces between words. It consists only of uppercase letters.

    ciphertext = "JGRMQOYGHMVBJWRWQFPWHGFFDQGFPFZRKBEEBJIZQQOCIBZKLFAFGQVFZFWWEOGWOPFGFHWOLPHLRLOLFDMFGQWBLWBWQOLKFWBYLBLYLFSFLJGRMQBOLWJVFPFWQVHQWFFPQOQVFPQOCFPOGFWFJIGFQVHLHLROQVFGWJVFPFOLFHGQVQVFILEOGQILHQFQGIQVVOSFAFGBWQVHQWIJVWJVFPFWHGFIWIHZZRQGBABHZQOCGFHX"

    I have tallied how often each letter occurs. The ratio is each letter's share of the 244 characters, expressed as a percentage.

    Letter  Tally  Ratio (%)
    F       37     15.16
    Q       26     10.66
    W       21      8.61
    G       19      7.79
    L       17      6.97
    O       16      6.56
    V       15      6.15
    H       14      5.74
    B       12      4.92
    P       10      4.10
    I        9      3.69
    J        9      3.69
    R        7      2.87
    Z        7      2.87
    E        4      1.64
    M        4      1.64
    A        3      1.23
    C        3      1.23
    K        3      1.23
    Y        3      1.23
    D        2      0.82
    S        2      0.82
    X        1      0.41
    N        0      0.00
    U        0      0.00
    T        0      0.00
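
    A tally like this can be produced with a few lines of Python. The following is only a minimal sketch: the function name `letter_frequencies` and the short sample string are made up for illustration and are not the program that generated the numbers in this post.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return (letter, tally, percent) rows for A-Z, most frequent first."""
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    rows = [
        (letter, counts[letter], 100.0 * counts[letter] / total if total else 0.0)
        for letter in string.ascii_uppercase
    ]
    # sorted() is stable, so letters with equal tallies stay in alphabetical order.
    return sorted(rows, key=lambda row: -row[1])


if __name__ == "__main__":
    sample = "QVFQVF"  # hypothetical short sample, not the full ciphertext
    for letter, tally, percent in letter_frequencies(sample)[:3]:
        print(f"Letter: {letter} Tally: {tally} Ratio: {percent:.2f}")
```

    Passing the full `ciphertext` string instead of the sample gives one row per letter, including letters that never occur.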

    I haven't taken much statistics, so I'm not sure how to set up a chi-square test for this problem, but my cryptanalysis text says that a program using a chi-square test is essential for solving it as efficiently as possible.

    Perhaps someone could help me with the chi-square? Or perhaps someone could help me with a few letters of the key using their knowledge of English?

    I don't necessarily care about solving the problem efficiently; I would just like to know what the ciphertext decodes to.

    Progress so far (tentative guesses, written plaintext --> ciphertext --> plaintext):
    E --> F --> E
    T --> Q --> T
    H --> G --> H
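
    One way to sanity-check guesses like these is to substitute them into the ciphertext and look at what comes out. The sketch below is only an illustration: `apply_partial_key` is a hypothetical helper, the dictionary restates the three guesses above as ciphertext letter -> plaintext letter, and only the first 13 letters of the ciphertext are used.

```python
def apply_partial_key(ciphertext, key, unknown="."):
    """Decrypt with a partial ciphertext->plaintext mapping.

    Ciphertext letters that have no guess yet are shown as `unknown`.
    """
    return "".join(key.get(ch, unknown) for ch in ciphertext)


# The three guesses from the post, as ciphertext letter -> plaintext letter.
guesses = {"F": "E", "Q": "T", "G": "H"}

# First 13 letters of the ciphertext from the post.
prefix = "JGRMQOYGHMVBJ"
print(apply_partial_key(prefix, guesses))  # prints .H..T..H.....
```

    If the partially decoded text shows letter patterns that are unlikely in English, one of the guesses probably needs to be revisited.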

    Thanks!

    BiP
     
    Last edited: Aug 12, 2012
  2. Aug 12, 2012 #2

    chiro

    Science Advisor

    Hey BiPolarity.

    For a chi-square test, you have an expected distribution (here, the expected letter frequencies of English, which is something Claude Shannon studied) and an observed distribution.

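    You calculate the test statistic by summing (O_i - E_i)^2 / E_i over all categories i, where O_i is the observed count and E_i is the expected count. You then find the probability of getting a value at least that large under a chi-squared distribution with n - 1 degrees of freedom, where n is the number of categories (so in this case n = 26, giving 25 degrees of freedom).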

    If the probability is too small (usually we test at a level of 0.05), then we reject the hypothesis that the observed counts come from the expected distribution. However, you need to take into account the context of your problem.
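
    As a rough sketch of that calculation, the code below computes the statistic for the tallies posted above, treating the ciphertext letters as if they were plaintext. The English percentages are approximate, commonly quoted values (an assumption, not something from this thread), `chi_square` is a hypothetical helper, and 37.652 is the tabulated 0.05 critical value for 25 degrees of freedom, used in place of an exact p-value.

```python
import string

# Approximate English letter frequencies in percent (commonly quoted values;
# an assumption for this sketch, not data from the thread).
ENGLISH_PERCENT = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}

# Tabulated critical value: chi-squared, 25 degrees of freedom, level 0.05.
CRITICAL_25_DF_005 = 37.652


def chi_square(observed_counts):
    """Sum of (O_i - E_i)^2 / E_i over the 26 letters."""
    total = sum(observed_counts.get(l, 0) for l in string.ascii_uppercase)
    norm = sum(ENGLISH_PERCENT.values())  # rescale so the shares sum to 1
    stat = 0.0
    for letter in string.ascii_uppercase:
        expected = total * ENGLISH_PERCENT[letter] / norm
        observed = observed_counts.get(letter, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Tallies from the first post (letters with a tally of 0 are omitted).
tallies = {
    "F": 37, "Q": 26, "W": 21, "G": 19, "L": 17, "O": 16, "V": 15, "H": 14,
    "B": 12, "P": 10, "I": 9, "J": 9, "R": 7, "Z": 7, "E": 4, "M": 4,
    "A": 3, "C": 3, "K": 3, "Y": 3, "D": 2, "S": 2, "X": 1,
}

stat = chi_square(tallies)
print(stat > CRITICAL_25_DF_005)  # prints True: raw ciphertext does not look like English
```

    The same function can be applied to the letter counts of a candidate decryption, where a smaller statistic means a closer match to English. With only 244 letters, the expected counts for rare letters are well below 5, so the chi-squared approximation is rough and the result is better read as a score than as a strict test.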

    The other thing is that language is by nature Markovian, which means that conditional probabilities (how likely a letter is given the letters before it) are just as important as, if not more important than, the unconditional frequencies.
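
    A simple way to start using that conditional structure is to count pairs and triples of adjacent letters in the ciphertext. The sketch below is only an illustration: `ngram_counts` is a hypothetical helper and the sample string is made up rather than taken from the full ciphertext.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping n-grams (runs of n consecutive letters) in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


sample = "QVFZFQVFPQVF"  # hypothetical short sample
print(ngram_counts(sample, 3).most_common(1))  # prints [('QVF', 3)]
```

    In English text, a trigram that recurs often is a natural candidate for THE, and frequent pairs can be compared against common digrams such as TH, HE, IN and ER.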
     