# Homework Help: Stuck on cryptology problem(vigenere cipher)

1. Mar 29, 2005

I was assigned to decrypt this cipher text:

EWPW DRAPO MGTGT EXVAH RPSWI LSFUN RZVSE FOIWJ
NYOU MHSER EFBMB VANHA LOOXI VPBIF FIFMA BFGRH
GARE TLKLF NZQXQ DWGKP UWNWQ QFRZL VROOP UGBQQ
HVKS CDOPW LSGTE JMFSE MLZMD TNDED VVGRO UUMLV
NLHA KWASO ITAPR DTBBG CHDSH TNSFM NGWMF CASWM
WGKD RWJRN UNDVV SFFAE TAGUF HLAUC AETLB MHVAN
WJHU QUQQL SQETD BWGBR APMJW PM

the kasiski test is of no real help; the frequencies are to close together.

highest frequencies of 2 and 3 letter substrings
(higher substrings have almost no discrepancies of frequency):

et: 0.01544402 (132,12,20)
aet: 0.007751938 = 12

kasiski yields 4 as gcd, which suggests 4 as the keylength.

Index of Coincidence is 0.04110484 ; this value is much closer to 0.038 than 0.065, which suggests that the encipherment scheme is polyalphabetic(confirmed by professor to be vigenere).

using the friedman test yields:
(.027*260)/((259*.04110484)-(.038*260+.065)) = 10.01

suggesting that the keylength is 10, which is definately a better guess than 4(our professor supplied the hint that the keyword was of length 7,8,9 or 10).

going on the guess that the keylength is 10, i split the cipher text into 10 subsequences:

egpzybofrxwosedokdtcwflhqg
wtsvomxmeqqpcjtuwtnajfavqb
pgwsubiatdqudmnuabssraualr
wtiemvvblwfgofdmsbfwnecnsa
delfhapfkgrbpselogmmutawqp
rxsosnbglkzqwedvicnwnaejem
avfiehirfplqlmvnthggdgthtj
ohnjelfgzwrvgzghpsmdvfbqbp
mrrnfoiaqnoktmrarhfrshmuwm

with respective ic's of:
0.02461538
0.03384615
0.05538461
0.03692308
0.02769231
0.02461538
0.04
0.06153846
0.02769231
0.06461538

which isn't very much help, as only 3 of the 10 are closer to .065 than .038. supposing that the keylength guess of 10 was incorrect, i split the message into subsequences of 7,8 and 9, and calculated their ic's only to be equally disappointed in the results.

now, going by the results of the friedman test, i stick with a keylength of 10 and use frequency analysis on the results of the split, starting with column ten as its ic was closest to .065 than the others, and i figured it would be easiest.

this netted frequency results of:
r: 0.1923077
m: 0.1538462
a: 0.07692308
f: 0.07692308
h: 0.07692308
n: 0.07692308
o: 0.07692308
i: 0.03846154
k: 0.03846154
q: 0.03846154
s: 0.03846154
t: 0.03846154
u: 0.03846154
w: 0.03846154

suggesting that e mapped to r, and t mapped to m. however, this implies a shift of 13 and 19 respectively, which is inconsistent. i play around with the most commonly used letters(e,t,a,i,n), and i find a shift that looks possible:4.

n to r = 4
i to m = 4
e to a = 4
b to f = 4
d to h = 4
j to n = 4
k to o = 4
a to w = 4

so e mapps to a and presumably the last letter of the keyword is w. trying similar methods on the other subsequences doesn't yield any satisfactory results, and i'm stuck. i've tried asking other people in my class, but none have gotten any farther than i! my professor said this was a somewhat difficult problem and i'm getting fustrated. i'm thinking about writing a brute force program; try every possible combination of ten characters for the keyword - ignoring keywords with repeated letters - and using those keywords to decipher the message and searching the output for the most common digraphs and trigraphs. i'm not exactly sure how feasible this is (26^10 possible combinations?) but i'm going crazy here.