IS A DEPTH?
===========

Introduction
============

Recovering plain text from messages in depth is very time-consuming. Before undertaking this task we must be sure that the messages really are in depth. Many messages with similar indicator groups are not genuine messages in depth at all. Instead, they carry the same text enciphered twice in succession, because a mistake was made in the message setting during the first encipherment. The addressee could not decipher the message and sent a query back, and the message was then sent again with the correct encipherment.

Often it is not possible to decide in advance whether such messages are in depth or not. In the TICOM DF-120 document, German cryptanalysts describe several methods that help to decide whether two messages are in depth.

Note: Most of the following text is an excerpt of the TICOM DF-120 document.

1. Trigraph Differences
=======================

a) Statistics
-------------

From solved M-209 messages from Africa, the 300 most frequent clear-text trigraphs in 10,000 letters were counted (taking the separator letter "Z" into account). Their differences were taken to correspond to the negative cipher intervals in messages in depth. The M-209 enciphers with a reciprocal rule of the form cipher = key - plain (mod 26), so for two messages enciphered with the same key the cipher interval C1 - C2 equals P2 - P1, the negative of the clear-text interval. Each resulting number triple (Zahlentripel) was weighted simply by the product of the two trigraph frequencies, since the probability that two given clear-text trigraphs appear one above the other equals the product of their percentage frequencies. From these differences a table was prepared that gives the weight of every possible Zahlentripel.

b) The Method
-------------

One now determines each interval triple that appears between the two cipher messages, in the upper as well as in the lower message, and adds up the weights that apply to them. The sum is then multiplied by 100 and divided by twice the number of positions being investigated. ...
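The weighting and scoring just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the TICOM procedure itself. The function names (`trigraph_difference`, `top_trigraphs`, `build_weights`, `z_score`) are hypothetical. Letters are mapped A = 0 ... Z = 25, and the messages are assumed to be upper-case letters with the spaces between groups removed (for example with `"".join(msg.split())`). Two further assumptions apply. First, when several trigraph pairs produce the same triple, the sketch keeps the largest product. Second, following the reconstructed example later in this section, the sum is divided by twice the overlap length. Because the table is built over ordered pairs, a triple and its negation receive the same weight, so the sign convention (cipher interval = negative clear-text interval) does not affect the lookup.

```python
from collections import Counter
from itertools import product

# A "Zahlentripel": three letter differences, each in 0..25.
Triple = tuple[int, int, int]


def trigraph_difference(upper: str, lower: str) -> Triple:
    """Letter-by-letter difference (upper - lower) mod 26, with A=0 ... Z=25."""
    a, b, c = ((ord(u) - ord(l)) % 26 for u, l in zip(upper, lower))
    return (a, b, c)


def top_trigraphs(clear_text: str, k: int = 300) -> dict[str, int]:
    """Count overlapping trigraphs of an upper-case clear text in which word
    separators are already written as 'Z', and keep the k most frequent."""
    counts = Counter(clear_text[i:i + 3] for i in range(len(clear_text) - 2))
    return dict(counts.most_common(k))


def build_weights(freqs: dict[str, int]) -> dict[Triple, int]:
    """Weight of a triple = product of the frequencies of two trigraphs whose
    difference is that triple. Assumption: if several pairs give the same
    triple, the largest product is kept."""
    weights: dict[Triple, int] = {}
    for (t1, f1), (t2, f2) in product(freqs.items(), repeat=2):
        d = trigraph_difference(t1, t2)
        weights[d] = max(weights.get(d, 0), f1 * f2)
    return weights


def z_score(msg1: str, msg2: str, weights: dict[Triple, int]) -> float:
    """Add the weights of the interval triples at every position of the
    superimposition, multiply by 100 and divide by twice the overlap length."""
    n = min(len(msg1), len(msg2))
    if n < 3:
        raise ValueError("need at least three superimposed letters")
    total = sum(
        weights.get(trigraph_difference(msg1[i:i + 3], msg2[i:i + 3]), 0)
        for i in range(n - 2)
    )
    return total * 100 / (2 * n)


if __name__ == "__main__":
    # Three weights quoted in the example table; a real table has many more.
    example_weights = {(13, 16, 14): 480, (16, 14, 14): 672, (14, 14, 16): 234}
    print(trigraph_difference("NGZ", "ZSH"))  # (14, 14, 18)
    print(z_score("RWRAS", "EGDMC", example_weights))  # 13860.0
```

The five-letter call only shows the arithmetic. A meaningful score needs the full weight table and a long superimposition.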
The final result then gives us a point of departure for telling whether the messages being investigated are in depth or not. Experience has shown that values of 50,000 or more make depth probable. Values under 40,000 speak against depth, and with values between 40,000 and 50,000 either possibility must be reckoned with.

c) Example
----------

Note: There is no complete example in the TICOM document, so I tried to reconstruct one. I created a text of 10,000 characters from a Project Gutenberg text and counted the frequencies of its 300 most frequent trigraphs. Then I calculated the Zahlentripel table. Here is an excerpt of this table:

=========================
! zahlentripel ! weight !
=========================
! 13-16-14     !    480 !
! ....         !    ... !
! 14-14-16     !    234 !
! 14-14-17     !    660 !
! 14-14-18     !   1430 !
! 14-14-19     !    742 !
! ....         !    ... !
! 16-14-14     !    672 !
! ....         !    ... !
=========================

For example, calculation of the 14-14-18 entry:
- Frequency of the "NGZ" trigraph: 55
- Frequency of the "ZSH" trigraph: 26
- Difference between NGZ and ZSH: 14-14-18 (with A = 0 ... Z = 25: N - Z = 13 - 25 = 14, G - S = 6 - 18 = 14, Z - H = 25 - 7 = 18, all mod 26)
- Weight of this difference: 55 * 26 = 1430

Note: we store only the highest values.

Here are three messages (the indicators have been removed). Are they in depth?
Msg1:
RWRAS XQDUE DVDSO ANZQC HAKPQ IZFIG DDNNA SMTVY SHWYC RGXSU
UVPFE KHMEZ PNKMU HBZAY VJYST GFBSN OGFHV DJBFV PBWCR EXEBP
UTBJB HTQEN BFEQW GUXKB OVPTH WMMDQ TBMZK WRRHY LORIA SQLRT
SSIBZ ASZJT EPKBN ZRWLL MRBBG MGUHM IARWY NCCQB BSXND GVSLY
JIFGR CSWLQ UEYYG PWFYD BMSIF LOVPR GRCXX EUFDN JLLDN HKGMD
PKRUJ CKGFZ OKCIT XWKXS YHTKV LAQZU MIDVR DHJGM

Msg2:
EGDMC PIBPC MSQKN JMISB KVPJE YKFUG XXKFL MRHHQ SKXFO GZSNM
UHHOQ VSMNV JEQOY TNWWQ RGLFF XKVSW GSKQX EWBXR OSAMA VBSMH
GBOES RTZXX XJLYX CHILK OQGIN YYSWU BTBGE ZAHSQ GFLEB WVDMN
GXXMZ MBIRA DVPIV FOBAJ MLKTA PLYKH YPSBQ LCZCF PLXYF YBSHW
ZSTTI LGPYG FGHJM ZKGGD BJNGO JTTAR WRFSK NOGDR AYCYL UIIQS
ULLTN WKPBO XTFMR PGEDX GXDNW GZNNU CUMBG SDWRM

Msg3:
JYIHY HAWBW OXWMM GGGOD LYQNM VIHLX REETK YNWOK NXJZY HFMVY
KXZNE UHCBZ HASWE AAVNI QRQMH YZDXK RODAB QJKPM DPXJM BGKVZ
UWEUG FHYMU GMLGV DZKAZ QMIJG BPAMJ ITRDR PIEKI QPNSO BFYRW
NNEBR PGCPG KFQIU TDNYI PIPMY MWCJI GKODX MZYUK LGXFI WMLQH
CGGBE HSUUU PQHER DAFOV CJZKV JJELY MLJLA ZLFRY JFZXB JKKAI
FPFJJ GPKCQ RIZQJ SIFVJ AWNOX XEWVQ FOMEA FGZUD

Beginning of the calculation with the first two messages, sliding a three-letter window along both of them:

Msg1: RWRAS ... DHJGM
Msg2: EGDMC ... SDWRM

RWR over EGD => difference (RWR - EGD) = 13-16-14, weight = 480
WRA over GDM => difference (WRA - GDM) = 16-14-14, weight = 672
RAS over DMC => difference (RAS - DMC) = 14-14-16, weight = 234
...
DHJ over SDW => difference (DHJ - SDW) = 11-04-13, weight = 756
HJG over DWR => difference (HJG - DWR) = 04-13-15, weight = 935
JGM over WRM => difference (JGM - WRM) = 13-15-00, weight = 4611

Sum = 480 + 672 + 234 + ... + 756 + 935 + 4611 = 392503

Z = (Sum * 100) / (2 * length)
  = (392503 * 100) / (2 * 290)
  = 67,673

where length is the number of superimposed letters (290 here).

Final results (with 290 letters in superposition):
- Messages 1 and 2: 67,673 (they are indeed in depth)
- Messages 1 and 3: 43,332
- Messages 2 and 3: 44,822

If I take only the first 70 letters:
- Messages 1 and 2: 75,419 (they are indeed in depth)
- Messages 1 and 3: 32,748
- Messages 2 and 3: 37,257

Finally, if I take only the first 50 letters:
- Messages 1 and 2: 77,296 (they are indeed in depth)
- Messages 1 and 3: 41,679
- Messages 2 and 3: 30,440

Remark: The German cryptanalysts did not transform the values with a logarithm. Normally, when probabilities are multiplied, their logarithms are added.

2. Other Criteria
=================

a) Number of Doublets
---------------------

The doublet frequency of two English clear texts (the proportion of positions where both texts show the same letter) is about 8% when a word separator letter is used. In the cipher text of messages in depth, however, doublets occur only where there are also doublets at these positions in the underlying clear texts, since both messages are enciphered with the same key. A doublet frequency of between 6 and 8 percent in our cipher text therefore signifies depth. This doublet criterion is not valid for messages whose clear texts are offset by one...

b) Different Length and Different Encipherment Times
----------------------------------------------------

With the same length and the same encipherment time there is always the suspicion that both messages contain the same text, sent once incorrectly and once correctly. Here the trigraph differences give us a starting point for deciding on depth.

3. Other methods not used by German cryptanalysts
=================================================

a) IC
-----

The IC (Index of Coincidence), a technique invented by W. F. Friedman, consists of putting two texts side by side and counting the number of positions where the same letter appears in both texts. This count is then divided by the length of the superimposition.
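The doublet count of section 2a and the IC defined here both reduce to counting the positions where the two superimposed texts carry the same letter. The following is a minimal Python sketch. It assumes the messages are given as upper-case five-letter groups (spaces are stripped) and that the superimposition is as long as the shorter message. The function names and the `RANDOM_RATE` constant are hypothetical.

```python
RANDOM_RATE = 1 / 26  # expected coincidence rate of unrelated texts (about 0.0385)


def coincidences(text1: str, text2: str) -> tuple[int, int]:
    """Return (M, N): the number of positions where the two superimposed texts
    show the same letter, and the length N of the superimposition."""
    a = "".join(text1.split())  # drop the spaces between five-letter groups
    b = "".join(text2.split())
    n = min(len(a), len(b))
    m = sum(1 for x, y in zip(a, b) if x == y)  # zip stops at the shorter text
    return m, n


def coincidence_rate(text1: str, text2: str) -> float:
    """Doublet frequency (section 2a) or IC of two superimposed texts: M / N."""
    m, n = coincidences(text1, text2)
    if n == 0:
        raise ValueError("the texts do not overlap")
    return m / n


if __name__ == "__main__":
    rate = coincidence_rate("RWRAS XQDUE", "EGDMC PIBPC")
    print(f"rate = {rate:.3f}, random = {RANDOM_RATE:.4f}")
```

A rate well above `RANDOM_RATE` over a long superimposition points towards depth. The same `(M, N)` pair is also the input of the Bayes factor used by Banburismus.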
If the texts are in depth, the IC is characteristic of the language. If they are not in depth, the IC is close to 1/26, about 0.0385.

IC:
- Messages 1 and 2: 0.076
- Messages 1 and 3: 0.051
- Messages 2 and 3: 0.028

Remark: The Germans did use the IC, but to find depths in code problems, not in M-209 problems. They used Hollerith tabulating machines to do this.

b) Banburismus
--------------

Banburismus, invented by Alan Turing, uses a mathematical measure to decide whether two messages are in depth. This measure is more accurate than the IC.

The Bayes factor for M monogram coincidences in an overlap of N letters is:

BF = ((P ** M) * ((1-P) ** (N-M))) / (((1/26) ** M) * ((25/26) ** (N-M)))

Expressed as a score: BFdB = int(20 * log(BF)). (For reference, the deciban is defined as 10 * log10(BF), and 20 * log10(BF) gives a score in half-decibans.)

where:
- P is the probability that two characters at the same position match (it is equivalent to the IC of the language). For German, P = 0.074 is used (for comparison, the fraction 1/17 is only about 0.059).
- (1 - P) is the complementary probability (probability of no match)
- M is the number of coincidences
- N is the length of the superimposition

Coinciding bigrams and trigrams are counted as a bonus.

Final BFdB results (with 290 letters in superposition):
- Messages 1 and 2: 82, with 1 bigram (they are indeed in depth)
- Messages 1 and 3: 0
- Messages 2 and 3: -84
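The Bayes factor above can be evaluated with a minimal Python sketch. It uses P = 0.074 as the default and works with logarithms so that the powers do not underflow for long overlaps. The sketch reports the score in decibans with the standard definition 10 * log10(BF), which may not be the scale used for the BFdB results listed above. The bigram and trigram bonus is not modeled. The function names are hypothetical.

```python
import math


def log_bayes_factor(m: int, n: int, p: float = 0.074) -> float:
    """Natural logarithm of
    BF = P^M (1-P)^(N-M) / ((1/26)^M (25/26)^(N-M)).

    Summing logarithms avoids underflow of the powers for long overlaps.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= M <= N")
    return m * math.log(p * 26) + (n - m) * math.log((1 - p) * 26 / 25)


def bayes_factor(m: int, n: int, p: float = 0.074) -> float:
    """BF itself: how much more likely M coincidences are in depth than at random."""
    return math.exp(log_bayes_factor(m, n, p))


def decibans(m: int, n: int, p: float = 0.074) -> float:
    """Weight of evidence in decibans: 10 * log10(BF)."""
    return 10 * log_bayes_factor(m, n, p) / math.log(10)


if __name__ == "__main__":
    # Illustrative counts only: M coincidences in an overlap of N = 290 letters.
    for m in (8, 15, 22):
        print(m, round(decibans(m, 290), 1))
```

A positive score favors depth and a negative score favors random superimposition. With P = 1/26 every overlap scores zero, because matches are then equally likely under both hypotheses.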