IS A DEPTH?
Recovering plain text from messages in depth is very time consuming. Before
undertaking this task, we must be sure that the messages really are in depth.
Indeed, many messages with similar indicator groups are not genuine messages
in depth at all, but messages with the same text that were enciphered
twice in succession because a mistake was made in the setting of the message
during the first encipherment. The addressee could not decipher the message
and sent a query back, and the message was then sent again
with the correct encipherment. Often, it is not possible to decide in advance
whether these messages are in depth or not.
In the TICOM DF-120 document, German cryptanalysts describe several methods
that can help to decide whether two messages are in depth or not.
Note: Most of the following text is an excerpt from the TICOM DF-120 document.
1. Trigraph Differences
From solved M-209 messages from Africa, out of 10,000 letters the 300
most frequent clear text trigraphs were counted (taking into account the
separator letter "Z") and the clear text found correspond to the negative
cipher intervals in messages in depth. The triple number (Zahlentripel)
which arose was weighted with values built up simply from the product of the
frequencies of their occurrences, since the probability that two definite
clear text trigraphs will appear over each other is equal to the product
of their percentage frequencies of occurrence.
From these differences a table was prepared which indicates for us the
corresponding weight for every possible zahlentripel.
b) The Method
One now finds each interval triple which appears in the two cipher messages,
in the upper as well as in the lower message, and adds up the weights which
apply to them.
The sum is then multiplied by 100 and divided by twice the number of positions
being investigated. ...
The final result then gives us a point of departure for telling whether the
messages being investigated are in depth or not. Experience has shown that
values of 50,000 or more make depth probable. Values under 40,000 speak
against depth, and with numbers in between 40,000 and 50,000 one can count
on either possibility.
Note: There is no complete example in the TICOM document, so
I tried to reconstruct one.
I created a text of 10,000 characters from a Project Gutenberg text.
I calculated the frequencies of the 300 most frequent trigraphs.
Then I calculated the zahlentripel table.
Here is an excerpt of this table:
! zahlentripel ! weight !
! 13-16-14 ! 480 !
! .... ! ... !
! 14-14-16 ! 234 !
! 14-14-17 ! 660 !
! 14-14-18 ! 1430 !
! 14-14-19 ! 742 !
! .... ! ... !
! 16-14-14 ! 672 !
! .... ! ... !
For example, calculation of the 14-14-18 entry:
- Frequency of the "NGZ" trigram: 55
- Frequency of the "ZSH" trigram: 26
- Difference between NGZ and ZSH: 14-14-18
- Weight of this difference: 55*26 = 1430
Note: we store only the highest values.
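The construction of this table can be sketched in Python as follows. This is a minimal sketch, not the program actually used for this page: the function names are hypothetical, letters are numbered A=0 to Z=25, a difference is taken letter by letter as (upper - lower) mod 26 (the convention that reproduces the 14-14-18 entry), and the note above is read as keeping the largest product when several trigraph pairs give the same triple.

```python
from collections import Counter
from itertools import product


def trigraph_counts(clear_text, top=300):
    """Return the `top` most frequent trigraphs of an A-Z clear text."""
    counts = Counter(clear_text[i:i + 3] for i in range(len(clear_text) - 2))
    return dict(counts.most_common(top))


def difference(upper, lower):
    """Letter-wise difference (upper - lower) mod 26, with A=0 ... Z=25."""
    return tuple((ord(u) - ord(l)) % 26 for u, l in zip(upper, lower))


def weight_table(counts):
    """Map each interval triple to a product of two trigraph frequencies.

    Assumption: when several trigraph pairs give the same triple,
    only the largest product is kept.
    """
    table = {}
    for (t1, f1), (t2, f2) in product(counts.items(), repeat=2):
        triple = difference(t1, t2)
        table[triple] = max(table.get(triple, 0), f1 * f2)
    return table


# The two trigraphs of the worked example above.
counts = {"NGZ": 55, "ZSH": 26}
table = weight_table(counts)
print(difference("NGZ", "ZSH"))  # (14, 14, 18)
print(table[(14, 14, 18)])       # 1430
```

Because every ordered pair of trigraphs is visited, the table also contains the opposite triple (here 12-12-08) with the same weight, so the order in which the two messages are superimposed does not matter.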
Here are three messages (we have removed the indicators). Are they in depth?
Message 1:
RWRAS XQDUE DVDSO ANZQC HAKPQ IZFIG DDNNA SMTVY
SHWYC RGXSU UVPFE KHMEZ PNKMU HBZAY VJYST GFBSN OGFHV DJBFV
PBWCR EXEBP UTBJB HTQEN BFEQW GUXKB OVPTH WMMDQ TBMZK WRRHY
LORIA SQLRT SSIBZ ASZJT EPKBN ZRWLL MRBBG MGUHM IARWY NCCQB
BSXND GVSLY JIFGR CSWLQ UEYYG PWFYD BMSIF LOVPR GRCXX EUFDN
JLLDN HKGMD PKRUJ CKGFZ OKCIT XWKXS YHTKV LAQZU MIDVR DHJGM
Message 2:
EGDMC PIBPC MSQKN JMISB KVPJE YKFUG XXKFL MRHHQ
SKXFO GZSNM UHHOQ VSMNV JEQOY TNWWQ RGLFF XKVSW GSKQX EWBXR
OSAMA VBSMH GBOES RTZXX XJLYX CHILK OQGIN YYSWU BTBGE ZAHSQ
GFLEB WVDMN GXXMZ MBIRA DVPIV FOBAJ MLKTA PLYKH YPSBQ LCZCF
PLXYF YBSHW ZSTTI LGPYG FGHJM ZKGGD BJNGO JTTAR WRFSK NOGDR
AYCYL UIIQS ULLTN WKPBO XTFMR PGEDX GXDNW GZNNU CUMBG SDWRM
Message 3:
JYIHY HAWBW OXWMM GGGOD LYQNM VIHLX REETK YNWOK
NXJZY HFMVY KXZNE UHCBZ HASWE AAVNI QRQMH YZDXK RODAB QJKPM
DPXJM BGKVZ UWEUG FHYMU GMLGV DZKAZ QMIJG BPAMJ ITRDR PIEKI
QPNSO BFYRW NNEBR PGCPG KFQIU TDNYI PIPMY MWCJI GKODX MZYUK
LGXFI WMLQH CGGBE HSUUU PQHER DAFOV CJZKV JJELY MLJLA ZLFRY
JFZXB JKKAI FPFJJ GPKCQ RIZQJ SIFVJ AWNOX XEWVQ FOMEA FGZUD
Beginning of the calculation for the first two messages:
EGD => Difference (RWR - EGD) = 13-16-14, weight = 480
GDM => Difference (WRA - GDM) = 16-14-14, weight = 672
DMC => Difference (RAS - DMC) = 14-14-16, weight = 234
...
SDW => Difference (DHJ - SDW) = 11-04-13, weight = 756
DWR => Difference (HJG - DWR) = 04-13-15, weight = 935
WRM => Difference (JGM - WRM) = 13-15-00, weight = 4611
Sum = (480 + 672 + 234 + .... + 756 + 935 + 4611)
Z = (Sum * 100) / (2 * length) = (392503 * 100) / (2*290) = 67,673
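The scoring step can be sketched in the same way. The sketch assumes one interval triple per position, computed as (upper - lower) mod 26, a weight of 0 for triples absent from the table, and normalisation by twice the number of superimposed letters, as in the formula above. The names interval_triples, depth_score and verdict are hypothetical, and the three weights are only the entries from the table excerpt, so the first printed value is not a real score.

```python
def interval_triples(upper, lower):
    """Yield the triple (upper - lower) mod 26 at each trigraph position."""
    n = min(len(upper), len(lower))
    for i in range(n - 2):
        yield tuple((ord(upper[i + k]) - ord(lower[i + k])) % 26
                    for k in range(3))


def depth_score(upper, lower, weights):
    """Sum of the weights, times 100, divided by twice the overlap length."""
    upper = upper.replace(" ", "")
    lower = lower.replace(" ", "")
    n = min(len(upper), len(lower))
    total = sum(weights.get(t, 0) for t in interval_triples(upper, lower))
    return total * 100 / (2 * n)


def verdict(z):
    """Thresholds quoted from the TICOM text: 50,000 and 40,000."""
    if z >= 50_000:
        return "depth probable"
    if z < 40_000:
        return "depth unlikely"
    return "undecided"


# Only the three weights shown in the table excerpt (toy table).
weights = {(13, 16, 14): 480, (16, 14, 14): 672, (14, 14, 16): 234}
print(depth_score("RWRAS", "EGDMC", weights))  # 13860.0
print(verdict(67_673))                         # depth probable
```

With the full weight table and the full 290-letter messages, the same function would produce the kind of value shown in the formula above.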
Final results (with 290 letters in superposition):
- Messages 1 and 2: 67,673 (they are indeed in depth)
- Messages 1 and 3: 43,332
- Messages 2 and 3: 44,822
But if I only take the first 70 letters:
- Messages 1 and 2: 75,419 (they are indeed in depth)
- Messages 1 and 3: 32,748
- Messages 2 and 3: 37,257
Finally, if I only take the first 50 letters:
- Messages 1 and 2: 77,296 (they are indeed in depth)
- Messages 1 and 3: 41,679
- Messages 2 and 3: 30,440
Remark: The German cryptanalysts did not transform the values with a logarithm
function. Normally, when probabilities are multiplied, their logarithms are added.
2. Other Criteria
a) Number of Doublets
The doublet frequency of two English clear texts when a word separator letter
is used is about 8%. In the cipher text of messages in depth, however, doublets
occur only when there also are doublets at these positions in the basic clear
texts. A doublet frequency of between 6 and 8 percent in our cipher text,
therefore, signifies depth.
This doublet criterion is not valid for messages which are offset by one in
their clear text...
b) Different Length and Different Encipherment Times
With the same length and the same encipherment time there is always the
suspicion that we are dealing with the same text in both messages and that
it was only sent one time wrong and one time correctly. Here we obtain, by
means of our trigraph differences, a starting point for depth.
3. Other methods not used by German cryptanalysts
IC (Index of Coincidence) is a technique invented by W.F. Friedman: two texts
are put side by side, the number of times that identical letters appear in the
same position in both texts is counted, and this count is divided by the length
of the overlap. If the texts are in depth, the IC is characteristic of the
language. If the texts are not in depth, the IC is close to 0.0385 (1/26).
For the three messages above:
- Message 1 & 2: 0.076
- Message 1 & 3: 0.051
- Message 2 & 3: 0.028
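Counting coincidences is a short loop. This is a minimal sketch, assuming both texts are already aligned and stripped of their indicators; coincidence_rate is a hypothetical name and the two short strings are made-up test data, not real traffic.

```python
def coincidence_rate(text_a, text_b):
    """Fraction of positions where both texts carry the same letter."""
    a = text_a.replace(" ", "")
    b = text_b.replace(" ", "")
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("texts do not overlap")
    # zip stops at the shorter text, so only the overlap is compared.
    matches = sum(x == y for x, y in zip(a, b))
    return matches / n


# Made-up data: 5 matching positions out of 10.
print(coincidence_rate("ABCDE FGHIJ", "ABXYE FGZZZ"))  # 0.5
```

For two unrelated cipher texts the result should hover around 1/26, and for texts in depth around the value typical of the clear-text language.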
Remark: The Germans did use the IC measure, but to find depths in code problems,
not in M-209 problems. They used Hollerith tabulating machines for this.
Banburismus uses a mathematical measure to decide whether two messages are in depth.
This measure is more accurate than the IC. It was invented by Alan Turing.
The Bayes Factor for M monograms in an overlap of N is therefore:
BF = ((P ** M) * ((1-P)**(N-M))) / (((1./26.)**M)*((25./26.)**(N-M)))
In decibans:
BFdB = int(20 * log(BF))
P is the probability of single character matching (it is equivalent to IC [Index of Coincidence]).
In German, P = 0.0762
In English, P = 0.0667
In French, P = 0.0778
In Spanish, P = 0.0770
(1 - P) is the complementary probability (probability of no match)
M is the number of coincidences
N is the length of superimposition
The presence of bigrams and trigrams is counted as a bonus.
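The Bayes factor is easier to evaluate in log space, since it is a product of many small powers. The following is a minimal sketch of the formula above, without the bigram and trigram bonus. The function names are hypothetical. The text does not say which logarithm "log" denotes, so the base is left as a parameter; the default of 10 is an assumption, and the factor 20 is simply taken from the formula above.

```python
import math

# Match probabilities listed above.
P_MATCH = {"German": 0.0762, "English": 0.0667,
           "French": 0.0778, "Spanish": 0.0770}


def log_bayes_factor(m, n, p):
    """Natural log of BF = P^M (1-P)^(N-M) / ((1/26)^M (25/26)^(N-M))."""
    return m * math.log(p * 26) + (n - m) * math.log((1 - p) * 26 / 25)


def bf_db(m, n, p, base=10):
    """int(20 * log(BF)); the base of the logarithm is an assumption."""
    return int(20 * log_bayes_factor(m, n, p) / math.log(base))


# m coincidences in an overlap of n letters, English clear text assumed.
print(bf_db(10, 100, P_MATCH["English"]))
```

A positive result speaks for depth and a negative one against it; the more coincidences M for a given overlap N, the larger the value.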
Final BFdB results (with 290 letters in superposition):
- Messages 1 and 2: 82 with 1 bigram (they are indeed in depth)
- Messages 1 and 3: 0
- Messages 2 and 3: -84