IS A DEPTH?
===========

Introduction
============
Recovering plain text from messages in depth is very time-consuming. Before
undertaking this task, we must be sure that the messages really are in depth.

Indeed, many messages with similar indicator groups are not genuine messages
in depth at all, but messages with the same text that were enciphered
twice in succession because a mistake was made in the setting of the message
during the first encipherment. The intended addressee could not decipher the
message and sent back a query, and the message was then sent again with the
correct encipherment. Often, it is not possible to decide in advance
whether such messages are in depth or not.

In the TICOM DF-120 document, German cryptanalysts describe several methods
that can help to decide whether two messages are in depth or not.

Note: Most of the following text is an excerpt from the TICOM DF-120 document.

1. Trigraph Differences
=======================
a) Statistics
-------------
From solved M-209 messages from Africa, the 300 most frequent clear text
trigraphs were counted out of 10,000 letters (taking into account the
separator letter "Z"), and the clear text intervals found correspond to the
negative cipher intervals in messages in depth. The triple number (Zahlentripel)
which arose was weighted with values built up simply from the product of the
frequencies of their occurrences, since the probability that two definite
clear text trigraphs will appear over each other is equal to the product
of their percentage frequencies of occurrence.

From these differences a table was prepared which gives the corresponding
weight for every possible Zahlentripel.
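
The construction of the table can be sketched as follows. This is only a
minimal illustration, not the procedure of the TICOM document: the function
names are hypothetical, letters are numbered A=0 ... Z=25, and keeping the
largest weight when two trigraph pairs give the same Zahlentripel is an
assumption. The two counts used at the end are the ones quoted in the example
of section c).

```python
from itertools import product


def trigraph_difference(upper, lower):
    """Letter-by-letter difference (upper - lower) modulo 26, with A=0 ... Z=25."""
    return tuple((ord(u) - ord(l)) % 26 for u, l in zip(upper, lower))


def build_weight_table(trigraph_counts):
    """Map each Zahlentripel to the product of the two trigraph frequencies."""
    table = {}
    # Every ordered pair of trigraphs, including a trigraph over itself.
    for (t1, f1), (t2, f2) in product(trigraph_counts.items(), repeat=2):
        diff = trigraph_difference(t1, t2)
        # Assumption: when several pairs give the same difference,
        # only the largest weight is kept.
        table[diff] = max(table.get(diff, 0), f1 * f2)
    return table


# Toy input: two trigraph counts quoted in the example of this document.
counts = {"NGZ": 55, "ZSH": 26}
table = build_weight_table(counts)
print(table[(14, 14, 18)])  # weight of NGZ - ZSH
```

Because every pair is taken in both orders, a difference and its negative
always receive the same weight, so the sign convention of the cipher
intervals does not change the lookup.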

b) The Method
-------------
One now finds each interval triple which appears in the two cipher messages,
in the upper as well as in the lower message, and adds up the weights which
apply to them.

The sum is then multiplied by 100 and divided by twice the number of positions
being investigated. ...

The final result then gives us a point of departure for telling whether the
messages being investigated are in depth or not. Experience has shown that
values of 50,000 or more make depth probable. Values under 40,000 speak
against depth, and with numbers between 40,000 and 50,000 one must reckon
with either possibility.
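
The scoring rule and the thresholds above can be sketched as follows. This is
a minimal illustration under stated assumptions: the names `clean`,
`depth_score` and `verdict` are hypothetical, the differences are taken as
upper message minus lower message modulo 26 (A=0 ... Z=25), a triple missing
from the table contributes nothing, and the "number of positions" is taken to
be the number of letters in superposition. The toy weights are three entries
of the table excerpt given in section c).

```python
def clean(text):
    """Keep only the letters A-Z in upper case (drops spaces and line breaks)."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def depth_score(upper, lower, weights):
    """Sum of the weights of all interval triples, times 100, over 2 * positions."""
    upper, lower = clean(upper), clean(lower)
    n = min(len(upper), len(lower))  # letters in superposition
    if n < 3:
        raise ValueError("need at least three letters in superposition")
    total = 0
    for i in range(n - 2):
        diff = tuple((ord(upper[i + k]) - ord(lower[i + k])) % 26 for k in range(3))
        total += weights.get(diff, 0)  # unknown triples add nothing (assumption)
    return total * 100 / (2 * n)


def verdict(score):
    """Apply the thresholds quoted in the text."""
    if score >= 50_000:
        return "depth probable"
    if score < 40_000:
        return "depth unlikely"
    return "undecided"


# Toy table: three entries from the excerpt in section c).
toy_weights = {(13, 16, 14): 480, (16, 14, 14): 672, (14, 14, 16): 234}
print(depth_score("RWRAS", "EGDMC", toy_weights))
print(verdict(67_673))
```

With only five letters the score is of course meaningless; the snippet only
shows the mechanics of the sum and of the normalisation.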

c) Example
----------
Note: There is no complete example in the TICOM document, so I tried to
reconstruct one.

I created a text of 10,000 characters from a Project Gutenberg text.
I counted the frequencies of the 300 most frequent trigraphs.
Then I calculated the Zahlentripel table.

Here is an excerpt of this table:
==========================
! zahlentripel ! weight  !
==========================
!   13-16-14   !   480   !
!   ....       !   ...   !
!   14-14-16   !   234   !
!   14-14-17   !   660   !
!   14-14-18   !  1430   !
!   14-14-19   !   742   !
!   ....       !   ...   !
!   16-14-14   !   672   !
!   ....       !   ...   !

For example, calculation of the 14-14-18 entry:
- Frequency of the "NGZ" trigram: 55
- Frequency of the "ZSH" trigram: 26
- Difference between NGZ and ZSH:  14-14-18
- Weight of this difference: 55*26 = 1430

Note: we store only the highest values.

Here are three messages (we have removed the indicators). Are they in depth? 

Msg1: 
            RWRAS XQDUE DVDSO ANZQC HAKPQ IZFIG DDNNA SMTVY
SHWYC RGXSU UVPFE KHMEZ PNKMU HBZAY VJYST GFBSN OGFHV DJBFV
PBWCR EXEBP UTBJB HTQEN BFEQW GUXKB OVPTH WMMDQ TBMZK WRRHY
LORIA SQLRT SSIBZ ASZJT EPKBN ZRWLL MRBBG MGUHM IARWY NCCQB
BSXND GVSLY JIFGR CSWLQ UEYYG PWFYD BMSIF LOVPR GRCXX EUFDN
JLLDN HKGMD PKRUJ CKGFZ OKCIT XWKXS YHTKV LAQZU MIDVR DHJGM 

Msg2:

            EGDMC PIBPC MSQKN JMISB KVPJE YKFUG XXKFL MRHHQ
SKXFO GZSNM UHHOQ VSMNV JEQOY TNWWQ RGLFF XKVSW GSKQX EWBXR
OSAMA VBSMH GBOES RTZXX XJLYX CHILK OQGIN YYSWU BTBGE ZAHSQ
GFLEB WVDMN GXXMZ MBIRA DVPIV FOBAJ MLKTA PLYKH YPSBQ LCZCF
PLXYF YBSHW ZSTTI LGPYG FGHJM ZKGGD BJNGO JTTAR WRFSK NOGDR
AYCYL UIIQS ULLTN WKPBO XTFMR PGEDX GXDNW GZNNU CUMBG SDWRM 


Msg3:
            JYIHY HAWBW OXWMM GGGOD LYQNM VIHLX REETK YNWOK
NXJZY HFMVY KXZNE UHCBZ HASWE AAVNI QRQMH YZDXK RODAB QJKPM
DPXJM BGKVZ UWEUG FHYMU GMLGV DZKAZ QMIJG BPAMJ ITRDR PIEKI
QPNSO BFYRW NNEBR PGCPG KFQIU TDNYI PIPMY MWCJI GKODX MZYUK
LGXFI WMLQH CGGBE HSUUU PQHER DAFOV CJZKV JJELY MLJLA ZLFRY
JFZXB JKKAI FPFJJ GPKCQ RIZQJ SIFVJ AWNOX XEWVQ FOMEA FGZUD


Beginning of the calculation for the first two messages:


RWRAS....DHJGM
RWR      DHJ 
 WRA      HJG
  RAS      JGM

EGDMC....SDWRM 
EGD    => Difference (RWR - EGD) = 13-16-14, weight = 480
 GDM    => Difference (WRA - GDM) = 16-14-14, weight = 672
  DMC    => Difference (RAS - DMC) = 14-14-16, weight = 234
SDW    => Difference (DHJ - SDW) = 11-04-13, weight = 756
 DWR    => Difference (HJG - DWR) = 04-13-15, weight = 935
  WRM    => Difference (JGM - WRM) = 13-15-00, weight = 4611
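
The six differences listed above can be checked with a few lines of code.
This is only a sketch: the function name is hypothetical, and the convention
(upper message minus lower message, modulo 26, with A=0 ... Z=25) is the one
that reproduces the figures of this example.

```python
def trigraph_differences(upper, lower):
    """List the interval triples (upper - lower, modulo 26) as 'dd-dd-dd' strings."""
    n = min(len(upper), len(lower))
    triples = []
    for i in range(n - 2):
        diffs = [(ord(upper[i + k]) - ord(lower[i + k])) % 26 for k in range(3)]
        triples.append("-".join(f"{d:02d}" for d in diffs))
    return triples


# First five and last five letters of Msg1 and Msg2.
print(trigraph_differences("RWRAS", "EGDMC"))  # ['13-16-14', '16-14-14', '14-14-16']
print(trigraph_differences("DHJGM", "SDWRM"))  # ['11-04-13', '04-13-15', '13-15-00']
```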

Sum = (480 + 672 + 234 + .... + 756 + 935 + 4611)

Z = (Sum * 100) / (2 * length) = (392503 * 100) / (2*290) = 67,673

Final results (with 290 letters in superposition):
- Messages 1 and 2:  67,673  (they are indeed in depth)
- Messages 1 and 3:  43,332
- Messages 2 and 3:  44,822

But if I only take the first 70 letters:
- Messages 1 and 2:  75,419  (they are indeed in depth)
- Messages 1 and 3:  32,748
- Messages 2 and 3:  37,257

Finally, if I only take the first 50 letters:
- Messages 1 and 2:  77,296  (they are indeed in depth)
- Messages 1 and 3:  41,679
- Messages 2 and 3:  30,440


Remark: The German cryptanalysts did not transform the values with a
logarithm function. Normally, when probabilities are multiplied, their
logarithms are added instead.
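
The identity behind this remark can be shown on one table entry. This is only
an illustration with the two counts quoted in the example above (NGZ: 55,
ZSH: 26); the variable names are hypothetical, and the choice of base 10 is
arbitrary.

```python
import math

f_ngz, f_zsh = 55, 26  # counts quoted in the example above

product_weight = f_ngz * f_zsh                     # weight as used in the table
log_weight = math.log10(f_ngz) + math.log10(f_zsh)  # same quantity on a log scale

# Adding the logarithms is equivalent to multiplying the frequencies.
print(product_weight, round(10 ** log_weight))
```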


2. Other Criteria
=================
a) Number of Doublets
---------------------
The doublet frequency of two English clear texts when a word separator letter
is used is about 8%. In the cipher text of messages in depth, however, doublets
occur only when there also are doublets at these positions in the basic clear
texts. A doublet frequency of between 6 and 8 percent in our cipher text, 
therefore, signifies depth.

This doublet criterion is not valid for messages which are offset by one in
their clear text...


b) Different Length and Different Encipherment Times
----------------------------------------------------
With the same length and the same encipherment time there is always the
suspicion that we are dealing with the same text in both messages and that
it was only sent once wrongly and once correctly. Here we obtain, by
means of our trigraph differences, a starting point for depth.


3. Other methods not used by German cryptanalysts
=================================================
a) IC
-----
The IC (Index of Coincidence) is a technique invented by W.F. Friedman: two
texts are put side by side and one counts the number of times that identical
letters appear in the same position in both texts, relative to the number of
positions compared. If the texts are in depth, the IC is characteristic of
the language. If the texts are not in depth, the IC is close to 0.0385 (1/26).

IC:
- Message 1 & 2:  0.076
- Message 1 & 3:  0.051
- Message 2 & 3:  0.028
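
The measure can be sketched as follows. This is a minimal illustration, with
a hypothetical function name, in which the IC of two superimposed texts is
taken to be the number of positions holding the same letter divided by the
number of positions compared. The strings at the end are toy data, not the
messages above.

```python
def coincidence_rate(upper, lower):
    """Fraction of positions where both texts carry the same letter."""
    n = min(len(upper), len(lower))
    if n == 0:
        raise ValueError("no letters in superposition")
    # zip() stops at the end of the shorter text, i.e. after n positions.
    matches = sum(1 for a, b in zip(upper, lower) if a == b)
    return matches / n


# Toy data: 2 matching positions out of 4.
print(coincidence_rate("ABCD", "ABXY"))  # 0.5
```

For random superpositions of 26 equally likely letters the expected value is
1/26, which is the 0.0385 mentioned above.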

Remark: The Germans did use the IC measure, but to find depths in code
problems, not in M-209 problems. They used Hollerith tabulating machines
for this.


b) Banburismus
--------------
Banburismus uses a mathematical measure to decide whether two messages are in
depth. This measure is more accurate than the IC. It was invented by Alan
Turing.

The Bayes Factor for M monogram coincidences in an overlap of N letters is:

BF = ((P ** M) * ((1-P)**(N-M)))  / (((1./26.)**M)*((25./26.)**(N-M)))
 
In decibans:
BFdB = int ( 20 * log (BF ))

Note: a deciban is 10 * log10(BF); with base-10 logarithms the factor 20
gives half-decibans.

P is the probability of single character matching (it is equivalent to IC [Index of Coincidence]).
In German,  P = 0.0762
In English, P = 0.0667
In French,  P = 0.0778
In Spanish, P = 0.0770

(1 - P) is the complementary probability (probability of no match)

M is the number of coincidences

N is the length of superimposition

The presence of matching bigrams and trigrams is considered a bonus.
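
The Bayes factor can be sketched in code as follows. This is a minimal
illustration with hypothetical function names. It works with logarithms to
avoid multiplying very small numbers, and it assumes base-10 logarithms; the
base used for the figures of this document is not stated, so the sketch is
not claimed to reproduce them. The bigram and trigram bonus is left out.

```python
import math


def log10_bayes_factor(m, n, p):
    """log10 of BF for m coincidences in an overlap of n letters."""
    # BF = (p**m * (1-p)**(n-m)) / ((1/26)**m * (25/26)**(n-m)),
    # rewritten term by term as a sum of logarithms.
    return m * math.log10(p * 26) + (n - m) * math.log10((1 - p) * 26 / 25)


def bayes_score(m, n, p, scale=20):
    """Scaled score, truncated to an integer as in the formula above."""
    # Assumption: base-10 logarithm; scale=20 follows the formula of the text.
    return int(scale * log10_bayes_factor(m, n, p))


# Toy values: 22 and 8 coincidences in 290 letters, English P = 0.0667.
print(bayes_score(22, 290, 0.0667))
print(bayes_score(8, 290, 0.0667))
```

A coincidence count above what chance would give yields a positive score, and
a count below it yields a negative one.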

Final BFdB results (with 290 letters in superposition):
- Messages 1 and 2:  82 with 1 bigram  (they are indeed in depth)
- Messages 1 and 3:   0
- Messages 2 and 3: -84