Cryptanalysis of Louis XIV's codebooks


Methods to break codebooks, introduction

In cryptology, there are basically two encryption methods:

  • Ciphers (which are divided into substitution and transposition)
  • Codebooks

The methods of cryptanalysis of these two types are very different.

Note: Because codebooks are derived from simple substitution, there are some analogies between breaking a code and breaking a substitution cipher.

To break a codebook, the heart of the method is the indexing of the codegroups and then the search for the meaning of each of the codegroups according to the context. This is why we begin this document by describing this method.

Then we describe the theoretical approach of Friedman, one of the most famous cryptanalysts in history. We continue with the method that Bazeries used to break a two-part codebook for the first time in history (or at least for the first time publicly). We continue with the English decipherments, part of which were kept secret for a long time. We end by describing the methods of the Spanish cryptanalysts, especially when they attacked the French codes of the 17th century.

The Heart of the Method: Codegroup Indexing

Description of the method

Yardley's book "The American Black Chamber" describes to us the heart of any method of cryptanalysis of a codebook:

Each cryptogram is transcribed onto a set of cards, one card per group in the set of cryptograms. Each card carries one codegroup, which serves as the index, together with the four groups that precede it and the four that follow it, in their order of appearance. At the end of the line, a reference gives the cryptogram number and the line where the group appears.

The cards are then sorted by index (the number of the codegroup). This gives the frequency of each group. The most frequent groups are studied first, then the others. As deductions are made, the codes preceding and following the index group are replaced by their plaintext value, which often makes it possible to deduce the meaning of the index group.

As you can see, this method has two major drawbacks:

  • It takes a lot of work. The use of a large team quickly becomes essential if the code is complex and includes many groups.
  • You need to know the meaning of a few codegroups before you can use this method (you have to prime the pump).

It can also be noted that this method easily gives us the frequency of the codegroups and the repetitions.

Note: Nowadays, a computer program simplifies the use of this method.

Example

To illustrate this method, we will take the messages that allowed Bazeries to break the 1691 codebook. Here is the beginning of the 2nd cryptogram:

Line 1:  480 367 029 270 479 163 167 128 024 207 229 234
Line 2:  508 099 094 469 289 260
...
Each of the following lines is reproduced on a card:
                <<480>> 367 029 270 479    2:1 
            480 <<367>> 029 270 479 163    2:1 
        480 367 <<029>> 270 479 163 167    2:1 
    480 367 029 <<270>> 479 163 167 128    2:1 
480 367 029 270 <<479>> 163 167 128 024    2:1 
...
024 207 229 234 <<508>> 099 094 469 289    2:2
Then, we classify the cards according to the indexes, for example:
034 282 255 097 <<508>> 405 024 484 097    1:17
026 125 454 022 <<508>> 556 119 573 502    1:33
... 
024 207 229 234 <<508>> 099 094 469 289    2:2
...
As the deciphering progresses, one of the previous cards becomes as follows:
026  125 454 022 <<508>> 556 119 573 502    1:33
vous ne  m   en  <<--->> y   e   z   rien
It is easy to deduce that the codegroup 508 corresponds to the letter “a”. Of course, in practice we look for confirmations elsewhere before accepting such a value.
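The card index above can be sketched in a few lines of Python. This is only an illustration of the indexing idea, not a reconstruction of any historical tool: the function name build_index and the data layout (a dictionary mapping a cryptogram number to its lines) are assumptions made for the example. Codegroups are kept as strings so that leading zeros survive.

```python
from collections import defaultdict

def build_index(cryptograms, context=4):
    """Return {codegroup: [(before, after, reference), ...]}.

    `cryptograms` maps a cryptogram number to its lines, each line being
    a list of codegroups (hypothetical layout chosen for this sketch).
    """
    index = defaultdict(list)
    for number, lines in cryptograms.items():
        # Flatten the cryptogram, remembering the line of every group,
        # so that the context can run across line boundaries.
        flat = [(group, line_no)
                for line_no, line in enumerate(lines, start=1)
                for group in line]
        groups = [group for group, _ in flat]
        for pos, (group, line_no) in enumerate(flat):
            before = groups[max(0, pos - context):pos]
            after = groups[pos + 1:pos + 1 + context]
            index[group].append((before, after, f"{number}:{line_no}"))
    return index

# Beginning of the 2nd cryptogram, as quoted above.
cryptograms = {
    2: ["480 367 029 270 479 163 167 128 024 207 229 234".split(),
        "508 099 094 469 289 260".split()],
}
index = build_index(cryptograms)

# One "card" per occurrence of the indexed group.
for before, after, ref in index["508"]:
    print(" ".join(before), "<<508>>", " ".join(after), "  ", ref)

# Sorting the cards by index also gives the frequency of each group.
frequency = {group: len(cards) for group, cards in index.items()}
```

Grouping the cards in a dictionary keyed by codegroup plays the role of sorting the cards by index: all the contexts of a given group end up side by side, and the length of each list is the frequency of the group.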

References

  • The American Black Chamber, by Herbert O. Yardley, Aegean Park Press. Unmodified reproduction of a book published in 1931.

Friedman's method

Friedman analyzed German codes during WW1. He wrote a report that describes the method to solve them. This method is made up of two parts: classification and identification.

The first phase consists of assigning each codegroup to a class: numbers, words and sentences, letters and syllables, punctuation marks, … The second phase is the identification of each codegroup. Here are the main classes:

  1. Numbers: These are very common, with the numbers 1 and 2 being the most numerous. They occur in connection with groups of the "Military Unit" class (army corps, division, battalion, regiment, ...). They appear in clusters of two, three or four groups, exceptionally more than four, and the arrangement of groups within these clusters varies greatly.
  2. Groups for spelling (letters and syllables): The purpose of these codegroups is to spell words that do not belong to the code. In practice, out of incompetence, cipher clerks often spell out words that do exist in the code.

    They are very common. They come in a chain. Unlike numbers, there are frequent repetitions of sequences of groups.

  3. Words and phrases …
  4. Punctuation marks …

In addition to the previous method, called "By First Principles", Friedman describes a faster method, "By Analogy". It consists of comparing unsolved cryptograms with messages sent in an already known codebook, in particular by analyzing the most frequent codegroups, by comparing the beginnings of messages and by comparing messages of the same length.

References

  • Solving German Codes in World War I, with an added special "Code Problem" for the student, by William F. Friedman, Aegean Park Press, 1977. This book was written by Friedman at the end of the 1914-1918 war when he was still in France.

The decryption made by Bazeries of the 1691 codebook

At the end of the 19th century, Commander Bazeries succeeded in deciphering encrypted letters dating from 1691. They contained correspondence between Louvois or Louis XIV and Marshal de Catinat who was fighting in the Dauphiné (see the 1691 codebook).

Bazeries, following his work, wrote a book, published in 1893, which relates his hypothesis on the identity of the Man in the iron mask. He bases his hypothesis on his decipherments. In the appendix to his book, he describes the method he followed to break Louis XIV's codebook of 1691.

Here are excerpts from Bazeries' book that describe his method:

Until now, codebooks such as that of Louis XIV had been considered absolutely indecipherable. Each codegroup having an arbitrary value, it did not seem possible to determine its value. Also the decryption of Commander Bazeries produced great emotion in the world of cryptologists.

In 1891, Commandant Gendron, of the general staff of the army, made a study of Catinat's campaigns. He found himself stopped in his research by ciphered dispatches. He communicated them to Major Bazeries, who, after examination, made a point of deciphering them. …

The work required by the reconstitution of the cipher went through three successive phases. The first consisted in the pointing or more exactly, in the recording of the codegroups used, to obtain the order of frequencies and the repetitions. … This work was long and monotonous …

It does not seem useful to give here, in extenso, the result of this recording of the groups. It will only be pointed out that the most used group was 22, occurring 187 times. Next were the groups:

      124 occurring 185 times,
       42 ---       184 times,
      341 ---       145 times,
      125 ---       127 times,
       24 ---       124 times,
      145 ---       122 times, etc.

It is clear that the words, syllables and letters recurring most frequently in the plaintext had to be sought among the most used codegroups. As for repetitions, i.e. groups occurring together 2 by 2, 3 by 3, 4 by 4, etc., it is very important to mark them specially, because these repetitions contain the same word or the same phrase. …

(second phase) … The starting point of any cipher reconstruction is always the search for a supposed word ("mot probable" or crib). Which word should be supposed? In this case, Major Bazeries worked on the word "ennemi" (enemy), which he rightly expected to encounter in the military correspondence submitted to him. "Ennemi" could be represented by a single group, in which case "les-ennemi-s" (the enemies) would give 3 groups frequently occurring together. "Ennemi" could also be spelled by syllables; then "les-en-ne-mi-s" formed 5 groups. After examination of the repetitions, it was concluded that "ennemi" was spelled by syllables, and it was conjectured that the groups below, often reproduced with a slight variation, enciphered "les ennemis":

      124.  22.  146.  46.  469.
      124.  22.  125.  46.  574.
      124.  22.  125.  46.  120
      124.  22.  125.  46.  584.
      124.  22.  125.  46.  345.
      les   en   ne    mi   s

The supposition of the words "les ennemis" and of the groups which reproduced them having been found to be correct, the whole codebook of Louis XIV was broken by the sole fact of this starting point. …

The third phase is the most attractive. The work consists, as soon as we have a part of the codebook, to find the other part. You don't need to be a consummate cryptologist for this; sagacity and straightness of reasoning suffice for this task. However, we must proceed with caution. An ill-determined group is a source of errors, the rectification of which may become impossible; before definitively adopting the value of each group, it must be proven. …

(Example)

… pour bien défendre la place 436. 291.  53. 154. 22. 465.
                              et   j’ay  ??? e    en  mesme
48.   52. 255. 556. 281. 115. 9. 450. 201. 578.
temps que lon  y    fi   st   re me   t    t
93. 34. 548. 358. 311. 272.  22. 412. 115. 525.
re  de  l    ???  pour qu’il y   en   eu   au
259.  503. 437. 424. 385. 129. 273.  42. 454. 583 …
moins pour u    n    a    n    et    de  m    y 

Of course, it is "ordonn" (order) and "argent" (money) that are encrypted by "53" and "358". There is no shadow of a doubt, for that is the necessary meaning of the sentence.

Remarks:

  1. As we can see, Bazeries is not aware of the decryptions carried out by the English.
  2. Bazeries insists on the caution that must be applied when identifying a codegroup. Why did he not use this caution to determine the meaning of codegroup 330 (cf. the Man in the iron mask)?
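The search for repetitions that Bazeries describes in his first and second phases lends itself to a short program. The sketch below counts the runs of consecutive codegroups that occur more than once. The function name repeated_sequences is an assumption made for the example, and treating each of the five quoted sequences as a separate fragment of text is a simplification: in the real dispatches they are scattered among other groups.

```python
from collections import Counter

def repeated_sequences(messages, length):
    """Return {sequence: count} for every run of `length` consecutive
    codegroups that occurs at least twice across `messages`.

    Runs never straddle two messages.
    """
    counts = Counter()
    for groups in messages:
        for i in range(len(groups) - length + 1):
            counts[tuple(groups[i:i + length])] += 1
    return {seq: n for seq, n in counts.items() if n > 1}

# The five sequences quoted in the excerpt, used here as stand-ins
# for fragments of the dispatches (illustrative only).
fragments = [
    [124, 22, 146, 46, 469],
    [124, 22, 125, 46, 574],
    [124, 22, 125, 46, 120],
    [124, 22, 125, 46, 584],
    [124, 22, 125, 46, 345],
]

print(repeated_sequences(fragments, 4))
# {(124, 22, 125, 46): 4}
```

A run of four groups that keeps coming back, followed by a group that changes each time, is the kind of repetition "with a slight variation" on which a probable word such as "les-en-ne-mi-s" can be tried.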

References

  • Le masque de Fer, Révélation de la correspondance chiffrée de Louis XIV, by Emile Burgaud and Commandant Bazeries, Librairie de Firmin-Didot, 1893.

Spanish deciphering methods at the time of Louis XIV

An extremely rare work has been discovered in the Brussels archives by DEVOS and SELIGMAN: L’ART DE DESCHIFFRER (The art of deciphering). This work describes the methods of cryptanalysis used by the Spaniards at the end of the 17th century. It deals first with simple substitution (the usual method of vowel/consonant separation is presented) and then with codebooks that mix letters, syllables and words. Each of the two parts is followed by examples of decryptions in Spanish and French. One of them is the letter of 1676 presented later.

In both cases, the authors of this report rely on the characteristics of the language used in the cryptogram. The authors only deal with French and Spanish. For the following, I will indicate their conclusions only with regard to French language. Here is a summary of the code analysis method presented in the second part of the book.

In French, the most common letters are E, A, N, R, S, T and U, with E by far the most common. The letter Q is most often followed by a U. The most common syllables are ce, de, le, la, me, ne, que, re, se, te, en, on, er, is, il, et; note that many frequent syllables end in "E". Among the most frequent pairs of consecutive groups are n-s, n-t, r-t, r-s, de-s, le-s. The most frequent doubles are aa, ff, pp, tt.

Some patterns are particularly useful. When a codegroup Y is surrounded by the same codegroup X (X. Y. X.), the configuration can correspond to me-s-me, que-l-que, te-s-te, che-r-che, … The pattern X. Y. X. Y. can correspond to che-r-che-r, me-s-me-s, … The pattern X. Y. X. Z. W. X. can correspond to re-p-re-n-d-re.

Note: The cryptanalysts who wrote the book describe the language very accurately from the perspective of cryptology. Their descriptions compare well with those made in the 20th century by the famous cryptologist Friedman.
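Patterns such as X. Y. X. or X. Y. X. Y. are easy to look for mechanically. Here is a minimal sketch: the function name find_pattern and the way a pattern is written (a string of letters, one letter per codegroup) are assumptions made for the example, and the codegroups in the sample are invented, not taken from a real dispatch.

```python
def find_pattern(groups, pattern):
    """Yield (position, window) wherever `groups` matches `pattern`.

    Equal letters in the pattern must stand for the same codegroup and
    different letters for different codegroups.
    """
    size = len(pattern)
    for i in range(len(groups) - size + 1):
        window = groups[i:i + size]
        mapping = {}
        consistent = True
        for letter, group in zip(pattern, window):
            # setdefault records the first group seen for this letter.
            if mapping.setdefault(letter, group) != group:
                consistent = False
                break
        # Also require that two different letters never share a group.
        if consistent and len(set(mapping.values())) == len(mapping):
            yield i, window

# Invented sequence of codegroups, for illustration only.
sample = [17, 40, 9, 40, 88, 12, 5, 12, 5, 63]

print(list(find_pattern(sample, "XYX")))
# [(1, [40, 9, 40]), (5, [12, 5, 12]), (6, [5, 12, 5])]
print(list(find_pattern(sample, "XYXY")))
# [(5, [12, 5, 12, 5])]
```

Each hit is only a candidate: a window matching X. Y. X. may be me-s-me or que-l-que, or something else entirely, and the context still has to decide. The same function accepts "XYXZWX" for the re-p-re-n-d-re pattern.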

To analyze a code, we start by listing the most frequent codegroups. We analyze them most carefully: their frequency, their associations with each other and where they are distributed in the cryptogram.

Initially, the null groups are determined. They are easily identifiable: they are at the beginning and end of paragraphs.

It is then necessary to find the groups which code the letters and the syllables. In the case of simple substitution, it was easy to find the E, which is the most frequent letter. In a codebook its frequency is diluted by the use of syllables ending with this letter, but these syllables are among the most frequent! The S, as it ends words, often recurs at a spacing that corresponds to word length: "leS grandeS et petiteS preuveS que leS fidèleS alliéS..." [the example in the work is in Spanish].

Based on the description of the language that we made previously, we will try to determine one or more groups.

When one studies a group, it is easy to find the meaning of those that follow or precede it. For example, if a group has been determined to correspond to the letter N, it can be followed by an "S" (actions, fortifications, dans, ...) or a "T" (ment, tant, élément, sont, …). If a group corresponds to the syllable DE, it can form the words "grande" or "monde".

For a very common group, its meaning can also be found by elimination by determining what it cannot be.

To determine the groups that correspond to words, there are several indications. They are recognized because they cannot form words (they are not used to spell).

Proper names (cities, characters, ...) often correspond to high-numbered codegroups because they were added at the end of the code. In addition, they are often preceded by groups meaning "le, la, de, Monsieur, …"

If we are not sure whether a group is a letter or a syllable, we must study every place where it occurs and determine whether our hypothesis holds or must be rejected.

Some principles help in code reconstruction:

It is necessary to look for the repetitions of groups (X. Y., X. Y. Z., …) and the repetition of patterns (X. Y. X., …). If we think we can guess the meaning of a sequence of groups, slight modifications will give us equivalents, for example the different groups meaning the letter O [an example in Spanish is given].

If we have determined a few groups, such as BA, BE, BI and we find a rule that connects the codegroups that represent them, for example, 42, 53, 64, we can deduce the values for the other groups BO and BU: 75 and 86.
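This kind of deduction is simple arithmetic and can be sketched as follows. The function name extend_progression is an assumption made for the example, and the sketch supposes that the rule connecting the codegroups is a constant step, which is only one of the rules a code maker might have used.

```python
def extend_progression(order, known):
    """Predict the codegroups of the entries of `order` missing from
    `known`, assuming a constant step between consecutive entries.

    Raises ValueError if the known values do not fit such a rule.
    """
    points = sorted((order.index(entry), value) for entry, value in known.items())
    if len(points) < 2:
        raise ValueError("at least two known groups are needed")
    (pos0, val0), rest = points[0], points[1:]
    steps = set()
    for pos, val in rest:
        step, remainder = divmod(val - val0, pos - pos0)
        if remainder:
            raise ValueError("no constant integer step fits the known groups")
        steps.add(step)
    if len(steps) != 1:
        raise ValueError("the known groups do not share a single step")
    step = steps.pop()
    return {entry: val0 + (i - pos0) * step
            for i, entry in enumerate(order) if entry not in known}

order = ["BA", "BE", "BI", "BO", "BU"]
known = {"BA": 42, "BE": 53, "BI": 64}

print(extend_progression(order, known))
# {'BO': 75, 'BU': 86}
```

The predicted values remain hypotheses: they have to be checked against the cryptograms like any other identification.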

I don't know if the Spanish cipher office managed to decipher the French two-part codebooks at the end of the 17th or 18th century, but I think they certainly had the skill.

References

  • L’ART DE DESCHIFFRER, Traité de déchiffrement du XVII siècle de la secrétairie d’état et de Guerre espagnole, edited by J.P. DEVOS and H. SELIGMAN, Presses universitaires de Louvain (UCL), 1967.

English decipherments (J. Wallis, ...)

John Wallis, who performed code breaking for the English government, published two of his decryptions in his book "Opera Mathematica" (which deals mainly with mathematics): one concerns a 1689 French codebook used for exchanges between the Marquis de Bethune and Cardinal d'Estré, the other another French codebook from the same year used for exchanges between D. de Teil and King Louis XIV. The codes used to encrypt the letters that Wallis deciphered are one-part, semi-ordered codes, so Wallis did not have much trouble deciphering them. However, he did not publish his method of solution, which upset his fellow mathematician Leibniz.

In Kahn's book (The Codebreakers), there is another decryption by John Wallis of a cryptogram of King Louis XIV from 1693. Again, the codebook used a single table and was of semi-ordered type (a: 2, b: 5, c: 8, … la: 185, le: 195, …).
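The weakness of an ordered or semi-ordered one-part code is that the numbers follow the alphabetical order of the plaintext, so every identified group narrows down its neighbours. The sketch below illustrates this with the five values quoted above. The function name bracket is an assumption made for the example, and since the code is only semi-ordered, the bounds it returns are hints to be checked, not certainties.

```python
from bisect import bisect_left

def bracket(code, known):
    """Return the (lower, upper) known plaintext values between which
    the plaintext of `code` should fall in an ordered one-part code.

    A bound is None when no known group lies on that side.
    """
    codes = sorted(known)
    i = bisect_left(codes, code)
    if i < len(codes) and codes[i] == code:
        # The group is already identified.
        return known[code], known[code]
    lower = known[codes[i - 1]] if i > 0 else None
    upper = known[codes[i]] if i < len(codes) else None
    return lower, upper

# Values quoted above for the 1693 codebook.
known = {2: "a", 5: "b", 8: "c", 185: "la", 195: "le"}

print(bracket(190, known))   # ('la', 'le')
print(bracket(6, known))     # ('b', 'c')
```

With a two-part codebook this shortcut disappears, because the numbers no longer follow the order of the plaintext.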

Also in The Codebreakers, a French dispatch dated 1716 (one year after the death of King Louis XIV) is decrypted by the English cipher office (J. Wallis was already dead). This time, however, it is a two-part codebook. Bazeries was therefore not the first to break this type of code!

References