M-209: Cryptanalysis


M-209 Home Page Hill Climbing:

Introduction

Importance of M-209 Cryptanalysis

Decryption of M-209 and more generally Decryption of Hagelin machines serie-C was of enormous strategic importance, possibly equal or more important than Enigma one. Indeed, the M-209 and C-38 were not only used during the 2nd World War (by Americans and Italians respectively) but also these machines were used by virtually every country in the World during a part of the Cold War. The US after the 2nd World War were informed about the methods developped by German cryptanalysts. Then, the agency NSA, developped their own methods and prevent the publication of innovative approaches (see Ritchie). Mastering the cryptanalysis of the M-209 allowed NSA not only to decrypt enemies messages (or allies messages like France ones) but also to protect its communications. Thus, it appears that North Korea had not deciphered any M-209 messages during Korean War.

The methods used by Germans, by British and Americans are still partly classified. For example, the aim of the British Nightingale machine is still unknown. Conversely, Enigma cryptanalysis has now almost no secret.

Cipher Text Cryptanalysis versus Known Plain Text Attack

If you try to decrypt an encrypted message using the machine M-209 from ciphertext alone, you must have a message with a length of several thousand characters (most recently at least 400 characters). This situation is very rare or non-existent if the M-209 is well used. Indeed, most of the time the security rules impose to cipher clerk a maximum length (500, 370 or 250 characters). In all cases this length is below the threshold of several thousand characters.

That is why the decryption services were mainly directed to a known plain text attack. Several situations can cause an enemy to know the plain message (plain capture, crib, re-encryption of a message, ...), but the situation with messages in depth is the most likely to occur in actual use. In this case, it is possible to infer not only the plain text (two cryptograms are sufficient) but also the key used.

The existence of messages in depth is the main flaw of M-209 machines. If a Signal Service uses M-209 it must do everything so that these superimpositions do not occur, for example to limit the number of messages exchanged with the same internal key. Indeed, overlaps become inevitable from a certain threshold of exchanged messages (see secure indicator).

Practical Procedure to break M-209 code

We have just understood that alone, messages in depth are used to decrypt messages encrypted with the same internal key (typically daily messages). We can deduce the general procedure for decryption:
  1. First, we must look for depths. To do this we must try to superimpose each message to all other messages and this at each position. Testing the presence of depth requires a statistical method to assess whether messages are actually in depth. These searches require the use of machines or computers.
  2. Then you have to find the plain text in the overlapping area. It is possible to use manual methods, but again, machines can help. For instance one which can scan one of the messages with hundreds of probable word (crib) and display the corresponding string if a statistical test indicates plain text presence.
  3. It is now easy to determine the displacements created by the machine .
  4. Then you have to find the internal key. You must have at least 75 displacements (then 75 characters), possibly only 50. Until very recently, only manual methods were known.
  5. The last step is to find the External key for each message encrypted with the internal key discovered in the previous step. The crudest method is to try each external key (in total more than 100 million) for each message. A statistical test is used to test whether the key is likely. Obviously a computer is required.
Several situations can simplify the previous diagram:
  • The existence of messages with the same indicator. In this case, naturally it can be inferred that they use the same external key and are in depth.
  • The presence of more than two superimposed messages. In this case, find the plain texts (and therefore the displacement chain) is very easy.
  • If the overlapping area is long it becomes very easy to find the Internal key. For example, an area of 300 letters offers no difficulty.
  • If the indicator method is known, it can allow the enemy to find the External key without having to try all the keys.
  • Finally, if the indicator method is not sufficiently secure, it is possible to deduce the overlapping areas only from the indicators and thus to deduce the plaintext and then the internal key and then the External key of each message. It is even possible to find the relative position of each message in the keystream and to use cipher text only methods. The use of a good indicator method is obviously critical.

Cryptanalysis with Known plain text

The method

The encryption formula of the M-209 is:

	cipher_letter = ((key + 25) - plain_letter) % 26

A = 0, B = 1, C = 2, ...,% means "modulo"


25 = -1 % 26
We deduce the key:
	key = (cipher_letter + plain_letter + 1) % 26
Each key is called displacement too. The whole keys form the keystream.

If we know the plain text of the corresponding ciphered message, one can calculate the key sequence by the above formula.

Then, we try to determine the value of Pins of each wheel. To do this, we cut the key sequence with a period equal to the size of the wheel in question. For example for the wheel 17, the key sequence is cut in line of 17 values. In each column, the same Pin is used and it is either active or inactive for all values of the column. By averaging the values of the column we can deduce if the pin is active or not. Finally when we determined all Pins values, the value of Lugs is easy to find.

We have described in broad terms the method. The Morris's article in Cryptologia (1978) described in greater details this method. I wrote an example using this method. This method is also described more simply by Barker (1977). Beker & Piper (1982) used Morris's method but expanding it. Deavours & Kruh (1985) decribed Morris's method too. In 1990 Paul H. Greenough described a method for finding the key for a hypothetical machine derived from series-C Hagelin: the ungaged Hagelin. His method is of known plaintext attack type. It requires at least 500 characters.

We have just presented public methods (published in books or scientific journals). But the Germans and the Americans knew finding the internal key messages even when the number of displacements was very low (less than 75 characters) during WWII. The Signal Security Agency (predecessor of the NSA) used the Horizontal-Vertical method . Details of the German cryptanalysts method have not reached us. We have only a basic description (TICOM I-175)

Known plain text attack, problem: how to find messages in depth?

It's easy to find messages in depth if safety rules are not respected. Thus, if two messages have the same indicator, it can be inferred that they have been encrypted with the same key.

American cryptanalysts used IC (Index of Coincidence) to discover depths. I wrote the program superimpose which uses this method.

German cryptanalysts also developed a method to find an overlap even when a gap exists (and messages do have different indicators). It was a statistical method based on the supposed presence of frequent trigrams appearing in deciphering when messages were superimposed. (TICOM DF-120). I wrote a small description of this method.

It seems that the German had at the end of the WWII, a mechanized method for searching the presence of depths using a statistical method based on digraphs. Unfortunately the details are still classified (TICOM I-31).

Known plain text attack, problem: how to get plain text?

In the previous method, we must know at least 50 plain letters to apply it. 75 letters or more is preferable. There are several possible methods to get this plain text, such as a known stereotypical beginning: "Message number ... from headquarters to general ...". The usual method is based on the superimposition of messages encrypted with the same key.

Suppose there are two messages encrypted with the same key. Next, search for probable words in one of the two texts. Eg ZPARENZ chain was very common during WWII in American messages (it framed addresses). The probable word is placed at each position of the first text and we try to decipher the text of the 2nd message at the same position. If at a given position, plain text appears, it was proved that the probable word was actually used. Also known its position in the first text, we also find another probable word in the second text. Then we try to guess the text before and behind each probable word. After many trials / errors, we can determine the entire plain text of the two Cryptograms. The book gives many examples of using this method, the TICOM I-175 document too. The TM 11-380 manual (1947) gives an extensive description too. I wrote an example using this method. I also wrote a program which scans a text with a probable word and prints the correspondant word in another text in depth. This last method is similar to a manual one. I wrote too the program proba_word using a dictionary or a simple probable word but the program print a measure based on trigram probability.

Machines (or Computers) can help, such as trying to scan one of the messages with hundreds of probable word (crib) and display the corresponding string if a statistical test indicates plain text presence. Allies, during WWII and after used many machines (called "cribdragger") to do that.

Known plain text attack, problem: how to solve other messages?

Each message in a period (a day for example) are encrypted with the same basic key (Pins and Wheels setting). But each message is encrypted with a different message key (the initial position of the wheels). The brute force method is to try every possible key, but there are 100 million possibilities. Nowadays it isn't a hard problem with a computer. During the 2nd World War, the probable word method was used. This method is described in the Donini's article (Cryptologia, 1990). I wrote an example using this method.

After the WWII, Americans built the Hecate special hardware to find the message key using a crib of a maximum length of 12 letters.

George Lasry, in 2105, sent me an algorithm to recover the message key very quickly. His algorithm uses an Hill Climbing approach.

I wrote several programs which can find the message key. One use the brut force. It tries the 100 million possibilities with the IC criterion or a Crib like the Hecate machine. Another program uses the Lasry's algorithm.

German Break

This page summarizes how the German cryptanalysts solved M-209 encrypted messages from in-depth.

Hill Climbing method

In 2014, Georges Lasry solves several problems of my Challenge. He used computer programs based on the "Hill Climbing" algorithm. In July 2015 he and his colleagues Nils Kopal & Arno Wacker published in Cryptologia online, an article on the "Known plaintext attack" method they use. I describe his algorithm in another page. I wrote a program (HC_KPA.py) based on his algorithm.

Cryptanalysis with ciphertext only

Statistical method

The method is similar to the known plain text attack : The cipher text (not the key sequence) is written on a width of each wheel. We begin with the width of 17. The first step is to put each column into one of two classes. One class corresponds to an active Pin and the other to an inactive Pin, but we don't know which one. Then we try the two possibilities for each wheel. Then we determine the Lug cage. This method needs a great quantity of cipher text : several thousands of characters.

During World War II, the German cryptanalysts tried it on a german text of 5 000 characters. They never used it on actual american messages. In 1977, Barker describes and uses this method to attack a message of only 800 characters ... but ciphered with only four wheels (instead of 6 wheels)! He challenged the reader by an exercice of four messages, each one less than 1200 letters. I wrote a paper explaining the Barker's method and a program which implements it.

In 1978, James Reed, Dennis Ritchie and Robert Morris subbmitted an article to Cryptologia. They used a statistical method based on complicated mathematical (eigenvectors). NSA prevent them to publish their method. Recently, this article is finally available.

In 1982, Beker & Piper wrote a computer software which automatically deciphered messages of 3000 characters. Sometimes they succeded with messages of 2000 characters and even with 1500 characters. In 1985, Deavours & Kruh described in broad terms this method too.

Hill Climbing

Geof Sullivan describes a method based on Hill Climbing algorithm to solve CD-57 problems when the lugs are known. The messages have a reasonable size (under 500 characters). This method can be used for each Hagelin serie C cipher machine, including the M-209.

Geoff Sullivan, in the same article, solved messages without knowledge of Lugs, but with a size of 2500 characters! I describe his algorithm in another page.

In 2014, I wrote several softwares based on Hill Climbing algorithm to solve M-209 problems with cage (Lugs and Overlaps) known or unknown.

  • The first program (HC_PINSa.py) used a very simple algorithm (I toggle a Pin at random).
  • The second program (HC_KPA.py) used the George Lasry's algorithm. The one he used to solved known plain text problems. I replaced his ADE fitness function with differents methods: IC, Unigram, Bigram, ...
  • The third program (HC_SULLIVAN.py) used the Geoff Sullivan's algorithm. With it, I solved only messages ciphered without overlaps.

In 2014, Georges Lasry solved several problems of my Challenge. He used computer programs based on the "Hill Climbing" algorithm. With his software he solved automaticaly a message of only 1000 characters. He said to me that he was able to solve messages of only 700 characters. His program is very complex. It is composed of four stages, the first one is inspired from Sullivan's algorithm. He and his colleagues Nils Kopal & Arno Wacker published this algorithm in Cryptologia online in October 2015.

In 2016, Georges Lasry invented a new algorithm (IC64) to solve M-209 messages. The software which implements it allows to solve messages more than 2000 characters but it is very simple and fast. It use the "Divide-and-conquer" approach. It searchs only for pins. For a Pins setting (131 pins), it divides cipher letters into 64 (2**6) states, and for each state it computes its IC, then it computes the weigted average. The Pins setting improvement is realised by Hill Climbing algorithm.

At the beginning of 2017, George Lasry again designed an another algorithm and with it, won my M-209 Challenge. He proved that it is possible to break a M-209 message of about 500 characters. For now, he has not yet published his algorithm.

Cryptanalysis in special cases

In special cases, it is possible to break a M-209 key without the knowledge of the plain text :
  • The same plain text ciphered several times
  • A partial Overlap (several wheels were in phase but not all of them)
  • The message keys aren't ciphered (or can be guessed)

During WWII, German cryptanalysts were able to take advantage of the presence of the same plain text ciphered several times but due to an error in the indicator. In 2015, George Lasry solved a more generic problem (the 21th bonus of my M-209 challenge). He used Hill Climbing approach.

In 1990, C.A. Deavours wrote an article in Cryptologia about the solution of C-35 messages with partial overlaps. We can remark that the C-35 is simpler than M-209 : the C-35 has lugs (cams) unmoving. I do not think that this method is usable against M-209 if you do not know the message key and probable words.

The message keys aren't ciphered (or can be guessed)

If the message keys aren't ciphered, or can be guessed, it is easy to place each message in the key-stream to form one big message. Then it is possible to use ciphertext-only attack. If we find superimpositions, we can use plaintext attack.

Using Chineses reminder theorem, it is easy to find the position of a message in the keystream.

An example of calculation of a position in the key-stream when we know the message key:

Message Key: YYQPFI: 
indx = 24, 23, 16, 15, 5, 8

total    =  89705175 * (24+1) + 56787276 * (23+1) + 
            92587950 * (16+1) + 82090450 * (15+1)  +
            42697200 * (5+1)  + 41755350 * (8+1)
position = (total % 101405850) – 1  
         = (7124947699 % 101405850) – 1 = 26,538,198  
An example of calculation of the key when we know the position in the key-stream:
Position:  26,538,198

indx[0] =  26,538,198 % 26 = 24
indx[1] =  26,538,198 % 25 = 23
indx[2] =  26,538,198 % 23 = 16
indx[3] =  26,538,198 % 21 = 15
indx[4] =  26,538,198 % 19 =  5
indx[5] =  26,538,198 % 17 =  8

Message Key: YYQPFI
I wrote the program distance which does these calculations.

A theorical Point of view

In another page, we have shown different calculations of the key size of M-209. This key is 176 bits long. It is therefore out of the question to conduct an Exhaustive search. Here, we present some mathematical complements.

In 1977, James Reeds discussed about Entropy calculations and particular methods of cryptanalysis. In case of M-209, he established several points:

a) The Unicity Distance
For the encipherment of a single message, the number of possible keys is

	K = 2 ** 131 * C(48,21)  = 2**131 * ((48!/(21!(48-21)!)  ~ 6.07 x 10 ** 52
	Log2 (K) = 175.3
	Remark :  C(k, n):  The number of k-combinations from a given set S of n elements.
Then, taking the redundancy of English to be about 3.7 bits per letter, the ordinary Shannon Unicity distance is
	U  = (Log2 (K)) / R  =  175.3 / 3.7 =  47
We may expect, therefore that Hagelin messages of length of 50 or 60 letters, say, have unique solutions.

b) Amount of Ciphertext needed
James Reed calculated that it is necessary to have 1219 letters of cipher-text to find M-209 key from the cipher-text alone.

In 1981, R.L. Rivest derived a formula which estimates how much cipher text is needed to solve a cryptogram produced by a Hagelin cryptograph (among them M-209), using the cryptanalytic technique presented by Barker (1977).

	Number of Wheels   Amount of Ciphertext
		1		 264 Characters
		2		 529 Characters
		3		1058 Characters
		4		2117 Characters
		5		4233 Characters
		6		8468 Characters

An Historical point of view

In the 1930s and 1940s, cryptanalysts have seen appear the cipher machines. The number of possible keys and the length of keystream seemed to give them an almost absolute security. Only the Kerckhoffs method seemed to be able to attack these machines. But we had to have several messages encrypted with the same key (messages in depth). It was the main concern of Germans about the security of Enigma. Their main security measures stemmed from this problem: message size limited to 250 letters, daily modification of internal key and traffic division into several networks.

The cryptanalysts were aware that in case of Hagelin series C machines, few messages (at least three) superimposed were sufficient to find the plain text and the key. By cons, they did not believe that only two messages were enough to break a key. Similarly, they seemed to them impossible to try all possible superimpositions. This blindness has meant that the M-209 has been used incorrectly for many years after the war (until the 1970s) by many small countries.

The books about Cryptanalysis of the time (for example the General Sacco's Manual of Cryptology, 1947) reflected this overall confidence in cipher machines. Conversely, Americans after World War II, knew the limits of M-209 and could continue to use it (especially during the Korean War) safely and could break messages from others countries. They were the only ones with machines and computers able to find and use depths on a large scale.

Cryptanalytic machines (RAM)

Until the 1940s, cryptanalysis was entirely done manually. Thus, the German M-209 decrypting during WWII were almost completely manual. From the years 1955 computer use is spreading as the ultimate weapon for breaking codes (Harvest system for example), . During WWII and until 1955, the allies (mainly) have developed specialized cryptanalytic machines known as RAM (Rapid Analytic Machinery). The best known are the Bombes and Colossus. We now know that were developed many others including specialized machines against Hagelin cipher machines. Most have been created and used by the US in different organisms (SIS, OP-20-G, SSA, ASA, AFSA, NSA). These machines were entirely mechanical or electronic or a combination of mechanical, photographic, electrical and electronic.

IC Machine (SIS, 1941)

The IC (Index of Coincidence) Machine was made to compare two texts and measure coincidences. It could compare texts up 600 letters long at all offsets in a few seconds.

Nightingale (GC&CS, 1942)

This was an electromechanical machine made of standard telephone parts. It was used to deal with Italian traffic encrypted on the C38m Hagelin machine. The information concerns it is still classified.

SATYR (OP-20-G, 1944)

SATYR is an electrical analog of a cipher device known variously as the HAGELIN C-38, CSP-1500 or M-209. It is used for enciphering or deciphering traffic and for various cryptanalytic processes. SATYR permits analytic study of any one element or combination of elements of the C-type HAGELIN.

Cryptanalytic device for solution of M-209 traffic (German, 1945)

A TICOM document (DF-114) describes a German analytic machine dedicated to M-209. The use of this machine is not obvious. I think it was destined to find the absolute setting.

HECATE (AFSAM,1948)

"HECATE (AFSAF-91, CXDD or Hagelin Cribdragger) is a special purpose high speed electronic cribdragger for solving C-38 type Hagelin messages by exhaustive trials {100 millions trials. I supposed the Internal Key was known}. At a hit, the machine stops to permit hand recording of each possible window setting {Message Key} it finds. Operation is at the rate of 75,000 trials per second, each window setting being tested in about fourteen microseconds" {then a complete run is about 20 minutes}. A crib is limited to 12 letters. It is configured thanks to dials. The cost of Hecate was equivalent to five Bombes. It was used for more than a decade. The Hecate machine is still classified.

SKATE (AFSA, 1949)

This machine is a high speed cribdragger for reading depths, slide-run and keyfinder jobs. IBM built the two models. Both models try each of 192 cribs at one position and checking against 2000 recognition groups {pentagrams} in 0.12 secronds. This machine is still classified.

ODD KICK DETECTION DEVICE (AFSA, 1949)

This machine is dedicated to Hagelin problems. About 800 or more characters of cipher are needed. The machine use is still classified.

WARLOCK (AFSA,1951)

This is an electronic message setter capable of high speed decryption and statistical recognition of plain text. The Model I is specifically for Hagelin. It is still classified.

Other machines

Americans created many other cryptanalytic machines: VIVIAN, SLED, PICCOLO, IDA, MERCUREY, DEMON, .... They are either similar to the machines described above or we don't know their using.

How to train and prove his skills?

The book

The book gives many problems with a growing difficulty. The solution and explanations are given. One starts with cipher machine having only one wheel. Then two wheels, and so on up to six wheels. Each time, different kind of problems are proposed: messages in depth, the same message encrypted with different keys, the beginning or end of messages is known, probable words are given. So you can be faced with all possible situations.

The Challenge

The M-209 Challenge and Bonus give many growing difficulties problems too ... but of course, you cannot get the solutions here.

References

  • "The Hagelin Cipher Machine (M-209): Reconstruction of the Internal Settings", by Robert Morris, in Cryptologia, Vol.2, No.3, July 1978 (The Abstract). I wrote an example using this method.
  • "Cryptanalysis of the Hagelin cryptograph", by Wayne G. Barker, Aegean Park Press (1977).
  • "Classical Cryptography Course, Lesson 21 : Cryptanalysis Of The Navy CSP 1500 Cipher Machine [Hagelin C-38 Family]", by Randy Nichols (LANAKI). This paper is based on Barker's methods.
  • "Machine Cryptography and Modern Cryptanalysis", by Cipher A. Deavours and Louis Kruh, Artech House (1985).
  • "Cipher Systems, the Protection of Communications", by Henry Beker & Fred Piper, Northwood Books (1982).
  • Report by Alfred Pokorn of OKH/CHI, on M-209. TICOM I-175.
  • "Report on the solution of messages in depth of the American cipher device M-209" TICOM DF-120.
  • "The Cryptographic services of the Royal (British) and Italian Navies A comparative analysis of their activities during World War II", by Rear Admiral Luigi Domini, Cryptologia (1990).
  • TICOM: Determination of the Absolute Setting of the AM-1 (M-209) by Using Two Messages with Different Indicators DF-105
  • "CRYPTANALYSIS OF HAGELIN MACHINE PIN WHEELS", by Geoff Sullivan, Cryptologia (2002); (The Abstract)
  • "Codes and Ciphers, Julius Caesar, the Enigma and the Internet", by Robert Churhouse, Cambridge University Press (2002).
  • "Statistical Analysis of the Hagelin Cryptograph", by Ronald L. Rivest, Cryptologia, Vol. 5, No. 1, (1981). (Online)
  • "Entropy Calculations and Particular Methods of Cryptanalysis", by James Reeds, Cryptologia (1977), (Online)
  • SMID, C-446A en M-209 Beschrijving en Analyse SMID
  • Dennis Ritchie 2000-05, Dabbling in the Cryptographic World --A Story (Official Link) (another link) This document is troubling. He tells how the NSA prevents Ritchie published an article about a crypto-only type of attack. The NSA probably wanting to reserve the exclusive use of this type of method against the numerous country which use Hagelin cipher machine serie C (among them the notorious M-209).
  • THE HAGELIN CIPHER MACHINE (M-209), Cryptanalysis from Ciphertext alone - James Reeds, Dennis Ritchie, Robert Morris - 1978 (the article).
  • Colossus, The Secrets of Bletchley Park's Codebreaking Computers, by B. Jack Copeland and others. The subject of this book is Colossus, the ancestor of Computers. We learn some information regarding the Nightingale machine.
  • NSA - Cryptanalytic Machines in NSA - 1953
  • Cryptanalysis of the uncaged Hagelin, by Paul. Greenough, Cryptologia, 14:2, 1990, (The Abstract)
  • WWII ARCHIVES - The modified Horizontal-Vertical Method - (Signal Security Agency)
  • SOLUTION OF C-35 TEXTS WITH PARTIAL KEY OVERLAPS, by C. A. Deavours, Cryptologia Volume 14, Issue 2, April 1990, Abstract of the online article
  • "Automated Known-Plaintext Cryptanalysis of Short Hagelin M-209 Messages", by George Lasry, Nils Kopal & Arno Wacker, Cryptologia, Published online: 06 Jul 2015. Published also in Volume 40, Issue 1, 2016 (The Abstract)
  • "Ciphertext-only cryptanalysis of Hagelin M-209 pins and lugs", by George Lasry, Nils Kopal & Arno Wacker, Cryptologia, Published online: 06 Oct 2015. Published also in Volume 40, Issue 2, 2016 (The Abstract). This article is also downloadable from cryptome.org