Hypothesis Testing

From Zodiac Killer Ciphers Wiki
Revision as of 03:22, 27 June 2016 by Admin (talk | contribs) (Features and statistics of Z340 that are desired in the generated test ciphers)

Introduction

The Zodiac Killer's 340-character cryptogram (referred to here as Z340) has much in common with his 408-character cryptogram (Z408), yet it remains unsolved. It seems very likely that Z340 was not constructed using the same method as Z408. Many software tools have been built that effectively solve homophonic substitution ciphers with the same properties as Z408, so if Z340 were a simple homophonic substitution cipher, one of the numerous manual or automated attempts would probably have solved it by now.

Zodiac may have used some other scheme to produce Z340. In their attempts to crack Z340, people have explored many different schemes, but there is not yet a comprehensive study of which schemes are feasible. Moreover, it is not clear whether a particular scheme could be broken even if Z340 does use it. We need a systematic way to answer these questions about a scheme:

  • Is it possible to create a Z340-like cipher using this scheme?
  • Can a Z340-like cipher created using this scheme be cracked reliably?
  • Is this scheme more likely than other schemes to produce some of the unusual features observed in Z340?

Methodology

In this article I will present my approach to answering the questions above. Here is my strategy:

  • First, define a hypothesis for Z340's encipherment scheme. The hypothesis is a collection of statements about properties of Z340's plaintext, and the operations performed on the plaintext to turn it into ciphertext.
  • Create at least 100 test ciphers under the scheme.
  • Guide the generation of test ciphers so that they contain features and statistical properties that are very similar to Z340.
  • Cryptanalyze the test ciphers, and recover their plaintext messages.
  • If all the test ciphers can be solved reliably, but the same procedure fails to solve Z340, then we have stronger evidence that the chosen hypothesis is false.
  • Select another hypothesis and repeat the process.

This method is labor intensive but helps to rule out specific schemes. If we are lucky, one of the schemes will lead to a solution for Z340, or the newly gained knowledge will guide the search to more plausible encipherment schemes.
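The decision rule in the steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: `encipher` and `solve` are hypothetical callables standing in for a scheme's encipherment and for whatever solver is used, exact plaintext recovery is assumed as the success criterion, and the 0.95 threshold for "solved reliably" is an arbitrary placeholder rather than a value taken from this article.

```python
from typing import Callable, Sequence


def solve_rate(
    plaintexts: Sequence[str],
    encipher: Callable[[str], str],  # hypothetical: applies the scheme under test
    solve: Callable[[str], str],     # hypothetical: the cryptanalysis procedure
) -> float:
    """Return the fraction of test ciphers whose plaintext is recovered exactly."""
    if not plaintexts:
        raise ValueError("need at least one test plaintext")
    solved = sum(1 for pt in plaintexts if solve(encipher(pt)) == pt)
    return solved / len(plaintexts)


def hypothesis_weakened(rate: float, z340_solved: bool, threshold: float = 0.95) -> bool:
    """True when the test ciphers are solved reliably but Z340 is not.

    The threshold is an assumed placeholder for "reliably".
    """
    return rate >= threshold and not z340_solved


# Toy stand-ins: "encipher" reverses the text and "solve" reverses it back.
toy_plaintexts = ["HELLOWORLD", "TESTMESSAGE"]
toy_rate = solve_rate(toy_plaintexts, lambda s: s[::-1], lambda s: s[::-1])
print(toy_rate, hypothesis_weakened(toy_rate, z340_solved=False))
```

A real experiment would probably score partial recoveries (for example, the fraction of correct letters) instead of demanding an exact match, but the overall shape of the comparison stays the same.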

Approach for generation of test ciphers

Test cipher production consists of two phases. First, candidate plaintexts are randomly sampled from a large collection of Project Gutenberg material, which together amounts to over 3.6 million words of English text. Candidates that, under a given scheme, do not retain some of the desired features described below are discarded. Second, once a suitable plaintext is found, a multiobjective optimization algorithm explores the space of encipherments available under that scheme, looking for encipherments that maximize the generated cipher's similarity to Z340. The following section describes the features and statistics that the search tries to match.
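As a rough sketch of the first phase, the following Python samples random fixed-length windows from a corpus and keeps the first one that passes a filter. Everything here is an assumption made for illustration: the letters-only uppercase normalization, the 340-letter window, the retry limit, and the `keep` predicate, which is a hypothetical stand-in for the scheme-specific feature checks. The multiobjective search of the second phase is not shown.

```python
import random
import re
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Keep letters only and uppercase them (an assumed preprocessing step)."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def find_candidate(
    letters: str,
    keep: Callable[[str], bool],  # hypothetical filter for the desired features
    length: int = 340,
    max_tries: int = 10_000,
    seed: Optional[int] = None,
) -> Optional[str]:
    """Sample random windows until one passes the filter, or give up."""
    if len(letters) < length:
        raise ValueError("corpus is shorter than the requested plaintext length")
    rng = random.Random(seed)
    for _ in range(max_tries):
        start = rng.randrange(len(letters) - length + 1)
        window = letters[start:start + length]
        if keep(window):
            return window
    return None  # no window passed the filter within max_tries
```

Normalizing the corpus once and sampling windows from the result avoids reprocessing millions of words on every draw. Passing a `seed` makes a run reproducible, which helps when the same set of test plaintexts has to be regenerated later.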

Features and statistics of Z340 that are desired in the generated test ciphers

  • Unigram distribution: The generated cipher should have the same frequencies of individual symbols.
  • Bigram distribution: The generated cipher should have the same number of repeating bigrams.
  • Trigram distribution: The generated cipher should have the same number of repeating trigrams.
  • Periodic bigram distribution: The generated cipher should have the same unusual number of repeating bigrams at periods 19 and 15.
  • Z340 has several pseudo-words ("her", "god", "zodiac") that appear directly in the ciphertext. The generated ciphers should contain similar words.
  • The generated cipher should contain similar box corner patterns.
  • The generated cipher should contain similar "fold marks".
  • The generated cipher should contain similar "pivot patterns", each oriented in the same direction as its counterpart in Z340.
  • The generated cipher should have similar degrees of symbol cycling, wherein regularity is found in the homophonic assignments of symbols to individual plaintext letters.
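The n-gram statistics in this list are straightforward to compute. The sketch below is one possible way to measure them, under two assumed conventions that this article does not spell out: an n-gram seen k times contributes k - 1 repeats, and a bigram "at period p" pairs each symbol with the symbol p positions later, so that period 1 gives ordinary bigrams. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def count_repeats(items: Iterable[Hashable]) -> int:
    """Sum of (occurrences - 1) over all items seen more than once."""
    counts = Counter(items)
    return sum(c - 1 for c in counts.values() if c > 1)


def unigram_counts(cipher: Sequence[Hashable]) -> Counter:
    """Frequency of each individual symbol."""
    return Counter(cipher)


def repeating_ngrams(cipher: Sequence[Hashable], n: int) -> int:
    """Number of repeats among contiguous n-grams (n=2 bigrams, n=3 trigrams)."""
    return count_repeats(
        tuple(cipher[i:i + n]) for i in range(len(cipher) - n + 1)
    )


def periodic_bigram_repeats(cipher: Sequence[Hashable], period: int) -> int:
    """Number of repeats among pairs of symbols that are `period` positions apart."""
    return count_repeats(
        (cipher[i], cipher[i + period]) for i in range(len(cipher) - period)
    )


print(repeating_ngrams("ABABAB", 2))         # AB x3, BA x2 -> 3 repeats
print(periodic_bigram_repeats("ABCABC", 2))  # (A, C) x2 -> 1 repeat
```

The cipher can be a string or any sequence of hashable symbols, such as a list of integer symbol codes. Comparing these values for a generated cipher against the same measurements taken on a Z340 transcription would give per-feature scores for the search to work with.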

Base Hypothesis

TODO

Hypothesis 1: Z340 is a monoalphabetic homophonic substitution cipher.

TODO

Hypothesis 2: Columnar transposition was applied to the plaintext prior to homophonic substitution

TODO

Hypothesis 3: Scytale transposition was applied to the plaintext prior to homophonic substitution

TODO