Difference between revisions of "Hypothesis Testing"

From Zodiac Killer Ciphers Wiki
Revision as of 03:03, 27 June 2016

Introduction

The Zodiac Killer's 340-character cryptogram (referred to here as Z340) has many things in common with his 408-character cryptogram (Z408), but it remains unsolved. Many software tools have been built that effectively solve homophonic substitution ciphers with the same properties as Z408. If Z340 were a simple homophonic substitution cipher, one of the numerous manual or automatic attempts would very likely have solved it by now. It therefore seems very likely that Z340 was not constructed using the same method as Z408.

Zodiac may have used some other scheme to produce Z340. In their attempts to crack Z340, people have explored many different schemes, but there is not yet a comprehensive study of which schemes are feasible. Moreover, for any particular scheme, it is not clear whether Z340 could be broken if it uses that scheme. We need a systematic way to answer these questions about a scheme:

  • Is it possible to create a Z340-like cipher using this scheme?
  • Can a Z340-like cipher created using this scheme be cracked reliably?
  • Is this scheme more likely than other schemes to produce some of the unusual features observed in Z340?

Methodology

In this article I will present my approach to answering the questions above. Here is my strategy:

  • First, define a hypothesis for Z340's encipherment scheme. The hypothesis is a collection of statements about properties of Z340's plaintext, and the operations performed on the plaintext to turn it into ciphertext.
  • Create at least 100 test ciphers under the scheme.
  • Guide the generation of test ciphers so that they contain features and statistical properties that are very similar to Z340.
  • Cryptanalyze the test ciphers, and recover their plaintext messages.
  • If all the test ciphers can be solved reliably, but the same procedure fails to solve Z340, then we have stronger evidence that the chosen hypothesis is false.
  • Select another hypothesis and repeat the process.

This method is labor-intensive, but it helps to rule out specific schemes. If we are lucky, one of the schemes will lead to a solution for Z340, or the newly gained knowledge will guide the search toward more plausible encipherment schemes.
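
The strategy can be sketched as a small test harness. This is only an illustrative outline, not the tooling actually used: the names solve_rate and hypothesis_weakened, the exact-match success criterion, and the default threshold of 1.0 ("all test ciphers solved") are assumptions made for the example.

```python
from typing import Callable, Iterable


def solve_rate(
    plaintexts: Iterable[str],
    encipher: Callable[[str], str],
    solve: Callable[[str], str],
) -> float:
    """Return the fraction of test ciphers whose plaintext is recovered exactly.

    `encipher` stands for the scheme under test and `solve` for the
    cryptanalysis procedure; both are supplied by the caller.
    """
    total = 0
    solved = 0
    for plaintext in plaintexts:
        cipher = encipher(plaintext)
        total += 1
        if solve(cipher) == plaintext:
            solved += 1
    return solved / total if total else 0.0


def hypothesis_weakened(rate: float, z340_solved: bool, threshold: float = 1.0) -> bool:
    """True when the test ciphers are solved reliably but Z340 is not.

    In that case we have stronger evidence that the hypothesis is false.
    """
    return rate >= threshold and not z340_solved


if __name__ == "__main__":
    # Toy stand-ins: the "scheme" reverses the text and the "solver" undoes it.
    rate = solve_rate(["ABC", "HELLO"], lambda s: s[::-1], lambda s: s[::-1])
    print(rate, hypothesis_weakened(rate, z340_solved=False))
```

In practice the exact-match test would probably be replaced by a softer score, since automatic solvers often recover most but not all of a plaintext.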

Approach for generation of test ciphers

Test cipher production consists of two phases. First, candidate plaintexts are randomly selected from a large collection of material from Project Gutenberg (https://www.gutenberg.org/). The combined material amounts to over 3.6 million words of English text. Plaintexts that, under a given scheme, do not retain some of the desired features described below are discarded.
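
A minimal sketch of the selection step follows. It assumes the corpus is already loaded as one string, that only letters are kept, and that each candidate is a 340-letter window; the function name sample_plaintexts and the keep predicate (a placeholder for whatever feature check a scheme requires) are hypothetical.

```python
import random
import re
from typing import Callable, List, Optional


def sample_plaintexts(
    corpus: str,
    length: int = 340,
    count: int = 100,
    keep: Callable[[str], bool] = lambda text: True,
    seed: Optional[int] = None,
    max_tries: int = 100_000,
) -> List[str]:
    """Draw random fixed-length windows of letters from the corpus.

    Candidates rejected by `keep` are discarded. Fewer than `count`
    plaintexts are returned if `max_tries` is exhausted first.
    """
    letters = re.sub(r"[^A-Za-z]", "", corpus).upper()
    if len(letters) < length:
        raise ValueError("corpus is shorter than the requested plaintext length")
    rng = random.Random(seed)
    accepted: List[str] = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        start = rng.randrange(len(letters) - length + 1)
        candidate = letters[start:start + length]
        if keep(candidate):
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    toy_corpus = "The quick brown fox jumps over the lazy dog. " * 50
    for text in sample_plaintexts(toy_corpus, length=20, count=3, seed=1):
        print(text)
```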

Production of plaintexts

Mimicking features and statistics of Z340

Features:

  • Unigram distribution
  • Bigram distribution
  • Trigram distribution
  • TODO
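
One way to compare a test cipher with Z340 on the n-gram features above is to count n-grams in each symbol sequence and measure how far apart the two frequency distributions are. The sketch below uses total variation distance, which is an assumption chosen for simplicity; the article does not say which measure is used, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(symbols: Sequence, n: int) -> Counter:
    """Count every run of n consecutive symbols (n=1 unigrams, n=2 bigrams, ...)."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def ngram_distance(a: Sequence, b: Sequence, n: int) -> float:
    """Total variation distance between the n-gram frequency distributions.

    0.0 means identical distributions, 1.0 means no n-gram in common.
    """
    counts_a, counts_b = ngram_counts(a, n), ngram_counts(b, n)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both sequences must contain at least n symbols")
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a[k] / total_a - counts_b[k] / total_b) for k in keys)


if __name__ == "__main__":
    print(ngram_counts("ABAB", 2))
    print(ngram_distance("ABAB", "ABBA", 2))
```

Because cipher symbols differ between a test cipher and Z340, a comparison of this kind would normally be applied to symbol-independent statistics (for example, sorted frequency profiles or repeat counts) rather than to the raw symbols.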

Base Hypothesis

TODO

Hypothesis 1: Z340 is a monoalphabetic homophonic substitution cipher.

TODO
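
As a reference point for this hypothesis, here is a minimal sketch of homophonic substitution: each plaintext letter owns one or more cipher symbols, and each cipher symbol stands for exactly one letter. Picking a homophone at random is an assumption made for the example, and the function names and integer symbols are hypothetical.

```python
import random
from typing import Dict, List


def encipher_homophonic(plaintext: str, key: Dict[str, List[int]], rng: random.Random) -> List[int]:
    """Replace each letter with one of its homophones, chosen at random."""
    return [rng.choice(key[letter]) for letter in plaintext]


def decipher_homophonic(cipher: List[int], key: Dict[str, List[int]]) -> str:
    """Invert the key: every symbol maps back to exactly one letter."""
    inverse = {symbol: letter for letter, symbols in key.items() for symbol in symbols}
    return "".join(inverse[symbol] for symbol in cipher)


if __name__ == "__main__":
    # Toy key: frequent letters get more homophones to flatten symbol frequencies.
    key = {"E": [0, 1, 2], "T": [3, 4], "A": [5, 6], "S": [7]}
    rng = random.Random(0)
    cipher = encipher_homophonic("TEASE", key, rng)
    print(cipher, decipher_homophonic(cipher, key))
```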

Hypothesis 2: Columnar transposition was applied to the plaintext prior to homophonic substitution

TODO
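
For illustration, a basic columnar transposition can be sketched as follows: the plaintext is written row by row into a grid and the columns are read off in a keyed order. The grid width, the column order, and the function names are assumptions for the example; under this hypothesis the transposed text would then be fed to the homophonic substitution step.

```python
from typing import List


def columnar_transpose(plaintext: str, column_order: List[int]) -> str:
    """Write the text row by row into len(column_order) columns,
    then read the columns off in the given order."""
    width = len(column_order)
    if sorted(column_order) != list(range(width)):
        raise ValueError("column_order must be a permutation of 0..width-1")
    return "".join(plaintext[col::width] for col in column_order)


def columnar_untranspose(text: str, column_order: List[int]) -> str:
    """Undo columnar_transpose, including when the last row is incomplete."""
    width = len(column_order)
    n = len(text)
    out = [""] * n
    pos = 0
    for col in column_order:
        size = len(range(col, n, width))  # number of letters in this column
        out[col::width] = text[pos:pos + size]
        pos += size
    return "".join(out)


if __name__ == "__main__":
    scrambled = columnar_transpose("ATTACKATDAWN", [2, 0, 1])
    print(scrambled, columnar_untranspose(scrambled, [2, 0, 1]))
```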

Hypothesis 3: Scytale transposition was applied to the plaintext prior to homophonic substitution

TODO