Hypothesis Testing
Revision as of 03:03, 27 June 2016
Contents
- 1 Introduction
- 2 Methodology
- 3 Approach for generation of test ciphers
- 4 Base Hypothesis
- 5 Hypothesis 1: Z340 is a monoalphabetic homophonic substitution cipher.
- 6 Hypothesis 2: Columnar transposition was applied to the plaintext prior to homophonic substitution
- 7 Hypothesis 3: Scytale transposition was applied to the plaintext prior to homophonic substitution
Introduction
The Zodiac Killer's 340-character cryptogram (referred to here as Z340) has much in common with his 408-character cryptogram (Z408), yet it remains unsolved. It seems very likely that Z340 was not constructed using the same method as Z408. Many software tools have been built that effectively solve homophonic substitution ciphers with the same properties as Z408, so if Z340 were a simple homophonic substitution cipher, one of the numerous manual or automated attempts would have solved it by now.
Zodiac may have used some other scheme to produce Z340. In their attempts to crack Z340, people have explored many different schemes, but there is not yet a comprehensive study of which schemes are feasible. Moreover, it is not clear whether a particular scheme could be broken if Z340 used it. We need a systematic way to answer these questions about a scheme:
- Is it possible to create a Z340-like cipher using this scheme?
- Can a Z340-like cipher created using this scheme be cracked reliably?
- Is this scheme more likely than other schemes to produce some of the unusual features observed in Z340?
Methodology
In this article I present my approach to answering the questions above. Here is my strategy:
- First, define a hypothesis for Z340's encipherment scheme. The hypothesis is a collection of statements about properties of Z340's plaintext, and the operations performed on the plaintext to turn it into ciphertext.
- Create at least 100 test ciphers under the scheme.
- Guide the generation of test ciphers so that their features and statistical properties are very similar to those of Z340.
- Cryptanalyze the test ciphers, and recover their plaintext messages.
- If all the test ciphers can be solved reliably, but the same procedure fails to solve Z340, then we have stronger evidence that the chosen hypothesis is false.
- Select another hypothesis and repeat the process.
This method is labor-intensive, but it helps to rule out specific schemes. If we are lucky, one of the schemes will lead to a solution for Z340, or the newly gained knowledge will guide the search toward more plausible encipherment schemes.
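The strategy above can be sketched as a small test harness. This is only an illustration under stated assumptions: the names `solve_rate`, `toy_encipher`, and `toy_solve` are hypothetical, "solved" is taken to mean exact recovery of the plaintext (a real solver would probably need a more tolerant score), and the Caesar shift is a toy stand-in rather than a candidate scheme for Z340.

```python
from typing import Callable, Sequence


def solve_rate(
    plaintexts: Sequence[str],
    encipher: Callable[[str], str],
    solve: Callable[[str], str],
    min_tests: int = 100,
) -> float:
    """Return the fraction of test ciphers whose plaintext is recovered exactly."""
    if len(plaintexts) < min_tests:
        raise ValueError(f"need at least {min_tests} test plaintexts")
    solved = sum(1 for p in plaintexts if solve(encipher(p)) == p)
    return solved / len(plaintexts)


# Toy stand-ins (NOT a Z340 scheme): a Caesar shift and its exact inverse.
def toy_encipher(text: str, shift: int = 3) -> str:
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in text)


def toy_solve(cipher: str, shift: int = 3) -> str:
    return "".join(chr((ord(c) - 65 - shift) % 26 + 65) for c in cipher)


# 100 uppercase test plaintexts, the minimum the strategy asks for.
tests = ["HYPOTHESIS" + chr(65 + i % 26) for i in range(100)]
rate = solve_rate(tests, toy_encipher, toy_solve)
print(f"solve rate: {rate:.2f}")
```

A solve rate near 1.0 on the test ciphers, combined with a failure of the same solver on Z340, is the situation that counts as evidence against the hypothesis.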
Approach for generation of test ciphers
Test cipher production consists of two phases. First, candidate plaintexts are randomly selected from a large collection of material from Project Gutenberg (https://www.gutenberg.org/). The combined material amounts to over 3.6 million words of English text. Candidates that, under a given scheme, do not retain some of the desired features described below are discarded.
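A minimal sketch of this first phase follows. Several details are assumptions, not taken from the article: the corpus is held in memory as one string, each candidate is a 340-letter window (chosen to match Z340's length) that may start mid-word, and `keep` is a hypothetical placeholder for the feature check that decides whether a candidate is retained.

```python
import random
import re
from typing import Callable, List, Optional


def normalize(text: str) -> str:
    """Uppercase the text and drop everything except the letters A-Z."""
    return re.sub(r"[^A-Z]", "", text.upper())


def sample_plaintexts(
    corpus: str,
    count: int,
    length: int = 340,
    keep: Callable[[str], bool] = lambda candidate: True,
    seed: Optional[int] = None,
    max_tries: int = 100_000,
) -> List[str]:
    """Draw random windows from the corpus and keep those that pass `keep`.

    May return fewer than `count` plaintexts if `max_tries` is exhausted.
    """
    letters = normalize(corpus)
    if len(letters) < length:
        raise ValueError("corpus is shorter than one plaintext")
    rng = random.Random(seed)
    selected: List[str] = []
    for _ in range(max_tries):
        if len(selected) == count:
            break
        start = rng.randrange(len(letters) - length + 1)
        candidate = letters[start:start + length]
        if keep(candidate):  # discard candidates lacking the desired features
            selected.append(candidate)
    return selected


# Tiny stand-in corpus; the real collection would be loaded from disk.
corpus = "The quick brown fox jumps over the lazy dog. " * 50
plaintexts = sample_plaintexts(corpus, count=5, seed=1, keep=lambda p: "FOX" in p)
print(len(plaintexts), plaintexts[0][:20])
```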
Production of plaintexts
Mimicking features and statistics of Z340
Features:
- Unigram distribution
- Bigram distribution
- Trigram distribution
- TODO
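The n-gram features listed above can be computed and compared as in the sketch below. The article does not say how similarity to Z340 is measured; the sum of absolute differences used here is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def ngram_distribution(symbols: Sequence, n: int) -> Dict[Tuple, float]:
    """Relative frequency of every run of n consecutive symbols."""
    total = len(symbols) - n + 1
    if total <= 0:
        return {}
    counts = Counter(tuple(symbols[i:i + n]) for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def distribution_distance(a: Dict[Tuple, float], b: Dict[Tuple, float]) -> float:
    """Sum of absolute differences: 0 for identical, 2 for disjoint distributions."""
    return sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in set(a) | set(b))


# Works on strings or on lists of cipher symbols; n = 1, 2, 3 gives the
# unigram, bigram, and trigram distributions respectively.
bigrams = ngram_distribution("ABAB", 2)
same = distribution_distance(bigrams, ngram_distribution("ABAB", 2))
apart = distribution_distance(ngram_distribution("AAAA", 1), ngram_distribution("BBBB", 1))
print(bigrams, same, apart)
```

A test cipher could then be accepted only when its distance to the corresponding Z340 distribution falls below a chosen threshold, which would itself have to be tuned.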
Base Hypothesis
TODO
Hypothesis 1: Z340 is a monoalphabetic homophonic substitution cipher.
TODO
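For orientation, here is a minimal sketch of what a monoalphabetic homophonic substitution does: one fixed key maps each plaintext letter to one or more cipher symbols, and each occurrence of a letter is replaced by one of its symbols. The tiny key and the random choice among homophones are assumptions made for illustration (cycling through the homophones in order is another option); nothing here is taken from Z340 or Z408.

```python
import random
from typing import Dict, List


def encipher_homophonic(plaintext: str, key: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Replace each letter with one of its homophones, chosen at random."""
    return [rng.choice(key[ch]) for ch in plaintext]


def decipher_homophonic(cipher: List[str], key: Dict[str, List[str]]) -> str:
    """Invert the key; every symbol must belong to exactly one letter."""
    inverse = {sym: ch for ch, syms in key.items() for sym in syms}
    if len(inverse) != sum(len(syms) for syms in key.values()):
        raise ValueError("a cipher symbol is assigned to more than one letter")
    return "".join(inverse[sym] for sym in cipher)


# Illustrative key: frequent letters get more homophones than rare ones.
key = {"A": ["1", "2", "3"], "T": ["4", "5"], "C": ["6"], "K": ["7"]}
plaintext = "ATTACK"
cipher = encipher_homophonic(plaintext, key, random.Random(0))
recovered = decipher_homophonic(cipher, key)
print(cipher, recovered)
```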
Hypothesis 2: Columnar transposition was applied to the plaintext prior to homophonic substitution
TODO
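As a point of reference, the textbook form of columnar transposition can be sketched as follows: the plaintext is written row by row under the key, and the columns are then read off in key order. Which variant this hypothesis would actually test is not specified in the article, so the irregular (unpadded) form and the function names below are assumptions. Under the hypothesis, the transposed text would then be passed to the homophonic substitution step.

```python
from typing import Sequence


def column_order(key: Sequence) -> list:
    """Column indices sorted by their key value (ties keep left-to-right order)."""
    return sorted(range(len(key)), key=lambda i: key[i])


def columnar_transpose(text: str, key: Sequence) -> str:
    """Write text in rows of len(key) letters, then read columns in key order."""
    width = len(key)
    return "".join(text[col::width] for col in column_order(key))


def columnar_untranspose(cipher: str, key: Sequence) -> str:
    """Undo columnar_transpose, including a short final row."""
    width, n = len(key), len(cipher)
    plain = [""] * n
    pos = 0
    for col in column_order(key):
        col_len = len(range(col, n, width))  # letters that landed in this column
        plain[col::width] = list(cipher[pos:pos + col_len])
        pos += col_len
    return "".join(plain)


# Rows are ABC / DEF / GH; columns are read in the order 1, 0, 2.
scrambled = columnar_transpose("ABCDEFGH", [1, 0, 2])
restored = columnar_untranspose(scrambled, [1, 0, 2])
print(scrambled, restored)
```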
Hypothesis 3: Scytale transposition was applied to the plaintext prior to homophonic substitution
TODO