Pivots

= Introduction =

The unsolved 340 cipher contains a mysterious pattern of two pairs of repeating, intersecting trigrams, which we will call "pivots". Each pair consists of three unique cipher symbols, joined by a shared fourth symbol.



Did these patterns appear naturally, simply from creating a normal homophonic substitution cipher, read from left to right and top to bottom? Or are they clues that hint at a different kind of encipherment technique? Has the cipher remained unsolved for all these years because the cipher author used some strange, unconventional encipherment technique?

Several patterns found in the 340 cipher can be easily explained by random chance. Here are some examples:

Jazzerman's "diagonal repetitions":



This kind of pattern occurs very naturally, without any intent by the cipher author. See the thread for details about how we can conclude this.

"Jazzerman pairs" are symmetrical patterns of linked pairs. At first glance they look quite interesting:



But as this analysis shows, those patterns, too, can be shown to be chance occurrences.

Another example is the "repeated structure" pattern. Here's one discovered by traveller1st:



This pattern looks very compelling, but many like it occur naturally, so there is no reason to conclude that this particular occurrence was placed there intentionally by the cipher author.

Can we say the same thing about the two pivots we find in the 340 cipher? Are they meaningless patterns or is there something more behind them?

In this thread we explore how often two such pivots occur when the entire 340 cipher is randomly shuffled like a deck of cards. About one in a million random shuffles produces a pair of pivots, somewhere in the shuffled cipher text, with the same orientation as the ones in the original 340 cipher. This implies that the pivot patterns are quite difficult to arrive at by chance alone.

But let's look at this again, from the plaintext perspective. If the 340 is a standard homophonic substitution cipher, read normally from left to right and top to bottom, then every cipher symbol has exactly one plaintext letter associated with it. So, for a pivot to form in the cipher text, there must be a pivot in the plaintext: a trigram that occurs at least twice, with the occurrences joined by a single plaintext letter. Here is an example of a pivot forming in some plain text:

The "ETH" trigrams appear twice, in two different directions, and are joined at the letter "E".

If the Zodiac killer's unsolved 340-character cipher is a standard homophonic substitution cipher, read normally from left to right and top to bottom, then two things must have happened for those two pivots to appear:


 * 1) At least two pivots, pointing in the same directions, must have appeared in the plain text
 * 2) The homophonic encoding of the letters in the pivots must have preserved the pivots.

We want to try to estimate the likelihood of those two things happening.

So, how often do pivots naturally form in plaintext? Let's investigate this question using an experiment that samples a large space of possible plaintext.

= Definitions =

First, some definitions:


 * A pivot is a pattern of text that has at least two straight legs that are identical. There may be up to four legs in a single pivot. Only horizontal and vertical legs are considered; other directions, such as diagonals, are ignored.
 * A leg is a sequence of at least N characters. N is the leg length.
 * A pivot's legs are all joined via a single character. This character is not included in the leg length.
 * The order of characters within matching legs is the same.
 * Each leg is referred to by its orientation: North, East, South, or West.
 * A pair of legs is referred to by their combined orientations: North East, North South, North West, East South, East West, or South West.
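
The definitions above translate fairly directly into a search over a block of text. The Python sketch below is a minimal illustration under stated assumptions, not the code used for the experiments: the block is a list of equal-length strings, each leg is read outward from the joining character (so a West leg is the reverse of normal reading order, which does not change whether two legs match), and each matching pair of legs is reported as its own entry, so a pivot with three matching legs yields three entries. The names `DIRECTIONS`, `read_leg` and `find_pivots` are hypothetical.

```python
from itertools import combinations

# Step (row delta, column delta) for each leg orientation.
DIRECTIONS = {
    "North": (-1, 0),
    "East": (0, 1),
    "South": (1, 0),
    "West": (0, -1),
}


def read_leg(grid, row, col, direction, leg_length):
    """Return the leg_length characters leading away from (row, col) in the
    given direction, or None if the leg would run off the block. The joining
    character at (row, col) is not part of the leg."""
    d_row, d_col = DIRECTIONS[direction]
    chars = []
    for step in range(1, leg_length + 1):
        r, c = row + d_row * step, col + d_col * step
        if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
            return None
        chars.append(grid[r][c])
    return "".join(chars)


def find_pivots(grid, leg_length=3):
    """List (row, col, leg, orientation pair) for every pair of identical legs."""
    pivots = []
    for row in range(len(grid)):
        for col in range(len(grid[row])):
            legs = {d: read_leg(grid, row, col, d, leg_length) for d in DIRECTIONS}
            # Pairs come out as "North East", "North South", ..., "South West".
            for first, second in combinations(DIRECTIONS, 2):
                if legs[first] is not None and legs[first] == legs[second]:
                    pivots.append((row, col, legs[first], f"{first} {second}"))
    return pivots


if __name__ == "__main__":
    block = ["defC",
             "ghiB",
             "jklA",
             "CBA*"]
    print(find_pivots(block))  # [(3, 3, 'ABC', 'North West')]
```

In the example block, the symbol `*` has "CBA" above it and "CBA" to its left (in normal reading order), giving one North West pivot, the same orientation as both pivots in the 340 cipher.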

There are two pivots in the 340 cipher.


 * The first pivot has a pair of matching legs.
 * The leg length is 3.
 * The sequence of symbols is "RJ|".
 * The pair of matching legs is joined by the symbol "*".
 * The legs have North and West orientations.
 * The second pivot has a pair of matching legs.
 * The leg length is 3.
 * The sequence of symbols is "b.c".
 * The pair of matching legs is joined by the symbol "V".
 * The legs have North and West orientations.

= Pivot examples in plaintext =

Here are some other examples of pivots, each with leg length 3, found in some blocks of plain text:


 * One pair of legs: East South (from an excerpt of Gulliver's Travels, by Jonathan Swift):


 * One pair of legs: North South (from an excerpt of Gulliver's Travels, by Jonathan Swift):


 * Three legs, 3 possible pairs of legs: East South, East West, South West (from an excerpt of the King James Bible):


 * Four legs, six possible pairs of legs: North East, North South, North West, East South, East West, South West (from an excerpt of The Book of Mormon):

= Experiment details =

So, we want to see how often two pivots, each with a leg length of 3, occur naturally in samples of plain text written in a 340-character block similar to the Zodiac cipher. We designed an experiment to try to determine this.

First, we obtained a large selection of free books from Project Gutenberg in text format; the experiment uses 392 books. We also used a corpus of all of the Zodiac's known correspondence. Next, an algorithm processed the text by converting all letters to upper case and removing all spaces, new lines, punctuation, numbers, and any other non-alphabetic characters. This resulted in a single line of 142,398,107 characters.

We then split the resulting 142,398,107 characters into consecutive, non-overlapping 340-character samples and wrote each sample in a block of 20 rows and 17 columns. This gives 418,817 blocks (the last 327 characters do not fill a complete block). An algorithm then searches each block for pivots.
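
A minimal Python sketch of this preprocessing and sampling step follows. It assumes that only the 26 letters A to Z are kept (accented letters are simply dropped) and that the samples are consecutive and non-overlapping, which is consistent with the 418,817 blocks reported; `normalize` and `blocks` are hypothetical names, not the experiment's actual code.

```python
import re

ROWS, COLS = 20, 17
BLOCK_SIZE = ROWS * COLS  # 340 characters per sample, like the Zodiac cipher


def normalize(text):
    """Upper-case the text and remove every character that is not A-Z."""
    return re.sub(r"[^A-Z]", "", text.upper())


def blocks(letters):
    """Yield consecutive, non-overlapping 340-character samples, each as a list
    of 20 rows of 17 characters. A final remainder shorter than 340 is dropped."""
    for start in range(0, len(letters) - BLOCK_SIZE + 1, BLOCK_SIZE):
        sample = letters[start:start + BLOCK_SIZE]
        yield [sample[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


if __name__ == "__main__":
    # 142,398,107 letters split this way give 418,817 full blocks.
    print(142_398_107 // BLOCK_SIZE)
```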

= Results for pivots pointing in any direction =

First, we ignored the orientations of pivots, and counted any test that produced pivots with legs of size 3 pointing in any direction.

Results:


 * 129940 tests found at least 1 pivot (31.03%)
 * 25770 tests found at least 2 pivots (6.15%)
 * 4084 tests found at least 3 pivots (0.98%)
 * 580 tests found at least 4 pivots (0.14%)
 * 125 tests found at least 5 pivots (0.030%)
 * 64 tests found at least 6 pivots (0.015%)
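
These counts are cumulative: a block with three pivots is counted in the "at least 1", "at least 2" and "at least 3" rows. A small Python sketch of the tally, assuming a hypothetical list `pivot_counts` holding the number of pivots found in each block:

```python
def at_least_table(pivot_counts, max_k=6):
    """Map k to (blocks with at least k pivots, percentage of all blocks)."""
    total = len(pivot_counts)
    if total == 0:
        return {}
    table = {}
    for k in range(1, max_k + 1):
        n = sum(1 for count in pivot_counts if count >= k)
        table[k] = (n, 100.0 * n / total)
    return table


if __name__ == "__main__":
    # Toy input: four blocks containing 0, 1, 2 and 3 pivots.
    for k, (n, pct) in at_least_table([0, 1, 2, 3], max_k=3).items():
        print(f"{n} blocks had at least {k} pivots ({pct:.2f}%)")
    # The reported figure for two or more pivots, as a share of 418,817 blocks:
    print(f"{100 * 25770 / 418817:.2f}%")
```

The last line reproduces the 6.15% figure from 25,770 of 418,817 blocks.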

Here is an example test that found 2 pivots in the King James Bible:

Another from the King James Bible:

Some tests produced a very large number of pivots, generally caused by the appearance of many repeated words or phrases in the samples of text.

Consider this excerpt, also from the King James Bible:

''...thirty and nine. The Nethinims: the children of Ziha, the children of Hasupha, the children of Tabbaoth, The children of Keros, the children of Siaha, the children of Padon, The children of Lebanah, the children of Hagabah, the children of Akkub, The children of Hagab, the children of Shalmai, the children of Hanan, The children of Giddel, the children of Gahar, the children of Reaiah, The children of Rezin, the children of Ne...''

It contains many repetitions of the phrase "the children of", resulting in 21 detected pivots, clustered around the repetitions:

An excerpt of "Ulysses" by James Joyce produces 110 pivots, the most found during these tests, due to the repetitive nature of the text:

"...ercy Apjohn, the childman weary, the manchild in the womb. Womb? Weary? He rests. He has travelled.  With?  Sinbad the Sailor and Tinbad the Tailor and Jinbad the Jailer and Whinbad the Whaler and Ninbad the Nailer and Finbad the Failer and Binbad the Bailer and Pinbad the Pailer and Minbad the Mailer and Hinbad the Hailer and Rinbad the Railer and Dinbad the Kailer and Vinbad the Quailer and Linbad the Yailer and Xinbad..."

Our test results suggest there is a 6.15% chance of producing at least two pivots, in any orientation, when writing plain text out in a 20 by 17 block of text. But in the Zodiac killer's 340 character cipher, its two pivots appear with the same orientation. What are the chances that such pivots appear in our tests?

= Results for pivots pointing in the same direction =

The modified test, which only counts pivots that appear with the same orientation, had these results:


 * 129940 tests had at least 1 pivot (31.03%)
 * 6177 tests had at least 2 pivots (1.47%)
 * 340 tests had at least 3 pivots (0.081%)
 * 90 tests had at least 4 pivots (0.021%)
 * 50 tests had at least 5 pivots (0.012%)
 * 41 tests had at least 6 pivots (0.0098%)

Of the 6177 tests that had at least 2 pivots, 1085 of them (0.26% of all tests) were oriented in the North and West directions like the pivots in the Zodiac killer's 340 cipher.
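
One way to apply the same-orientation filter, sketched in Python under the assumption that each detected pivot is recorded as a hypothetical `(row, column, leg, orientation)` tuple, with the orientation written as in the definitions (for example "North West"). Grouping by orientation and taking the largest group gives the number of same-orientation pivots in a block; the function names are illustrative only.

```python
from collections import Counter


def same_orientation_count(pivots):
    """Largest number of pivots in one block that share an orientation pair."""
    counts = Counter(orientation for _row, _col, _leg, orientation in pivots)
    return max(counts.values(), default=0)


def matches_340_orientation(pivots, orientation="North West"):
    """True if at least two pivots have the orientation seen in the 340 cipher."""
    return sum(1 for p in pivots if p[3] == orientation) >= 2


if __name__ == "__main__":
    found = [(3, 3, "ABC", "North West"),
             (10, 5, "THE", "North West"),
             (7, 2, "AND", "East South")]
    print(same_orientation_count(found), matches_340_orientation(found))
```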

Here are some examples that have at least 2 pivots with the same orientation:


 * Excerpts from Gulliver's Travels, by Jonathan Swift:

Here's a pair of pivots with the same orientation as the ones in the Zodiac's 340 cipher:


 * Literary Blunders, by Henry Wheatley

The largest number of same-orientation pivots again came from an excerpt of "Ulysses" by James Joyce, which contains 63 pivots with North and East orientations:

= Letter frequencies within the pivots =

Here are the letter distributions within pivots of tests that found at least two pivots in the same direction:

This appears to match the expected distribution of letters in English text, although the frequencies seem to be skewed higher for the more common letters. By comparison, here are the letter frequencies for the entire tested corpus:
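
The comparison can be expressed as a ratio of relative frequencies. This Python sketch assumes the letters inside the pivots (whether joining characters are included is an assumption) have been collected into one string; `letter_frequencies` and `frequency_ratios` are hypothetical names. A ratio above 1 means a letter is more common inside pivots than in the corpus as a whole.

```python
from collections import Counter


def letter_frequencies(letters):
    """Relative frequency of each letter A-Z in a string of upper-case letters."""
    counts = Counter(ch for ch in letters if "A" <= ch <= "Z")
    total = sum(counts.values())
    return {ch: counts[ch] / total for ch in sorted(counts)}


def frequency_ratios(pivot_letters, corpus_letters):
    """Frequency of each corpus letter inside the pivots divided by its
    frequency in the whole corpus."""
    inside = letter_frequencies(pivot_letters)
    corpus = letter_frequencies(corpus_letters)
    return {ch: inside.get(ch, 0.0) / corpus[ch] for ch in corpus}


if __name__ == "__main__":
    print(frequency_ratios("THETHEE", "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"))
```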

= Homophonic encipherment after pivots are seen in the plain text =

The results suggest that when you write out some plain text in a 340-character block with 20 rows and 17 columns, there is about a 1.47% chance that at least two pivots will appear pointing in the same direction. The plain text must then be enciphered via homophonic substitution. Since the goal of homophonic substitution is to flatten out the frequencies of common plain text letters, there is a good chance that the pivots that appear in the plain text will no longer appear in the cipher text. This is because letters that are the same in legs of a pivot might be assigned to different cipher text symbols.

What are the odds that homophonic substitution will preserve the pivots? Let's assume the cipher author is not aware of pivot patterns in the plain text, and is just assigning symbols based on some simple homophonic scheme, such as sequential assignment or random assignment.

Let's use a simple example: a pivot that has two legs, each containing the trigram "THE". Each of the letters in "THE" is very common, so we'd expect each of them to have more than one cipher symbol assigned to it. Let's say that we can assign two different symbols to each letter in "THE" to help flatten the letter frequencies.

The homophonic scheme is unknown, and the sequence of occurrences of "T" in the plaintext is unknown. So, to simplify the analysis, we assume that a symbol is selected with a fair probability, like a coin toss. For instance, if "heads", we assign symbol A to plaintext T; if "tails", we assign symbol B. The possible outcomes for the pair of "T"s (one in each leg) are: AA, AB, BA, BB. Two of these outcomes, AA and BB, give matching symbols in the two legs, so there is a probability of 0.5 that the symbols match.

Similarly, there is a probability of 0.5 that the symbols assigned for each "H" match. And, there is a probability of 0.5 that the symbols assigned for each "E" match.

For all three pairs of assignments to match, which preserves the matching legs of a pivot, the probability is: 0.5 * 0.5 * 0.5 = 0.125 (1 in 8).
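
The coin-toss argument can be checked with a short simulation. This Python sketch keeps the same simplifying assumption as the text, that every occurrence of a letter picks one of its k symbols uniformly and independently, so two legs of length 3 match with probability (1/k)^3. The function names are hypothetical.

```python
import random


def match_probability_exact(symbols_per_letter, leg_length=3):
    """Chance that two legs receive identical symbols: 1/k per letter position."""
    return (1 / symbols_per_letter) ** leg_length


def match_probability_simulated(symbols_per_letter, leg_length=3,
                                trials=100_000, seed=1):
    """Estimate the same chance by enciphering the two legs independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Index of the symbol chosen for each letter of each leg (0 .. k-1).
        leg_a = [rng.randrange(symbols_per_letter) for _ in range(leg_length)]
        leg_b = [rng.randrange(symbols_per_letter) for _ in range(leg_length)]
        hits += leg_a == leg_b
    return hits / trials


if __name__ == "__main__":
    print(match_probability_exact(2), match_probability_simulated(2))
```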

Here is another way to look at it. If we write out the two legs like this (ignoring the joining character):

THE THE

We must assign one of two possible symbols for each plaintext letter. Let's say the possibilities for each letter are:


 * T: U and V
 * H: W and X
 * E: Y and Z

Then all the possible encodings of "THE THE" are:

'''UWY UWY''', UWY UWZ, UWY UXY, UWY UXZ, UWY VWY, UWY VWZ, UWY VXY, UWY VXZ, UWZ UWY, '''UWZ UWZ''', UWZ UXY, UWZ UXZ, UWZ VWY, UWZ VWZ, UWZ VXY, UWZ VXZ, UXY UWY, UXY UWZ, '''UXY UXY''', UXY UXZ, UXY VWY, UXY VWZ, UXY VXY, UXY VXZ, UXZ UWY, UXZ UWZ, UXZ UXY, '''UXZ UXZ''', UXZ VWY, UXZ VWZ, UXZ VXY, UXZ VXZ, VWY UWY, VWY UWZ, VWY UXY, VWY UXZ, '''VWY VWY''', VWY VWZ, VWY VXY, VWY VXZ, VWZ UWY, VWZ UWZ, VWZ UXY, VWZ UXZ, VWZ VWY, '''VWZ VWZ''', VWZ VXY, VWZ VXZ, VXY UWY, VXY UWZ, VXY UXY, VXY UXZ, VXY VWY, VXY VWZ, '''VXY VXY''', VXY VXZ, VXZ UWY, VXZ UWZ, VXZ UXY, VXZ UXZ, VXZ VWY, VXZ VWZ, VXZ VXY, '''VXZ VXZ'''

Only 8 of the 64, or 1 in 8, possible encodings above have identical legs, preserving the pivot. They are in boldface.
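
The count of 8 out of 64 can be reproduced by brute-force enumeration. This Python sketch uses the example symbols U to Z from the list above; `HOMOPHONES` and `encode_options` are illustrative names.

```python
from itertools import product

# Two candidate symbols per plaintext letter, as in the example.
HOMOPHONES = {"T": "UV", "H": "WX", "E": "YZ"}


def encode_options(word):
    """Every possible symbol string for `word` under HOMOPHONES."""
    return ["".join(symbols) for symbols in product(*(HOMOPHONES[ch] for ch in word))]


legs = encode_options("THE")                    # 8 encodings of one leg
pairs = list(product(legs, repeat=2))           # 64 encodings of "THE THE"
preserved = [f"{a} {b}" for a, b in pairs if a == b]
print(len(pairs), len(preserved))               # 64 8
```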

So, the two events that need to happen are: two pivots appear in the plain text, AND encipherment preserves them. Given a probability of 0.0147 that two pivots with the same orientation appear in the plaintext, the probability that both events happen is: 0.0147 * 0.125 * 0.125 = 0.00023 (about 1 in 4,354).

It seems more likely that common letters would have more than 2 cipher symbol assignments. For instance, the Zodiac killer's solved 408 character cipher has the following number of symbol assignments for each of the common letters:


 * E: 7
 * T: 4
 * S: 4
 * O: 4
 * N: 4
 * I: 4
 * A: 4
 * R: 3
 * L: 3

What are the odds of cipher text pivots appearing if there are, on average, three possible symbol assignments for the plaintext letters in the legs of the two pivots?

The probability that all three pairs of symbol assignments match for a single pivot is then: (1/3) * (1/3) * (1/3) = 0.037 (1 in 27).

Then the probability that two pivots will appear in the cipher text, given a probability of 0.0147 that the two pivots appear in the plaintext, is: 0.0147 * 0.037 * 0.037 = 0.000020 (about 1 in 49,592).

What if there are 4 possible symbols for each letter in the pivot legs?

The probability that all three pairs of symbol assignments match for a single pivot is then: 0.25 * 0.25 * 0.25 = 0.016 (1 in 64).

The two-pivot probability then becomes 0.0147 * (1/64) * (1/64) = 3.59 x 10^-6 (about 1 in 278,639).

Therefore, if there is more homophonic substitution going on, it gets harder for two pivots with the same orientation to appear in the cipher text. The unsolved 340 cipher has more unique symbols than the 408, so we'd expect a high amount of frequency flattening, reducing the chances that plaintext pivots are accidentally preserved in the cipher text.
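
The arithmetic for two, three and four symbols per letter can be reproduced with a few lines of Python. This keeps the text's simplifying assumptions (each letter's symbol chosen uniformly and independently, two pivots, legs of length 3) and uses the 0.0147 plaintext rate from the tests above; `two_pivot_probability` is a hypothetical name.

```python
P_PLAINTEXT = 0.0147  # chance of two same-orientation pivots in plain text
LEG_LENGTH = 3
PIVOTS = 2


def two_pivot_probability(symbols_per_letter):
    """Plaintext rate times the chance that every pivot survives encipherment."""
    per_pivot = (1 / symbols_per_letter) ** LEG_LENGTH
    return P_PLAINTEXT * per_pivot ** PIVOTS


if __name__ == "__main__":
    for k in (2, 3, 4):
        p = two_pivot_probability(k)
        print(f"{k} symbols per letter: {p:.3g} (about 1 in {1 / p:,.0f})")
```

Running it gives about 1 in 4,354, 1 in 49,592 and 1 in 278,639, matching the figures above.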

= Conclusions =

Here are some of the possibilities for the 340 cipher based on this analysis (can you think of any others?):


 * The plaintext is written and encoded using the same approach the killer used with the 408 character cipher, but the plaintext has an unusual number of repeated words or phrases, increasing the chances of pivot formation.
 * The plaintext is written and encoded using the same approach the killer used with the 408 character cipher, and the plaintext has a "normal" distribution of words or phrases, but encipherment just happened to beat the odds on the accidental formation of pivots.
 * When coming up with symbol assignments, the cipher author intentionally preserved the pivots he saw in the plain text.
 * The cipher is encoded using some other scheme that increases the likelihood of pivot formation.
 * The cipher is balderdash, making articles such as this one quite silly.

Overall, it seems unlikely that the pivots formed by accident, but it remains possible that they did.

So far, we've ignored the possibility that some cipher symbols may represent more than one plaintext letter. In the 408 cipher, some cipher symbols are known to behave this way, either intentionally by the author, or by accident via encipherment errors and/or misspellings. If the killer assigned the same symbols of the 340 cipher to multiple plain text letters, then the cipher text pivots could be appearing where there were no corresponding plain text pivots.

= Open questions, Ideas for future research =


 * Include a simulation of various kinds of homophonic encipherment (sequential and random, and a mix of the two), to get a better sense of how often the pivots are preserved after encipherment.
 * Do repetitions of other patterns in the 340 cipher, in multiple orientations, suggest a different encipherment technique, such as transposition?
 * Some of the tests found pivots that overlap. Perhaps we should modify the experiments to exclude them. How much would this reduce the pivot counts?
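
As a possible starting point for the simulation suggested in the first item, here is a minimal Python sketch of the two simple schemes mentioned earlier, sequential and random homophone assignment (a mix of the two is not covered). The function `encipher` and its parameters are hypothetical, and the symbol table in the example is the U to Z example from above, not the Zodiac's key.

```python
import random
from itertools import cycle


def encipher(plaintext, homophones, mode="sequential", seed=None):
    """Homophonic substitution of an upper-case plaintext.

    homophones maps each plaintext letter to a string of cipher symbols.
    mode="sequential": each letter cycles through its symbols in order.
    mode="random": each occurrence picks one of its symbols uniformly at random.
    """
    if mode not in ("sequential", "random"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    cycles = {letter: cycle(symbols) for letter, symbols in homophones.items()}
    out = []
    for ch in plaintext:
        if mode == "sequential":
            out.append(next(cycles[ch]))
        else:
            out.append(rng.choice(homophones[ch]))
    return "".join(out)


if __name__ == "__main__":
    key = {"T": "UV", "H": "WX", "E": "YZ"}
    cipher = encipher("THETHE", key)   # sequential: "UWYVXZ"
    print(cipher[:3] == cipher[3:])    # False: the two "THE" legs no longer match
```

Under strict sequential assignment, whether a pivot survives is determined by how many times each letter has already occurred, not by a coin toss, which is one reason the two schemes may give different preservation rates.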