Encyclopedia of observations

This page is meant to collect the many factual observations and studies people have made about the ciphers over the years. It is a work in progress.

=408-character cipher observations=

Ngram observations

 * Cipher contains 62 repeated bigrams (illustration), 11 repeated trigrams (illustration), and 2 repeated quadgrams (illustration).
 * In a test of 1,000,000 random shuffles, none had 62 or more repeated bigrams (Plot of distribution), and the average number of repeats was 27.
 * In a test of 1,000,000 random shuffles, none had 11 or more repeated trigrams (Plot of distribution), and the average number of repeats was 0.5.
 * In a test of 1,000,000 random shuffles, 249 (0.025%) had 2 or more repeated quadgrams (Plot of distribution), and the average number of repeats was 0.0095.
 * 5-gram repeating fragments study
 * TODO: Multidirectional ngram study (including multiple distances, and repeating fragments)
 * TODO: Periodicity of repeating fragments study
 * Reading the cipher text from left to right and top to bottom, the longest sequence containing no repeated bigrams starts at position 182, and has length 100 (Illustration)
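
The repeat counts and shuffle tests above can be reproduced along the following lines. This is only a sketch: the page does not state exactly how a "repeat" is counted, so the code assumes every occurrence of an ngram beyond its first counts as one repeat. The function names and the `trials` and `seed` parameters are illustrative, and the cipher transcription (one character or token per symbol) must be supplied by the reader.

```python
import random
from collections import Counter

def repeated_ngrams(text, n):
    """Count ngram repeats; each occurrence after the first counts once (assumed definition)."""
    counts = Counter(tuple(text[i:i + n]) for i in range(len(text) - n + 1))
    return sum(c - 1 for c in counts.values() if c > 1)

def shuffle_test(text, n, trials=1000, seed=0):
    """Shuffle the symbols `trials` times.

    Returns (fraction of shuffles with at least the observed number of
    repeats, average number of repeats per shuffle).
    """
    rng = random.Random(seed)
    observed = repeated_ngrams(text, n)
    symbols = list(text)
    hits = 0
    total = 0
    for _ in range(trials):
        rng.shuffle(symbols)
        repeats = repeated_ngrams(symbols, n)
        total += repeats
        hits += repeats >= observed
    return hits / trials, total / trials

# Toy input only; replace with a real transcription of the cipher.
fraction, average = shuffle_test("ABCABCABDDEF", 2, trials=500)
```

If the counts on this page were instead produced by counting distinct ngrams that occur more than once, replace the `sum(...)` line with `sum(1 for c in counts.values() if c > 1)`.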

Repeated symbols by columns and rows

 * TODO

Homophone cycles

 * TODO

Other observations

 * The last 18 letters of the 408 decode to a sequence of gibberish: EBEORIETEMETHHPITI
 * Many of the last 18 symbols also appear directly above themselves in the same columns. The "QEHM" sequence is particularly noteworthy.  (Image) (Source: glurk)
 * There are misspelled words in the plaintext when the Harden key is applied. When some words are corrected, the symbols assigned to the corrected plaintext letters often resemble the symbols assigned to the original erroneous plaintext letters.  (Image)  Several homophone cycles show improved regularity when these corrections are made (More details)
 * Several symbol/letter assignments in the 408 key seem to reflect adjacencies on QWERTY-layout keyboards (the standard keyboard layout for typewriters and computer keyboards). (Source)
 * Some plaintext appears to be missing at the boundary between Parts 2 and 3 of the cipher ("ALL THE ???? I HAVE KILLED WILL BECOME MY SLAVES")  (Annotated plaintext)
 * More details on errors in the cipher text and on the Hardens' key
 * The homophone cycle sequence for plaintext letter "L" is not as regular as cycle sequences for other common letters (i.e., his symbol assignments for plaintext "L" were more random) (More details on cycles)  The Hardens said they guessed Zodiac would have used the word KILL repeatedly, and an enciphered bigram representing LL occurs 6 times in the cipher text.  Perhaps a more regular cycle for L, involving more symbols, would have thwarted this guess by the Hardens.
 * Many of the highly regular homophone cycles break down in Part 3 of the cipher text for some unknown reason (More details)
 * The "Concerned Citizen" key, sent to Sgt. Lynch of Vallejo PD 2 days after the Hardens' solution was published, contained some differences from the Hardens' key (More details)
 * "All roads lead to E" phenomenon of the key (Explanations and details) (Image 1) (Image 2)
 * Adjacent sequences in the 408's key (More info)
 * Some symbols in the 408 were not reused in the 340's alphabet (Image)

=340-character cipher observations=

Ngram observations

 * Cipher contains 25 repeated bigrams (illustration), 2 repeated trigrams (illustration), and 0 repeated quadgrams.
 * In a test of 1,000,000 random shuffles, 110,147 (11%) had 25 or more repeated bigrams (Plot of distribution), and the average number of repeats was 20.
 * In a test of 1,000,000 random shuffles, 67,573 (7%) had 2 or more repeated trigrams (Plot of distribution), and the average number of repeats was 0.4.
 * In a test of 1,000,000 random shuffles, 7,857 (0.8%) had at least one repeated quadgram (Plot of distribution), and the average number of repeats was 0.008.
 * 5-gram repeating fragments study
 * The IOF trigram repeats in the same columns, with its two occurrences spaced 8 lines apart, reminiscent of the 8-line height of each of the original 3 parts of the 408 cipher. (Illustration)
 * TODO: Multidirectional ngram study (including multiple distances, and repeating fragments)
 * TODO: Periodicity of repeating fragments study
 * Reading the cipher text from left to right and top to bottom, the longest sequence containing no repeated bigrams starts at position 168, and has length 123 (Illustration)
 * TODO: periodic ngram / fragment test of individual halves of the 340 (horizontally and vertically), as well as individual "even/odd" transformations.
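
The "longest sequence containing no repeated bigrams" measurement can be sketched with a sliding window over the bigram sequence. The page does not say whether its length is counted in symbols or in bigrams, nor whether positions start at 0 or 1. This sketch assumes 1-based positions and a length in symbols, so results may be offset from the figures quoted above. The function name is illustrative.

```python
def longest_run_without_repeated_bigrams(text):
    """Return (start, length) of the longest stretch in which no bigram occurs twice.

    `start` is a 1-based symbol position and `length` is counted in symbols
    (both are assumptions about the convention used on this page).
    """
    bigrams = [tuple(text[i:i + 2]) for i in range(len(text) - 1)]
    if not bigrams:
        return 1, len(text)
    last_seen = {}            # bigram -> index of its most recent occurrence
    best_left, best_width = 0, 0
    left = 0                  # index of the first bigram in the current window
    for right, bigram in enumerate(bigrams):
        # If this bigram already sits inside the window, move the window
        # start just past its previous occurrence.
        if bigram in last_seen and last_seen[bigram] >= left:
            left = last_seen[bigram] + 1
        last_seen[bigram] = right
        width = right - left + 1
        if width > best_width:
            best_left, best_width = left, width
    # A window of k consecutive bigrams spans k + 1 symbols.
    return best_left + 1, best_width + 1

# Toy input only; replace with a real transcription read left to right, top to bottom.
start, length = longest_run_without_repeated_bigrams("ABABCD")
```

When several stretches tie for the longest, the sketch reports the earliest one.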

Unusual biases in the number of bigram repeats

 * The lower left of the cipher text seems to contain very few repeated bigrams (see illustration)
 * The top half of the cipher text, considered on its own, contains 9 repeated bigrams. However, the bottom half of the cipher text, considered on its own, contains only 1 repeated bigram.
 * In 1,000,000 shuffles, only 2.4% of them had halves with a repeated bigram discrepancy as large as the one observed in the 340 (i.e., a difference of at least 8 repeated bigrams between the halves).
 * (Distribution of repeated bigram discrepancy among shuffles)
 * By contrast, the 408's top half has 17 repeated bigrams and its bottom half has 14.
 * The cipher text also shows a bias in repeated bigram counts within even positions and odd positions.
 * If you remove all symbols that are in even-numbered positions, there are only 2 repeated bigrams.
 * In this case, there are also 0 repeated trigrams.
 * If you remove all symbols that are in odd-numbered positions, there are 10 repeated bigrams.
 * In this case, there are also 2 repeated trigrams.
 * (Illustration)
 * The difference in bigram repeats between the two cases is 8. During shuffle tests, the difference was 8 or higher in only 2.4% of shuffles.  This is the same difference and percentage found for the top half / bottom half bias.
 * TODO: The appearance of the box corner patterns
 * TODO: ngram period bias
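
The two biases described above (top half versus bottom half, and odd positions versus even positions) can be tested with the same shuffle approach. The following is a rough sketch. It assumes each extra occurrence of a bigram counts as one repeat, that the halves are split at the midpoint of the symbol sequence (170 symbols each for the 340), and that bigrams within each subsequence are formed after the other symbols have been removed. All function names are illustrative.

```python
import random
from collections import Counter

def repeated_bigrams(symbols):
    """Count bigram repeats; each occurrence after the first counts once (assumed definition)."""
    counts = Counter(zip(symbols, symbols[1:]))
    return sum(c - 1 for c in counts.values() if c > 1)

def half_discrepancy(symbols):
    """Absolute difference in repeated bigrams between the first and second half."""
    mid = len(symbols) // 2
    return abs(repeated_bigrams(symbols[:mid]) - repeated_bigrams(symbols[mid:]))

def parity_discrepancy(symbols):
    """Absolute difference between the odd-position and even-position subsequences.

    symbols[0::2] keeps 1-based positions 1, 3, 5, ... (even positions removed);
    symbols[1::2] keeps 1-based positions 2, 4, 6, ... (odd positions removed).
    """
    return abs(repeated_bigrams(symbols[0::2]) - repeated_bigrams(symbols[1::2]))

def fraction_at_least(symbols, statistic, trials=1000, seed=0):
    """Fraction of random shuffles whose statistic is at least the observed value."""
    rng = random.Random(seed)
    observed = statistic(symbols)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        hits += statistic(pool) >= observed
    return hits / trials

# Toy input only; replace with a real transcription of the 340.
toy = "ABABABCDEFGH"
p_half = fraction_at_least(toy, half_discrepancy, trials=500)
p_parity = fraction_at_least(toy, parity_discrepancy, trials=500)
```

Bigrams that straddle the boundary between the two halves are not counted in either half under this split.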

Repeated symbols by columns and rows

 * TODO

Homophone cycles

 * TODO: Basic study
 * TODO: Detection of cycles among all the manipulations we looked at for ngrams

Other observations

 * Some symbols in the 340 were not in the 408's alphabet (Image)
 * The paper the cipher is written on contains several Fifth Avenue watermarks (here's glurk's animation to help you spot where they appear).
 * The first 3 symbols of the 340 do not reappear soon. It starts with "HER"; H appears only 3 other times and E only 2 other times. (From Duman)
 * In the third column, the 3 occurrences of the "R" symbol are evenly spaced, each separated by four rows. (From Wrench)
 * The author of the cipher does not use the forward-facing letter Q as a cipher symbol, but uses a backwards-facing Q instead. Similarly, he does not use the forward-facing letter C as a cipher symbol in the 408-character cipher, but uses a backwards-facing C instead.  Why doesn't he use up all the normal alphabetic symbols before resorting to additional symbols and variations?
 * The first repeated symbol occurs at the 19th position. Thus the first 18 symbols contain no repeats.  Coincidentally, the last 18 symbols of the 408 cipher do not form a legible solution.  (From traveller1st)
 * The most frequently occurring symbol, +, occurs 24 times. Only once does it fall on a prime-numbered position in the cipher text (counting from 1 to 340), against expectations.  Also, both occurrences of the X symbol fall on prime positions, against expectations.  (More detail) (From Dan Johnson)  (TODO: Prime phobia distribution study for all symbols)
 * The second most frequently occurring symbol, B, occurs 12 times. Only once does it fall on a prime-numbered position in the cipher text (counting from 1 to 340), against expectations.  (More detail) (From Dan Johnson)
 * The upper loop of the "B" symbol on line 19 column 9 is larger than the lower loop, making the symbol look disproportional and distinct from the other "B" symbols. (From Doc.  Doc also speculates the symbol was originally a "P" and the author corrected it by adding the bottom loop.)
 * The last occurrence of the "+" symbol is wider than the others. (From Doc.  Doc speculates that the author "hesitated" on this symbol.)
 * Cipher symbols seem to get larger as the cipher goes on. Row 20 seems larger than Row 1.  (From Doc.  Doc speculates the author wrote the cipher from top to bottom, and was tiring.)
 * The backwards D at the end of the first line appears to have a dot in it (Source: comment from "The McNX")
 * The symbols that represent "R" in the 408 are [[File:br.jpg]] and [[File:backslash.jpg]]. Both symbols are missing from the 340 cipher.  (Source: Wier)
 * The symbol '+' is frequently adjacent (in all directions, not just left/right) to the symbol 'R'. (Source: http://www.zodiackillersite.com/viewtopic.php?p=39791#p39791).  'B' is the 2nd most common symbol in the cipher (12 appearances), but for some reason it is adjacent to only one of the 24 '+' symbols.
 * A tiny 'R' is written at the bottom right corner of the page. Similarly, information is written at the bottom right corner of a section of the 408, and the bottom right corner of the letter containing the map code.
 * Cycles of length 2 are biased towards odd-numbered positions in the cipher text
 * The 340 has 9 rows that each have no repeated symbols. By comparison, the 408 has 6.  Moreover, in the 340, there are two triplets of rows that show symmetry about the vertical midpoint of the cipher text.  (Image of row repeat comparisons)
 * A forward-K appears to be scratched out, and corrected with a backwards K symbol. (Image)
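
The "against expectations" remarks about prime-numbered positions can be made concrete. There are 68 primes between 1 and 340, exactly one fifth of all positions, so a symbol placed at random would be expected on a prime position about 4.8 times in 24 occurrences (for +), 2.4 times in 12 (for B), and 0.4 times in 2 (for X). The sketch below also computes the chance of seeing at most a given number of prime hits, assuming a symbol's positions behave like a uniform random sample without replacement (a hypergeometric model). That model is an assumption of this example, not something stated in the sources above, and the function names are illustrative.

```python
from math import comb

def is_prime(k):
    """Trial-division primality test; sufficient for positions up to a few hundred."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def prime_position_stats(length, occurrences, observed_on_primes):
    """Return (number of prime positions, expected prime hits, lower-tail probability).

    The tail is P(X <= observed_on_primes) for X hypergeometric: `occurrences`
    positions drawn without replacement from 1..length.
    """
    primes = sum(is_prime(p) for p in range(1, length + 1))
    expected = occurrences * primes / length
    tail = sum(
        comb(primes, k) * comb(length - primes, occurrences - k)
        for k in range(observed_on_primes + 1)
    ) / comb(length, occurrences)
    return primes, expected, tail

# The "+" symbol: 24 occurrences, 1 of them on a prime position.
plus_primes, plus_expected, plus_tail = prime_position_stats(340, 24, 1)
# The "B" symbol: 12 occurrences, 1 of them on a prime position.
b_primes, b_expected, b_tail = prime_position_stats(340, 12, 1)
```

Because many symbols are examined, some low tail probabilities will turn up by chance alone, which is one reason a distribution study across all symbols (the TODO above) would be useful.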

=408 and 340=

 * "The 408 and 340 ciphers fit exactly into two standard-size pages ruled into half-inch squares. The 408 will use one page plus one inch from another page. The second page will then fit the 340 exactly." (from Jem) (Open question: Is the 340 a continuation of the 408?)
 * "An 8 1/2 X 11 inch paper divided into 1/2" squares yields 17X22 or 374 separate squares. Two pages together: 374 X 2= 748. The two ciphers together? 748 symbols." (from entropy)
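
Both quoted claims can be checked with simple arithmetic. The sketch below assumes a US letter page (8.5 by 11 inches) ruled into half-inch squares, as described in the quotes. The variable names are illustrative.

```python
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0
SQUARE_IN = 0.5

columns = int(PAGE_WIDTH_IN / SQUARE_IN)     # 17 squares across
rows = int(PAGE_HEIGHT_IN / SQUARE_IN)       # 22 squares down
squares_per_page = columns * rows            # 374

# The 408 fills the first page and spills onto the second.
overflow_symbols = 408 - squares_per_page    # 34 symbols
overflow_rows = overflow_symbols // columns  # 2 rows
overflow_inches = overflow_rows * SQUARE_IN  # 1.0 inch

# What is left on the second page after the overflow.
remaining_squares = squares_per_page - overflow_symbols  # 340

assert squares_per_page * 2 == 408 + 340     # 748 symbols in total
assert remaining_squares == 340
```

The arithmetic works out because both ciphers are 17 symbols wide: the 408 has 24 rows and the 340 has 20 rows, for 44 rows in total, which is two pages of 22 rows each.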