Quadrant analysis

From Zodiac Ciphers

Introduction

There is speculation that the 340-character cipher is arranged differently from the 408-character cipher. The 408 is arranged as normal text, read left to right, top to bottom. The solution to the 340 has not yet been found. Perhaps this is because the text of the 340 is arranged in some fashion other than left to right, top to bottom.

Some speculation about the 340 points out coincidences between it and the Zodiac killer's Halloween card (source). Could the 340 be arranged in a quadrant-based layout, as suggested by a drawing the killer included in the Halloween card?

Approach

This analysis investigates the repeated n-gram counts in the cipher texts, and the effects of quadrant-based layouts on those counts.

The cipher text is split into quadrants by placing one horizontal line between rows and one vertical line between columns of cipher symbols. The two lines intersect and divide the original cipher block into 4 rectangular regions. A new cipher text is then created from the content of each region.

The location of the quadrant intersection is defined by a point (i,j), where i is a row, and j is a column. Let w be the width, in characters, of the cipher, and h be the height, in characters, of the cipher. Rows and columns are numbered starting with 0. When the point (i,j) is selected, the quadrants are determined as follows:

  • Upper-left quadrant: Rows 0 through i-1, Columns 0 through j-1
  • Upper-right quadrant: Rows 0 through i-1, Columns j through w-1
  • Lower-left quadrant: Rows i through h-1, Columns 0 through j-1
  • Lower-right quadrant: Rows i through h-1, Columns j through w-1
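The quadrant definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function names, the toy grid, and in particular the order in which the four regions are joined into a new cipher text (upper-left, upper-right, lower-left, lower-right) are assumptions, since the reading order is not specified here.

```python
def split_quadrants(grid, i, j):
    """Split a grid (list of equal-length strings) at the point (i, j).

    Rows and columns are numbered from 0, as in the definitions above.
    """
    upper, lower = grid[:i], grid[i:]
    return {
        "upper_left": [row[:j] for row in upper],
        "upper_right": [row[j:] for row in upper],
        "lower_left": [row[:j] for row in lower],
        "lower_right": [row[j:] for row in lower],
    }


def quadrant_text(grid, i, j):
    """Join the four regions into one text.

    ASSUMPTION: regions are read in the order UL, UR, LL, LR, each one
    left to right, top to bottom.
    """
    quads = split_quadrants(grid, i, j)
    order = ("upper_left", "upper_right", "lower_left", "lower_right")
    return "".join("".join(quads[name]) for name in order)


# Toy 3x4 grid standing in for a cipher block (letters are placeholders).
grid = ["ABCD", "EFGH", "IJKL"]
quads = split_quadrants(grid, 1, 2)
text = quadrant_text(grid, 1, 2)
```

With i=0 or j=0 two of the regions are empty, and under the assumed reading order the joined text is the same as the original text.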

Additionally, to account for the possible presence of a border between the quadrants, this analysis includes a border flag. When the flag is false, no borders are considered, and all of the cipher text is used. When the flag is true, the cipher symbols in row i and column j are deleted before analysis, which results in the following quadrant definitions:

  • Upper-left quadrant: Rows 0 through i-1, Columns 0 through j-1
  • Upper-right quadrant: Rows 0 through i-1, Columns j+1 through w-1
  • Lower-left quadrant: Rows i+1 through h-1, Columns 0 through j-1
  • Lower-right quadrant: Rows i+1 through h-1, Columns j+1 through w-1
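As a rough sketch, the border flag only changes where the lower and right-hand regions start. The function name and the toy grid below are hypothetical and chosen purely for illustration.

```python
def split_quadrants_with_border(grid, i, j, border=False):
    """Return (upper_left, upper_right, lower_left, lower_right).

    When border is True, row i and column j are dropped, so the lower
    regions start at row i+1 and the right regions at column j+1.
    """
    skip = 1 if border else 0
    upper = grid[:i]
    lower = grid[i + skip:]
    return (
        [row[:j] for row in upper],
        [row[j + skip:] for row in upper],
        [row[:j] for row in lower],
        [row[j + skip:] for row in lower],
    )


# Toy 4x5 grid; the letters are placeholders for cipher symbols.
grid = ["ABCDE", "FGHIJ", "KLMNO", "PQRST"]
ul, ur, ll, lr = split_quadrants_with_border(grid, 1, 2, border=True)
```

In this example row 1 ("FGHIJ") and column 2 ("C", "H", "M", "R") act as the border and do not appear in any region.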

Finally, this analysis includes rotation and flip operations which are performed on the cipher text before quadrants are determined. Each cipher text is considered in four configurations:

  • No rotation, unflipped (x=R0F0)
  • Rotated 90 degrees CW, unflipped (x=R90F0)
  • No rotation, flipped horizontally (x=R0F1)
  • Rotated 90 degrees CW, flipped horizontally (x=R90F1)

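Other combinations of rotations and flips are excluded because the n-gram counts would be equivalent to one of the above configurations. For example, a 180-degree rotation simply reverses the reading order of the text, which leaves the repeated n-gram counts unchanged.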
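The four configurations could be produced along the following lines. This is a minimal sketch with hypothetical helper names; it assumes that the rotation is applied before the flip, which is not stated explicitly here.

```python
def rotate_cw(grid):
    """Rotate a grid (list of equal-length strings) 90 degrees clockwise."""
    return ["".join(col) for col in zip(*grid[::-1])]


def flip_horizontal(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def configure(grid, name):
    """Apply one of the configurations R0F0, R90F0, R0F1, R90F1.

    ASSUMPTION: rotation first, then flip.
    """
    out = rotate_cw(grid) if name.startswith("R90") else list(grid)
    if name.endswith("F1"):
        out = flip_horizontal(out)
    return out


# Toy 2x3 grid; the letters are placeholders for cipher symbols.
grid = ["ABC", "DEF"]
configs = {name: configure(grid, name)
           for name in ("R0F0", "R90F0", "R0F1", "R90F1")}
```

Note that a rotated grid has its width and height swapped, so the valid ranges of i and j swap as well.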

The total number of combinations of (i, j, border, x) tested here is therefore:

  • h * w * 2 * 4 = 24 * 17 * 2 * 4 = 3264 (for the 408)
  • h * w * 2 * 4 = 20 * 17 * 2 * 4 = 2720 (for the 340)
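The totals can be checked by enumerating the search space directly. The sketch below assumes the 408 is laid out as 24 rows of 17 symbols and the 340 as 20 rows of 17 symbols; the function name is hypothetical.

```python
from itertools import product

CONFIGS = ("R0F0", "R90F0", "R0F1", "R90F1")


def count_combinations(h, w):
    """Count every (i, j, border, x) selection for an h-by-w cipher.

    A rotated grid swaps h and w, but the number of (i, j) points stays
    h * w, so the ranges are not swapped here.
    """
    space = product(range(h), range(w), (False, True), CONFIGS)
    return sum(1 for _ in space)


total_408 = count_combinations(24, 17)
total_340 = count_combinations(20, 17)
```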

How many of these combinations produce n-gram counts that are superior to the original, unaltered ciphers?

The n-gram counts are combined into a single score, which measures the average rank of each n-gram count relative to all counts. For example, the bi-gram rank is a value from 0 to 1, obtained by dividing the bi-gram count by the maximum bi-gram count over all tested variations of the cipher. The rankings for n={2, 3, 4, 5} are summed and divided by 4 to produce an average rank. We use this composite rank to order the tested combinations of variables.
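The composite rank described above might be computed roughly as follows. The function name and the counts in the example are invented purely for illustration; they are not results from this analysis.

```python
def composite_ranks(counts_by_variation, ns=(2, 3, 4, 5)):
    """Average, over n, of each count divided by the maximum count for that n.

    counts_by_variation maps a variation label to {n: repeated n-gram count}.
    """
    maxima = {n: max(c[n] for c in counts_by_variation.values()) for n in ns}
    scores = {}
    for label, counts in counts_by_variation.items():
        # A maximum of 0 means no variation has repeats for this n.
        ranks = [counts[n] / maxima[n] if maxima[n] else 0.0 for n in ns]
        scores[label] = sum(ranks) / len(ns)
    return scores


# Made-up counts for two hypothetical variations of one cipher.
example = {
    "original": {2: 40, 3: 10, 4: 2, 5: 0},
    "variant": {2: 20, 3: 5, 4: 4, 5: 0},
}
scores = composite_ranks(example)
ordered = sorted(scores, key=scores.get, reverse=True)
```

Because each count is divided by the maximum over all variations, a score can only be compared with other scores from the same batch of tested variations.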

Each rank is computed separately for two types of counts:

  • Total number of repeated n-grams
  • Total count of unique n-grams that are repeated
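The two counts can be illustrated with a short sketch. The exact definition of the first count is not spelled out here, so the code assumes it means the number of occurrences beyond the first for every n-gram that appears more than once; counting all occurrences of repeated n-grams would be an equally plausible reading. The function name is hypothetical.

```python
from collections import Counter


def repeated_ngram_counts(symbols, n):
    """Return (total_repeats, unique_repeated) for a sequence of symbols.

    total_repeats:   ASSUMED to be the sum of (occurrences - 1) over all
                     n-grams seen more than once.
    unique_repeated: number of distinct n-grams seen more than once.
    """
    grams = Counter(
        tuple(symbols[k:k + n]) for k in range(len(symbols) - n + 1)
    )
    repeated = [count for count in grams.values() if count > 1]
    return sum(count - 1 for count in repeated), len(repeated)


# Toy text: "AB" occurs 3 times and "BA" twice.
total_repeats, unique_repeated = repeated_ngram_counts("ABABABCD", 2)
```

The function accepts either a string or a list of symbols, which is convenient when a cipher symbol is not a single character.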

We are looking for combinations of variables that produce more "patterns", or repeated n-grams, in the resulting cipher text.

Data

Observations

  • Based on the rankings of unique n-gram counts, only three combinations of variables produced a ranking higher than the unmodified 408's ranking. The remaining 3261 combinations produced a ranking equal to or lower than the unmodified 408's ranking.
  • By contrast, based on the same rankings of unique n-gram counts, 547 combinations of variables produced a ranking higher than the unmodified 340's ranking. This amounts to about 20% of the 2720 possible combinations, a marked difference from the 408's results.

The plots below show repeated 2-gram counts for the unrotated, unflipped 408 and 340 ciphers as the quadrant intersection point (i,j) varies. The bottom front axis is the column j. The right axis is the row i. The left axis is the count of repeated 2-grams.

408-2gram-counts.png

340-2gram-counts.png

On the plots, the points at column j=0 indicate 2-gram counts for the cipher when it is not using quadrants. Note how the maxima for the 408 occur near these points, while the 340 shows no such consistent maxima there. This may suggest that the 340 uses an enciphering scheme that is different from the one used for the 408.

Open Questions

  • Does the large number of modified 340 ciphers with an n-gram rank superior to the unmodified 340 give us confidence that its enciphering scheme differs from that of the 408?
  • Do ciphers constructed similarly to the 408 show similar results to the 408 when subjected to quadrant analysis? Specifically, do very few arrangements of quadrants produce superior n-gram rankings to the unmodified ciphers?
  • What effect do quadrant layouts have on the number of sequential homophones detected algorithmically?