Quadrant analysis Part 2
Contents
Introduction
I revisited quadrant analysis with two new goals:
- Exhaustively explore very many more combinations of arrangements of quadrants.
- Measure the quality of the arrangements by performing algorithmic searches for homophone candidates in the transformed cipher text.
The idea here is to create as many transformations of the cipher text as possible, and then applying a search for homophone candidates. We know that the killer used sequential homophones in the 408 character cipher, but he did not appear to do so in the 340 character cipher. However, what if the killer enciphered the plaintext using sequential homophones, and then applied a transformation to the cipher text, confounding any analysis that assumes the cipher text is read left-to-right, top-to-bottom?
Perhaps if we perform homophone measurements with many possible transformations of the cipher text, we might discover arrangements that have significantly more homophones than the original cipher text layout. Perhaps one of these arrangements is a correct transformation.
Algorithmic detection of homophones
Firstly, what is a homophone? In this article, I use the term to refer to any of a set of cipher text symbols that have the same plain text assignments. For example, if three different cipher symbols are assigned to the plain text letter "E", the symbols are homophones for "E". Homophonic substitution is the application of homophones to conventional single substitution ciphers.
Symbols within a set of homophones can be assigned in two different ways:
- Random: Symbols are selected randomly from the homophones assigned to a plain text letter.
- Sequential: Symbols are selected in a predetermined sequence from the homophones assigned to a plain text letter.
King and Bahler's 1993 paper An Algorithmic Solution of Sequential Homophonic Ciphers describes a very efficient algorithm for detecting homophone sequences in cipher texts that employ sequential assignments of homophones to plaintext letters. Their algorithm is designed to detect homophones in the input cipher text. The detected homophones are removed from the input cipher text, and then the algorithm outputs a simple single substitution cipher, which is much more vulnerable to attacks using frequency analysis and heuristics since the letter frequencies are no longer flattened by the use of homophones.
My version of the algorithm measures how many homophones are detected instead of removing them. The score gives us a way to compare transformations of the cipher text, so we can see if certain transformations have a larger number of homophones than the original cipher text. If a correct transformation is tested, then perhaps its homophone score will be higher than the score produced for other invalid transformations.
The control for this experiment is the 408. We know that the 408 does not use any sort of transformation; it reads left-to-right, top-to-bottom. Therefore, we can test the validity of this experiment by looking for transformations of the 408 that produce more detected homophones than the original 408. If too many of these transformations are found, then the premise of this experiment is weakened.
Transformations
The previous quadrant analysis defined transformations using the following variables:
- i,j: The center point that defines the intersection of all four quadrants.
- border: If true, the plain text within row i and column j are ignored.
- x: One of four possible rotation/flip operations that is applied to the original cipher text in its entirety before the quadrants are extracted and analyzed.
This resulted in tests of 3,264 possible transformations of the 408, and 2,720 possible transformations of the 340. But the are many more possible transformations to consider. In this new experiment, I define transformations using these variables:
- i,j: The center point that defines the intersection of all four quadrants. Some combinations of i and j produce quadrants of zero size. For example, if (i,j) = (0,0), then the original cipher text is treated as a single quadrant. If (i,j) = (10,0), then only two quadrants are produced (above and below row 10). This is the same as the previous experiment.
- border: If true, the plain text within row i and column j are ignored. This is the same as the previous experiment.
- p: One of 24 possible ways to select and combine the four quadrants. Let's take the original cipher text and produce four quadrants. Let's number them 0, 1, 2, and 3. In the original experiment, the transformed cipher text was created by selecting quadrant 0, and converting it to a line of text that reads from left to right. Then, quadrant 1 is also converted to a line of text. The same for quadrants 2 and 3. Then, each line of text is combined into a single line of text, which is the transformed cipher text. But there are 23 other ways this can be done, which are each considered in this new experiment. Here are all 24 ways the quadrants can be selected and combined together:
- {0, 1, 2, 3}, {0, 1, 3, 2}, {0, 2, 1, 3}, {0, 2, 3, 1}, {0, 3, 1, 2}, {0, 3, 2, 1}, {1, 0, 2, 3}, {1, 0, 3, 2}, {1, 2, 0, 3}, {1, 2, 3, 0}, {1, 3, 0, 2}, {1, 3, 2, 0}, {2, 0, 1, 3}, {2, 0, 3, 1}, {2, 1, 0, 3}, {2, 1, 3, 0}, {2, 3, 0, 1}, {2, 3, 1, 0}, {3, 0, 1, 2}, {3, 0, 2, 1}, {3, 1, 0, 2}, {3, 1, 2, 0}, {3, 2, 0, 1}, {3, 2, 1, 0}
- r1,r2,r3,r4: Rotations of each quadrant. Each quadrant can be rotated 0, 90, 180, or 270 degrees clockwise.
- f1,f2,f3,f4: Flips of each quadrant. If a flip is performed, is it done after the rotation is performed (if necessary). Each quadrant takes one of the following values:
- 0: No flip
- 1: Flip horizontally.
- 2: Flip vertically.
- 3: Flip horizontally and vertically.
Combinations of rotations and flips where the flips are done vertically, or both horizontally and vertically, are redundant with other combinations. Thus, we only consider combinations that include horizontal or no flips.
- c: Concatenation type. As stated above, the transformed cipher text is created by splicing together (concatenating) each quadrant's cipher text as a single string. This experiment considers three possible concatenation operations selected while splicing together each quadrant. The operations are:
- c2: Concatenate down. Consider two quadrants, A and B. This operation creates a new block consisting of the quadrant block A arranged above quadrant block B.
- c0: Concatenate right top. Consider two quadrants, A and B. This operation creates a new block consisting of the quadrant block B arranged to the right of quadrant block A. The tops of both blocks in the new block are aligned.
- c1: Concatenate right bottom. Consider two quadrants, A and B. This operation creates a new block consisting of the quadrant block B arranged to the right of quadrant block A. The bottoms of both blocks in the new block are aligned.
c0 and c1 will produce the same result upon quadrants A and B if they are of equal height. Let's look at the original cipher block split into four quadrants A, B, C and D. They appear like this in the original cipher text:
AB CD
These experiments consider the following rearrangements of concatenations that use the above operations:
- ((A c2 B) c2 C) c2 D: This produces the following arrangement:
A B C D
- (A c0 B) c2 (C c0 D): This produces the following arrangement:
AB CD
The tops of A and B are aligned, and the tops of C and D are aligned. Any gaps that form are ignored.
- (A c1 B) c2 (C c1 D): This produces the following arrangement:
AB CD
The bottoms of A and B are aligned, and the tops of C and D are aligned. Any gaps that form are ignored.
- ((A c0 B) c0 C) c0 D: This produces the following arrangement:
ABCD
The tops of all four quadrants are aligned.
- ((A c1 B) c1 C) c1 D: This produces the following arrangement:
ABCD
The bottoms of all four quadrants are aligned.
These possible combinations of concatenations are considered in part because of the FBI analysis from Dan Olsen. Dan Olsen suggests, due to vertical symmetry of some of the statistics of the original 340, that the cipher text may have been first written out and then divided in half. Then, the halves are placed one on top of the other, forming the 340 cipher text block as we know it. This scenario, and a large number of additional variations, are accounted for in these experiments.
The experiment exhaustively tests the cipher texts generated by all combinations of these variables. The total numbers of combinations tested for each cipher are:
- 408 cipher: 25 * 18 * 2 * 24 * 4 * 4 * 4 * 4 * 2 * 2 * 2 * 2 * 5 = 442,368,000 combinations.
- 340 cipher: 21 * 18 * 2 * 24 * 4 * 4 * 4 * 4 * 2 * 2 * 2 * 2 * 5 = 371,589,120 combinations.
TODO: equivalence between quadrant transformation and "zigzag" paths.
Homophone detection algorithm
The algorithm used for this experiment is a modified version of REMOVE_HOMOPHONES described in King and Bahler's paper. The original algorithm performs the following steps to reduce the amount of computational work needed to search for homophone candidates:
- Each cipher text symbol is assigned a number. The numbers increase with each new cipher symbol that appears in the cipher text.
- For each cipher text symbol that occurs more than once in the cipher text, strings are extracted from the cipher text. Each string begins with the cipher text symbol, and ends with the symbol that occurs immediately before the repeated symbol's next occurrence.
- Consider the set of strings associated with a symbol. Each string in the set is reduced in length according to a few rules:
- Rule 1: Remove any symbol that is assigned a number less than the number assigned to the symbol that begins the string.
- Rule 2: Within each string, remove any symbol that appears more than once.
- Rule 3: Remove any symbols that do not appear in all strings within the set.
Homophone candidates are then extracted from these sets of reduced strings. This is done by inspecting every combination of symbols within the set of strings, and counting the number of occurrences. Combinations that appear in many strings within a set are considered strong homophone candidates. King and Bahler's algorithm removes the strong homophone candidates, replacing them with a single symbol that represents the multiple symbols within a homophone candidate. In this way, the algorithm reduces the key space of the homophonic cipher, into hopefully a simple substitution cipher, which is generally more vulnerable to decipherment.
My algorithm modifies the King and Bahler algorithm in the following ways:
- Instead of removing the strong homophone candidates, they are given a score which is tabulated when the algorithm is run on the many transformations of the original cipher text.
- Rule 3 in the above string reduction rules is modified thusly: Remove any symbols that do not appear in at least the first F strings. In the current experiments, F is set to 4. The purpose for this modification is to account for the observation that homophones in the 408 are not consistent. In much of the 408, homophones begin with orderly, sequential assignments, but then turn into random assignments. So, this modification is necessary to detect these "sloppy" homophones of the 408.
- The algorithm now counts the number of symbol combinations that remain in the set of reduced strings. If this number exceeds a threshold, the search for homophone candidates is halted. Otherwise, the algorithm becomes very slow, and testing hundreds of millions of transformed cipher texts becomes infeasible.
Example: Running the algorithm on the 408
Let's see how this works with one of the known sequential homophones of the 408. Consider the webtoy transcription of the 408:
9%P/Z/UB%kOR=pX=B WV+eGYF69HP@K!qYe MJY^UIk7qTtNQYD5) S(/9#BPORAU%fRlqE k^LMZJdr\pFHVWe8Y @+qGD9KI)6qX85zS( RNtIYElO8qGBTQS#B Ld/P#B@XqEHMU^RRk cZKqpI)Wq!85LMr9# BPDR+j=6\N(eEUHkF ZcpOVWI5+tL)l^R6H I9DR_TYr\de/@XJQA P5M8RUt%L)NVEKH=G rI!Jk598LMlNA)Z(P zUpkA9#BVW\+VTtOP ^=SrlfUe67DzG%%IM Nk)ScE/9%%ZfAP#BV peXqWq_F#8c+@9A9B %OT5RUc+_dYq_^SqW VZeGYKE_TYA9%#Lt_ H!FBX9zXADd\7L!=q _ed##6e5PORXQF%Gc Z@JTtq_8JI+rBPQW6 VEXr9WI6qEHM)=UIk
The algorithm assigns numbers for each unique symbol that appears in the cipher text. Here is the result:
9 01 % 02 P 03 / 04 Z 05 / 04 U 06 B 07 % 02 k 08 O 09 R 10 = 11 p 12 X 13 = 11 B 07 W 14 V 15 + 16 e 17 G 18 Y 19 F 20 6 21 9 01 H 22 P 03 @ 23 K 24 ! 25 q 26 Y 19 e 17 M 27 J 28 Y 19 ^ 29 U 06 I 30 k 08 7 31 q 26 T 32 t 33 N 34 Q 35 Y 19 D 36 5 37 ) 38 S 39 ( 40 / 04 9 01 # 41 B 07 P 03 O 09 R 10 A 42 U 06 % 02 f 43 R 10 l 44 q 26 E 45 k 08 ^ 29 L 46 M 27 Z 05 J 28 d 47 r 48 \ 49 p 12 F 20 H 22 V 15 W 14 e 17 8 50 Y 19 @ 23 + 16 q 26 G 18 D 36 9 01 K 24 I 30 ) 38 6 21 q 26 X 13 8 50 5 37 z 51 S 39 ( 40 R 10 N 34 t 33 I 30 Y 19 E 45 l 44 O 09 8 50 q 26 G 18 B 07 T 32 Q 35 S 39 # 41 B 07 L 46 d 47 / 04 P 03 # 41 B 07 @ 23 X 13 q 26 E 45 H 22 M 27 U 06 ^ 29 R 10 R 10 k 08 c 52 Z 05 K 24 q 26 p 12 I 30 ) 38 W 14 q 26 ! 25 8 50 5 37 L 46 M 27 r 48 9 01 # 41 B 07 P 03 D 36 R 10 + 16 j 53 = 11 6 21 \ 49 N 34 ( 40 e 17 E 45 U 06 H 22 k 08 F 20 Z 05 c 52 p 12 O 09 V 15 W 14 I 30 5 37 + 16 t 33 L 46 ) 38 l 44 ^ 29 R 10 6 21 H 22 I 30 9 01 D 36 R 10 _ 54 T 32 Y 19 r 48 \ 49 d 47 e 17 / 04 @ 23 X 13 J 28 Q 35 A 42 P 03 5 37 M 27 8 50 R 10 U 06 t 33 % 02 L 46 ) 38 N 34 V 15 E 45 K 24 H 22 = 11 G 18 r 48 I 30 ! 25 J 28 k 08 5 37 9 01 8 50 L 46 M 27 l 44 N 34 A 42 ) 38 Z 05 ( 40 P 03 z 51 U 06 p 12 k 08 A 42 9 01 # 41 B 07 V 15 W 14 \ 49 + 16 V 15 T 32 t 33 O 09 P 03 ^ 29 = 11 S 39 r 48 l 44 f 43 U 06 e 17 6 21 7 31 D 36 z 51 G 18 % 02 % 02 I 30 M 27 N 34 k 08 ) 38 S 39 c 52 E 45 / 04 9 01 % 02 % 02 Z 05 f 43 A 42 P 03 # 41 B 07 V 15 p 12 e 17 X 13 q 26 W 14 q 26 _ 54 F 20 # 41 8 50 c 52 + 16 @ 23 9 01 A 42 9 01 B 07 % 02 O 09 T 32 5 37 R 10 U 06 c 52 + 16 _ 54 d 47 Y 19 q 26 _ 54 ^ 29 S 39 q 26 W 14 V 15 Z 05 e 17 G 18 Y 19 K 24 E 45 _ 54 T 32 Y 19 A 42 9 01 % 02 # 41 L 46 t 33 _ 54 H 22 ! 25 F 20 B 07 X 13 9 01 z 51 X 13 A 42 D 36 d 47 \ 49 7 31 L 46 ! 25 = 11 q 26 _ 54 e 17 d 47 # 41 # 41 6 21 e 17 5 37 P 03 O 09 R 10 X 13 Q 35 F 20 % 02 G 18 c 52 Z 05 @ 23 J 28 T 32 t 33 q 26 _ 54 8 50 J 28 I 30 + 16 r 48 B 07 P 03 Q 35 W 14 6 21 V 15 E 45 X 13 r 48 9 01 W 14 I 30 6 21 q 26 E 45 H 22 M 27 ) 38 = 11 U 06 I 30 k 08
The algorithm then inspects each unique symbol of the cipher text. Let's look at the processing for the cipher text symbol "p", which is assigned the numeric value "12". The algorithm splits the cipher text into a set of strings based on where the p's occur. Here is the result:
Symbolic version: | Numerical version: |
pX=BWV+eGYF69HP@K!qYeMJY^UIk7qTtNQYD5)S(/9#BPORAU%fRlqEk^LMZJdr\ pFHVWe8Y@+qGD9KI)6qX85zS(RNtIYElO8qGBTQS#BLd/P#B@XqEHMU^RRkcZKq pI)Wq!85LMr9#BPDR+j=6\N(eEUHkFZc pOVWI5+tL)l^R6HI9DR_TYr\de/@XJQAP5M8RUt%L)NVEKH=GrI!Jk598LMlNA)Z(PzU pkA9#BVW\+VTtOP^=SrlfUe67DzG%%IMNk)ScE/9%%ZfAP#BV peXqWq_F#8c+@9A9B%OT5RUc+_dYq_^SqWVZeGYKE_TYA9%#Lt_H!FBX9zXADd\7L!=q_ed##6e5PORXQF%GcZ@JTtq_8JI+rBPQW6VEXr9WI6qEHM)=UI |
12 13 11 07 14 15 16 17 18 19 20 21 01 22 03 23 24 25 26 19 17 27 28 19 29 06 30 08 31 26 32 33 34 35 19 36 37 38 39 40 04 01 41 07 03 09 10 42 06 02 43 10 44 26 45 08 29 46 27 05 28 47 48 49 12 20 22 15 14 17 50 19 23 16 26 18 36 01 24 30 38 21 26 13 50 37 51 39 40 10 34 33 30 19 45 44 09 50 26 18 07 32 35 39 41 07 46 47 04 03 41 07 23 13 26 45 22 27 06 29 10 10 08 52 05 24 26 12 30 38 14 26 25 50 37 46 27 48 01 41 07 03 36 10 16 53 11 21 49 34 40 17 45 06 22 08 20 05 52 12 09 15 14 30 37 16 33 46 38 44 29 10 21 22 30 01 36 10 54 32 19 48 49 47 17 04 23 13 28 35 42 03 37 27 50 10 06 33 02 46 38 34 15 45 24 22 11 18 48 30 25 28 08 37 01 50 46 27 44 34 42 38 05 40 03 51 06 12 08 42 01 41 07 15 14 49 16 15 32 33 09 03 29 11 39 48 44 43 06 17 21 31 36 51 18 02 02 30 27 34 08 38 39 52 45 04 01 02 02 05 43 42 03 41 07 15 12 17 13 26 14 26 54 20 41 50 52 16 23 01 42 01 07 02 09 32 37 10 06 52 16 54 47 19 26 54 29 39 26 14 15 05 17 18 19 24 45 54 32 19 42 01 02 41 46 33 54 22 25 20 07 13 01 51 13 42 36 47 49 31 46 25 11 26 54 17 47 41 41 21 17 37 03 09 10 13 35 20 02 18 52 05 23 28 32 33 26 54 50 28 30 16 48 07 03 35 14 21 15 45 13 48 01 14 30 21 26 45 22 27 38 11 06 30 |
Next, for each string, any symbol that has a numerical assignment lower than the numerical assignment for that of the symbol "p" is removed. So, it removes any symbol whose number is less than 12:
Symbolic version: | Numerical version: |
pXWV+eGYF6H@K!qYeMJY^I7qTtNQYD5)S(#AflqE^LMJdr\ pFHVWe8Y@+qGDKI)6qX85zS(NtIYEl8qGTQS#Ld#@XqEHM^cKq pI)Wq!85LMr#D+j6\N(eEHFc pVWI5+tL)l^6HID_TYr\de@XJQA5M8tL)NVEKHGrI!J58LMlNA)(z pA#VW\+VTt^Srlfe67DzGIMN)ScEfA#V peXqWq_F#8c+@AT5c+_dYq_^SqWVeGYKE_TYA#Lt_H!FXzXADd\7L!q_ed##6e5XQFGc@JTtq_8JI+rQW6VEXrWI6qEHM)I |
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 19 17 27 28 19 29 30 31 26 32 33 34 35 19 36 37 38 39 40 41 42 43 44 26 45 29 46 27 28 47 48 49 12 20 22 15 14 17 50 19 23 16 26 18 36 24 30 38 21 26 13 50 37 51 39 40 34 33 30 19 45 44 50 26 18 32 35 39 41 46 47 41 23 13 26 45 22 27 29 52 24 26 12 30 38 14 26 25 50 37 46 27 48 41 36 16 53 21 49 34 40 17 45 22 20 52 12 15 14 30 37 16 33 46 38 44 29 21 22 30 36 54 32 19 48 49 47 17 23 13 28 35 42 37 27 50 33 46 38 34 15 45 24 22 18 48 30 25 28 37 50 46 27 44 34 42 38 40 51 12 42 41 15 14 49 16 15 32 33 29 39 48 44 43 17 21 31 36 51 18 30 27 34 38 39 52 45 43 42 41 15 12 17 13 26 14 26 54 20 41 50 52 16 23 42 32 37 52 16 54 47 19 26 54 29 39 26 14 15 17 18 19 24 45 54 32 19 42 41 46 33 54 22 25 20 13 51 13 42 36 47 49 31 46 25 26 54 17 47 41 41 21 17 37 13 35 20 18 52 23 28 32 33 26 54 50 28 30 16 48 35 14 21 15 45 13 48 14 30 21 26 45 22 27 38 30 |
A string might have symbols in it that appear more than once. These symbols are removed:
Symbolic version: | Numerical version: |
pXWV+GF6H@K!I7TtNQD5)S(#AflELdr\ pFVWe+D)65z(NtlTQLdM^c pI)Wq!85LMr#D+j6\N(eEHFc pW+^6D_TY\de@XQEKG!(z pW\+Tt^rle67DzGIMN)cE p^SKzD\7M) |
12 13 14 15 16 18 20 21 22 23 24 25 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 12 20 15 14 17 16 36 38 21 37 51 40 34 33 44 32 35 46 47 27 29 52 12 30 38 14 26 25 50 37 46 27 48 41 36 16 53 21 49 34 40 17 45 22 20 52 12 14 16 29 21 36 54 32 19 49 47 17 23 13 35 45 24 18 25 40 51 12 14 49 16 32 33 29 48 44 17 21 31 36 51 18 30 27 34 38 52 45 12 29 39 24 51 36 49 31 27 38 |
Finally, any symbols that do not appear at least in the first four strings are removed:
Symbolic version: | Numerical version: |
pW+6D( pW+D6( pWD+6( pW+6D( pW+6D pD |
12 14 16 21 36 40 12 14 16 36 21 40 12 14 36 16 21 40 12 14 16 21 36 40 12 14 16 21 36 12 36 |
At this point, the algorithm must count all combinations of symbols that appear in the set. These combinations are treated as homophone candidates. If a candidate appears many times in the set, then it is a strong candidate. Here is the result of counting all combinations in the reduced strings for symbol "p":
- Length 2: +( (4) +6 (5) +D (4) 6( (4) 6D (3) D( (4) D+ (1) D6 (2) W( (4) W+ (5) W6 (5) WD (5) p( (4) p+ (5) p6 (5) pD (6) pW (5)
- Length 3: +6( (4) +6D (3) +D( (3) +D6 (1) 6D( (2) D+( (1) D+6 (1) D6( (2) W+( (4) W+6 (5) W+D (4) W6( (4) W6D (3) WD( (4) WD+ (1) WD6 (2) p+( (4) p+6 (5) p+D (4) p6( (4) p6D (3) pD( (4) pD+ (1) pD6 (2) pW( (4) pW+ (5) pW6 (5) pWD (5)
- Length 4: +6D( (2) +D6( (1) D+6( (1) W+6( (4) W+6D (3) W+D( (3) W+D6 (1) W6D( (2) WD+( (1) WD+6 (1) WD6( (2) p+6( (4) p+6D (3) p+D( (3) p+D6 (1) p6D( (2) pD+( (1) pD+6 (1) pD6( (2) pW+( (4) pW+6 (5) pW+D (4) pW6( (4) pW6D (3) pWD( (4) pWD+ (1) pWD6 (2)
- Length 5: W+6D( (2) W+D6( (1) WD+6( (1) p+6D( (2) p+D6( (1) pD+6( (1) pW+6( (4) pW+6D (3) pW+D( (3) pW+D6 (1) pW6D( (2) pWD+( (1) pWD+6 (1) pWD6( (2)
- Length 6: pW+6D( (2) pW+D6( (1) pWD+6( (1)
If we consider strong homophone candidates to be sequences that occur 4 or more times, then the list of strong candidates becomes:
Length 2: +( +6 +D 6( D( W( W+ W6 WD p( p+ p6 pD pW Length 3: +6( W+( W+6 W+D W6( WD( p+( p+6 p+D p6( pD( pW( pW+ pW6 pWD Length 4: W+6( p+6( pW+( pW+6 pW+D pW6( pWD( Length 5: pW+6(
The full set of known homophones for the plaintext letter "e" in the 408 cipher is:
ENWZ6p+
Thus the algorithm detects real homophones for plaintext letter "e". These homophones are indicated in boldface:
Length 2: +( +6 +D 6( D( W( W+ W6 WD p( p+ p6 pD pW Length 3: +6( W+( W+6 W+D W6( WD( p+( p+6 p+D p6( pD( pW( pW+ pW6 pWD Length 4: W+6( p+6( pW+( pW+6 pW+D pW6( pWD( Length 5: pW+6(
Heuristic scoring
How do we know if one set of transformations is better than another based on the detected homophone candidates? Some transformations produce larger sets of homophone candidates. Consider the set of distinct cipher text symbols used in the cipher text. If perfectly assigned sequential homophones are used for each symbol that occurs more than once in the cipher text, then each symbol will be associated with a sequence that repeats as many times as the symbol appears in the cipher text. For example, if the cipher text symbol M appears 8 times in the cipher text, and perfect sequential homophone assignment is used, then we expect M to be involved with exactly 8 occurrences of a repeating sequence of symbols.
We can quantify this by tracking the longest repeated sequence found for each symbol. For each homophone candidate, we inspect each unique symbol in the cipher text, and associate with it the maximum number of repeated subsequences found. For example, in the original 408 cipher text, let's look for the longest repeated subsequence of length 2 involving the symbol "T". The first one we encounter is "@T" which repeats 5 times. So we assign a score of 5 to the symbol "T". But then we find the subsequence "eT" which repeats 7 times. So we update the score for "T" to 7.
A final score is produced by summing all such per-symbol scores.
Example homophone sequence detection on the 408 cipher
When run with the 408 cipher as input, the algorithm produces the below list of homophone candidates. Only homophones with lengths between 2 and 7 are considered. The table below shows the largest repeating sequences found for each distinct symbol in the cipher text, grouped by homophone candidate length.
Description of column values (click on a column header's arrows to sort by that column):
- Length: The number of symbols in the homophone candidate.
- Symbol: The symbol under consideration.
- Candidate: The detected homophone candidate that has the highest number of repetitions for this symbol.
- Sequence: The full sequence of this candidate's symbols as they appear in the cipher text.
- Num repeats: The number of times the homophone candidate repeats in the sequence.
- Ratio: The amount of the sequence covered by repetitions of the homophone candidate (an estimation of how "perfect" the homophone candidate is)
- Decoded: The known plaintext letter assignments for the homophone candidate.
Candidates that match known homophones of the 408 are colored green. Here are the results:
Length | Symbol | Candidate | Sequence | Num Repeats | Ratio | Decoded |
---|---|---|---|---|---|---|
2 | ! | F! | F!F!F!F!F!F | 5 | 1.00 | SO |
2 | ( | (d | (d(d(d(ddd | 4 | 1.00 | NO |
2 | ) | M) | M)M)M)M)M)M)M)M) | 8 | 1.00 | HH |
2 | + | +6 | +6+6+6+6+6++6+66 | 7 | 0.88 | EE |
2 | 5 | H5 | H5H5H5H5H5H55H5H | 7 | 0.88 | TT |
2 | 6 | W6 | W6W6W6W6W6WW6W6W6 | 8 | 1.00 | EE |
2 | 8 | I8 | I8I8I8I8II8I8I88III | 7 | 0.88 | TA |
2 | 9 | 9U | 9U9U9U9U9U9U9U9U999U999U | 10 | 0.71 | II |
2 | @ | @T | @T@T@T@T@TT@T | 6 | 1.00 | SO |
2 | D | pD | pDpDpDpDpDpD | 6 | 1.00 | EN |
2 | F | Fd | FdFdFdFdFddF | 5 | 0.83 | SO |
2 | G | OG | OGOGOGOGOGOGOG | 7 | 1.00 | NA |
2 | H | H) | H)H)H)H)H)H))HH) | 7 | 0.88 | TH |
2 | I | I8 | I8I8I8I8II8I8I88III | 7 | 0.64 | TA |
2 | J | JQ | JQJQJQJQJJQ | 5 | 0.83 | FF |
2 | K | Kd | KdKdKdKdKdd | 5 | 1.00 | SO |
2 | L | HL | HLHLHLHLHLHLLHLH | 7 | 0.88 | TT |
2 | M | M) | M)M)M)M)M)M)M)M) | 8 | 1.00 | HH |
2 | N | Nr | NrNrNrNrNrNrr | 6 | 1.00 | ER |
2 | O | OX | OXOXOXOXOXOXXOXX | 7 | 1.00 | NO |
2 | P | PU | PUPUPUPUPUPUPUPUPUPPU | 10 | 0.91 | II |
2 | Q | JQ | JQJQJQJQJJQ | 5 | 1.00 | FF |
2 | S | GS | GSGSGSGSGSSGG | 5 | 0.83 | AA |
2 | T | @T | @T@T@T@T@TT@T | 6 | 0.86 | SO |
2 | U | PU | PUPUPUPUPUPUPUPUPUPPU | 10 | 1.00 | II |
2 | V | Vr | VrVrVrVrVVrVVrVr | 7 | 0.78 | BR |
2 | W | W6 | W6W6W6W6W6WW6W6W6 | 8 | 0.89 | EE |
2 | X | OX | OXOXOXOXOXOXXOXX | 7 | 0.78 | NO |
2 | Z | Z6 | Z6Z6Z6Z6Z6ZZ6Z66 | 7 | 0.88 | EE |
2 | \ | l\ | l\l\l\l\l\ | 5 | 1.00 | AR |
2 | ^ | ^D | ^D^D^D^D^D^D | 6 | 1.00 | NN |
2 | d | Kd | KdKdKdKdKdd | 5 | 0.83 | SO |
2 | k | Uk | UkUkUkUkUkUkUkUkUUk | 9 | 1.00 | II |
2 | l | l\ | l\l\l\l\l\ | 5 | 1.00 | AR |
2 | p | p+ | p+p+p+p+p+p+++ | 6 | 1.00 | EE |
2 | r | Vr | VrVrVrVrVVrVVrVr | 7 | 1.00 | BR |
2 | t | tr | trtrtrtrtrttrr | 6 | 0.86 | RR |
3 | ! | Fd! | F!dFd!Fd!Fd!Fd!dF | 4 | 0.80 | SOO |
3 | ( | K(d | K(dK(dK(dK(dKdd | 4 | 1.00 | SNO |
3 | ) | )5L | 5)L)5L)5L5L)5L)5L))5LL5) | 5 | 0.62 | HTT |
3 | + | pW+ | pW+pW+pW+pW+pW+pW++W+WW | 6 | 0.75 | EEE |
3 | 5 | H5L | H5LH5LH5LH5LH5LH5L5LHL5H | 6 | 0.75 | TTT |
3 | 6 | Z+6 | Z+6Z+6Z+6Z+6Z+6Z++Z6Z+66 | 6 | 0.75 | EEE |
3 | 9 | 9Uk | 9Uk9Uk9Uk9Uk9Uk9Uk9Uk9Uk999U999Uk | 9 | 0.64 | III |
3 | @ | @KT | @KT@KT@KT@KT@TKT@T | 4 | 0.67 | SSO |
3 | D | O^D | O^DO^DO^DO^DO^DO^DO | 6 | 1.00 | NNN |
3 | F | Fd! | F!dFd!Fd!Fd!Fd!dF | 4 | 0.67 | SOO |
3 | G | OG6 | OG6OG6OG6O6GO6GOG6OG66 | 5 | 0.71 | NAE |
3 | H | H5L | H5LH5LH5LH5LH5LH5L5LHL5H | 6 | 0.75 | TTT |
3 | K | K(d | K(dK(dK(dK(dKdd | 4 | 0.80 | SNO |
3 | L | H5L | H5LH5LH5LH5LH5LH5L5LHL5H | 6 | 0.75 | TTT |
3 | M | M)L | M)LM)LM)LML)ML)LM)M)LLM) | 4 | 0.50 | HHT |
3 | N | Nlr | NlrNlrNlrNrlNrlNrr | 3 | 0.50 | EAR |
3 | O | O^D | O^DO^DO^DO^DO^DO^DO | 6 | 0.86 | NNN |
3 | P | PUk | PUkPUkPUkPUkPUkPUkPUkPUkPUPPUk | 9 | 0.82 | III |
3 | S | GS( | GS(GS(GS(G(SGSSGG | 3 | 0.50 | AAN |
3 | T | @KT | @KT@KT@KT@KT@TKT@T | 4 | 0.57 | SSO |
3 | U | PUk | PUkPUkPUkPUkPUkPUkPUkPUkPUPPUk | 9 | 0.90 | III |
3 | V | Vlr | VlrVlrVlrVrlVVrlVVrVr | 3 | 0.33 | BAR |
3 | W | pW+ | pW+pW+pW+pW+pW+pW++W+WW | 6 | 0.67 | EEE |
3 | X | OX( | OX(OX(OX(OX(OXOXXOXX | 4 | 0.44 | NON |
3 | Z | Z+6 | Z+6Z+6Z+6Z+6Z+6Z++Z6Z+66 | 6 | 0.75 | EEE |
3 | \ | tl\ | tl\tl\tl\tl\tlt\t | 4 | 0.80 | RAR |
3 | ^ | O^D | O^DO^DO^DO^DO^DO^DO | 6 | 1.00 | NNN |
3 | d | K(d | K(dK(dK(dK(dKdd | 4 | 0.67 | SNO |
3 | k | PUk | PUkPUkPUkPUkPUkPUkPUkPUkPUPPUk | 9 | 1.00 | III |
3 | l | ^Dl | ^Dl^Dl^Dl^Dl^lD^D | 4 | 0.80 | NNA |
3 | p | pW+ | pW+pW+pW+pW+pW+pW++W+WW | 6 | 1.00 | EEE |
3 | r | tr\ | tr\tr\tr\tr\trt\trr | 4 | 0.57 | RRR |
3 | t | tl\ | tl\tl\tl\tl\tlt\t | 4 | 0.57 | RAR |
4 | ( | p+6( | p+6(p+6(p+6(p+6(p+6p++6+66 | 4 | 1.00 | EEEN |
4 | ) | H5L) | H5)LH)5LH)5LH5L)H5L)H5L))5LHL5H) | 3 | 0.38 | TTTH |
4 | + | ZpW+ | ZpW+ZpW+ZpW+ZpW+ZpW+ZpW++WZZ+WW | 6 | 0.75 | EEEE |
4 | 5 | H5L) | H5)LH)5LH)5LH5L)H5L)H5L))5LHL5H) | 3 | 0.38 | TTTH |
4 | 6 | pW+6 | pW+6pW+6pW+6pW+6pW+6pW++W6+W6W6 | 5 | 0.62 | EEEE |
4 | 9 | 9PUk | 9PUk9PUk9PUk9PUk9PUk9PUk9PUk9PUk9P99U99PP9Uk | 8 | 0.57 | IIII |
4 | @ | O@K( | O@K(O@K(O@K(O@K(O@OKO@ | 4 | 0.67 | NSSN |
4 | D | ZpWD | ZpWDZpWDZpWDZpWDZpWDZpWWZDZWW | 5 | 0.83 | EEEN |
4 | G | OGK( | OGK(OGK(OGK(OKG(OGOGKOG | 3 | 0.43 | NASN |
4 | H | H5L) | H5)LH)5LH)5LH5L)H5L)H5L))5LHL5H) | 3 | 0.38 | TTTH |
4 | K | O@K( | O@K(O@K(O@K(O@K(O@OKO@ | 4 | 0.80 | NSSN |
4 | L | H5L) | H5)LH)5LH)5LH5L)H5L)H5L))5LHL5H) | 3 | 0.38 | TTTH |
4 | O | O^D( | O^D(O^D(O^D(O^D(O^DO^DO | 4 | 0.57 | NNNN |
4 | P | 9PUk | 9PUk9PUk9PUk9PUk9PUk9PUk9PUk9PUk9P99U99PP9Uk | 8 | 0.73 | IIII |
4 | T | @K(T | @KT(@K(T@K(T@K(T@TKT@T | 3 | 0.43 | SSNO |
4 | U | 9PUk | 9PUk9PUk9PUk9PUk9PUk9PUk9PUk9PUk9P99U99PP9Uk | 8 | 0.80 | IIII |
4 | W | ZpW+ | ZpW+ZpW+ZpW+ZpW+ZpW+ZpW++WZZ+WW | 6 | 0.67 | EEEE |
4 | X | O@X( | OX@(O@X(O@X(O@X(OX@OXXOX@X | 3 | 0.33 | NSON |
4 | Z | ZpW+ | ZpW+ZpW+ZpW+ZpW+ZpW+ZpW++WZZ+WW | 6 | 0.75 | EEEE |
4 | \ | tlr\ | tlr\tlr\tlr\trl\trlt\trr | 3 | 0.60 | RARR |
4 | ^ | O^D( | O^D(O^D(O^D(O^D(O^DO^DO | 4 | 0.67 | NNNN |
4 | k | 9PUk | 9PUk9PUk9PUk9PUk9PUk9PUk9PUk9PUk9P99U99PP9Uk | 8 | 0.89 | IIII |
4 | l | ^D(l | ^D(l^D(l^D(l^Dl(^lD^D | 3 | 0.60 | NNNA |
4 | p | ZpW+ | ZpW+ZpW+ZpW+ZpW+ZpW+ZpW++WZZ+WW | 6 | 1.00 | EEEE |
4 | r | tlr\ | tlr\tlr\tlr\trl\trlt\trr | 3 | 0.43 | RARR |
4 | t | tlr\ | tlr\tlr\tlr\trl\trlt\trr | 3 | 0.43 | RARR |
5 | ( | pW+6( | pW+6(pW+6(pW+6(pW+6(pW+6pW++W6+W6W6 | 4 | 1.00 | EEEEN |
5 | + | ZpW+6 | ZpW+6ZpW+6ZpW+6ZpW+6ZpW+6ZpW++WZ6Z+W6W6 | 5 | 0.62 | EEEEE |
5 | 6 | ZpW+6 | ZpW+6ZpW+6ZpW+6ZpW+6ZpW+6ZpW++WZ6Z+W6W6 | 5 | 0.62 | EEEEE |
5 | @ | O^@X( | OX@^(O^@X(O@X^(O^@X(O^X@O^XXOX@X | 2 | 0.33 | NNSON |
5 | D | ZpW+D | ZpW+DZpW+DZpWD+ZpW+DZpW+DZpW++WZDZ+WW | 4 | 0.67 | EEEEN |
5 | G | OGD6( | OG6D(OGD6(OGD6(O6DG(O6DGOGD6OG66 | 2 | 0.29 | NANEN |
5 | K | ZpW+K | ZpW+KZpW+KZKpW+ZpW+KZpW+ZpW++WZKZ+WW | 3 | 0.60 | EEEES |
5 | O | O^@X( | OX@^(O^@X(O@X^(O^@X(O^X@O^XXOX@X | 2 | 0.29 | NNSON |
5 | W | ZpW+6 | ZpW+6ZpW+6ZpW+6ZpW+6ZpW+6ZpW++WZ6Z+W6W6 | 5 | 0.56 | EEEEE |
5 | X | O^@X( | OX@^(O^@X(O@X^(O^@X(O^X@O^XXOX@X | 2 | 0.22 | NNSON |
5 | Z | ZpW+6 | ZpW+6ZpW+6ZpW+6ZpW+6ZpW+6ZpW++WZ6Z+W6W6 | 5 | 0.62 | EEEEE |
5 | ^ | O^@X( | OX@^(O^@X(O@X^(O^@X(O^X@O^XXOX@X | 2 | 0.33 | NNSON |
5 | p | ZpW+6 | ZpW+6ZpW+6ZpW+6ZpW+6ZpW+6ZpW++WZ6Z+W6W6 | 5 | 0.83 | EEEEE |
6 | ( | pW+6D( | pW+6D(pW+D6(pWD+6(pW+6D(pW+6DpW++WD6+W6W6 | 2 | 0.50 | EEEENN |
6 | + | ZpW+6D | ZpW+6DZpW+D6ZpWD+6ZpW+6DZpW+6DZpW++WZD6Z+W6W6 | 3 | 0.38 | EEEEEN |
6 | 6 | ZpW+6D | ZpW+6DZpW+D6ZpWD+6ZpW+6DZpW+6DZpW++WZD6Z+W6W6 | 3 | 0.38 | EEEEEN |
6 | @ | OG@KD( | OG@KD(O@GDK(OG@KD(OD@KG(ODG@OGKDOG@ | 2 | 0.33 | NASSNN |
6 | D | ZpW+6D | ZpW+6DZpW+D6ZpWD+6ZpW+6DZpW+6DZpW++WZD6Z+W6W6 | 3 | 0.50 | EEEEEN |
6 | G | OG@KD( | OG@KD(O@GDK(OG@KD(OD@KG(ODG@OGKDOG@ | 2 | 0.29 | NASSNN |
6 | K | OG@KD( | OG@KD(O@GDK(OG@KD(OD@KG(ODG@OGKDOG@ | 2 | 0.40 | NASSNN |
6 | O | OG@KD( | OG@KD(O@GDK(OG@KD(OD@KG(ODG@OGKDOG@ | 2 | 0.29 | NASSNN |
6 | W | ZpW+6D | ZpW+6DZpW+D6ZpWD+6ZpW+6DZpW+6DZpW++WZD6Z+W6W6 | 3 | 0.33 | EEEEEN |
6 | Z | ZpW+6D | ZpW+6DZpW+D6ZpWD+6ZpW+6DZpW+6DZpW++WZD6Z+W6W6 | 3 | 0.38 | EEEEEN |
6 | ^ | OG@^D( | OG@^D(O^@GD(OG@^D(O^D@G(O^DG@O^GDOG@ | 2 | 0.33 | NASNNN |
6 | p | ZpW+6D | ZpW+6DZpW+D6ZpWD+6ZpW+6DZpW+6DZpW++WZD6Z+W6W6 | 3 | 0.50 | EEEEEN |
The final score for this set of homophone candidate detections is the sum of all numbers of repeats found. So, for the original 408 cipher, the final score is 578.
Results
Score summaries
The algorithm was run with the 408 and the 340 to try to find quadrant transformations that result in detection of many high-scoring homophone candidates. Let S = the final score computed for a given cipher text.
- - The original unchanged 408 cipher has S = 578.
- - Of the 442,368,000 tested transformations of the 408, 414,882 of them (0.094%) yielded higher values for S.
- - The highest S found was 719. This is a 24.4% increase over the unchanged cipher's score.
- - The original unchanged 340 cipher has S = 247.
- - Of the 371,589,120 tested transformations of the 340, 7,338,467 of them (1.975%) yielded higher values for S. This proportion is 21 times higher than the proportion of high-scoring transformations of the 408.
- - The highest S found was 420. This is a 70.0% increase over the unchanged cipher's score.
Best-scoring transformation of the 408
Let's look at the top-scoring transformation of the 408.
The values of variables leading to highest value of S for the 408: i=17, j=10, border=false, p=7, r1=180, r2=180, r3=180, r4=270, f1=0, f2=0, f3=0, f4=1, c=1.
This corresponds to (i, j) = (17, 10), which defines the quadrants as follows:
9%P/Z/UB%k OR=pX=B WV+eGYF69H P@K!qYe MJY^UIk7qT tNQYD5) S(/9#BPORA U%fRlqE k^LMZJdr\p FHVWe8Y @+qGD9KI)6 qX85zS( RNtIYElO8q GBTQS#B Ld/P#B@XqE HMU^RRk cZKqpI)Wq! 85LMr9# BPDR+j=6\N (eEUHkF ZcpOVWI5+t L)l^R6H I9DR_TYr\d e/@XJQA P5M8RUt%L) NVEKH=G rI!Jk598LM lNA)Z(P zUpkA9#BVW \+VTtOP ^=SrlfUe67 DzG%%IM Nk)ScE/9%% ZfAP#BV peXqWq_F#8 c+@9A9B %OT5RUc+_d Yq_^SqW VZeGYKE_TY A9%#Lt_ H!FBX9zXAD d\7L!=q _ed##6e5PO RXQF%Gc Z@JTtq_8JI +rBPQW6 VEXr9WI6qE HM)=UIk |
The border, which would have been defined as row 17 and column 10, is not ignored in this transformation.
p=1: This corresponds to the following permutation of quadrant orderings: {1, 0, 3, 2}, which translates to "upper right, upper left, lower right, lower left." Thus the quadrants are selected in the following order:
OR=pX=B P@K!qYe tNQYD5) U%fRlqE FHVWe8Y qX85zS( GBTQS#B HMU^RRk 85LMr9# (eEUHkF L)l^R6H e/@XJQA NVEKH=G lNA)Z(P \+VTtOP DzG%%IM ZfAP#BV |
9%P/Z/UB%k WV+eGYF69H MJY^UIk7qT S(/9#BPORA k^LMZJdr\p @+qGD9KI)6 RNtIYElO8q Ld/P#B@XqE cZKqpI)Wq! BPDR+j=6\N ZcpOVWI5+t I9DR_TYr\d P5M8RUt%L) rI!Jk598LM zUpkA9#BVW ^=SrlfUe67 Nk)ScE/9%% |
c+@9A9B Yq_^SqW A9%#Lt_ d\7L!=q RXQF%Gc +rBPQW6 HM)=UIk |
peXqWq_F#8 %OT5RUc+_d VZeGYKE_TY H!FBX9zXAD _ed##6e5PO Z@JTtq_8JI VEXr9WI6qE |
Then, the rotations and flips are applied:
Rotated 180 degrees: VB#PAfZ MI%%GzD POtTV+\ P(Z)ANl G=HKEVN AQJX@/e H6R^l)L FkHUEe( #9rML58 kRR^UMH B#SQTBG (Sz58Xq Y8eWVHF EqlRf%U )5DYQNt eYq!K@P B=Xp=RO |
Rotated 180 degrees: %%9/EcS)kN 76eUflrS=^ WVB#9AkpUz ML895kJ!Ir )L%tUR8M5P d\rYT_RD9I t+5IWVOpcZ N\6=j+RDPB !qW)IpqKZc EqX@B#P/dL q8OlEYItNR 6)IK9DGq+@ p\rdJZML^k AROPB#9/(S Tq7kIU^YJM H96FYGe+VW k%BU/Z/P%9 |
Rotated 180 degrees: kIU=)MH 6WQPBr+ cG%FQXR q=!L7\d _tL#%9A WqS^_qY B9A9@+c |
Rotated 270 degrees, then flipped horizontally: EIODYd8 qJPAT_# 685X_+F I_ezEc_ Wq69KUq 9t#XYRW rT#BG5q XJdFeTX E@e!ZOe VZ_HV%p |
The concatenation operator selected is 1. This corresponds to the following operations: (A c0 B) c2 (C c0 D). A, B, C, and D are quadrants or blocks. The c0 operation means to take two blocks and arrange them left to right, aligning their tops. The c2 operations means to take two blocks and arrange them top to bottom. So the selected operation performs the following steps:
- Take the first two quadrants and arrange them from left to right, aligning their tops. Let's call the resulting block X.
- Take the remaining two quadrants and arrange them from left to right, aligning their tops. Let's call the resulting block Y.
- Take blocks X and Y and place them from top to bottom. The resulting cipher text is read from left to right, top to bottom.
Here is an illustration of the new layout resulting from this transformation:
VB#PAfZ %%9/EcS)kN MI%%GzD 76eUflrS=^ POtTV+\ WVB#9AkpUz P(Z)ANl ML895kJ!Ir G=HKEVN )L%tUR8M5P AQJX@/e d\rYT_RD9I H6R^l)L t+5IWVOpcZ FkHUEe( N\6=j+RDPB #9rML58 !qW)IpqKZc kRR^UMH EqX@B#P/dL B#SQTBG q8OlEYItNR (Sz58Xq 6)IK9DGq+@ Y8eWVHF p\rdJZML^k EqlRf%U AROPB#9/(S )5DYQNt Tq7kIU^YJM eYq!K@P H96FYGe+VW B=Xp=RO k%BU/Z/P%9 kIU=)MH EIODYd8 6WQPBr+ qJPAT_# cG%FQXR 685X_+F q=!L7\d I_ezEc_ _tL#%9A Wq69KUq WqS^_qY 9t#XYRW B9A9@+c rT#BG5q XJdFeTX E@e!ZOe VZ_HV%p |
Then the cipher is read left to right, top to bottom. Here is what it looks like if you reformat the above transformation back into the 24 row, 17 column format of the original 408:
VB#PAfZ%%9/EcS)kN MI%%GzD76eUflrS=^ POtTV+\WVB#9AkpUz P(Z)ANlML895kJ!Ir G=HKEVN)L%tUR8M5P AQJX@/ed\rYT_RD9I H6R^l)Lt+5IWVOpcZ FkHUEe(N\6=j+RDPB #9rML58!qW)IpqKZc kRR^UMHEqX@B#P/dL B#SQTBGq8OlEYItNR (Sz58Xq6)IK9DGq+@ Y8eWVHFp\rdJZML^k EqlRf%UAROPB#9/(S )5DYQNtTq7kIU^YJM eYq!K@PH96FYGe+VW B=Xp=ROk%BU/Z/P%9 kIU=)MHEIODYd86WQ PBr+qJPAT_#cG%FQX R685X_+Fq=!L7\dI_ ezEc__tL#%9AWq69K UqWqS^_qY9t#XYRWB 9A9@+crT#BG5qXJdF eTXE@e!ZOeVZ_HV%p |
This transformation produces a score of 719 which is 24.4% higher than the non-transformed 408.
Best-scoring transformation of the 340
Let's look at the top-scoring transformation of the 340.
The values of variables leading to highest value of S for the 340: i=4, j=7, border=false, p=7, r1=180, r2=180, r3=180, r4=180, f1=1, f2=0, f3=0, f4=0, c=1.
This corresponds to (i, j) = (4, 7), which defines the quadrants as follows:
HER>pl^ VPk|1LTG2d Np+B(#O %DWY.<*Kf) By:cM+U ZGW()L#zHJ Spp7^l8 *V3pO++RK2 _9M+ztj d|5FP+&4k/ p8R^FlO -*dCkF>2D( #5+Kq%; 2UcXGV.zL| (G2Jfj# O+_NYz+@L9 d<M+b+Z R2FBcyA64K -zlUV+^ J+Op7<FBy- U+R/5tE |DYBpbTMKO 2<clRJ| *5T4M.+&BF z69Sy#+ N|5FBc(;8R lGFN^f5 24b.cV4t++ yBX1*:4 9CE>VUZ5-+ |c.3zBK (Op^.fMqG2 RcT+L16 C<+FlWB|)L ++)WCzW cPOSHT/()p |FkdW<7 tB_YOB*-Cc >MDHNpk SzZO8A|K;+ |
The border, which would have been defined as row 4 and column 7, is not ignored in this transformation.
p=1: This corresponds to the following permutation of quadrant orderings: {1, 0, 3, 2}, which translates to "upper right, upper left, lower right, lower left." It is interesting that the same permutation of the 408 produced the maximum score. Is this just a coincidence, or is there some significance to this?
The quadrants are selected in the following order:
VPk|1LTG2d %DWY.<*Kf) ZGW()L#zHJ *V3pO++RK2 |
HER>pl^ Np+B(#O By:cM+U Spp7^l8 |
d|5FP+&4k/ -*dCkF>2D( 2UcXGV.zL| O+_NYz+@L9 R2FBcyA64K J+Op7<FBy- |DYBpbTMKO *5T4M.+&BF N|5FBc(;8R 24b.cV4t++ 9CE>VUZ5-+ (Op^.fMqG2 C<+FlWB|)L cPOSHT/()p tB_YOB*-Cc SzZO8A|K;+ |
_9M+ztj p8R^FlO #5+Kq%; (G2Jfj# d<M+b+Z -zlUV+^ U+R/5tE 2<clRJ| z69Sy#+ lGFN^f5 yBX1*:4 |c.3zBK RcT+L16 ++)WCzW |FkdW<7 >MDHNpk |
Then, the rotations and flips are applied:
Rotated 180 degrees, then flipped horizontally: *V3pO++RK2 ZGW()L#zHJ %DWY.<*Kf) VPk|1LTG2d |
Rotated 180 degrees: 8l^7ppS U+Mc:yB O#(B+pN ^lp>REH |
Rotated 180 degrees: +;K|A8OZzS cC-*BOY_Bt p)(/THSOPc L)|BWlF+<C 2GqMf.^pO( +-5ZUV>EC9 ++t4Vc.b42 R8;(cBF5|N FB&+.M4T5* OKMTbpBYD| -yBF<7pO+J K46AycBF2R 9L@+zYN_+O |Lz.VGXcU2 (D2>FkCd*- /k4&+PF5|d |
Rotated 180 degrees: kpNHDM> 7<WdkF| WzCW)++ 61L+TcR KBz3.c| 4:*1XBy 5f^NFGl +#yS96z |JRlc<2 Et5/R+U ^+VUlz- Z+b+M<d #jfJ2G( ;%qK+5# OlF^R8p jtz+M9_ |
The concatenation operator selected is 1. This corresponds to the following operations: (A c0 B) c2 (C c0 D). A, B, C, and D are quadrants or blocks. The c0 operation means to take two blocks and arrange them left to right, aligning their tops. The c2 operations means to take two blocks and arrange them top to bottom. So the selected operation performs the following steps:
- Take the first two quadrants and arrange them from left to right, aligning their tops. Let's call the resulting block X.
- Take the remaining two quadrants and arrange them from left to right, aligning their tops. Let's call the resulting block Y.
- Take blocks X and Y and place them from top to bottom. The resulting cipher text is read from left to right, top to bottom.
Here is an illustration of the new layout resulting from this transformation:
*V3pO++RK2 8l^7ppS ZGW()L#zHJ U+Mc:yB %DWY.<*Kf) O#(B+pN VPk|1LTG2d ^lp>REH +;K|A8OZzS kpNHDM> cC-*BOY_Bt 7<WdkF| p)(/THSOPc WzCW)++ L)|BWlF+<C 61L+TcR 2GqMf.^pO( KBz3.c| +-5ZUV>EC9 4:*1XBy ++t4Vc.b42 5f^NFGl R8;(cBF5|N +#yS96z FB&+.M4T5* |JRlc<2 OKMTbpBYD| Et5/R+U -yBF<7pO+J ^+VUlz- K46AycBF2R Z+b+M<d 9L@+zYN_+O #jfJ2G( |Lz.VGXcU2 ;%qK+5# (D2>FkCd*- OlF^R8p /k4&+PF5|d jtz+M9_ |
Then the cipher is read left to right, top to bottom. The above concatenation operators result in the same 20 row by 17 column layout of the cipher text.
This transformation produces a score of 420 which is 70.0% higher than the non-transformed 340.
Open questions, issues, and future research
Conclusions
The algorithm found transformations that improved the homophone candidate scores of both the 408 and the 340. But the algorithm found 21 times as many transformations of the 340 (proportionally). It is easier to re-arrange the 340 to produce a higher score, than it is to re-arrange the 408 to produce a higher score.
But why would this be so? This continues to be an open question. Some possibilities:
- The 340 uses 63 symbols, and the 408 uses only 54 symbols. Perhaps this yields a greater degree of freedom on transformations, making it easier to produce higher scores.
- The 340 employs random homophone assignments instead of sequential.
- The 340 does indeed use sequential homophone assignments combined with quadrant-based transformations, and one of the high-scoring transformations is the correct one.
- The 340 does indeed use sequential homophone assignments combined with some other type of transformation.
- The 340 is a hoax or gibberish, and we are just seeing a greater degree of freedom in the transformed cipher text.
There are many high-scoring transformations of both ciphers. Can other measurements be made to determine if transformations are valid? For example, what happens to the n-gram scores? Does a reduced n-gram score imply that the transformation is incorrect? What other statistics can be applied to the transformations to estimate their validity?
- For example, the 408 has consistently fewer repeated symbols per row than per column. This pattern is broken under certain transformations of the cipher text
Problems with the algorithm
- The string reduction rules prevent some strong homophone candidates from being detected. It also reduces the scores of some strong candidates. Here are some examples of how this happens:
- Consider the symbols | and c in the 340. These appear in the 340 in the following sequence: |c|c|c|c||cc|cc|c|c|. The algorithm splits this sequence into the following strings: |c, |c, |c, |c, |, |cc, |cc, |c, |c, |. The homophone candidate |c appears 8 times in these strings. But the repeated symbol c is removed from two of the strings, so the score is reduced to 6.
- Consider the symbols B and # in the 408. These appear in the 408 in the following sequence: BB#BB#B#B#B#B#B#B#B##B. The algorithm splits this sequence into the following strings: B, B#, B, B#, B#, B#, B#, B#, B#, B#, B##, B. One of the string reduction rules is to remove any symbol that does not appear in the first four strings. Homophone candidate B# appears 9 times in these strings, but # is removed because it does not appear in the first four strings.
- Remedies for these problems tend to greatly increase the number of combinations of symbols that must be exhaustively searched during the algorithm. This leads to much greater execution time for the search, vastly increasing the time to test hundreds of millions of possible transformations.
- A better approach might be to perform brute force searches for homophone candidates using every possible combination of symbols. If this can be done efficiently, we might be able to exhaustively search the entire set of transformations.
Further questions and research
- I suspect there a relationship between S and the number of distinct cipher symbols used in the cipher text. Specifically, it seems intuitive that an increase in the number of distinct cipher symbols would lower S, since the cipher text would be of a fixed size, and the number of repeated strings would decrease due to increased diversity of symbols. In this experiment, we see that S is lower for the 340. Even the best transformation's score is still less than the score for the unchanged 408 cipher which has fewer unique cipher text symbols.
- In the 408, there is a pattern of homophone assignments in which the killer was faithful to sequential assignment for the first part of the cipher text, but then abandons the sequences and begins to make assignments randomly. Is there any value in marking or measuring the points at which the killer abandons sequential assignments in the 408? Is this pattern detectable in transformed cipher texts? For example, if one transformation contains a mix of homophone candidates that shows no pattern of "abandonment", and another transformation contains candidates that do show a pattern of abandonment, then perhaps the latter is a more valuable transformation to explore further.
- This experiment needs more test cases. Specifically, we only have the 408 as a test case, and it contains no transformations, but we wish to detect a transformation. We need to create more ciphers that share the 340's statistical qualities and are subjected to known quadrant-based transformations accounted for by the many combinations tested by the algorithm. Successful detection of the correct transformations in test cases increases the confidence that this approach is sound.
- What about other kinds of transformations instead of quadrants, such as spiral paths?