CryptoScope Help
NOTE: On September 30, 2011 I released a new version with many new features that are documented here.
Contents
- 1 Introduction
- 2 Working with cipher texts
- 3 Finding repeated symbols
- 4 Statistical analysis
- 5 N-gram analysis
- 6 Non-repeating sequences
- 7 Transposed sequences
- 8 Highly-constrained sections of cipher text
- 9 Unusual repeated sequences
- 10 Search for partial matches of repeated sequences
- 11 Discovering sequential homophones
- 12 Letter contacts
Introduction
CryptoScope is a utility that performs various statistical calculations and pattern searches on built-in and user-entered substitution cipher texts.
Working with cipher texts
When you visit CryptoScope, it defaults to displaying information about the unsolved 340-character Zodiac cipher. But you can change the cipher text by clicking on the cipher text box. It will change to an editable text box and you can make your changes. To save your changes, click the Update button. Or if you change your mind, click Cancel.
You can rotate and flip the cipher text using the controls at the bottom of the cipher text box. Hovering your mouse pointer over a cipher text letter will show its position (row and column), symbol, and number of occurrences at the bottom of the box.
The "Available ciphers" pulldown list has a variety of built-in ciphers for you to explore. Selecting one will automatically reload and update CryptoScope with the selected cipher text. Here's a list of the available ciphers:
- Zodiac killer's solved 408-character cipher: webtoy transcription
- Zodiac killer's solved 408-character cipher: zkdecrypto transcription
- Zodiac killer's solved 408-character cipher: zkdecrypto transcription with rod's corrections: More info on this can be found here.
- Zodiac killer's solved 408-character cipher: solution: This is included to see how the statistics and patterns of the 408 plaintext compare to those of the 408 cipher text.
- Zodiac killer's unsolved 340-character cipher: webtoy transcription
- Zodiac killer's unsolved 340-character cipher: zkdecrypto transcription
- Zodiac killer's unsolved 340-character cipher: in an oxcart path by kfreeze: A top-down, left-to-right, right-to-left, etc, routing of the 340's cipher text. Created by kfreeze.
- Zodiac killer's unsolved 340-character cipher: pluses changed to unique symbols: A variation of the 340 that permits interpretation of + symbols as wildcards.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: Top half: FBI cryptanalyst Dan Olsen's hypothesis is that the 340 cipher can be divided into two segments: the first 10 rows, and the remaining 10 rows. This is the top half (first 10 rows).
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: Bottom half: Bottom half of the 340 (last 10 rows).
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L1A0B0: Bottom half of the 340 placed to the right of the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L2A0B0: Bottom half of the 340 placed above the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L3A0B0: Bottom half of the 340 placed to the left of the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L0A0B1: Horizontally mirrored bottom half of the 340 placed below the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L1A0B1: Horizontally mirrored bottom half of the 340 placed to the right of the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L2A0B1: Horizontally mirrored bottom half of the 340 placed above the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L3A0B1: Horizontally mirrored bottom half of the 340 placed to the left of the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L0A0B2: Vertically mirrored bottom half of the 340 placed below the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L1A0B2: Vertically mirrored bottom half of the 340 placed to the right of the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L2A0B2: Vertically mirrored bottom half of the 340 placed above the first half.
- Zodiac killer's unsolved 340-character cipher: Dan Olsen variation: L3A0B2: Vertically mirrored bottom half of the 340 placed to the left of the first half.
- 318-character Bryianzum Deviantart cipher: A Zodiac-inspired cipher created by Brianzum. (More info.)
- 330-character cipher by Kiuku: A test cipher created by Kiuku.
- 340-character cipher by Chris Cactus: A test cipher created by Chris Cactus.
- 340-character cipher by Gardibolt: A test cipher created by Gardibolt.
- 340-character cipher by Mike Cole: A test cipher created by Mike Cole.
- 340-character cipher by Michael Eaton: A test cipher created by Michael Eaton.
- 340-character cipher #1 by Tony Baloney: A test cipher created by Tony Baloney.
- 340-character cipher #2 by Tony Baloney: A test cipher created by Tony Baloney.
- 348-character cipher by Ray N: A test cipher created by Ray N.
- Zodiac killer's unsolved 13-character cipher: (Images from the letter.)
- Dorabella Cipher: Read about it here.
- Beale Cipher Part 1: Part 1 of the Beale ciphers. Unsolved. The cipher is included here with symbolic representations of each number that appears in the original code.
- Beale Cipher Part 2: Part 2 of the Beale ciphers. Solved. The cipher is included here with symbolic representations of each number that appears in the original code.
- Beale Cipher Part 3: Part 3 of the Beale ciphers. Unsolved. The cipher is included here with symbolic representations of each number that appears in the original code.
Some basic information is displayed when you load a cipher:
- Cipher text length (number of characters)
- Number of rows of cipher text
- Number of columns of cipher text
- Number of unique symbols used in the cipher text's alphabet. Each symbol is displayed as a link. If you click a link, CryptoScope highlights all occurrences of the letter in the cipher text.
- Multiplicity (number of unique symbols, divided by total cipher text length). Ciphers with high multiplicity are very hard or impossible to solve because of the lack of unique solutions.
- Key search space size. This is computed by assuming that the plaintext alphabet has 26 letters (A through Z). The size is shown as a link. If you click the link, a Wolfram Alpha page opens with information about the scale of the search space size. For many homophonic ciphers, the search space size is very large.
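The multiplicity and search space figures can be approximated with a minimal Python sketch. The function names are hypothetical, and the key space formula (26 raised to the number of unique symbols) is an assumption; this help page only states that a 26-letter plaintext alphabet is assumed.

```python
def multiplicity(cipher_text: str) -> float:
    """Number of unique symbols divided by total cipher text length."""
    symbols = [c for c in cipher_text if not c.isspace()]
    if not symbols:
        return 0.0
    return len(set(symbols)) / len(symbols)


def key_space_size(cipher_text: str, plaintext_letters: int = 26) -> int:
    """Assumed formula: every unique cipher symbol may map to any plaintext letter."""
    unique = len({c for c in cipher_text if not c.isspace()})
    return plaintext_letters ** unique


print(multiplicity("ABCA"))    # 3 unique symbols over 4 characters
print(key_space_size("ABCA"))  # 26 ** 3 under the stated assumption
```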
Finding repeated symbols
- Repeated symbols by row: For each row of the cipher text, the number of repeated symbols is displayed. It appears as a link. Hovering over the link reveals the row in the cipher text. If the number is greater than zero, all symbols participating as repeats along the row are displayed. You can hover over a symbol to show its occurrences along the row.
- Repeated symbols by column: For each column of the cipher text, the number of repeated symbols is displayed. It appears as a link. Hovering over the link reveals the column in the cipher text. If the number is greater than zero, all symbols participating as repeats along the column are displayed. You can hover over a symbol to show its occurrences along the column.
- Column repeats by row: For each row, CryptoScope counts the number of characters in the row that repeat in columns of the cipher text. A notable example of this is the last row of the 408-character cipher (see this forum post). Hover over each number to highlight the corresponding row and its columnar repetitions.
- Row repeats by column: For each column, CryptoScope counts the number of characters in the column that repeat in rows of the cipher text. Hover over each number to highlight the corresponding column and its row-wise repetitions.
- Symbol density map: Clicking this link renders each symbol in the cipher text with a grayscale background. High-frequency symbols are given a darker background color. Low-frequency symbols are given a lighter background color.
- Row repeat map: Clicking this link renders each row with a grayscale background. Rows with the highest numbers of symbol repetitions are given a darker background color. Rows with the lowest numbers of symbol repetitions are given a lighter background color.
- Column repeat map: Clicking this link renders each column with a grayscale background. Columns with the highest numbers of symbol repetitions are given a darker background color. Columns with the lowest numbers of symbol repetitions are given a lighter background color.
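As a rough illustration of the row and column repeat counts, here is a small Python sketch. It assumes a "repeat" means any occurrence of a symbol beyond its first within the row or column; CryptoScope's exact counting rule is not specified here, and the function names are hypothetical.

```python
from collections import Counter


def row_repeats(rows):
    """For each row, return (repeat count, sorted list of repeating symbols)."""
    result = []
    for row in rows:
        counts = Counter(row)
        repeated = sorted(s for s, n in counts.items() if n > 1)
        extra = sum(n - 1 for n in counts.values())  # occurrences beyond the first
        result.append((extra, repeated))
    return result


def column_repeats(rows):
    """Same measurement taken down each column (rows are assumed equal length)."""
    columns = ["".join(col) for col in zip(*rows)]
    return row_repeats(columns)


grid = ["ABCA", "DBFG", "HAHH"]
print(row_repeats(grid))
print(column_repeats(grid))
```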
Five charts, related to symbol repetitions, are displayed:
- Reps by Row: Shows the number of repetitions as a function of row number.
- Reps by Col: Shows the number of repetitions as a function of column number.
- Col Reps by Row: Shows the number of columnar repetitions for symbols in a row as a function of row number.
- Row Reps by Col: Shows the number of row-wise repetitions for symbols in a column as a function of column number.
- Appearance of new symbols: Plots the number of new symbols accumulated as you read the cipher from left to right in a linear fashion. The X-axis is the linear position in the cipher text, and the Y-axis is the number of new symbols seen up to that point.
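The "Appearance of new symbols" curve can be sketched in a few lines of Python. The function name is hypothetical; the input is the cipher text read linearly, as described above.

```python
def new_symbol_curve(cipher_text):
    """Y-values of the chart: distinct symbols seen up to each linear position."""
    seen = set()
    curve = []
    for symbol in cipher_text:
        seen.add(symbol)
        curve.append(len(seen))
    return curve


print(new_symbol_curve("ABACB"))  # [1, 2, 2, 3, 3]
```

A curve that flattens early suggests a small alphabet that is reused heavily; a curve that keeps climbing suggests many symbols are introduced late.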
Statistical analysis
Various statistical tests are calculated for the entire cipher text, and for each column and row.
- IoC: The index of coincidence. The overall IoC is computed, then the IoC computations for each row and column are displayed. Clicking a link highlights the corresponding row or column. The average per-column and per-row IoCs are also computed.
- Entropy: The overall entropy is computed, then the entropy computations for each row and column are displayed. Clicking a link highlights the corresponding row or column. The average per-column and per-row entropy are also computed.
- Chi^{2}: The overall Chi^{2} is computed, then the Chi^{2} computations for each row and column are displayed. Clicking a link highlights the corresponding row or column. The average per-column and per-row Chi^{2} are also computed.
Per-row and per-column charts are displayed for each type of statistical test.
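For reference, here is a minimal Python sketch of two of these measurements using their textbook definitions. Whether CryptoScope applies the same normalization is an assumption, and Chi^{2} is left out because it depends on an expected distribution that this page does not specify.

```python
import math
from collections import Counter


def index_of_coincidence(text):
    """Probability that two symbols drawn without replacement are equal."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def shannon_entropy(text):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    n = len(text)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


rows = ["AABB", "ABCD"]
for row in rows:
    print(row, index_of_coincidence(row), shannon_entropy(row))
```

The same two functions can be applied to the whole cipher text, to each row, or to each column string.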
N-gram analysis
- N-grams: CryptoScope performs N-gram analysis, which is a simple linear search for repeated character sequences of length N in the cipher text. The search begins with N=1, and increases N until no more repetitions are found. Results are displayed in histograms and pie charts. Clicking the links in the histograms highlights the corresponding sequences as they appear in the cipher text. By default, only the top 10 repetitions are displayed. Click the "Show All" link to display all of them.
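The N-gram search described above can be sketched in Python as follows. The function names are hypothetical, and the sketch counts overlapping occurrences, which may differ from what CryptoScope does.

```python
from collections import Counter


def repeated_ngrams(text, n):
    """Return every length-n sequence that occurs more than once, with its count."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {gram: c for gram, c in counts.items() if c > 1}


def ngram_analysis(text):
    """Start at N=1 and increase N until no repeated N-grams remain."""
    results = {}
    n = 1
    while True:
        found = repeated_ngrams(text, n)
        if not found:
            break
        results[n] = found
        n += 1
    return results


print(ngram_analysis("ABCABCX"))
```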
Non-repeating sequences
- Largest non-repeating sequences: CryptoScope inspects the cipher text for the largest non-repeating sequences. A non-repeating sequence is a chunk of cipher text that contains symbols that do not repeat. Results are displayed in descending order of length. The percentage value shows how much of the entire cipher text is covered by each sequence. Clicking a sequence highlights it in the cipher text.
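One way to find such sequences is a single sliding-window pass, sketched below in Python. This is an illustrative approach rather than CryptoScope's confirmed method; the function name is hypothetical.

```python
def non_repeating_sequences(text):
    """Return maximal windows with no repeated symbol as (start, sequence), longest first."""
    results = []
    last_seen = {}
    start = 0
    for i, symbol in enumerate(text):
        if symbol in last_seen and last_seen[symbol] >= start:
            # The window cannot grow past a repeat: record it and restart after the clash.
            results.append((start, text[start:i]))
            start = last_seen[symbol] + 1
        last_seen[symbol] = i
    if start < len(text):
        results.append((start, text[start:]))
    results.sort(key=lambda r: len(r[1]), reverse=True)
    return results


text = "ABCADEF"
for start, seq in non_repeating_sequences(text):
    coverage = 100 * len(seq) / len(text)  # percentage of the cipher text covered
    print(start, seq, round(coverage, 1))
```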
Transposed sequences
- Repeating transposed sequences: The search for transposed sequences is similar to the search for repeated N-grams, except that the order of symbols within each sequence is ignored. The results are displayed in a format similar to that of the repeated N-grams.
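Ignoring symbol order amounts to comparing sorted N-grams, as in this Python sketch. The function name is hypothetical, and overlapping windows are not filtered out here.

```python
from collections import defaultdict


def transposed_repeats(text, n):
    """Group length-n windows by their sorted symbols; keep groups seen more than once."""
    groups = defaultdict(list)
    for i in range(len(text) - n + 1):
        key = "".join(sorted(text[i:i + n]))  # order-independent signature
        groups[key].append(i)
    return {key: starts for key, starts in groups.items() if len(starts) > 1}


print(transposed_repeats("ABCXCBA", 3))
```

In the example, "ABC" at position 0 and "CBA" at position 4 land in the same group because they contain the same symbols.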
Highly-constrained sections of cipher text
- Ciphertext segments with the most repeated symbols: Click the Search button to perform the search for segments of cipher text with the highest numbers of repeated symbols. This is useful for locating areas of the cipher text that are highly constrained. A score is calculated for each segment. The score is simply the ratio of the number of repetitions in the segment to segment length. Higher scores indicate high amounts of repetition within the segment. Click a segment to highlight its appearance in the cipher text. The search for segments is limited to segment lengths of up to a third of the total cipher text length.
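The segment score can be sketched in Python as below. It assumes "repetitions" means occurrences beyond the first of each symbol within the segment, and the minimum segment length of 2 is also an assumption; names are hypothetical.

```python
def segment_score(segment):
    """Ratio of repetitions in the segment to segment length."""
    return (len(segment) - len(set(segment))) / len(segment)


def most_constrained_segments(text, min_len=2, top=5):
    """Score every segment up to a third of the text length; return the best ones."""
    max_len = len(text) // 3
    scored = []
    for length in range(min_len, max_len + 1):
        for start in range(len(text) - length + 1):
            segment = text[start:start + length]
            scored.append((segment_score(segment), start, segment))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


for score, start, segment in most_constrained_segments("ABCDEEEFGHIJ"):
    print(round(score, 3), start, segment)
```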
Unusual repeated sequences
- Repeated sequences (any direction): Some cipher texts contain sequences that are repeated in unusual directions. Click "Search" to perform a search for such sequences. If any results are found, you can click them to highlight them in the cipher text.
Search for partial matches of repeated sequences
- All sequences that share at least the first and last symbols (any direction): Click "Search" to perform searches for partially-matching patterns that occur in any orientation in the cipher text. The following information is displayed for each result found:
- Length: The length of the found pattern. CryptoScope performs searches for partially matching patterns up to 20 characters in length.
- Count: The number of matches for this result.
- Score: Each result is ranked by a combined score of the count, and the largest number of internal matches found. The score is formed by scaling the count to a number between 0 and 1, and adding it to the largest number of internal matches.
- Symbols: The beginning and ending symbols of partially matched sequences. Click the link to highlight the matched sequences in the cipher text.
- Internal matches: Partial matches occur when sequences share the first and last symbols. Internal matches are additionally matching symbols, in the same positions of each sequence, that are found between the first and last symbols.
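The idea of partial and internal matches can be illustrated with a simplified Python sketch. It only scans in normal reading order (not every direction), handles one pattern length at a time, and omits the combined score because the exact scaling is not given here; all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def internal_matches(a, b):
    """Count equal symbols at the same interior positions of two sequences."""
    return sum(1 for x, y in zip(a[1:-1], b[1:-1]) if x == y)


def partial_matches(text, length):
    """Group length-sized windows by (first, last) symbol and report each group."""
    groups = defaultdict(list)
    for i in range(len(text) - length + 1):
        seq = text[i:i + length]
        groups[(seq[0], seq[-1])].append(seq)
    results = []
    for (first, last), seqs in groups.items():
        if len(seqs) < 2:
            continue  # a single sequence has nothing to match against
        best = max(internal_matches(a, b) for a, b in combinations(seqs, 2))
        results.append({"symbols": first + last, "count": len(seqs), "internal": best})
    return results


print(partial_matches("AXYZB12AXQZB", 5))
```

Here "AXYZB" and "AXQZB" share the first and last symbols, and two of their three interior positions also agree.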
Discovering sequential homophones
- Brute force search for sequential homophones: A homophonic substitution cipher conceals plaintext letter frequencies by assigning multiple cipher text symbols (called homophones) to plaintext letters. If sequential homophonic substitution is used, then the enciphering scheme will assign homophones in an orderly, repeated sequence to the corresponding plaintext letters as they are read from left to right. CryptoScope's brute force search for sequential homophones detects such sequences by inspecting every permutation of a small number of symbols to see if they repeat in the cipher text. Three searches are supported: N=2, N=3, and N=4. N indicates the length of the homophone pattern. The N=2 search is very fast since it is only looking for A*(A-1)/2 combinations of symbols (where A is the size of the cipher text alphabet). But the search for N=3 will take longer since A*(A-1)*(A-2)/2 combinations are searched. The search is even longer with an N=4 search (A*(A-1)*(A-2)*(A-3)/2 combinations), so be prepared to wait for a few minutes while this search is taking place.
If you perform the homophone search on the solved 408, using the webtoy transcription, then the actual homophones of the 408 are displayed. In the search results, CryptoScope also determines if the algorithmically found sequences are true homophones of the known solution of the 408.
Each pattern found in the search is displayed with some additional information:
- Score: Each pattern is ranked with a score that is a combination of the longest run and percentage of pattern (explanation below).
- Sequence: This is the pattern that was found repeating in the cipher text. The pattern is found by removing every symbol in the original cipher text except for the N symbols in the search. CryptoScope then looks for repetitions in the pattern that remains. If you click the "merge" link, the cipher text is updated by replacing all occurrences of the selected symbols with a single symbol (the first in the sequence).
- Pattern: This is what the cipher text looks like when all symbols except for those in the searched sequence are removed. If the searched sequence is a sequential homophone, or part of a sequential homophone, then the pattern that remains will contain many repeats. Click the pattern link to highlight all occurrences of the symbols in the cipher text.
- Repeats: The number of times the sequence was found in the pattern. A high number of repeats is desired.
- Longest run: A result is compelling if it has many contiguous repeats of the searched sequence. The longest run is the maximum number of contiguous repeats of the searched sequence in the pattern. The longest run covers some portion of the inspected pattern. This portion is represented as a percentage, and is included in the overall score for the result. The score is the average of two ranks:
- Longest run: The maximum possible longest run value is given a rank of 1, and the minimum is given a rank of 0.
- Percentage of pattern: The maximum possible percentage of pattern (100%) is given a rank of 1, and the minimum (0%) is given a rank of 0.
- Actual homophone of the 408: If you are performing the homophone search on the webtoy transcription of the 408, the search tells you if the algorithmically found result is an actual homophone of the 408's known solution. You will see that many of the found sequences are actual homophones, and some are not.
Results scoring less than 0.5 are not displayed.
The score combines two measurements: The ratio of the number of contiguous repeats in a pattern to that of the best overall pattern, and the ratio of the length of those repeats to the overall length of the pattern.
An example might help illustrate this:
The homophone candidate "KM", found in the brute force search for the 408 (N=2), has a score of 0.845. The pattern that was found is: KMKMKMKMKMKMMK.
There are 5 contiguous (unbroken) repeats of "KM" in the pattern (the first occurrence is not counted): KM KM KM KM KM KM. I call this the "longest run" of the pattern.
The best candidate, "lM", has a longest run of 6.
The first part of the score, S1, is the ratio of the longest run of the pattern, to the longest run of the best candidate. So, S1 = 5/6. This is a measurement of how good this pattern is overall.
The second part of the score, S2, is the total length of the longest run (including the first occurrence of "KM"), divided by the total length of the pattern. So, S2 = 12 / 14. This is a measurement of how few "sequence errors" appear in the pattern.
The total score is the average of S1 and S2. S = (S1+S2)/2 = 0.845.
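The "KM" example can be reproduced with a short Python sketch. The function names are hypothetical, and the longest run of the best candidate (6) is passed in as a parameter rather than computed from a full search.

```python
def extract_pattern(cipher_text, symbols):
    """Remove every symbol except the ones being tested."""
    keep = set(symbols)
    return "".join(c for c in cipher_text if c in keep)


def contiguous_occurrences(pattern, sequence):
    """Largest number of back-to-back copies of sequence found in pattern."""
    best = 0
    for start in range(len(pattern)):
        count, pos = 0, start
        while pattern.startswith(sequence, pos):
            count += 1
            pos += len(sequence)
        best = max(best, count)
    return best


def homophone_score(pattern, sequence, best_longest_run):
    """Average of S1 (run relative to best candidate) and S2 (run coverage)."""
    if not pattern or best_longest_run <= 0:
        return 0.0
    occurrences = contiguous_occurrences(pattern, sequence)
    longest_run = max(occurrences - 1, 0)  # the first occurrence is not counted
    s1 = longest_run / best_longest_run
    s2 = occurrences * len(sequence) / len(pattern)
    return (s1 + s2) / 2


print(round(homophone_score("KMKMKMKMKMKMMK", "KM", 6), 3))  # 0.845
```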
Summary information is displayed at the end of the sequential homophone search results.
- Number of combinations of symbols tested.
- Number and percentage of combinations that scored 0.5 or higher.
- Min, max, sum and average of score, number of repeats, longest run count, and run ratio (percentage of pattern covered by the longest run)
- Histogram of longest run lengths vs number of occurrences.
Letter contacts
- Letter contacts: Click "Compute" to display the letter contacts, a grid that shows how many times every symbol occurs before every other symbol.
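A letter contact grid is essentially a count of adjacent symbol pairs. The Python sketch below assumes contacts are counted along the linear reading order of the cipher text; the function name is hypothetical.

```python
from collections import Counter


def letter_contacts(text):
    """Count how many times each symbol is immediately followed by each other symbol."""
    return Counter(zip(text, text[1:]))


contacts = letter_contacts("ABABC")
symbols = sorted(set("ABABC"))
# Print a simple grid: rows are the first symbol, columns the symbol that follows it.
print("  " + " ".join(symbols))
for first in symbols:
    print(first, " ".join(str(contacts[(first, second)]) for second in symbols))
```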