Corpus search results

From Zodiac Ciphers

This experiment was a systematic search for words and phrases shared between Zodiac's correspondences and a large corpus. The content of each of Zodiac's correspondences was reduced to a stream of alphabetic characters with no spacing or punctuation. The corpus was reduced in the same way. Then every possible substring of each correspondence was compared against the corpus, and the matches were ordered from longest to shortest. Matches of the same length were ordered from most frequently found to least frequently found. A substring that was fully contained within a larger matching substring was listed again in the results only if it was found in the corpus more frequently than the larger substring.
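The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiment: the function names, the lowercase folding, the `min_len` cutoff, and the counting of overlapping occurrences are all choices made here for the example. A brute-force scan like this is also far too slow for a corpus of tens of thousands of books, where an index such as a suffix array would be needed.

```python
import re


def normalize(text):
    """Reduce text to a stream of letters: no spacing, punctuation, or digits.
    Folding to lowercase is an assumption; the source does not mention case."""
    return re.sub(r"[^a-z]", "", text.lower())


def count_occurrences(needle, haystack):
    """Count occurrences of needle in haystack, allowing overlaps (an assumption)."""
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        start = haystack.find(needle, start + 1)
    return count


def shared_substrings(letter, corpus, min_len=4):
    """Return (substring, corpus_count) pairs, longest first, then most frequent.
    min_len is a hypothetical cutoff that keeps trivial matches out of the output."""
    letter_stream = normalize(letter)
    corpus_stream = normalize(corpus)
    n = len(letter_stream)

    counts = {}
    for i in range(n):
        for j in range(i + min_len, n + 1):
            sub = letter_stream[i:j]
            if sub in counts:
                continue  # already counted; a longer extension may still match
            found = count_occurrences(sub, corpus_stream)
            if found == 0:
                break  # no longer substring starting at i can match either
            counts[sub] = found

    # Longest matches first; ties broken by frequency, then alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1], kv[0]))

    kept = []
    for sub, found in ordered:
        # A substring contained in a longer listed match is repeated only if it
        # is strictly more frequent than every listed match that contains it.
        contained = any(
            len(big) > len(sub) and sub in big and found <= big_found
            for big, big_found in kept
        )
        if not contained:
            kept.append((sub, found))
    return kept
```

For example, `shared_substrings("abcd", "abcd abc abc xbcd", min_len=3)` lists `abcd` first and then repeats `abc` and `bcd`, because both occur in the corpus more often than the longer match that contains them. With the corpus `"abcd"` alone, only `abcd` is listed.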

The corpus used for this experiment consisted of the almost 30,000 books on the Project Gutenberg April 2010 DVD.

No more than 99 matches are displayed for any particular substring that was found in the corpus.