Difference between revisions of "Cipher Legitimacy"

Latest revision as of 14:31, 3 June 2012

(This article by true crime author Michael Kelleher is copied from the now defunct site ZodiacKillerCase.com)

Like many other people who follow the Zodiac case, I have long been intrigued and regularly frustrated by the killer's ciphers. However, I simply don't have the kind of brain that can really come to grips with this aspect of the case. My mind seems to go on strike every time I try to approach the ciphers from an objective point of view.

Fortunately, there is a small community of exceptionally bright and tenacious individuals who have worked on Zodiac's ciphers for years. Among them is David Oranchak (see the Contributors Section), who has contributed to this site and who clearly has a grasp on this complex part of the Zodiac mystery.

Dave has become my cipher-guru. When I need to understand something about the ciphers, Dave always has a ready answer and he is able to put his thoughts into words in a direct and understandable way. So, I went to Dave with a question that has been rattling around in my mind for a long time. Here is the result of that exchange:

Dave, I have a question, probably a dumb one, because I know so very little about the Zodiac ciphers. However, it is something that has been bothering me for years.

How do we know that Zodiac actually created legitimate ciphers, other than the one that was broken? In other words, in the world of ciphers, is it possible to know that a legitimate cipher exists but has not yet been broken? Is there a way to test for a real cipher that would eliminate the possibility that Zodiac just created gibberish with all his ciphers following the rather quick decoding of his first one?

It seems to me that it may have been in Zodiac's nature to do something like hoaxing ciphers. I have always ASSUMED that the unbroken ciphers were legitimate but just not yet cracked. What if, however, they were never intended to be broken simply because they were never true ciphers in the first place?

And Dave's reply . . .

It is certainly not a dumb question. But I don't think I have any smart answers.

(Ed: it's nice to see that humility is not a lost art. Dave's answer, as you will see, is on the mark and easily understood.)

I think the short answer is that we don't know that they are legitimate ciphers. Imagine a scenario in which the cipher author creates a stream of gibberish plaintext, and then enciphers it using a valid cryptographic approach, making the end result seem like a real cipher. In this case, there would be no real way to determine that the original plaintext is gibberish. I don't think there is a test that can determine this, especially considering the possibilities of other "schemes" at work, other than the one that was used on the 408-character cipher.
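As a rough illustration of that scenario, the sketch below enciphers a stream of random letters and a real message with the same sequential homophonic scheme. The homophone table, the sample message, and the function name are made up for this example; nothing here is taken from the actual Zodiac ciphers.

```python
import itertools
import random
import string

def encipher(plaintext, homophones):
    """Replace each letter with its next homophone, cycling in order."""
    cycles = {letter: itertools.cycle(symbols)
              for letter, symbols in homophones.items()}
    return "".join(next(cycles[letter]) for letter in plaintext)

# Hypothetical table: one symbol per letter, extra homophones for E and T.
table = {letter: [letter.lower()] for letter in string.ascii_uppercase}
table["E"] = ["1", "2", "3"]
table["T"] = ["4", "5"]

# Gibberish plaintext: 40 uniformly random letters (seeded for repeatability).
rng = random.Random(0)
gibberish = "".join(rng.choice(string.ascii_uppercase) for _ in range(40))
message = "MEETTHETEACHERATTHESTREETCORNERATTENTONIGHT"

print(encipher(gibberish, table))
print(encipher(message, table))
```

Both outputs are produced by exactly the same procedure, which is why the enciphering step alone cannot tell us whether the underlying plaintext meant anything.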

There is a real possibility that the 340 cipher contains no message whatsoever. The only way I can think of to prove this is to perform an exhaustive search of all possible solutions to the cipher under the assumption that it is a homophonic substitution cipher. But this is currently impossible because there are WAY too many solutions for computers to search through. Even if every computer on Earth were working on this problem, they would not finish going through all possible solutions any time soon.

The 340 cipher has 63 unique symbols. If one of 26 plaintext letters can be assigned to each symbol, then the number of possible solutions is 26 raised to the 63rd power, or about 1.4 x 10^89. So, how many of those solutions would computers need to examine PER SECOND to finish within 100 years? The answer is about 4.4 x 10^79 solutions, or 44 million billion billion billion billion billion billion billion billion. This is LIGHT YEARS beyond what current computers are capable of.
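The arithmetic can be checked with a few lines of Python. This is only a sanity check under the assumptions stated in the paragraph above (63 symbols, 26 letters, 100 years); the 365.25-day year is an added assumption.

```python
# Number of ways to assign one of 26 letters to each of 63 symbols.
SYMBOLS = 63
LETTERS = 26
keyspace = LETTERS ** SYMBOLS  # exact integer, Python ints do not overflow

# Seconds in 100 years, assuming 365.25-day years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
seconds = 100 * SECONDS_PER_YEAR

# Solutions that must be examined every second to finish in time.
rate_needed = keyspace / seconds

print(f"keyspace    = {keyspace:.3e}")
print(f"rate needed = {rate_needed:.3e} solutions per second")
```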

According to Top500.org, the world's fastest supercomputer as of November 2011 achieved a computational power of 10.51 Petaflop/s, or 10.51 quadrillion floating point operations per second. If you were somehow able to check a decryption solution of the 340 cipher using a single floating point operation, then this supercomputer could go through all of the possible solutions to the 340 cipher in about 4.2 x 10^65 years. By comparison, our humble universe is estimated to be only 13.7 x 10^9 years old!
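The same kind of back-of-the-envelope check works for the supercomputer estimate. The sketch assumes, as the paragraph does, one candidate solution per floating point operation, and it additionally assumes a 365.25-day year.

```python
keyspace = 26 ** 63                # all letter assignments for 63 symbols
FLOPS = 10.51e15                   # 10.51 Petaflop/s, one check per operation
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
UNIVERSE_AGE_YEARS = 13.7e9        # figure quoted in the text

# Time to enumerate every solution at that rate.
years = keyspace / FLOPS / SECONDS_PER_YEAR
universe_ages = years / UNIVERSE_AGE_YEARS

print(f"{years:.2e} years, or {universe_ages:.2e} times the age of the universe")
```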

So, we have to rely on smarter computer programs that take shortcuts by examining only the solutions that seem worth examining. This is what "hill-climbing" solvers such as zkdecrypto do. They look at some randomized solution, then try out new solutions to see if they are better than the older solutions. If they are better, the program continues to explore around the new solution. In this way, the programs "climb up a hill" of solutions of increasing quality. Sometimes they can find the right answer and other times they get stuck on smaller hills, failing to see the real hill off in the distance. Thus far, hill-climbing algorithms seem to be getting stuck on the smaller hills of the 340 cipher's landscape of solutions. Either there is still a distant, undiscovered hill out there with the correct solution perched at its top, or the entire landscape is covered in gibberish. You have to search the entire space to know for sure. And, even if we discovered that every possible point in the landscape is gibberish, there might be some "twist" to the 340 cipher's construction that we have failed to account for so far.
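To make the idea concrete, here is a minimal hill-climbing sketch. It is not how zkdecrypto works internally; the scoring table, function names, and parameters are all invented for illustration, and a real solver would score candidates with much richer language statistics.

```python
import random
import string

# Hypothetical toy scoring table: a few common English bigrams.
# Real solvers use far larger n-gram statistics.
COMMON_BIGRAMS = ("TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND")

def score(text):
    """Higher means more English-like under this toy measure."""
    return sum(text.count(bigram) for bigram in COMMON_BIGRAMS)

def decipher(ciphertext, key):
    """Map each cipher symbol to its assigned plaintext letter."""
    return "".join(key[symbol] for symbol in ciphertext)

def hill_climb(ciphertext, steps=5000, seed=0):
    rng = random.Random(seed)
    symbols = sorted(set(ciphertext))
    # Start from a randomized solution.
    key = {s: rng.choice(string.ascii_uppercase) for s in symbols}
    start = best = score(decipher(ciphertext, key))
    for _ in range(steps):
        # Try a neighbouring solution: reassign one symbol.
        symbol = rng.choice(symbols)
        old = key[symbol]
        key[symbol] = rng.choice(string.ascii_uppercase)
        candidate = score(decipher(ciphertext, key))
        if candidate > best:
            best = candidate      # better: keep exploring from here
        else:
            key[symbol] = old     # not better: undo the change
    return key, start, best
```

Because the loop only ever accepts improvements, it stops making progress once no single reassignment raises the score. That is the "smaller hill" problem described above, and it is why practical solvers add restarts or occasionally accept worse solutions.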

There may be ways to detect if a cipher is a hoax, however. In this interesting article about the Beale ciphers, the author concludes that the unsolved cipher is a meaningless hoax because of the appearance of the strange sequence "ABFDE FGHII JKLMM NOHPP" when you decipher the code using the same key that was used on the solved Beale cipher. So, in this case, a researcher has produced real evidence that suggests the unsolved Beale cipher is a hoax. There is still debate, however, over whether that sequence is truly an artifact of an intentional hoax or of some other cryptographic remnant that might mean something else. So, the jury is apparently still out on that one and the mystery lingers on.

The 340 cipher seems like a real cipher because it contains several resemblances to real homophonic substitution ciphers:

1) It contains repetitions of bigrams and trigrams, as would normally occur if you created a homophonic substitution cipher.

2) There are repeated patterns of sequences that suggest the cipher author used a sequential homophonic assignment strategy, similar to what he did in the 408 cipher. The idea is to assign a few different cipher symbols to a common plaintext letter (such as "E"), to conceal its frequent occurrences. You can see some of the detected sequences here: http://zodiackillerciphers.com/wiki/index.php?title=Brute_force_search_for_homophone_sequences#Results_for_340_cipher

For example, if you remove every symbol except for l, *, and M, you get this sequence: [l*M] [l*M] [l*M] lM [l*M] [l*M] [l*M]. This sequence suggests a strong relationship between those symbols and some common plaintext letter.

3) The pivots suggest some sort of pattern in the underlying plaintext, or the inclusion of some other transposition strategy or other encipherment technique. More information on the pivots can be found here: http://zodiackillerciphers.com/wiki/index.php?title=Pivots

4) The distribution of individual symbols is similar to that of other real ciphers.

5) If you look at the number of repeated symbols per line, you'll see that some lines have few or no repeats, suggesting intentional assignment of unique symbols to those areas first.
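Several of the checks in this list are easy to sketch in code. The helpers below cover items 1, 2, and 5; their names are made up for this example, and the sample string is a toy ciphertext, not the real 340 cipher.

```python
from collections import Counter

def repeated_ngrams(symbols, n):
    """Item 1: return the n-grams that occur more than once, with counts."""
    grams = Counter(tuple(symbols[i:i + n])
                    for i in range(len(symbols) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}

def filter_symbols(symbols, keep):
    """Item 2: drop every symbol not in `keep`, preserving order."""
    return "".join(s for s in symbols if s in keep)

def repeats_per_line(symbols, width):
    """Item 5: for each row of `width` symbols, count repeated symbols."""
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    return [len(row) - len(set(row)) for row in rows]

# Hypothetical toy ciphertext, NOT the real 340 cipher.
toy = "lAB*CMlD*EMFlG*MHlM"
print(filter_symbols(toy, "l*M"))
```

On the toy string the filtered output is three full l*M cycles followed by one broken cycle, the same kind of pattern described under item 2.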

So, there is evidence of intentional, deliberate encryption in the 340 cipher. There is also the little corrected backwards “K” that the cipher author made, which suggests a careful, deliberate process of encipherment. And, it is easy to create home-made ciphers that share many of the 340's traits. All of these factors continue to tantalize everyone.

Sorry I can't be more conclusive. It's still a big mystery, ultimately.

When I read Dave's analysis, it became clear to me that Zodiac could well have created several legitimate ciphers. However, perhaps he made them too complex, or just did not design them well. Since his first cipher was decoded rather quickly, and by an amateur husband/wife team, perhaps Zodiac decided to make his future efforts “crack proof.”

At any rate, as Dave said, this is a genuine mystery. I now feel more certain that Zodiac did create meaningful ciphers. I also believe that there is reason to hope that, someday, the family of Zodiac cipher-breakers, like Dave, will surprise us all with some interesting solutions.

Thanks, Dave, for giving me my brain back!

Michael D. Kelleher, PhD

David Oranchak

Computer scientist, software engineer, and puzzle enthusiast.
Creator/maintainer of the Zodiac Ciphers wiki: http://zodiackillerciphers.com

You can contact Dave at doranchak at gmail dot com.