One difficult aspect of deciphering encrypted data is recognizing that you've actually got plaintext. For example, here are some decryption results from my attempts to solve hw1_0:

key=0x13739 results in:Clearly, we want the program to automatically find the key=0x13795 result. There are several ways to do this:

* UR]]UN KR[M ]XUM VN J UXLJU /JR[KJWT\ \]XUNW VRUURWP VJLQRWN `X^UM QJ_N KNNW NW]R[NUb \Q[NMMNM Kb JW RVYX[]NM ,[X]XKJU]R\UJ_XWRJW JW\`N[RWP VJLQRWN(

key=0x13795 results in:

A little bird told me a local Fairbanks stolen milling machine would have been entirely shredded by an imported Crotobaltislavonian answering machine?!

key=0x13799 results in:

B!mjuumf!cjse!upme!nf!b!mpdbm!Gbjscbolt!tupmfo!njmmjoh!nbdijof!xpvme!ibwf!cffo!foujsfmz!tisfeefe!cz!bo!jnqpsufe!Dspupcbmujtmbwpojbo!botxfsjoh!nbdijof@"

key=0x1379d results in:

C"nkvvng"dktf"vqnf"og"c"nqecn"Hcktdcpmu"uvqngp"oknnkpi"ocejkpg"yqwnf"jcxg"dggp"gpvktgn{"ujtgffgf"d{"cp"korqtvgf"Etqvqdcnvkuncxqpkcp"cpuygtkpi"ocejkpgA#

key=0x137a1 results in:

D#olwwoh#elug#wrog#ph#d#orfdo#Idluedqnv#vwrohq#ploolqj#pdfklqh#zrxog#kdyh#ehhq#hqwluho|#vkuhgghg#e|#dq#lpsruwhg#Furwredowlvodyrqldq#dqvzhulqj#pdfklqhB$

key=0x137a5 results in:

E$pmxxpi$fmvh$xsph$qi$e$psgep$Jemvferow$wxspir$qmppmrk$qeglmri${syph$lezi$fiir$irxmvip}$wlvihhih$f}$er$mqtsvxih$Gvsxsfepxmwpezsrmer$erw{ivmrk$qeglmriC%

**Permitted values.**Decide which byte values could be in the plaintext, and throw out decrypts that include any other byte values. This is by far the fastest and simplest approach, and above I'm actually filtering out the many decryption keys that result in some binary output data (above 127, or non-whitespace below 32). This approach is very fast, especially since you can throw out a possible key as soon as it spits out bad data, which usually happens very early in the decryption process. The downside? If we assume the weird characters like ] ! " # $**never**occur, we will throw out any decrypt where the do occur. This can even be gamed, by mixing blocks of binary garbage into the actual plaintext.**Permitted sets**. This works as above, but we reduce the chance of false negatives by checking several bytes. For example, "!m" rarely occurs in English, but "! " or "!!" are possible. This is just a generalization of the above across multiple bytes, and suffers from the same flaws. In particular, now we'll be thrown off by things like proper names or foreign words.**Likely**values or sets. Since a binary choice is error-prone and easy to subvert, we can instead compute the**probability**that a message is actually English given, that it contains this value or set of values. We can then save decrypts that have a high enough probability, or rank decrypts by their probability. This is essentially a generalization of the histogram approach.

P(

(The bar | means "given that", making this a conditional probability.)

This is unfortunately not the probability we want. For example, it tells us that the message "Hello" has less than a (12%)

P(message is English |

We know conditional probability says:

P(A and B) = P(A|B)P(B) = P(B|A)P(A)

So solving for the conditional probability:

P(A|B) = P(B|A) * P(A) / P(B)

Substituting A="message is English" and B="this letter is 'e'" gives:

We can compute this from:

- P('e' | English) comes straight from the letter frequency
tables, here 12%. For binary data with uppercase,
lowercase, spaces, and punctuation, this is a smaller value.

- P('e') is 1/26, assuming all letters are equally likely in the
ciphertext. For binary data, this would be 1/256.

- P(English) is 1/(size of key space), assuming you only get
English with the right key.

P(English | 'e') = P(English) * P('e' | English) / P('e')

or

Chance of English *= chance of this letter in English / chance of letter overall

This of course can and should be repeated with pairs of letters, triplets, etc. You're only limited by the quantity of plaintext and the care to which you'd like to do your statistical frequency analysis.