Information Theory, Entropy, and Key Strength

CS 463 Lecture, Dr. Lawlor

"Entropy" has a fairly technical definition in crypto work: it's a measure of the unexpectedness of a message.  The concept and equations come from information theory. 

Here's the conceptual model:
Sentence start | Low entropy ending | Higher entropy ending
Hey, what's    | going on?          | on fire?!
Want to        | get lunch?         | TiVo Le Voyage dans la Lune?
How are you    | doing today?       | my face spiders?

(I find entropy funny and refreshing, at least for a while.  The random sentence generator in the homework, or SCIgen, is really amusing for the first few tries, though it does get old eventually.)

Mathematically, entropy can be quantified as:
    entropy = sum over messages of (probability of message  * - log(probability of message) );

The base used for the logarithm is a matter of taste.  Computer folks often use a base-2 log, which gives you an entropy in bits.

Probability runs from 0 to 1, while -log runs from +infinity to 0 over that range.  This means probability-1 events or probability-0 events both have no entropy--there's no uncertainty if it's guaranteed to happen, or can't happen.
Probability | -log2(probability) | Bits of entropy contributed | Why?
1           | 0                  | 0                           | It always happens, so there's no uncertainty.
0.5         | 1                  | 0.5                         | Happens half the time.
0.1         | 3.3                | 0.33                        | -log is bigger, but the probability shrinks faster.
0.01        | 6.6                | 0.066                       | As above, but worse.
0           | inf                | 0                           | Never happens, so doesn't contribute any entropy.
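
Here's a small C++ sketch of the formula above (just an illustration; the distributions are made up).  It adds up the per-message contributions, exactly like the table:

    #include <cmath>
    #include <cstdio>
    #include <vector>
    
    // Shannon entropy, in bits, of a discrete probability distribution.
    // Zero-probability messages contribute nothing (the p * -log2(p) limit is 0).
    double entropy_bits(const std::vector<double> &probs) {
        double H = 0.0;
        for (double p : probs)
            if (p > 0.0) H += p * -std::log2(p);
        return H;
    }
    
    int main() {
        // A fair coin: two messages, each with probability 0.5 -> 1 bit total.
        printf("Fair coin:   %.3f bits\n", entropy_bits({0.5, 0.5}));
        // A biased coin: the rare message contributes 0.1*3.3 = 0.33 bits,
        // the common one only about 0.14 bits, for roughly 0.47 bits total.
        printf("Biased coin: %.3f bits\n", entropy_bits({0.1, 0.9}));
        return 0;
    }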

Probability is a fairly slippery concept (what's the probability for a byte of 0x61?), but there are several ways to measure entropy directly.  For example, if we run a data compression algorithm like gzip or bzip2 on a dataset, it will remove some of the redundancies from a low-entropy dataset, but won't be able to shrink a high-entropy dataset.  Here are the data sizes, in bytes, that I measured for ten types of data:


Data                | Raw     | gzip -9 | bzip2   | 7zip    | Comments
Binary zeros        | 1048576 | 1057    | 45      | 327     | All zeros has the lowest possible entropy: 0 bits.
Wikipedia XML       | 1048576 | 359184  | 287769  | 294871  | Good compression rate, typical for text.  Inferred entropy is about 2.5 bits per byte of the file (30%).
H.P. Lovecraft text | 1048576 | 402926  | 309576  | 328106  | Similar.
gdb ELF64           | 1048576 | 406290  | 386042  | 332042  | Nominally a binary file, but it has ASCII strings, x64 machine code, and tables with lots of zeros.
Random ASCII hex    | 1048576 | 607628  | 544118  | 553549  | Nominally ASCII, but very unpredictable.  Expected value is 4 bits of entropy per byte of the file (50%).
Base64 random data  | 1048576 | 796856  | 795415  | 806685  | Expected value is 6 bits of entropy per byte of the file (75%).
Tarball             | 1048576 | 1039974 | 1050758 | 1046726 | Already gzip'd, so most of the redundancy is already gone.
Zipfile             | 1048576 | 1041544 | 1050592 | 1047977 | Similar.
JPEG                | 1048576 | 1047418 | 1044382 | 1055571 | After the DCT phase, JPEG already applies a Huffman encoding.
Random binary data  | 1048576 | 1048761 | 1053340 | 1063068 | Purely random binary data is almost perfectly incompressible--very nearly 1 bit of entropy per bit of the file (100%).

Generally, any kind of human-readable text has quite low entropy, which means there are redundancies that can be found even by a general-purpose compression program. 
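
You can reproduce the same measurement idea with a few lines of code.  Here's a rough C++ sketch using zlib's compress2 (assuming zlib is installed; link with -lz).  It's not the exact setup used for the table above, and compression only gives an estimate, but the bits-per-byte figure comes out in the same ballpark:

    // Rough entropy estimate via compression (link with -lz).
    #include <cstdio>
    #include <vector>
    #include <zlib.h>
    
    int main(int argc, char *argv[]) {
        if (argc < 2) { printf("usage: %s file\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { printf("cannot open %s\n", argv[1]); return 1; }
        std::vector<unsigned char> raw;
        int c;
        while ((c = fgetc(f)) != EOF) raw.push_back((unsigned char)c);
        fclose(f);
        if (raw.empty()) { printf("empty file\n"); return 1; }
    
        uLongf outLen = compressBound(raw.size());
        std::vector<unsigned char> out(outLen);
        compress2(out.data(), &outLen, raw.data(), raw.size(), 9); // level 9, like gzip -9
    
        // Compressed size / raw size, times 8 bits per byte, estimates entropy density.
        double bitsPerByte = 8.0 * outLen / raw.size();
        printf("%s: %lu -> %lu bytes, about %.2f bits of entropy per byte\n",
               argv[1], (unsigned long)raw.size(), (unsigned long)outLen, bitsPerByte);
        return 0;
    }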

Entropy for crypto

For an attacker, entropy is the enemy.  Low entropy anywhere, especially from repeating keys or text, produces measurable statistical correlations, which are the basis of much of cryptanalysis.  By contrast, more entropy means a bigger and less predictable key search space, with fewer and harder-to-detect redundancies and correlations.  The central issue is this:

An n-bit key can represent 2^n possible values.

The entropy in a cipher's key (its "key strength", in bits) is thus a measure of how difficult it is to break the cipher via brute force.
Cipher  | Key entropy              | Cryptanalysis?
Rot-13  | 0 bits (no key)          | Trivial.
Rot-K   | < 5 bits (26 possible K) | Easy--brute force, known plaintext, letter frequency statistics, ...
DES     | <= 56 bits               | Medium--brute force is currently feasible.
AES-128 | <= 128 bits              | Very hard.
OTP     | 1 bit per message bit    | Not possible (all plaintexts are equally likely).
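
To see what a tiny key search space means in practice, here's a little C++ sketch (the message and key are made up) that brute-forces Rot-K with a known-plaintext crib:

    #include <cstdio>
    #include <string>
    
    // Rot-K over uppercase letters, as in the table later in these notes;
    // non-letters are passed through unchanged.
    std::string rotK(const std::string &in, int K) {
        std::string out = in;
        for (size_t i = 0; i < in.size(); i++)
            if (in[i] >= 'A' && in[i] <= 'Z')
                out[i] = 'A' + ((in[i] - 'A' + K) % 26);
        return out;
    }
    
    int main() {
        // Made-up intercepted ciphertext, built with a key the attacker doesn't know.
        std::string cipher = rotK("ATTACK AT DAWN", 17);
        // Under 5 bits of key entropy: just try every key and look for a likely crib.
        for (int K = 0; K < 26; K++) {
            std::string guess = rotK(cipher, 26 - K); // decrypting is encrypting with 26-K
            if (guess.find("ATTACK") != std::string::npos)
                printf("key %d: %s\n", K, guess.c_str());
        }
        return 0;
    }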

The operation of the cipher typically sets an upper limit on the number of bits of key entropy that can be used, although this number is somewhat flexible in both directions.  You can increase the effective key entropy somewhat by running the cipher repeatedly with different keys; for example, two passes of AES with two different keys nominally doubles the key length (although meet-in-the-middle attacks make the real gain much smaller than that).  If the attacker doesn't know which cipher you used, that uncertainty gains you a few more bits, although there aren't very many likely ciphers, so you don't gain much this way.

Sadly, it's very easy to decrease the key entropy.  For example, a 10-byte password has only 10*8=80 bits of entropy, so even if it's 100% binary garbage you're still well below AES-128's potential.  If you restrict yourself to printable ASCII, you're down to only about 10*6=60 bits of entropy, which is approaching the range where brute force is feasible.  Letter frequency and coincidence analysis (more 'a' than 'z'; more 'qu' than 'qk') can cut this to only a few bits of entropy per byte, putting an attacker well into brute force territory.  If you use a random word from the dictionary, you're at only 17 bits of entropy (assuming 2^17 = 128K words in the dictionary).  If you use the word "password", you're at nearly 0 bits of entropy.
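
The arithmetic above is easy to play with.  Here's a tiny C++ sketch; the 64-choices-per-character figure is just the ~6 bits per character approximation used above for printable ASCII:

    #include <cmath>
    #include <cstdio>
    
    // Rough password entropy: length * log2(number of equally likely choices per symbol).
    // Real passwords are worse, because human choices are far from equally likely.
    double password_bits(int length, double choicesPerSymbol) {
        return length * std::log2(choicesPerSymbol);
    }
    
    int main() {
        printf("10 random bytes:          %.0f bits\n", password_bits(10, 256));    // 10*8 = 80
        printf("10 printable ASCII chars: %.0f bits\n", password_bits(10, 64));     // ~6 bits/char -> 60
        printf("1 random dictionary word: %.0f bits\n", password_bits(1, 131072));  // 2^17 words -> 17
        return 0;
    }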

Generally, humans are poor entropy sources, and poor storage devices, yet passwords rely on both of these.  Longer passphrases have been recommended, although I've found many sites have limits as low as 16 characters.

Known Plaintext/Ciphertext Attacks

Whatever entropy is in the key, poor ciphers tend to leak out this entropy into the ciphertext.  This is especially hard to prevent wherever the plaintext has low entropy, such as a leading "ACHTUNG!", or a trailing "END TRANSMISSION".  A well-designed cipher keeps the key secret even against a known-plaintext, known-ciphertext attack, typically by scrambling it with a well-designed substitution box (coming soon!).

For a trivial cipher, if we know the plaintext and ciphertext, we can reconstruct the key:
Name      | Encryption                              | Decryption                              | To crack?
Rot-13    | out[i]='A'+((in[i]-'A'+13)%26);         | (same)                                  | Run decryption.
Rot-K     | out[i]='A'+((in[i]-'A'+K)%26);          | (same, with K = 26-K;)                  | K=(out[i]-in[i]+26)%26; // for any i
XOR-fixed | out[i]=in[i]^K;                         | (same)                                  | K=out[i]^in[i]; // for any i
XOR-cycle | out[i]=in[i]^K[i%K.size()];             | (same)                                  | K[i]=out[i]^in[i]; // for each i
Bits-1    | out[i]=ror(in[i],K);                    | in[i]=rol(out[i],K);                    | Try the 8 possible K's.
Bits-2    | x=ror(in[i],K); out[i]=x^L;             | x=out[i]^L; in[i]=rol(x,K);             | Try the 8 possible K's; then L=ror(in[i],K)^out[i];
Bits-3    | x=ror(in[i],K); y=x^L; out[i]=ror(y,M); | y=rol(out[i],M); x=y^L; in[i]=rol(x,K); | Try the 8*8 possible M's and K's; then L=ror(in[i],K)^rol(out[i],M);
"ror" is a circular bit rotate to the right: uint ror(uint v,uint bits) { return (v>>bits)|(v<<(32-bits)); }
"rol" rotates the other direction.
"^" is the bitwise XOR operation.

Bit rotation does a good job of mixing up values, resulting in binary ciphertext that *looks* random (and hence high-entropy).  But even a complex series of bit rotations and arithmetic can normally be unwound from either end, allowing an analyst to solve for the operations in the middle, especially if they can (1) reverse-engineer, guess, or brute-force the algorithm; and (2) guess at least a portion of the plaintext.