Measuring Entropy: Information Theory

CS 463/480 Lecture, Dr. Lawlor

"Entropy" has a fairly technical definition in crypto work: it's a measure of the unexpectedness of a message.  The concept and equations come from Shannon information theory 

Here's the conceptual model:
Sentence start
Low entropy ending
Higher entropy ending
Hey, what's ...
going on?
on fire?!
Want to ...
get lunch?
TiVo Le Voyage dans la Lune?
How are ...
you doing today? my face spiders?

(I find entropy funny and refreshing, at least for a while.  Random sentence generators like SCIgen are really amusing for the first few tries, though it does get old eventually.)

Mathematically, entropy can be quantified as:
    entropy = sum over messages of (probability of message  * - log(probability of message) );

The base used for the logarithm is a matter of taste.  Computer folks often use a base-2 log, which gives you an entropy in bits.

Probability runs from 0 to 1, while -log runs from +infinity to 0 over that range.  This means probability-1 events or probability-0 events both have no entropy--there's no uncertainty if it's guaranteed to happen, or can't happen.
bits of entropy
It always happens, so there's no uncertainty.
Happens half the time.  In binary, two messages 0 and 1 with equal probability have total entropy of 1 bit.
-log is bigger, but probability shrinks faster.
as above, but worse.
Never happens, so doesn't contribute any entropy.

Probability is a fairly slippery concept (what's the probability for a byte of 0x61?), but there are several ways to measure entropy directly.  For example, if we run a data compression algorithm like gzip or bzip2 on a dataset, it will remove some of the redundancies from a low-entropy dataset, but won't be able to shrink a high-entropy dataset.  Here are the data sizes, in bytes, that I measured for ten types of data:

Raw Gzip -9 Bzip2 7zip Comments
Binary zeros 1048576 1057 45 327 All zeros has the lowest possible entropy: 0 bits.
Wikipedia XML 1048576 359184 287769 294871 Good compression rate, typical for text.  Inferred entropy is about 2.5 bits of entropy per byte of the file (30%).
H.P. Lovecraft Text 1048576 402926 309576 328106 Similar.
gdb ELF64 1048576 406290 386042 332042 Nominally a binary file, but has ASCII strings, and x64 machine code and tables with lots of zeros.
Random ASCII hex
1048576 607628
Nominally ASCII, but very unpredictable.  Expected value is 4 bits of entropy per byte of the file (50%).
Base64 random data
1048576 796856
Expected value is 6 bits of entropy per byte of the file (75%).
Tarball 1048576 1039974 1050758 1046726 Already gzip'd, so most entropy already gone.
Zipfile 1048576 1041544 1050592 1047977 Similar.
JPEG 1048576 1047418 1044382 1055571 After the DCT phase, JPEG uses a huffman encoding already.
Random binary data
1048576 1048761 1053340 1063068 Purely random binary data is almost perfectly incompressible--very nearly 1 bit of entropy per bit of the file (100% entropy).

Generally, any kind of human-readable text has quite low entropy, which means there are redundancies that can be found even by a general-purpose compression program.  

Entropy for crypto

For an attacker, entropy is the enemy.  Low entropy anywhere, especially via repeating keys, ciphertext, or plaintext, will produce measurable statistical correlations, which are the basis of much of cryptanalysis.  By contrast, more entropy means a bigger and less predictable key search space, with fewer and more difficult to detect redundancies and correlations.  The central issue is this:

An n-bit key can represent 2n possible values.

The entropy in a cipher's key (its "key strength, in bits") is thus a measure of how difficult it is to break the cipher via brute force.
Key Entropy
0 bits (no key)
< 5 bits (k)
Easy--brute force, known plaintext, letter frequency statistics, ...
<= 56 bits
Medium--brute force is currently feasible
<= 128 bits
Very Hard
1 bit per message bit
Not possible (all plaintexts and keys are equally likely)

The operation of the cipher typically sets an upper limit on the number of bits of key entropy that can be used, although this number is somewhat flexible in both directions.  You can increase the effective key entropy somewhat by running the cipher repeatedly with different keys; for example, two passes of AES with two different keys approximately doubles the key length.  If the attacker doesn't know which cipher you used, this gains you a few more bits (although there aren't very many likely possibilities, so you don't gain many bits this way).

Sadly, it's very easy to decrease the key entropy.  For example, a 10-byte password has only 10*8=80 bits of entropy, so even if it's 100% binary garbage you're still well below AES-128's potential.  If you restrict yourself to printable ASCII, you're down to only 10*6=60 bits of entropy, which is approaching feasible to brute force.  Letter frequency and coincidence analysis (more 'a' than 'z'; more 'qu' than 'qk') can cut this to only a few bits of entropy per byte, putting an attacker well into brute force territory.  If you use a random word from the dictionary, you're at only 17 bits of entropy (assuming 217=128K words in the dictionary).  If you use the word "password" (and people do!), you're at nearly 0 bits of entropy.  

Generally, humans are poor entropy sources, and poor storage devices, yet passwords rely on both of these.  Once you've memorized a password, people use the same password at multiple sites, inviting disaster.  Longer passphrases have been recommended, although I've found many sites have limits as low as 16 characters.