Measuring Entropy: Information Theory

CS 463/480 Lecture, Dr. Lawlor

"Entropy" has a fairly technical definition in crypto work: it's a measure of the unexpectedness of a message. The concept and equations come from Shannon information theory.

Here's the conceptual model:

Sentence start	Low entropy ending	Higher entropy ending
Hey, what's ...	going on?	on fire?!
Want to ...	get lunch?	TiVo Le Voyage dans la Lune?
How are ...	you doing today?	my face spiders?

(I find entropy funny and refreshing, at least for a while. Random sentence generators like SCIgen are really amusing for the first few tries, though it does get old eventually.)

Mathematically, entropy can be quantified as:
entropy = sum over messages of (probability of message * - log(probability of message) );

The base used for the logarithm is a matter of taste. Computer folks often use a base-2 log, which gives you an entropy in bits.

Probability runs from 0 to 1, while -log runs from +infinity to 0 over that range. This means probability-1 events or probability-0 events both have no entropy--there's no uncertainty if it's guaranteed to happen, or can't happen.

Probability	-log₂	bits of entropy	Why?
1	0	0	It always happens, so there's no uncertainty.
0.5	1	0.5	Happens half the time. In binary, two messages 0 and 1 with equal probability have total entropy of 1 bit.
0.1	3.3	0.33	-log is bigger, but probability shrinks faster.
0.01	6.6	0.066	as above, but worse.
0	inf	0	Never happens, so doesn't contribute any entropy.

Probability is a fairly slippery concept (what's the probability for a byte of 0x61?), but there are several ways to measure entropy directly. For example, if we run a data compression algorithm like gzip or bzip2 on a dataset, it will remove some of the redundancies from a low-entropy dataset, but won't be able to shrink a high-entropy dataset. Here are the data sizes, in bytes, that I measured for ten types of data:

	Raw	Gzip -9	Bzip2	7zip	Comments
Binary zeros	1048576	1057	45	327	All zeros has the lowest possible entropy: 0 bits.
Wikipedia XML	1048576	359184	287769	294871	Good compression rate, typical for text. Inferred entropy is about 2.5 bits of entropy per byte of the file (30%).
H.P. Lovecraft Text	1048576	402926	309576	328106	Similar.
gdb ELF64	1048576	406290	386042	332042	Nominally a binary file, but has ASCII strings, and x64 machine code and tables with lots of zeros.
Random ASCII hex	1048576	607628	544118	553549	Nominally ASCII, but very unpredictable. Expected value is 4 bits of entropy per byte of the file (50%).
Base64 random data	1048576	796856	795415	806685	Expected value is 6 bits of entropy per byte of the file (75%).
Tarball	1048576	1039974	1050758	1046726	Already gzip'd, so most entropy already gone.
Zipfile	1048576	1041544	1050592	1047977	Similar.
JPEG	1048576	1047418	1044382	1055571	After the DCT phase, JPEG uses a huffman encoding already.
Random binary data	1048576	1048761	1053340	1063068	Purely random binary data is almost perfectly incompressible--very nearly 1 bit of entropy per bit of the file (100% entropy).

Generally, any kind of human-readable text has quite low entropy, which means there are redundancies that can be found even by a general-purpose compression program.

Entropy for crypto

For an attacker, entropy is the enemy. Low entropy anywhere, especially via repeating keys, ciphertext, or plaintext, will produce measurable statistical correlations, which are the basis of much of cryptanalysis. By contrast, more entropy means a bigger and less predictable key search space, with fewer and more difficult to detect redundancies and correlations. The central issue is this:

An n-bit key can represent 2ⁿ possible values.

The entropy in a cipher's key (its "key strength, in bits") is thus a measure of how difficult it is to break the cipher via brute force.

Cipher	Key Entropy	Cryptanalysis?
Rot-13	0 bits (no key)	Trivial
Rot-K	< 5 bits (k)	Easy--brute force, known plaintext, letter frequency statistics, ...
DES	<= 56 bits	Medium--brute force is currently feasible
AES-128	<= 128 bits	Very Hard
OTP	1 bit per message bit	Not possible (all plaintexts and keys are equally likely)

The operation of the cipher typically sets an upper limit on the number of bits of key entropy that can be used, although this number is somewhat flexible in both directions. You can increase the effective key entropy somewhat by running the cipher repeatedly with different keys; for example, two passes of AES with two different keys approximately doubles the key length. If the attacker doesn't know which cipher you used, this gains you a few more bits (although there aren't very many likely possibilities, so you don't gain many bits this way).

Sadly, it's very easy to decrease the key entropy. For example, a 10-byte password has only 10*8=80 bits of entropy, so even if it's 100% binary garbage you're still well below AES-128's potential. If you restrict yourself to printable ASCII, you're down to only 10*6=60 bits of entropy, which is approaching feasible to brute force. Letter frequency and coincidence analysis (more 'a' than 'z'; more 'qu' than 'qk') can cut this to only a few bits of entropy per byte, putting an attacker well into brute force territory. If you use a random word from the dictionary, you're at only 17 bits of entropy (assuming 2¹⁷=128K words in the dictionary). If you use the word "password" (and people do!), you're at nearly 0 bits of entropy.

Generally, humans are poor entropy sources, and poor storage devices, yet passwords rely on both of these. Once you've memorized a password, people use the same password at multiple sites, inviting disaster. Longer passphrases have been recommended, although I've found many sites have limits as low as 16 characters.