Measuring Entropy: Information Theory
463/480 Lecture, Dr.
"Entropy" has a fairly technical
definition in crypto work: it's a measure of the unexpectedness of
a message. The concept and equations come from Shannon information
Here's the conceptual model:
    Sentence start     Low entropy ending     Higher entropy ending
    Hey, what's ...
    Want to ...                               ... Voyage dans la Lune?
    How are ...        you doing today?       my face spiders?
(I find entropy funny and refreshing, at
least for a while. Random sentence generators like SCIgen are really amusing for the first few
tries, though it does get old eventually.)
Entropy can be quantified as:

    entropy = sum over messages of (probability of message * -log(probability of message))
The base used for the logarithm is a
matter of taste. Computer folks often use a base-2 log,
which gives you an entropy in bits.
Probability runs from 0 to 1, while -log
runs from +infinity to 0 over that range. This means
probability-1 events and probability-0 events both have no
entropy--there's no uncertainty if it's guaranteed to happen, or
guaranteed not to happen.
    Probability    Contribution to entropy
    1              It always happens, so there's no uncertainty: 0 bits.
    0.5            Happens half the time. In binary, two messages 0 and 1
                   with equal probability have total entropy of 1 bit.
    small          -log is bigger, but probability shrinks faster.
    smaller        As above, but worse.
    0              Never happens, so doesn't contribute any entropy.
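The formula is easy to compute directly. Here's a minimal sketch in Python (the function name is mine):

```python
import math

def shannon_entropy(probabilities, base=2):
    """Entropy of a discrete distribution, in bits when base=2.

    Zero-probability outcomes are skipped, since the limit of
    -p*log(p) as p approaches 0 is 0: they contribute nothing.
    """
    return sum(-p * math.log(p, base) for p in probabilities if p > 0)

# A fair coin: two messages, each with probability 0.5 -> 1 bit total.
print(shannon_entropy([0.5, 0.5]))   # 1.0

# A certain event contributes nothing.
print(shannon_entropy([1.0]))        # 0.0

# A biased coin has less entropy than a fair one.
print(shannon_entropy([0.9, 0.1]))   # roughly 0.469
```

Note how the biased coin lands between the two extremes: less surprising than a fair coin, but not certain.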
Probability is a fairly slippery concept
(what's the probability for a byte of 0x61?), but there are
several ways to measure entropy directly. For example, if we
run a data compression algorithm like gzip or bzip2 on a dataset,
it will remove some of the redundancies from a low-entropy
dataset, but won't be able to shrink a high-entropy dataset.
Here are the compression results I measured for several types of data:

    Data type            Compression result
    All zeros            Has the lowest possible entropy; compresses to
                         almost nothing.
    English text         Good compression rate, typical for text.  Inferred
                         entropy is about 2.5 bits per byte of the file (30%).
    Executable           Nominally a binary file, but has ASCII strings, and
                         x64 machine code and tables with lots of zeros.
    Random ASCII hex     Nominally ASCII, but very unpredictable.  Expected
                         value is 4 bits of entropy per byte of the file (50%).
    Base64 random data   Expected value is 6 bits of entropy per byte of the
                         file (75%).
    gzip'd data          Already gzip'd, so most entropy has already been
                         squeezed out; recompression gains little.
    JPEG image           After the DCT phase, JPEG uses a Huffman coding
                         stage, so the file is already well compressed.
    Random binary        Purely random binary data is almost perfectly
                         incompressible--very nearly 1 bit of entropy per
                         bit of the file (100% entropy).
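You can see the same effect by compressing a few synthetic datasets with zlib (the library behind gzip) and comparing the ratios. The samples and sizes here are illustrative, not the measurements above:

```python
import os
import random
import zlib

def ratio(data: bytes) -> float:
    """zlib-compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

n = 100_000
samples = {
    "all zeros":        bytes(n),                 # lowest possible entropy
    "random ASCII hex": bytes(random.choice(b"0123456789abcdef")
                              for _ in range(n)), # ~4 bits of entropy per byte
    "random binary":    os.urandom(n),            # ~8 bits of entropy per byte
}
for name, data in samples.items():
    # Low-entropy data compresses to a tiny fraction of its size; random
    # binary data actually grows slightly, since zlib adds header overhead.
    print(f"{name:16s} compresses to {ratio(data):6.1%} of original size")
```

The random hex sample lands near 50%, close to its 4-bits-per-byte ideal, while random binary stays at essentially 100%.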
Generally, any kind of human-readable
text has quite low entropy, which means there are redundancies
that can be found even by a general-purpose compression
algorithm.
For an attacker, entropy is the
enemy. Low entropy anywhere, especially via repeating keys,
ciphertext, or plaintext, will produce measurable statistical
correlations, which are the basis of much of cryptanalysis.
By contrast, more entropy means a bigger and less predictable key
search space, with fewer and more difficult to detect redundancies
and correlations. The central issue is this:
An n-bit key can
represent 2^n possible keys.
The entropy in a cipher's key (its "key
strength, in bits") is thus a measure of how difficult it is to
break the cipher via brute force.
    Key entropy              Attack difficulty
    0 bits (no key)          Trivial.
    < 5 bits                 Easy--brute force, known plaintext, letter
                             frequency statistics, ...
    <= 56 bits               Medium--brute force is currently feasible.
    <= 128 bits              Hard--brute force is not currently feasible.
    1 bit per message bit    Not possible (all plaintexts and keys are
    (one-time pad)           equally likely).
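Some back-of-the-envelope arithmetic makes these ranges concrete. The attack rate here is purely hypothetical--call it 2^40 keys per second, roughly a large dedicated cracking cluster:

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_brute_force(bits: int, keys_per_second: float = 2**40) -> float:
    """Average years to brute-force an n-bit key at a hypothetical rate.

    On average the attacker finds the key after searching half of the
    2**bits keyspace.
    """
    return (2**bits / 2) / keys_per_second / SECONDS_PER_YEAR

for bits in (40, 56, 80, 128):
    print(f"{bits:3d}-bit key: {years_to_brute_force(bits):.3g} years on average")
```

At this rate a 56-bit key falls in hours, while a 128-bit key takes on the order of 10^18 years--longer than the age of the universe by many orders of magnitude.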
The operation of the cipher typically
sets an upper limit on the number of bits of key entropy that can
be used, although this number is somewhat flexible in both
directions. You can increase the effective key entropy
somewhat by running the cipher repeatedly with different keys,
though by less than you might hope: meet-in-the-middle attacks mean
that two passes of AES with two different keys add only a few bits
of effective strength, not double the key length. If the attacker
doesn't know which cipher you used, this gains you a few more bits
(although there aren't very many likely possibilities, so you don't
gain many bits this way).
Sadly, it's very easy to decrease the
key entropy. For example, a 10-byte password has only
10*8=80 bits of entropy, so even if it's 100% binary garbage
you're still well below AES-128's potential. If you restrict
yourself to printable ASCII (95 characters, about 6.6 bits each),
you're down to roughly 66 bits of entropy, which is approaching
feasible to brute force.
Letter frequency and coincidence analysis (more 'a' than 'z'; more
'qu' than 'qk') can cut this to only a few bits of entropy per
byte, putting an attacker well into brute force territory.
If you use a random word from the dictionary, you're at only 17
bits of entropy (assuming 2^17 = 128K words in the dictionary). If
you use the word "password" (and people
do!), you're at nearly 0 bits of entropy.
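All of this arithmetic is just length times bits-per-character; a small helper (the name is mine) makes it explicit:

```python
import math

def password_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a password whose characters are chosen
    uniformly at random from an alphabet of the given size."""
    return length * math.log2(alphabet_size)

print(password_bits(10, 256))    # 10 raw binary bytes: 80.0 bits
print(password_bits(10, 95))     # 10 printable ASCII chars: ~65.7 bits
print(password_bits(1, 2**17))   # 1 random dictionary word: 17.0 bits
```

Note the crucial assumption: the choice must be uniformly random. A human-chosen "random" word or phrase has far less entropy than these numbers suggest.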
Generally, humans are poor entropy
sources and poor storage devices, yet passwords rely on both.
Once they've memorized a password, people reuse the same password
at multiple sites, inviting disaster. Longer passphrases
have been recommended, although I've found
many sites have limits as low as 16 characters.