Protecting Passwords with Hashing
CS
463/480 Lecture, Dr.
Lawlor
Given a password p, it's extremely dangerous to transmit or store
the password in plaintext, since the data can be stolen from the
network or storage device. Still, "passwords in plaintext" is
a tragically common situation, such as HTTP
Basic authentication, or most quick and dirty PHP forum sites,
which often also suffer from SQL injection vulnerabilities, allowing
the entire user table with plaintext passwords to be dumped.
It's much safer to store some hash related to the password, such as
H(p), because the hash's preimage resistance directly protects the
password. The big downside, however, is hashing is completely
deterministic, so attackers can precompute big tables of the hashes
of common passwords, and just look any newly discovered password
hash up in their table.
Salted Hashes
Safer yet is to store the hash of not just the password, but also
some unique string called the "salt". HTTP
Digest authentication recommends this approach:
H1=H(username:realm:password); // store
this hash in your database
H2=H(method:digestURI); // you can precompute
this per accessed page
response=H(H1:nonce:H2); // "nonce" is a random
number sent with the request
The trick here is H1 can be more safely stored in the database than
H(password), because the same password will result in different
hashes for different users, and the same username:password will
still result in different hashes at different sites ("realm" is
basically the DNS name).
To reduce CPU load on busy webservers, typically the hash function
used is a very fast simple hash like MD5. The engineered hash
collision vulnerabilities in MD5 aren't of much attack value here
(they'd allow two really long weird passwords to be used
interchangeably), and MD5 is still basically OK for preimage
resistance. One downside is MD5 can be computed very quickly,
typically many millions of hashes per second per core; or billions
of hashes per second on a GPU (oclHashcat is very fast).
This means even fairly
long complex passwords can often be cracked if they're based
on a dictionary word, or derivable from a common phrase, although
long (12+ digit) well randomized passwords still usually survive.
Key Stretching
The basic idea with key stretching is to slow down the hash
computation, to make brute forcing more expensive. The
downside is this automatically makes normal use of the hash more
expensive, and by the same factor, but even a busy webserver is only
going to get a few hundred logins per second. This means it
might be OK to burn a millisecond of CPU time on the server for each
login, to force an attacker to burn a millisecond for each guessed
password.
Password-Based Key
Derivation Function 2 (PBKDF2) is a technique based on
iterated hashing:
template <class hash>
hash::digest PBKDF2(password, salt, iterations) {
hash::digest ret=0;
for
(int c=0;c<iterations;c++) {
salt = hash(password:salt); // update salt
ret = ret^salt; // fold into return value
}
return
ret;
}
Note the digest at a given iteration is fed back into the subsequent
iterations as a salt, which prevents parallel computation of the
iterations. You can save a little time in this definition by
precomputing the hash state including the password, although in
practice this only helps for really long passphrases.
A typical application, like WPA or WPA2, uses 4096 iterations of
SHA-1. This is enough to push you from several billion hashes
per second with raw SHA-1, down to only a few hundred thousand
hashes per second with PBKDF2.
On Linux systems, "SHA-crypt"
stores shadow file passwords hashed with $5$ SHA-256 or $6$ SHA-512,
repeated a default of 5000 times.
The key derivation
function bcrypt has a similar intent to PBKDF2, except the
stretching happens during the key setup phase of a Blowfish
encryption.
Memory-Bound Hashing
Because today's primary attack vector is massively parallel
hardware, there is some interest in memory-bound hash
algorithms. The algorithm Scrypt uses PBKDF2
to generate a large pseudorandom bit string, which is then modified
in a data-dependent (serialized) fashion. The idea is to bound
parallelism by the required memory usage. Litecoin uses Scrypt
as a proof-of-work for mining, but they only use N=1024, so they
only require 64-128KB of RAM, which doesn't limit parallelism very
much!
Bonus: Picking a Good Password
Humans are really bad at generating or storing passwords. You
can check your password strength using an entropy estimator like zxcvbn (try my online version in an
incognito / private window). This should scare you!
On a UNIX system, this perl script generates 14 character (82 bit)
random alphanumeric passwords. Truly randomly generated
passwords are much harder to predict than variations on dictionary
words. At a hash checking rate of 100 billion hashes/second (a
cluster of 8 GPUs), enumerating the entire key space will still take
approximately 1 million years (58^14/100e9/3600/24/365 = 1.55e6).
One disadvantage is they're quite unmemorable!
#!/usr/bin/perl -w
#
# Generate a random alphanumeric password
# Dr. Orion Sky Lawlor, olawlor@acm.org, 2007-02-15 (Public Domain)
# Return a good random number between zero and (arg-1).
# Only works for arg up to 65000; larger values will overflow.
sub good_rand
{
my $max_val=shift;
open(RF,"/dev/urandom") or return rand();
my $val=256*ord(getc(RF))+ord(getc(RF));
# print $val." ".$max_val."\n";
return $val*$max_val/65536;
}
# This function generates random strings of a given length
# Original written by Guy Malachi http://guymal.com
# 18 August, 2002
sub generate_random_string
{
my $length_of_randomstring=shift;# the length of
# the random string to generate
# Skip l, I, 1, 0, and O (unreadable)
my @chars=('_','a'..'k','m'..'z','A'..'H','J'..'N','P'..'Z','2'..'9');
my $random_string;
foreach (1..$length_of_randomstring)
{
$random_string.=$chars[good_rand(58)];
}
return $random_string;
}
#Generate the random string
my $len=shift;
if ( ! $len ) { $len=14; }
my $random_string=&generate_random_string($len);
print "$random_string\n";