Protecting Passwords with Hashing

CS 463/480 Lecture, Dr. Lawlor

Given a password p, it's extremely dangerous to transmit or store the password in plaintext, since the data can be stolen from the network or storage device.  Still, "passwords in plaintext" is a tragically common situation, such as HTTP Basic authentication, or most quick and dirty PHP forum sites, which often also suffer from SQL injection vulnerabilities, allowing the entire user table with plaintext passwords to be dumped.

It's much safer to store some hash related to the password, such as H(p), because the hash's preimage resistance directly protects the password.  The big downside, however, is hashing is completely deterministic, so attackers can precompute big tables of the hashes of common passwords, and just look any newly discovered password hash up in their table. 

Salted Hashes

Safer yet is to store the hash of not just the password, but also some unique string called the "salt".  HTTP Digest authentication recommends this approach:

    H1=H(username:realm:password);  // store this hash in your database
    H2=H(method:digestURI); // you can precompute this per accessed page
    response=H(H1:nonce:H2); // "nonce" is a random number sent with the request

The trick here is H1 can be more safely stored in the database than H(password), because the same password will result in different hashes for different users, and the same username:password will still result in different hashes at different sites ("realm" is basically the DNS name).

To reduce CPU load on busy webservers, typically the hash function used is a very fast simple hash like MD5.  The engineered hash collision vulnerabilities in MD5 aren't of much attack value here (they'd allow two really long weird passwords to be used interchangeably), and MD5 is still basically OK for preimage resistance.  One downside is MD5 can be computed very quickly, typically many millions of hashes per second per core; or billions of hashes per second on a GPU (oclHashcat is very fast).  This means even fairly long complex passwords can often be cracked if they're based on a dictionary word, or derivable from a common phrase, although long (12+ digit) well randomized passwords still usually survive.

Key Stretching

The basic idea with key stretching is to slow down the hash computation, to make brute forcing more expensive.  The downside is this automatically makes normal use of the hash more expensive, and by the same factor, but even a busy webserver is only going to get a few hundred logins per second.  This means it might be OK to burn a millisecond of CPU time on the server for each login, to force an attacker to burn a millisecond for each guessed password.

Password-Based Key Derivation Function 2 (PBKDF2) is a technique based on iterated hashing:
    template <class hash>
    hash::digest PBKDF2(password, salt, iterations) {
           hash::digest ret=0;
           for (int c=0;c<iterations;c++) {
                  salt = hash(password:salt); // update salt
                  ret = ret^salt;  // fold into return value
           return ret;

Note the digest at a given iteration is fed back into the subsequent iterations as a salt, which prevents parallel computation of the iterations.  You can save a little time in this definition by precomputing the hash state including the password, although in practice this only helps for really long passphrases.

A typical application, like WPA or WPA2, uses 4096 iterations of SHA-1.  This is enough to push you from several billion hashes per second with raw SHA-1, down to only a few hundred thousand hashes per second with PBKDF2.

On Linux systems, "SHA-crypt" stores shadow file passwords hashed with $5$ SHA-256 or $6$ SHA-512, repeated a default of 5000 times. 

The key derivation function bcrypt has a similar intent to PBKDF2, except the stretching happens during the key setup phase of a Blowfish encryption.

Memory-Bound Hashing

Because today's primary attack vector is massively parallel hardware, there is some interest in memory-bound hash algorithms.  The algorithm Scrypt uses PBKDF2 to generate a large pseudorandom bit string, which is then modified in a data-dependent (serialized) fashion.  The idea is to bound parallelism by the required memory usage.  Litecoin uses Scrypt as a proof-of-work for mining, but they only use N=1024, so they only require 64-128KB of RAM, which doesn't limit parallelism very much!

Bonus: Picking a Good Password

Humans are really bad at generating or storing passwords.  You can check your password strength using an entropy estimator like zxcvbn (try my online version in an incognito / private window).  This should scare you!

On a UNIX system, this perl script generates 14 character (82 bit) random alphanumeric passwords.  Truly randomly generated passwords are much harder to predict than variations on dictionary words.  At a hash checking rate of 100 billion hashes/second (a cluster of 8 GPUs), enumerating the entire key space will still take approximately 1 million years (58^14/100e9/3600/24/365 = 1.55e6).

One disadvantage is they're quite unmemorable!
#!/usr/bin/perl -w
# Generate a random alphanumeric password
# Dr. Orion Sky Lawlor,, 2007-02-15 (Public Domain)

# Return a good random number between zero and (arg-1).
# Only works for arg up to 65000; larger values will overflow.
sub good_rand
my $max_val=shift;
open(RF,"/dev/urandom") or return rand();
my $val=256*ord(getc(RF))+ord(getc(RF));
# print $val." ".$max_val."\n";
return $val*$max_val/65536;

# This function generates random strings of a given length
# Original written by Guy Malachi
# 18 August, 2002
sub generate_random_string
my $length_of_randomstring=shift;# the length of
# the random string to generate

# Skip l, I, 1, 0, and O (unreadable)
my @chars=('_','a'..'k','m'..'z','A'..'H','J'..'N','P'..'Z','2'..'9');
my $random_string;
foreach (1..$length_of_randomstring)
return $random_string;

#Generate the random string
my $len=shift;
if ( ! $len ) { $len=14; }

my $random_string=&generate_random_string($len);
print "$random_string\n";