Randomness and Pseudorandom Generators

CS 463/480 Lecture, Dr. Lawlor

If you need random numbers for crypto, you need *real* random numbers, and there's only one safe place to get them: the OS, which watches the hardware interrupts and has access to nanosecond-resolution timers to give you unpredictable values.
Windows
UNIX
#pragma comment(lib, "advapi32.lib")
BYTE buf[32]; // however much you need
H CRYPTPROV h;
if (!CryptAcquireContext(&h,0,0,
         PROV_RSA_FULL, CRYPT_VERIFYCONTEXT))
    die("CryptAquire failed");
if (!CryptGenRandom(h,sizeof(buf),buf))
    die("CryptGen failed");
Read bytes from the file
    /dev/urandom

(Don't bother with /dev/random)

(If you don't trust your OS, your program is doomed.)

Pseudorandom Numbers

For scientific programming, where repeatability is critical, we often want a series of "pseudorandom" numbers.  These normally don't need to be as unpredictable as cryptographic random numbers, in fact for debugging or controlled experiments we want to be able to spit out the same numbers again, and we often need a lot of them to be generated quickly.

Name
Usage
Timing
Comments
Mersenne Twister
std::mt19937 generator;

          next=generator();
12ns
Initialization is slow, and the algorithm is complicated (but fast).
C++11 default_random_engine
std::default_random_engine generator;

          next=generator();
12ns
This is just the 1988 version of Park-Miller, minstd_rand0.  It cycles after two billion steps.
Park-Miller
  next = (last * 48271) % 2147483647UL;

or std::minstd_rand.
5ns
Gets stuck if given an input of zero.
Surprisingly fast because the compiler transforms the mod to a multiply.
drand48
48-bit arithmetic:
  next = last*0x5DEECE66DuL + 13;
10ns
Low hex digits cycle through this sequence:
0d6bc92785e341af
randu
31-bit arithmetic:
   next = last * 65539;
5ns
Only useful with odd numbers.
Gets stuck if given an input of zero.
Low hex digits cycle through 1,3,9,B.
All values fall into 15 planes in 3D (fails spectral test).

There's an exhaustive table of random number generator performance at the Boost site.

State Recovery

One serious problem with most scientific random number generators is they have a small internal state, as small as 32 bits.  This means an attacker can use any output derived from their random number stream to brute-force the generator's state, and predict all future outputs of the generator.

Blum Blum Shub (BBS) is designed to be provably robust to this sort of attack.  Generation is simply as follows:
     next = (last * last) % (P*Q);
Your random number stream consists of the low few bits of "next".

P and Q are two large prime numbers, with thousands of bits each, meaning this requires multi-precision arithmetic, and generating random numbers is very slow.  But there is a proof reducing the problem of reconstructing this random number stream from its low order bits to factoring the product P*Q, which appears to be intractable on current machines (although not on quantum computers!).

PRN to Encryption

If you can reliably generate a stream of pseudorandom numbers, you can use them to encrypt data in a variety of ways (most of them bad!).  For example, since srand/rand gives a deterministic stream (if you stay on the same OS), you can just XOR each character like this:
srand(3); // the key
std::string plaintext=bar, ciphertext="";
for (unsigned int i=0;i<plaintext.size();i++) {
	int keyi=rand()&0xff; // 8 bits of pseudorandom data
	keyi&=0x1f; // drop to 5 bits to stay in printable ASCII
	char c=plaintext[i]; // extract char from input
	c=c^keyi; // encryption! (or decryption, same thing)
	ciphertext+=c; // add char to output
}
return ciphertext;

(Try this in NetRun now!)

Note we only XOR the low 5 bits of the character data, solely to avoid the hassle of reading and writing binary data.  For a real cipher, you would use all 8 bits, resulting in binary garbage as ciphertext, which can't be safely printed to the screen.

But the plaintext "To_be_or_not_to_be" still becomes the ciphertext " N~GjtCgb^~ep^uVjj", which actually isn't too bad at first glance.  However, note that decryption is the same operation as encryption, which is our first bug:
Also, note that changing one bit in the plaintext changes exactly one bit in the ciphertext, which allows differential cryptanalysis--given the same key, the bit differences between two messages are exactly the bit differences between the two ciphertexts, leaking information about the plaintext.  Also, changing one bit in the ciphertext changes the same bit in the plaintext.  This is really disaster, because this means the attacker can modify our messages in a very controlled way--for example, by flipping bits on ATM-to-bank messages to increase the apparent amount of an ATM deposit, or flipping the proceed/deny bit on a bank-to-ATM message to allow a huge withdrawal.
Finally, the key size is limited by the random number generator's state, which is only 32 bits here.  This allows explicit enumeration of all possible keys in only a few seconds.

Lawlor Cryptosystem 0 (LC0)

Back in 1996, as a skinny undergraduate in Nerland hall, I wrote an encryption scheme based on pseudorandom number generation.  The key setup phase went like this:
unsigned long randS ,mulBy ,addBy ;  // In the 1990's, these were 4 bytes each
unsigned long randS2,mulBy2,addBy2;

randS=(long)newDialog;randS2=(long)*myH; // OS memory handles as entropy source
randS2^=TickCount(); // 60ths of a second as entropy source
mulBy =184038653L;addBy =103701121L;//start with good, arbitrary numbers
mulBy2=194625377L;addBy2=203465495L;
short len=strlen(i);
for (short i=len;i<41;i++) // repeat passphrase to fill up buffer
pass[i]=pass[i%len]-i;
for (short i=0;i<8;i++)
{//pervert these numbers by the password in a deterministic way.
//Recall that the low four bits of the password are the most random ones.
//hence, we will xor by the low two bits each iteration (masking the passwd)
mulBy^=(long(pass[i<<2])<<(i<<1))+pass[i+3];
addBy^=(long(pass[(i<<2)+1])<<(i<<1))+pass[i+1];
mulBy2^=(long(pass[(i<<2)+2])<<(i<<1))+pass[i+2];
addBy2^=(long(pass[(i<<2)+3])<<(i<<1))+pass[i+0];
}
The integers at the top are generated from a combination of entropy sources like a high-resolution timer and an expanded version of the passphrase.  randS and randS2 get stored to the file.  Because there are six integers, a total of 192 bits, we're actually pretty good on total entropy--it'd be hard to enumerate all the keys.  Now we encrypt the data by combining two pseudorandom streams:
	dat[i]^=(randS =randS *mulBy +addBy )
^(randS2=randS2*mulBy2+addBy2);
We do have a potential differential cryptanalysis problem because we're just using XOR to combine the keys with the data, but the randS and randS2 entropy sources make this hard to pull off in practice.  There was also a checksum, designed to prevent data modification attacks, but the checksum is also a simple XOR of the data, so it'd be easy enough for an attacker to modify the checksum as well.

The big hole is that I don't do any checks or quality control on the multiplier "mulBy".  If it's zero, all the data gets XOR'd by the fixed value "addBy" (a very "weak key").  If it's even, repeatedly multiplying will shift all the entropy up and out of randS in under 32 steps.  Due to the way multiplication works, the high bits of randS have higher entropy, but I use the low bits (implicitly) to do the XOR.

Don't use this!