RSA Public Key Encryption

CS 463/480 Lecture, Dr. Lawlor

RSA encryption and decryption are actually shockingly simple--they're just modular exponentiation.
To encrypt a message, raise it to the power e modulo n.
To decrypt a message, raise it to the power d modulo n.  This will recover the original message.

To sign a message, raise it to the power d modulo n.
To check a signed message, raise it to the power e modulo n.   This will recover the original message.
# Encrypt/decrypt a test message
plaintext=1234567 # Secret message, mod n (DANGER! must pad a real message!)
ciphertext=pow(plaintext,e,n) # Encrypted (using public key e and public modulus n)
decrypted=pow(ciphertext,d,n) # Decrypted (using secret key d)

(Try this in NetRun now!)

The hard part about the implementation is creating a public modulus n and your secret key d.  One option is to use an existing program to generate them, such as OpenSSL:

	openssl genrsa -out private_key.pem 1024
	openssl rsa -pubout -in private_key.pem -out public_key.pem
	openssl rsa -text -in private_key.pem

One run of this on my machine produced this private key:

Private-Key: (1024 bit)
publicModulus:
00:e0:ae:82:8f:6a:92:0a:bb:66:95:34:04:e6:82:
03:9a:fd:93:1d:6c:ed:e7:50:7a:74:da:ba:70:8f:
f9:a4:b4:16:de:f9:9c:30:bf:15:d5:0e:6d:27:24:
9f:ea:69:4d:a2:21:22:6e:47:fe:cc:9f:8d:b0:84:
3f:f3:8e:fc:04:83:44:71:0d:ba:fd:7d:3f:f3:28:
05:49:1e:47:a9:a7:14:94:57:71:5f:47:4f:4a:54:
9f:b3:e4:48:6d:28:13:50:48:37:56:8b:33:d5:fa:
b1:f5:89:b9:a6:16:2f:47:c2:9b:fd:14:0d:d1:ba:
3a:41:4b:88:88:ed:a8:a1:ef
publicExponent: 65537 (0x10001)
privateExponent:
26:7d:5e:ba:68:dc:49:e0:5e:ab:72:b4:e0:34:27:
9f:f6:8e:ac:3c:cb:e8:93:7d:d6:e4:dd:89:88:f0:
90:49:95:9d:6f:0f:55:be:76:64:00:4b:ac:a7:f6:
89:36:ae:e8:f6:5a:2a:a0:44:c3:13:16:37:c6:00:
1a:9e:45:07:c2:af:c7:0b:66:a0:ef:60:01:c1:e1:
e8:d2:c7:f5:bb:f0:f9:82:3a:67:f8:08:46:1e:76:
63:29:94:c8:3b:d3:ce:0a:fb:90:84:ce:f8:b2:a5:
17:2c:73:3e:c4:fd:7f:b1:08:61:be:0b:6c:b3:81:
f8:50:fe:20:62:09:b0:31

You can test this key pair for encryption/decryption in Python like this:

# Convert a string with hex digits, colons, and whitespace to a long integer
def hex2int(hexString):
	return int("".join(hexString.replace(":","").split()),16)

# Get modulus and key from: openssl rsa -text -in private_key.pem
n=hex2int("""
    00:e0:ae:82:8f:6a:92:0a:bb:66:95:34:04:e6:82:
    03:9a:fd:93:1d:6c:ed:e7:50:7a:74:da:ba:70:8f:
    f9:a4:b4:16:de:f9:9c:30:bf:15:d5:0e:6d:27:24:
    9f:ea:69:4d:a2:21:22:6e:47:fe:cc:9f:8d:b0:84:
    3f:f3:8e:fc:04:83:44:71:0d:ba:fd:7d:3f:f3:28:
    05:49:1e:47:a9:a7:14:94:57:71:5f:47:4f:4a:54:
    9f:b3:e4:48:6d:28:13:50:48:37:56:8b:33:d5:fa:
    b1:f5:89:b9:a6:16:2f:47:c2:9b:fd:14:0d:d1:ba:
    3a:41:4b:88:88:ed:a8:a1:ef
""")
d=hex2int("""
    26:7d:5e:ba:68:dc:49:e0:5e:ab:72:b4:e0:34:27:
    9f:f6:8e:ac:3c:cb:e8:93:7d:d6:e4:dd:89:88:f0:
    90:49:95:9d:6f:0f:55:be:76:64:00:4b:ac:a7:f6:
    89:36:ae:e8:f6:5a:2a:a0:44:c3:13:16:37:c6:00:
    1a:9e:45:07:c2:af:c7:0b:66:a0:ef:60:01:c1:e1:
    e8:d2:c7:f5:bb:f0:f9:82:3a:67:f8:08:46:1e:76:
    63:29:94:c8:3b:d3:ce:0a:fb:90:84:ce:f8:b2:a5:
    17:2c:73:3e:c4:fd:7f:b1:08:61:be:0b:6c:b3:81:
    f8:50:fe:20:62:09:b0:31	
""")
e=65537  # encryption exponent

m=123456789 # message

enc=pow(m,e,n)  # m^e mod n   (encryption)
dec=pow(enc,d,n)  # m^e^d mod n   (decryption)
print("Encrypted: enc = ",enc);
print("Decrypted: dec = ",dec);

(Try this in NetRun now!)

Creating an RSA Key Pair

Here's the process for making your own modulus n and key d:

# Set up RSA keys (do this once, beforehand, because it's slow!)
nbits=1024  # Use at least 1024 bits for security
p=random_prime(nbits//2) # secret primes each have half the bits
q=random_prime(nbits//2)
print("Secret random primes p =",p,"   q =",q);

n=p*q   # Public modulus
e=65537 # Public encryption exponent
print("Public modulus  n = p*q = ",n);
print("Public encryption exponent e =",e);

totient=(p-1)*(q-1)  # == Euler's totient function of n
d=modular_inverse(e,totient) # Secret key
print("Secret key d=",d);

(Try this in NetRun now!)

Why does this work?

This works because the modulus used to pick d, (p-1)*(q-1), is actually the Euler totient function of n, where totient(k) counts the number of integers between 1 and k that are relatively prime to k.  For a prime P, totient(P)=P-1, because everything between 1 and P-1 is relatively prime to P (relatively prime means gcd(a,P)=1).  If n was prime, totient(n)=n-1, but we built n=p*q from the product of two primes, so our count of relative primes is missing all the multiples of p or q up to n; there are p-1 and q-1 of each, for a total of n-1-(p-1)-(q-1) relative primes, which is algebraically equal to (p*q-p-q+1)=(p-1)*(q-1)=totient(n).

Euler's theorem states that for any integer a coprime to n, the totient function gives the "order of the multiplicative group", meaning:
    atotient(n) = 1 (mod n)

Both sides are equal to 1.  We can raise 1 to any power and still get 1:
    ah*totient(n) = 1 (mod n)

We can now multiply both sides by a to get back to a again (this equation will come in handy in a moment!).
    a1+h*totient(n) = a (mod n)

This is true for any numbers a and n that are relatively prime.

In RSA, we start with a message m, raise it to the power e, then raise it to the power d to recover the original message:
    (me)d = m (mod n)

We picked d so it's the multiplicative inverse of e, modulo totient(n), or
    ed = 1 (mod totient(n))
Because of the way modular arithmetic works, this means there's some integer h with:
    ed = 1 + h*totient(n)

But now we can just substitute the message m into Euler's theorem to get:
    m1+h*totient(n) = med = m (mod n)

This means picking e and d correctly will allow us to encrypt and decrypt data correctly.  We can do this because we know totient(n)=(p-1)*(q-1).  It's hard for other people because factoring a big number into primes is hard.

Generating Big Prime Numbers

For any of this to work, we need a way to generate huge prime numbers randomly, in a way other people can't guess them.

Fermat's Little Theorem states that if P is prime, and 1<=a<P, then
   aP-1 = 1 mod P
(This is a corollary of Euler's theorem, since for a prime P, totient(P) = P-1)

The number a is thus a potential "witness" to the primality of P.  If P really is prime, any number will attest to its primality by giving 1.  If P is composite, a "witness" number will result in a number other than 1; while a "liar" number will still result in an answer of 1.  We don't have a good way of distinguishing witnesses and liars, so we just try many of them randomly.
from time import time; # timing
from random import SystemRandom; # cryptographic random byte generator
rand=SystemRandom(); # create strong random number generator

# This is the prime modulus used by SECG secp256k1, the BitCoin elliptic curve
P=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F;

start=time();
for repeat in range(10):
	a=rand.getrandbits(100); # random potential "witness"
	print("a = ",a);
	res=pow(a,P-1,P);
	print("a^P-1 mod P = ",res);

elapsed=time()-start;
print("Elapsed=",elapsed," seconds");

(Try this in NetRun now!)

Carmichael numbers are not prime, but (almost like mafia numbers) do not have any surviving Fermat witnesses, they're all liars that output 1.  To avoid accidentally getting a Carmichael number instead of a prime, we use the more complex Miller-Rabin primality test, which has a 3/4 probability of failing per pass on composite numbers, so running it 64 times gives a 2-128 chance of declaring a composite number as prime.

import math
from time import time # timing
from random import SystemRandom # cryptographic random byte generator
rand=SystemRandom() # create strong random number generator

# This is the prime modulus used by SECG secp256k1, the BitCoin elliptic curve
P=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F

# Miller-Rabin primality test (NIST FIPS 186-4 Appendix C)
#    w is the number to test for primality.
#    iterations is the number of tests (64 is reasonable)
def Miller_Rabin(w,iterations):
	a=0 # largest integer such that 2^a divides w-1
	while (0==((w-1)%(1<<(a+1)))): 
		a=a+1
	m=(w-1)//(1<<a)  # w-1 = 2^a * m
	# print("w=",w,"  m=",m,"  a=",a)
	wlen=int(math.log(w,2)) # counts bits in w
	for iter in range(iterations):
		# Pick a random potential witness b, 1<b<w-1
		foundB=False
		while (not foundB):
			b=rand.getrandbits(wlen)
			foundB=(b>1) and (b<w-1)
		
		# Main Miller-Rabin test:
		z=pow(b,m,w)
		# print("z=",z)
		if (z==+1 or z==w-1): continue # quick pass
		
		# We may need a cleanup pass:
		for j in range(1,a-1):
			z=pow(z,2,w)
			# print("    j=",j,"  z=",z)
			if (z==w-1): break # pass
			if (z==1): return False # fail
		return False; # fail
	
	# It passed all the tests--it's probably prime!
	return True

start=time()
for j in range(10):
	print("Miller-Rabin test(P -",j,"): ",Miller_Rabin(P-j,64))

elapsed=time()-start
print("Elapsed=",elapsed," seconds")

(Try this in NetRun now!)

Finding large primes is the most time-inefficient part of RSA setup, taking 100x longer than an encryption operation!

Here's how the openssl key generation implementation scales with key size, the number of bits in the product of the two primes:

Key Size
Generation Time (sec)
Security
512 bits
0.01
Poor--crackable in a few hours for about $100 on the cloud.
1024 bits
0.04
OK--nobody has ever admitted factoring numbers this big.
2048 bits
0.09
Good
4096 bits
1.2
Excellent
8192 bits
20
Overkill--if efficient factoring algorithms exist, this won't help much.
16384 bits
100
Total overkill

Flaw: Direct Factoring

The core operation in RSA key generation is:
   p = random_prime(bits/2)
   q = random_prime(bits/2)
   n = p*q       # public modulus
   e = 65537
   d = modular_inverse(e,(p-1)*(q-1))  # private key

Because the public modulus is just the product of two primes, and you can recompute the private key via the modular inverse, factoring is one obvious attack on RSA.  Based on brute force factoring, trying all divisors up to sqrt(n), you might expect factoring an n-bit product to take 2n/2 time, so 200 bit products would be fine.

However, factoring algorithms exist which are much faster than brute force, such as elliptic curve factoring (coming in about a month!).  For example, GMP-ECM can factor a 128-bit product in about 1 second:

  $ echo 308419861048375364105417853828868357721 | ecm 1000000
GMP-ECM 6.4.4 [configured with GMP 5.1.3, --enable-asm-redc] [ECM]
Input number is 308419861048375364105417853828868357721 (39 digits)
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=116281483
Step 1 took 815ms
********** Factor found in step 1: 16951655882749288739
Found probable prime factor of 20 digits: 16951655882749288739
Probable prime cofactor 18194084588646957139 has 20 digits

Factoring a 180 bit product takes about 2 minutes:

 $ echo 1177700523080677245883300985764599032955400633132267477 | ecm 100000000
GMP-ECM 6.4.4 [configured with GMP 5.1.3, --enable-asm-redc] [ECM]
Input number is 1177700523080677245883300985764599032955400633132267477 (55 digits)
Using B1=100000000, B2=776268975310, polynomial Dickson(30), sigma=4175794950
Step 1 took 89854ms
Step 2 took 44118ms
********** Factor found in step 2: 1207946822204051595384881203
Found probable prime factor of 28 digits: 1207946822204051595384881203
Probable prime cofactor 974960570641523643171225559 has 27 digits

Factoring a 200-bit product takes about half an hour on my laptop.

NIST recommends much longer key sizes for RSA than either a symmetric algorithm (like AES) or an elliptic curve:

Symmetric Key
RSA Key
ECC Key
80 bits
1024 bits
160 bits
112 bits
2048 bits
224 bits
128 bits
3072 bits
256 bits
192 bits
7680 bits
384 bits
256 bits
15360 bits
512 bits

Progress in algorithms for factoring are the primary determinant of security of RSA going forward. 

Flaw: Prime Generation

The prime numbers p and q need to be truly unpredictable, not just random looking.  This means calling a deterministic random number generator like "rand" is completely insufficient--consider there are only 4 billion possible values that can be passed to srand(). 

Back in 1996, this was a known flaw in the then wildly popular web browser Netscape, which based the key on the current time in seconds (predictable on NTP, and published in HTTP headers too).  In 2008, Debian's version of openssl was found to have a similar flaw, basing the key only on the process ID (only 2^16 options, and predictable on servers).  Ironically, openssl used a good cryptographic randomness source, but a Debian audit determined the code that applied that randomness also read from uninitialized memory (bad in theory, OK for random number generation) and commented out the whole thing.

The fix: make sure your implementation is using a good cryptographic random number source, not pseudorandom numbers.

Folks that compute gcd on the products of lots of published keys keep finding common factors, which are due to re-used primes, mostly on embedded hardware such as routers or VPN hardware.

Flaw: Side Channel (Timing, Heat, etc) Attacks

Each operation used in RSA encryption is a slow multi-precision operation, so it's easier to measure the time taken to do each sub-multiplication in an exponentiation (compared to, for example, the sub-clockcycle time to do the carry bits in a normal addition).  This is bad, because tons of operations have data-dependent timing--for example, decrypting data (m=cd mod n) means looking at each bit of your private key, and choosing to include a multiple of c.  This sort of attack can be extremely high precision when coming from a different VM on the same cloud server.

Possible fixes:

Flaw: Message Enumeration

Recall that:
To encrypt a message m into ciphertext c, use the public key e: c = me mod n
To decrypt ciphertext c back into the message m, use your private key d: m=cd mod n

Theoretically, you need the private key to decrypt the message.  However, anybody can check if a possible message m encrypts to the intercepted ciphertext c, because anybody can encrypt a message for you.  In other words, public key "encryption" works more like a hash function, where anybody can check if the hashes match.

Secret key ciphers don't have this flaw, because you need the key to encrypt.

The fix is to pad each message with random data--the exact analog of using a salted hash for passwords.  256 bits is enough to make an attacker's search space entirely infeasible.  Some follow this with a weak encryption scheme, like the hash-based one round Feistel used in OAEP, to ensure that a partial break won't reveal anything valuable (it also messes up multiplicative structure, below).

Flaw: Multiplicative Structure

Since RSA is based on simple exponentiation, encrypting (or decrypting) the product of two messages gives the product of the two ciphertexts:
c = me mod n
c' = m'e mod n
c * c' = me * m'e = (m * m')e mod n

There are enough ways to exploit this that you should never encrypt or sign a raw message, only a carefully packaged message. 

For encryption, see the previous fix OAEP.

For signing, several options exist, but the most popular signing scheme is the hash based RSASSA-PKCS1-v1.5.  This boils down to signing a binary value consisting of:

Flaw: Malware Certificates

Some machines come with adware or malware that installs its own unrestricted HTTPS root certificate, which it uses to man-in-the-middle web traffic for the purpose of installing ads.  For example, Lenovo recently stopped shipping with the Superfish adware/malware, which had a single(!) hardcoded private key, recently pulled out of the executable with "strings", and a password of komodia.