Lesson 6: Password-Based KDFs (PBKDF2/Argon2)
Alice’s Bookstore — Chapter 6
Alice’s bookstore now has customer accounts. Customers create passwords to log in. One night, Alice gets an email from her hosting provider:
“We detected a breach. An attacker accessed your database.”
Alice panics. She stored passwords using SHA-256 — that’s a hash, not plaintext, so it should be safe… right? Bob shakes his head:
“SHA-256 is too fast. An attacker with a GPU can try 10 billion passwords per second. Most of your customers use common passwords — they’ll be cracked in minutes.”
“But I hashed them! Isn’t that enough?”
“Fast hashing is great when you WANT speed — like in HKDF. But for passwords, speed is the enemy. You need a hash that’s intentionally slow.”
Prerequisites: Lesson 5 (HKDF). You understand key derivation — now learn why passwords need a different approach.
The core problem: passwords are terrible secrets
In Lesson 5, we used HKDF to derive encryption keys from a DH shared secret. That shared secret was 32 random bytes — impossible to guess.
But what if the secret is a password? This happens more often than you’d think:
When the user IS the key:
Password manager → master password → decrypt vault
Encrypted disk → login password → decrypt drive (FileVault, LUKS, BitLocker)
Encrypted ZIP/7z → password → decrypt files
SSH key passphrase → passphrase → decrypt private key
Encrypted backups → password → decrypt backup (restic, borgbackup)
Crypto wallet → password → decrypt wallet file
E2E encrypted notes → password → decrypt your notes (Standard Notes)
KeePass → master password → decrypt password database
In all these cases, there’s no DH key exchange, no pre-shared random secret. Just a human’s password. And passwords are:
A random 256-bit key:
7f3a9b2c... (32 bytes of pure randomness)
Possible values: 2^256 ≈ 10^77
Time to brute-force: longer than the universe exists
A typical password:
"password123"
Possible values: maybe 10 billion common passwords
Time to brute-force with HKDF: seconds
Feel the problem yourself — run this:
python3 -c "
import hashlib, time
# Simulate an attacker trying passwords with fast hashing
passwords = ['password', '123456', 'qwerty', 'letmein', 'monkey',
'dragon', 'master', 'abc123', 'password123', 'target']
target_hash = hashlib.sha256(b'password123').hexdigest()
start = time.time()
for pw in passwords:
h = hashlib.sha256(pw.encode()).hexdigest()
if h == target_hash:
elapsed = time.time() - start
print(f'CRACKED! Password is: {pw}')
print(f'Time: {elapsed*1000:.3f}ms')
print(f'Tried {passwords.index(pw)+1} passwords')
break
print()
print('Now imagine trying 10 billion passwords instead of 10.')
print('At 100 million hashes/sec (one GPU), that takes 100 seconds.')
print('Your password is cracked in under 2 minutes.')
"
That’s the problem. HKDF and SHA-256 are too fast. An attacker with a GPU can try billions of passwords per second.
The solution: make it slow on purpose
The idea is simple:
Fast hash (SHA-256):
password → key instantly
Attacker: 1,000,000,000 attempts/second
Slow hash (PBKDF2, 100K iterations):
password → hash → hash → ... (100K times) → key
Attacker: 10 attempts/second
Memory-hard hash (Argon2):
password → fill 64MB of RAM with data → key
Attacker: needs 64MB PER attempt
GPU with 1000 cores but only 8GB RAM?
→ only 125 parallel attempts, not 1000
Real-life analogy: the vault door
Regular door lock (HKDF):
Insert key → door opens instantly
If someone has a lockpick (GPU), they try 1 billion keys/second
┌──┐
│🚪│ → click → open
└──┘
Bank vault door (PBKDF2):
Insert key → wait 1 second → door opens
Lockpick still works, but only 1 attempt/second
┌──┐
│🚪│ → click → ⏰ 1 second → open
└──┘
Vault door + weight requirement (Argon2):
Insert key → must also carry 64kg weight → wait → door opens
Lockpick works, but you need one 64kg weight per attempt
Can't try 1000 doors in parallel unless you have 64,000 kg of weights
┌──┐
│🚪│ → click → 🏋️ 64kg → ⏰ 1 second → open
└──┘
The delay and memory cost don’t bother legitimate users (one login = one derivation). They devastate attackers who need billions of attempts.
See the difference yourself
# FAST: SHA-256 — how many per second?
python3 -c "
import hashlib, time
start = time.time()
for i in range(1_000_000):
hashlib.sha256(b'password123').digest()
elapsed = time.time() - start
rate = 1_000_000 / elapsed
print(f'SHA-256: {rate:,.0f} hashes/second')
print(f' 10 billion passwords cracked in: {10_000_000_000/rate:.0f} seconds')
"
# SLOW: PBKDF2 100K iterations
python3 -c "
import hashlib, os, time
salt = os.urandom(16)
start = time.time()
for i in range(10):
hashlib.pbkdf2_hmac('sha256', b'password123', salt, 100_000)
elapsed = time.time() - start
rate = 10 / elapsed
print(f'PBKDF2: {rate:,.1f} hashes/second')
print(f' 10 billion passwords cracked in: {10_000_000_000/rate/3600/24/365:.0f} years')
"
Run both. The difference is dramatic — millions per second vs a handful per second.
What does a password KDF actually do?
All password KDFs do the same thing conceptually:
Input: "password123" (weak, guessable)
Output: 7f3a9b2c4d... (32 bytes, looks random, usable as encryption key)
Same as HKDF — but intentionally slow. The slowness IS the security feature.
Think of it this way:
- HKDF is a machine that instantly turns a secret into a key
- A password KDF is the same machine, but with a speed limiter bolted on
- The speed limiter wastes CPU time (PBKDF2) or RAM (Argon2) on purpose
- A legitimate user calls it once (1 second delay — barely noticeable)
- An attacker calls it billions of times (1 second × 1 billion = 31 years)
The three password KDFs
PBKDF2 (Password-Based Key Derivation Function 2)
The oldest, simplest. Runs HMAC in a loop:
PBKDF2(password, salt, iterations) → key
Internally:
round 1: HMAC(password, salt || 1) → h1
round 2: HMAC(password, h1) → h2
round 3: HMAC(password, h2) → h3
...
round 100000: HMAC(password, h99999) → h100000
key = h1 XOR h2 XOR h3 XOR ... XOR h100000
Each iteration is cheap individually, but 100,000 of them add up. The problem: GPUs run HMAC very fast in parallel. PBKDF2 is CPU-bound but not memory-bound.
bcrypt
Designed specifically for password hashing. Uses a modified Blowfish cipher internally. Harder to parallelize on GPUs than PBKDF2, but fixed output size (not a general-purpose KDF).
Argon2 (the modern choice)
Winner of the 2015 Password Hashing Competition. Three parameters:
Argon2(password, salt, time_cost, memory_cost, parallelism) → key
time_cost: number of iterations (like PBKDF2)
memory_cost: MB of RAM required per computation
parallelism: number of threads to use
Example:
Argon2id("password123", salt, t=3, m=65536, p=4)
→ requires 64MB RAM and 4 threads for ~1 second
The memory requirement is what makes Argon2 special. GPUs have limited per-core memory — requiring 64MB per attempt makes GPU attacks 1000x slower.
┌──────────────────────────────────────────────────────┐
│ Password KDF Comparison │
│ │
│ Algorithm CPU Cost Memory Cost GPU Resistant │
│ ─────────────────────────────────────────────────── │
│ PBKDF2 High None No │
│ bcrypt High 4KB Somewhat │
│ Argon2id High Configurable Yes │
│ │
│ For new projects: always use Argon2id. │
│ PBKDF2 is acceptable if Argon2 isn't available. │
│ bcrypt for password hashing (not key derivation). │
└──────────────────────────────────────────────────────┘
Salt: preventing rainbow tables
A salt is random data added to the password before hashing. Without salt, two users with the same password get the same hash — an attacker can precompute a table of “password → hash” pairs (rainbow table).
Without salt:
hash("password123") → 0xabc...
hash("password123") → 0xabc... ← same! rainbow table works
With salt (unique per user):
hash("password123" + salt_alice) → 0x123...
hash("password123" + salt_bob) → 0x789... ← different!
Rainbow table is useless — would need a separate table per salt.
Salt should be:
- Random: 16+ bytes from a CSPRNG
- Unique: different for every user / every encryption
- Stored alongside the hash: it’s not secret, just unique
Try it yourself
# Generate a random salt
openssl rand -hex 16
# PBKDF2 with OpenSSL
echo -n "mypassword" | openssl kdf -keylen 32 -kdfopt digest:SHA256 \
-kdfopt pass:mypassword -kdfopt salt:$(openssl rand -hex 16) \
-kdfopt iter:100000 PBKDF2 | xxd
# Or use Python to see PBKDF2 in action:
python3 -c "
import hashlib, os, time
password = b'password123'
salt = os.urandom(16)
start = time.time()
key = hashlib.pbkdf2_hmac('sha256', password, salt, 100_000)
elapsed = time.time() - start
print(f'Salt: {salt.hex()}')
print(f'Key: {key.hex()}')
print(f'Time: {elapsed:.3f}s')
print(f'Rate: {1/elapsed:.0f} attempts/second')
print()
print(f'With 10B passwords to try: {10_000_000_000 * elapsed / 3600:.0f} hours')
"
# Compare with fast hashing (SHA-256):
python3 -c "
import hashlib, time
password = b'password123'
start = time.time()
for _ in range(1_000_000):
hashlib.sha256(password).digest()
elapsed = time.time() - start
print(f'SHA-256: {1_000_000/elapsed:.0f} hashes/second')
print('That is why you never use plain SHA-256 for passwords.')
"
PBKDF2 in Rust
#![allow(unused)]
fn main() {
use pbkdf2::pbkdf2_hmac;
use sha2::Sha256;
let password = b"my secret password";
let salt = b"random-salt-here"; // in real code: use rand to generate
let iterations = 100_000;
let mut key = [0u8; 32];
pbkdf2_hmac::<Sha256>(password, salt, iterations, &mut key);
println!("Derived key: {}", hex::encode(key));
}
Argon2 in Rust
#![allow(unused)]
fn main() {
use argon2::Argon2;
let password = b"my secret password";
let salt = b"random-salt-here";
let mut key = [0u8; 32];
let argon2 = Argon2::default(); // Argon2id, t=3, m=64MB, p=4
argon2.hash_password_into(password, salt, &mut key).unwrap();
println!("Derived key: {}", hex::encode(key));
}
When to use which
Deriving a key from a DH shared secret → HKDF (Lesson 5)
Fast, the input is already strong.
Deriving a key from a user's password → Argon2id
Slow on purpose, the input is weak.
Storing a password hash in a database → Argon2id or bcrypt
You don't need the key, just a hash to verify against.
API key / token derivation → HKDF
The input (master secret) is already random.
Exercises
Exercise 1: PBKDF2 key derivation
Add pbkdf2 and sha2 to your dependencies. Derive a 32-byte key from a password with 100,000 iterations. Print the key as hex. Verify that the same password + salt always gives the same key.
Exercise 2: Timing comparison
Time how long PBKDF2 takes with 1,000 vs 10,000 vs 100,000 vs 1,000,000 iterations. Plot the results. How many passwords/second can an attacker try at each level?
Exercise 3: Argon2 key derivation
Add argon2 to your dependencies. Derive a key with default parameters. Then try:
t_cost = 1, m_cost = 16384(16MB) — fastt_cost = 3, m_cost = 65536(64MB) — defaultt_cost = 10, m_cost = 262144(256MB) — paranoid
Measure time and observe memory usage with ps -o rss -p <pid>.
Exercise 4: Salt uniqueness
Derive keys from the same password with two different salts. Verify the keys are completely different. Then derive from the same password + same salt twice — keys should match. This proves salt prevents precomputation.
Exercise 5: Encrypt a file with a password
Combine this lesson with Lesson 2:
- Ask for a password
- Generate a random salt
- Derive a key with Argon2
- Encrypt a file with ChaCha20-Poly1305 using that key
- Store
salt || nonce || ciphertextto disk - Decrypt: read salt, ask for password, derive key, decrypt
This is the foundation of the Password Manager Vault project.