A review of our Redis caching architecture—how we built layered defenses against classic cache failures, and what technical debt remains.
Defenses Already in Place
1. Cache Avalanche
Problem: Mass key expiration causes a traffic spike that overwhelms the database.
Solution (Infrastructure layer): TTL randomization with jitter.
- The shared Redis library adds a random offset (e.g., ±10%) to every TTL
- This spreads expirations across time, eliminating synchronized spikes
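The jitter step can be sketched as follows. This is a minimal illustration, not the shared library's actual code: the function name `jittered_ttl` and its parameters are hypothetical, and the ±10% default simply mirrors the example figure above.

```python
import random
from typing import Optional


def jittered_ttl(
    base_ttl_seconds: int,
    jitter_ratio: float = 0.10,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the base TTL shifted by a random offset within +/- jitter_ratio."""
    source = rng if rng is not None else random
    max_offset = base_ttl_seconds * jitter_ratio
    ttl = base_ttl_seconds + source.uniform(-max_offset, max_offset)
    # Never return a TTL below one second, even for very small base values.
    return max(1, round(ttl))
```

With a 600-second base TTL, keys written in the same instant expire somewhere between 540 and 660 seconds later rather than all at once.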
2. Cache Breakdown (Hotspot Invalidation)
Problem: A single hot key expires, and concurrent requests all hit the database simultaneously.
Solution (Business Logic layer): Mutex lock on cache miss.
- On cache miss, requests compete for a distributed lock
- Lock winner: Queries DB, writes cache, releases lock
- Lock losers: Wait or retry until cache is populated
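A rough sketch of this flow, assuming a redis-py style client (`get`, `set` with `nx`/`ex`, and `eval`). The helper name `get_with_mutex`, the `lock:` key prefix, and all timing defaults are hypothetical choices for illustration, and `load_from_db` stands in for whatever query the caller runs. It also assumes the loader returns a storable, non-empty value.

```python
import time
import uuid

# Lua script: delete the lock only if the stored token is still ours,
# so a slow winner cannot release a lock that expired and was re-acquired.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def get_with_mutex(client, key, load_from_db, ttl=300, lock_ttl=10,
                   retries=50, wait=0.05):
    """Read-through cache lookup where only the lock winner queries the DB."""
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = client.get(key)
        if cached is not None:
            return cached

        token = uuid.uuid4().hex
        # SET NX EX: acquire the lock atomically, with an expiry as a safety net.
        if client.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                # Re-check: another winner may have filled the cache already.
                cached = client.get(key)
                if cached is not None:
                    return cached
                value = load_from_db(key)
                client.set(key, value, ex=ttl)
                return value
            finally:
                client.eval(RELEASE_SCRIPT, 1, lock_key, token)

        # Lock loser: back off briefly, then look in the cache again.
        time.sleep(wait)

    raise TimeoutError(f"cache for {key!r} was not populated in time")
```

The lock expiry (`lock_ttl`) bounds how long losers wait if the winner crashes before releasing; it should be longer than the slowest expected database query.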
3. Cache Penetration
Problem: Queries for non-existent data always miss cache and hit the database. Common attack vector.
Solution (Business Logic layer): Cache null objects.
- When DB returns empty, store a sentinel value (e.g., `{"empty": true}`) in Redis
- Use a short TTL (e.g., 60s) to allow eventual data creation
- Subsequent requests get the cached “not found” response
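The null-object flow above might look like the sketch below. It assumes a redis-py style client (`get`, `set` with `ex`); the helper name `get_or_cache_null` and the TTL defaults are hypothetical, and `load_from_db` is assumed to return `None` when the row does not exist.

```python
import json

# Sentinel stored in place of a real value; serializes to '{"empty": true}'.
NULL_SENTINEL = json.dumps({"empty": True})


def get_or_cache_null(client, key, load_from_db, ttl=300, null_ttl=60):
    """Return the cached or loaded value, or None if the key has no data."""
    cached = client.get(key)
    if cached is not None:
        # redis-py returns bytes unless decode_responses=True is set.
        if isinstance(cached, bytes):
            cached = cached.decode()
        # A cached sentinel means "known not to exist": skip the database.
        return None if cached == NULL_SENTINEL else cached

    value = load_from_db(key)
    if value is None:
        # Short TTL so data created later becomes visible quickly.
        client.set(key, NULL_SENTINEL, ex=null_ttl)
        return None

    client.set(key, value, ex=ttl)
    return value
```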
Limitation: Vulnerable to attacks using random keys (e.g., UUIDs). Because every distinct missing key creates its own cache entry, attackers can flood Redis with an unbounded number of `{"empty": true}` entries, causing memory exhaustion and evicting legitimate hot data.
4. Connection & Transport Security
Connection Pooling:
- Reuses TCP connections to avoid handshake overhead
TLS Encryption:
- Encrypted transport for cross-network communication
Critical Technical Debt
Blocking Commands: KEYS Usage
Problem: The codebase uses KEYS for pattern matching and batch operations.
Risk: Redis executes commands on a single main thread. KEYS has O(N) complexity because it scans the entire keyspace in one call. At millions of keys, this can block the main thread for seconds.
During this block:
- ❌ No reads or writes processed
- ❌ All dependent services (auth, transactions) time out
- ❌ Effectively a self-inflicted DoS
Fix: Replace all KEYS with SCAN.
| Command | Complexity | Blocking |
|---|---|---|
| `KEYS pattern` | O(N) | Yes |
| `SCAN cursor MATCH pattern` | O(1) per call | No |
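SCAN uses cursor-based iteration: each call does a small, bounded amount of work and returns a cursor for the next call, so other commands can run between calls. A full iteration is still O(N) overall and may return the same key more than once. It is more complex to implement, but it avoids the long single-command stall that KEYS causes.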
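A minimal sketch of the cursor loop, assuming a redis-py style client whose `scan(cursor, match, count)` returns a `(next_cursor, keys)` pair. The helper name `scan_keys` and the `count` default are illustrative choices, not existing code.

```python
def scan_keys(client, pattern, count=500):
    """Yield keys matching pattern, one SCAN page at a time."""
    cursor = 0
    while True:
        # COUNT is only a hint for how much work each call does;
        # a page may contain fewer keys, or none at all.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        # The server signals the end of the iteration with cursor 0.
        if cursor == 0:
            break
```

Callers that need each key exactly once can collect the results into a `set`, since SCAN does not guarantee uniqueness. redis-py also ships a `scan_iter` helper that wraps the same loop.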
Action Items
- Remove `KEYS`: Audit and replace all `KEYS` usage with `SCAN`
- Consolidate patterns: Evaluate moving mutex lock and null caching logic into the shared library to reduce cognitive load on developers
- Bloom Filter for penetration defense: Null object caching is insufficient against random-key attacks. Add a Bloom Filter (or Cuckoo Filter) as the first-line defense—reject keys that “definitely don’t exist” before touching Redis
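To make the Bloom Filter item concrete, here is a small in-process sketch. It is an illustration under stated assumptions rather than a proposed implementation: the class name, the bit-array size, and the number of hash functions are arbitrary, and the hash positions are derived from a single SHA-256 digest using double hashing.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force h2 to be odd so the derived positions do not all collapse.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        # False means "definitely not present"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(key)
        )
```

In a request path, a `might_contain` result of `False` lets the service return "not found" without touching Redis or the database. A `True` result falls through to the normal cache lookup, where null object caching absorbs the rare false positives. The filter only works if every valid key is added when the underlying record is created.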