Redis Cache Architecture: Layered Defense and Optimization

A review of our Redis caching architecture—how we built layered defenses against classic cache failures, and what technical debt remains.

Defenses Already in Place

1. Cache Avalanche

Problem: Mass key expiration causes a traffic spike that overwhelms the database.

Solution (Infrastructure layer): TTL randomization with jitter.

  • The shared Redis library adds a random offset (e.g., ±10%) to every TTL
  • This spreads expirations over time, so keys written together no longer expire together
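
A minimal sketch of the jitter calculation, assuming a ±10% window. The function name `jittered_ttl` and its parameters are illustrative and are not taken from the shared library:

```python
import random


def jittered_ttl(base_ttl, jitter=0.10, rng=random):
    """Return base_ttl (seconds) shifted by a random offset within +/- jitter."""
    # Pick a relative offset uniformly from [-jitter, +jitter].
    offset = rng.uniform(-jitter, jitter)
    # Never return a TTL below one second.
    return max(1, round(base_ttl * (1 + offset)))
```

With a base TTL of 3600 seconds, results fall between 3240 and 3960 seconds, so a batch of keys written at the same moment expires over a 12-minute window rather than all at once.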

2. Cache Breakdown (Hotspot Invalidation)

Problem: A single hot key expires, and concurrent requests all hit the database simultaneously.

Solution (Business Logic layer): Mutex lock on cache miss.

  • On cache miss, requests compete for a distributed lock
  • Lock winner: Queries DB, writes cache, releases lock
  • Lock losers: Wait or retry until cache is populated
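
The flow above can be sketched as follows. This is an illustration under assumptions, not our production code: `client` is assumed to be a redis-py style client created with `decode_responses=True`, and `get_with_mutex`, `load_from_db`, and the `lock:` key prefix are hypothetical names. The lock carries its own TTL so that a crashed lock winner cannot block everyone else indefinitely.

```python
import time
import uuid

# Compare-and-delete: release the lock only if this caller still owns it.
RELEASE_LOCK = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def get_with_mutex(client, key, load_from_db, ttl=300, lock_ttl=10,
                   retries=50, retry_delay=0.05):
    """Return the cached value, letting a single caller rebuild it on a miss."""
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = client.get(key)
        if cached is not None:
            return cached
        token = uuid.uuid4().hex
        # SET NX EX: succeeds only if no lock exists; expires if the holder dies.
        if client.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                # Re-check: another winner may have filled the cache just before us.
                cached = client.get(key)
                if cached is not None:
                    return cached
                value = load_from_db(key)
                client.set(key, value, ex=ttl)
                return value
            finally:
                client.eval(RELEASE_LOCK, 1, lock_key, token)
        # Lock loser: wait briefly, then look at the cache again.
        time.sleep(retry_delay)
    raise TimeoutError(f"cache for {key!r} was not populated in time")
```

The random token matters: without it, a slow lock winner whose lock has already expired could delete a lock now held by someone else.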

3. Cache Penetration

Problem: Queries for non-existent data always miss cache and hit the database. Common attack vector.

Solution (Business Logic layer): Cache null objects.

  • When DB returns empty, store a sentinel value (e.g., {"empty": true}) in Redis
  • Use a short TTL (e.g., 60s) to allow eventual data creation
  • Subsequent requests get the cached “not found” response
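
A sketch of the null-object pattern described above, assuming values are stored as JSON and `client` is a redis-py style client. The names `get_or_cache_null` and `load_from_db` are hypothetical, and the TTLs are examples:

```python
import json

# Sentinel stored in place of a missing record.
NULL_SENTINEL = {"empty": True}


def get_or_cache_null(client, key, load_from_db, ttl=300, null_ttl=60):
    """Return the record for key, or None if it does not exist."""
    cached = client.get(key)
    if cached is not None:
        data = json.loads(cached)
        # A cached sentinel means "known to be missing"; skip the database.
        # Assumes no real record is exactly {"empty": true}.
        return None if data == NULL_SENTINEL else data
    record = load_from_db(key)
    if record is None:
        # Short TTL so a record created later becomes visible quickly.
        client.set(key, json.dumps(NULL_SENTINEL), ex=null_ttl)
        return None
    client.set(key, json.dumps(record), ex=ttl)
    return record
```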

Limitation: Vulnerable to attacks using random keys (e.g., UUIDs). Attackers can flood Redis with an unbounded number of {"empty": true} entries, exhausting memory and evicting legitimate hot data.


4. Connection & Transport Security

Connection Pooling:

  • Reuses TCP connections to avoid handshake overhead

TLS Encryption:

  • Encrypted transport for cross-network communication

Critical Technical Debt

Blocking Commands: KEYS Usage

Problem: The codebase uses KEYS for pattern matching and batch operations.

Risk: Redis executes commands on a single thread. KEYS is O(N) in the total number of keys because it scans the entire keyspace, so at millions of keys a single call can block that thread for seconds.

During this block:

  • ❌ No reads or writes processed
  • ❌ All dependent services (auth, transactions) time out
  • ❌ Effectively a self-inflicted DoS

Fix: Replace all KEYS with SCAN.

Command                      Complexity       Blocking
KEYS pattern                 O(N)             Yes
SCAN cursor MATCH pattern    O(1) per call    No

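SCAN iterates with a cursor: each call returns a small batch of keys plus the cursor for the next call, so other commands can run between calls. It is more complex to use, since the caller must loop until the cursor returns to 0 and tolerate keys that appear more than once. A full pass still costs O(N) in total, but the work is split into short steps instead of one long block.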
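
As an illustration, a KEYS-then-delete batch job could be rewritten along these lines. This is a sketch assuming a redis-py style client, whose `scan` returns a `(cursor, keys)` pair; the function name `unlink_matching` and the batch size are hypothetical:

```python
def unlink_matching(client, pattern, count=500):
    """Remove every key matching pattern, one SCAN batch at a time."""
    cursor = 0
    removed = 0
    while True:
        # COUNT is only a hint for how much work each call does.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        if keys:
            # UNLINK (Redis 4.0+) reclaims memory in the background.
            # It returns how many keys actually existed, so a key that
            # SCAN reports twice is only counted once.
            removed += client.unlink(*keys)
        if cursor == 0:
            # The cursor returning to 0 marks the end of the iteration.
            break
    return removed
```

A batch may come back empty while the cursor is still non-zero, which is why the loop stops on the cursor value and not on an empty result.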


Action Items

  1. Remove KEYS: Audit and replace all KEYS usage with SCAN
  2. Consolidate patterns: Evaluate moving mutex lock and null caching logic into the shared library to reduce cognitive load on developers
  3. Bloom Filter for penetration defense: Null object caching is insufficient against random-key attacks. Add a Bloom Filter (or Cuckoo Filter) as the first line of defense: reject keys that “definitely don’t exist” before touching Redis
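
To make the third item concrete, here is a minimal in-process Bloom filter sketch. It is an illustration only: the class and method names are hypothetical, the sizing uses the standard formulas for a target false-positive rate, and a real deployment would also need to load every valid key into the filter and add new keys as records are created.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with false positives but no false negatives."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.size = max(8, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A request handler would call `might_contain(key)` first and return "not found" immediately on False, so random keys never reach Redis or the database. A True result still goes through the normal cache path, because the filter can report false positives.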