How to handle one-time and recurring heavy tasks without blocking your API server, using BullMQ and Redis.
🔥 The Problem: Why APIs Can’t Do Heavy Lifting
Node.js Event Loop Blocking
Node.js runs on a single-threaded event loop. If your API handler runs a 5-minute report generation synchronously, every other request queues up behind it. Your entire server becomes unresponsive.
HTTP Timeout Constraints
Most API gateways and browsers enforce timeout limits (30-60 seconds). Long-running tasks will:
- Fail silently from the client’s perspective
- Leave data in inconsistent states
- Frustrate users with no feedback
Traffic Spike Risk
Without throttling, a sudden influx of heavy requests can spawn too many concurrent processes, exhausting CPU/memory and crashing your service.
📦 What is BullMQ?
BullMQ is a Node.js job queue library built on top of Redis. It’s the successor to Bull (the most popular Redis-based queue for Node.js) with a complete rewrite for better performance and TypeScript support.
Why Not Build Your Own Queue?
You could use raw Redis commands (LPUSH/BRPOP) to build a queue. But production-grade queues need:
- ✅ Retry with backoff: failed jobs retry with exponential delay
- ✅ Delayed jobs: schedule jobs to run in the future
- ✅ Rate limiting: control job throughput
- ✅ Job prioritization: critical jobs jump the queue
- ✅ Stalled job detection: recover from worker crashes
- ✅ Event hooks: track job lifecycle (completed, failed, progress)
- ✅ Repeatable jobs: cron-like scheduling
BullMQ handles all of this out of the box. Rolling your own is reinventing the wheel, and you’ll likely miss edge cases that BullMQ has already solved.
🏗️ Architecture
The system uses a Producer-Consumer model with BullMQ as the orchestration layer. For heavy tasks, the worker spawns short-lived K8s Job pods.
| Role | Responsibility |
|---|---|
| Producer (API Server) | Receives request, enqueues job, returns 202 Accepted |
| Scheduler Service | Long-running pod with BullMQ Worker; reads cron from Redis, spawns K8s Jobs |
| Redis | Persists job metadata, schedules, and queue state |
| K8s Job Pod | Executes heavy computation, terminates after completion |
Job Types
Type A: One-time Jobs
Triggered by user actions: export reports, sync external data, send batch emails.
Type B: Recurring Jobs (Cron)
Replace traditional Linux crontab with distributed, HA scheduling.
✅ Advantage over crontab: No single scheduler process is a SPOF. If one server dies, another worker picks up the job. (Requires Redis HA for full resilience.)
Job Status Tracking
Users need visibility into long-running tasks. BullMQ provides built-in job state management and event hooks.
Updating Progress from Worker
Querying Job Status via API
Listening to Job Events
Throttling & Downstream Protection
Control concurrency to protect databases and third-party APIs:
This traffic shaping prevents your worker from DDoS-ing your own database or hitting external API rate limits.
⚠️ Note that this limits a worker’s throughput; it is not global, API-level rate limiting.
Worker as K8s Orchestrator
For heavy tasks, the worker spawns a dedicated K8s Job:
✅ Benefit: BullMQ handles scheduling; K8s handles resource isolation and auto-cleanup.
🔑 Redis’s Critical Role
1. Job Persistence
Jobs are stored in Redis. If a worker crashes mid-execution:
- Job status remains `active` in Redis
- On restart, BullMQ auto-retries stalled jobs
- No job metadata loss beyond the configured persistence window
2. Distributed Locking
With multiple workers competing for jobs, Redis ensures:
- Atomic job acquisition: only one worker gets each job
- No duplicate execution: race conditions are eliminated via `BRPOPLPUSH` / Stream consumer groups
⚠️ Production Caveats
Understanding BullMQ’s Redis Usage
ℹ️ BullMQ uses Redis List + Sorted Set, not Redis Stream. It relies on `BRPOPLPUSH` (or `BLMOVE` in Redis 6.2+) for reliable job acquisition.
| Feature | List-based (BullMQ) | Stream (XADD/XREADGROUP) |
|---|---|---|
| Consumer Groups | ✅ Implemented by BullMQ | ✅ Native |
| Message Acknowledgment | ✅ Implemented by BullMQ | ✅ Native (XACK) |
| Retry/Pending Tracking | ✅ Via Sorted Set | ✅ Built-in |
| Maturity | ✅ Battle-tested | ⚠️ Newer (Redis 5.0+) |
💡 If building a custom queue from scratch, Redis Stream provides native consumer groups and acknowledgment. However, BullMQ’s List-based implementation is production-proven and handles edge cases that raw Stream operations don’t cover out of the box.
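For reference, the native Stream primitives look like this in redis-cli (the stream, group, and consumer names are made up):

```
XADD jobs:stream * payload '{"type":"report"}'
XGROUP CREATE jobs:stream workers $ MKSTREAM
XREADGROUP GROUP workers worker-1 COUNT 1 BLOCK 5000 STREAMS jobs:stream >
XACK jobs:stream workers 1726000000000-0
```

`XREADGROUP … >` claims a new entry for `worker-1`; until `XACK` is issued, the entry stays in the group’s pending list, which is how Streams track unacknowledged work.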
Redis Persistence Matters
- RDB only: jobs created in the last few minutes may be lost on a crash
- AOF with `everysec`: at most 1 second of job loss
- AOF with `always`: durability guaranteed, but slower writes
Choose based on your job criticality.
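For example, the `everysec` middle ground corresponds to this redis.conf fragment:

```
appendonly yes
appendfsync everysec
```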
Worker Crash ≠ Job Loss (But Needs Config)
BullMQ has stalled job detection, but you must configure:
- `stalledInterval`: how often to check for stalled jobs
- `maxStalledCount`: how many times to retry before marking the job as failed

Without this, crashed jobs may hang in the `active` state forever.
⚠️ K8s Job State Mismatch: If the worker exits after spawning a K8s Job but before updating BullMQ job state, the two systems will be out of sync. Consider implementing reconciliation logic.
📝 Summary
- Decoupling: the API responds instantly; heavy work happens asynchronously
- Resilience: Redis persists job state; crashes don’t lose work
- Scalability: add more workers to increase throughput
- Protection: throttling prevents downstream overload
The Producer-Consumer pattern shifts time and resource pressure from your API layer to dedicated workers, keeping your core service lightweight and responsive.