[Redis Beyond Caching] Part 1: Asynchronous Job Queue with BullMQ

How to handle one-time and recurring heavy tasks without blocking your API server, using BullMQ and Redis.

🔥 The Problem: Why APIs Can't Do Heavy Lifting

Node.js Event Loop Blocking

Node.js runs on a single-threaded event loop. If your API handler runs a 5-minute report generation synchronously, every other request queues up behind it. Your entire server becomes unresponsive.
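The effect is easy to reproduce with nothing but plain Node.js. In this sketch, a timer due in 10 ms cannot fire while a synchronous loop holds the event loop, and every pending HTTP request is in the same position as that timer:

```typescript
// Event-loop starvation in miniature: a 10 ms timer cannot fire
// while synchronous work occupies the single-threaded event loop.
let timerRan = false;
setTimeout(() => { timerRan = true; }, 10);

const start = Date.now();
while (Date.now() - start < 200) {
  // simulate 200 ms of synchronous CPU work (stand-in for report generation)
}

// 200 ms have passed, yet the 10 ms timer still has not run,
// because the event loop never got a turn.
console.log(timerRan); // false
```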

HTTP Timeout Constraints

Most API gateways and browsers enforce timeout limits (30-60 seconds). Long-running tasks will:

  • Fail silently from the client's perspective
  • Leave data in inconsistent states
  • Frustrate users with no feedback

Traffic Spike Risk

Without throttling, a sudden influx of heavy requests can spawn too many concurrent processes, exhausting CPU/memory and crashing your service.


📦 What is BullMQ?

BullMQ is a Node.js job queue library built on top of Redis. It's the successor to Bull (the most popular Redis-based queue for Node.js), completely rewritten for better performance and first-class TypeScript support.

Why Not Build Your Own Queue?

You could use raw Redis commands (LPUSH/BRPOP) to build a queue. But production-grade queues need:

  • ✅ Retry with backoff: failed jobs retry with exponential delay
  • ✅ Delayed jobs: schedule jobs to run in the future
  • ✅ Rate limiting: control job throughput
  • ✅ Job prioritization: critical jobs jump the queue
  • ✅ Stalled job detection: recover from worker crashes
  • ✅ Event hooks: track the job lifecycle (completed, failed, progress)
  • ✅ Repeatable jobs: cron-like scheduling

BullMQ handles all of this out of the box. Rolling your own queue means reinventing the wheel, and you'll likely miss edge cases that BullMQ has already solved.
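To make the first bullet concrete, here is a sketch of a doubling backoff schedule of the kind BullMQ's `exponential` backoff type applies (the exact rounding is BullMQ's internal detail; treat the formula as illustrative):

```typescript
// Exponential backoff sketch: the retry delay doubles with each attempt,
// starting from a configured base delay (as in backoff: { type: 'exponential', delay }).
function exponentialBackoff(baseDelayMs: number, attemptsMade: number): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs);
}

// With a 1 s base delay, retries wait 1 s, 2 s, 4 s, 8 s, ...
const delays = [1, 2, 3, 4].map((n) => exponentialBackoff(1000, n));
console.log(delays); // [1000, 2000, 4000, 8000]
```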


πŸ—οΈ Architecture

The system uses a Producer-Consumer model with BullMQ as the orchestration layer. For heavy tasks, the worker spawns short-lived K8s Job pods.

┌──────────────────┐     ┌─────────────────────────────────────────────┐
│   API Server     │     │           Scheduler Service                 │
│   (Producer)     │     │   ┌─────────────┐    ┌─────────────┐        │
│                  │     │   │ QueueSched. │    │   Worker    │        │
└────────┬─────────┘     │   └──────┬──────┘    └──────┬──────┘        │
         │               └──────────┼──────────────────┼───────────────┘
         │ push job                 │                  │
         ▼                          ▼                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Redis (BullMQ Queue)                        │
│  • Stores job metadata & cron schedules                             │
│  • Manages queue state (waiting → active → completed)               │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Worker picks up job, spawns K8s Job
                                   ▼
                      ┌─────────────────────────┐
                      │     K8s Job Pod         │
                      │  (ephemeral, auto-exit) │
                      └─────────────────────────┘
Roles and responsibilities:

  • Producer (API Server): receives the request, enqueues the job, returns 202 Accepted
  • Scheduler Service: long-running pod with a BullMQ Worker; reads cron schedules from Redis, spawns K8s Jobs
  • Redis: persists job metadata, schedules, and queue state
  • K8s Job Pod: executes the heavy computation, terminates after completion

Job Types

Type A: One-time Jobs

Triggered by user actionsβ€”export reports, sync external data, send batch emails.

// Producer: API endpoint
app.post('/export', async (req, res) => {
  const job = await exportQueue.add('generate-report', {
    userId: req.user.id,
    dateRange: req.body.dateRange,
  });
  
  res.status(202).json({ jobId: job.id });
});

Type B: Recurring Jobs (Cron)

Replace traditional Linux crontab with distributed, HA scheduling.

// Repeatable job: runs daily at midnight
await reportQueue.add(
  'daily-summary',
  { type: 'daily' },
  {
    repeat: { cron: '0 0 * * *' }, // note: newer BullMQ versions use `pattern` instead of `cron`
    jobId: 'daily-summary', // stable id prevents duplicate schedules
  }
);

✅ Advantage over crontab: no single scheduler process becomes a single point of failure (SPOF). If one server dies, another worker picks up the job. (Requires Redis HA for full resilience.)
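To make the `'0 0 * * *'` pattern concrete, here is a toy matcher for the five cron fields, supporting only plain numbers and `*` (an illustration only; BullMQ delegates real pattern parsing to a full cron parser, which also handles ranges, steps, and lists):

```typescript
// Toy matcher for 5-field cron patterns with only numbers and '*':
// minute, hour, day-of-month, month, day-of-week.
function cronMatches(pattern: string, date: Date): boolean {
  const fields = pattern.trim().split(/\s+/);
  const values = [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1, // cron months are 1-12
    date.getDay(),       // 0 = Sunday
  ];
  return fields.every((f, i) => f === '*' || Number(f) === values[i]);
}

// '0 0 * * *' fires at midnight, on any day:
console.log(cronMatches('0 0 * * *', new Date(2024, 0, 15, 0, 0)));  // true
console.log(cronMatches('0 0 * * *', new Date(2024, 0, 15, 12, 0))); // false
```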


Job Status Tracking

Users need visibility into long-running tasks. BullMQ provides built-in job state management and event hooks.

Updating Progress from Worker

// Worker: report progress during execution
import { Worker } from 'bullmq';

const worker = new Worker('export', async (job) => {
  const totalSteps = 100;

  for (let i = 0; i < totalSteps; i++) {
    await processChunk(i);
    await job.updateProgress((i + 1) / totalSteps * 100);
  }

  return { downloadUrl: '/reports/123.csv' };
}, { connection: { host: 'localhost', port: 6379 } }); // a Redis connection is required

Querying Job Status via API

// API: check job status
app.get('/jobs/:id/status', async (req, res) => {
  const job = await exportQueue.getJob(req.params.id);
  
  if (!job) {
    return res.status(404).json({ error: 'Job not found' });
  }
  
  const state = await job.getState(); // waiting, active, completed, failed
  const progress = job.progress;
  const result = job.returnvalue;
  
  res.json({ state, progress, result });
});

Listening to Job Events

// Subscribe to job lifecycle events
import { QueueEvents } from 'bullmq';

const queueEvents = new QueueEvents('export');

queueEvents.on('progress', ({ jobId, data }) => {
  console.log(`Job ${jobId} progress: ${data}%`);
  // Push to frontend via WebSocket/SSE
});

queueEvents.on('completed', ({ jobId, returnvalue }) => {
  console.log(`Job ${jobId} completed:`, returnvalue);
});

queueEvents.on('failed', ({ jobId, failedReason }) => {
  console.error(`Job ${jobId} failed:`, failedReason);
});

Throttling & Downstream Protection

Control concurrency to protect databases and third-party APIs:

const worker = new Worker('export', processJob, {
  concurrency: 10, // max 10 jobs running simultaneously
  limiter: {
    max: 100,      // max 100 jobs
    duration: 60000, // per minute
  },
});

This traffic shaping prevents your worker from DDoS-ing your own database or hitting external API rate limits.

⚠️ Note: this throttles this worker's job throughput; it is not a global API rate limiter.
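The semantics of `limiter: { max, duration }` — at most `max` job starts per `duration` window — can be modeled as a sliding-window check. This is a simplified in-memory sketch, not BullMQ's internal implementation (which keeps this state in Redis so it holds across workers):

```typescript
// Sliding-window model of limiter: { max, duration }:
// a job may start only if fewer than `max` jobs started
// in the last `durationMs` milliseconds.
class RateLimiter {
  private starts: number[] = [];
  constructor(private max: number, private durationMs: number) {}

  tryStart(nowMs: number): boolean {
    // drop start timestamps that fell out of the window
    this.starts = this.starts.filter((t) => nowMs - t < this.durationMs);
    if (this.starts.length >= this.max) return false;
    this.starts.push(nowMs);
    return true;
  }
}

// max 2 jobs per 1000 ms:
const limiter = new RateLimiter(2, 1000);
console.log(limiter.tryStart(0));    // true
console.log(limiter.tryStart(10));   // true
console.log(limiter.tryStart(20));   // false (window is full)
console.log(limiter.tryStart(1100)); // true (window has slid past)
```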

Worker as K8s Orchestrator

For heavy tasks, the worker spawns a dedicated K8s Job:

import { Worker } from 'bullmq';
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const batchApi = kc.makeApiClient(k8s.BatchV1Api);

const worker = new Worker('data-sync', async (job) => {
  const { source } = job.data;

  const k8sJob = {
    metadata: { name: `data-sync-${Date.now()}` },
    spec: {
      template: {
        spec: {
          containers: [{
            name: 'sync-worker',
            image: 'myregistry/data-sync:latest',
            env: [
              { name: 'SOURCE', value: source },
              { name: 'JOB_ID', value: String(job.id) }, // env values must be strings
            ],
          }],
          restartPolicy: 'Never',
        },
      },
    },
  };

  await batchApi.createNamespacedJob('production', k8sJob);
  return { status: 'k8s-job-created' };
}, { connection: REDIS_OPT }); // REDIS_OPT: your Redis connection options

✅ Benefit: BullMQ handles scheduling; K8s handles resource isolation and auto-cleanup.


🔑 Redis's Critical Role

1. Job Persistence

Jobs are stored in Redis. If a worker crashes mid-execution:

  • Job status remains active in Redis
  • On restart, BullMQ auto-retries stalled jobs
  • No job metadata loss beyond the configured persistence window

2. Distributed Locking

With multiple workers competing for jobs, Redis ensures:

  • Atomic job acquisition: only one worker gets each job
  • No duplicate execution: race conditions are eliminated via BRPOPLPUSH / Stream consumer groups

⚠️ Production Caveats

Understanding BullMQ's Redis Usage

ℹ️ BullMQ uses Redis List + Sorted Set, not Redis Stream. It relies on BRPOPLPUSH (or BLMOVE in Redis 6.2+) for reliable job acquisition.

Feature                   List-based (BullMQ)        Stream (XADD/XREADGROUP)
Consumer Groups           ✅ Implemented by BullMQ   ✅ Native
Message Acknowledgment    ✅ Implemented by BullMQ   ✅ Native (XACK)
Retry/Pending Tracking    ✅ Via Sorted Set          ✅ Built-in
Maturity                  ✅ Battle-tested           ✅ Newer (Redis 5.0+)

👉 If building a custom queue from scratch, Redis Stream provides native consumer groups and acknowledgment. However, BullMQ's List-based implementation is production-proven and handles edge cases that raw Stream operations don't cover out of the box.

Redis Persistence Matters

  • RDB only → jobs created in the last few minutes may be lost on crash
  • AOF with everysec → at most 1 second of job loss
  • AOF with always → durability guaranteed, but slower writes

Choose based on your job criticality.
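For reference, the everysec middle ground corresponds to two directives in redis.conf:

```
# redis.conf: enable the append-only file, fsync once per second
appendonly yes
appendfsync everysec
```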

Worker Crash ≠ Job Loss (But Needs Config)

BullMQ has stalled job detection, but you must configure:

  • stalledInterval: how often to check for stalled jobs
  • maxStalledCount: how many times to retry before marking the job as failed

Without this, crashed jobs may hang in active state forever.
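The bookkeeping these two options imply can be sketched as a tiny state machine. This is a simplified model of the behavior, not BullMQ's implementation (which tracks worker locks in Redis and runs the check every stalledInterval):

```typescript
// Model of stalled-job handling: when the periodic check finds a job
// whose worker lock expired, the job is either re-queued for retry or,
// once maxStalledCount is exceeded, marked as failed.
type JobState = 'active' | 'waiting' | 'failed';

function onStalled(
  job: { state: JobState; stalledCount: number },
  maxStalledCount: number,
): void {
  job.stalledCount += 1;
  job.state = job.stalledCount > maxStalledCount ? 'failed' : 'waiting';
}

const job = { state: 'active' as JobState, stalledCount: 0 };
onStalled(job, 1); // first stall: back to waiting for a retry
console.log(job.state); // "waiting"
onStalled(job, 1); // second stall exceeds maxStalledCount = 1
console.log(job.state); // "failed"
```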

⚠️ K8s Job State Mismatch: if the worker exits after spawning a K8s Job but before updating the BullMQ job state, the two systems will be out of sync. Consider implementing reconciliation logic.
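One way to frame that reconciliation is as a pure decision function: compare the queue's view of the job with the K8s Job's actual status and pick a corrective action. The state and action names here are hypothetical, for illustration only:

```typescript
// Hypothetical reconciliation rule: given the queue's view and the
// K8s Job's observed status, decide which corrective action to take.
type QueueState = 'active' | 'completed' | 'failed';
type K8sStatus = 'Running' | 'Succeeded' | 'Failed' | 'NotFound';

function reconcile(queue: QueueState, k8s: K8sStatus): string {
  if (queue === 'active' && k8s === 'Succeeded') return 'mark-completed';
  if (queue === 'active' && k8s === 'Failed') return 'mark-failed';
  if (queue === 'active' && k8s === 'NotFound') return 'requeue'; // spawn was lost
  return 'no-op'; // states already agree, or the pod is still running
}

console.log(reconcile('active', 'Succeeded')); // "mark-completed"
console.log(reconcile('active', 'NotFound'));  // "requeue"
```

A periodic reconciler pod could run this rule over all jobs stuck in the active state.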


πŸ“ Summary

  • Decoupling: the API responds instantly; heavy work happens asynchronously
  • Resilience: Redis persists job state; crashes don't lose work
  • Scalability: add more workers to increase throughput
  • Protection: throttling prevents downstream overload

The Producer-Consumer pattern shifts time and resource pressure from your API layer to dedicated workers, keeping your core service lightweight and responsive.

