[Redis Beyond Caching] Part 1: Asynchronous Job Queue with BullMQ

How to handle one-time and recurring heavy tasks without blocking your API server, using BullMQ and Redis.

🔥 The Problem: Why APIs Can't Do Heavy Lifting

Node.js Event Loop Blocking

Node.js runs on a single-threaded event loop. If your API handler runs a 5-minute report generation synchronously, every other request queues up behind it. Your entire server becomes unresponsive.
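The effect is easy to reproduce with nothing but plain Node.js. In this sketch, a timer due in 10 ms cannot fire while a synchronous loop holds the event loop, and every pending HTTP request is in the same position as that timer:

```typescript
// Event-loop starvation in miniature: a 10 ms timer cannot fire
// while synchronous work occupies the single-threaded event loop.
let timerRan = false;
setTimeout(() => { timerRan = true; }, 10);

const start = Date.now();
while (Date.now() - start < 200) {
  // simulate 200 ms of synchronous CPU work (stand-in for report generation)
}

// 200 ms have passed, yet the 10 ms timer still has not run,
// because the event loop never got a turn.
console.log(timerRan); // false
```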

HTTP Timeout Constraints

Most API gateways and browsers enforce timeout limits (30-60 seconds). Long-running tasks will:

  • Fail silently from the client's perspective
  • Leave data in inconsistent states
  • Frustrate users with no feedback

Traffic Spike Risk

Without throttling, a sudden influx of heavy requests can spawn too many concurrent processes, exhausting CPU/memory and crashing your service.


📦 What is BullMQ?

BullMQ is a Node.js job queue library built on top of Redis. It's the successor to Bull (the most popular Redis-based queue for Node.js), completely rewritten for better performance and first-class TypeScript support.

Why Not Build Your Own Queue?

You could use raw Redis commands (LPUSH/BRPOP) to build a queue. But production-grade queues need:

  • ✅ Retry with backoff: failed jobs retry with exponential delay
  • ✅ Delayed jobs: schedule jobs to run in the future
  • ✅ Rate limiting: control job throughput
  • ✅ Job prioritization: critical jobs jump the queue
  • ✅ Stalled job detection: recover from worker crashes
  • ✅ Event hooks: track the job lifecycle (completed, failed, progress)
  • ✅ Repeatable jobs: cron-like scheduling

BullMQ handles all of this out of the box. Rolling your own queue means reinventing the wheel, and you'll likely miss edge cases that BullMQ has already solved.
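To make the first bullet concrete, here is a sketch of a doubling backoff schedule of the kind BullMQ's `exponential` backoff type applies (the exact rounding is BullMQ's internal detail; treat the formula as illustrative):

```typescript
// Exponential backoff sketch: the retry delay doubles with each attempt,
// starting from a configured base delay (as in backoff: { type: 'exponential', delay }).
function exponentialBackoff(baseDelayMs: number, attemptsMade: number): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs);
}

// With a 1 s base delay, retries wait 1 s, 2 s, 4 s, 8 s, ...
const delays = [1, 2, 3, 4].map((n) => exponentialBackoff(1000, n));
console.log(delays); // [1000, 2000, 4000, 8000]
```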


πŸ—οΈ Architecture

The system uses a Producer-Consumer model with BullMQ as the orchestration layer. For heavy tasks, the worker spawns short-lived K8s Job pods.

┌──────────────────┐     ┌─────────────────────────────────────────────┐
│   API Server     │     │           Scheduler Service                 │
│   (Producer)     │     │   ┌─────────────┐    ┌─────────────┐        │
│                  │     │   │ QueueSched. │    │   Worker    │        │
└────────┬─────────┘     │   └──────┬──────┘    └──────┬──────┘        │
         │               └──────────┼──────────────────┼───────────────┘
         │ push job                 │                  │
         ▼                          ▼                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Redis (BullMQ Queue)                        │
│  • Stores job metadata & cron schedules                             │
│  • Manages queue state (waiting → active → completed)               │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Worker picks up job, spawns K8s Job
                                   ▼
                      ┌─────────────────────────┐
                      │     K8s Job Pod         │
                      │  (ephemeral, auto-exit) │
                      └─────────────────────────┘
Roles and responsibilities:

  • Producer (API Server): receives the request, enqueues the job, returns 202 Accepted
  • Scheduler Service: long-running pod with a BullMQ Worker; reads cron schedules from Redis, spawns K8s Jobs
  • Redis: persists job metadata, schedules, and queue state
  • K8s Job Pod: executes the heavy computation, terminates after completion

Job Types

Type A: One-time Jobs

Triggered by user actionsβ€”export reports, sync external data, send batch emails.

// Producer: API endpoint
app.post('/export', async (req, res) => {
  const job = await exportQueue.add('generate-report', {
    userId: req.user.id,
    dateRange: req.body.dateRange,
  });
  
  res.status(202).json({ jobId: job.id });
});

Type B: Recurring Jobs (Cron)

Replace traditional Linux crontab with distributed, HA scheduling.

// Repeatable job: runs daily at midnight
await reportQueue.add(
  'daily-summary',
  { type: 'daily' },
  {
    repeat: { cron: '0 0 * * *' }, // note: newer BullMQ versions use `pattern` instead of `cron`
    jobId: 'daily-summary', // stable id prevents duplicate schedules
  }
);

✅ Advantage over crontab: no single scheduler process becomes a single point of failure (SPOF). If one server dies, another worker picks up the job. (Requires Redis HA for full resilience.)
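To make the `'0 0 * * *'` pattern concrete, here is a toy matcher for the five cron fields, supporting only plain numbers and `*` (an illustration only; BullMQ delegates real pattern parsing to a full cron parser, which also handles ranges, steps, and lists):

```typescript
// Toy matcher for 5-field cron patterns with only numbers and '*':
// minute, hour, day-of-month, month, day-of-week.
function cronMatches(pattern: string, date: Date): boolean {
  const fields = pattern.trim().split(/\s+/);
  const values = [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1, // cron months are 1-12
    date.getDay(),       // 0 = Sunday
  ];
  return fields.every((f, i) => f === '*' || Number(f) === values[i]);
}

// '0 0 * * *' fires at midnight, on any day:
console.log(cronMatches('0 0 * * *', new Date(2024, 0, 15, 0, 0)));  // true
console.log(cronMatches('0 0 * * *', new Date(2024, 0, 15, 12, 0))); // false
```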


Job Status Tracking

Users need visibility into long-running tasks. BullMQ provides built-in job state management and event hooks.

Updating Progress from Worker

// Worker: report progress during execution
import { Worker } from 'bullmq';

const worker = new Worker('export', async (job) => {
  const totalSteps = 100;

  for (let i = 0; i < totalSteps; i++) {
    await processChunk(i);
    await job.updateProgress((i + 1) / totalSteps * 100);
  }

  return { downloadUrl: '/reports/123.csv' };
}, { connection: { host: 'localhost', port: 6379 } }); // a Redis connection is required

Querying Job Status via API

// API: check job status
app.get('/jobs/:id/status', async (req, res) => {
  const job = await exportQueue.getJob(req.params.id);
  
  if (!job) {
    return res.status(404).json({ error: 'Job not found' });
  }
  
  const state = await job.getState(); // waiting, active, completed, failed
  const progress = job.progress;
  const result = job.returnvalue;
  
  res.json({ state, progress, result });
});

Listening to Job Events

// Subscribe to job lifecycle events
import { QueueEvents } from 'bullmq';

const queueEvents = new QueueEvents('export');

queueEvents.on('progress', ({ jobId, data }) => {
  console.log(`Job ${jobId} progress: ${data}%`);
  // Push to frontend via WebSocket/SSE
});

queueEvents.on('completed', ({ jobId, returnvalue }) => {
  console.log(`Job ${jobId} completed:`, returnvalue);
});

queueEvents.on('failed', ({ jobId, failedReason }) => {
  console.error(`Job ${jobId} failed:`, failedReason);
});

Throttling & Downstream Protection

Control concurrency to protect databases and third-party APIs:

const worker = new Worker('export', processJob, {
  concurrency: 10, // max 10 jobs running simultaneously
  limiter: {
    max: 100,      // max 100 jobs
    duration: 60000, // per minute
  },
});

This traffic shaping prevents your worker from DDoS-ing your own database or hitting external API rate limits.

⚠️ Note: this throttles this worker's job throughput; it is not a global API rate limiter.
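The semantics of `limiter: { max, duration }` — at most `max` job starts per `duration` window — can be modeled as a sliding-window check. This is a simplified in-memory sketch, not BullMQ's internal implementation (which keeps this state in Redis so it holds across workers):

```typescript
// Sliding-window model of limiter: { max, duration }:
// a job may start only if fewer than `max` jobs started
// in the last `durationMs` milliseconds.
class RateLimiter {
  private starts: number[] = [];
  constructor(private max: number, private durationMs: number) {}

  tryStart(nowMs: number): boolean {
    // drop start timestamps that fell out of the window
    this.starts = this.starts.filter((t) => nowMs - t < this.durationMs);
    if (this.starts.length >= this.max) return false;
    this.starts.push(nowMs);
    return true;
  }
}

// max 2 jobs per 1000 ms:
const limiter = new RateLimiter(2, 1000);
console.log(limiter.tryStart(0));    // true
console.log(limiter.tryStart(10));   // true
console.log(limiter.tryStart(20));   // false (window is full)
console.log(limiter.tryStart(1100)); // true (window has slid past)
```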

Worker as K8s Orchestrator

For heavy tasks, the worker spawns a dedicated K8s Job:

import { Worker } from 'bullmq';
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const batchApi = kc.makeApiClient(k8s.BatchV1Api);

const worker = new Worker('data-sync', async (job) => {
  const { source } = job.data;

  const k8sJob = {
    metadata: { name: `data-sync-${Date.now()}` },
    spec: {
      template: {
        spec: {
          containers: [{
            name: 'sync-worker',
            image: 'myregistry/data-sync:latest',
            env: [
              { name: 'SOURCE', value: source },
              { name: 'JOB_ID', value: String(job.id) }, // env values must be strings
            ],
          }],
          restartPolicy: 'Never',
        },
      },
    },
  };

  await batchApi.createNamespacedJob('production', k8sJob);
  return { status: 'k8s-job-created' };
}, { connection: REDIS_OPT }); // REDIS_OPT: your Redis connection options

✅ Benefit: BullMQ handles scheduling; K8s handles resource isolation and auto-cleanup.


🔑 Redis's Critical Role

1. Job Persistence

Jobs are stored in Redis. If a worker crashes mid-execution:

  • Job status remains active in Redis
  • On restart, BullMQ auto-retries stalled jobs
  • No job metadata loss beyond the configured persistence window

2. Distributed Locking

With multiple workers competing for jobs, Redis ensures:

  • Atomic job acquisition: only one worker gets each job
  • No duplicate execution: race conditions are eliminated via BRPOPLPUSH / Stream consumer groups

⚠️ Production Caveats

Understanding BullMQ's Redis Usage

ℹ️ BullMQ uses Redis List + Sorted Set, not Redis Stream. It relies on BRPOPLPUSH (or BLMOVE in Redis 6.2+) for reliable job acquisition.

Feature                   List-based (BullMQ)        Stream (XADD/XREADGROUP)
Consumer Groups           ✅ Implemented by BullMQ   ✅ Native
Message Acknowledgment    ✅ Implemented by BullMQ   ✅ Native (XACK)
Retry/Pending Tracking    ✅ Via Sorted Set          ✅ Built-in
Maturity                  ✅ Battle-tested           ✅ Newer (Redis 5.0+)

👉 If building a custom queue from scratch, Redis Stream provides native consumer groups and acknowledgment. However, BullMQ's List-based implementation is production-proven and handles edge cases that raw Stream operations don't cover out of the box.

Redis Persistence Matters

  • RDB only → jobs created in the last few minutes may be lost on crash
  • AOF with everysec → at most 1 second of job loss
  • AOF with always → durability guaranteed, but slower writes

Choose based on your job criticality.
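For reference, the everysec middle ground corresponds to two directives in redis.conf:

```
# redis.conf: enable the append-only file, fsync once per second
appendonly yes
appendfsync everysec
```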

Worker Crash ≠ Job Loss (But Needs Config)

BullMQ has stalled job detection, but you must configure:

  • stalledInterval: how often to check for stalled jobs
  • maxStalledCount: how many times to retry before marking the job as failed

Without this, crashed jobs may hang in active state forever.
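The bookkeeping these two options imply can be sketched as a tiny state machine. This is a simplified model of the behavior, not BullMQ's implementation (which tracks worker locks in Redis and runs the check every stalledInterval):

```typescript
// Model of stalled-job handling: when the periodic check finds a job
// whose worker lock expired, the job is either re-queued for retry or,
// once maxStalledCount is exceeded, marked as failed.
type JobState = 'active' | 'waiting' | 'failed';

function onStalled(
  job: { state: JobState; stalledCount: number },
  maxStalledCount: number,
): void {
  job.stalledCount += 1;
  job.state = job.stalledCount > maxStalledCount ? 'failed' : 'waiting';
}

const job = { state: 'active' as JobState, stalledCount: 0 };
onStalled(job, 1); // first stall: back to waiting for a retry
console.log(job.state); // "waiting"
onStalled(job, 1); // second stall exceeds maxStalledCount = 1
console.log(job.state); // "failed"
```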

⚠️ K8s Job State Mismatch: if the worker exits after spawning a K8s Job but before updating the BullMQ job state, the two systems will be out of sync. Consider implementing reconciliation logic.
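One way to frame that reconciliation is as a pure decision function: compare the queue's view of the job with the K8s Job's actual status and pick a corrective action. The state and action names here are hypothetical, for illustration only:

```typescript
// Hypothetical reconciliation rule: given the queue's view and the
// K8s Job's observed status, decide which corrective action to take.
type QueueState = 'active' | 'completed' | 'failed';
type K8sStatus = 'Running' | 'Succeeded' | 'Failed' | 'NotFound';

function reconcile(queue: QueueState, k8s: K8sStatus): string {
  if (queue === 'active' && k8s === 'Succeeded') return 'mark-completed';
  if (queue === 'active' && k8s === 'Failed') return 'mark-failed';
  if (queue === 'active' && k8s === 'NotFound') return 'requeue'; // spawn was lost
  return 'no-op'; // states already agree, or the pod is still running
}

console.log(reconcile('active', 'Succeeded')); // "mark-completed"
console.log(reconcile('active', 'NotFound'));  // "requeue"
```

A periodic reconciler pod could run this rule over all jobs stuck in the active state.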


πŸ“ Summary

  • Decoupling: the API responds instantly; heavy work happens asynchronously
  • Resilience: Redis persists job state; crashes don't lose work
  • Scalability: add more workers to increase throughput
  • Protection: throttling prevents downstream overload

The Producer-Consumer pattern shifts time and resource pressure from your API layer to dedicated workers, keeping your core service lightweight and responsive.

