[Redis Beyond Caching] Part 6: DevOps State Management with ChatOps

How to coordinate team deployments and staging environment claims using Redis as a lightweight state store, with no database required.

🤝 The Problem: Team Coordination

Deployment Conflicts

Without coordination:

  • Developer A deploys to production
  • Developer B deploys at the same time
  • Result: Partial deployment, rollback chaos, or worse, silent failures

Staging Environment Contention

Most teams have limited staging environments:

  • “Who’s using staging right now?”
  • “Can I push my branch?”
  • “Is anyone still testing?”

Slack/Mattermost messages get lost. Spreadsheets are never updated. Chaos ensues.


🔧 Why Redis (Not a Database)?

| Requirement | SQL | Redis |
|---|---|---|
| Simple key-value state | Overkill | ✅ Perfect fit |
| TTL auto-expiration | Manual cleanup job | ✅ Built-in |
| Fast iteration | Schema migrations | ✅ Schemaless |
| ChatBot integration | ORM complexity | ✅ Direct commands |

For ephemeral coordination state, Redis is the right tool.


💻 Implementation: Deploy Lock

Data Model

// Each service gets its own lock key
const getLockKey = (service: string) => `deploy_lock:${service}`;

interface DeployLock {
  locked_by: string;
  service: string;
  commit: string;
  timestamp: number;
}
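
Because the lock is stored as a JSON string, every read has to turn text back into a DeployLock. The following is a minimal sketch of a defensive parser, assuming the DeployLock shape above; `parseDeployLock` is a hypothetical helper name and not part of the original code.

```typescript
interface DeployLock {
  locked_by: string;
  service: string;
  commit: string;
  timestamp: number;
}

// Hypothetical helper: returns null when the key is missing or the payload
// does not match the DeployLock shape, instead of throwing.
function parseDeployLock(raw: string | null): DeployLock | null {
  if (raw === null) return null;
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // stored value is not valid JSON
  }
  if (typeof data !== 'object' || data === null) return null;
  const d = data as Record<string, unknown>;
  const valid =
    typeof d.locked_by === 'string' &&
    typeof d.service === 'string' &&
    typeof d.commit === 'string' &&
    typeof d.timestamp === 'number';
  return valid ? (d as unknown as DeployLock) : null;
}
```

Callers can then treat a null result as "no usable lock" rather than relying on a type assertion over whatever happens to be stored under the key.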

Lock Acquisition (Atomic with SET NX)

async function lockProduction(userId: string, service: string, commit: string) {
  const key = getLockKey(service);
  const value = JSON.stringify({
    locked_by: userId,
    service,
    commit,
    timestamp: Date.now(),
  });
  
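  // SET with NX ("set only if the key does not exist") and EX (TTL) runs as one atomic command
  const acquired = await redis.set(key, value, 'EX', 3600, 'NX'); // 1 hour TTL
  
  if (!acquired) {
    const current = await redis.get(key);
    if (!current) {
      // The previous lock expired or was released between SET and GET
      throw new Error(`Production ${service} lock changed hands, please retry`);
    }
    const lock = JSON.parse(current) as DeployLock;
    throw new Error(`Production ${service} is locked by ${lock.locked_by} since ${new Date(lock.timestamp).toISOString()}`);
  }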
  
  return JSON.parse(value) as DeployLock;
}

Lock Release

async function unlockProduction(userId: string, service: string) {
  const key = getLockKey(service);
  const current = await redis.get(key);
  
  if (!current) {
    throw new Error(`Production ${service} is not locked`);
  }
  
  const lock = JSON.parse(current) as DeployLock;
  
  // Only the lock holder can unlock
  if (lock.locked_by !== userId) {
    throw new Error(`Only ${lock.locked_by} can unlock ${service}`);
  }
  
  await redis.del(key);
}
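
One caveat with the release above: the GET and the DEL are two separate commands, so the lock could expire and be re-acquired by someone else in between, and the DEL would then remove their lock. A minimal sketch of closing that window with a Lua script follows. It assumes a client whose `eval(script, numKeys, ...args)` method has the shape declared in `EvalClient` (ioredis offers a method of this general form); `releaseLockAtomic`, `EvalClient`, and the numeric return codes are names and conventions invented for this example.

```typescript
// Minimal client surface this sketch relies on (an assumption, not a real library type).
interface EvalClient {
  eval(script: string, numKeys: number, ...args: string[]): Promise<unknown>;
}

// Runs inside Redis as one atomic step: read the lock, compare the owner, delete.
// Return codes are this sketch's own convention: 1 released, 0 not owner, -1 not locked.
const RELEASE_SCRIPT = `
local raw = redis.call('GET', KEYS[1])
if not raw then return -1 end
local lock = cjson.decode(raw)
if lock.locked_by ~= ARGV[1] then return 0 end
redis.call('DEL', KEYS[1])
return 1
`;

type ReleaseResult = 'released' | 'not_locked' | 'not_owner';

async function releaseLockAtomic(
  client: EvalClient,
  service: string,
  userId: string,
): Promise<ReleaseResult> {
  const key = `deploy_lock:${service}`; // same key format as getLockKey
  const code = await client.eval(RELEASE_SCRIPT, 1, key, userId);
  if (code === 1) return 'released';
  if (code === -1) return 'not_locked';
  return 'not_owner';
}
```

The caller can map `'not_locked'` and `'not_owner'` to the same error messages that unlockProduction throws.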

💻 Implementation: Staging Environment Claim

Data Model (Using Redis Hash)

// Redis Hash key - each repo is a field, avoiding JSON overwrite race condition
const STAGING_HKEY = 'staging_status';

interface StagingClaim {
  claimed_by: string;
  branch: string;
  claimed_at: string;
}

Claim/Release Operations

async function claimStaging(repo: string, userId: string, branch: string) {
  // Check current claim (atomic read of single field)
  const current = await redis.hget(STAGING_HKEY, repo);
  
  if (current) {
    const claim = JSON.parse(current) as StagingClaim;
    if (claim.claimed_by && claim.claimed_by !== userId) {
      throw new Error(`Staging ${repo} is claimed by ${claim.claimed_by}`);
    }
  }
  
  // Claim the environment (atomic write to single field)
  const newClaim: StagingClaim = {
    claimed_by: userId,
    branch,
    claimed_at: new Date().toISOString(),
  };
  
  await redis.hset(STAGING_HKEY, repo, JSON.stringify(newClaim));
  return newClaim;
}

async function releaseStaging(repo: string, userId: string) {
  const current = await redis.hget(STAGING_HKEY, repo);
  
  if (!current) {
    throw new Error(`Staging ${repo} has no claim`);
  }
  
  const claim = JSON.parse(current) as StagingClaim;
  
  if (claim.claimed_by !== userId) {
    throw new Error(`Only ${claim.claimed_by} can release ${repo}`);
  }
  
  await redis.hdel(STAGING_HKEY, repo);
}

async function listStaging(): Promise<Record<string, StagingClaim | null>> {
  const raw = await redis.hgetall(STAGING_HKEY);
  const result: Record<string, StagingClaim | null> = {};
  
  for (const [repo, value] of Object.entries(raw)) {
    result[repo] = value ? JSON.parse(value) : null;
  }
  
  return result;
}

✅ Why Hash? Each HSET only updates one field, so Alice claiming backend won’t overwrite Bob’s frontend claim, even if they run simultaneously.


💻 Implementation: Job Status Tracking

Using Redis Hash for Structured Data

const JOB_STATUS_HKEY = 'job_status';

// Track a K8s job
async function updateJobStatus(jobName: string, status: object) {
  await redis.hset(JOB_STATUS_HKEY, jobName, JSON.stringify(status));
}

// Get all jobs
async function getAllJobs(): Promise<Record<string, object>> {
  const raw = await redis.hgetall(JOB_STATUS_HKEY);
  const result: Record<string, object> = {};
  
  for (const [key, value] of Object.entries(raw)) {
    result[key] = JSON.parse(value);
  }
  
  return result;
}

// Remove completed job
async function removeJob(jobName: string) {
  await redis.hdel(JOB_STATUS_HKEY, jobName);
}
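
Unlike the deploy lock, fields written with a plain HSET carry no TTL, so entries stay in the hash until removeJob is called. A rough sketch of finding stale entries in the HGETALL result is shown below. The `JobStatus` shape (in particular the `updated_at` field) and the `findStaleJobs` helper are assumptions made for illustration; the original stores an arbitrary object.

```typescript
// Assumed status shape for this sketch only.
interface JobStatus {
  phase: string;      // e.g. 'Running', 'Succeeded', 'Failed'
  updated_at: number; // epoch milliseconds
}

// Returns the names of jobs whose last update is older than maxAgeMs.
// Entries without a numeric updated_at are treated as stale.
function findStaleJobs(
  raw: Record<string, string>,
  nowMs: number,
  maxAgeMs: number,
): string[] {
  const stale: string[] = [];
  for (const [jobName, value] of Object.entries(raw)) {
    const job = JSON.parse(value) as Partial<JobStatus> | null;
    const updatedAt = job && typeof job.updated_at === 'number' ? job.updated_at : 0;
    if (nowMs - updatedAt > maxAgeMs) {
      stale.push(jobName);
    }
  }
  return stale;
}
```

A periodic task could pass the output of `redis.hgetall(JOB_STATUS_HKEY)` through this function and call removeJob for each returned name.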

🤖 ChatOps Integration

Mattermost/Slack Command Flow

User: /deploy lock backend
                   ↓
┌─────────────────────────────────────┐
│           ChatBot Server            │
│                                     │
│  1. Parse command                   │
│  2. Read/Write Redis                │
│  3. Format response                 │
└──────────────────┬──────────────────┘
                   ↓
             Redis State
                   ↓
User: ✅ Production locked by @alice for backend (commit abc123)

Express Handler Example

app.post('/deploy', async (req, res) => {
  const { text = '', user_id } = req.body;
  const [action, service] = String(text).trim().split(/\s+/);

  // Every subcommand operates on one service's lock
  if (!service) {
    return res.json({ text: 'Usage: /deploy <lock|unlock|status> <service>' });
  }

  try {
    switch (action) {
      case 'lock': {
        // 'latest' is a placeholder; pass the real commit if the command provides one
        await lockProduction(user_id, service, 'latest');
        return res.json({
          response_type: 'in_channel',
          text: `🔒 Production locked by <@${user_id}> for ${service}`,
        });
      }

      case 'unlock': {
        await unlockProduction(user_id, service);
        return res.json({
          response_type: 'in_channel',
          text: `🔓 Production ${service} unlocked by <@${user_id}>`,
        });
      }

      case 'status': {
        // Read the same per-service key that lockProduction writes
        const current = await redis.get(getLockKey(service));
        const info = current ? (JSON.parse(current) as DeployLock) : null;
        return res.json({
          response_type: 'in_channel',
          text: `📊 Production ${service}: ${info ? `Locked by ${info.locked_by}` : 'Available'}`,
        });
      }

      default:
        return res.json({ text: 'Unknown command. Try: lock, unlock, status' });
    }
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return res.json({ text: `❌ ${message}` });
  }
});
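
The command text arrives straight from chat, so it can help to validate it before it reaches Redis. Below is a minimal sketch of a standalone parser for the `lock`, `unlock`, and `status` subcommands used by the handler. The `parseDeployCommand` name and the service-name pattern are assumptions for illustration, not rules from the original handler.

```typescript
type DeployAction = 'lock' | 'unlock' | 'status';

interface DeployCommand {
  action: DeployAction;
  service: string;
}

const DEPLOY_ACTIONS: readonly DeployAction[] = ['lock', 'unlock', 'status'];

// Returns null for anything that is not exactly "<action> <service>".
function parseDeployCommand(text: string): DeployCommand | null {
  const parts = text.trim().split(/\s+/);
  const [action, service] = parts;
  if (parts.length !== 2 || action === undefined || service === undefined) {
    return null;
  }
  const matched = DEPLOY_ACTIONS.find((a) => a === action);
  if (matched === undefined) return null;
  // Assumed naming rule: letters, digits, '-' and '_' only, so user input
  // cannot inject ':' or spaces into the Redis key.
  if (!/^[a-z0-9][a-z0-9_-]*$/i.test(service)) return null;
  return { action: matched, service };
}
```

For example, `parseDeployCommand('lock backend')` yields `{ action: 'lock', service: 'backend' }`, while `parseDeployCommand('lock')` and `parseDeployCommand('lock bad:key')` both yield `null`.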

⚠️ Production Caveats

Race Condition Awareness

The claimStaging function still has an HGET-then-HSET window when two users claim the same repo at once: both can read an empty field, both writes succeed, and the later write silently wins. For truly atomic claim acquisition, use a Lua script or accept that concurrent claims to the same repo may have a small race window.
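
As an alternative to a Lua script for this specific case, the Redis HSETNX command writes a hash field only if it does not already exist, which makes the first claim atomic. The sketch below is a variant of claimStaging under that approach, not the original implementation. `HashClient` is a hypothetical minimal interface (ioredis provides `hsetnx` and `hget` methods of this general shape), and `claimStagingAtomic` is an invented name.

```typescript
interface StagingClaim {
  claimed_by: string;
  branch: string;
  claimed_at: string;
}

// Minimal client surface this sketch relies on (an assumption, not a real library type).
interface HashClient {
  hsetnx(key: string, field: string, value: string): Promise<number>;
  hget(key: string, field: string): Promise<string | null>;
}

const STAGING_HKEY = 'staging_status';

async function claimStagingAtomic(
  client: HashClient,
  repo: string,
  userId: string,
  branch: string,
): Promise<StagingClaim> {
  const claim: StagingClaim = {
    claimed_by: userId,
    branch,
    claimed_at: new Date().toISOString(),
  };

  // HSETNX writes the field only when it is absent: 1 = written, 0 = already set
  const written = await client.hsetnx(STAGING_HKEY, repo, JSON.stringify(claim));
  if (written === 1) return claim;

  // Someone holds the claim; look up who (it may have been released meanwhile)
  const current = await client.hget(STAGING_HKEY, repo);
  if (current === null) {
    throw new Error(`Staging ${repo} changed hands, please retry`);
  }
  const existing = JSON.parse(current) as StagingClaim;
  if (existing.claimed_by === userId) return existing; // already ours, keep it
  throw new Error(`Staging ${repo} is claimed by ${existing.claimed_by}`);
}
```

One behavioral difference from claimStaging: when the same user claims again, this version returns the existing claim unchanged instead of overwriting the branch and timestamp.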

For deploy locks, we use SET with the NX option, which Redis executes as a single atomic command, so two concurrent lock attempts cannot both succeed.

TTL for Auto-release

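Set a TTL so forgotten locks cannot block deploys forever. Keep NX in the same command so that setting the TTL never overwrites a lock someone else holds:

// Lock expires after 1 hour; NX leaves an existing lock intact
await redis.set(key, value, 'EX', 3600, 'NX');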

If someone forgets to unlock, the lock auto-releases.
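
It can also be useful to show in chat how long a lock has left. The Redis TTL command returns the remaining seconds for a key, -1 if the key has no expiry, and -2 if the key does not exist. A small sketch of turning that number into a status message follows; `describeLockTtl` is a hypothetical helper name.

```typescript
// Interprets the integer returned by the Redis TTL command for a lock key.
function describeLockTtl(ttlSeconds: number): string {
  if (ttlSeconds === -2) return 'not locked';            // key does not exist
  if (ttlSeconds === -1) return 'locked with no expiry'; // key exists without a TTL
  const minutes = Math.ceil(ttlSeconds / 60);
  return `auto-releases in about ${minutes} min`;
}
```

With an ioredis-style client this could be called as `describeLockTtl(await redis.ttl(getLockKey(service)))` inside the status command.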

No ACID Guarantees

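Separate Redis commands do not run as a single transaction by default, so an update that touches several keys can be observed half-done. If you need to update multiple pieces of state atomically, the options are:

  • Lua scripts
  • A single key holding one JSON structure
  • Accepting eventual consistency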

πŸ“ Summary

  • Deploy Lock: Prevent concurrent deployments with an atomic SET NX plus a TTL
  • Staging Claim: Track who’s using which environment
  • Job Status: Hash structure for K8s job tracking
  • ChatOps: Direct integration with Slack/Mattermost

Redis serves as a lightweight coordination layer for DevOps workflows: no database schema, no ORM, just fast key-value operations with optional TTL.

