PM2 in Production: The Lessons Nobody Tells You

PM2 is the de facto process manager for Node.js. We’ve run it in production for 6+ months, managing a backend with 1M+ job listings, 7 queue workers, and scheduled jobs. Here are five gotchas to watch for.

Gotcha #1: PM2 Caches Environment Variables

# You change .env
echo "NEW_API_KEY=abc123" >> .env

# You restart PM2
pm2 restart all

# NEW_API_KEY is NOT loaded. PM2 reuses the environment it stored when the process was first started.

Fix: Pass --update-env so PM2 refreshes the stored environment on restart:

pm2 restart all --update-env

Or better: don’t use .env in production. Use a secrets manager.
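
Whichever way the variables arrive, a stale environment is easier to catch if the app refuses to boot without the values it expects. Here is a minimal sketch of that idea. The helper names findMissingEnv and assertEnv are hypothetical (not part of PM2 or Node.js), and NEW_API_KEY is just the variable from the example above.

```javascript
// Hypothetical helper: list the required variables that are unset or empty.
function findMissingEnv(requiredNames, env = process.env) {
  return requiredNames.filter((name) => {
    const value = env[name];
    return value === undefined || value === '';
  });
}

// Hypothetical helper: throw at startup instead of failing later at first use.
function assertEnv(requiredNames, env = process.env) {
  const missing = findMissingEnv(requiredNames, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}

// Example usage near the top of the entry file:
// assertEnv(['NEW_API_KEY']);
```

With a check like this, a restart that did not pick up the new variable shows up as an immediate crash in pm2 logs rather than as a confusing runtime error later.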

Gotcha #2: Cluster Mode + Scheduled Jobs = Duplicate Execution

In cluster mode, PM2 spawns multiple processes. If your app has a cron job:

cron.schedule('0 3 * * *', () => importJobs());

It runs in every process. Four cluster instances mean four simultaneous imports.

Fix: PM2 gives each instance an ID in the NODE_APP_INSTANCE environment variable. Use it to schedule the cron job in only one process:

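// NODE_APP_INSTANCE is set by PM2; default to '0' when running outside PM2.
const instanceId = parseInt(process.env.NODE_APP_INSTANCE || '0', 10);
if (instanceId === 0) {
  // Only instance 0 registers the schedule, so the import runs once.
  cron.schedule('0 3 * * *', () => importJobs());
}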
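
To see the effect without a scheduler installed, the check can be pulled into a small function and run against four simulated instances. This is only a sketch: isPrimaryInstance is a hypothetical helper name, and the environment objects are stand-ins for what each cluster process would see.

```javascript
// Hypothetical helper: true only for the process PM2 labels as instance 0.
// Falls back to '0' when NODE_APP_INSTANCE is unset (e.g. running without PM2).
function isPrimaryInstance(env = process.env) {
  const instanceId = Number.parseInt(env.NODE_APP_INSTANCE || '0', 10);
  return instanceId === 0;
}

// Simulate a 4-instance cluster and collect the instances that would schedule the job.
const schedulingInstances = ['0', '1', '2', '3'].filter((id) =>
  isPrimaryInstance({ NODE_APP_INSTANCE: id })
);

// Only instance '0' passes the check.
console.log(schedulingInstances);
```

Note that this only deduplicates within one PM2 daemon. If the app runs on several hosts, each host has its own instance 0.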

Gotcha #3: Graceful Shutdown

When PM2 restarts or stops a process, it sends SIGINT, waits for kill_timeout (1600 ms by default), and then sends SIGKILL. If your app doesn’t handle SIGINT, in-progress jobs get killed mid-execution.

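process.on('SIGINT', async () => {
  console.log('Graceful shutdown...');
  try {
    await queue.close(); // Finish current job
    await pool.end();    // Close DB connections
    process.exit(0);
  } catch (err) {
    // Exit non-zero so a failed cleanup is visible in the logs
    console.error('Shutdown failed:', err);
    process.exit(1);
  }
});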

Without this, queue jobs get marked as “failed” with “Auto-cancelled” errors, and your health monitor reports false failures.
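
One thing the handler above cannot do is guarantee it finishes before PM2 gives up and sends SIGKILL. A rough sketch of one way to handle that is to race the cleanup against your own deadline, set a little below kill_timeout. The name shutdownWithDeadline and the 8000 ms value are assumptions for illustration; queue and pool are the objects from the handler above.

```javascript
// Hypothetical helper: run cleanup steps in order, but stop waiting after timeoutMs.
// Resolves to 'clean' if every step finished, or 'timeout' if the deadline hit first.
async function shutdownWithDeadline(steps, timeoutMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  const work = (async () => {
    for (const step of steps) {
      await step(); // Steps run one after another, in the order given
    }
    return 'clean';
  })();

  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // Do not leave the deadline timer pending
  }
}

// Example usage (assumes kill_timeout is 10000, so 8000 leaves some headroom):
// process.on('SIGINT', async () => {
//   const result = await shutdownWithDeadline(
//     [() => queue.close(), () => pool.end()],
//     8000
//   );
//   process.exit(result === 'clean' ? 0 : 1);
// });
```

Exiting with a non-zero code on timeout makes slow shutdowns visible, instead of letting SIGKILL end the process silently.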

Gotcha #4: Memory Leaks Are Silent

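PM2 doesn’t restart on memory leaks by default. Your process slowly eats RAM until the system runs out of memory and the process is killed. Fix: set max_memory_restart so PM2 restarts the process once it crosses a threshold.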

// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'backend',
    script: 'dist/index.js',
    max_memory_restart: '1500M', // Restart if memory use exceeds ~1.5GB
    kill_timeout: 10000, // Wait 10s after SIGINT before SIGKILL (graceful shutdown)
  }]
};
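
A restart threshold hides the leak rather than fixing it, so it helps to log memory use and watch the trend between restarts. Below is a small sketch using Node's built-in process.memoryUsage(). The helper names toMB and memorySnapshot and the one-minute interval are assumptions, and MB here means 1024 * 1024 bytes.

```javascript
// Hypothetical helper: convert bytes to whole megabytes (1 MB = 1024 * 1024 bytes).
function toMB(bytes) {
  return Math.round(bytes / (1024 * 1024));
}

// Hypothetical helper: summarize the fields that matter most when hunting a leak.
function memorySnapshot(usage = process.memoryUsage()) {
  return {
    rssMB: toMB(usage.rss),             // Total resident memory of the process
    heapUsedMB: toMB(usage.heapUsed),   // Memory used by live JS objects
    heapTotalMB: toMB(usage.heapTotal), // Heap currently reserved by V8
  };
}

// Example usage: log once a minute. unref() keeps this timer from
// holding the process open during shutdown.
// setInterval(() => console.log('memory', memorySnapshot()), 60000).unref();
```

If heapUsedMB climbs steadily while traffic is flat, that points to a leak in your own code rather than normal load.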

Gotcha #5: Log Rotation

PM2 logs grow forever by default. On a 30GB disk, this matters.

pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 50M
pm2 set pm2-logrotate:retain 5

Or run pm2 flush to empty the log files when disk space gets low.

Our Production Config

module.exports = {
  apps: [{
    name: 'misujob-backend',
    script: 'dist/index.js',
    instances: 1, // Single instance (we handle concurrency via queues)
    max_memory_restart: '1500M',
    kill_timeout: 10000,
    env: {
      NODE_ENV: 'production',
    }
  }]
};

We use a single instance (not cluster) because our concurrency comes from Bull queue workers, not Express request handling.

This setup runs MisuJob, with months of uptime and zero data loss.

What process manager do you use? PM2, systemd, Docker? Share your setup.