PM2 is the de facto process manager for Node.js. We’ve run it in production for 6+ months managing a backend with 1M+ job listings, 7 queue workers, and scheduled jobs. Here are the gotchas.
Gotcha #1: PM2 Caches Environment Variables
# You change .env
echo "NEW_API_KEY=abc123" >> .env
# You restart PM2
pm2 restart all
# NEW_API_KEY is NOT loaded. PM2 still uses the old env.
Fix: Pass --update-env so PM2 refreshes the process environment from the shell you run the command in:
pm2 restart all --update-env
Or better: don’t use .env in production. Use a secrets manager.
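Whichever way the variables reach the process, a stale environment is easier to catch if the app refuses to start without the values it needs. Here is a minimal sketch of that idea; `requireEnv` is a hypothetical helper name, not part of PM2 or any library:

```javascript
// Throws at startup if any required variable is missing or empty,
// so a stale PM2 environment fails loudly instead of silently.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // Return only the requested variables as a plain object.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Hypothetical usage at the top of the entry point:
// const { NEW_API_KEY } = requireEnv(['NEW_API_KEY']);
```

With a check like this, a restart that did not pick up NEW_API_KEY shows up as an immediate crash in the PM2 logs rather than as failed API calls later.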
Gotcha #2: Cluster Mode + Scheduled Jobs = Duplicate Execution
In cluster mode, PM2 spawns multiple processes. If your app has a cron job:
cron.schedule('0 3 * * *', () => importJobs());
It runs in every process. 4 cluster instances = 4 simultaneous imports.
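Fix: Use the instance ID that PM2 sets in NODE_APP_INSTANCE to schedule the cron job in only one process:
// NODE_APP_INSTANCE is unset outside PM2, so fall back to '0'
const instanceId = parseInt(process.env.NODE_APP_INSTANCE || '0', 10);
if (instanceId === 0) {
  cron.schedule('0 3 * * *', () => importJobs());
}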
Gotcha #3: Graceful Shutdown
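When PM2 restarts a process, it sends SIGINT, then force-kills it with SIGKILL if it is still running after kill_timeout (1600 ms by default). If your app doesn’t handle SIGINT, in-progress jobs get killed mid-execution.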
process.on('SIGINT', async () => {
  console.log('Graceful shutdown...');
  await queue.close(); // Finish current job
  await pool.end(); // Close DB connections
  process.exit(0);
});
Without this, queue jobs get marked as “failed” with “Auto-cancelled” errors, and your health monitor reports false failures.
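The handler above can still hang if a cleanup step never resolves, and it runs again if a second signal arrives. Here is a hedged sketch of a slightly more defensive version: `createShutdownHandler` is a hypothetical helper, the 8 second default is an assumption chosen to stay under a 10 second kill_timeout, and `queue` and `pool` stand in for whatever resources your app holds.

```javascript
// Builds a signal handler that runs cleanup steps in order, at most once,
// and forces an exit if cleanup outlives the deadline.
function createShutdownHandler({
  steps,
  timeoutMs = 8000, // assumption: keep this below PM2's kill_timeout
  exit = (code) => process.exit(code),
}) {
  let shuttingDown = false;

  return async function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`${signal} received, shutting down...`);

    // Safety net: exit non-zero if cleanup hangs. unref() keeps this timer
    // from holding the event loop open on its own.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    let code = 0;
    for (const step of steps) {
      try {
        await step();
      } catch (err) {
        console.error('Shutdown step failed:', err);
        code = 1; // keep going, but report the failure in the exit code
      }
    }

    clearTimeout(timer);
    exit(code);
  };
}

// Hypothetical wiring, with `queue` and `pool` as in the handler above:
// const shutdown = createShutdownHandler({
//   steps: [() => queue.close(), () => pool.end()],
// });
// process.on('SIGINT', shutdown);
// process.on('SIGTERM', shutdown);
```

Running the steps sequentially keeps the original order (drain the queue first, then close the database), and injecting `exit` makes the handler testable without killing the test process.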
Gotcha #4: Memory Leaks Are Silent
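PM2 doesn’t watch memory by default, so a leaking process slowly eats RAM until the kernel’s OOM killer steps in. Fix: set max_memory_restart so PM2 restarts the process once it crosses a threshold: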
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'backend',
    script: 'dist/index.js',
    max_memory_restart: '1500M', // Restart if > 1.5GB
    kill_timeout: 10000, // 10s for graceful shutdown
  }]
};
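A restart limit hides the leak rather than surfacing it, so it can help to log memory usage from inside the app as well. The following is a rough sketch using Node's built-in `process.memoryUsage()`; the function names, the 1200 MB warning level, and the one-minute interval are all assumptions to adapt to your own limits.

```javascript
const WARN_AT_MB = 1200; // assumed warning level, below the 1500M restart limit

function toMb(bytes) {
  return Math.round(bytes / 1024 / 1024);
}

// Summarizes a memory snapshot and flags it once RSS reaches the warning level.
function checkMemory(usage = process.memoryUsage(), warnAtMb = WARN_AT_MB) {
  const rssMb = toMb(usage.rss);
  const heapUsedMb = toMb(usage.heapUsed);
  return { rssMb, heapUsedMb, level: rssMb >= warnAtMb ? 'warn' : 'ok' };
}

// Logs a snapshot on an interval; unref() keeps the timer from blocking exit.
function startMemoryLogging(intervalMs = 60000) {
  const timer = setInterval(() => {
    const report = checkMemory();
    const log = report.level === 'warn' ? console.warn : console.log;
    log(`memory rss=${report.rssMb}MB heapUsed=${report.heapUsedMb}MB`);
  }, intervalMs);
  timer.unref();
  return timer;
}
```

A steadily climbing rss line in the logs before each PM2 restart is a strong hint that the restarts are covering for a leak.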
Gotcha #5: Log Rotation
PM2 logs grow forever by default. On a 30GB disk, this matters.
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 50M
pm2 set pm2-logrotate:retain 5
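Or run pm2 flush to empty the log files when disk space gets low. That discards the logs, so treat it as a stopgap rather than a rotation policy.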
Our Production Config
module.exports = {
  apps: [{
    name: 'misujob-backend',
    script: 'dist/index.js',
    instances: 1, // Single instance (we handle concurrency via queues)
    max_memory_restart: '1500M',
    kill_timeout: 10000,
    env: {
      NODE_ENV: 'production',
    }
  }]
};
We use a single instance (not cluster) because our concurrency comes from Bull queue workers, not Express request handling.
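To illustrate what queue-level concurrency can look like inside one process, here is a minimal sketch that assumes Bull's `queue.process(concurrency, handler)` form. `registerWorker`, the default of 2, and the commented-out queue name are hypothetical and not taken from the MisuJob codebase.

```javascript
// `queue` is expected to be a Bull queue instance created elsewhere.
// `handler` receives the job payload and returns a promise.
function registerWorker(queue, handler, concurrency = 2) {
  // Bull runs up to `concurrency` jobs from this queue at once in this process.
  queue.process(concurrency, async (job) => handler(job.data));
}

// Hypothetical usage:
// const Queue = require('bull');
// const importQueue = new Queue('imports', process.env.REDIS_URL);
// registerWorker(importQueue, (data) => importJobs(data), 2);
```

Because parallelism is set per queue, a single PM2 instance can still work through several jobs at a time without the duplicate-cron problem that cluster mode introduces.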
This PM2 configuration runs MisuJob in production: months of uptime, zero data loss.
What process manager do you use? PM2, systemd, Docker? Share your setup.

