Monday morning. Everything is down. SSH into the VM: disk 100% full. PM2 can’t restart. Logs can’t write. Even ls is slow.
The culprit: 18 Node.js core dump files at 2.5 GB each. Generated silently at 2 AM during OOM crashes.
What Happened
Our scheduled jobs run at 2-3 AM (imports, dedup, match recalculation). Occasionally, a heavy batch job exceeds the 2 GB RAM limit and dies with an out-of-memory crash (when the JS heap is exhausted, V8 aborts the process). On that abort, Linux can write a core dump - a snapshot of the entire process's memory - to disk.
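You can check in advance whether your machine will write a dump on the next crash - two quick reads, no root needed:

```shell
# Will this machine write a core dump on the next crash?
ulimit -c                          # "unlimited" (or any non-zero value) = dumps allowed
cat /proc/sys/kernel/core_pattern  # filename pattern (or handler) the kernel will use
```

If `ulimit -c` is non-zero and the pattern points at a writable path, every crash costs you disk.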
Each dump: ~2.5 GB. Our disk: 30 GB.
After a week of nightly OOM crashes: 18 dumps, 11.5 GB of real disk gone (core files are sparse, so each occupies less on disk than its 2.5 GB apparent size). Combined with normal logs and application data, the disk hit 100%.
The Fix (30 Seconds)
# Disable core dumps permanently
echo 'kernel.core_pattern=/dev/null' | sudo tee /etc/sysctl.d/50-no-coredumps.conf
sudo sysctl -p /etc/sysctl.d/50-no-coredumps.conf
# Set ulimit for all users
echo 'ulimit -c 0' | sudo tee /etc/profile.d/no-coredumps.sh
# Clean up existing dumps
rm -f ~/app/core.*
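One gap worth knowing about: `/etc/profile.d` only covers login shells, so a daemon started by systemd (including PM2 launched as a systemd service) never sources it. A sketch of the per-service equivalent - the unit name `myapp` here is a placeholder for your own:

```shell
# Create a drop-in at /etc/systemd/system/myapp.service.d/no-coredumps.conf
# containing:
#
#   [Service]
#   LimitCORE=0
#
# then reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart myapp
```

With `kernel.core_pattern=/dev/null` already set, this belt-and-suspenders step mostly matters if someone later reverts the sysctl.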
Why You Should Do This Too
If you run Node.js on a small VM (< 30 GB disk) with any process that could OOM:
- Core dumps (or a collector like systemd-coredump or apport) are enabled by default on many Linux distros
- A single dump is roughly the size of your process's entire memory
- They accumulate silently until your disk is full
- You’ll never debug a 2.5 GB binary dump anyway
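Disabling dumps treats the symptom; you can also soften the crashes themselves. A sketch, assuming the app runs under PM2 (the `app.js` entry point and the 1500M/1536 thresholds are placeholders, not our actual config): restart the process before it grows past the limit, and cap V8's heap so Node fails fast with a clean error instead of ballooning toward the system OOM killer.

```shell
# Restart the app when RSS passes 1500 MB, and cap V8's old-space heap
# so allocation failures surface as a clean JS heap error:
pm2 start app.js --name myapp \
  --max-memory-restart 1500M \
  --node-args="--max-old-space-size=1536"
```

Pick the restart threshold below your VM's real headroom, not at it.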
Disk Cleanup Checklist
If your VM disk is full, check these in order:
# 1. What's eating space?
du -h --max-depth=2 /home /var /tmp 2>/dev/null | sort -rh | head -20
# 2. Core dumps
find / -name "core.*" -size +100M 2>/dev/null
# 3. Snap cache (Ubuntu)
sudo rm -rf /var/lib/snapd/cache/*
# 4. Journal logs
sudo journalctl --vacuum-size=100M
# 5. PM2 logs
pm2 flush
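The real failure here was silence: nothing warned us as the disk filled over a week. A minimal cron-able guard - the 90% threshold is an arbitrary choice, and the `echo` is a stand-in for your own notifier (mail, Slack webhook, etc.):

```shell
#!/bin/sh
# Warn when root filesystem usage crosses a threshold.
THRESHOLD=90

check_usage() {
  # $1 = usage percentage as a bare integer, e.g. 42
  if [ "$1" -ge "$THRESHOLD" ]; then
    echo "WARNING: disk at $1%"
  fi
}

# Current usage of / as a bare number ("42%" -> "42"):
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
check_usage "$usage"
```

Drop it in /etc/cron.hourly/ and you find out at 90%, not at 100% on a Monday morning.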
This happened at MisuJob, where our background jobs process 10,000+ items nightly. Since disabling core dumps: zero disk emergencies.
What’s the weirdest production incident you’ve debugged? Comments below.

