Besides the file-based cache that caused high disk I/O, adjusting the PM2 configuration appears to have been a key factor in resolving the socket hang-up issue.

Starting on May 22, SwimStandards.com began experiencing major slowness — high server load, “socket hang up” errors in the Next.js logs, and sluggish API responses. The issue initially seemed to coincide with my integration of new ad scripts (specifically a CMP script added to the header), which I suspected might be blocking Next.js SSR and delaying downstream API calls. But even after removing the script, the issue returned — confirming it wasn’t the root cause.

That kicked off a full-scale meltdown diagnosis and optimization effort. I:

Tuned Axios timeout settings (from 10s to 30s),
Blocked abusive bots and IPs,
Investigated MongoDB slow queries and reviewed indexes,
Limited $skip usage in paginated queries,
Hid expensive features like team records,
Tweaked Apache settings (MaxConnectionsPerChild=1000, KeepAlive, etc.),
And tested persistent socket settings (keep-alive, agent pooling).

But the actual root cause?

I had enabled meet result caching on May 12. It wrote .json files to the .next/.cache folder, each produced by an aggregation-heavy request. The files quickly piled up (over 15,000), and because both bots and users accessed them constantly, the intense disk I/O (up to 5 MB/s) crippled performance. Even when I trimmed the cache down to 1,000 files, the slowness persisted. The fix? I completely disabled the meet result cache, and everything immediately stabilized.
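A quick count and size check can show whether a file cache like this is growing out of hand. The sketch below is an illustration rather than part of the original troubleshooting: the function names cache_file_count and cache_size_kb are hypothetical, and the default path simply mirrors the folder named above.

```shell
# Count the .json files under a cache directory.
# The default path is an assumption; pass your real cache directory as $1.
cache_file_count() {
  find "${1:-.next/.cache}" -type f -name '*.json' | wc -l | tr -d ' '
}

# Report the total size of the cache directory in kilobytes.
cache_size_kb() {
  du -sk "${1:-.next/.cache}" | awk '{print $1}'
}
```

Running cache_file_count /path/to/app/.next/.cache every few minutes (for example under watch) makes runaway growth visible before it shows up as disk I/O.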

Along the way, I also fixed a long-standing bug in the FeathersJS setModel logic and improved database hygiene. I now actively monitor MongoDB slow query logs, netstat, and system resource usage to stay ahead of future regressions.

Below are all the shell commands and diagnostic tools I used throughout this painful but productive journey.

✅ Server Load & Traffic Debugging Checklist

1. 🔍 Check for suspicious or bot activity

Look for bots targeting /.env or other known paths

sudo grep -F '/.env' /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head

Check top IPs accessing any vhost

sudo awk '{print $2}' /var/log/apache2/other_vhosts_access.log | sort | uniq -c | sort -nr | head -10

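Check top IPs in the last 20 minutes (prints nothing if no request was logged during the exact minute 20 minutes ago, because the range only starts at a matching line)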

sudo awk -v d="$(date -d '20 minutes ago' '+%d/%b/%Y:%H:%M')" '$0 ~ d,0' /var/log/apache2/other_vhosts_access.log | awk '{print $2}' | sort | uniq -c | sort -nr | head -20
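The three pipelines above end in the same counting step, so it can be factored into a small helper. This is a minimal sketch, assuming whitespace-separated log lines; top_ips is a hypothetical name, its first argument is the field holding the client IP (1 for access.log, 2 for other_vhosts_access.log, as in the commands above), and its second is how many rows to show.

```shell
# Read log lines on stdin and print "count ip" pairs, busiest first.
# $1 = field number that holds the client IP (default 1)
# $2 = maximum number of rows to print (default 10)
top_ips() {
  awk -v f="${1:-1}" '{ count[$f]++ } END { for (ip in count) print count[ip], ip }' \
    | sort -k1,1nr -k2,2 \
    | head -n "${2:-10}"
}
```

For example: sudo cat /var/log/apache2/other_vhosts_access.log | top_ips 2 10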

2. 🚫 Block abusive IPs using iptables

List all existing rules with line numbers

sudo iptables -L INPUT -n --line-numbers

Block IP

sudo iptables -I INPUT 1 -s <IP_ADDRESS> -j DROP

Unblock IP

sudo iptables -D INPUT -s <IP_ADDRESS> -j DROP
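When blocking several addresses at once, it is safer to validate each one and review the generated rules before running them. The following is a sketch under that assumption: is_ipv4 and print_block_rules are hypothetical helpers, IPv4 only, and they print commands instead of executing them.

```shell
# Succeed only if $1 is a dotted-quad IPv4 address with octets 0-255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# Read one IP per line on stdin and print a DROP rule for each valid one.
# Nothing is executed; invalid lines are reported on stderr and skipped.
print_block_rules() {
  while IFS= read -r ip; do
    [ -n "$ip" ] || continue
    if is_ipv4 "$ip"; then
      printf 'iptables -I INPUT 1 -s %s -j DROP\n' "$ip"
    else
      printf 'skipping invalid address: %s\n' "$ip" >&2
    fi
  done
}
```

After reviewing the output, it can be piped to sudo sh to apply the rules. Rules added this way do not survive a reboot unless they are saved, for example with iptables-save.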

3. 📊 Monitor PM2 logs and app errors

pm2 logs        # Real-time logs
pm2 status      # Check memory/CPU per process

4. 🐢 Identify slow MongoDB queries (on DB server)

sudo grep -i "slow query" /var/log/mongodb/mongod.log | jq -c 'select(.attr.durationMillis > 800)' > slow.log
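A long slow.log is easier to act on once it is grouped by collection. This sketch assumes the structured JSON log format of MongoDB 4.4 and later, where each slow query entry carries attr.ns and attr.durationMillis, and it requires jq; slow_queries_by_ns is a hypothetical name.

```shell
# Read JSON log lines on stdin and print "count namespace" for slow queries
# whose duration exceeds the threshold in milliseconds ($1, default 800).
slow_queries_by_ns() {
  jq -r --argjson t "${1:-800}" \
    'select(.msg == "Slow query" and .attr.durationMillis > $t) | .attr.ns' \
    | sort | uniq -c | sort -nr \
    | awk '{print $1, $2}'
}
```

For example: sudo grep -i "slow query" /var/log/mongodb/mongod.log | slow_queries_by_ns 800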

5. 🌐 Monitor TCP socket load (Next.js <-> Feathers.js port)

watch -n 1 "netstat -anp | grep :5052 | wc -l"

6. 🔗 Check HTTPS connection states

sudo netstat -anp | grep :443 | awk '{print $6}' | sort | uniq -c

🔸 Healthy values (approximate):

ESTABLISHED: ~40–60
TIME_WAIT: < 50 preferred (but up to ~100 is manageable)
SYN_RECV: ~200–300 is okay if from bots, but monitor for spikes
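The rough thresholds above can be turned into an automatic check. This is a minimal sketch, assuming Linux netstat -an column order (local address in field 4, state in field 6); conn_state_report is a hypothetical name, and the warning limits are just the upper ends of the approximate ranges listed above. Unlike the grep version, it counts only rows whose local address ends in the given port.

```shell
# Read `netstat -an` output on stdin and summarize TCP states for one port.
# $1 = local port to inspect (default 443)
conn_state_report() {
  awk -v port="${1:-443}" '
    # Keep TCP rows whose local address (field 4) ends in :port
    $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
    END {
      printf "ESTABLISHED=%d TIME_WAIT=%d SYN_RECV=%d\n", n["ESTABLISHED"], n["TIME_WAIT"], n["SYN_RECV"]
      if (n["TIME_WAIT"] > 100) print "warn: TIME_WAIT above 100"
      if (n["SYN_RECV"] > 300) print "warn: SYN_RECV above 300"
    }'
}
```

For example: sudo netstat -anp | conn_state_report 443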

May Meltdown Fixes: Every Command I Ran to Battle Website Slowness
Adam C. | 1 month ago