Deployment
Take the completed APIForge Platform API from a local machine to a live production server — environment configuration, process management, database migrations, reverse proxy, and health monitoring.
An API that only runs on localhost is not an API — it is a local script. Deployment is the step that turns working code into a live service that other systems can depend on. And it is the step where most developers discover problems that localhost never revealed: environment variables that were hardcoded, ports that conflict, processes that crash silently, databases that are not migrated, and traffic that arrives before the server is ready.
The APIForge Engineering team deploys the Platform API to a Linux VPS (Virtual Private Server) running Ubuntu 24.04 LTS. This is the most common deployment target for APIs at the scale this project operates at. The same process applies to a VPS from any provider, such as DigitalOcean, AWS EC2, or Hetzner. Managed platforms such as Render or Railway take over several of these steps, but the underlying concerns are the same. The tools change slightly. The steps do not.
This lesson covers five deployment tasks in order: server setup, environment configuration, process management with PM2, Nginx as a reverse proxy, and health monitoring. Each task has a clear before and after — skip one and a specific class of production problem becomes inevitable.
Step 1 — Server Setup and Dependencies
A fresh Ubuntu server needs three things before the API can run: Node.js, PostgreSQL, and Redis. The APIForge team installs Node.js via nvm — the Node Version Manager — rather than the system package manager. System packages lag behind the official releases and make version switching difficult. nvm lets you pin the project to an exact Node version and switch cleanly when the next LTS drops.
PostgreSQL and Redis run as system services managed by systemd — they start automatically on boot, before the Node process comes up. The API server depends on both being available. If either is down when the Node process starts, the boot-time validation from Phase 1 crashes the server with a clear error rather than starting in a broken state.
Run these commands on the fresh Ubuntu server as a non-root user with sudo access. The deployment user should not be root — running a Node process as root means a compromised process has full system access.
# WHAT: APIForge server setup — Ubuntu 24 LTS
# Run as deploy user (not root) with sudo privileges
# ── System update ──────────────────────────────────────────────────────────
sudo apt update && sudo apt upgrade -y
# ── Install Node.js 20 LTS via nvm ────────────────────────────────────────
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 20
nvm use 20
nvm alias default 20
node --version # v20.x.x
# ── Install PostgreSQL 16 ──────────────────────────────────────────────────
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable postgresql
sudo systemctl start postgresql
# Create database and user
sudo -u postgres psql -c "CREATE USER apiforge WITH PASSWORD 'strong_password_here';"
sudo -u postgres psql -c "CREATE DATABASE apiforge_prod OWNER apiforge;"
# ── Install Redis 7 ────────────────────────────────────────────────────────
sudo apt install -y redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server
# ── Install PM2 globally ──────────────────────────────────────────────────
npm install -g pm2
# ── Clone the project ─────────────────────────────────────────────────────
cd /home/deploy
git clone https://github.com/apiforge-team/platform-api.git
cd platform-api
npm install --omit=dev
# ── Run database schema ───────────────────────────────────────────────────
# Export first: a VAR=value prefix on the same command is not visible to $VAR expansion
export DATABASE_URL="postgresql://apiforge:strong_password_here@localhost/apiforge_prod"
psql "$DATABASE_URL" -f db/schema.sql

npm install --omit=dev installs only the dependencies listed under dependencies in package.json, not devDependencies (older npm versions spell this flag --production). On the production server you do not need nodemon, test runners, or linters. Skipping them reduces the installed package count, the attack surface, and the disk usage.
sudo systemctl enable sets each service to start automatically when the server reboots. Without it, a restart after a kernel update or a power interruption brings the server back online without PostgreSQL or Redis, and the API fails its boot-time validation and stays down until someone SSHs in to start them manually.
Try this: Run sudo systemctl status postgresql redis-server and confirm both show "active (running)". If either shows "failed", the journal log at journalctl -u postgresql (or journalctl -u redis-server) usually shows why.
Step 2 — Production Environment Configuration
The .env file from development never goes to production. Production secrets — database passwords, JWT signing keys, AWS credentials — must never touch source control. The APIForge team uses a dedicated .env.production file created directly on the server, populated manually, and owned by the deploy user with permissions set so only that user can read it.
Two values change significantly from development to production: NODE_ENV=production tells Express to disable detailed error output and enable performance optimisations, and JWT_SECRET must be a cryptographically random 64-character string — not the placeholder from development.
# WHAT: APIForge production environment configuration
# Run on the production server — never committed to git
# ── Generate a secure JWT secret ──────────────────────────────────────────
# openssl prints 32 random bytes as a 64-character hex string, suitable for JWT signing
openssl rand -hex 32
# output: 64 hexadecimal characters (0-9, a-f); generate your own, never copy a published value
# ── Create the production .env file ───────────────────────────────────────
cat > /home/deploy/platform-api/.env.production << 'EOF'
NODE_ENV=production
PORT=3000
# Database
DATABASE_URL=postgresql://apiforge:strong_password_here@localhost:5432/apiforge_prod
# Auth: paste the openssl output from above, never the dev placeholder
JWT_SECRET=replace_with_64_char_hex_from_openssl
JWT_EXPIRES_IN=2h
JWT_REFRESH_EXPIRES_IN=7d
# Redis
REDIS_URL=redis://127.0.0.1:6379
# AWS S3
AWS_REGION=ap-south-1
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_BUCKET=apiforge-attachments-prod
# Search
MEILISEARCH_URL=http://127.0.0.1:7700
MEILISEARCH_KEY=masterKey_replace_this
# Webhooks
WEBHOOK_SIGNING_SECRET=replace_with_32_char_random_string
# Versioning
API_VERSION=2024-11-01
EOF
# ── Lock down file permissions — only deploy user can read ────────────────
chmod 600 /home/deploy/platform-api/.env.production
# Verify permissions
ls -la /home/deploy/platform-api/.env.production
# -rw------- 1 deploy deploy 612 Nov 14 10:00 .env.production

chmod 600 sets the file to owner-read-write only: the owner can read and write, the group has no permissions, and others have no permissions. If another process on the server runs as a different user (a compromised web scraper, a misconfigured cron job), it cannot read this file. Secrets at rest deserve the same protection as secrets in transit.
The REDIS_URL uses 127.0.0.1 rather than localhost. On some systems localhost resolves via IPv6 to ::1 while Redis is only listening on the IPv4 loopback. Using the explicit IPv4 address avoids a confusing connection failure that looks like Redis is down when it is actually just on a different address.
Try this: Run cat /home/deploy/platform-api/.env.production as a different user and confirm you get "Permission denied". That is the file protection working correctly.
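A file that exists is not the same as a file that is complete. The sketch below shows one way the server could check its environment at boot and refuse to start when a value is missing or too weak. The helper name validateEnv, the list of required variables, and the 64-character rule are assumptions made for illustration; they are not taken from the actual Phase 1 validation code.

```javascript
// Hypothetical boot-time check. The variable list and rules are assumptions.
const REQUIRED_VARS = ['NODE_ENV', 'PORT', 'DATABASE_URL', 'JWT_SECRET', 'REDIS_URL'];

// Returns a list of human-readable problems; an empty list means the env looks usable.
function validateEnv(env) {
  const problems = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is missing`);
  }
  if (env.JWT_SECRET && env.JWT_SECRET.length < 64) {
    problems.push('JWT_SECRET is shorter than 64 characters');
  }
  if (env.PORT && !/^\d+$/.test(env.PORT)) {
    problems.push('PORT is not a number');
  }
  return problems;
}

// Possible use at the top of server.js (commented out so the sketch has no side effects):
// const problems = validateEnv(process.env);
// if (problems.length > 0) {
//   console.error('Invalid environment:\n- ' + problems.join('\n- '));
//   process.exit(1);
// }
```

Returning a list rather than throwing on the first problem means one failed start reports every misconfigured value at once.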
Step 3 — Process Management with PM2
Running node src/server.js directly in a terminal works on localhost. On a production server, that process dies the moment the terminal session closes, the server reboots, or the Node process crashes. PM2 is a production process manager for Node.js that keeps the process running, restarts it on crash, starts it on boot, and captures its stdout and stderr to log files (rotation is handled by the separate pm2-logrotate module).
PM2 is configured using an ecosystem file — a JavaScript or JSON file that specifies the app name, script path, environment variables, and restart behaviour. The ecosystem file is committed to the repository (without secrets) so the deployment process is reproducible on any server.
The ecosystem file references the .env.production file for environment variables. PM2 loads it before starting the process — no secrets need to be in the ecosystem file itself.
// WHAT: APIForge PM2 ecosystem configuration
// File: ecosystem.config.cjs (committed to repo — no secrets)
// PM2 starts, restarts, and monitors the API process
module.exports = {
  apps: [
    {
      name: 'apiforge-api',
      script: 'src/server.js',
      instances: 'max',             // one instance per CPU core
      exec_mode: 'cluster',         // PM2 cluster mode: shared port, load balanced
      // Assumes the installed PM2 version supports env_file;
      // run `pm2 env 0` after the first start to confirm the variables were loaded
      env_file: '.env.production',
      // Restart behaviour
      watch: false,                 // do not watch files in production
      max_memory_restart: '512M',   // restart if memory exceeds 512MB
      restart_delay: 3000,          // wait 3s before restarting after crash
      max_restarts: 10,             // stop restarting after 10 consecutive crashes
      // Logging
      out_file: '/var/log/apiforge/out.log',
      error_file: '/var/log/apiforge/error.log',
      log_date_format: 'YYYY-MM-DD HH:mm:ss',
      merge_logs: true,             // combine cluster instance logs into one file
      // Graceful shutdown: wait for in-flight requests before exiting
      kill_timeout: 5000,
      wait_ready: true,             // wait for process.send('ready') before marking as started
    },
  ],
};
// ── server.js addition — signal PM2 when server is ready ──────────────────
// Add this line inside app.listen() callback in server.js:
// if (process.send) process.send('ready');

Cluster mode starts one process per CPU core and puts them all behind a shared port. Incoming connections are distributed across instances by PM2's built-in load balancer. For CPU-bound work, a 2-core server in cluster mode can handle up to roughly twice the load of a single-instance setup, and if one instance crashes, the others keep serving traffic while PM2 restarts the crashed one.
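Before the first start, create /var/log/apiforge and make it writable by the deploy user. Then start the cluster with pm2 start ecosystem.config.cjs, run pm2 save to record the process list, and run pm2 startup and execute the command it prints so that PM2 itself starts on boot. For later deploys, pm2 reload apiforge-api performs a zero-downtime reload: it starts new instances, waits for them to signal ready via process.send('ready'), then gracefully shuts down the old instances. No request is dropped as long as in-flight requests finish within kill_timeout. Compare this to a plain restart, which closes the port immediately, rejects incoming connections for a few seconds, then reopens. For a public API, that gap matters.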
Try this: Run pm2 monit and watch CPU and memory per instance in real time. Then hit the API with a burst of requests and watch the load distribute across instances.
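The wait_ready and kill_timeout settings only pay off if the server process cooperates. The sketch below shows one possible way to wire both halves: send the ready message once the server is listening, and on SIGINT (the signal PM2 sends by default before it force-kills) stop accepting connections and exit once in-flight requests finish. The function name wirePm2Lifecycle and the 4000 ms fallback are assumptions chosen to sit inside the 5000 ms kill_timeout; the real server.js may be structured differently.

```javascript
// Hypothetical helper: call it BEFORE server.listen() so the 'listening' event is not missed.
// `server` is an http.Server (what app.listen() returns); `proc` defaults to the real process.
function wirePm2Lifecycle(server, proc = process, shutdownMs = 4000) {
  // Readiness: with wait_ready: true, PM2 waits for this message before routing traffic here.
  server.on('listening', () => {
    if (typeof proc.send === 'function') proc.send('ready');
  });

  // Graceful shutdown: stop accepting new connections, let in-flight requests finish.
  proc.on('SIGINT', () => {
    server.close(() => proc.exit(0));
    // Fallback: exit before PM2's kill_timeout expires and it force-kills the process.
    setTimeout(() => proc.exit(1), shutdownMs).unref();
  });
}

// Possible use in server.js (commented out so the sketch does not open a port):
// const server = require('node:http').createServer(app);
// wirePm2Lifecycle(server);
// server.listen(process.env.PORT);
```

Passing the process object as a parameter keeps the helper testable with a stand-in object, without sending real signals.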
Step 4 — Nginx Reverse Proxy and HTTPS
The Node process listens on port 3000. Clients should not connect to port 3000 — they should connect to port 443 (HTTPS). Nginx sits in between: it accepts HTTPS connections on port 443, terminates the TLS encryption, and forwards the plain HTTP request to Node on port 3000. This is called a reverse proxy.
Running TLS termination in Nginx rather than Node has two advantages. First, the handshake and encryption work moves off the Node processes, which leaves the event loop free for application logic under high concurrency. Second, Nginx can be reloaded with a new certificate without restarting the Node process at all, which means certificate renewals are transparent to running traffic.
# WHAT: APIForge Nginx configuration + free SSL certificate via Certbot
# ── Install Nginx and Certbot ──────────────────────────────────────────────
sudo apt install -y nginx certbot python3-certbot-nginx
# ── Create Nginx server block ──────────────────────────────────────────────
sudo tee /etc/nginx/sites-available/apiforge << 'EOF'
server {
    listen 80;
    server_name api.apiforge.dev;
    # Redirect all HTTP traffic to HTTPS
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name api.apiforge.dev;
    # SSL certificate (issued by Certbot)
    ssl_certificate /etc/letsencrypt/live/api.apiforge.dev/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.apiforge.dev/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
    # Proxy to Node.js
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection keep-alive;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
        # Timeouts
        proxy_connect_timeout 10s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
        # Body size limit (50MB, matches file upload limit)
        client_max_body_size 50M;
    }
}
EOF
# ── Issue SSL certificate via Let's Encrypt ───────────────────────────────
# Issue the certificate before enabling the site: the 443 block references
# certificate files, and nginx -t fails until they exist
sudo certbot certonly --nginx -d api.apiforge.dev --non-interactive --agree-tos -m ops@apiforge.dev \
  --deploy-hook "systemctl reload nginx"
# ── Enable site and test config ────────────────────────────────────────────
sudo ln -s /etc/nginx/sites-available/apiforge /etc/nginx/sites-enabled/
sudo nginx -t # test config before applying
sudo systemctl reload nginx
# Certbot automatically renews certificates before they expire
# Verify auto-renewal is configured:
sudo certbot renew --dry-run

The X-Forwarded-For and X-Forwarded-Proto headers tell the Node process the real client IP and the original protocol. Without them, every request appears to come from 127.0.0.1 (the Nginx server itself) and every request looks like HTTP even though the client used HTTPS. The rate limiter uses the client IP from X-Forwarded-For, so without this header every client shares the same rate limit counter.
Strict-Transport-Security tells browsers to only connect to this domain over HTTPS for the next year — even if the user types http://. The browser enforces HTTPS locally before any network request is made. This header is why you see HTTPS enforced on banking sites even when you deliberately type HTTP.
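Reading X-Forwarded-For safely takes one extra rule: only the entry added by your own proxy can be trusted, because a client can send any X-Forwarded-For value it likes and Nginx's $proxy_add_x_forwarded_for appends the real address after it. The sketch below illustrates that rule for a single trusted proxy on the loopback interface. The function name clientIp is hypothetical; in an Express app the usual way to get the same behaviour is the built-in trust proxy setting, for example app.set('trust proxy', 1), rather than hand-written parsing.

```javascript
// Hypothetical helper illustrating "trust exactly one proxy hop".
// forwardedFor: raw X-Forwarded-For header value (may be undefined)
// socketAddress: the address of the TCP peer, e.g. req.socket.remoteAddress
function clientIp(forwardedFor, socketAddress) {
  const loopback = ['127.0.0.1', '::1', '::ffff:127.0.0.1'];
  // Only honour the header when the direct peer is our own Nginx on loopback.
  if (!loopback.includes(socketAddress) || !forwardedFor) return socketAddress;

  const hops = forwardedFor.split(',').map((hop) => hop.trim()).filter(Boolean);
  // The last entry is the one Nginx appended; earlier entries are client-supplied.
  return hops.length > 0 ? hops[hops.length - 1] : socketAddress;
}
```

Taking the first entry instead of the last would let any client pick its own rate-limit bucket by sending a forged header.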
Step 5 — Health Monitoring and Alerting
A deployed API that nobody is watching is a production incident waiting to happen quietly. The APIForge team sets up two layers of monitoring: a lightweight health check script that runs every minute and alerts the team if the API stops responding, and structured PM2 log monitoring that flags error spikes.
The health check script hits GET /health and checks two things: the HTTP status code (must be 200) and the response body (must contain status: "ok"). If either check fails three times in a row, it sends an alert to the team's Slack channel using an incoming webhook — the same outbound webhook pattern from Lesson 35, but pointed at Slack instead of a custom endpoint.
// WHAT: APIForge health monitor — runs as a cron job every minute
// File: /home/deploy/monitor/health-check.mjs (.mjs so Node treats it as an ES module)
// Alerts Slack channel if API fails 3 consecutive checks
import { existsSync, readFileSync, writeFileSync } from 'fs';

const API_URL = 'https://api.apiforge.dev/health';
const SLACK_WEBHOOK = process.env.SLACK_ALERT_WEBHOOK;
const STATE_FILE = '/tmp/apiforge-health-state.json';
const THRESHOLD = 3; // alert after this many consecutive failures

// Load the previous run's state; start fresh if the file is missing or unreadable
function loadState() {
  try {
    if (existsSync(STATE_FILE)) return JSON.parse(readFileSync(STATE_FILE, 'utf8'));
  } catch {
    // fall through to the default state
  }
  return { failures: 0, alerted: false };
}

async function checkHealth() {
  const state = loadState();
  try {
    // AbortSignal.timeout aborts the request after 5s, with no timer to clean up
    const response = await fetch(API_URL, { signal: AbortSignal.timeout(5000) });
    const body = await response.json();
    const ok = response.status === 200 && body.status === 'ok';
    if (!ok) {
      throw new Error(`Unexpected response: ${response.status} ${JSON.stringify(body)}`);
    }
    // Recovery: reset failure count and send recovery alert if previously alerted
    if (state.alerted) {
      await sendSlackAlert('API recovered', 'api.apiforge.dev is responding normally.', 'good');
    }
    writeFileSync(STATE_FILE, JSON.stringify({ failures: 0, alerted: false }));
    console.log(`[${new Date().toISOString()}] Health check: OK`);
  } catch (err) {
    const failures = state.failures + 1;
    console.error(`[${new Date().toISOString()}] Health check failed (${failures}): ${err.message}`);
    let delivered = false;
    if (failures >= THRESHOLD && !state.alerted) {
      delivered = await sendSlackAlert(
        'API DOWN',
        `api.apiforge.dev has failed ${failures} consecutive health checks.\nError: ${err.message}`,
        'danger'
      );
    }
    // Mark as alerted only once Slack accepted the message, so a failed delivery is retried
    writeFileSync(STATE_FILE, JSON.stringify({ failures, alerted: state.alerted || delivered }));
  }
}

// Returns true if Slack accepted the message. Delivery errors are logged, never thrown,
// so a Slack outage cannot be miscounted as an API failure.
async function sendSlackAlert(title, message, color) {
  if (!SLACK_WEBHOOK) return false;
  try {
    const res = await fetch(SLACK_WEBHOOK, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        attachments: [{ color, title, text: message, ts: Math.floor(Date.now() / 1000) }]
      }),
    });
    return res.ok;
  } catch (err) {
    console.error(`Slack alert failed: ${err.message}`);
    return false;
  }
}

checkHealth();

The three-failure threshold prevents false alarms from single transient failures, such as a momentary network hiccup or a slow response during a garbage collection pause. A single failure is written to the log but does not trigger a Slack alert. Three in a row indicates a real problem. The threshold is tunable: a payment API might alert on two failures, a non-critical reporting API might wait for five.
The recovery alert is as important as the outage alert. Without it, the team sees "API DOWN" in Slack at 3am, loses sleep, and never gets confirmation the issue resolved. A recovery message closes the loop — the team knows the system is back without manually checking. Good alerting has a matching "all clear" for every "we have a problem".
Try this: Manually stop the PM2 process with pm2 stop apiforge-api, wait 3 minutes, and confirm the Slack alert fires. Then start it again with pm2 start apiforge-api and confirm the recovery alert arrives.
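The alerting rules described above (alert once after three consecutive failures, stay quiet while the outage continues, send one recovery message) form a small state machine. The sketch below isolates that logic as a pure function so it can be reasoned about without network calls or a state file. The name nextHealthState and the shape of its return value are assumptions for illustration, not part of the monitor script.

```javascript
// Hypothetical pure version of the monitor's decision logic.
// state: { failures: number, alerted: boolean } from the previous run
// Returns the new state plus which notification (if any) should be sent.
function nextHealthState(state, checkPassed, threshold = 3) {
  if (checkPassed) {
    return {
      state: { failures: 0, alerted: false },
      notify: state.alerted ? 'recovered' : null, // "all clear" only after a real alert
    };
  }
  const failures = state.failures + 1;
  const shouldAlert = failures >= threshold && !state.alerted;
  return {
    state: { failures, alerted: state.alerted || shouldAlert },
    notify: shouldAlert ? 'down' : null, // alert once, not on every failed minute
  };
}

// Walk through an outage: four failed checks followed by two successful ones.
let current = { failures: 0, alerted: false };
const notifications = [];
for (const passed of [false, false, false, false, true, true]) {
  const result = nextHealthState(current, passed);
  notifications.push(result.notify);
  current = result.state;
}
console.log(notifications); // [ null, null, 'down', null, 'recovered', null ]
```

Across six checks the team receives exactly two messages: one when the third failure confirms the outage and one when the API comes back.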
Before and After: localhost vs Production
| Deployment Component | Purpose | What Breaks Without It |
|---|---|---|
| nvm + Node 20 LTS | Pinned Node version, easy upgrades, no system package lag | Version mismatch errors when OS updates Node underneath you |
| systemctl enable | PostgreSQL and Redis start automatically on server boot | API crashes after every server reboot until someone SSHs in |
| chmod 600 .env.production | Secrets file readable only by the deploy user | Any other user account on the server can read database passwords and JWT secrets |
| PM2 cluster mode | One process per CPU, auto-restart on crash, zero-downtime reload | Single process uses one core, crashes end the service |
| Nginx + Certbot | HTTPS termination, HSTS, auto-renewing Let's Encrypt certificate | Credentials and tokens sent in plain text, browser security warnings |
| Health monitor | Detects outages within 3 minutes, alerts team, confirms recovery | API can be down for hours before anyone notices |
Quiz
1. The APIForge ecosystem.config.cjs sets instances to "max" and exec_mode to "cluster". What does this configuration achieve compared to running a single Node.js process?
2. The APIForge Nginx config sets proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for. What breaks in the Node.js application if this header is omitted?
3. The APIForge health monitor only sends a Slack alert after three consecutive failures rather than the first failure. What problem does this threshold prevent?