Web APIs
API Performance
Profile response bottlenecks, apply caching strategies, and load-test your API to find the breaking point before real traffic does.
Amazon calculated that every 100ms of latency costs them 1% in sales. Google found that a half-second delay in search results dropped traffic by 20%. These are not edge cases — they are the documented economic cost of a slow API. And in both companies, the culprit was almost never the application code itself. It was database queries, missing cache layers, and synchronous calls that could have run in parallel.
Performance work is not about rewriting everything in a faster language. It is about finding the one thing that is slow — the query missing an index, the external call that blocks the response, the endpoint loading 500 rows when it needs 20 — and fixing that one thing. Measure first. Optimise second. Always.
API Performance — Concept Anatomy
Performance Techniques at a Glance
| Concept | What it does | APIForge use case |
|---|---|---|
| Response Caching | Stores a computed response and serves it directly on repeat requests — skips the full processing pipeline | GET /projects list cached in Redis for 60s — drops DB load by 80% at peak traffic |
| Database Query Optimisation | Indexes, query plans, and selective field fetching eliminate the most common source of API latency | Adding index on projects.team_id drops query time from 820ms to 3ms |
| Connection Pooling | Reuses open database connections instead of creating a new one per request — removes ~15ms overhead per call | APIForge pg pool of 20 connections handles 200 concurrent requests without queue buildup |
| Parallel Execution | Runs independent async operations concurrently with Promise.all() instead of awaiting each in sequence | Fetching projects + user profile + team metadata simultaneously cuts response time by 60% |
| Payload Trimming | Returns only the fields the client actually needs — reduces serialisation time and network transfer size | Project list strips internal fields — payload drops from 14KB to 2KB per page |
| Load Testing | Simulates concurrent users to find the throughput ceiling and latency degradation curve before production traffic does | k6 test reveals APIForge breaks at 450 RPS — capacity planned ahead of a product launch |
The Performance Optimisation Workflow
Performance work without measurement is guesswork. The workflow below is the same process used at companies like GitHub and Cloudflare — profile first to find the bottleneck, fix exactly that, then verify the improvement with hard numbers before moving on.
APIForge — Performance Optimisation Cycle
measure current p99
find the slow span
cache / index / pool
re-run load test
next bottleneck
find ceiling RPS
Step 1 — Profiling: Find the Bottleneck First
The APIForge Backend team's GET /api/v1/projects endpoint was returning in 847ms at p99. Their SLA target is 200ms. A trace already identified the database query as the culprit — but before writing any fix, they need a full picture of every problem in the handler.
Here is the original handler in its slow state — four performance problems visible in the code before running any profiler.
// WHAT: APIForge — original slow GET /projects handler
// Four performance problems visible in plain sight
async function getProjectsHandler(req, res) {
const userId = req.user.id;
// Problem 1: Three sequential awaits that are fully independent
// Each one blocks until the previous finishes — pure wasted time
const user = await db.query(
"SELECT * FROM users WHERE id = $1", [userId]
);
const team = await db.query(
"SELECT * FROM teams WHERE id = $1", [user.rows[0].team_id]
);
const projects = await db.query(
"SELECT * FROM projects WHERE team_id = $1", [team.rows[0].id]
);
// Problem 2: SELECT * fetches every column including
// internal_notes, audit_log, raw_config — fields the response never uses
// Problem 3: No cache — every request hits the database
// even when the project list has not changed in hours
// Problem 4: No pagination — returns every project ever created
// A team with 800 projects sends 800 rows over the wire
const response = projects.rows.map(p => ({
id: p.id,
name: p.name,
team_id: p.team_id,
visibility: p.visibility,
created_at: p.created_at
// ...and 14 more fields the frontend ignores
}));
res.json({ data: response });
}What just happened?
Four performance problems in one handler. The database query is the biggest — 821ms for a query that should take under 5ms with a proper index. But even after fixing the index, the other three problems eat into the remaining budget: sequential awaits that could run in parallel, a bloated SELECT that fetches 19 columns when 5 are needed, and no cache to avoid repeating this work at all.
The 187KB payload is also significant. On a mobile connection, that adds hundreds of milliseconds of transfer time that the server-side latency number never captures.
Try this: Run EXPLAIN ANALYZE SELECT * FROM projects WHERE team_id = 'team_abc123' in your Postgres client. If you see "Seq Scan," there is no index on that column. Adding one is a single SQL statement that typically takes under a second and drops query time by 100x on large tables.
Step 2 — Database Optimisation and Parallel Queries
Database queries are the number one source of API latency in the vast majority of web applications. Not network. Not serialisation. The database. And the fix is almost always one of three things: add an index, select fewer columns, or stop making the database do work it already did five seconds ago.
What is a database index?
An index is a separate data structure the database maintains — sorted by the indexed column — so it can jump directly to matching rows instead of scanning every row in the table. Without an index on projects.team_id, a query for one team's projects reads every single project row in the database. With the index, it reads only the rows for that team. On a table with 500,000 rows, the difference is the entire table vs a few dozen rows — milliseconds vs seconds.
The refactored handler below fixes all four problems: a proper index, a targeted SELECT, Promise.all() for parallel execution, and a Redis cache layer in front of the database.
// WHAT: APIForge — optimised GET /projects handler
// Fixes: parallel queries + selective fields + Redis cache + pagination
const redis = require("./redis"); // ioredis client
const db = require("./db"); // pg Pool
// Run once at startup — not per request:
// CREATE INDEX CONCURRENTLY idx_projects_team_id ON projects(team_id);
async function getProjectsHandler(req, res) {
const userId = req.user.id;
const page = parseInt(req.query.page) || 1;
const limit = Math.min(parseInt(req.query.limit) || 20, 100);
const offset = (page - 1) * limit;
// Check cache first — key includes pagination so pages cache independently
const cacheKey = `projects:${userId}:page:${page}:limit:${limit}`;
const cached = await redis.get(cacheKey);
if (cached) {
res.setHeader("X-Cache", "HIT");
return res.json(JSON.parse(cached));
}
// Fetch user (to get team_id), then run projects + count in parallel
const userResult = await db.query(
"SELECT id, team_id FROM users WHERE id = $1", [userId]
);
const teamId = userResult.rows[0].team_id;
// Fix 1: Run data query and count query in parallel
// Fix 2: SELECT only the five fields the API returns
// Fix 3: LIMIT + OFFSET — never return unbounded rows
const [projects, countResult] = await Promise.all([
db.query(
`SELECT id, name, team_id, visibility, created_at
FROM projects
WHERE team_id = $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3`,
[teamId, limit, offset]
),
db.query(
"SELECT COUNT(*) FROM projects WHERE team_id = $1",
[teamId]
)
]);
const total = parseInt(countResult.rows[0].count);
const response = {
data: projects.rows,
meta: {
total,
page,
per_page: limit,
total_pages: Math.ceil(total / limit)
}
};
// Cache the result for 60 seconds
await redis.setex(cacheKey, 60, JSON.stringify(response));
res.setHeader("X-Cache", "MISS");
res.json(response);
}What just happened?
The database query dropped from 821ms to 4ms — purely from adding an index on team_id. Switching from SELECT * to five explicit fields trimmed serialisation time and payload size by 99%. Running the count query in parallel with the data query removed sequential wait time. Total first-request latency: 16ms. Cached request latency: 1ms.
The X-Cache header lets the frontend team and monitoring instantly see whether a response came from Redis or the database — without touching traces.
Try this: Change the cache TTL from 60 seconds to 5, then to 300, and watch the Prometheus metric for database query rate shift in each case. The longer the TTL, the fewer DB hits — but the more stale the data. There is no universal right answer; it depends on how often the underlying data actually changes.
Step 3 — Connection Pooling and Resource Limits
Opening a database connection takes time — typically 10–50ms depending on the database, network, and TLS handshake. At 100 requests per second, creating a new connection per request means 100 connection setups every second. That overhead compounds into hundreds of milliseconds of unnecessary latency and saturates the database's connection limit long before the actual query load does.
A connection pool keeps a set of connections open and ready. Incoming requests borrow a connection, use it, and return it to the pool. The pool handles the lifecycle — opening connections on startup, replacing dead ones, and queuing requests when all connections are busy.
// WHAT: APIForge — PostgreSQL connection pool configuration
// npm install pg
// File: src/db.js
const { Pool } = require("pg");
const pool = new Pool({
host: process.env.DB_HOST,
port: process.env.DB_PORT || 5432,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
ssl: { rejectUnauthorized: true },
// Pool size: how many connections to keep open simultaneously
// Rule of thumb: (2 x CPU cores) + effective_io_concurrency
// For a 4-core server: start at 10, tune up with load testing
max: 20,
// How long a client waits for a free connection before throwing
connectionTimeoutMillis: 3000,
// How long an idle connection stays open before being closed
idleTimeoutMillis: 30000,
// Max uses per connection — forces reconnect to pick up config changes
maxUses: 7500
});
pool.on("connect", () => {
console.log(JSON.stringify({
level: "INFO", event: "db.pool.connect",
totalCount: pool.totalCount, idleCount: pool.idleCount
}));
});
pool.on("error", (err) => {
console.error(JSON.stringify({
level: "ERROR", event: "db.pool.error", message: err.message
}));
});
module.exports = pool;What just happened?
With 20 pooled connections, 200 concurrent requests cycle through them without any connection setup overhead. Each request borrows a connection that is already authenticated and ready — the ~22ms cost is paid once per connection at startup, not once per request.
waitingCount spiking in your Prometheus metrics is an early warning sign before latency visibly degrades. It means your pool is saturated and requests are queuing. Either the pool size needs increasing or queries need to run faster so connections are returned sooner.
Try this: Set max: 1 in your pool config and run a load test with 50 concurrent users. Every request queues behind the single connection — you will feel exactly what a saturated pool looks like before tuning your real configuration.
Step 4 — Cache Invalidation on Writes
A cache is not a magic speed button you add to everything. Cached data is stale data — by definition, it is a snapshot from a moment in the past. The design question is always: how stale is too stale for this particular data, and what triggers a refresh?
TTL-based expiry
Cache entries expire after a fixed time. Simple to implement, no invalidation logic. Best for data that updates on a predictable schedule or where slight staleness is acceptable — like a project list or a dashboard count.
Write-through invalidation
When data is written (POST/PUT/DELETE), the corresponding cache key is deleted immediately. The next read is a cache miss and repopulates from the database. Keeps data accurate but requires invalidation code at every write path.
Cache-aside pattern
The application checks the cache first. On a miss, it reads from the database and writes to the cache before returning. The cache never proactively loads data — it only fills gaps. This is what the APIForge handler in Step 2 implements.
What not to cache
Never cache user-specific sensitive data across users. Never cache POST/PUT/DELETE responses — only GET. Never cache auth or permission checks — a user whose role changes should see the change immediately, not after a TTL expires.
// WHAT: APIForge — cache invalidation when a project is created
// Deletes cached project list pages so the next GET fetches fresh data
async function createProjectHandler(req, res) {
const userId = req.user.id;
const { name, visibility, tags } = req.body;
const userResult = await db.query(
"SELECT team_id FROM users WHERE id = $1", [userId]
);
const teamId = userResult.rows[0].team_id;
// Write the new project to the database
const result = await db.query(
`INSERT INTO projects (name, team_id, visibility, tags, created_at)
VALUES ($1, $2, $3, $4, NOW())
RETURNING id, name, team_id, visibility, created_at`,
[name, teamId, visibility, tags]
);
const newProject = result.rows[0];
// Invalidate all cached project list pages for this user
const pattern = `projects:${userId}:page:*`;
const keys = await redis.keys(pattern);
if (keys.length > 0) {
await redis.del(...keys);
console.log(JSON.stringify({
level: "INFO", event: "cache.invalidated",
pattern, keys_deleted: keys.length
}));
}
res.status(201).json({ data: newProject });
}
// Same invalidation logic applies to updateProject and deleteProjectWhat just happened?
When the POST creates a new project, it deletes all four cached page keys for that user's project list. The next GET is a cache miss, hits the database (still fast at 16ms), and automatically repopulates the cache. The request after that is a cache hit at 1ms. Data is never stale by more than one request cycle.
The log line cache.invalidated: keys_deleted: 4 tells you exactly which entries were cleared — useful for debugging stale data reports from users.
Try this: Create a project, then immediately GET the list before any other request runs. You should see X-Cache: MISS and the new project in the response. Without invalidation, you would see X-Cache: HIT and a list missing the project you just created.
Step 5 — Load Testing with k6
You have optimised the code. But how many users can it serve simultaneously before latency degrades or errors appear? Manual Postman testing uses one virtual user. Production might send 500 at once.
k6 is an open-source load testing tool from Grafana Labs. You write scripts in JavaScript that define how virtual users behave — what endpoints they call, how long they pause, how many users ramp up and over what duration. k6 executes these scripts and reports latency percentiles, throughput, and error rates across the full run.
Load test stages explained
A good load test has three phases. Ramp-up: gradually add virtual users so the server is not hit all at once (which does not reflect real traffic patterns). Sustained load: hold peak users long enough to detect memory leaks, connection pool exhaustion, or gradual latency drift. Ramp-down: reduce users gradually and observe recovery. A server that does not fully recover after load is removed has a resource leak.
// WHAT: APIForge — k6 load test for GET /projects
// Install: brew install k6 or https://k6.io/docs/getting-started/installation
// Run: k6 run --env BASE_URL=https://staging.apiforge.io \
// --env AUTH_TOKEN=eyJ... load-test.js
import http from "k6/http";
import { check, sleep } from "k6";
import { Rate, Trend } from "k6/metrics";
const errorRate = new Rate("custom_errors");
const projectsLatency = new Trend("projects_endpoint_latency");
export const options = {
stages: [
{ duration: "30s", target: 50 }, // Ramp up to 50 users
{ duration: "60s", target: 200 }, // Ramp up to 200 users
{ duration: "120s", target: 200 }, // Hold at 200 users
{ duration: "60s", target: 400 }, // Push to 400 users (stress)
{ duration: "30s", target: 0 }, // Ramp down
],
thresholds: {
// Test FAILS if p95 latency exceeds 200ms at any point
"http_req_duration{name:list_projects}": ["p(95)<200"],
// Test FAILS if error rate exceeds 1%
"custom_errors": ["rate<0.01"],
}
};
const BASE_URL = __ENV.BASE_URL || "http://localhost:3000";
const AUTH_TOKEN = __ENV.AUTH_TOKEN;
export default function () {
const headers = {
Authorization: `Bearer ${AUTH_TOKEN}`,
Accept: "application/json"
};
const res = http.get(
`${BASE_URL}/api/v1/projects?page=1&limit=20`,
{ headers, tags: { name: "list_projects" } }
);
projectsLatency.add(res.timings.duration);
errorRate.add(res.status !== 200);
check(res, {
"status is 200": (r) => r.status === 200,
"has data array": (r) => r.json("data") !== null,
"response under 200ms": (r) => r.timings.duration < 200,
"X-Cache header present": (r) => r.headers["X-Cache"] !== undefined,
});
// Simulate realistic think time between requests
sleep(Math.random() * 2 + 1); // 1-3 second pause
}What just happened?
The load test ran five stages over five minutes, peaking at 400 virtual users. At 200 users the API held — p95 latency was 148ms, well inside the 200ms threshold. At 400 users the connection pool saturated, p99 latency spiked to 892ms, and 0.69% of requests errored. The threshold was 1% — so the test technically passed on errors but clearly showed 400 VUs is beyond comfortable capacity.
This is exactly the information you want before a product launch. The team now knows: safe at 200 concurrent users, needs horizontal scaling to handle 400. That conversation happens in a planning meeting, not during a live incident.
Try this: Run the same k6 test twice — once with the cache disabled (comment out the Redis block) and once with it enabled. Compare the p95 latency and peak RPS between both runs. The delta shows you exactly what caching is contributing to your throughput ceiling.
Before and After — The Full Picture
Every optimisation in this lesson built on the previous one. The total improvement was not from one big change — it was from fixing four independent problems, each of which compounded.
Before optimisation
p99 latency: 847ms
p50 latency: 412ms
Payload size: 187KB (800 rows, 19 fields)
DB hit rate: 100% of requests
Peak throughput: ~40 RPS before latency degrades
After optimisation
p99 latency: 18ms (47x improvement)
p50 latency: 1ms (cache hit)
Payload size: 1.2KB (20 rows, 5 fields)
DB hit rate: ~5% (95% cache hit rate)
Peak throughput: 438 RPS at 200 concurrent users
What drove each gain
Database index on team_id — query time from 821ms to 4ms. Single biggest gain. One SQL statement, no application code change required.
SELECT specific fields + pagination — payload from 187KB to 1.2KB. Cuts serialisation time, network transfer, and client parsing cost simultaneously.
Promise.all() for parallel queries — data and count queries run simultaneously instead of sequentially. Saves the full duration of whichever is faster (3ms here, more with complex counts).
Redis cache with write-through invalidation — 95% of requests served in 1ms with zero database involvement. Database load reduced by ~95% at steady state.
Connection pool — eliminates 22ms per-request connection overhead. Foundational — without a pool, every optimisation above would still lose ~22ms before a single query ran.
Performance Anti-Patterns to Avoid
Knowing what slows APIs down is as useful as knowing how to speed them up. These are the patterns that appear most often in the traces of genuinely slow APIs — each one is a specific mistake with a specific fix.
| Anti-pattern | What it costs | Fix |
|---|---|---|
| N+1 queries | Fetching a list of 50 items then querying the DB once per item — 51 queries instead of 1 | JOIN or batch fetch with WHERE id IN (...) |
| Synchronous external calls | Calling a third-party API (Stripe, SendGrid) inline during a request — adds their latency to yours | Queue the work (Bull, SQS) and return 202 Accepted immediately |
| Missing pagination | Returning every row in a table — one user with 10,000 records can OOM your server | Always default LIMIT, cap max page size, return meta.total |
| Caching mutable auth data | Caching permission lookups — a revoked user continues to have access until TTL expires | Never cache auth checks, or use event-driven invalidation on role changes |
| Over-fetching in SELECT * | Fetching and serialising columns the response never uses — wastes DB I/O, memory, and network | Always name exactly the columns you need in every query |
The N+1 query problem deserves special attention
N+1 is the single most common API performance bug. It looks like this: you fetch a list of 50 projects, then inside the loop you fetch the owner's name for each one — 50 additional queries. The fix is a single JOIN or a batch query with WHERE id IN (id1, id2, ...id50) that returns all 50 owners in one round trip. An ORM like Prisma or Sequelize can generate N+1 patterns automatically if you are not careful with eager loading — always check your query logs when using an ORM.
Choosing the Right Caching Layer
In-process memory cache
Store cached values in a JavaScript Map or a library like node-cache. Zero network overhead — sub-millisecond access.
Limitation: not shared across server instances. If you run three API servers, each has its own cache. Cache invalidation on write becomes complex — you would need to broadcast to all instances. Best for immutable reference data (config, feature flags) that rarely changes.
Redis (shared cache)
A single Redis instance serves all API servers. Cache invalidation on write works correctly regardless of how many instances are running — they all read from and write to the same cache.
Adds ~1ms network round trip but handles multi-instance deployments correctly. The right choice for any API that runs more than one server process — which is most production APIs. APIForge uses Redis from day one.
HTTP Caching with Cache-Control Headers
Everything above is server-side caching — the server avoids doing work it already did. But APIs can also instruct the client to cache responses, using standard HTTP headers. This pushes caching even further out — to the browser, to a CDN, or to the API client's own cache layer — and eliminates the round trip to your server entirely for repeat requests.
// WHAT: APIForge — HTTP Cache-Control headers on GET /projects
// Tells browsers, CDNs, and API clients how long to cache the response
async function getProjectsHandler(req, res) {
// ... (fetch data with Redis + DB as before)
// Cache-Control header breakdown:
// private — only the requesting client may cache this (not a shared CDN)
// because the data is user-specific
// max-age=60 — client may use the cached copy for 60 seconds
// stale-while-revalidate=30 — after max-age, client may serve the stale copy
// for up to 30 more seconds while fetching a fresh one in the background
res.setHeader("Cache-Control", "private, max-age=60, stale-while-revalidate=30");
// ETag: a fingerprint of the response body
// Client sends it back as If-None-Match on subsequent requests
// If the data has not changed, server returns 304 Not Modified — zero body transfer
const etag = `"${Buffer.from(JSON.stringify(response)).length}-${Date.now()}"`;
res.setHeader("ETag", etag);
if (req.headers["if-none-match"] === etag) {
return res.status(304).end();
}
res.json(response);
}
// For truly public, non-user-specific endpoints (e.g. GET /api/v1/status):
// res.setHeader("Cache-Control", "public, max-age=300, s-maxage=600");
// s-maxage applies to CDN caches specifically — longer than browser cacheWhat just happened?
The Cache-Control: private, max-age=60 header tells the client it can serve this response from its own cache for 60 seconds without contacting the server at all. Zero latency, zero server load, zero bandwidth consumed for the most common case — a user refreshing a page they just loaded.
The ETag and 304 Not Modified pattern handles the case where the cache has expired but the data has not changed — the server confirms "still the same" with an 8ms round trip and zero bytes of body. Compared to sending 1.2KB of JSON, that is a meaningful saving at scale.
Try this: Open your browser DevTools, go to the Network tab, and hit an API endpoint twice. On the second request, look for "from cache" or status 304 in the response. If neither appears, the API is missing Cache-Control headers and the browser cannot cache the response even if the data is identical.
Quiz
1. The APIForge Backend team profiles their GET /projects endpoint and sees the trace breakdown: db.query_user 38ms, db.query_team 31ms, db.query_projects 821ms, serialize_response 44ms. Following the rule of measuring before optimising, which problem should they fix first and why?
2. An APIForge client cached a GET /projects response and received an ETag of "1228-abc". 90 seconds later it sends the same request with the header If-None-Match: "1228-abc". The project list has not changed. What does the server return and why is this better than a normal 200 response?
3. During a k6 load test at 300 concurrent users, the APIForge Prometheus dashboard shows waitingCount spiking to 280 on the database connection pool while query times remain fast at under 5ms each. What does this indicate and what should the team do?