MongoDB
Cursor and Pagination
When MongoDB executes a find() query it does not immediately send every matching document to your application. Instead it returns a cursor — a server-side pointer to the result set that streams documents in batches on demand. Understanding how cursors work, how they expire, and how to use them efficiently is fundamental to writing applications that remain fast and memory-safe regardless of collection size. Pagination — presenting large result sets in manageable pages — builds directly on cursor behaviour and has two main strategies: offset-based pagination with skip() and limit(), and keyset (cursor-based) pagination using range filters. This lesson covers both in depth, with all examples running against the Dataplexa Store dataset.
How Cursors Work
A cursor is a lazy, server-side iterator. It holds the query plan and the current position in the result set. Documents are fetched from the server in batches — by default the first batch contains 101 documents or enough documents to reach about 1 MB, whichever comes first, and later batches are capped only by the 16 MB message size — so your application iterates over results without loading the entire result set into memory at once.
# Cursor fundamentals — lazy evaluation and batch fetching
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# find() returns a cursor immediately — no network round trip yet
cursor = db.products.find({})
print("Type of cursor:", type(cursor).__name__)
print("Query not executed yet — cursor is lazy")
# Iteration triggers the actual query execution
print("\nIterating cursor — query executes now:")
for product in cursor:
print(f" {product['name']}")
# A cursor can only be iterated once
print("\nTrying to iterate again:")
second_pass = list(cursor) # cursor is exhausted
print(f" Documents from second pass: {len(second_pass)}")
print(" Cursor exhausted — re-run the query to iterate again")
# Re-run the query for a fresh cursor
fresh_cursor = db.products.find({}, {"name": 1, "_id": 0})
all_products = list(fresh_cursor)
print(f"\nFresh cursor — {len(all_products)} products loaded into memory")
Type of cursor: Cursor
Query not executed yet — cursor is lazy
Iterating cursor — query executes now:
Wireless Mouse
Mechanical Keyboard
Notebook A5
Standing Desk
USB-C Hub
Ballpoint Pens 10-pack
Monitor 27-inch
Trying to iterate again:
Documents from second pass: 0
Cursor exhausted — re-run the query to iterate again
Fresh cursor — 7 products loaded into memory
- A cursor is exhausted after one full iteration — it cannot be rewound. Re-run the query to get a fresh cursor
- Never call `list(cursor)` on a query that could return millions of documents — it loads everything into memory and can cause out-of-memory errors
- Use a `for` loop to iterate a cursor — documents are fetched in batches, and earlier batches can be garbage-collected as you move through the results
- Server-side cursors time out after 10 minutes of inactivity by default — long-running iterations should use `no_cursor_timeout=True` with caution
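The single-use behaviour above is ordinary Python iterator semantics. A small stand-alone sketch (plain lists, no MongoDB connection) shows the same pattern:

```python
# A PyMongo cursor behaves like a plain Python iterator: one full pass
# consumes it, and a second pass yields nothing.
docs = [{"name": "Wireless Mouse"}, {"name": "Notebook A5"}, {"name": "USB-C Hub"}]

it = iter(docs)            # stand-in for db.products.find({})
first_pass = list(it)      # consumes the iterator completely
second_pass = list(it)     # already exhausted

print(len(first_pass))     # 3
print(len(second_pass))    # 0

# The fix is the same as with a real cursor: create a fresh iterator
fresh = iter(docs)
print(len(list(fresh)))    # 3
```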
Cursor Methods and Chaining
Cursor methods modify the query before execution. Because cursors are lazy, you can chain as many methods as needed before the first document is fetched. The server applies all modifications together in a fixed order: filter → sort → skip → limit — regardless of the Python call order.
# Cursor methods — chaining before execution
from pymongo import MongoClient, ASCENDING, DESCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Full method chain — none execute until iteration begins
cursor = (
db.products
.find({"category": "Electronics"}, {"name": 1, "price": 1, "_id": 0})
.sort("price", DESCENDING)
.skip(1)
.limit(2)
)
print("Top 2nd and 3rd most expensive Electronics:")
for p in cursor:
print(f" ${p['price']:>7.2f} {p['name']}")
# Cursor introspection — check before iterating
cursor2 = db.orders.find({"status": "delivered"})
print(f"\nEstimated documents in cursor: not available until iterated")
print(f"Count via collection method: {db.orders.count_documents({'status': 'delivered'})}")
# batch_size — control how many documents per network round trip
large_cursor = db.products.find({}).batch_size(3)
print("\nIterating with batch_size=3:")
for p in large_cursor:
print(f" {p['name']}")
Top 2nd and 3rd most expensive Electronics:
$  89.99 Mechanical Keyboard
$  49.99 USB-C Hub
Estimated documents in cursor: not available until iterated
Count via collection method: 4
Iterating with batch_size=3:
Wireless Mouse
Mechanical Keyboard
Notebook A5
Standing Desk
USB-C Hub
Ballpoint Pens 10-pack
Monitor 27-inch
- `batch_size(n)` controls how many documents MongoDB sends per network round trip — useful for tuning memory usage vs latency
- The Python call order of cursor methods does not affect results — MongoDB always applies sort before skip before limit on the server
- Use `count_documents(filter)` on the collection — not on the cursor — for an accurate count of matching documents
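The fixed server-side order can be simulated in plain Python: whatever order the methods are chained, the result is as if the list were sorted first, then sliced. The helper and sample data below are illustrative, not part of PyMongo:

```python
# Simulate MongoDB's fixed evaluation order: filter -> sort -> skip -> limit.
# The chained-method call order in Python never changes this.
products = [
    {"name": "Monitor 27-inch", "price": 299.99},
    {"name": "Wireless Mouse", "price": 29.99},
    {"name": "Mechanical Keyboard", "price": 89.99},
    {"name": "USB-C Hub", "price": 49.99},
]

def run_query(docs, skip=0, limit=None, sort_key=None, reverse=False):
    """Apply sort, then skip, then limit, mirroring the server's order."""
    result = sorted(docs, key=lambda d: d[sort_key], reverse=reverse) if sort_key else list(docs)
    result = result[skip:]
    return result[:limit] if limit is not None else result

# Equivalent of .sort("price", DESCENDING).skip(1).limit(2)
page = run_query(products, skip=1, limit=2, sort_key="price", reverse=True)
print([p["name"] for p in page])   # ['Mechanical Keyboard', 'USB-C Hub']
```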
Cursor Timeout and no_cursor_timeout
Server-side cursors automatically time out after 10 minutes of inactivity. If your application processes documents slowly — reading from the cursor in a loop with expensive processing between each document — the cursor may expire before you finish iterating. Understanding this and knowing the options prevents mysterious CursorNotFound errors in production.
# Cursor timeout — understanding expiry and the no_cursor_timeout option
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Default cursor — times out after 10 minutes of inactivity
normal_cursor = db.products.find({})
# no_cursor_timeout=True — server keeps cursor alive indefinitely
# Use only when necessary — always close the cursor explicitly
persistent_cursor = db.products.find({}, no_cursor_timeout=True)
try:
for product in persistent_cursor:
# Simulate slow processing
print(f" Processing: {product['name']}")
# ... expensive operation here ...
finally:
# Always close a no_cursor_timeout cursor — otherwise it leaks on the server
persistent_cursor.close()
print("Cursor closed — server resources released")
# Context manager — automatically closes the cursor on exit
print("\nUsing cursor as context manager:")
with db.products.find({}, {"name": 1, "_id": 0}) as ctx_cursor:
for p in ctx_cursor:
print(f" {p['name']}")
# cursor closed automatically here
Processing: Wireless Mouse
Processing: Mechanical Keyboard
Processing: Notebook A5
Processing: Standing Desk
Processing: USB-C Hub
Processing: Ballpoint Pens 10-pack
Processing: Monitor 27-inch
Cursor closed — server resources released
Using cursor as context manager:
Wireless Mouse
Mechanical Keyboard
Notebook A5
Standing Desk
USB-C Hub
Ballpoint Pens 10-pack
Monitor 27-inch
- Always use a context manager (`with db.collection.find(...) as cursor`) or call `cursor.close()` explicitly when using `no_cursor_timeout=True`
- For batch processing jobs that take hours, consider fetching all `_id` values first, then querying documents in small chunks rather than holding one long-lived cursor open
- A `CursorNotFound` error at runtime almost always means the cursor timed out — re-run the query and process faster, or use `no_cursor_timeout`
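The chunking idea from the bullets above can be sketched without a live server: collect the key list once, then process fixed-size slices, each of which would be a short fresh query in real code. The `chunks` helper is a hypothetical name:

```python
# Sketch of chunked batch processing: collect keys up front, then work
# through them in small slices so no single cursor stays open for hours.
def chunks(seq, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

all_ids = list(range(1, 8))   # stand-in for [d["_id"] for d in coll.find({}, {"_id": 1})]

processed = []
for id_chunk in chunks(all_ids, 3):
    # In real code: docs = coll.find({"_id": {"$in": id_chunk}})
    # Each such query opens and fully drains a short-lived cursor.
    processed.extend(id_chunk)

print(len(processed))          # 7
print(processed == all_ids)    # True
```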
Offset-Based Pagination — skip() and limit()
Offset-based pagination uses skip(n) to jump to a page position and limit(size) to return a fixed number of documents. It is simple to implement and supports random page access — but it degrades in performance as the page number grows because MongoDB must scan all skipped documents before returning results.
# Offset-based pagination — skip() + limit()
from pymongo import MongoClient, ASCENDING
import math
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
PAGE_SIZE = 3
total_docs = db.products.count_documents({})
total_pages = math.ceil(total_docs / PAGE_SIZE)
print(f"Total products: {total_docs} | Page size: {PAGE_SIZE} | Total pages: {total_pages}")
def get_page_offset(page: int, page_size: int = PAGE_SIZE):
"""Fetch one page using skip/limit offset pagination."""
skip_n = (page - 1) * page_size
docs = list(
db.products
.find({}, {"name": 1, "price": 1, "_id": 0})
.sort("price", ASCENDING)
.skip(skip_n)
.limit(page_size)
)
return {
"page": page,
"total_pages": total_pages,
"results": docs,
"has_next": page < total_pages,
"has_prev": page > 1,
}
for page_num in range(1, total_pages + 1):
page_data = get_page_offset(page_num)
print(f"\nPage {page_data['page']} of {page_data['total_pages']} "
f"[prev:{page_data['has_prev']} next:{page_data['has_next']}]")
for p in page_data["results"]:
print(f" ${p['price']:>7.2f} {p['name']}")
Total products: 7 | Page size: 3 | Total pages: 3
Page 1 of 3 [prev:False next:True]
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
Page 2 of 3 [prev:True next:True]
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
Page 3 of 3 [prev:True next:False]
$349.99 Standing Desk
- `skip(n)` formula: `(page - 1) * page_size`
- Offset pagination supports jumping to an arbitrary page — useful for "page 47 of 200" style UIs
- Performance degrades linearly — page 1000 with page size 20 requires MongoDB to scan and discard 19,980 documents before returning 20 results
- Always combine `skip()` with a `sort()` — without a stable sort, pages may return inconsistent or overlapping results
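The offset formula deserves a quick sanity check. A tiny stand-alone helper (hypothetical name `page_skip`) makes the arithmetic explicit:

```python
# skip(n) for offset pagination: skip every document on earlier pages.
def page_skip(page: int, page_size: int) -> int:
    """Documents to skip to reach the first item of `page` (1-based)."""
    if page < 1:
        raise ValueError("pages are 1-based")
    return (page - 1) * page_size

print(page_skip(1, 3))      # 0 — the first page skips nothing
print(page_skip(2, 3))      # 3
print(page_skip(5, 10))     # 40
print(page_skip(1000, 20))  # 19980 — documents scanned and discarded
```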
Keyset Pagination — The Scalable Alternative
Keyset pagination (also called cursor-based pagination) uses the values of the last document on the current page as a filter anchor for the next page. Instead of skipping documents, it asks MongoDB "give me the next N documents after this value" — which, given an index on the sort keys, is a cheap range scan regardless of how deep into the dataset you are.
# Keyset pagination — O(1) per page using a range filter anchor
from pymongo import MongoClient, ASCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
PAGE_SIZE = 3
def get_page_keyset(last_price=None, last_id=None, page_size=PAGE_SIZE):
"""
Fetch the next page of products sorted by price ascending.
Pass price and _id of the last document from the previous page.
Returns: list of documents and the cursor values for the next page.
"""
if last_price is None:
# First page — no anchor
filter_q = {}
else:
# Subsequent pages — anchor after the last seen document
# Tie-break on _id ensures stability when multiple docs share the same price
filter_q = {"$or": [
{"price": {"$gt": last_price}},
{"price": last_price, "_id": {"$gt": last_id}}
]}
docs = list(
db.products
.find(filter_q, {"name": 1, "price": 1, "_id": 1})
.sort([("price", ASCENDING), ("_id", ASCENDING)])
.limit(page_size)
)
next_anchor = {"last_price": docs[-1]["price"], "last_id": docs[-1]["_id"]} if docs else None
return docs, next_anchor
# Page 1
page, anchor = get_page_keyset()
print("Page 1 (keyset):")
for p in page:
print(f" ${p['price']:>7.2f} {p['name']}")
# Page 2 — pass anchor from page 1
page, anchor = get_page_keyset(**anchor)
print("\nPage 2 (keyset):")
for p in page:
print(f" ${p['price']:>7.2f} {p['name']}")
# Page 3 — pass anchor from page 2
page, anchor = get_page_keyset(**anchor)
print("\nPage 3 (keyset):")
for p in page:
print(f" ${p['price']:>7.2f} {p['name']}")
print("\nNo more pages" if anchor and db.products.count_documents(
{"$or": [{"price": {"$gt": anchor["last_price"]}},
{"price": anchor["last_price"], "_id": {"$gt": anchor["last_id"]}}]}
) == 0 else "More pages may exist")
Page 1 (keyset):
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
Page 2 (keyset):
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
Page 3 (keyset):
$349.99 Standing Desk
No more pages
- Each keyset page query is a simple range scan — performance is constant regardless of how deep into the dataset you are
- Always include a secondary sort and filter on `_id` as a tie-breaker — without it, documents with equal sort values can be duplicated or skipped across pages
- Keyset pagination cannot jump to arbitrary page numbers — it can only move forward (or backward with a reversed sort). Use offset pagination when random page access is required
- Pass the anchor values in the API response so the client can request the next page without knowing about database internals
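The tie-breaker rule can be verified with plain Python over an in-memory list: with duplicate prices, paging on the `(price, _id)` pair visits every document exactly once. The data and helper below are illustrative:

```python
# Keyset paging on (price, _id) — duplicate prices are handled by the _id tie-break.
docs = [
    {"_id": 1, "price": 4.99}, {"_id": 2, "price": 4.99},
    {"_id": 3, "price": 4.99}, {"_id": 4, "price": 29.99},
    {"_id": 5, "price": 49.99},
]
ordered = sorted(docs, key=lambda d: (d["price"], d["_id"]))

def next_page(anchor, size=2):
    """Return up to `size` docs strictly after `anchor`, a (price, _id) pair."""
    if anchor is None:
        matching = ordered
    else:
        # Tuple comparison plays the role of the $or range filter
        matching = [d for d in ordered if (d["price"], d["_id"]) > anchor]
    return matching[:size]

seen, anchor = [], None
while True:
    page = next_page(anchor)
    if not page:
        break
    seen.extend(d["_id"] for d in page)
    anchor = (page[-1]["price"], page[-1]["_id"])

print(seen)   # [1, 2, 3, 4, 5] — every doc exactly once, no dupes, no gaps
```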
Offset vs Keyset — Choosing the Right Strategy
# Offset vs keyset — choosing the right pagination strategy
comparison = {
"Feature": [
"Implementation complexity",
"Performance at page 1",
"Performance at page 1000",
"Random page access (jump to page N)",
"Stable results during concurrent writes",
"Best for",
],
"Offset (skip + limit)": [
"Simple",
"Fast",
"Slow — scans all skipped docs",
"Yes — trivial",
"No — inserts shift page boundaries",
"Admin UIs, small datasets, known total pages",
],
"Keyset (range filter)": [
"Moderate",
"Fast",
"Fast — always O(1) index scan",
"No — forward only",
"Yes — anchor is stable",
"APIs, infinite scroll, large datasets",
],
}
header = f"{'Feature':40} {'Offset':35} {'Keyset':35}"
print(header)
print("-" * len(header))
for i, feature in enumerate(comparison["Feature"]):
offset = comparison["Offset (skip + limit)"][i]
keyset = comparison["Keyset (range filter)"][i]
print(f"{feature:40} {offset:35} {keyset:35}")
Feature                                  Offset                              Keyset
-----------------------------------------------------------------------------------------------
Implementation complexity Simple Moderate
Performance at page 1 Fast Fast
Performance at page 1000 Slow — scans all skipped docs Fast — always O(1) index scan
Random page access (jump to page N) Yes — trivial No — forward only
Stable results during concurrent writes No — inserts shift page boundaries Yes — anchor is stable
Best for Admin UIs, small datasets, known total pages APIs, infinite scroll, large datasets
- For public-facing APIs and infinite scroll UIs, always use keyset pagination — consistency and performance at scale matter more than random page access
- For internal admin dashboards with small datasets and a need to jump to page N, offset pagination is simpler and perfectly adequate
- A hybrid approach works well in practice — keyset for forward navigation, with an occasional offset query for the total page count display
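The hybrid idea amounts to a response envelope that carries both a keyset anchor and an occasionally refreshed total count. Below is a sketch of such an envelope; all field names are illustrative choices, not any standard:

```python
# Hypothetical API response envelope combining keyset navigation with a total count.
def build_page_response(docs, total_count, page_size):
    """Package one keyset page; `next_anchor` is None when the page is not full."""
    full_page = len(docs) == page_size
    return {
        "results": [{"name": d["name"], "price": d["price"]} for d in docs],
        "next_anchor": (
            {"last_price": docs[-1]["price"], "last_id": docs[-1]["_id"]}
            if docs and full_page else None
        ),
        # Refreshed by a periodic count_documents() call, not on every request
        "total_count": total_count,
    }

page = [
    {"_id": 10, "name": "Wireless Mouse", "price": 29.99},
    {"_id": 11, "name": "USB-C Hub", "price": 49.99},
]
resp = build_page_response(page, total_count=7, page_size=2)
print(resp["next_anchor"])   # {'last_price': 49.99, 'last_id': 11}
print(resp["total_count"])   # 7
```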
Summary Table
| Concept | What It Is | Key Behaviour | Watch Out For |
|---|---|---|---|
| Cursor | Lazy server-side iterator | Streams batches on demand | Exhausted after one pass — single-use |
| Cursor timeout | Server expires idle cursors | Default 10 minutes idle | Close no_cursor_timeout cursors explicitly |
| batch_size(n) | Docs per network batch | First batch: 101 docs or ~1 MB; later batches up to 16 MB | Large batches use more memory |
| Offset pagination | skip() + limit() | Supports random page access | Slow for deep pages — scans skipped docs |
| Keyset pagination | Range filter on last value | O(1) per page — always fast | Forward-only — no random page jump |
Practice Questions
Practice 1. What happens when you try to iterate a PyMongo cursor a second time after it has been fully consumed?
Practice 2. What error appears in production when a cursor times out during slow iteration, and how do you prevent it?
Practice 3. What is the skip value for fetching page 5 with a page size of 10?
Practice 4. Why must a tie-breaker field like _id be included in both the sort and the keyset filter anchor?
Practice 5. In what situation should you choose offset pagination over keyset pagination?
Quiz
Quiz 1. What is the default first-batch size MongoDB sends when opening a cursor?
Quiz 2. What is the key performance problem with skip()-based pagination on large collections?
Quiz 3. What happens to concurrent write consistency with offset pagination when a new document is inserted between page requests?
Quiz 4. What must you always do when using a cursor opened with no_cursor_timeout=True?
Quiz 5. Why is keyset pagination described as O(1) per page regardless of depth?
Next up — Write & Read Concern: Controlling durability and consistency guarantees on every MongoDB operation.