MongoDB Lesson 21 – Cursor & Pagination | Dataplexa

Cursors and Pagination

When MongoDB executes a find() query it does not immediately send every matching document to your application. Instead it returns a cursor — a server-side pointer to the result set that streams documents in batches on demand. Understanding how cursors work, how they expire, and how to use them efficiently is fundamental to writing applications that remain fast and memory-safe regardless of collection size. Pagination — presenting large result sets in manageable pages — builds directly on cursor behaviour and has two main strategies: offset-based pagination with skip() and limit(), and keyset (cursor-based) pagination using range filters. This lesson covers both in depth, with all examples running against the Dataplexa Store dataset.

How Cursors Work

A cursor is a lazy, server-side iterator. It holds the query plan and the current position in the result set. Documents are fetched from the server in batches — by default the first batch contains 101 documents (or just enough documents to exceed 1 MB), and each subsequent batch is capped at 16 MB — and your application iterates over them without loading the entire result set into memory at once.

# Cursor fundamentals — lazy evaluation and batch fetching

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# find() returns a cursor immediately — no network round trip yet
cursor = db.products.find({})
print("Type of cursor:", type(cursor).__name__)
print("Query not executed yet — cursor is lazy")

# Iteration triggers the actual query execution
print("\nIterating cursor — query executes now:")
for product in cursor:
    print(f"  {product['name']}")

# A cursor can only be iterated once
print("\nTrying to iterate again:")
second_pass = list(cursor)   # cursor is exhausted
print(f"  Documents from second pass: {len(second_pass)}")
print("  Cursor exhausted — re-run the query to iterate again")

# Re-run the query for a fresh cursor
fresh_cursor = db.products.find({}, {"name": 1, "_id": 0})
all_products = list(fresh_cursor)
print(f"\nFresh cursor — {len(all_products)} products loaded into memory")
Type of cursor: Cursor
Query not executed yet — cursor is lazy

Iterating cursor — query executes now:
  Wireless Mouse
  Mechanical Keyboard
  Notebook A5
  Standing Desk
  USB-C Hub
  Ballpoint Pens 10-pack
  Monitor 27-inch

Trying to iterate again:
  Documents from second pass: 0
  Cursor exhausted — re-run the query to iterate again

Fresh cursor — 7 products loaded into memory
  • A cursor is exhausted after one full iteration — it cannot be rewound. Re-run the query to get a fresh cursor
  • Never call list(cursor) on a query that could return millions of documents — it loads everything into memory and will cause out-of-memory errors
  • Use a for loop to iterate a cursor — documents are fetched in batches and the previous batch is garbage-collected as you move through the results
  • Server-side cursors time out after 10 minutes of inactivity by default — long-running iterations should use no_cursor_timeout=True with caution
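The single-use behaviour in the first bullet mirrors a plain Python generator — a minimal in-memory analogy, with no MongoDB connection required (fake_cursor is an illustrative name, not a PyMongo API):

```python
# A plain generator behaves like a lazy, single-use cursor
def fake_cursor(docs):
    """Yield documents one at a time, as a server-side cursor streams them."""
    for doc in docs:
        yield doc

cur = fake_cursor([{"name": "Wireless Mouse"}, {"name": "USB-C Hub"}])

first_pass  = [d["name"] for d in cur]   # consumes the generator fully
second_pass = list(cur)                  # already exhausted — yields nothing

print(first_pass)    # ['Wireless Mouse', 'USB-C Hub']
print(second_pass)   # []
```

Just like a real cursor, the only way to iterate again is to build a fresh generator — there is no rewind.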

Cursor Methods and Chaining

Cursor methods modify the query before execution. Because cursors are lazy, you can chain as many methods as needed before the first document is fetched. The server applies all modifications together in a fixed order: filter → sort → skip → limit — regardless of the Python call order.

# Cursor methods — chaining before execution

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Full method chain — none execute until iteration begins
cursor = (
    db.products
    .find({"category": "Electronics"}, {"name": 1, "price": 1, "_id": 0})
    .sort("price", DESCENDING)
    .skip(1)
    .limit(2)
)

print("Top 2nd and 3rd most expensive Electronics:")
for p in cursor:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Counting — ask the collection, not the cursor
cursor2 = db.orders.find({"status": "delivered"})
# A cursor cannot report its result count before iteration — use count_documents()
print(f"\nCount via collection method: {db.orders.count_documents({'status': 'delivered'})}")

# batch_size — control how many documents per network round trip
large_cursor = db.products.find({}).batch_size(3)
print("\nIterating with batch_size=3:")
for p in large_cursor:
    print(f"  {p['name']}")
Top 2nd and 3rd most expensive Electronics:
  $ 299.99  Monitor 27-inch
  $  89.99  Mechanical Keyboard

Count via collection method: 4

Iterating with batch_size=3:
  Wireless Mouse
  Mechanical Keyboard
  Notebook A5
  Standing Desk
  USB-C Hub
  Ballpoint Pens 10-pack
  Monitor 27-inch
  • batch_size(n) controls how many documents MongoDB sends per network round trip — useful for tuning memory usage vs latency
  • The Python call order of cursor methods does not affect results — MongoDB always applies sort before skip before limit on the server
  • Use count_documents(filter) on the collection — not on the cursor — for an accurate count of matching documents
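The fixed server-side order can be emulated in plain Python — a sketch over in-memory data (no MongoDB needed; run_query is an illustrative helper, and the product list is a stand-in for the Store dataset) showing that filter → sort → skip → limit always runs in that order no matter how the calls were chained:

```python
# Emulate the server's fixed pipeline order: filter → sort → skip → limit
def run_query(docs, filter_fn, sort_key, skip_n, limit_n):
    """Apply the four stages in the order MongoDB uses on the server."""
    matched = [d for d in docs if filter_fn(d)]            # 1. filter
    ordered = sorted(matched, key=sort_key, reverse=True)  # 2. sort (descending)
    return ordered[skip_n:skip_n + limit_n]                # 3. skip, 4. limit

products = [
    {"name": "Monitor 27-inch",     "price": 299.99, "category": "Electronics"},
    {"name": "Mechanical Keyboard", "price": 89.99,  "category": "Electronics"},
    {"name": "USB-C Hub",           "price": 49.99,  "category": "Electronics"},
    {"name": "Notebook A5",         "price": 4.99,   "category": "Stationery"},
]

page = run_query(products,
                 filter_fn=lambda d: d["category"] == "Electronics",
                 sort_key=lambda d: d["price"],
                 skip_n=1, limit_n=2)
print([d["name"] for d in page])   # ['Mechanical Keyboard', 'USB-C Hub']
```

Swapping the order of the slicing and sorting steps would change the result — which is exactly why MongoDB pins the order server-side instead of honouring the Python call order.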

Cursor Timeout and no_cursor_timeout

Server-side cursors automatically time out after 10 minutes of inactivity. If your application processes documents slowly — reading from the cursor in a loop with expensive processing between each document — the cursor may expire before you finish iterating. Understanding this and knowing the options prevents mysterious CursorNotFound errors in production.

# Cursor timeout — understanding expiry and the no_cursor_timeout option

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Default cursor — times out after 10 minutes of inactivity
normal_cursor = db.products.find({})

# no_cursor_timeout=True — server keeps cursor alive indefinitely
# Use only when necessary — always close the cursor explicitly
persistent_cursor = db.products.find({}, no_cursor_timeout=True)

try:
    for product in persistent_cursor:
        # Simulate slow processing
        print(f"  Processing: {product['name']}")
        # ... expensive operation here ...
finally:
    # Always close a no_cursor_timeout cursor — otherwise it leaks on the server
    persistent_cursor.close()
    print("Cursor closed — server resources released")

# Context manager — automatically closes the cursor on exit
print("\nUsing cursor as context manager:")
with db.products.find({}, {"name": 1, "_id": 0}) as ctx_cursor:
    for p in ctx_cursor:
        print(f"  {p['name']}")
# cursor closed automatically here
  Processing: Wireless Mouse
  Processing: Mechanical Keyboard
  Processing: Notebook A5
  Processing: Standing Desk
  Processing: USB-C Hub
  Processing: Ballpoint Pens 10-pack
  Processing: Monitor 27-inch
Cursor closed — server resources released

Using cursor as context manager:
  Wireless Mouse
  Mechanical Keyboard
  Notebook A5
  Standing Desk
  USB-C Hub
  Ballpoint Pens 10-pack
  Monitor 27-inch
  • Always use a context manager (with db.collection.find() as cursor) or call cursor.close() explicitly when using no_cursor_timeout=True
  • For batch processing jobs that take hours, consider fetching all _id values first, then querying documents in small chunks rather than holding one long-lived cursor open
  • A CursorNotFound error at runtime almost always means the cursor timed out — re-run the query and process faster, or use no_cursor_timeout
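The chunked-_id pattern from the second bullet can be sketched as follows. The chunks() helper is pure Python and runs as-is; the commented lines show how the chunks would feed $in queries against the collections from this lesson (process() is a hypothetical callback):

```python
# Split a list of _id values into fixed-size chunks for batch processing
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# In a real job you would first collect the ids with a quick projection query:
#   all_ids = [d["_id"] for d in db.products.find({}, {"_id": 1})]
# ...then process each chunk with a short-lived cursor that cannot time out
# mid-job:
#   for id_chunk in chunks(all_ids, 500):
#       for doc in db.products.find({"_id": {"$in": id_chunk}}):
#           process(doc)

all_ids = list(range(7))                 # stand-in for 7 product ids
batches = list(chunks(all_ids, 3))
print(batches)   # [[0, 1, 2], [3, 4, 5], [6]]
```

Each inner query opens and fully drains a fresh cursor in seconds, so the 10-minute idle timeout never comes into play — no no_cursor_timeout needed.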

Offset-Based Pagination — skip() and limit()

Offset-based pagination uses skip(n) to jump to a page position and limit(size) to return a fixed number of documents. It is simple to implement and supports random page access — but it degrades in performance as the page number grows because MongoDB must scan all skipped documents before returning results.

# Offset-based pagination — skip() + limit()

from pymongo import MongoClient, ASCENDING
import math

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

PAGE_SIZE  = 3
total_docs = db.products.count_documents({})
total_pages = math.ceil(total_docs / PAGE_SIZE)

print(f"Total products: {total_docs}  |  Page size: {PAGE_SIZE}  |  Total pages: {total_pages}")

def get_page_offset(page: int, page_size: int = PAGE_SIZE):
    """Fetch one page using skip/limit offset pagination."""
    skip_n = (page - 1) * page_size
    docs   = list(
        db.products
        .find({}, {"name": 1, "price": 1, "_id": 0})
        .sort("price", ASCENDING)
        .skip(skip_n)
        .limit(page_size)
    )
    return {
        "page":       page,
        "total_pages": total_pages,
        "results":    docs,
        "has_next":   page < total_pages,
        "has_prev":   page > 1,
    }

for page_num in range(1, total_pages + 1):
    page_data = get_page_offset(page_num)
    print(f"\nPage {page_data['page']} of {page_data['total_pages']}  "
          f"[prev:{page_data['has_prev']} next:{page_data['has_next']}]")
    for p in page_data["results"]:
        print(f"  ${p['price']:>7.2f}  {p['name']}")
Total products: 7  |  Page size: 3  |  Total pages: 3

Page 1 of 3  [prev:False next:True]
  $   3.49  Ballpoint Pens 10-pack
  $   4.99  Notebook A5
  $  29.99  Wireless Mouse

Page 2 of 3  [prev:True next:True]
  $  49.99  USB-C Hub
  $  89.99  Mechanical Keyboard
  $ 299.99  Monitor 27-inch

Page 3 of 3  [prev:True next:False]
  $ 349.99  Standing Desk
  • skip(n) formula: (page - 1) * page_size
  • Offset pagination supports jumping to an arbitrary page — useful for "page 47 of 200" style UIs
  • Performance degrades linearly — page 1000 with page size 20 requires MongoDB to scan and discard 19,980 documents before returning 20 results
  • Always combine skip() with a sort() — without a stable sort, pages may return inconsistent or overlapping results
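The cost claim in the third bullet is plain arithmetic — a small helper (page_window is an illustrative name, not a PyMongo API) that computes the skip value and the number of documents the server must touch for a given page:

```python
# Offset-pagination arithmetic: skip value and documents the server must scan
def page_window(page, page_size):
    """Return the skip/limit pair and the total docs MongoDB touches."""
    skip_n = (page - 1) * page_size          # docs scanned and discarded
    return {"skip": skip_n, "limit": page_size,
            "docs_touched": skip_n + page_size}

print(page_window(1, 20))      # {'skip': 0, 'limit': 20, 'docs_touched': 20}
print(page_window(1000, 20))   # {'skip': 19980, 'limit': 20, 'docs_touched': 20000}
```

Page 1000 makes the server walk 20,000 documents to return 20 — the linear degradation that keyset pagination avoids.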

Keyset Pagination — The Scalable Alternative

Keyset pagination (also called cursor-based pagination) uses the value of the last document on the current page as a filter anchor for the next page. Instead of skipping documents, it asks MongoDB "give me the next N documents after this value" — which is always an index-backed range scan regardless of how deep into the dataset you are.

# Keyset pagination — constant cost per page using a range-filter anchor

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

PAGE_SIZE = 3

def get_page_keyset(last_price=None, last_id=None, page_size=PAGE_SIZE):
    """
    Fetch the next page of products sorted by price ascending.
    Pass price and _id of the last document from the previous page.
    Returns: list of documents and the cursor values for the next page.
    """
    if last_price is None:
        # First page — no anchor
        filter_q = {}
    else:
        # Subsequent pages — anchor after the last seen document
        # Tie-break on _id ensures stability when multiple docs share the same price
        filter_q = {"$or": [
            {"price": {"$gt": last_price}},
            {"price": last_price, "_id": {"$gt": last_id}}
        ]}

    docs = list(
        db.products
        .find(filter_q, {"name": 1, "price": 1, "_id": 1})
        .sort([("price", ASCENDING), ("_id", ASCENDING)])
        .limit(page_size)
    )

    next_anchor = {"last_price": docs[-1]["price"], "last_id": docs[-1]["_id"]} if docs else None
    return docs, next_anchor

# Page 1
page, anchor = get_page_keyset()
print("Page 1 (keyset):")
for p in page:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Page 2 — pass anchor from page 1
page, anchor = get_page_keyset(**anchor)
print("\nPage 2 (keyset):")
for p in page:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Page 3 — pass anchor from page 2
page, anchor = get_page_keyset(**anchor)
print("\nPage 3 (keyset):")
for p in page:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# After the last page, check whether any documents remain past the anchor
remaining = db.products.count_documents(
    {"$or": [{"price": {"$gt": anchor["last_price"]}},
             {"price": anchor["last_price"], "_id": {"$gt": anchor["last_id"]}}]}
) if anchor else 0
print("\nNo more pages" if remaining == 0 else "\nMore pages may exist")
Page 1 (keyset):
  $   3.49  Ballpoint Pens 10-pack
  $   4.99  Notebook A5
  $  29.99  Wireless Mouse

Page 2 (keyset):
  $  49.99  USB-C Hub
  $  89.99  Mechanical Keyboard
  $ 299.99  Monitor 27-inch

Page 3 (keyset):
  $ 349.99  Standing Desk

No more pages
  • Each keyset page query is a simple range scan — performance is constant regardless of how deep into the dataset you are
  • Always include a secondary sort and filter on _id as a tie-breaker — without it, documents with equal sort values can be duplicated or skipped across pages
  • Keyset pagination cannot jump to arbitrary page numbers — it can only move forward (or backward with reversed sort). Use offset pagination when random page access is required
  • Pass the anchor values in the API response so the client can request the next page without knowing about database internals
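One common way to hand the anchor to API clients is an opaque token — a sketch using base64-encoded JSON (encode_anchor/decode_anchor are illustrative names; a real implementation would serialise the ObjectId as a string exactly as done here, and should validate the token on the way back in):

```python
import base64
import json

# Pack a keyset anchor into an opaque, URL-safe pagination token
def encode_anchor(last_price, last_id):
    """Serialise the anchor so the client can echo it back untouched."""
    payload = json.dumps({"p": last_price, "i": str(last_id)})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_anchor(token):
    """Recover the anchor values from a client-supplied token."""
    data = json.loads(base64.urlsafe_b64decode(token.encode()))
    return data["p"], data["i"]

token = encode_anchor(89.99, "665f1c2a9d4e3b0012ab34cd")
price, doc_id = decode_anchor(token)
print(price, doc_id)   # 89.99 665f1c2a9d4e3b0012ab34cd
```

The client sees only an opaque next_token string, so the sort field and tie-breaker can change server-side without breaking API consumers.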

Offset vs Keyset — Choosing the Right Strategy

# Offset vs keyset — choosing the right pagination strategy

comparison = {
    "Feature": [
        "Implementation complexity",
        "Performance at page 1",
        "Performance at page 1000",
        "Random page access (jump to page N)",
        "Stable results during concurrent writes",
        "Best for",
    ],
    "Offset (skip + limit)": [
        "Simple",
        "Fast",
        "Slow — scans all skipped docs",
        "Yes — trivial",
        "No — inserts shift page boundaries",
        "Admin UIs, small datasets, known total pages",
    ],
    "Keyset (range filter)": [
        "Moderate",
        "Fast",
        "Fast — index-backed range scan",
        "No — forward only",
        "Yes — anchor is stable",
        "APIs, infinite scroll, large datasets",
    ],
}

header = f"{'Feature':40} {'Offset':35} {'Keyset':35}"
print(header)
print("-" * len(header))
for i, feature in enumerate(comparison["Feature"]):
    offset = comparison["Offset (skip + limit)"][i]
    keyset = comparison["Keyset (range filter)"][i]
    print(f"{feature:40} {offset:35} {keyset:35}")
Feature                                  Offset                              Keyset
----------------------------------------------------------------------------------------------------------------
Implementation complexity                Simple                              Moderate
Performance at page 1                    Fast                                Fast
Performance at page 1000                 Slow — scans all skipped docs       Fast — index-backed range scan
Random page access (jump to page N)      Yes — trivial                       No — forward only
Stable results during concurrent writes  No — inserts shift page boundaries  Yes — anchor is stable
Best for                                 Admin UIs, small datasets, known total pages  APIs, infinite scroll, large datasets
  • For public-facing APIs and infinite scroll UIs, always use keyset pagination — consistency and performance at scale matter more than random page access
  • For internal admin dashboards with small datasets and a need to jump to page N, offset pagination is simpler and perfectly adequate
  • A hybrid approach works well in practice — keyset for forward navigation, with an occasional offset query for the total page count display
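The consistency row of the table can be demonstrated with an in-memory simulation — a sketch where a sorted list of prices stands in for the collection, and an insert lands between two page requests:

```python
# Simulate a concurrent insert between page 1 and page 2
docs = sorted([3.49, 4.99, 29.99, 49.99, 89.99, 299.99, 349.99])
PAGE = 3

offset_page1 = docs[0:PAGE]          # [3.49, 4.99, 29.99]
keyset_last  = offset_page1[-1]      # keyset anchor: last value seen

docs = sorted(docs + [1.99])         # a cheap product is inserted concurrently

offset_page2 = docs[PAGE:2 * PAGE]                           # boundaries shifted
keyset_page2 = [d for d in docs if d > keyset_last][:PAGE]   # anchored range

print(offset_page2)   # [29.99, 49.99, 89.99] — 29.99 shown again on page 2
print(keyset_page2)   # [49.99, 89.99, 299.99] — no duplicates
```

The insert shifts every offset boundary, so offset page 2 repeats a document from page 1; the keyset anchor is a value, not a position, so its page 2 is unaffected.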

Summary Table

Concept             What It Is                    Key Behaviour                    Watch Out For
Cursor              Lazy server-side iterator     Streams batches on demand        Exhausted after one pass — single-use
Cursor timeout      Server expires idle cursors   Default 10 minutes idle          Close no_cursor_timeout cursors explicitly
batch_size(n)       Docs per network batch        First batch: 101 docs or ~1 MB   Large batches use more memory
Offset pagination   skip() + limit()              Supports random page access      Slow for deep pages — scans skipped docs
Keyset pagination   Range filter on last value    Constant cost per page           Forward-only — no random page jump

Practice Questions

Practice 1. What happens when you try to iterate a PyMongo cursor a second time after it has been fully consumed?



Practice 2. What error appears in production when a cursor times out during slow iteration, and how do you prevent it?



Practice 3. What is the skip value for fetching page 5 with a page size of 10?



Practice 4. Why must a tie-breaker field like _id be included in both the sort and the keyset filter anchor?



Practice 5. In what situation should you choose offset pagination over keyset pagination?



Quiz

Quiz 1. What is the default first-batch size MongoDB sends when opening a cursor?






Quiz 2. What is the key performance problem with skip()-based pagination on large collections?






Quiz 3. What happens to concurrent write consistency with offset pagination when a new document is inserted between page requests?






Quiz 4. What must you always do when using a cursor opened with no_cursor_timeout=True?






Quiz 5. Why is keyset pagination described as O(1) per page regardless of depth?






Next up — Write & Read Concern: Controlling durability and consistency guarantees on every MongoDB operation.