Mango DBLesson 15 – | Dataplexa

Sorting & Limiting

Retrieving documents in a useful order and capping the number returned are two of the most fundamental query controls in MongoDB. Without sorting, documents come back in natural insertion order — fine for debugging, useless for a product listing or a leaderboard. Without limiting, a single query against a large collection can return millions of documents and bring your application to its knees. sort(), limit(), and skip() work as a chain on a cursor and compose cleanly with filters and projections to build precise, efficient read operations.

This lesson goes deeper than the brief introduction in Lesson 12 — covering multi-key sorting, sort stability, index-backed sorts, cursor method chaining, and the performance implications of each approach, all using the Dataplexa Store dataset.

sort() — Ordering Results

sort() orders the documents returned by a query. Pass a field name and a direction: 1 for ascending (A→Z, 0→9, oldest→newest), -1 for descending (Z→A, 9→0, newest→oldest). You can import the constants ASCENDING and DESCENDING from PyMongo to make code more readable.

Why it matters: users expect product catalogues sorted by price, activity feeds sorted by recency, and leaderboards sorted by score. Sorting in the database is always faster than fetching all documents and sorting in Python.

# sort() — ascending and descending on a single field

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Ascending — cheapest products first
print("Products cheapest first:")
results = db.products.find(
    {}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING)

for p in results:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Descending — highest-rated products first
print("\nProducts highest-rated first:")
results = db.products.find(
    {}, {"name": 1, "rating": 1, "_id": 0}
).sort("rating", DESCENDING)

for p in results:
    print(f"  {p['rating']}  {p['name']}")
Products cheapest first:
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
$349.99 Standing Desk

Products highest-rated first:
4.8 Standing Desk
4.7 Mechanical Keyboard
4.6 Monitor 27-inch
4.5 Wireless Mouse
4.3 USB-C Hub
4.2 Notebook A5
4.0 Ballpoint Pens 10-pack
  • Use ASCENDING and DESCENDING constants — more readable than raw 1 and -1
  • Sorting without an index on the sort field forces MongoDB to load all matching documents into memory and sort them there — always index high-traffic sort fields
  • String sorting follows UTF-8 byte order by default — case-sensitive and locale-dependent

Multi-Key Sorting

Pass a list of (field, direction) tuples to sort by multiple fields. MongoDB applies the sort fields left to right — the second field only comes into play when two documents have equal values in the first field.

# Multi-key sort — sort by category first, then by price within each category

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Category A→Z, then price low→high within each category
print("Products by category (A-Z), then price (low-high):")
results = db.products.find(
    {}, {"name": 1, "category": 1, "price": 1, "_id": 0}
).sort([
    ("category", ASCENDING),
    ("price",    ASCENDING)
])

current_cat = None
for p in results:
    if p["category"] != current_cat:
        current_cat = p["category"]
        print(f"\n  [{current_cat}]")
    print(f"    ${p['price']:>7.2f}  {p['name']}")

# Orders by status A→Z, then total highest→lowest within status
print("\n\nOrders by status (A-Z), then total (high-low):")
results = db.orders.find(
    {}, {"_id": 1, "status": 1, "total": 1}
).sort([
    ("status", ASCENDING),
    ("total",  DESCENDING)
])

for o in results:
    print(f"  {o['_id']}  {o['status']:12}  ${o['total']:.2f}")
Products by category (A-Z), then price (low-high):

[Electronics]
$ 29.99 Wireless Mouse
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch

[Furniture]
$349.99 Standing Desk

[Stationery]
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5

Orders by status (A-Z), then total (high-low):
o006 cancelled $89.99
o001 delivered $44.96
o003 delivered $99.98
o005 delivered $329.98
o007 delivered $11.97
o004 processing $349.99
o002 shipped $89.99
  • Multi-key sort requires a list of tuples — a plain dict does not preserve order reliably in older Python versions
  • The sort is applied in order: first by the first field, then by the second field for ties, and so on
  • A compound index on (category, price) makes this sort index-backed and very fast — both field order and direction should match the index

limit() — Capping the Result Count

limit(n) tells MongoDB to stop returning documents after n have been sent. It reduces both network transfer and server-side work — MongoDB stops scanning as soon as the limit is reached (especially when combined with an index).

# limit() — cap results to the top N documents

from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Top 3 most expensive products
print("Top 3 most expensive products:")
results = db.products.find(
    {}, {"name": 1, "price": 1, "_id": 0}
).sort("price", DESCENDING).limit(3)

for p in results:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Top 3 highest-rated Electronics
print("\nTop 3 highest-rated Electronics:")
results = db.products.find(
    {"category": "Electronics"},
    {"name": 1, "rating": 1, "_id": 0}
).sort("rating", DESCENDING).limit(3)

for p in results:
    print(f"  rating: {p['rating']}  {p['name']}")

# Most recent 2 delivered orders (sorted by date descending)
print("\n2 most recent delivered orders:")
results = db.orders.find(
    {"status": "delivered"},
    {"_id": 1, "date": 1, "total": 1}
).sort("date", DESCENDING).limit(2)

for o in results:
    print(f"  {o['_id']}  date: {o['date']}  ${o['total']:.2f}")
Top 3 most expensive products:
$349.99 Standing Desk
$299.99 Monitor 27-inch
$ 89.99 Mechanical Keyboard

Top 3 highest-rated Electronics:
rating: 4.7 Mechanical Keyboard
rating: 4.6 Monitor 27-inch
rating: 4.5 Wireless Mouse

2 most recent delivered orders:
o007 date: 2024-04-01 $11.97
o005 date: 2024-03-10 $329.98
  • Always pair limit() with sort() — without a stable sort the returned documents are non-deterministic
  • limit(0) is equivalent to no limit — all documents are returned
  • MongoDB stops scanning immediately when the limit is reached if the sort field is indexed — making top-N queries very efficient

skip() — Offsetting the Start Position

skip(n) skips over the first n documents before starting to return results. Combined with limit() it implements offset-based pagination — the most common approach for "previous / next page" UIs.

# skip() + limit() — offset-based pagination

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

PAGE_SIZE = 3

def get_products_page(page: int):
    """Return one page of products sorted by price ascending."""
    skip_count = (page - 1) * PAGE_SIZE
    cursor = db.products.find(
        {}, {"name": 1, "price": 1, "_id": 0}
    ).sort("price", ASCENDING).skip(skip_count).limit(PAGE_SIZE)
    return list(cursor)

# Page 1
print("Page 1:")
for p in get_products_page(1):
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Page 2
print("\nPage 2:")
for p in get_products_page(2):
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Page 3 — only 1 product left
print("\nPage 3:")
for p in get_products_page(3):
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Total pages calculation
total     = db.products.count_documents({})
import math
total_pages = math.ceil(total / PAGE_SIZE)
print(f"\nTotal products: {total}  |  Page size: {PAGE_SIZE}  |  Total pages: {total_pages}")
Page 1:
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse

Page 2:
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch

Page 3:
$349.99 Standing Desk

Total products: 7 | Page size: 3 | Total pages: 3
  • Skip formula: skip = (page_number - 1) * page_size
  • skip() is inefficient for very large offsets — MongoDB still scans all the skipped documents internally
  • For collections with thousands of pages, use keyset (cursor-based) pagination instead of skip()

Keyset Pagination — The Scalable Alternative to skip()

Keyset pagination uses the last seen value of the sort field as the starting point for the next page — instead of skipping. This is O(1) per page regardless of how deep into the dataset you are, whereas skip() gets slower with every page.

# Keyset pagination — O(1) per page, scales to millions of documents

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

PAGE_SIZE = 3

def get_next_page(last_price=None, last_id=None):
    """
    Fetch the next page of products sorted by price ascending.
    Pass the price and _id of the last document on the previous page.
    """
    if last_price is None:
        # First page — no previous cursor
        filter_query = {}
    else:
        # Subsequent pages — start after the last seen document
        filter_query = {"$or": [
            {"price": {"$gt": last_price}},
            {"price": last_price, "_id": {"$gt": last_id}}
        ]}

    return list(
        db.products.find(
            filter_query,
            {"name": 1, "price": 1, "_id": 1}
        ).sort([("price", ASCENDING), ("_id", ASCENDING)]).limit(PAGE_SIZE)
    )

# Page 1
page = get_next_page()
print("Page 1 (keyset):")
for p in page:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Page 2 — pass last price and _id from page 1
last = page[-1]
page = get_next_page(last_price=last["price"], last_id=last["_id"])
print("\nPage 2 (keyset):")
for p in page:
    print(f"  ${p['price']:>7.2f}  {p['name']}")
Page 1 (keyset):
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse

Page 2 (keyset):
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
  • Keyset pagination is always index-backed — each page query uses a range scan, not a full scan with offset
  • The _id tie-breaker in the filter ensures stable pagination when multiple documents share the same sort value
  • The trade-off: keyset pagination cannot jump to an arbitrary page number — it can only go forward from the current position

Cursor Method Chaining

All cursor methods — sort(), limit(), skip(), projection — chain together and the order in which you call them in Python does not affect the query execution order. MongoDB always processes filter → sort → skip → limit on the server, regardless of the Python call order.

# Cursor method chaining — order of calls does not affect execution order

from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# These two are identical — MongoDB executes filter → sort → skip → limit
# regardless of Python call order

query_a = db.products.find(
    {"category": "Electronics"},
    {"name": 1, "price": 1, "_id": 0}
).sort("price", DESCENDING).skip(1).limit(2)

query_b = db.products.find(
    {"category": "Electronics"},
    {"name": 1, "price": 1, "_id": 0}
).limit(2).skip(1).sort("price", DESCENDING)   # same result as query_a

print("Query A and B produce identical results:")
for p in query_a:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Full chain — filter + projection + sort + skip + limit
full_chain = (
    db.orders
    .find({"status": "delivered"}, {"_id": 1, "total": 1})
    .sort("total", DESCENDING)
    .skip(1)
    .limit(2)
)
print("\nDelivered orders — skip 1, next 2, sorted by total desc:")
for o in full_chain:
    print(f"  {o['_id']}  ${o['total']:.2f}")
Query A and B produce identical results:
$ 89.99 Mechanical Keyboard
$ 49.99 USB-C Hub

Delivered orders — skip 1, next 2, sorted by total desc:
o003 $99.98
o001 $44.96
  • Python call order on cursor methods does not matter — MongoDB always applies operations in its own fixed execution order
  • Write chains in the logical reading order: find().sort().skip().limit() — it is the clearest and most conventional
  • Each cursor method returns the cursor itself — allowing unlimited chaining

Summary Table

Method Purpose Key Detail Performance Note
sort(field, dir) Order results 1 = ASC, -1 = DESC Index the sort field for large collections
sort([(f,d), ...]) Multi-field sort List of tuples, applied left to right Compound index should match field order
limit(n) Cap result count Stops scanning at n docs Very efficient with an index
skip(n) Offset start position Always pair with sort Slow for large offsets — use keyset instead
Keyset pagination Scalable next-page Filter by last seen value O(1) per page — always index-backed

Practice Questions

Practice 1. Write the PyMongo cursor chain to return the top 3 products by rating descending, showing only name and rating.



Practice 2. What is the skip value for fetching page 4 with a page size of 5?



Practice 3. Why is keyset pagination faster than skip()-based pagination for large collections?



Practice 4. Does the order in which you chain sort(), skip(), and limit() in Python affect the results MongoDB returns?



Practice 5. What type does PyMongo expect for a multi-key sort argument?



Quiz

Quiz 1. What direction value returns documents from highest to lowest value?






Quiz 2. What happens when you call limit(0) on a cursor?






Quiz 3. In a multi-key sort, when does the second sort field come into effect?






Quiz 4. What is the main trade-off of keyset pagination compared to skip()-based pagination?






Quiz 5. Why should you always pair sort() with limit() when fetching top-N results?






Next up — Update Documents: Modifying existing documents with updateOne(), updateMany(), and powerful update operators.