MongoDB
Sorting & Limiting
Retrieving documents in a useful order and capping the number returned are two of the most fundamental query controls in MongoDB. Without sorting, documents come back in natural insertion order — fine for debugging, useless for a product listing or a leaderboard. Without limiting, a single query against a large collection can return millions of documents and bring your application to its knees. sort(), limit(), and skip() work as a chain on a cursor and compose cleanly with filters and projections to build precise, efficient read operations.
This lesson goes deeper than the brief introduction in Lesson 12 — covering multi-key sorting, sort stability, index-backed sorts, cursor method chaining, and the performance implications of each approach, all using the Dataplexa Store dataset.
sort() — Ordering Results
sort() orders the documents returned by a query. Pass a field name and a direction: 1 for ascending (A→Z, 0→9, oldest→newest), -1 for descending (Z→A, 9→0, newest→oldest). You can import the constants ASCENDING and DESCENDING from PyMongo to make code more readable.
Why it matters: users expect product catalogues sorted by price, activity feeds sorted by recency, and leaderboards sorted by score. Sorting in the database is always faster than fetching all documents and sorting in Python.
# sort() — ascending and descending on a single field
from pymongo import MongoClient, ASCENDING, DESCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Ascending — cheapest products first
print("Products cheapest first:")
results = db.products.find(
{}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING)
for p in results:
print(f" ${p['price']:>7.2f} {p['name']}")
# Descending — highest-rated products first
print("\nProducts highest-rated first:")
results = db.products.find(
{}, {"name": 1, "rating": 1, "_id": 0}
).sort("rating", DESCENDING)
for p in results:
print(f" {p['rating']} {p['name']}")$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
$349.99 Standing Desk
Products highest-rated first:
4.8 Standing Desk
4.7 Mechanical Keyboard
4.6 Monitor 27-inch
4.5 Wireless Mouse
4.3 USB-C Hub
4.2 Notebook A5
4.0 Ballpoint Pens 10-pack
- Use
ASCENDINGandDESCENDINGconstants — more readable than raw1and-1 - Sorting without an index on the sort field forces MongoDB to load all matching documents into memory and sort them there — always index high-traffic sort fields
- String sorting follows UTF-8 byte order by default — case-sensitive and locale-dependent
Multi-Key Sorting
Pass a list of (field, direction) tuples to sort by multiple fields. MongoDB applies the sort fields left to right — the second field only comes into play when two documents have equal values in the first field.
# Multi-key sort — sort by category first, then by price within each category
from pymongo import MongoClient, ASCENDING, DESCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Category A→Z, then price low→high within each category
print("Products by category (A-Z), then price (low-high):")
results = db.products.find(
{}, {"name": 1, "category": 1, "price": 1, "_id": 0}
).sort([
("category", ASCENDING),
("price", ASCENDING)
])
current_cat = None
for p in results:
if p["category"] != current_cat:
current_cat = p["category"]
print(f"\n [{current_cat}]")
print(f" ${p['price']:>7.2f} {p['name']}")
# Orders by status A→Z, then total highest→lowest within status
print("\n\nOrders by status (A-Z), then total (high-low):")
results = db.orders.find(
{}, {"_id": 1, "status": 1, "total": 1}
).sort([
("status", ASCENDING),
("total", DESCENDING)
])
for o in results:
print(f" {o['_id']} {o['status']:12} ${o['total']:.2f}")[Electronics]
$ 29.99 Wireless Mouse
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
[Furniture]
$349.99 Standing Desk
[Stationery]
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
Orders by status (A-Z), then total (high-low):
o006 cancelled $89.99
o001 delivered $44.96
o003 delivered $99.98
o005 delivered $329.98
o007 delivered $11.97
o004 processing $349.99
o002 shipped $89.99
- Multi-key sort requires a list of tuples — a plain dict does not preserve order reliably in older Python versions
- The sort is applied in order: first by the first field, then by the second field for ties, and so on
- A compound index on
(category, price)makes this sort index-backed and very fast — both field order and direction should match the index
limit() — Capping the Result Count
limit(n) tells MongoDB to stop returning documents after n have been sent. It reduces both network transfer and server-side work — MongoDB stops scanning as soon as the limit is reached (especially when combined with an index).
# limit() — cap results to the top N documents
from pymongo import MongoClient, DESCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Top 3 most expensive products
print("Top 3 most expensive products:")
results = db.products.find(
{}, {"name": 1, "price": 1, "_id": 0}
).sort("price", DESCENDING).limit(3)
for p in results:
print(f" ${p['price']:>7.2f} {p['name']}")
# Top 3 highest-rated Electronics
print("\nTop 3 highest-rated Electronics:")
results = db.products.find(
{"category": "Electronics"},
{"name": 1, "rating": 1, "_id": 0}
).sort("rating", DESCENDING).limit(3)
for p in results:
print(f" rating: {p['rating']} {p['name']}")
# Most recent 2 delivered orders (sorted by date descending)
print("\n2 most recent delivered orders:")
results = db.orders.find(
{"status": "delivered"},
{"_id": 1, "date": 1, "total": 1}
).sort("date", DESCENDING).limit(2)
for o in results:
print(f" {o['_id']} date: {o['date']} ${o['total']:.2f}")$349.99 Standing Desk
$299.99 Monitor 27-inch
$ 89.99 Mechanical Keyboard
Top 3 highest-rated Electronics:
rating: 4.7 Mechanical Keyboard
rating: 4.6 Monitor 27-inch
rating: 4.5 Wireless Mouse
2 most recent delivered orders:
o007 date: 2024-04-01 $11.97
o005 date: 2024-03-10 $329.98
- Always pair
limit()withsort()— without a stable sort the returned documents are non-deterministic limit(0)is equivalent to no limit — all documents are returned- MongoDB stops scanning immediately when the limit is reached if the sort field is indexed — making top-N queries very efficient
skip() — Offsetting the Start Position
skip(n) skips over the first n documents before starting to return results. Combined with limit() it implements offset-based pagination — the most common approach for "previous / next page" UIs.
# skip() + limit() — offset-based pagination
from pymongo import MongoClient, ASCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
PAGE_SIZE = 3
def get_products_page(page: int):
"""Return one page of products sorted by price ascending."""
skip_count = (page - 1) * PAGE_SIZE
cursor = db.products.find(
{}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING).skip(skip_count).limit(PAGE_SIZE)
return list(cursor)
# Page 1
print("Page 1:")
for p in get_products_page(1):
print(f" ${p['price']:>7.2f} {p['name']}")
# Page 2
print("\nPage 2:")
for p in get_products_page(2):
print(f" ${p['price']:>7.2f} {p['name']}")
# Page 3 — only 1 product left
print("\nPage 3:")
for p in get_products_page(3):
print(f" ${p['price']:>7.2f} {p['name']}")
# Total pages calculation
total = db.products.count_documents({})
import math
total_pages = math.ceil(total / PAGE_SIZE)
print(f"\nTotal products: {total} | Page size: {PAGE_SIZE} | Total pages: {total_pages}")$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
Page 2:
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
Page 3:
$349.99 Standing Desk
Total products: 7 | Page size: 3 | Total pages: 3
- Skip formula:
skip = (page_number - 1) * page_size skip()is inefficient for very large offsets — MongoDB still scans all the skipped documents internally- For collections with thousands of pages, use keyset (cursor-based) pagination instead of
skip()
Keyset Pagination — The Scalable Alternative to skip()
Keyset pagination uses the last seen value of the sort field as the starting point for the next page — instead of skipping. This is O(1) per page regardless of how deep into the dataset you are, whereas skip() gets slower with every page.
# Keyset pagination — O(1) per page, scales to millions of documents
from pymongo import MongoClient, ASCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
PAGE_SIZE = 3
def get_next_page(last_price=None, last_id=None):
"""
Fetch the next page of products sorted by price ascending.
Pass the price and _id of the last document on the previous page.
"""
if last_price is None:
# First page — no previous cursor
filter_query = {}
else:
# Subsequent pages — start after the last seen document
filter_query = {"$or": [
{"price": {"$gt": last_price}},
{"price": last_price, "_id": {"$gt": last_id}}
]}
return list(
db.products.find(
filter_query,
{"name": 1, "price": 1, "_id": 1}
).sort([("price", ASCENDING), ("_id", ASCENDING)]).limit(PAGE_SIZE)
)
# Page 1
page = get_next_page()
print("Page 1 (keyset):")
for p in page:
print(f" ${p['price']:>7.2f} {p['name']}")
# Page 2 — pass last price and _id from page 1
last = page[-1]
page = get_next_page(last_price=last["price"], last_id=last["_id"])
print("\nPage 2 (keyset):")
for p in page:
print(f" ${p['price']:>7.2f} {p['name']}")$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
Page 2 (keyset):
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
- Keyset pagination is always index-backed — each page query uses a range scan, not a full scan with offset
- The
_idtie-breaker in the filter ensures stable pagination when multiple documents share the same sort value - The trade-off: keyset pagination cannot jump to an arbitrary page number — it can only go forward from the current position
Cursor Method Chaining
All cursor methods — sort(), limit(), skip(), projection — chain together and the order in which you call them in Python does not affect the query execution order. MongoDB always processes filter → sort → skip → limit on the server, regardless of the Python call order.
# Cursor method chaining — order of calls does not affect execution order
from pymongo import MongoClient, DESCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# These two are identical — MongoDB executes filter → sort → skip → limit
# regardless of Python call order
query_a = db.products.find(
{"category": "Electronics"},
{"name": 1, "price": 1, "_id": 0}
).sort("price", DESCENDING).skip(1).limit(2)
query_b = db.products.find(
{"category": "Electronics"},
{"name": 1, "price": 1, "_id": 0}
).limit(2).skip(1).sort("price", DESCENDING) # same result as query_a
print("Query A and B produce identical results:")
for p in query_a:
print(f" ${p['price']:>7.2f} {p['name']}")
# Full chain — filter + projection + sort + skip + limit
full_chain = (
db.orders
.find({"status": "delivered"}, {"_id": 1, "total": 1})
.sort("total", DESCENDING)
.skip(1)
.limit(2)
)
print("\nDelivered orders — skip 1, next 2, sorted by total desc:")
for o in full_chain:
print(f" {o['_id']} ${o['total']:.2f}")$ 89.99 Mechanical Keyboard
$ 49.99 USB-C Hub
Delivered orders — skip 1, next 2, sorted by total desc:
o003 $99.98
o001 $44.96
- Python call order on cursor methods does not matter — MongoDB always applies operations in its own fixed execution order
- Write chains in the logical reading order:
find().sort().skip().limit()— it is the clearest and most conventional - Each cursor method returns the cursor itself — allowing unlimited chaining
Summary Table
| Method | Purpose | Key Detail | Performance Note |
|---|---|---|---|
sort(field, dir) |
Order results | 1 = ASC, -1 = DESC | Index the sort field for large collections |
sort([(f,d), ...]) |
Multi-field sort | List of tuples, applied left to right | Compound index should match field order |
limit(n) |
Cap result count | Stops scanning at n docs | Very efficient with an index |
skip(n) |
Offset start position | Always pair with sort | Slow for large offsets — use keyset instead |
| Keyset pagination | Scalable next-page | Filter by last seen value | O(1) per page — always index-backed |
Practice Questions
Practice 1. Write the PyMongo cursor chain to return the top 3 products by rating descending, showing only name and rating.
Practice 2. What is the skip value for fetching page 4 with a page size of 5?
Practice 3. Why is keyset pagination faster than skip()-based pagination for large collections?
Practice 4. Does the order in which you chain sort(), skip(), and limit() in Python affect the results MongoDB returns?
Practice 5. What type does PyMongo expect for a multi-key sort argument?
Quiz
Quiz 1. What direction value returns documents from highest to lowest value?
Quiz 2. What happens when you call limit(0) on a cursor?
Quiz 3. In a multi-key sort, when does the second sort field come into effect?
Quiz 4. What is the main trade-off of keyset pagination compared to skip()-based pagination?
Quiz 5. Why should you always pair sort() with limit() when fetching top-N results?
Next up — Update Documents: Modifying existing documents with updateOne(), updateMany(), and powerful update operators.