MongoDB
Find Documents
Reading data is the most frequent operation in almost every application. MongoDB provides two core read methods: find_one() for retrieving a single document and find() for retrieving multiple documents as a cursor. Both accept a filter to narrow results, a projection to control which fields are returned, and additional options for sorting, limiting, and skipping. This lesson covers all of these using the Dataplexa Store dataset so every example is immediately runnable against real data.
find_one() — Retrieving a Single Document
find_one() returns the first document that matches the filter, or None if no match is found. It is the right choice any time you expect exactly one result — looking up a user by ID, fetching a product by SKU, or checking whether a record exists.
Why it exists: fetching a full cursor for a single expected result is wasteful. find_one() tells MongoDB to stop scanning after the first match, making it faster and cleaner for single-document lookups.
# find_one() — retrieve a single document from the Dataplexa Store
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Fetch user Alice by her _id
alice = db.users.find_one({"_id": "u001"})
print("User:", alice["name"], "—", alice["city"], alice["country"])
# Fetch the first premium user
first_premium = db.users.find_one({"membership": "premium"})
print("First premium user:", first_premium["name"])
# find_one() returns None when no match exists
missing = db.users.find_one({"_id": "u999"})
print("Missing user:", missing)
# Always guard against None before accessing fields
user = db.users.find_one({"_id": "u003"})
if user:
    print("Found:", user["name"])
else:
    print("No user found")
Output:
First premium user: Alice Johnson
Missing user: None
Found: Clara Diaz
- find_one() returns a plain Python dict, so you access fields with standard dictionary syntax
- Passing an empty dict {} returns the very first document in the collection, which is useful for a quick schema inspection
- Always check for None before accessing fields on the result; indexing into a missing document raises a TypeError
- When multiple documents match, find_one() returns the first in natural order, which usually but not always follows insertion order (only capped collections guarantee it)
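The None guard described above can be wrapped in a small helper. This is a minimal sketch, not part of PyMongo: the name get_field and its default parameter are hypothetical, and collection is assumed to be any object with a PyMongo-style find_one(filter, projection) method.

```python
def get_field(collection, doc_id, field, default=None):
    """Return one field (other than _id) of the document with the given _id.

    Hypothetical helper: `collection` is assumed to expose a PyMongo-style
    find_one(filter, projection) method. Returns `default` when the document
    or the field is missing.
    """
    # Ask only for the field we need; _id is excluded to keep the result small.
    doc = collection.find_one({"_id": doc_id}, {field: 1, "_id": 0})
    # find_one() returns None when nothing matches, so guard before indexing.
    if doc is None:
        return default
    # The document may exist but lack the field, so use .get() as well.
    return doc.get(field, default)
```

Called as get_field(db.users, "u999", "name", default="unknown"), this would return "unknown" instead of raising a TypeError on the missing user.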
find() — Retrieving Multiple Documents
find() returns a cursor: a lazy iterator over all matching documents. The database does not send all results at once; it streams them in batches as you iterate. This makes find() safe to use even when a collection holds millions of documents, because only the batches you actually iterate through are transferred.
# find() — retrieve multiple documents as a cursor
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Fetch all users — empty filter matches every document
all_users = db.users.find({})
print("All users:")
for user in all_users:
    print(f" {user['_id']} — {user['name']} ({user['country']})")
# Fetch all premium users
print("\nPremium users:")
premium = db.users.find({"membership": "premium"})
for user in premium:
    print(f" {user['name']}")
# Convert cursor to a list — loads all results into memory
products_list = list(db.products.find({}))
print(f"\nTotal products in memory: {len(products_list)}")
Output:
All users:
u001 — Alice Johnson (UK)
u002 — Bob Smith (UK)
u003 — Clara Diaz (Spain)
u004 — David Lee (USA)
u005 — Eva Müller (Germany)
Premium users:
Alice Johnson
Clara Diaz
Eva Müller
Total products in memory: 7
- A cursor is lazy: it does not execute the query until you start iterating
- Use list(cursor) to load all results into memory; this is fine for small result sets but dangerous for large collections
- Iterate the cursor directly with a for loop for memory-efficient processing of large result sets
- A cursor can only be iterated once; if you need to iterate again, re-run the query
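When a large result set has to be handled in groups (for example, to write it out in batches), the cursor can be consumed a fixed number of documents at a time without ever building the full list. The sketch below assumes nothing about PyMongo beyond the cursor being iterable; iter_chunks is a hypothetical helper name, and a plain generator stands in for the cursor so the example runs without a database.

```python
from itertools import islice


def iter_chunks(cursor, chunk_size):
    """Yield lists of at most `chunk_size` documents from any iterator."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(cursor)
    while True:
        # islice pulls only the next chunk_size items, keeping memory bounded.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for a cursor: a generator is also lazy and single-pass.
fake_cursor = ({"_id": n} for n in range(5))
sizes = [len(chunk) for chunk in iter_chunks(fake_cursor, 2)]
print(sizes)              # [2, 2, 1]
print(list(fake_cursor))  # [] because the single pass is used up
```

The second print shows the single-pass behaviour from the last bullet: once the iterator is exhausted, a second pass yields nothing.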
Projections — Controlling Which Fields Are Returned
By default MongoDB returns the entire document including every field. A projection tells MongoDB exactly which fields to include or exclude — reducing network transfer, memory usage, and the amount of data your application needs to process.
Why it matters: returning only the fields you need is one of the simplest performance improvements available. A user list that needs only names and emails should never fetch addresses, tags, and order history.
# Projections — include or exclude specific fields
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# ── INCLUSION projection — specify fields to return ──────
# 1 = include, _id is included by default unless excluded
print("Inclusion projection — name and email only:")
users = db.users.find({}, {"name": 1, "email": 1})
for u in users:
    print(f" {u}")
# ── EXCLUSION projection — specify fields to hide ────────
# 0 = exclude
print("\nExclusion projection — hide tags and joined:")
users = db.users.find({}, {"tags": 0, "joined": 0})
for u in users:
    print(f" {u}")
# ── Exclude _id from an inclusion projection ─────────────
print("\nInclusion without _id:")
products = db.products.find(
    {"category": "Electronics"},
    {"name": 1, "price": 1, "_id": 0}  # _id: 0 explicitly excluded
)
for p in products:
    print(f" {p}")
Output:
Inclusion projection — name and email only:
{'_id': 'u001', 'name': 'Alice Johnson', 'email': 'alice@example.com'}
{'_id': 'u002', 'name': 'Bob Smith', 'email': 'bob@example.com'}
{'_id': 'u003', 'name': 'Clara Diaz', 'email': 'clara@example.com'}
{'_id': 'u004', 'name': 'David Lee', 'email': 'david@example.com'}
{'_id': 'u005', 'name': 'Eva Müller', 'email': 'eva@example.com'}
Exclusion projection — hide tags and joined:
{'_id': 'u001', 'name': 'Alice Johnson', 'email': 'alice@example.com', 'age': 30, ...}
Inclusion without _id:
{'name': 'Wireless Mouse', 'price': 29.99}
{'name': 'Mechanical Keyboard', 'price': 89.99}
{'name': 'USB-C Hub', 'price': 49.99}
{'name': 'Monitor 27-inch', 'price': 299.99}
- You cannot mix inclusion and exclusion in the same projection, with one exception: _id can always be excluded from an inclusion projection
- Inclusion projection: list the fields you want; everything else is hidden
- Exclusion projection: list the fields you do not want; everything else is returned
- Use a projection to keep sensitive fields such as passwords and tokens out of API responses
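The no-mixing rule can be enforced in application code before a query is sent. The helper below is a hypothetical sketch (build_projection and its parameters are not part of PyMongo); it only builds the dict that would be passed as the second argument to find() or find_one().

```python
def build_projection(include=(), exclude=(), keep_id=True):
    """Build a projection dict, rejecting mixed inclusion and exclusion.

    Returns None when nothing is specified, meaning "no projection".
    """
    if include and exclude:
        raise ValueError("use either include or exclude, not both")
    # 1 marks an inclusion projection, 0 an exclusion projection.
    value = 1 if include else 0
    projection = {field: value for field in (include or exclude)}
    # _id is the one field that may be excluded alongside inclusions.
    if not keep_id:
        projection["_id"] = 0
    return projection or None


print(build_projection(include=["name", "price"], keep_id=False))
# {'name': 1, 'price': 1, '_id': 0}
print(build_projection(exclude=["tags", "joined"]))
# {'tags': 0, 'joined': 0}
```

The first result matches the projection used in the "Inclusion without _id" query above.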
Sorting Results
The sort() method orders the results of a find() query. Pass a field name and a direction — 1 for ascending, -1 for descending. You can sort by multiple fields.
# Sorting — order results by one or more fields
from pymongo import MongoClient, ASCENDING, DESCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Sort products by price ascending (cheapest first)
print("Products cheapest first:")
cheap_first = db.products.find({}, {"name": 1, "price": 1, "_id": 0}).sort("price", ASCENDING)
for p in cheap_first:
    print(f" ${p['price']:>7.2f} {p['name']}")
# Sort products by price descending (most expensive first)
print("\nProducts most expensive first:")
expensive_first = db.products.find({}, {"name": 1, "price": 1, "_id": 0}).sort("price", DESCENDING)
for p in expensive_first:
    print(f" ${p['price']:>7.2f} {p['name']}")
# Multi-field sort — category ascending, then price descending within each category
print("\nBy category A-Z, then price high to low:")
multi_sort = db.products.find(
    {}, {"name": 1, "category": 1, "price": 1, "_id": 0}
).sort([("category", ASCENDING), ("price", DESCENDING)])
for p in multi_sort:
    print(f" {p['category']:12} ${p['price']:>7.2f} {p['name']}")
Output:
Products cheapest first:
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
$349.99 Standing Desk
Products most expensive first:
$349.99 Standing Desk
$299.99 Monitor 27-inch
$ 89.99 Mechanical Keyboard
...
By category A-Z, then price high to low:
Electronics $299.99 Monitor 27-inch
Electronics $ 89.99 Mechanical Keyboard
Electronics $ 49.99 USB-C Hub
Electronics $ 29.99 Wireless Mouse
Furniture $349.99 Standing Desk
Stationery $ 4.99 Notebook A5
Stationery $ 3.49 Ballpoint Pens 10-pack
- Use the ASCENDING and DESCENDING constants from pymongo rather than raw 1 and -1 for readability
- Multi-field sort requires a list of tuples, and order matters: the first field is the primary sort key, and each later field only breaks ties in the ones before it
- Sorting without an index on the sort field performs an in-memory sort, so add an index for large collections
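To see what "primary key first, later keys break ties" means, the ordering of the multi-field sort can be reproduced in plain Python. This is only an illustration of the resulting order, not of how MongoDB sorts internally; the product data is copied from the example output in this section.

```python
products = [
    {"name": "Wireless Mouse", "category": "Electronics", "price": 29.99},
    {"name": "Standing Desk", "category": "Furniture", "price": 349.99},
    {"name": "Notebook A5", "category": "Stationery", "price": 4.99},
    {"name": "Monitor 27-inch", "category": "Electronics", "price": 299.99},
    {"name": "Ballpoint Pens 10-pack", "category": "Stationery", "price": 3.49},
    {"name": "USB-C Hub", "category": "Electronics", "price": 49.99},
    {"name": "Mechanical Keyboard", "category": "Electronics", "price": 89.99},
]

# Category ascending is the primary key; negating the price makes the
# secondary key descending within each category.
ordered = sorted(products, key=lambda p: (p["category"], -p["price"]))

for p in ordered:
    print(f"{p['category']:12} {p['price']:>7.2f} {p['name']}")
```

Price is only compared between products that share a category, which is why Standing Desk, the most expensive item overall, appears after all the Electronics rows.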
Limiting and Skipping Results
limit() caps the number of documents returned. skip() skips over a number of documents before returning results. Together they implement pagination — essential for any API or UI that shows data in pages.
# limit() and skip() — pagination
from pymongo import MongoClient, ASCENDING
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
page_size = 3
# Page 1 — first 3 products sorted by price
page_1 = db.products.find(
    {}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING).limit(page_size).skip(0)
print("Page 1 (products 1–3):")
for p in page_1:
    print(f" ${p['price']:>7.2f} {p['name']}")
# Page 2 — next 3 products
page_2 = db.products.find(
    {}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING).limit(page_size).skip(page_size)
print("\nPage 2 (products 4–6):")
for p in page_2:
    print(f" ${p['price']:>7.2f} {p['name']}")
# Pagination helper function
def get_page(collection, filter_query, sort_field, page_num, page_size=3):
    skip_count = (page_num - 1) * page_size
    return list(
        collection.find(filter_query)
        .sort(sort_field, ASCENDING)
        .skip(skip_count)
        .limit(page_size)
    )
page = get_page(db.products, {}, "price", page_num=1)
print(f"\nPagination helper — page 1: {len(page)} products")
Output:
Page 1 (products 1–3):
$ 3.49 Ballpoint Pens 10-pack
$ 4.99 Notebook A5
$ 29.99 Wireless Mouse
Page 2 (products 4–6):
$ 49.99 USB-C Hub
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
Pagination helper — page 1: 3 products
- Always combine skip() with sort(): without a stable sort order, pages may return inconsistent results
- skip() is inefficient on very large collections because MongoDB still has to walk past the skipped documents. For deep pagination, use range-based queries on _id or a timestamp field instead
- Page formula: skip = (page_number - 1) * page_size
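The range-based alternative to skip() can be sketched as follows. Instead of counting past earlier documents, each request asks for documents whose _id is greater than the last one already seen. The helper name page_after is hypothetical, collection is assumed to be a PyMongo collection, and the approach assumes _id values are unique and sortable (which holds for _id by definition).

```python
def page_after(collection, last_id=None, page_size=3):
    """Return the next page of documents ordered by _id.

    Hypothetical helper: pass the _id of the last document from the
    previous page, or None to fetch the first page.
    """
    # First page: no lower bound. Later pages: only _id values past the last one seen.
    query = {} if last_id is None else {"_id": {"$gt": last_id}}
    # 1 is the raw value of pymongo.ASCENDING, used here to avoid an import.
    cursor = collection.find(query).sort("_id", 1).limit(page_size)
    return list(cursor)
```

A caller would keep the _id of the final document in each page and pass it as last_id on the next call. The trade-off is that this style supports "next page" navigation but not jumping straight to an arbitrary page number.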
Counting Documents
Use count_documents() to count matching documents. Pass an empty filter {} to count all documents in a collection.
# count_documents() — counting results
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Total documents per collection
print("Collection counts:")
print(" users: ", db.users.count_documents({}))
print(" products:", db.products.count_documents({}))
print(" orders: ", db.orders.count_documents({}))
print(" reviews: ", db.reviews.count_documents({}))
# Count with a filter
premium_count = db.users.count_documents({"membership": "premium"})
delivered_count = db.orders.count_documents({"status": "delivered"})
expensive_count = db.products.count_documents({"price": {"$gt": 100}})
print(f"\nPremium users: {premium_count}")
print(f"Delivered orders: {delivered_count}")
print(f"Products over $100: {expensive_count}")
Output:
Collection counts:
users: 5
products: 7
orders: 7
reviews: 5
Premium users: 3
Delivered orders: 4
Products over $100: 2
- count_documents({}) is accurate: it runs a query and counts every matching document
- Avoid the old count() method on a cursor (deprecated in PyMongo 3.7 and removed in 4.0); always use count_documents() on the collection
- For a very fast approximate count of all documents, use estimated_document_count(), which reads collection metadata rather than scanning
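A common use of count_documents() is working out how many pages a paginated listing has. The sketch below is one way to do that; total_pages is a hypothetical helper, and collection is assumed to be a PyMongo collection.

```python
import math


def total_pages(collection, filter_query, page_size):
    """Return how many pages of `page_size` the matching documents fill."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    # count_documents() gives the exact number of matches for the filter.
    matches = collection.count_documents(filter_query)
    # Round up so a partially filled last page still counts as a page.
    return math.ceil(matches / page_size)
```

With the 7 products counted above and a page size of 3, total_pages(db.products, {}, 3) would give 3: two full pages and one page holding the last product.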
Summary Table
| Method | Returns | Use When | Key Options |
|---|---|---|---|
| find_one(filter) | dict or None | Expecting one result | projection |
| find(filter) | Cursor (iterator) | Multiple results | projection, sort, limit, skip |
| .sort(field, dir) | Cursor | Ordering results | ASCENDING / DESCENDING |
| .limit(n) | Cursor | Cap result count | - |
| .skip(n) | Cursor | Pagination offset | Always pair with sort |
| count_documents(filter) | int | Counting matches | Use {} for all documents |
Practice Questions
Practice 1. What does find_one() return when no document matches the filter?
Practice 2. What is a MongoDB cursor and why is it more memory-efficient than returning a list?
Practice 3. Write the projection to return only the name and price fields from a products query, excluding _id.
Practice 4. What is the skip value formula for fetching page 3 of results with a page size of 10?
Practice 5. What is the difference between count_documents({}) and estimated_document_count()?
Quiz
Quiz 1. What does passing an empty filter {} to find() do?
Quiz 2. Why can you not mix inclusion and exclusion in the same projection?
Quiz 3. What sort direction value makes MongoDB return results from highest to lowest?
Quiz 4. Why is skip() inefficient on very large collections?
Quiz 5. How many times can a MongoDB cursor be iterated?
Next up — Query Filters: using comparison, logical, and element operators to pinpoint exactly the documents you need.