MongoDB Lesson 12 – Find Documents | Dataplexa

Find Documents

Reading data is the most frequent operation in almost every application. MongoDB provides two core read methods: find_one() for retrieving a single document and find() for retrieving multiple documents as a cursor. Both accept a filter to narrow results, a projection to control which fields are returned, and additional options for sorting, limiting, and skipping. This lesson covers all of these using the Dataplexa Store dataset so every example is immediately runnable against real data.

find_one() — Retrieving a Single Document

find_one() returns the first document that matches the filter, or None if no match is found. It is the right choice any time you expect exactly one result — looking up a user by ID, fetching a product by SKU, or checking whether a record exists.

Why it exists: fetching a full cursor for a single expected result is wasteful. find_one() tells MongoDB to stop scanning after the first match, making it faster and cleaner for single-document lookups.

# find_one() — retrieve a single document from the Dataplexa Store

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Fetch user Alice by her _id
alice = db.users.find_one({"_id": "u001"})
print("User:", alice["name"], "—", alice["city"], alice["country"])

# Fetch the first user with a premium membership
first_premium = db.users.find_one({"membership": "premium"})
print("First premium user:", first_premium["name"])

# find_one() returns None when no match exists
missing = db.users.find_one({"_id": "u999"})
print("Missing user:", missing)

# Always guard against None before accessing fields
user = db.users.find_one({"_id": "u003"})
if user:
    print("Found:", user["name"])
else:
    print("No user found")

User: Alice Johnson — London UK
First premium user: Alice Johnson
Missing user: None
Found: Clara Diaz

find_one() returns a plain Python dict — access fields with standard dictionary syntax
Passing an empty dict {} returns the very first document in the collection — useful for a quick schema inspection
Always check for None before accessing fields on the result — a missing document will raise a TypeError otherwise
When multiple documents match, find_one() returns the first in natural order, which usually follows insertion order but is only guaranteed to do so on capped collections
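
The None guard described above can be wrapped in small helpers. The following is a minimal sketch rather than part of the lesson's own code: user_exists and name_or_default are hypothetical helper names, and collection is assumed to be a PyMongo collection such as db.users.

```python
def user_exists(collection, user_id):
    """Return True if a document with this _id exists."""
    # Project only _id so the server does not send the whole document.
    return collection.find_one({"_id": user_id}, {"_id": 1}) is not None


def name_or_default(collection, user_id, default="(unknown)"):
    """Return the user's name, or a default when no document matches."""
    user = collection.find_one({"_id": user_id})
    if user is None:
        return default
    # .get() also covers a document that exists but has no "name" field.
    return user.get("name", default)
```

With helpers like these, a lookup for a missing _id such as "u999" returns the default value instead of raising a TypeError.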

find() — Retrieving Multiple Documents

find() returns a cursor: a lazy iterator over all matching documents. The database does not send all results at once; it streams them in batches as you iterate. This means find() is safe to use even when a collection has millions of documents, because only the batches you have requested so far are transferred and held in memory.

# find() — retrieve multiple documents as a cursor

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Fetch all users — empty filter matches every document
all_users = db.users.find({})

print("All users:")
for user in all_users:
    print(f"  {user['_id']} — {user['name']} ({user['country']})")

# Fetch all premium users
print("\nPremium users:")
premium = db.users.find({"membership": "premium"})
for user in premium:
    print(f"  {user['name']}")

# Convert cursor to a list — loads all results into memory
products_list = list(db.products.find({}))
print(f"\nTotal products in memory: {len(products_list)}")

All users:
  u001 — Alice Johnson (UK)
  u002 — Bob Smith (UK)
  u003 — Clara Diaz (Spain)
  u004 — David Lee (USA)
  u005 — Eva Müller (Germany)

Premium users:
  Alice Johnson
  Clara Diaz
  Eva Müller

Total products in memory: 7

A cursor is lazy — it does not execute the query until you start iterating
Use list(cursor) to load all results into memory — fine for small result sets, dangerous for large collections
Iterate the cursor directly with a for loop for memory-efficient processing of large result sets
A cursor can only be iterated once — if you need to iterate again, re-run the query
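
When a result set is too large for list(cursor) but you still want to work on several documents at a time, the cursor can be consumed in fixed-size chunks. This is a minimal sketch: iter_batches is a hypothetical helper name, and it accepts any iterable, so a PyMongo cursor returned by find() is assumed to work the same way as the plain iterators used to try it out.

```python
from itertools import islice


def iter_batches(cursor, batch_size):
    """Yield lists of up to batch_size documents from any iterable."""
    iterator = iter(cursor)
    while True:
        # islice pulls at most batch_size items without reading further.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A call such as iter_batches(db.products.find({}), 1000) keeps at most one chunk of documents in your own list at a time. Because the cursor is consumed as it goes, the single-pass rule still applies.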

Projections — Controlling Which Fields Are Returned

By default MongoDB returns the entire document including every field. A projection tells MongoDB exactly which fields to include or exclude — reducing network transfer, memory usage, and the amount of data your application needs to process.

Why it matters: returning only the fields you need is one of the simplest performance improvements available. A user list that needs only names and emails should never fetch addresses, tags, and order history.

# Projections — include or exclude specific fields

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# ── INCLUSION projection — specify fields to return ──────
# 1 = include, _id is included by default unless excluded
print("Inclusion projection — name and email only:")
users = db.users.find({}, {"name": 1, "email": 1})
for u in users:
    print(f"  {u}")

# ── EXCLUSION projection — specify fields to hide ────────
# 0 = exclude
print("\nExclusion projection — hide tags and joined:")
users = db.users.find({}, {"tags": 0, "joined": 0})
for u in users:
    print(f"  {u}")

# ── Exclude _id from an inclusion projection ─────────────
print("\nInclusion without _id:")
products = db.products.find(
    {"category": "Electronics"},
    {"name": 1, "price": 1, "_id": 0}   # _id: 0 explicitly excluded
)
for p in products:
    print(f"  {p}")

Inclusion projection — name and email only:
  {'_id': 'u001', 'name': 'Alice Johnson', 'email': 'alice@example.com'}
  {'_id': 'u002', 'name': 'Bob Smith', 'email': 'bob@example.com'}
  {'_id': 'u003', 'name': 'Clara Diaz', 'email': 'clara@example.com'}
  {'_id': 'u004', 'name': 'David Lee', 'email': 'david@example.com'}
  {'_id': 'u005', 'name': 'Eva Müller', 'email': 'eva@example.com'}

Exclusion projection — hide tags and joined:
  {'_id': 'u001', 'name': 'Alice Johnson', 'email': 'alice@example.com', 'age': 30, ...}

Inclusion without _id:
  {'name': 'Wireless Mouse', 'price': 29.99}
  {'name': 'Mechanical Keyboard', 'price': 89.99}
  {'name': 'USB-C Hub', 'price': 49.99}
  {'name': 'Monitor 27-inch', 'price': 299.99}

You cannot mix inclusion and exclusion in the same projection — except for _id which can always be excluded from an inclusion projection
Inclusion projection: list the fields you want — everything else is hidden
Exclusion projection: list the fields you do not want — everything else is returned
Always exclude sensitive fields like passwords and tokens in API responses using projection
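
Because a projection is an ordinary dict, the rules above can be captured in two small builder functions. This is a sketch under assumptions: inclusion_projection and exclusion_projection are hypothetical names, not PyMongo functions, and the resulting dict is meant to be passed as the second argument to find() or find_one().

```python
def inclusion_projection(fields, include_id=False):
    """Build an inclusion projection such as {"name": 1, "_id": 0}."""
    projection = {field: 1 for field in fields}
    # _id is returned by default, so it must be excluded explicitly.
    if not include_id and "_id" not in projection:
        projection["_id"] = 0
    return projection


def exclusion_projection(fields):
    """Build an exclusion projection such as {"tags": 0, "joined": 0}."""
    return {field: 0 for field in fields}
```

For example, db.products.find({"category": "Electronics"}, inclusion_projection(["name", "price"])) would send the same projection as the third example in this section.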

Sorting Results

The sort() method orders the results of a find() query. Pass a field name and a direction — 1 for ascending, -1 for descending. You can sort by multiple fields.

# Sorting — order results by one or more fields

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Sort products by price ascending (cheapest first)
print("Products cheapest first:")
cheap_first = db.products.find({}, {"name": 1, "price": 1, "_id": 0}).sort("price", ASCENDING)
for p in cheap_first:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Sort products by price descending (most expensive first)
print("\nProducts most expensive first:")
expensive_first = db.products.find({}, {"name": 1, "price": 1, "_id": 0}).sort("price", DESCENDING)
for p in expensive_first:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Multi-field sort — category ascending, then price descending within each category
print("\nBy category A-Z, then price high to low:")
multi_sort = db.products.find(
    {}, {"name": 1, "category": 1, "price": 1, "_id": 0}
).sort([("category", ASCENDING), ("price", DESCENDING)])
for p in multi_sort:
    print(f"  {p['category']:12} ${p['price']:>7.2f}  {p['name']}")

Products cheapest first:
  $   3.49  Ballpoint Pens 10-pack
  $   4.99  Notebook A5
  $  29.99  Wireless Mouse
  $  49.99  USB-C Hub
  $  89.99  Mechanical Keyboard
  $ 299.99  Monitor 27-inch
  $ 349.99  Standing Desk

Products most expensive first:
  $ 349.99  Standing Desk
  $ 299.99  Monitor 27-inch
  $  89.99  Mechanical Keyboard
  ...

By category A-Z, then price high to low:
  Electronics  $ 299.99  Monitor 27-inch
  Electronics  $  89.99  Mechanical Keyboard
  Electronics  $  49.99  USB-C Hub
  Electronics  $  29.99  Wireless Mouse
  Furniture    $ 349.99  Standing Desk
  Stationery   $   4.99  Notebook A5
  Stationery   $   3.49  Ballpoint Pens 10-pack

Use the ASCENDING and DESCENDING constants from pymongo rather than raw 1 and -1 for readability
Multi-field sort requires a list of tuples — order matters, fields are sorted left to right
Sorting without an index on the sort field performs an in-memory sort — add an index for large collections
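
The last point can be sketched as follows. An index whose keys and directions match the sort specification lets MongoDB return documents in index order instead of sorting them in memory. The helper name ensure_sort_index and the constant CATEGORY_THEN_PRICE are hypothetical; collection is assumed to be a PyMongo collection, whose create_index() method takes the same list of tuples that sort() does.

```python
# Same values as pymongo.ASCENDING and pymongo.DESCENDING.
ASCENDING, DESCENDING = 1, -1

# Matches the multi-field sort used in the example above.
CATEGORY_THEN_PRICE = [("category", ASCENDING), ("price", DESCENDING)]


def ensure_sort_index(collection, sort_spec):
    """Create an index matching sort_spec and return the index name."""
    # Creating an index that already exists with the same keys is a no-op.
    return collection.create_index(sort_spec)
```

Index creation normally belongs in setup or migration code rather than in the request path; calling ensure_sort_index(db.products, CATEGORY_THEN_PRICE) once is enough.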

Limiting and Skipping Results

limit() caps the number of documents returned. skip() skips over a number of documents before returning results. Together they implement pagination — essential for any API or UI that shows data in pages.

# limit() and skip() — pagination

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

page_size = 3

# Page 1 — first 3 products sorted by price
page_1 = db.products.find(
    {}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING).limit(page_size).skip(0)

print("Page 1 (products 1–3):")
for p in page_1:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Page 2 — next 3 products
page_2 = db.products.find(
    {}, {"name": 1, "price": 1, "_id": 0}
).sort("price", ASCENDING).limit(page_size).skip(page_size)

print("\nPage 2 (products 4–6):")
for p in page_2:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Pagination helper function
def get_page(collection, filter_query, sort_field, page_num, page_size=3):
    skip_count = (page_num - 1) * page_size
    return list(
        collection.find(filter_query)
        .sort(sort_field, ASCENDING)
        .skip(skip_count)
        .limit(page_size)
    )

page = get_page(db.products, {}, "price", page_num=1)
print(f"\nPagination helper — page 1: {len(page)} products")

Page 1 (products 1–3):
  $   3.49  Ballpoint Pens 10-pack
  $   4.99  Notebook A5
  $  29.99  Wireless Mouse

Page 2 (products 4–6):
  $  49.99  USB-C Hub
  $  89.99  Mechanical Keyboard
  $ 299.99  Monitor 27-inch

Pagination helper — page 1: 3 products

Always combine skip() with sort() — without a stable sort order, pages may return inconsistent results
skip() is inefficient on very large collections — MongoDB still scans the skipped documents. For deep pagination, use range-based cursors with _id or a timestamp field instead
Page formula: skip = (page_number - 1) * page_size
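
The range-based alternative to skip() mentioned above can be sketched like this. Instead of counting past earlier documents, each request filters for documents that sort after the last one already shown. The function names keyset_filter and fetch_page_after are hypothetical, and the sketch assumes every product has a price field and a unique _id, which serves as the tie-breaker.

```python
def keyset_filter(last_price=None, last_id=None):
    """Filter for documents that sort after (last_price, last_id)."""
    if last_price is None:
        return {}  # first page: no lower bound
    return {
        "$or": [
            {"price": {"$gt": last_price}},
            # _id breaks ties between products with the same price.
            {"price": last_price, "_id": {"$gt": last_id}},
        ]
    }


def fetch_page_after(collection, last_price=None, last_id=None, page_size=3):
    """Return the next page in (price, _id) order without using skip()."""
    cursor = (
        collection.find(keyset_filter(last_price, last_id))
        .sort([("price", 1), ("_id", 1)])
        .limit(page_size)
    )
    return list(cursor)
```

To fetch the following page, pass the price and _id of the last document from the previous page. The trade-off is that this approach moves forward page by page and cannot jump straight to an arbitrary page number.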

Counting Documents

Use count_documents() to count matching documents. Pass an empty filter {} to count all documents in a collection.

# count_documents() — counting results

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Total documents per collection
print("Collection counts:")
print("  users:   ", db.users.count_documents({}))
print("  products:", db.products.count_documents({}))
print("  orders:  ", db.orders.count_documents({}))
print("  reviews: ", db.reviews.count_documents({}))

# Count with a filter
premium_count   = db.users.count_documents({"membership": "premium"})
delivered_count = db.orders.count_documents({"status": "delivered"})
expensive_count = db.products.count_documents({"price": {"$gt": 100}})

print(f"\nPremium users:       {premium_count}")
print(f"Delivered orders:    {delivered_count}")
print(f"Products over $100:  {expensive_count}")

Collection counts:
  users:    5
  products: 7
  orders:   7
  reviews:  5

Premium users:       3
Delivered orders:    4
Products over $100:  2

count_documents({}) is accurate: it counts the documents that actually match the filter rather than relying on collection metadata
Avoid the old count() method on a cursor: it was deprecated in PyMongo 3.7 and removed in PyMongo 4.0, so always use count_documents() on the collection
For a very fast approximate count of all documents, use estimated_document_count() — it reads collection metadata rather than scanning
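
A common use of counting is working out how many pages a paginated listing has. The sketch below is illustrative only: total_pages and approximate_size are hypothetical helper names, and collection is assumed to be a PyMongo collection.

```python
import math


def total_pages(collection, filter_query, page_size):
    """Number of pages needed to show every document matching the filter."""
    matching = collection.count_documents(filter_query)
    return math.ceil(matching / page_size)


def approximate_size(collection):
    """Fast, metadata-based count of the whole collection (no filter)."""
    return collection.estimated_document_count()
```

With the 7 products in the Dataplexa Store and a page size of 3, total_pages(db.products, {}, 3) would give 3: two full pages and one page holding the last product.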

Summary Table

Method	Returns	Use When	Key Options
`find_one(filter)`	dict or None	Expecting one result	projection
`find(filter)`	Cursor (iterator)	Multiple results	projection, sort, limit, skip
`.sort(field, dir)`	Cursor	Ordering results	ASCENDING / DESCENDING
`.limit(n)`	Cursor	Cap result count	—
`.skip(n)`	Cursor	Pagination offset	Always pair with sort
`count_documents(filter)`	int	Counting matches	Use `{}` for all documents

Practice Questions

Practice 1. What does find_one() return when no document matches the filter?

Practice 2. What is a MongoDB cursor and why is it more memory-efficient than returning a list?

Practice 3. Write the projection to return only the name and price fields from a products query, excluding _id.

Practice 4. What is the skip value formula for fetching page 3 of results with a page size of 10?

Practice 5. What is the difference between count_documents({}) and estimated_document_count()?

Quiz

Quiz 1. What does passing an empty filter {} to find() do?

Returns all documents in the collection
Returns no documents
Returns only documents with no fields
Raises a ValueError

Quiz 2. Why can you not mix inclusion and exclusion in the same projection?

MongoDB only allows one mode per projection — either list fields to include or fields to exclude, not both (except _id)
It would cause a network error
Inclusion and exclusion cancel each other out
PyMongo silently ignores exclusions in inclusion projections

Quiz 3. What sort direction value makes MongoDB return results from highest to lowest?

DESCENDING (-1)
ASCENDING (1)
DESC (0)
REVERSE (2)

Quiz 4. Why is skip() inefficient on very large collections?

MongoDB still scans and discards the skipped documents — for deep pagination use range-based cursors instead
skip() loads all skipped documents into memory
skip() is not supported on indexed collections
skip() requires a sort to be specified first

Quiz 5. How many times can a MongoDB cursor be iterated?

Once — to iterate again you must re-run the query
Unlimited times — cursors reset automatically
Twice — once forward and once in reverse
Only if converted to a list first

Next up — Query Filters: using comparison, logical, and element operators to pinpoint exactly the documents you need.
