MongoDB
Projection
Every MongoDB query returns full documents by default — every field, every nested object, every array element. In a small dataset that is fine. In production, returning fields your application never uses wastes network bandwidth, inflates memory usage, and exposes data that should stay server-side. Projection is the second argument to find() and find_one() — a dictionary that tells MongoDB exactly which fields to include or exclude from the result. Getting projection right is one of the simplest and most impactful query optimisations available.
This lesson covers inclusion projection, exclusion projection, projecting nested fields and array elements, and the $slice and $elemMatch projection operators — all against the Dataplexa Store dataset.
Inclusion Projection — Return Only What You Need
An inclusion projection lists the fields you want MongoDB to return. Every field not listed is hidden. Set each wanted field to 1. The _id field is included by default — suppress it explicitly with "_id": 0 if you do not need it.
Why it matters: a user list page that only needs names and emails should never receive addresses, tags, membership tiers, and join dates. Inclusion projection enforces that discipline at the database layer.
# Inclusion projection — return only specified fields
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Return only name and email — _id included by default
print("Name and email only (with _id):")
users = db.users.find({}, {"name": 1, "email": 1})
for u in users:
    print(f" {u}")
# Suppress _id — clean output for an API response
print("\nName and email — no _id:")
users = db.users.find({}, {"name": 1, "email": 1, "_id": 0})
for u in users:
    print(f" {u}")
# Product listing — name, price, rating only
print("\nProduct listing fields:")
products = db.products.find(
    {"category": "Electronics"},
    {"name": 1, "price": 1, "rating": 1, "_id": 0}
)
for p in products:
    print(f" {p}")

Name and email only (with _id):
{'_id': 'u001', 'name': 'Alice Johnson', 'email': 'alice@example.com'}
{'_id': 'u002', 'name': 'Bob Smith', 'email': 'bob@example.com'}
{'_id': 'u003', 'name': 'Clara Diaz', 'email': 'clara@example.com'}
{'_id': 'u004', 'name': 'David Lee', 'email': 'david@example.com'}
{'_id': 'u005', 'name': 'Eva Müller', 'email': 'eva@example.com'}
Name and email — no _id:
{'name': 'Alice Johnson', 'email': 'alice@example.com'}
{'name': 'Bob Smith', 'email': 'bob@example.com'}
...
Product listing fields:
{'name': 'Wireless Mouse', 'price': 29.99, 'rating': 4.5}
{'name': 'Mechanical Keyboard', 'price': 89.99, 'rating': 4.7}
{'name': 'USB-C Hub', 'price': 49.99, 'rating': 4.3}
{'name': 'Monitor 27-inch', 'price': 299.99, 'rating': 4.6}
- Inclusion projection: list exactly the fields you want; 1 means include
- _id is the only field you can mix modes for: it can be excluded ("_id": 0) inside an inclusion projection
- Fields not listed are completely absent from the returned document: not null, not empty, simply not there
- Use inclusion projection when you know exactly which fields you need — the most common and recommended approach
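The inclusion rules above can be wrapped in a small helper. This is a minimal sketch: the function name inclusion_projection and its include_id parameter are hypothetical and not part of PyMongo. It only builds the dictionary you would pass as the second argument to find() or find_one().

```python
def inclusion_projection(fields, include_id=False):
    """Build an inclusion projection dict for the given field names.

    Hypothetical helper, not a PyMongo API: it only constructs the
    dictionary that find() / find_one() accept as their second argument.
    """
    projection = {name: 1 for name in fields}
    if not include_id and "_id" not in projection:
        # _id is returned by default, so it has to be switched off explicitly
        projection["_id"] = 0
    return projection


user_list_projection = inclusion_projection(["name", "email"])
print(user_list_projection)  # prints {'name': 1, 'email': 1, '_id': 0}

# Usage against a live database (assumes the lesson's db handle):
# db.users.find({}, user_list_projection)
```

Keeping the field list in one place makes it harder for a new field to leak into an API response by accident.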
Exclusion Projection — Hide Specific Fields
An exclusion projection lists fields you want to hide. Every other field is returned. Set each unwanted field to 0. This is useful when a document has many fields and you only want to remove one or two sensitive or bulky ones.
# Exclusion projection — hide specific fields
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Hide tags and joined — return everything else
print("Users without tags and joined fields:")
users = db.users.find({}, {"tags": 0, "joined": 0})
for u in users:
    print(f" {u}")
# Hide the internal stock field from products. The same pattern applies to
# any sensitive field, such as a hypothetical password_hash on users, that
# should never be sent to the client
print("\nProducts without stock field (stock is internal):")
products = db.products.find(
    {},
    {"stock": 0, "_id": 0}
)
for p in products:
    print(f" {p}")

Users without tags and joined fields:
{'_id': 'u001', 'name': 'Alice Johnson', 'email': 'alice@example.com', 'age': 30, 'city': 'London', 'country': 'UK', 'membership': 'premium'}
{'_id': 'u002', 'name': 'Bob Smith', 'email': 'bob@example.com', 'age': 25, 'city': 'Manchester', 'country': 'UK', 'membership': 'basic'}
...
Products without stock field (stock is internal):
{'name': 'Wireless Mouse', 'category': 'Electronics', 'brand': 'TechCore', 'price': 29.99, 'rating': 4.5, 'tags': ['wireless', 'bestseller']}
...
- Exclusion projection: set each field to hide to 0, and everything else comes back
- You cannot mix inclusion and exclusion in the same projection; the only exception is "_id": 0 in an inclusion projection
- Exclusion is best when you need most of a document's fields and only want to strip one or two
- A common production pattern is to always exclude password_hash, internal_notes, and similar sensitive fields using exclusion projection
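The no-mixing rule above can be checked before a projection ever reaches the server. The sketch below is illustrative only: projection_mode is a hypothetical helper that models the simple 1/0 rule from this lesson, skips operator values such as {"$slice": 1}, and is not how MongoDB itself validates projections.

```python
def projection_mode(projection):
    """Classify a projection as 'inclusion' or 'exclusion'.

    Hypothetical helper modelling only the 1/0 rule described above.
    Returns None when the projection contains nothing but _id or
    operator values, which this sketch does not classify.
    """
    modes = set()
    for field, value in projection.items():
        if field == "_id" or isinstance(value, dict):
            continue  # _id may be toggled in either mode; operators are skipped
        modes.add("inclusion" if value else "exclusion")
    if len(modes) > 1:
        raise ValueError("cannot mix inclusion and exclusion in one projection")
    return modes.pop() if modes else None


print(projection_mode({"name": 1, "email": 1, "_id": 0}))  # prints inclusion
print(projection_mode({"tags": 0, "joined": 0}))           # prints exclusion
```

A projection such as {"name": 1, "tags": 0} raises ValueError here, mirroring the error the server would return for the same mix.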
Projecting Nested Fields with Dot Notation
Use dot notation to project specific fields inside embedded sub-documents. This is particularly useful when documents have deeply nested objects and you only want a subset of the nested data.
# Dot notation projection — include or exclude nested fields
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Imagine users had a nested address sub-document — query just the city
# db.users.find({}, {"address.city": 1, "name": 1, "_id": 0})
# Orders — project only the total and status, not the full items array
print("Order summaries (id, status, total only):")
orders = db.orders.find(
    {},
    {"status": 1, "total": 1, "_id": 1}
)
for o in orders:
    print(f" {o['_id']} status: {o['status']:12} total: ${o['total']:.2f}")
# Project a nested field from embedded documents
# Show user_id and only the product_id of each item per order
print("\nOrders — user and items array:")
orders = db.orders.find(
    {},
    {"user_id": 1, "items.product_id": 1, "_id": 0}
)
for o in orders:
    item_ids = [item["product_id"] for item in o["items"]]
    print(f" user: {o['user_id']} products: {item_ids}")

Order summaries (id, status, total only):
o001 status: delivered total: $44.96
o002 status: shipped total: $89.99
o003 status: delivered total: $99.98
o004 status: processing total: $349.99
o005 status: delivered total: $329.98
o006 status: cancelled total: $89.99
o007 status: delivered total: $11.97
Orders — user and items array:
user: u001 products: ['p001', 'p003']
user: u002 products: ['p002']
user: u001 products: ['p005']
user: u003 products: ['p004']
user: u004 products: ['p007', 'p001']
user: u005 products: ['p002']
user: u002 products: ['p006', 'p003']
- Dot notation in projections uses the same "parent.child" syntax as in filter queries
- Projecting a nested field returns the parent object containing only that nested field, not the bare value
- You can chain as many levels as needed: "address.location.coordinates": 1
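To see why a dotted projection returns the parent object rather than the bare value, here is a plain-Python emulation. It is a sketch under stated assumptions: project_path is a hypothetical function, it handles nested dictionaries only (no arrays), and the sample user document with its address sub-document is invented for illustration, as the lesson's users collection has no such field.

```python
def project_path(document, dotted_path):
    """Emulate including a single dotted path from a nested dict.

    Illustration only: covers nested dictionaries, not arrays, and is
    not the server's actual projection logic.
    """
    head, _, rest = dotted_path.partition(".")
    if not isinstance(document, dict) or head not in document:
        return {}
    if not rest:
        return {head: document[head]}
    # Recurse so the result keeps the parent key around the nested field
    return {head: project_path(document[head], rest)}


# Hypothetical document shape, not part of the Dataplexa Store dataset
user = {
    "name": "Alice Johnson",
    "address": {"city": "London", "street": "1 Example Road"},
}

print(project_path(user, "address.city"))  # prints {'address': {'city': 'London'}}
```

The application still reads the value as result["address"]["city"], exactly as it would from the full document.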
$slice — Projecting a Subset of an Array
$slice lets you return only a portion of an array field — the first N elements, the last N elements, or a range. This is useful when documents contain large arrays (e.g. a post with hundreds of comments) and you only want the first few.
# $slice — return a portion of an array field
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Return only the first item from each order's items array
print("First item only from each order:")
orders = db.orders.find(
    {},
    {"user_id": 1, "items": {"$slice": 1}, "_id": 0}
)
for o in orders:
    print(f" user: {o['user_id']} first item: {o['items']}")
# Return only the last 1 item from each order
print("\nLast item only from each order:")
orders = db.orders.find(
    {},
    {"user_id": 1, "items": {"$slice": -1}, "_id": 0}
)
for o in orders:
    print(f" user: {o['user_id']} last item: {o['items']}")
# $slice with skip and limit — [skip, limit]
# Skip 1 item, return up to 1 item from the array
print("\nSecond item only (skip 1, take 1):")
orders = db.orders.find(
    {"user_id": "u001"},
    {"items": {"$slice": [1, 1]}, "_id": 0}
)
for o in orders:
    print(f" items: {o.get('items', [])}")

First item only from each order:
user: u001 first item: [{'product_id': 'p001', 'qty': 1, 'price': 29.99}]
user: u002 first item: [{'product_id': 'p002', 'qty': 1, 'price': 89.99}]
user: u001 first item: [{'product_id': 'p005', 'qty': 2, 'price': 49.99}]
user: u003 first item: [{'product_id': 'p004', 'qty': 1, 'price': 349.99}]
user: u004 first item: [{'product_id': 'p007', 'qty': 1, 'price': 299.99}]
user: u005 first item: [{'product_id': 'p002', 'qty': 1, 'price': 89.99}]
user: u002 first item: [{'product_id': 'p006', 'qty': 2, 'price': 3.49}]
Last item only from each order:
user: u001 last item: [{'product_id': 'p003', 'qty': 3, 'price': 4.99}]
user: u002 last item: [{'product_id': 'p002', 'qty': 1, 'price': 89.99}]
...
Second item only (skip 1, take 1):
items: [{'product_id': 'p003', 'qty': 3, 'price': 4.99}]
items: []
- Positive $slice: N returns the first N elements; negative $slice: -N returns the last N
- $slice: [skip, limit] skips the first skip elements, then returns up to limit elements, which is useful for array pagination
- In a projection, $slice does not filter which documents are returned, only how much of the array is shown
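The three $slice forms map closely onto Python list slicing, which makes them easy to reason about without a database. The function below is a hypothetical emulation, not MongoDB code: it models a non-zero integer N and a [skip, limit] pair with a positive limit, and rejects anything else.

```python
def slice_array(array, spec):
    """Emulate the $slice projection forms described above on a Python list.

    Illustrative only. Supports a non-zero integer N or a [skip, limit]
    pair with a positive limit; other inputs raise ValueError.
    """
    if isinstance(spec, int) and spec != 0:
        # N > 0: first N elements; N < 0: last |N| elements
        return array[:spec] if spec > 0 else array[spec:]
    if isinstance(spec, (list, tuple)) and len(spec) == 2:
        skip, limit = spec
        if limit <= 0:
            raise ValueError("limit must be positive")
        # A negative skip counts back from the end of the array
        start = skip if skip >= 0 else max(len(array) + skip, 0)
        return array[start:start + limit]
    raise ValueError(f"unsupported $slice spec: {spec!r}")


letters = ["a", "b", "c", "d", "e", "f"]
print(slice_array(letters, 2))       # prints ['a', 'b']
print(slice_array(letters, -2))      # prints ['e', 'f']
print(slice_array(letters, [2, 3]))  # prints ['c', 'd', 'e']
print(slice_array(["x"], [1, 1]))    # prints []
```

The last call shows what happens when skip runs past the end of a short array: the field comes back as an empty list.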
$elemMatch in Projection — Return One Matching Array Element
The $elemMatch projection operator returns only the first array element that matches a condition. Unlike the filter $elemMatch which selects documents, the projection $elemMatch trims the array in the output.
# $elemMatch as a projection operator — return first matching array element
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# From each order, return only the item with product_id "p001" if present
print("Orders — show only the p001 line item if it exists:")
orders = db.orders.find(
    {"items.product_id": "p001"},  # filter: orders containing p001
    {
        "user_id": 1,
        "items": {"$elemMatch": {"product_id": "p001"}},  # projection
        "_id": 1
    }
)
for o in orders:
    print(f" {o['_id']} user: {o['user_id']} matched item: {o.get('items')}")
# Compare: without $elemMatch projection — entire items array returned
print("\nWithout $elemMatch projection — full items array:")
orders = db.orders.find(
    {"items.product_id": "p001"},
    {"user_id": 1, "items": 1, "_id": 0}
)
for o in orders:
    print(f" user: {o['user_id']} all items: {[i['product_id'] for i in o['items']]}")

Orders — show only the p001 line item if it exists:
o001 user: u001 matched item: [{'product_id': 'p001', 'qty': 1, 'price': 29.99}]
o005 user: u004 matched item: [{'product_id': 'p001', 'qty': 1, 'price': 29.99}]
Without $elemMatch projection — full items array:
user: u001 all items: ['p001', 'p003']
user: u004 all items: ['p007', 'p001']
- Projection $elemMatch always returns an array: even if only one element matched, it comes back wrapped in a list
- If no element in the array matches the $elemMatch condition, the field is omitted entirely from the result
- Projection $elemMatch only returns the first matching element; use the aggregation $filter operator if you need all matching elements
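The difference between "first match" and "all matches" is easiest to see side by side. The two functions below are hypothetical plain-Python emulations that use simple equality checks only; they are not the server's implementation. The sample items list is order o005 from the output above.

```python
def elem_match_projection(array, condition):
    """Return the first element matching every key/value in condition.

    Illustrative emulation: a one-element list on a match, or None to
    signal that the field would be omitted from the result.
    """
    for element in array:
        if all(element.get(key) == value for key, value in condition.items()):
            return [element]
    return None


def filter_all(array, condition):
    """Return every matching element, which is what $filter is used for."""
    return [
        element for element in array
        if all(element.get(key) == value for key, value in condition.items())
    ]


# Items of order o005 as shown in the lesson output
items = [
    {"product_id": "p007", "qty": 1, "price": 299.99},
    {"product_id": "p001", "qty": 1, "price": 29.99},
]

first_match = elem_match_projection(items, {"qty": 1})
all_matches = filter_all(items, {"qty": 1})
print(len(first_match), len(all_matches))  # prints 1 2
```

Both line items have qty 1, yet the $elemMatch-style function stops at p007, the first match in array order.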
Covered Queries — The Performance Payoff
When a query filter and projection together only reference fields that are covered by an index, MongoDB can satisfy the query entirely from the index without touching documents on disk. This is called a covered query and is the fastest possible read operation.
# Covered query concept — answered entirely from an index
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["dataplexa"]
# Create a compound index on category and price
db.products.create_index([("category", 1), ("price", 1)])
# This query is now a covered query:
# filter → uses category (in the index)
# projection → only category and price (both in the index)
# _id excluded → avoids fetching the full document
covered_result = db.products.find(
    {"category": "Electronics"},
    {"category": 1, "price": 1, "_id": 0}  # only indexed fields
)
print("Covered query — Electronics category and price:")
for p in covered_result:
    print(f" {p}")
print("\nKey rule: if filter + projection fields are all in one index")
print("MongoDB never reads the actual document from disk — pure index scan")

Covered query — Electronics category and price:
{'category': 'Electronics', 'price': 29.99}
{'category': 'Electronics', 'price': 49.99}
{'category': 'Electronics', 'price': 89.99}
{'category': 'Electronics', 'price': 299.99}
Key rule: if filter + projection fields are all in one index
MongoDB never reads the actual document from disk — pure index scan
- A covered query is the highest-performance read path — it never touches the document store
- To achieve a covered query: all filter fields must be in the index, all projected fields must be in the index, and _id must be excluded (unless it is part of the index)
- Verify a covered query with explain: in mongosh call .explain("executionStats"), in PyMongo call cursor.explain() with no argument, and look for totalDocsExamined: 0 under executionStats
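The three conditions above are mechanical enough to check in code before you reach for explain. This is a simplified sketch: can_be_covered is a hypothetical helper, and it ignores further restrictions that apply in practice, such as indexed array fields.

```python
def can_be_covered(index_fields, filter_fields, projection):
    """Check the three covered-query conditions listed above.

    Simplified, hypothetical helper: it does not account for other
    restrictions such as array fields in the index.
    """
    indexed = set(index_fields)
    included = {field for field, value in projection.items() if value == 1}
    if not included:
        # An exclusion projection returns non-indexed fields too
        return False
    filter_ok = set(filter_fields) <= indexed
    projection_ok = (included - {"_id"}) <= indexed
    # _id is returned by default, so it must be excluded or indexed
    id_ok = projection.get("_id", 1) == 0 or "_id" in indexed
    return filter_ok and projection_ok and id_ok


index = ["category", "price"]
print(can_be_covered(index, ["category"], {"category": 1, "price": 1, "_id": 0}))  # prints True
print(can_be_covered(index, ["category"], {"category": 1, "price": 1}))            # prints False
print(can_be_covered(index, ["category"], {"name": 1, "price": 1, "_id": 0}))      # prints False
```

The second call fails only because _id is still returned by default, which is the easiest of the three conditions to forget.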
Summary Table
| Technique | Syntax | Returns | Best For |
|---|---|---|---|
| Inclusion | {"field": 1} | Only listed fields | When you know exactly what you need |
| Exclusion | {"field": 0} | Everything except listed | Hiding one or two sensitive fields |
| Suppress _id | {"_id": 0} | Removes _id from result | Clean API responses |
| Dot notation | {"parent.child": 1} | Parent object with only the nested field | Picking fields from sub-documents |
| $slice | {"arr": {"$slice": N}} | First / last N array elements | Large arrays (comments, items) |
| $elemMatch | {"arr": {"$elemMatch": {...}}} | First matching array element | Showing relevant array entry only |
Practice Questions
Practice 1. Write an inclusion projection that returns only the name, city, and country fields from the users collection, excluding _id.
Practice 2. Why can you not mix inclusion and exclusion values in a single projection?
Practice 3. What does {"items": {"$slice": -2}} return from an order document?
Practice 4. What are the three requirements for a MongoDB covered query?
Practice 5. What is the difference between $elemMatch used as a filter operator versus as a projection operator?
Quiz
Quiz 1. Which projection returns only the name field and always suppresses _id?
Quiz 2. What does $slice: [2, 3] return from an array?
Quiz 3. What metric in explain("executionStats") confirms a query is fully covered by an index?
Quiz 4. What happens when a projection $elemMatch condition matches no element in the array?
Quiz 5. Which projection approach is better when you need most fields from a large document but want to hide just a password_hash field?
Next up — Sorting & Limiting: Controlling the order and size of your result sets for performant, predictable queries.