MongoDB Lesson 17 – Delete Documents | Dataplexa

Delete Documents

Deleting data is one of the least reversible operations in MongoDB: once a document is gone, it cannot be recovered without a backup. PyMongo provides two core methods: delete_one() for removing a single matching document and delete_many() for removing every document that matches a filter. Both accept the same filter syntax used in find() and update_one(), which makes them easy to reason about. This lesson covers both methods, safe deletion patterns, soft deletes, dropping collections, and how to use find_one_and_delete() to atomically fetch and remove a document in one operation, all against the Dataplexa Store dataset.

delete_one() — Removing a Single Document

delete_one() finds the first document that matches the filter and removes it permanently. It returns a result object with a deleted_count property — either 1 if a document was found and deleted, or 0 if no match was found.

Why it exists: most application deletes target one specific record — a user closes their account, an admin removes a product, a job is dequeued. delete_one() is precise and safe — it will never accidentally remove more than one document even if multiple documents match the filter.

# delete_one() — remove the first matching document

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# First check what we are about to delete
target = db.reviews.find_one({"_id": "r008"})
print("About to delete:", target)

# Delete the review
result = db.reviews.delete_one({"_id": "r008"})
print("Deleted count:", result.deleted_count)

# Confirm it is gone
gone = db.reviews.find_one({"_id": "r008"})
print("After delete:", gone)

# delete_one() with no match — returns 0, no error raised
result = db.reviews.delete_one({"_id": "r999"})
print("No match deleted_count:", result.deleted_count)

About to delete: {'_id': 'r008', 'product_id': 'p007', 'user_id': 'u001', 'rating': 5, 'comment': 'Incredible colour accuracy for design work.', 'date': ...}
Deleted count: 1
After delete: None
No match deleted_count: 0

deleted_count is 1 on success and 0 when no matching document was found — no exception is raised for a no-match
Always identify the document with find_one() before deleting — confirm you are targeting the right record
Deleting by _id is the safest approach — it is unique and indexed, so there is zero ambiguity
If multiple documents match the filter, delete_one() removes only the first one the server finds. Which one that is depends on the query plan, so do not rely on a particular order. Use delete_many() if you need all of them gone
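Because delete_one() stays silent when nothing matches, callers sometimes want an explicit signal instead. The following is a minimal sketch of one way to do that, assuming a PyMongo-style collection object is passed in. The names delete_by_id and DocumentNotFound are hypothetical and not part of PyMongo.

```python
class DocumentNotFound(Exception):
    """Hypothetical error raised when a delete matched no document."""


def delete_by_id(collection, doc_id):
    """Delete exactly one document by _id, or raise if nothing matched.

    `collection` is assumed to be a PyMongo Collection, for example db.reviews.
    """
    result = collection.delete_one({"_id": doc_id})
    # delete_one() never raises for a no-match, so inspect deleted_count.
    if result.deleted_count == 0:
        raise DocumentNotFound(f"no document with _id={doc_id!r}")
    return result.deleted_count
```

A call such as delete_by_id(db.reviews, "r008") would then either return 1 or raise, which can be easier to handle than remembering to check deleted_count at every call site.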

delete_many() — Removing Multiple Documents

delete_many() removes every document that matches the filter in a single operation. It is the right tool for bulk cleanup — removing all cancelled orders, purging expired sessions, clearing test data.

Why it matters: deleting records one by one in a loop is slow and creates unnecessary round trips. delete_many() handles bulk removal efficiently in a single server-side operation.

# delete_many() — remove all matching documents

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# First — see what we are about to remove
cancelled = list(db.orders.find({"status": "cancelled"}, {"_id": 1, "status": 1}))
print("Cancelled orders to delete:", [o["_id"] for o in cancelled])

# Delete all cancelled orders
result = db.orders.delete_many({"status": "cancelled"})
print(f"Deleted {result.deleted_count} cancelled order(s)")

# Confirm the remaining orders
remaining = db.orders.count_documents({})
print("Remaining orders:", remaining)

# Delete products with zero stock
result = db.products.delete_many({"stock": {"$lte": 0}})
print(f"\nOut-of-stock products deleted: {result.deleted_count}")

# Remove all documents from a collection — pass empty filter
# CAUTION: this deletes everything — use drop() if you want to remove the collection too
result = db.user_sessions.delete_many({})
print(f"User sessions cleared: {result.deleted_count}")

Cancelled orders to delete: ['o006']
Deleted 1 cancelled order(s)
Remaining orders: 6

Out-of-stock products deleted: 0

User sessions cleared: 1

Always run the equivalent find() with the same filter before delete_many() — review what will be deleted first
delete_many({}) with an empty filter removes every document in the collection — use with extreme caution
For removing a collection entirely, db.collection.drop() is faster than delete_many({}) — it removes the collection and its indexes in one step
delete_many() is atomic per document, not as a whole. Outside a transaction, an operation that is interrupted partway through leaves some matching documents deleted and others still in place
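The "review before you delete" advice above can be partly automated. This is a rough sketch, assuming a PyMongo-style collection is passed in. The function name guarded_delete_many and its max_expected and allow_empty_filter parameters are hypothetical, introduced only for illustration.

```python
def guarded_delete_many(collection, query, max_expected=None, allow_empty_filter=False):
    """Count matches first, then delete only if the count looks sane.

    Returns the number of documents deleted.
    """
    if not query and not allow_empty_filter:
        # An empty filter matches every document in the collection.
        raise ValueError("empty filter refused; pass allow_empty_filter=True to override")

    matched = collection.count_documents(query)
    if max_expected is not None and matched > max_expected:
        raise ValueError(
            f"filter matches {matched} documents, expected at most {max_expected}"
        )
    if matched == 0:
        return 0  # nothing to do, skip the delete round trip
    return collection.delete_many(query).deleted_count
```

For example, guarded_delete_many(db.orders, {"status": "cancelled"}, max_expected=10) would refuse to run if the filter unexpectedly matched hundreds of orders. The count and the delete are two separate operations, so documents inserted in between can still be deleted; the guard catches mistakes in the filter, not concurrent writes.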

The Safe Delete Pattern

Deletes are permanent. A safe delete pattern involves four steps: verify the target with a query, log what is about to be removed, perform the delete, and confirm the result. For critical data, consider wrapping these steps in a transaction.

# Safe delete pattern — verify before deleting

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

def safe_delete_user(user_id: str) -> dict:
    """
    Delete a user only after confirming they exist.
    Returns a result dict with status and details.
    """
    # Step 1 — verify the target exists
    user = db.users.find_one({"_id": user_id})
    if not user:
        return {"status": "not_found", "deleted": False}

    # Step 2 — log what we are deleting (audit trail)
    print(f"Deleting user: {user['name']} ({user['email']})")

    # Step 3 — perform the delete
    result = db.users.delete_one({"_id": user_id})

    # Step 4 — confirm
    if result.deleted_count == 1:
        return {"status": "deleted", "deleted": True, "user": user["name"]}
    else:
        return {"status": "error", "deleted": False}

# Test the function
print(safe_delete_user("u006"))    # inserted in Lesson 11 — should succeed
print(safe_delete_user("u999"))    # does not exist

Deleting user: Frank Rossi (frank@example.com)
{'status': 'deleted', 'deleted': True, 'user': 'Frank Rossi'}
{'status': 'not_found', 'deleted': False}

Always fetch and log the document before deleting — creates an audit trail and confirms you have the right record
In regulated industries (finance, healthcare) deletes typically have to be logged to an audit collection before execution
Consider returning the deleted document to the caller — the find_one_and_delete() method does this atomically
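The audit-collection idea mentioned above could look roughly like this. It is a sketch under stated assumptions: source and audit are PyMongo-style collections, the function name audit_then_delete is hypothetical, and the shape of the audit record is made up for illustration rather than taken from the Dataplexa dataset.

```python
from datetime import datetime, timezone


def audit_then_delete(source, audit, doc_id, deleted_by):
    """Copy a document into an audit collection, then delete the original.

    Returns the deleted document, or None if it did not exist or was
    removed by someone else before the delete ran.
    """
    doc = source.find_one({"_id": doc_id})
    if doc is None:
        return None

    # Write the audit record first: if the process stops between the two
    # writes, the worst case is an audit entry for a document that still exists.
    audit.insert_one({
        "collection": source.name,   # Collection.name is the collection's name
        "document": doc,             # full copy of the document being removed
        "deleted_by": deleted_by,
        "deleted_at": datetime.now(timezone.utc),
    })

    result = source.delete_one({"_id": doc_id})
    return doc if result.deleted_count == 1 else None
```

A call might be audit_then_delete(db.users, db.deleted_users_audit, "u006", "admin"), where deleted_users_audit is an assumed collection name. The two writes are not atomic here; on a replica set they could be placed inside a transaction if that guarantee is required.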

find_one_and_delete() — Atomic Fetch and Remove

find_one_and_delete() finds a document, returns it, and deletes it — all in a single atomic operation. This is essential for queue processing where two workers must never receive the same job. It eliminates the race condition between a find_one() and a subsequent delete_one().

# find_one_and_delete() — atomic fetch then delete

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Simulate a job queue: pop the oldest order that is waiting in the "processing" state
# In a real queue, multiple workers call this simultaneously; atomicity prevents double-processing

def pop_next_order(status="processing"):
    """
    Atomically fetch and remove the next order with the given status.
    Returns the order document or None if the queue is empty.
    """
    order = db.orders.find_one_and_delete(
        {"status": status},
        sort=[("date", ASCENDING)]        # process oldest first
    )
    return order

# Pop the next processing order
order = pop_next_order("processing")
if order:
    print(f"Processing order: {order['_id']}  user: {order['user_id']}  total: ${order['total']}")
else:
    print("No processing orders in queue")

# Confirm it was removed
count_after = db.orders.count_documents({"status": "processing"})
print(f"Remaining processing orders: {count_after}")

Processing order: o004  user: u003  total: $349.99
Remaining processing orders: 0

find_one_and_delete() returns the document as it was before deletion — useful for logging and processing
The sort parameter ensures the operation is deterministic — without it, which document is deleted is undefined
This is the correct pattern for building work queues and task processors — never use find_one() followed by delete_one() for this use case
Returns None if no document matches — always check before accessing the result
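To show how the None check fits into a worker, here is a small sketch of a loop that keeps popping documents until the queue is empty. It assumes a PyMongo-style collection and documents with status and date fields as in the example above. The names drain_queue, handler, and max_jobs are hypothetical.

```python
def drain_queue(collection, handler, status="processing", max_jobs=None):
    """Pop matching documents oldest-first and pass each one to handler.

    Returns the number of documents handled.
    """
    handled = 0
    while max_jobs is None or handled < max_jobs:
        job = collection.find_one_and_delete(
            {"status": status},
            sort=[("date", 1)],  # 1 is the value of pymongo.ASCENDING
        )
        if job is None:
            break  # no matching document left, the queue is empty
        handler(job)
        handled += 1
    return handled
```

Something like drain_queue(db.orders, print) would print and remove every processing order. One design caveat: the document is already deleted by the time handler runs, so a handler that raises loses the job. If that matters, a claim-then-complete scheme (marking the job as taken with an atomic update and deleting it only after success) may be a better fit.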

Soft Deletes — Hiding Instead of Removing

A soft delete does not remove the document — it marks it as deleted with a flag field and excludes it from normal queries. This preserves history, supports undo, and satisfies audit requirements. It is the preferred pattern in any system where data must be recoverable or auditable.

# Soft delete pattern — mark as deleted rather than remove

from pymongo import MongoClient
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

def soft_delete_review(review_id: str, deleted_by: str):
    """Mark a review as deleted without removing it from the database."""
    result = db.reviews.update_one(
        {"_id": review_id, "deleted": {"$exists": False}},  # not already deleted
        {"$set": {
            "deleted":    True,
            "deleted_at": datetime.now(timezone.utc),
            "deleted_by": deleted_by
        }}
    )
    return result.modified_count == 1

# Soft delete review r007
success = soft_delete_review("r007", deleted_by="admin")
print("Soft deleted:", success)

# Normal query — exclude soft-deleted documents
active_reviews = db.reviews.find(
    {"deleted": {"$exists": False}},    # only non-deleted reviews
    {"_id": 1, "product_id": 1, "rating": 1}
)
print("\nActive reviews:")
for r in active_reviews:
    print(f"  {r['_id']}  product: {r['product_id']}  rating: {r['rating']}")

# Admin query — include soft-deleted documents to see full history
all_reviews = db.reviews.find({}, {"_id": 1, "deleted": 1})
print("\nAll reviews including soft-deleted:")
for r in all_reviews:
    status = "deleted" if r.get("deleted") else "active"
    print(f"  {r['_id']} — {status}")

Soft deleted: True

Active reviews:
  r001  product: p001  rating: 5
  r002  product: p002  rating: 4
  r003  product: p004  rating: 5
  r004  product: p007  rating: 4
  r005  product: p005  rating: 4
  r006  product: p002  rating: 5

All reviews including soft-deleted:
r001 — active
r002 — active
r003 — active
r004 — active
r005 — active
r006 — active
r007 — deleted

Always filter with {"deleted": {"$exists": False}} in application queries to exclude soft-deleted documents automatically
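Add an index on the deleted field if these queries become slow. A partial index that covers only active documents is another option, but partial filter expressions do not accept {"$exists": False}, so that approach requires storing deleted: False explicitly on active documents and filtering on that value instead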
The trade-off: soft deletes grow your collection over time. Schedule a background job to hard-delete documents where deleted_at is older than your retention period
Soft deletes support undelete — simply remove the deleted, deleted_at, and deleted_by fields with $unset
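The undelete and retention-cleanup notes above can be sketched as two small helpers. This assumes a PyMongo-style collection and the deleted, deleted_at, and deleted_by fields written by soft_delete_review(). The function names restore_review and purge_soft_deleted, and the retention_days and now parameters, are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def restore_review(collection, review_id):
    """Undo a soft delete by removing the three marker fields."""
    result = collection.update_one(
        {"_id": review_id, "deleted": True},  # only touch soft-deleted documents
        {"$unset": {"deleted": "", "deleted_at": "", "deleted_by": ""}},
    )
    return result.modified_count == 1


def purge_soft_deleted(collection, retention_days, now=None):
    """Hard-delete documents that were soft-deleted before the cutoff.

    `now` can be injected to make the cutoff reproducible in tests.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    result = collection.delete_many(
        {"deleted": True, "deleted_at": {"$lt": cutoff}}
    )
    return result.deleted_count
```

For instance, restore_review(db.reviews, "r007") would make the review visible to normal queries again, and a scheduled job could call purge_soft_deleted(db.reviews, retention_days=90), where the 90-day period is only an example value.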

Dropping a Collection vs Deleting All Documents

When you need to remove everything, choosing between delete_many({}) and drop() matters — they have different performance characteristics and different effects on indexes.

# drop() vs delete_many({}) — knowing when to use each

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

comparison = {
    "delete_many({})": {
        "removes_documents": True,
        "removes_indexes":   False,     # indexes are preserved
        "removes_collection": False,
        "speed":             "Slower — deletes document by document",
        "use_when":          "You want to clear data but keep the collection structure and indexes"
    },
    "collection.drop()": {
        "removes_documents":  True,
        "removes_indexes":    True,     # indexes are removed
        "removes_collection": True,
        "speed":              "Instant — removes all data files in one step",
        "use_when":           "You want to completely destroy and recreate the collection"
    }
}

for method, details in comparison.items():
    print(f"\n{method}:")
    for k, v in details.items():
        print(f"  {k:25} {v}")

# Example: drop and recreate for test data reset
# db.test_collection.drop()
# All documents and indexes removed instantly

delete_many({}):
removes_documents True
removes_indexes False
removes_collection False
speed Slower — deletes document by document
use_when You want to clear data but keep the collection structure and indexes

collection.drop():
removes_documents True
removes_indexes True
removes_collection True
speed Instant — removes all data files in one step
use_when You want to completely destroy and recreate the collection

Use drop() for resetting test environments, clearing temp collections, or rebuilding from scratch
Use delete_many({}) when you want to empty a collection but keep its indexes and validation rules intact
After a drop(), MongoDB recreates the collection implicitly on the next insert, but its indexes and validation rules are gone and must be recreated before (or right after) re-importing data
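A test-reset routine that drops a collection and then restores its indexes might look like the sketch below. It assumes db is a PyMongo-style Database object. The function name reset_collection and the index_specs format are hypothetical, chosen only for this illustration.

```python
def reset_collection(db, name, index_specs):
    """Drop a collection, then recreate the indexes it needs.

    index_specs is a list of (keys, options) pairs, for example
    ([("user_id", 1)], {"name": "user_id_idx"}).
    Returns the names of the indexes that were created.
    """
    collection = db[name]
    collection.drop()  # removes all documents and all indexes

    created = []
    for keys, options in index_specs:
        # create_index() also creates the collection if it does not exist yet
        created.append(collection.create_index(keys, **options))
    return created
```

A call could be reset_collection(db, "test_collection", [([("user_id", 1)], {"name": "user_id_idx"})]). Validation rules are not handled here; a collection that relies on them would need them reapplied separately.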

Summary Table

Method	Removes	Returns	Best For
`delete_one(filter)`	First matching document	`deleted_count` (0 or 1)	Targeted single-record removal
`delete_many(filter)`	All matching documents	`deleted_count` (n)	Bulk cleanup, purging expired data
`find_one_and_delete()`	First matching document	The deleted document	Queue processing, atomic fetch-and-remove
Soft delete	Nothing — marks as deleted	`modified_count`	Auditable systems, undo support
`collection.drop()`	All docs + indexes + collection	None	Reset environments, rebuild from scratch

Practice Questions

Practice 1. What does delete_one() return when no document matches the filter?

Practice 2. Why is find_one_and_delete() safer than a find_one() followed by delete_one() for queue processing?

Practice 3. What are three advantages of using soft deletes over hard deletes?

Practice 4. What is the key difference between drop() and delete_many({}) when clearing a collection?

Practice 5. Write the filter for a normal application query that excludes soft-deleted documents.

Quiz

Quiz 1. If three documents match the filter in delete_one(), how many are removed?

One — delete_one() always removes only the first matching document
Three — all matching documents are removed
None — delete_one() raises an error when multiple documents match
It depends on the filter type

Quiz 2. What does find_one_and_delete() return?

The document as it was before deletion — or None if no match was found
A DeleteResult with deleted_count
The _id of the deleted document
True if deleted, False if not found

Quiz 3. What is the main disadvantage of soft deletes over time?

The collection grows indefinitely — soft-deleted documents accumulate and must be periodically hard-deleted to manage storage
Soft-deleted documents cannot be queried
MongoDB does not support the $exists operator
Soft deletes require a transaction to be safe

Quiz 4. Which delete method should you use to remove a whole collection including all its indexes in one step?

collection.drop()
delete_many({})
delete_one({})
db.dropDatabase()

Quiz 5. Why is deleting by _id safer than deleting by another field like name?

_id is guaranteed unique — deleting by name could match multiple documents with the same name, causing unintended data loss
_id queries bypass the storage engine for faster deletion
MongoDB only allows deletion by _id
Other fields are not indexed by default so deletion fails

Next up — Comparison Operators: Mastering $eq, $ne, $gt, $gte, $lt, $lte, $in, and $nin to build precise range and value queries.
