MongoDB Lesson 17 – Delete Documents | Dataplexa

Delete Documents

Deleting data is the hardest operation to undo in MongoDB: once a document is gone, it cannot be recovered without a backup. PyMongo provides two core methods: delete_one() for removing a single matching document and delete_many() for removing every document that matches a filter. Both accept the same filter syntax used in find() and update_one(), making them easy to reason about. This lesson covers both methods, safe deletion patterns, soft deletes, and how to use find_one_and_delete() to atomically fetch and remove a document in one operation, all against the Dataplexa Store dataset.

delete_one() — Removing a Single Document

delete_one() finds the first document that matches the filter and removes it permanently. It returns a result object with a deleted_count property — either 1 if a document was found and deleted, or 0 if no match was found.

Why it exists: most application deletes target one specific record — a user closes their account, an admin removes a product, a job is dequeued. delete_one() is precise and safe — it will never accidentally remove more than one document even if multiple documents match the filter.

# delete_one() — remove the first matching document

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# First check what we are about to delete
target = db.reviews.find_one({"_id": "r008"})
print("About to delete:", target)

# Delete the review
result = db.reviews.delete_one({"_id": "r008"})
print("Deleted count:", result.deleted_count)

# Confirm it is gone
gone = db.reviews.find_one({"_id": "r008"})
print("After delete:", gone)

# delete_one() with no match — returns 0, no error raised
result = db.reviews.delete_one({"_id": "r999"})
print("No match deleted_count:", result.deleted_count)

Output:
About to delete: {'_id': 'r008', 'product_id': 'p007', 'user_id': 'u001', 'rating': 5, 'comment': 'Incredible colour accuracy for design work.', 'date': ...}
Deleted count: 1
After delete: None
No match deleted_count: 0
  • deleted_count is 1 on success and 0 when no matching document was found — no exception is raised for a no-match
  • Always identify the document with find_one() before deleting — confirm you are targeting the right record
  • Deleting by _id is the safest approach — it is unique and indexed, so there is zero ambiguity
  • If multiple documents match the filter, delete_one() removes only the first one the server finds, and without a unique filter you should not rely on which one that is. Use delete_many() if you need all of them gone
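
When the filter is not on _id, it can help to confirm that exactly one document matches before calling delete_one(). The sketch below is one possible way to do that. The helper name delete_exactly_one is hypothetical, and it only assumes a PyMongo-style collection object that exposes count_documents() and delete_one(). The count and the delete are separate round trips, so another client could still change the data in between.

```python
def delete_exactly_one(collection, filter_doc):
    """Delete a document only if the filter matches exactly one.

    `collection` is assumed to be a PyMongo Collection (or any object
    exposing count_documents() and delete_one() with the same behaviour).
    Returns True if a document was deleted, False if nothing matched.
    """
    # limit=2 lets the server stop counting as soon as ambiguity is proven
    matches = collection.count_documents(filter_doc, limit=2)
    if matches == 0:
        return False
    if matches > 1:
        raise ValueError(f"Filter {filter_doc!r} matches more than one document")
    return collection.delete_one(filter_doc).deleted_count == 1
```

A call such as delete_exactly_one(db.users, {"email": "frank@example.com"}) would then refuse to run if two users shared that address, rather than silently removing one of them.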

delete_many() — Removing Multiple Documents

delete_many() removes every document that matches the filter in a single operation. It is the right tool for bulk cleanup — removing all cancelled orders, purging expired sessions, clearing test data.

Why it matters: deleting records one by one in a loop is slow and creates unnecessary round trips. delete_many() handles bulk removal efficiently in a single server-side operation.

# delete_many() — remove all matching documents

from pymongo import MongoClient
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# First — see what we are about to remove
cancelled = list(db.orders.find({"status": "cancelled"}, {"_id": 1, "status": 1}))
print("Cancelled orders to delete:", [o["_id"] for o in cancelled])

# Delete all cancelled orders
result = db.orders.delete_many({"status": "cancelled"})
print(f"Deleted {result.deleted_count} cancelled order(s)")

# Confirm the remaining orders
remaining = db.orders.count_documents({})
print("Remaining orders:", remaining)

# Delete products with zero stock
result = db.products.delete_many({"stock": {"$lte": 0}})
print(f"\nOut-of-stock products deleted: {result.deleted_count}")

# Remove all documents from a collection — pass empty filter
# CAUTION: this deletes everything — use drop() if you want to remove the collection too
result = db.user_sessions.delete_many({})
print(f"User sessions cleared: {result.deleted_count}")

Output:
Cancelled orders to delete: ['o006']
Deleted 1 cancelled order(s)
Remaining orders: 6

Out-of-stock products deleted: 0

User sessions cleared: 1
  • Always run the equivalent find() with the same filter before delete_many() — review what will be deleted first
  • delete_many({}) with an empty filter removes every document in the collection — use with extreme caution
  • For removing a collection entirely, db.collection.drop() is faster than delete_many({}) — it removes the collection and its indexes in one step
  • delete_many() is not transactional by default — if it is interrupted partway through, some documents will have been deleted and some will not
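
The first two points above can be folded into a small guard. This is a minimal sketch, not a standard PyMongo feature: guarded_delete_many and its max_expected parameter are hypothetical names, and the collection argument is assumed to behave like a PyMongo Collection.

```python
def guarded_delete_many(collection, filter_doc, max_expected):
    """Preview a bulk delete and refuse it if it looks too broad.

    Returns the number of documents deleted.
    """
    if not filter_doc:
        # An empty filter would match every document in the collection
        raise ValueError("Refusing to run delete_many() with an empty filter")

    # Same filter as the delete, so the count is a preview of what will go
    to_delete = collection.count_documents(filter_doc)
    if to_delete > max_expected:
        raise ValueError(
            f"Filter matches {to_delete} documents, expected at most {max_expected}"
        )
    return collection.delete_many(filter_doc).deleted_count
```

For example, guarded_delete_many(db.orders, {"status": "cancelled"}, max_expected=10) would stop before deleting anything if a typo in the filter made it match far more orders than intended. The preview and the delete are still two separate operations, so the count is a sanity check rather than a guarantee.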

The Safe Delete Pattern

Deletes are permanent. A safe delete pattern has three core steps: verify the target with a query, perform the delete, and confirm the result. The example below adds a fourth step, logging the document between verifying and deleting. For critical data, consider wrapping the steps in a transaction.

# Safe delete pattern — verify before deleting

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

def safe_delete_user(user_id: str) -> dict:
    """
    Delete a user only after confirming they exist.
    Returns a result dict with status and details.
    """
    # Step 1 — verify the target exists
    user = db.users.find_one({"_id": user_id})
    if not user:
        return {"status": "not_found", "deleted": False}

    # Step 2 — log what we are deleting (audit trail)
    print(f"Deleting user: {user['name']} ({user['email']})")

    # Step 3 — perform the delete
    result = db.users.delete_one({"_id": user_id})

    # Step 4 — confirm
    if result.deleted_count == 1:
        return {"status": "deleted", "deleted": True, "user": user["name"]}
    else:
        return {"status": "error", "deleted": False}

# Test the function
print(safe_delete_user("u006"))    # inserted in Lesson 11 — should succeed
print(safe_delete_user("u999"))    # does not exist

Output:
Deleting user: Frank Rossi (frank@example.com)
{'status': 'deleted', 'deleted': True, 'user': 'Frank Rossi'}
{'status': 'not_found', 'deleted': False}
  • Always fetch and log the document before deleting — creates an audit trail and confirms you have the right record
  • In regulated industries (finance, healthcare), deletes typically need to be logged to an audit collection before execution
  • Consider returning the deleted document to the caller — the find_one_and_delete() method does this atomically
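
Logging to an audit collection, as mentioned above, might look like the following sketch. The function name delete_with_audit, the audit document layout, and the idea of passing a separate audit collection are all assumptions made for illustration. Both arguments are assumed to behave like PyMongo Collection objects.

```python
from datetime import datetime, timezone


def delete_with_audit(collection, audit_collection, doc_id, deleted_by):
    """Copy a document into an audit collection, then delete it.

    Returns True if the document existed and was deleted.
    """
    doc = collection.find_one({"_id": doc_id})
    if doc is None:
        return False

    # Write the audit record first so a failed delete never goes unlogged
    audit_collection.insert_one({
        "collection": collection.name,   # name of the source collection
        "document": doc,                 # full copy of what is being removed
        "deleted_by": deleted_by,
        "deleted_at": datetime.now(timezone.utc),
    })
    return collection.delete_one({"_id": doc_id}).deleted_count == 1
```

The audit insert and the delete are two separate writes here. If they must succeed or fail together, they would need to run inside a transaction, which this sketch does not attempt.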

find_one_and_delete() — Atomic Fetch and Remove

find_one_and_delete() finds a document, returns it, and deletes it — all in a single atomic operation. This is essential for queue processing where two workers must never receive the same job. It eliminates the race condition between a find_one() and a subsequent delete_one().

# find_one_and_delete() — atomic fetch then delete

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Simulate a job queue: pop the oldest order with the given status for processing
# In a real queue, multiple workers call this simultaneously; atomicity prevents double-processing

def pop_next_order(status="processing"):
    """
    Atomically fetch and remove the next order with the given status.
    Returns the order document or None if the queue is empty.
    """
    order = db.orders.find_one_and_delete(
        {"status": status},
        sort=[("date", ASCENDING)]        # process oldest first
    )
    return order

# Pop the next processing order
order = pop_next_order("processing")
if order:
    print(f"Processing order: {order['_id']}  user: {order['user_id']}  total: ${order['total']}")
else:
    print("No processing orders in queue")

# Confirm it was removed
count_after = db.orders.count_documents({"status": "processing"})
print(f"Remaining processing orders: {count_after}")

Output:
Processing order: o004 user: u003 total: $349.99
Remaining processing orders: 0
  • find_one_and_delete() returns the document as it was before deletion — useful for logging and processing
  • The sort parameter ensures the operation is deterministic — without it, which document is deleted is undefined
  • This is the correct pattern for building work queues and task processors — never use find_one() followed by delete_one() for this use case
  • Returns None if no document matches — always check before accessing the result
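
A worker built on this pattern usually keeps popping until the queue is empty. The loop below is a rough sketch under a few assumptions: drain_queue and handle are hypothetical names, the collection behaves like a PyMongo Collection, and queued documents carry the status and date fields used earlier in this lesson.

```python
def drain_queue(collection, status, handle):
    """Process queued documents one at a time until none are left.

    `handle` is a callable that receives each claimed document.
    Returns the number of documents processed.
    """
    processed = 0
    while True:
        # Each call atomically claims and removes one document,
        # so two workers can never receive the same job
        job = collection.find_one_and_delete(
            {"status": status},
            sort=[("date", 1)],  # 1 is the value of pymongo.ASCENDING
        )
        if job is None:
            break  # queue is empty
        handle(job)
        processed += 1
    return processed
```

One design consequence is worth noting: the document is already gone by the time handle() runs, so a job is lost if the handler crashes. Where that matters, a common alternative is to mark the job as claimed with an update and delete it only after it has been handled successfully.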

Soft Deletes — Hiding Instead of Removing

A soft delete does not remove the document — it marks it as deleted with a flag field and excludes it from normal queries. This preserves history, supports undo, and satisfies audit requirements. It is the preferred pattern in any system where data must be recoverable or auditable.

# Soft delete pattern — mark as deleted rather than remove

from pymongo import MongoClient
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

def soft_delete_review(review_id: str, deleted_by: str):
    """Mark a review as deleted without removing it from the database."""
    result = db.reviews.update_one(
        {"_id": review_id, "deleted": {"$exists": False}},  # not already deleted
        {"$set": {
            "deleted":    True,
            "deleted_at": datetime.now(timezone.utc),
            "deleted_by": deleted_by
        }}
    )
    return result.modified_count == 1

# Soft delete review r007
success = soft_delete_review("r007", deleted_by="admin")
print("Soft deleted:", success)

# Normal query — exclude soft-deleted documents
active_reviews = db.reviews.find(
    {"deleted": {"$exists": False}},    # only non-deleted reviews
    {"_id": 1, "product_id": 1, "rating": 1}
)
print("\nActive reviews:")
for r in active_reviews:
    print(f"  {r['_id']}  product: {r['product_id']}  rating: {r['rating']}")

# Admin query — include soft-deleted documents to see full history
all_reviews = db.reviews.find({}, {"_id": 1, "deleted": 1})
print("\nAll reviews including soft-deleted:")
for r in all_reviews:
    status = "deleted" if r.get("deleted") else "active"
    print(f"  {r['_id']} — {status}")

Output:
Soft deleted: True

Active reviews:
r001 product: p001 rating: 5
r002 product: p002 rating: 4
r003 product: p004 rating: 5
r004 product: p007 rating: 4
r005 product: p005 rating: 4
r006 product: p002 rating: 5

All reviews including soft-deleted:
r001 — active
r002 — active
r003 — active
r004 — active
r005 — active
r006 — active
r007 — deleted
  • Always filter with {"deleted": {"$exists": False}} in application queries to exclude soft-deleted documents automatically
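  • Add an index on the deleted field, or better, use a partial index that covers only active documents. Partial index filters support {"$exists": True} but not {"$exists": False}, so a partial index over active documents requires storing an explicit deleted: False on them and filtering on that value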
  • The trade-off: soft deletes grow your collection over time. Schedule a background job to hard-delete documents where deleted_at is older than your retention period
  • Soft deletes support undelete — simply remove the deleted, deleted_at, and deleted_by fields with $unset
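
The last two points, undelete and the scheduled purge, can be sketched as a pair of helpers. The names restore_review and purge_soft_deleted and the retention_days parameter are hypothetical. The field names match the soft_delete_review() example above, and the collection is assumed to behave like a PyMongo Collection.

```python
from datetime import datetime, timedelta, timezone


def restore_review(collection, review_id):
    """Undo a soft delete by removing the marker fields with $unset."""
    result = collection.update_one(
        {"_id": review_id, "deleted": True},  # only touch soft-deleted docs
        {"$unset": {"deleted": "", "deleted_at": "", "deleted_by": ""}},
    )
    return result.modified_count == 1


def purge_soft_deleted(collection, retention_days, now=None):
    """Hard-delete documents soft-deleted longer ago than the retention period.

    `now` can be injected for testing and defaults to the current UTC time.
    Returns the number of documents removed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    result = collection.delete_many(
        {"deleted": True, "deleted_at": {"$lt": cutoff}}
    )
    return result.deleted_count
```

A background job could call purge_soft_deleted(db.reviews, retention_days=90) once a day. The 90-day figure is only an example; the right retention period depends on your own requirements.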

Dropping a Collection vs Deleting All Documents

When you need to remove everything, choosing between delete_many({}) and drop() matters — they have different performance characteristics and different effects on indexes.

# drop() vs delete_many({}) — knowing when to use each

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

comparison = {
    "delete_many({})": {
        "removes_documents": True,
        "removes_indexes":   False,     # indexes are preserved
        "removes_collection": False,
        "speed":             "Slower — deletes document by document",
        "use_when":          "You want to clear data but keep the collection structure and indexes"
    },
    "collection.drop()": {
        "removes_documents":  True,
        "removes_indexes":    True,     # indexes are removed
        "removes_collection": True,
        "speed":              "Instant — removes all data files in one step",
        "use_when":           "You want to completely destroy and recreate the collection"
    }
}

for method, details in comparison.items():
    print(f"\n{method}:")
    for k, v in details.items():
        print(f"  {k:25} {v}")

# Example: drop and recreate for test data reset
# db.test_collection.drop()
# All documents and indexes removed instantly

Output:
delete_many({}):
removes_documents True
removes_indexes False
removes_collection False
speed Slower — deletes document by document
use_when You want to clear data but keep the collection structure and indexes

collection.drop():
removes_documents True
removes_indexes True
removes_collection True
speed Instant — removes all data files in one step
use_when You want to completely destroy and recreate the collection
  • Use drop() for resetting test environments, clearing temp collections, or rebuilding from scratch
  • Use delete_many({}) when you want to empty a collection but keep its indexes and validation rules intact
  • After a drop(), re-creating the collection and its indexes is necessary before re-importing data
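
Because drop() takes the indexes with it, a reset routine normally rebuilds them straight away. Here is a minimal sketch, assuming a PyMongo-style collection. The helper name reset_collection and the (keys, options) layout of index_specs are choices made for this example, not part of the lesson's dataset.

```python
def reset_collection(collection, index_specs):
    """Drop a collection, then rebuild the listed indexes.

    `index_specs` is a list of (keys, options) pairs, for example
    ([("user_id", 1)], {}) or ([("email", 1)], {"unique": True}).
    Returns the names of the indexes that were created.
    """
    collection.drop()  # removes documents, indexes, and the collection itself
    created = []
    for keys, options in index_specs:
        # Creating an index on a missing collection re-creates the collection
        created.append(collection.create_index(keys, **options))
    return created
```

Validation rules are not restored by this sketch. If the collection had a validator, it would need to be re-created explicitly, for instance with db.create_collection() and its validator option, before the indexes are rebuilt.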

Summary Table

Method | Removes | Returns | Best For
delete_one(filter) | First matching document | deleted_count (0 or 1) | Targeted single-record removal
delete_many(filter) | All matching documents | deleted_count (n) | Bulk cleanup, purging expired data
find_one_and_delete() | First matching document | The deleted document | Queue processing, atomic fetch-and-remove
Soft delete | Nothing (marks as deleted) | modified_count | Auditable systems, undo support
collection.drop() | All docs + indexes + collection | None | Reset environments, rebuild from scratch

Practice Questions

Practice 1. What does delete_one() return when no document matches the filter?

Practice 2. Why is find_one_and_delete() safer than a find_one() followed by delete_one() for queue processing?

Practice 3. What are three advantages of using soft deletes over hard deletes?

Practice 4. What is the key difference between drop() and delete_many({}) when clearing a collection?

Practice 5. Write the filter for a normal application query that excludes soft-deleted documents.

Quiz

Quiz 1. If three documents match the filter in delete_one(), how many are removed?

Quiz 2. What does find_one_and_delete() return?

Quiz 3. What is the main disadvantage of soft deletes over time?

Quiz 4. Which delete method should you use to remove a whole collection including all its indexes in one step?

Quiz 5. Why is deleting by _id safer than deleting by another field like name?

Next up — Comparison Operators: Mastering $eq, $ne, $gt, $gte, $lt, $lte, $in, and $nin to build precise range and value queries.