MongoDB Lesson 22 – Write & Read Concern | Dataplexa

Write & Read Concern

Every MongoDB operation carries two invisible settings that determine how strongly the database guarantees the correctness of what you write and what you read. Write concern controls how many replica set members must acknowledge a write before your application receives confirmation. Read concern controls how fresh and how isolated the data your application reads must be. Together they form a consistency dial — turn it up for maximum durability and correctness, turn it down for maximum throughput and lowest latency. Knowing how to set these correctly for different parts of your application is the difference between a system that loses data under failure and one that does not.

Why These Settings Exist

MongoDB runs in production as a replica set — one primary and one or more secondaries that continuously replicate writes. The moment a write lands on the primary it is not yet guaranteed to survive — if the primary crashes before replicating, that write is lost. Write concern and read concern let you choose exactly how much of that risk you are willing to accept, operation by operation.

# The core problem write concern solves — illustrated

replica_set_scenario = {
    "setup": "Primary + 2 Secondaries",
    "sequence": [
        "1. Application writes document to Primary",
        "2. Primary acknowledges write immediately (w=1 default)",
        "3. Primary begins replicating to Secondary 1 and Secondary 2",
        "4. Primary crashes BEFORE replication completes",
        "   → Write is LOST — secondaries never received it",
        "   → New primary elected from secondaries has a gap",
    ],
    "solution": (
        "With w='majority' — Primary waits until a majority of members "
        "confirm the write before acknowledging to the application. "
        "If Primary crashes after majority confirmation, "
        "at least one secondary already has the data."
    )
}

print(f"Setup: {replica_set_scenario['setup']}")
print("\nDefault behaviour (w=1):")
for step in replica_set_scenario["sequence"]:
    print(f"  {step}")
print(f"\nSolution:\n  {replica_set_scenario['solution']}")

Setup: Primary + 2 Secondaries

Default behaviour (w=1):
1. Application writes document to Primary
2. Primary acknowledges write immediately (w=1 default)
3. Primary begins replicating to Secondary 1 and Secondary 2
4. Primary crashes BEFORE replication completes
→ Write is LOST — secondaries never received it
→ New primary elected from secondaries has a gap

Solution:
With w='majority' — Primary waits until a majority of members confirm the write before acknowledging to the application. If Primary crashes after majority confirmation, at least one secondary already has the data.

A three-member replica set has a majority of 2 — so w="majority" requires the primary plus at least one secondary to confirm
The trade-off is latency — w="majority" adds a replication round-trip delay before your write is acknowledged
For most production applications, w="majority" is the correct default — the latency cost is small compared to the data safety gain
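The majority arithmetic above can be sketched in a few lines. This is a minimal illustration that assumes every member is a data-bearing voting member (arbiters change the calculation); the function names `majority` and `tolerated_failures` are hypothetical helpers, not part of any MongoDB driver.

```python
def majority(voting_members: int) -> int:
    """Smallest member count that is more than half of the set."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Members that can be down while a majority can still confirm writes."""
    return voting_members - majority(voting_members)


# Print the majority size for a few common replica set sizes.
for n in (1, 3, 5, 7):
    print(f"{n} members: majority={majority(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Note that a four-member set still needs 3 confirmations, so it tolerates no more failures than a three-member set, which is one reason odd member counts are common.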

Write Concern — Levels and Options

Write concern is specified per-operation or at the client/collection level. It has three parts: w (how many members must acknowledge), j (whether the journal must be flushed to disk), and wtimeout (how long to wait before returning an error).

# Write concern levels — w, j, and wtimeout

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# ── w values ────────────────────────────────────────────────────────────────
write_concern_levels = {
    "w=0  (unacknowledged)": {
        "meaning":   "Fire and forget — no acknowledgement from any member",
        "durability": "None — write may be lost even on the primary",
        "latency":    "Lowest possible",
        "use_when":   "Metrics, logging — data loss is acceptable",
    },
    "w=1  (acknowledged — default)": {
        "meaning":    "Primary has received and applied the write",
        "durability": "Write survives primary restart but may be lost on failover",
        "latency":    "Low — single round trip",
        "use_when":   "Most general-purpose operations",
    },
    "w='majority'  (recommended for critical data)": {
        "meaning":    "Majority of replica set members confirmed the write",
        "durability": "Write survives primary failure — majority already has it",
        "latency":    "Moderate — waits for replication round trip",
        "use_when":   "Financial data, user records, anything you cannot afford to lose",
    },
    "w=N  (specific member count)": {
        "meaning":    "At least N members must confirm",
        "durability": "Depends on N — higher N = higher durability",
        "latency":    "Higher as N increases",
        "use_when":   "Rarely needed — 'majority' is almost always better",
    },
}

for level, details in write_concern_levels.items():
    print(f"\n{level}")
    for k, v in details.items():
        print(f"  {k:12} {v}")

w=0 (unacknowledged):
meaning Fire and forget — no acknowledgement from any member
durability None — write may be lost even on the primary
latency Lowest possible
use_when Metrics, logging — data loss is acceptable

w=1 (acknowledged — default):
meaning Primary has received and applied the write
durability Write survives primary restart but may be lost on failover
latency Low — single round trip
use_when Most general-purpose operations

w='majority' (recommended for critical data):
meaning Majority of replica set members confirmed the write
durability Write survives primary failure — majority already has it
latency Moderate — waits for replication round trip
use_when Financial data, user records, anything you cannot afford to lose

w=0 is called "unacknowledged" — the driver sends the write and immediately continues without waiting for any server response
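w=1 is labelled the default here because it was the server default before MongoDB 5.0; since 5.0 the implicit default for most replica set configurations is w="majority". w=1 remains adequate where the small risk of a primary failover during the replication window is acceptable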
w="majority" is the MongoDB recommended setting for any data you care about in a replica set environment
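The three parts of a write concern (w, j, wtimeout) can also be set once for a whole client through the connection string options `w`, `journal`, and `wtimeoutMS`. The sketch below only builds such a URI as a string and does not connect to anything; `build_uri` is a hypothetical helper, and the host is the same local address used elsewhere in this lesson.

```python
from urllib.parse import urlencode


def build_uri(host: str, w="majority", journal=True, wtimeout_ms=5000) -> str:
    """Build a MongoDB connection string carrying write concern options."""
    options = {
        "w": w,                            # how many members must acknowledge
        "journal": str(journal).lower(),   # "true" or "false"
        "wtimeoutMS": wtimeout_ms,         # how long to wait, in milliseconds
    }
    return f"mongodb://{host}/?{urlencode(options)}"


uri = build_uri("localhost:27017")
print(uri)
```

A URI built this way could be passed to MongoClient so that every collection obtained from that client inherits the same write concern unless it is overridden.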

The j (Journal) Option

The j option controls whether MongoDB must write the operation to the on-disk journal before acknowledging. With j=False, a write acknowledged by the primary could still be lost if the server crashes before the next journal flush. With j=True, the write is durable on disk before your application receives confirmation.
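The difference between the two settings can be pictured with a toy model. This is not how the storage engine is implemented; `ToyServer` is a hypothetical class that keeps one in-memory list and one "journal" list, purely to show which writes survive a crash.

```python
class ToyServer:
    """Toy model of a server with in-memory data and an on-disk journal."""

    def __init__(self):
        self.memory = []    # writes applied in memory
        self.journal = []   # writes that have reached disk

    def flush(self):
        """Copy everything applied so far into the journal."""
        self.journal = list(self.memory)

    def write(self, doc, j=False):
        """Apply a write; with j=True, flush before acknowledging."""
        self.memory.append(doc)
        if j:
            self.flush()
        return "acknowledged"

    def crash_and_recover(self):
        """After a crash only journaled writes come back."""
        self.memory = list(self.journal)


server = ToyServer()
server.write("order-1", j=True)   # durable before the acknowledgement
server.write("order-2")           # acknowledged, but not yet flushed
server.crash_and_recover()
print(server.memory)
```

In this model the second write was acknowledged yet disappears after the crash, which is the window that j=True closes.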

# Write concern in PyMongo — applying per-operation and at collection level

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# ── Per-operation write concern ───────────────────────────────────────────
# Majority + journaled — maximum durability for a critical order insert
critical_wc = WriteConcern(w="majority", j=True, wtimeout=5000)

new_order = {
    "_id":     "o_critical_001",
    "user_id": "u001",
    "status":  "processing",
    "total":   199.99,
    "date":    datetime.now(timezone.utc).strftime("%Y-%m-%d"),
    "items":   [{"product_id": "p002", "qty": 1, "price": 199.99}]
}

result = db.get_collection(
    "orders",
    write_concern=critical_wc
).insert_one(new_order)

print("Critical order inserted:")
print(f"  acknowledged: {result.acknowledged}")
print(f"  inserted_id:  {result.inserted_id}")

# ── Unacknowledged write for high-throughput logging ─────────────────────
log_wc = WriteConcern(w=0)
log_collection = db.get_collection("event_log", write_concern=log_wc)

log_collection.insert_one({
    "event":     "page_view",
    "user_id":   "u003",
    "timestamp": datetime.now(timezone.utc),
    "page":      "/products"
})
print("\nLog event inserted (unacknowledged — no wait)")

# ── Collection-level default write concern ───────────────────────────────
# All writes to this collection use majority + journal
safe_users = db.get_collection(
    "users",
    write_concern=WriteConcern(w="majority", j=True)
)
print(f"\nSafe users collection write concern: w=majority, j=True")

# Clean up
db.orders.delete_one({"_id": "o_critical_001"})

Critical order inserted:
acknowledged: True
inserted_id: o_critical_001

Log event inserted (unacknowledged — no wait)

Safe users collection write concern: w=majority, j=True

wtimeout is given in milliseconds. If the required members do not acknowledge within this time, MongoDB returns a write concern error (PyMongo raises WTimeoutError, a subclass of WriteConcernError), but the write is not rolled back and may already have been applied
Set write concern at the collection level for consistent behaviour across all operations on that collection, rather than specifying it per-operation
j=True with w="majority" provides the strongest single-operation durability guarantee available in MongoDB
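The wtimeout behaviour described above can be sketched as a small pure-Python model. It is a simplification under stated assumptions, not the server's algorithm: `wait_for_acks` is a hypothetical function, and each member is represented only by the time (in milliseconds) at which it has applied the write.

```python
def wait_for_acks(ack_delays_ms, w, wtimeout_ms):
    """Toy model of what the client hears back for one write.

    ack_delays_ms: time at which each member (primary included) applies the write
    w:             number of members that must acknowledge
    wtimeout_ms:   how long to wait; 0 means wait without a time limit
    """
    if w > len(ack_delays_ms):
        raise ValueError("w exceeds the number of members")
    # Moment at which the w-th fastest member has confirmed.
    satisfied_at = sorted(ack_delays_ms)[w - 1]
    if wtimeout_ms and satisfied_at > wtimeout_ms:
        # The write is not undone: members that applied it keep it.
        applied = sum(1 for d in ack_delays_ms if d <= wtimeout_ms)
        return {"ok": False, "error": "write concern timeout", "applied_on": applied}
    return {"ok": True, "acknowledged_after_ms": satisfied_at}


# Healthy set: one secondary confirms quickly, so w=2 is met in time.
print(wait_for_acks([1, 40, 9000], w=2, wtimeout_ms=5000))
# Both secondaries lag: the client gets an error, yet the primary has the write.
print(wait_for_acks([1, 9000, 9500], w=2, wtimeout_ms=5000))
```

The second call shows why a timeout should be treated as "unknown outcome" rather than "failed": the application may need to read the document back or retry idempotently.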

Read Concern — Reading Consistent Data

Read concern controls how up-to-date and how isolated the data returned by your reads must be. In a replica set, a secondary may be slightly behind the primary, and even the primary may hold writes that have not yet been replicated. Read concern lets you decide whether you are willing to read data that could later be rolled back, or whether you require data that has been confirmed by a majority of the replica set.

# Read concern levels — local, available, majority, linearizable, snapshot

from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

read_concern_levels = {
    "local (default)": {
        "returns":     "Most recent data on the queried node — may not be replicated yet",
        "isolation":   "None — can return data later rolled back on failover",
        "latency":     "Lowest",
        "use_when":    "Most reads — eventual consistency is acceptable",
    },
    "available": {
        "returns":     "Same as local but may return orphaned chunks in sharded clusters",
        "isolation":   "None — fastest possible",
        "latency":     "Lowest",
        "use_when":    "Maximum read throughput in sharded clusters",
    },
    "majority": {
        "returns":     "Only data acknowledged by a majority of replica set members",
        "isolation":   "Majority-committed: data returned will not be rolled back on failover",
        "latency":     "Slightly higher — waits for majority confirmation",
        "use_when":    "Any read that must not see rolled-back data",
    },
    "linearizable": {
        "returns":     "Most recent majority-committed data — reflects all prior writes",
        "isolation":   "Linearizable — strongest guarantee",
        "latency":     "Highest — waits for all prior writes to propagate",
        "use_when":    "Strict single-document consistency (e.g. distributed locks)",
    },
    "snapshot": {
        "returns":     "A consistent snapshot at transaction start time",
        "isolation":   "Full snapshot isolation — no dirty or phantom reads",
        "latency":     "Varies",
        "use_when":    "Multi-document transactions",
    },
}

for level, details in read_concern_levels.items():
    print(f"\nread concern: {level}")
    for k, v in details.items():
        print(f"  {k:12} {v}")

read concern: local (default)
returns Most recent data on the queried node — may not be replicated yet
isolation None — can return data later rolled back on failover
latency Lowest
use_when Most reads — eventual consistency is acceptable

read concern: majority
returns Only data acknowledged by a majority of replica set members
isolation Majority-committed: data returned will not be rolled back on failover
latency Slightly higher
use_when Any read that must not see rolled-back data

read concern: linearizable
returns Most recent majority-committed data
isolation Linearizable — strongest guarantee
latency Highest
use_when Strict single-document consistency

local is the default — it is fast and correct for the vast majority of application reads
majority read concern pairs naturally with w="majority" write concern; used together inside a causally consistent session they provide causal consistency
linearizable only works on primary reads and has significant latency overhead — use it only for operations that demand the absolute strongest correctness guarantees
snapshot is used mainly inside multi-document transactions; since MongoDB 5.0 it is also accepted for some read operations outside transactions
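The difference between local and majority reads can be illustrated with a toy model of replication progress. The sketch assumes each member is described only by the position of the last write it has applied; `majority_commit_point` and `visible_up_to` are hypothetical helpers, not driver or server APIs.

```python
def majority_commit_point(applied_positions):
    """Highest write position that a majority of members have applied."""
    needed = len(applied_positions) // 2 + 1
    # The needed-th most advanced member marks the majority-committed point.
    return sorted(applied_positions, reverse=True)[needed - 1]


def visible_up_to(node_position, commit_point, level):
    """Last write position a read on this node may return."""
    if level == "local":
        return node_position                      # whatever the node has
    if level == "majority":
        return min(node_position, commit_point)   # only majority-committed data
    raise ValueError(f"level not modelled here: {level}")


positions = {"primary": 5, "secondary_1": 3, "secondary_2": 2}
commit_point = majority_commit_point(list(positions.values()))

print("majority commit point:", commit_point)
print("local read on primary sees up to:",
      visible_up_to(positions["primary"], commit_point, "local"))
print("majority read on primary sees up to:",
      visible_up_to(positions["primary"], commit_point, "majority"))
```

In this model writes 4 and 5 exist only on the primary, so a local read can return them even though a failover could still roll them back, while a majority read stops at write 3.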

Applying Read Concern in PyMongo

# Read concern in PyMongo: set on the collection object via get_collection()

from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Read with majority concern — data guaranteed not to be rolled back
majority_rc = ReadConcern(level="majority")

safe_products = db.get_collection(
    "products",
    read_concern=majority_rc
)

result = safe_products.find_one(
    {"_id": "p001"},
    {"name": 1, "price": 1, "_id": 0}
)
print("Product read with majority concern:", result)

# Local read concern (explicit — same as default)
local_products = db.get_collection(
    "products",
    read_concern=ReadConcern(level="local")
)
fast_result = local_products.find_one({"_id": "p002"}, {"name": 1, "price": 1, "_id": 0})
print("Product read with local concern:   ", fast_result)

# Practical rule summary
print("\nPractical rules:")
print("  Financial / user data writes:  w='majority', j=True")
print("  Logging / metrics writes:      w=0 or w=1")
print("  Critical reads (no rollback):  readConcern='majority'")
print("  General reads:                 readConcern='local' (default)")

Product read with majority concern: {'name': 'Wireless Mouse', 'price': 29.99}
Product read with local concern: {'name': 'Mechanical Keyboard', 'price': 89.99}

Practical rules:
Financial / user data writes: w='majority', j=True
Logging / metrics writes: w=0 or w=1
Critical reads (no rollback): readConcern='majority'
General reads: readConcern='local' (default)

Read concern can be set at the client, database, collection, or session level — more specific settings override broader ones
For most applications a single MongoClient configured with w="majority" and default readConcern="local" is sufficient and safe
Never mix a weak write concern with a strong read concern expecting consistency — they must be paired deliberately
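The "more specific settings override broader ones" rule can be sketched as a simple lookup. This is an illustration of the precedence idea only, assuming three levels (client, database, collection) and a fallback of "local"; `effective_read_concern` is a hypothetical function, not something PyMongo exposes.

```python
def effective_read_concern(client=None, database=None, collection=None,
                           fallback="local"):
    """Return the most specific read concern level that was configured."""
    # Check from most specific to least specific.
    for level in (collection, database, client):
        if level is not None:
            return level
    return fallback


# Nothing configured: the default applies.
print(effective_read_concern())
# Client-wide setting is inherited by every collection.
print(effective_read_concern(client="majority"))
# A collection-level setting wins over the client-wide one.
print(effective_read_concern(client="majority", collection="local"))
```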

Causal Consistency — Pairing Write and Read Concern

Causal consistency means that if your application writes a document and immediately reads it back, the read is guaranteed to see the write — even if the read happens on a secondary. This requires a paired combination of w="majority" and readConcern="majority", with a causally consistent session.

# Causal consistency — write then read guaranteed to see the write

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
from pymongo.read_concern import ReadConcern
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017/")

# Open a causally consistent session
with client.start_session(causal_consistency=True) as session:
    db = client["dataplexa"]

    # Write with majority concern
    wc = WriteConcern(w="majority")
    rc = ReadConcern(level="majority")

    # Insert a new review inside the causally consistent session
    db.get_collection("reviews", write_concern=wc).insert_one(
        {
            "_id":        "r_causal_test",
            "product_id": "p001",
            "user_id":    "u002",
            "rating":     4,
            "comment":    "Great build quality.",
            "date":       datetime.now(timezone.utc).strftime("%Y-%m-%d"),
        },
        session=session
    )

    # Read it back — guaranteed to see the just-inserted document
    # even if the read goes to a secondary
    review = db.get_collection("reviews", read_concern=rc).find_one(
        {"_id": "r_causal_test"},
        session=session
    )
    print("Causally consistent read:", review["comment"] if review else "NOT FOUND")

    # Clean up
    db.reviews.delete_one({"_id": "r_causal_test"})

print("Session closed — causal consistency context exited")

Causally consistent read: Great build quality.
Session closed — causal consistency context exited

Causal consistency sessions use a cluster time token to ensure the secondary you read from has caught up with your write before serving the result
Without causal consistency, a read on a secondary immediately after a primary write could return stale data — the secondary may not have replicated the write yet
For single-server local development, causal consistency makes no observable difference — it only matters in a replica set or sharded cluster
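The cluster time mechanism described above can be pictured with a toy model. It is a sketch under simplifying assumptions: time is a plain integer counter, replication copies everything at once, and a lagging node raises an error where a real server would wait until it has caught up. `Node`, `Session`, `NotCaughtUp`, and the three functions are hypothetical names.

```python
from dataclasses import dataclass, field


class NotCaughtUp(Exception):
    """Raised when a node is behind the session's last operation time."""


@dataclass
class Node:
    applied_time: int = 0
    data: dict = field(default_factory=dict)


@dataclass
class Session:
    operation_time: int = 0   # time of the last write seen by this session


def write(primary, session, key, value):
    """Apply a write on the primary and remember its time in the session."""
    primary.applied_time += 1
    primary.data[key] = value
    session.operation_time = primary.applied_time


def replicate(primary, secondary):
    """Bring the secondary fully up to date with the primary."""
    secondary.data = dict(primary.data)
    secondary.applied_time = primary.applied_time


def read(node, session, key, causal=True):
    """Read a key; a causal read refuses nodes that are behind the session."""
    if causal and node.applied_time < session.operation_time:
        raise NotCaughtUp("node is behind the session's operation time")
    return node.data.get(key)


primary, secondary, session = Node(), Node(), Session()
write(primary, session, "r_causal_test", "Great build quality.")

# Without causal consistency a lagging secondary returns stale data.
print(read(secondary, session, "r_causal_test", causal=False))

replicate(primary, secondary)
# Once the secondary has caught up, the causal read sees the write.
print(read(secondary, session, "r_causal_test"))
```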

Summary Table

Setting	Category	What It Guarantees	Recommended For
`w=0`	Write Concern	Nothing — fire and forget	Logs, metrics only
`w=1`	Write Concern	Primary received write	General purpose default
`w="majority"`	Write Concern	Survives primary failover	All critical data
`j=True`	Write Concern	Flushed to disk before ack	Pair with w="majority"
`readConcern local`	Read Concern	Most recent node data	General reads (default)
`readConcern majority`	Read Concern	Data won't be rolled back	Critical consistent reads
Causal consistency	Session	Read sees own writes	Write-then-read patterns

Practice Questions

Practice 1. What does w="majority" guarantee that w=1 does not?

Practice 2. What does j=True add to a write concern?

Practice 3. What is the risk of using readConcern="local" when reading from a secondary?

Practice 4. When should you use readConcern="snapshot" and where is it valid?

Practice 5. What two settings must be paired to achieve causal consistency — guaranteeing a read sees its own preceding write?

Quiz

Quiz 1. In a three-member replica set, how many members must confirm a write when w="majority"?

2 — the primary plus at least one secondary (majority of 3 is 2)
3 — all members must confirm
1 — only the primary
It depends on the network topology

Quiz 2. What happens if the required members do not acknowledge a write within the wtimeout period?

MongoDB returns a WriteConcernError — but the write may have been applied anyway, just not confirmed within the timeout
The write is automatically rolled back
MongoDB retries the write indefinitely
The connection is closed

Quiz 3. Which read concern level provides the strongest consistency guarantee and is intended for distributed lock scenarios?

linearizable — it guarantees the most recent majority-committed data and reflects all prior writes
majority
snapshot
local

Quiz 4. What is the correct write concern setting for a high-throughput event logging collection where occasional data loss is acceptable?

w=0 (unacknowledged) — fire and forget with no waiting for server confirmation
w="majority" — always use majority for correctness
j=True — journal all log events to disk
w=1 with wtimeout=0

Quiz 5. Why does causal consistency make no observable difference on a single standalone MongoDB server?

There is only one server — reads and writes go to the same node so there is no replication lag or secondary staleness to protect against
Causal consistency is disabled on standalone servers automatically
Causal consistency requires Atlas and does not work locally
Single servers use w='majority' by default

Next up — Data Modelling: Designing document schemas, choosing between embedding and referencing, and building structures that scale.
