Mango DBLesson 18 – Comparison Operators | Dataplexa

Comparison Operators

Comparison operators are the building blocks of every non-trivial MongoDB query. They let you filter documents by asking questions like "is the price greater than $50?", "is the status anything other than delivered?", or "is the country one of these five values?". Every comparison operator follows the same pattern — it sits inside the filter document as the value of a field, wrapped in its own dictionary. Once you understand the pattern, all eight operators feel natural and consistent.

Lesson 13 introduced these operators briefly. This lesson goes deeper — covering edge cases, type-aware comparisons, performance implications, and a full set of worked examples against the Dataplexa Store dataset.

The Comparison Operator Pattern

Every comparison operator follows one consistent structure — the field name maps to a dictionary containing the operator and its operand. This pattern applies to all eight operators and composes cleanly with logical operators.

# The comparison operator pattern — consistent across all operators

# Basic structure:
# { "field": { "$operator": value } }

# Simple exact match (no operator needed — implicit $eq)
implicit_eq = {"membership": "premium"}

# Explicit $eq — identical result to above
explicit_eq = {"membership": {"$eq": "premium"}}

# Range query — combining two operators on one field
range_query = {"price": {"$gte": 20, "$lte": 100}}

# Operator on a nested field using dot notation
nested_query = {"items.price": {"$gt": 10}}

# Operator combined with a filter — all conditions must match
combined = {"category": "Electronics", "price": {"$lt": 50}}

patterns = [
    ("Implicit eq",  implicit_eq),
    ("Explicit eq",  explicit_eq),
    ("Range",        range_query),
    ("Nested field", nested_query),
    ("Combined",     combined),
]

for label, pattern in patterns:
    print(f"  {label:15} → {pattern}")
Implicit eq → {'membership': 'premium'}
Explicit eq → {'membership': {'$eq': 'premium'}}
Range → {'price': {'$gte': 20, '$lte': 100}}
Nested field → {'items.price': {'$gt': 10}}
Combined → {'category': 'Electronics', 'price': {'$lt': 50}}
  • The implicit equality shorthand {"field": value} is equivalent to {"field": {"$eq": value}} — use whichever is more readable
  • Multiple operators on the same field are combined with an implicit AND — both conditions must be true
  • Comparison operators work on any BSON type — strings, numbers, dates, ObjectIds — following BSON comparison order

$eq and $ne — Equal and Not Equal

$eq matches documents where a field equals a specific value. $ne matches documents where a field does not equal the value — including documents where the field does not exist at all.

# $eq and $ne — equality and inequality

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# $eq — find all premium users (explicit form)
print("Premium users ($eq):")
results = db.users.find(
    {"membership": {"$eq": "premium"}},
    {"name": 1, "membership": 1, "_id": 0}
)
for u in results:
    print(f"  {u['name']} — {u['membership']}")

# $ne — find all orders that are NOT delivered
print("\nNon-delivered orders ($ne):")
results = db.orders.find(
    {"status": {"$ne": "delivered"}},
    {"_id": 1, "status": 1, "total": 1}
)
for o in results:
    print(f"  {o['_id']}  {o['status']:12}  ${o['total']:.2f}")

# $ne also matches documents where the field does not exist
# Adding a document with no membership field to demonstrate
db.users.insert_one({"_id": "u_temp", "name": "Temp User"})
no_membership = db.users.find(
    {"membership": {"$ne": "premium"}},
    {"name": 1, "membership": 1, "_id": 0}
)
print("\nUsers who are not premium (including those with no membership field):")
for u in no_membership:
    print(f"  {u['name']} — {u.get('membership', '[field missing]')}")

# Clean up the temp document
db.users.delete_one({"_id": "u_temp"})
Premium users ($eq):
Alice Johnson — premium
Clara Diaz — premium
Eva Müller — premium

Non-delivered orders ($ne):
o002 shipped $89.99
o004 processing $349.99

Users who are not premium (including those with no membership field):
Bob Smith — basic
David Lee — basic
Temp User — [field missing]
  • $ne matches documents where the field is absent — use {"field": {"$ne": value}, "field": {"$exists": true}} if you want to exclude missing fields too
  • For exact equality the implicit shorthand is always preferred — {"status": "delivered"} is cleaner than {"status": {"$eq": "delivered"}}
  • On indexed fields, both $eq and $ne use the index — but $ne often results in a large portion of the index being scanned since it matches most documents

$gt and $gte — Greater Than

$gt matches documents where the field value is strictly greater than the operand. $gte includes equality — greater than or equal. Both work on numbers, dates, strings, and any other comparable BSON type.

# $gt and $gte — greater than comparisons

from pymongo import MongoClient
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Products strictly more expensive than $50
print("Products over $50 ($gt):")
results = db.products.find(
    {"price": {"$gt": 50}},
    {"name": 1, "price": 1, "_id": 0}
)
for p in results:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Products $50 or more ($gte includes the boundary)
print("\nProducts $50 or more ($gte):")
results = db.products.find(
    {"price": {"$gte": 50}},
    {"name": 1, "price": 1, "_id": 0}
)
for p in results:
    print(f"  ${p['price']:>7.2f}  {p['name']}")

# Users aged 30 or older
print("\nUsers aged 30+ ($gte):")
results = db.users.find(
    {"age": {"$gte": 30}},
    {"name": 1, "age": 1, "_id": 0}
)
for u in results:
    print(f"  {u['name']} — age {u['age']}")

# Orders placed after a specific date ($gt on a date field)
cutoff = datetime(2024, 2, 1, tzinfo=timezone.utc)
print("\nOrders after 2024-02-01 ($gt on date):")
results = db.orders.find(
    {"date": {"$gt": cutoff.strftime("%Y-%m-%d")}},
    {"_id": 1, "date": 1, "total": 1}
)
for o in results:
    print(f"  {o['_id']}  {o['date']}  ${o['total']:.2f}")
Products over $50 ($gt):
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
$349.99 Standing Desk

Products $50 or more ($gte):
$ 89.99 Mechanical Keyboard
$299.99 Monitor 27-inch
$349.99 Standing Desk

Users aged 30+ ($gte):
Alice Johnson — age 30
David Lee — age 32

Orders after 2024-02-01 ($gt on date):
o003 2024-02-20 $99.98
o004 2024-03-01 $349.99
o005 2024-03-10 $329.98
o006 2024-03-15 $89.99
o007 2024-04-01 $11.97
  • The difference between $gt and $gte only matters at the boundary value — choose based on whether the boundary should be included
  • String comparison with $gt/$gte follows UTF-8 lexicographic order — "b" > "a" is true, but "B" < "a" because uppercase letters have lower byte values
  • For date comparisons always use BSON Date (datetime) — string date comparisons only work correctly if dates are stored in ISO 8601 format (YYYY-MM-DD)

$lt and $lte — Less Than

$lt matches documents where the field is strictly less than the operand. $lte includes the boundary. Combined with $gt/$gte on the same field, they create range queries — one of the most common query patterns in any application.

# $lt and $lte — less than comparisons and range queries

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Budget products — strictly under $10
print("Budget products under $10 ($lt):")
results = db.products.find(
    {"price": {"$lt": 10}},
    {"name": 1, "price": 1, "_id": 0}
)
for p in results:
    print(f"  ${p['price']:>6.2f}  {p['name']}")

# Products $50 or under ($lte includes the boundary)
print("\nProducts $50 and under ($lte):")
results = db.products.find(
    {"price": {"$lte": 50}},
    {"name": 1, "price": 1, "_id": 0}
)
for p in results:
    print(f"  ${p['price']:>6.2f}  {p['name']}")

# Range query — mid-range products between $20 and $100
print("\nMid-range products $20–$100 ($gte + $lte):")
results = db.products.find(
    {"price": {"$gte": 20, "$lte": 100}},
    {"name": 1, "price": 1, "_id": 0}
)
for p in results:
    print(f"  ${p['price']:>6.2f}  {p['name']}")

# Users under 30 years old
print("\nUsers under 30 ($lt):")
results = db.users.find(
    {"age": {"$lt": 30}},
    {"name": 1, "age": 1, "_id": 0}
)
for u in results:
    print(f"  {u['name']} — age {u['age']}")
Budget products under $10 ($lt):
$ 4.99 Notebook A5
$ 3.49 Ballpoint Pens 10-pack

Products $50 and under ($lte):
$29.99 Wireless Mouse
$ 4.99 Notebook A5
$49.99 USB-C Hub
$ 3.49 Ballpoint Pens 10-pack

Mid-range products $20–$100 ($gte + $lte):
$29.99 Wireless Mouse
$49.99 USB-C Hub
$89.99 Mechanical Keyboard

Users under 30 ($lt):
Bob Smith — age 25
Clara Diaz — age 28
  • The range pattern {"price": {"$gte": min, "$lte": max}} is extremely common — both operators apply to the same field simultaneously
  • A single index on price covers this range query efficiently — MongoDB scans only the relevant portion of the index
  • Open ranges are valid — {"age": {"$lt": 18}} has no lower bound, {"age": {"$gte": 65}} has no upper bound

$in and $nin — Match a List of Values

$in matches documents where the field equals any value in a provided list — equivalent to multiple $eq conditions joined with OR, but more efficient. $nin is the inverse — the field must not equal any value in the list.

# $in and $nin — matching against a list of values

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# $in — users from UK or Spain
print("Users from UK or Spain ($in):")
results = db.users.find(
    {"country": {"$in": ["UK", "Spain"]}},
    {"name": 1, "country": 1, "_id": 0}
)
for u in results:
    print(f"  {u['name']} ({u['country']})")

# $in — multiple specific orders by _id (bulk fetch)
print("\nFetch specific orders by _id list ($in):")
results = db.orders.find(
    {"_id": {"$in": ["o001", "o003", "o005"]}},
    {"_id": 1, "status": 1, "total": 1}
)
for o in results:
    print(f"  {o['_id']}  {o['status']:12}  ${o['total']:.2f}")

# $nin — products NOT in Stationery or Furniture
print("\nProducts not in Stationery or Furniture ($nin):")
results = db.products.find(
    {"category": {"$nin": ["Stationery", "Furniture"]}},
    {"name": 1, "category": 1, "_id": 0}
)
for p in results:
    print(f"  {p['name']} ({p['category']})")

# $in on an array field — find users who have any of these tags
print("\nUsers with 'vip' or 'early_adopter' tag ($in on array):")
results = db.users.find(
    {"tags": {"$in": ["vip", "early_adopter"]}},
    {"name": 1, "tags": 1, "_id": 0}
)
for u in results:
    print(f"  {u['name']} — {u['tags']}")
Users from UK or Spain ($in):
Alice Johnson (UK)
Bob Smith (UK)
Clara Diaz (Spain)

Fetch specific orders by _id list ($in):
o001 delivered $44.96
o003 delivered $99.98
o005 delivered $329.98

Products not in Stationery or Furniture ($nin):
Wireless Mouse (Electronics)
Mechanical Keyboard (Electronics)
USB-C Hub (Electronics)
Monitor 27-inch (Electronics)

Users with 'vip' or 'early_adopter' tag ($in on array):
Alice Johnson — ['early_adopter', 'vip']
Clara Diaz — ['early_adopter']
Eva Müller — ['early_adopter', 'newsletter']
  • $in against an array field matches documents where the array contains any of the listed values — not all of them (use $all for that)
  • Fetching multiple documents by a list of known IDs with $in on _id is the standard bulk-fetch pattern — one query, no loop
  • $in with a large list (hundreds of values) can be slow — consider a different data model or a lookup aggregation stage instead
  • $nin is generally less index-efficient than $in because it matches most of the collection — apply it only to well-indexed fields

BSON Type Ordering in Comparisons

When MongoDB compares values of different BSON types — for example, a string field against a number — it follows a defined BSON type ordering. Understanding this prevents surprising results in mixed-type collections.

# BSON comparison order — types compared when mixed types exist in a field

bson_type_order = [
    (1,  "MinKey (internal)"),
    (2,  "Null"),
    (3,  "Numbers — Int32, Int64, Double, Decimal128"),
    (4,  "Symbol, String"),
    (5,  "Object (embedded document)"),
    (6,  "Array"),
    (7,  "BinData (binary)"),
    (8,  "ObjectId"),
    (9,  "Boolean"),
    (10, "Date"),
    (11, "Timestamp"),
    (12, "Regular expression"),
    (13, "MaxKey (internal)"),
]

print("BSON type comparison order (lowest to highest):")
for rank, type_name in bson_type_order:
    print(f"  {rank:2}.  {type_name}")

print("\nPractical implication:")
print("  A field storing '30' (string) is GREATER than 29 (int)")
print("  because strings sort higher than numbers in BSON order.")
print("  Always store values in a consistent type to avoid this.")
BSON type comparison order (lowest to highest):
1. MinKey (internal)
2. Null
3. Numbers — Int32, Int64, Double, Decimal128
4. Symbol, String
5. Object (embedded document)
6. Array
7. BinData (binary)
8. ObjectId
9. Boolean
10. Date
11. Timestamp
12. Regular expression
13. MaxKey (internal)

Practical implication:
A field storing '30' (string) is GREATER than 29 (int)
because strings sort higher than numbers in BSON order.
Always store values in a consistent type to avoid this.
  • All numeric types (Int32, Int64, Double, Decimal128) compare by value — 30 as Int32 equals 30.0 as Double
  • Mixed-type collections produce surprising comparison results — use the $type operator to audit and clean up inconsistent field types
  • Null sorts lower than all other types including numbers — {"age": {"$gt": 0}} will not match documents where age is null

Summary Table

Operator Meaning Example Note
$eq Equals {"status": {"$eq": "shipped"}} Shorthand: {"status": "shipped"}
$ne Not equal {"status": {"$ne": "delivered"}} Also matches missing field
$gt Greater than {"price": {"$gt": 50}} Boundary not included
$gte Greater than or equal {"age": {"$gte": 18}} Boundary included
$lt Less than {"stock": {"$lt": 10}} Boundary not included
$lte Less than or equal {"rating": {"$lte": 3}} Boundary included
$in Equals any in list {"country": {"$in": ["UK","US"]}} Also works on array fields
$nin Not equal to any in list {"category": {"$nin": ["X","Y"]}} Less index-efficient than $in

Practice Questions

Practice 1. Write a filter to find all products with a rating greater than or equal to 4.5.



Practice 2. What is the difference between $gt and $gte — give an example where only one would match a document with price = 50?



Practice 3. Write the filter to find all Dataplexa users who are from Germany, USA, or Spain using a single operator.



Practice 4. Why might $ne produce a large result set on an indexed field compared to $eq?



Practice 5. What is the BSON comparison rule that means {"age": {"$gt": 0}} will not match a document where age is null?



Quiz

Quiz 1. Which operator would you use to find all orders with a total between $50 and $200 inclusive?






Quiz 2. What does $ne match in addition to documents where the field has a different value?






Quiz 3. What is the most efficient way to fetch five specific documents by known _id values?






Quiz 4. In BSON comparison order, which type sorts higher — a string or a number?






Quiz 5. When $in is applied to an array field, what does it match?






Next up — Logical Operators: Combining filters with $and, $or, $not, and $nor to express complex multi-condition queries.