Data Science Lesson 38 – NoSQL | Dataplexa
Data Storage · Lesson 38

NoSQL

Master document databases, key-value stores, and graph systems that power modern e-commerce platforms like Flipkart and Amazon.

Coming from SQL? You'll find NoSQL surprisingly liberating. No rigid schemas. No complex joins. Just flexible data storage that scales horizontally across thousands of servers.

The name "NoSQL" is misleading: it doesn't mean "no SQL at all." It is usually read as "Not Only SQL," and several NoSQL databases support SQL-like query languages. The real difference? Many of them relax strict ACID guarantees or relational features in exchange for easier horizontal scaling.

Why NoSQL Exists

Picture Swiggy during dinner rush. 50,000 orders per minute. Customer profiles, restaurant menus, delivery locations, real-time tracking. A single SQL server can only be scaled up so far before concurrent load at this level becomes a bottleneck. SQL works fine for roughly 90% of workloads, but the remaining 10% trips everyone up.

1. Scale Horizontally
2. Handle Unstructured Data
3. Real-time Performance
4. Developer Flexibility
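
Scaling horizontally usually means splitting keys across servers. The following is a minimal sketch of that idea, assuming three hypothetical shard names and simple hash-modulo routing. Production systems such as Cassandra or DynamoDB use more elaborate partitioning schemes, so treat this as an illustration rather than how any particular database works.

```python
import hashlib

# Hypothetical shard names, for illustration only
SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key (stable across processes, unlike hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

orders = ["order:1001", "order:1002", "order:1003", "order:1004"]
placement = {order_id: shard_for(order_id) for order_id in orders}

for order_id, shard in placement.items():
    print(order_id, "->", shard)
```

The same key always lands on the same shard, so reads know where to look. One known weakness of modulo routing is that adding a shard remaps most keys, which is why real systems tend to prefer consistent hashing.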

Four NoSQL Types

Document

MongoDB, CouchDB. JSON-like documents. Perfect for catalogs, user profiles.

Key-Value

Redis, DynamoDB. Simple pairs. Caching, session storage, real-time data.

Column

Cassandra, HBase. Wide columns. Analytics, time-series data.

Graph

Neo4j, Amazon Neptune. Relationships. Social networks, recommendations.

MongoDB Essentials

MongoDB dominates the document database space. Think of it as SQL tables but each row can have completely different columns. No predefined schema. Store nested objects, arrays, any JSON structure.

The scenario: You're the lead analyst at BigBasket. The product catalog has thousands of variations: electronics have specifications, groceries have nutritional info, books have authors. One flexible collection handles everything.

# Install the driver first: pip install pymongo
from pymongo import MongoClient

# Connect to MongoDB (local instance, default port)
client = MongoClient('mongodb://localhost:27017/')

# Access the database (it is created lazily on the first insert)
db = client['bigbasket_catalog']

What just happened?

We connected to MongoDB running locally on port 27017. The database bigbasket_catalog gets created automatically when we first insert data. Try this: On a Mac, start MongoDB with brew services start mongodb-community, then confirm it is running with brew services list.

# Create collection (like a SQL table)
products = db['products']

# Insert a complex product document
smartphone = {
    "product_id": "PROD001",
    "name": "iPhone 15 Pro",
    "category": "Electronics",
    "price": 129900,
    "specifications": {
        "storage": "256GB",
        "ram": "8GB",
        "camera": "48MP Triple Camera"
    }
}

What just happened?

We created a products collection and defined a document with nested objects. Notice the specifications field contains multiple sub-fields. A normalized SQL design would typically need a separate table (or a JSON column) for this. Try this: Add more nested levels like specifications.camera.features.

# Insert the document
result = products.insert_one(smartphone)
print(f"Inserted document ID: {result.inserted_id}")

# Insert a completely different product structure
grocery_item = {
    "product_id": "PROD002",
    "name": "Organic Basmati Rice",
    "category": "Food",
    "price": 299,
    "nutrition": {
        "calories_per_100g": 130,
        "protein": "2.7g",
        "carbs": "28g"
    },
    "certifications": ["Organic", "Non-GMO"]
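}

# Insert the grocery item into the same collection
result = products.insert_one(grocery_item)
print(f"Inserted document ID: {result.inserted_id}")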

What just happened?

MongoDB auto-generated a unique _id field. The grocery item has completely different fields — nutrition instead of specifications, plus an array certifications. Same collection, totally different structure. Try this: Insert a book with author, ISBN, and page count.

Querying Documents

# Find all products
all_products = products.find()
for product in all_products:
    print(f"Product: {product['name']}")
    
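# Count documents in a specific category
# (Cursor.count() was removed in PyMongo 4.0; count_documents() replaces it)
electronics_count = products.count_documents({"category": "Electronics"})
print(f"\nElectronics found: {electronics_count}")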

# Query nested fields using dot notation
high_storage = products.find({"specifications.storage": "256GB"})
for item in high_storage:
    print(f"High storage device: {item['name']}")

What just happened?

The dot notation specifications.storage queries nested objects. find() returns a cursor, not the actual data — you iterate through it. The grocery item was skipped in the storage query because it doesn't have a specifications field. Try this: Query array elements with certifications: "Organic".
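
To make the matching rules concrete without a running server, here is a small pure-Python sketch of how a dotted path and an array-membership query can be evaluated against plain dictionaries. The helpers get_path and matches are hypothetical names invented for this illustration, they handle equality only, and they are not how MongoDB is implemented internally.

```python
def get_path(document: dict, dotted_path: str):
    """Walk a dotted path such as 'specifications.storage'; return None if a step is missing."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def matches(document: dict, query: dict) -> bool:
    """Equality-only match; an array field matches if it contains the value."""
    for path, expected in query.items():
        actual = get_path(document, path)
        if isinstance(actual, list):
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True

catalog = [
    {"name": "iPhone 15 Pro", "specifications": {"storage": "256GB"}},
    {"name": "Organic Basmati Rice", "certifications": ["Organic", "Non-GMO"]},
]

high_storage = [d["name"] for d in catalog if matches(d, {"specifications.storage": "256GB"})]
organic = [d["name"] for d in catalog if matches(d, {"certifications": "Organic"})]
print(high_storage)
print(organic)
```

The rice document is skipped by the storage query simply because the path resolves to nothing, which mirrors the behavior described above.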

Chart: MongoDB dominates with 58% market share, followed by Redis for caching and real-time applications

Document databases lead because they match how developers think. JSON objects are everywhere: APIs, frontend state, configuration files. Why transform data between different formats when you can store it natively?

Key-value stores like Redis shine for specific use cases: session storage, caching, real-time leaderboards. Simple but blazingly fast. You wouldn't build a complex application on Redis alone, but it's perfect as a supporting actor.

Redis for Speed

Redis keeps everything in memory. That means sub-millisecond response times, but the dataset size is limited by available RAM. Perfect for caching frequently accessed data, session management, and real-time analytics.

The scenario: Zomato's recommendation engine needs to track user preferences in real time. Every click, every search, every order updates the preference score. A disk-based database would struggle to absorb 100,000 small updates per second.

# Install the client first: pip install redis
import redis

# Connect to Redis (default localhost:6379)
r = redis.Redis(host='localhost', port=6379, db=0)

# Test connection (raises an exception if the server is unreachable)
r.ping()
print("Connected to Redis!")

# Store user preference scores
r.set("user:12345:cuisine:italian", 8.5)
r.set("user:12345:cuisine:chinese", 7.2)
r.set("user:12345:cuisine:indian", 9.1)

# Retrieve preference
italian_score = r.get("user:12345:cuisine:italian")
print(f"Italian cuisine score: {float(italian_score)}")

# Increment score atomically (thread-safe)
r.incrbyfloat("user:12345:cuisine:italian", 0.3)
new_score = r.get("user:12345:cuisine:italian")
print(f"Updated Italian score: {float(new_score)}")

What just happened?

Redis stores these values as strings and returns them as bytes by default, so we convert to float for math. The key structure user:12345:cuisine:italian creates a namespace. incrbyfloat is atomic: Redis executes each command as a single step, so concurrent increments cannot overwrite each other. Try this: Use expire to auto-delete keys after 24 hours.
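
For the expiry exercise, redis-py provides r.expire(key, seconds) and the ex argument of r.set. To show what expiry means without a running server, the sketch below uses a toy in-memory store with a fake clock. TinyTTLStore is a hypothetical class written for this illustration and is not part of Redis or redis-py.

```python
import time

class TinyTTLStore:
    """Toy in-memory key-value store with per-key expiry. Not Redis; illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}      # key -> value
        self._expires = {}   # key -> absolute expiry time

    def set(self, key, value):
        self._data[key] = value
        self._expires.pop(key, None)  # a plain set clears any previous TTL

    def expire(self, key, seconds):
        if key in self._data:
            self._expires[key] = self._clock() + seconds

    def get(self, key):
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            # Lazily delete the key on the first read after expiry
            self._data.pop(key, None)
            self._expires.pop(key, None)
        return self._data.get(key)

# A fake clock lets us fast-forward time without sleeping
now = [0.0]
store = TinyTTLStore(clock=lambda: now[0])
store.set("user:12345:cuisine:italian", "8.8")
store.expire("user:12345:cuisine:italian", 24 * 60 * 60)  # 24 hours

before = store.get("user:12345:cuisine:italian")
now[0] += 24 * 60 * 60 + 1
after = store.get("user:12345:cuisine:italian")
print(before, after)
```

Once the deadline passes, the key behaves as if it was never set, which is what makes expiry convenient for sessions and short-lived preference data.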

📊 Data Insight

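Treat these as rough, workload-dependent figures: Redis can handle 500,000+ simple operations per second on standard hardware, while MongoDB is quoted here at around 10,000 inserts/second. Exact numbers vary with hardware, document size, and configuration, but the gap is what makes Redis essential for real-time features like live chat, gaming leaderboards, and recommendation engines.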

SQL vs NoSQL Trade-offs

Aspect            SQL                          NoSQL
Schema            Rigid, predefined            Flexible, evolving
Scaling           Vertical (bigger servers)    Horizontal (more servers)
Consistency       ACID guaranteed              Eventual consistency
Query Language    Standardized SQL             Database-specific
Best For          Complex relationships        Rapid development, scale

The CAP theorem explains the fundamental trade-off. A distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability. Traditional SQL deployments typically favor consistency. Many NoSQL systems favor availability and partition tolerance, often with tunable consistency levels.
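
Eventual consistency is easier to picture with two copies of the data. The sketch below is a simplified model, assuming a hypothetical Replica class, an explicit sync step, and last-write-wins versioning. Real databases replicate in the background and use more careful conflict handling.

```python
class Replica:
    """One copy of the data, holding a (version, value) pair per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(source, target):
    """Copy entries to target when source holds a newer version (last-write-wins)."""
    for key, (version, value) in source.data.items():
        if key not in target.data or target.data[key][0] < version:
            target.data[key] = (version, value)

replica_a, replica_b = Replica("a"), Replica("b")
replica_a.write("order:42:status", "placed", version=1)
sync(replica_a, replica_b)

replica_a.write("order:42:status", "delivered", version=2)
stale_read = replica_b.read("order:42:status")  # replica_b has not synced yet
sync(replica_a, replica_b)
fresh_read = replica_b.read("order:42:status")
print(stale_read, fresh_read)
```

Between the second write and the second sync, replica_b stays available but answers with the old status. That window is the price paid for availability.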

Common Mistake

Thinking NoSQL means "no relationships." Many NoSQL databases support references and joins — they're just not enforced at the database level. The exact fix: Design your data model to minimize relationships, but don't eliminate them entirely.
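
A reference in a document store is just a stored ID that the application (or a server-side feature such as MongoDB's $lookup aggregation stage) resolves later. The sketch below shows an application-side join over plain Python dictionaries. The collections, field names, and the resolve helper are hypothetical and chosen only for illustration.

```python
# Hypothetical collections represented as plain lists of dicts
restaurants = [
    {"_id": "R1", "name": "Spice Route"},
    {"_id": "R2", "name": "Pasta Point"},
]
orders = [
    {"_id": "O1", "restaurant_id": "R1", "total": 450},
    {"_id": "O2", "restaurant_id": "R2", "total": 620},
    {"_id": "O3", "restaurant_id": "R9", "total": 300},  # dangling reference: nothing prevents it
]

# Index restaurants by ID once so each lookup is a dictionary access
restaurants_by_id = {r["_id"]: r for r in restaurants}

def resolve(order: dict) -> dict:
    """Attach the referenced restaurant's name, or None if the reference is dangling."""
    restaurant = restaurants_by_id.get(order["restaurant_id"])
    return {**order, "restaurant_name": restaurant["name"] if restaurant else None}

joined = [resolve(order) for order in orders]
for row in joined:
    print(row["_id"], row["restaurant_name"], row["total"])
```

Order O3 points at a restaurant that does not exist, and no constraint rejects it. Handling that case falls to the application, which is why keeping the number of references small pays off.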

Chart (radar): SQL excels at consistency and complex queries, while NoSQL dominates performance and scalability

The radar chart reveals why both technologies coexist. SQL databases shine for financial systems, inventory management, anything requiring perfect consistency. Banking transactions must never go missing or duplicate.

NoSQL databases excel at user-facing features: social media feeds, product catalogs, real-time messaging. Instagram can survive showing you an old photo, but can't survive being slow. The performance and scalability advantages outweigh occasional inconsistencies.

Choosing the Right Database

Choose SQL When

  • Financial transactions
  • Complex reporting
  • Established data structure
  • Team knows SQL well

Choose NoSQL When

  • Rapid prototyping
  • Massive scale required
  • Varying data structures
  • Real-time performance

Chart: NoSQL delivers 5x faster response times and handles 10x more concurrent users than traditional SQL

The performance gap is dramatic. NoSQL response times of 8ms versus SQL's 45ms might seem small, but multiply by millions of requests. Those milliseconds translate to user engagement and revenue.

Development speed tells the real story. NoSQL lets you iterate faster: add new fields, change data structures, deploy without migrations. SQL usually requires careful planning and schema migrations, which sometimes means downtime. Both approaches work, but for different organizational rhythms.

Quiz

1. Your e-commerce platform needs to store product information where electronics have technical specifications, clothing has size charts, and books have author details. Each category requires completely different attributes. What makes document databases ideal for this scenario?


2. A food delivery app needs to update user preference scores in real-time as customers browse restaurants. The system handles 100,000 preference updates per second during peak hours. Why is Redis particularly suited for this use case compared to MongoDB?


3. A fintech startup is building a payment platform that needs both a banking transaction system (requiring perfect consistency) and a merchant product catalog (requiring fast reads and flexible schemas). What's the best architectural approach?


Up Next

Data Modeling

Learn how to design efficient database schemas and relationships that scale from startup to enterprise, building on the SQL and NoSQL foundations you've mastered.