Data Warehouse Basics | Dataplexa

Data Warehouse Basics

So far, we have focused on databases that support day-to-day operations such as inserting, updating, and deleting records.

In this lesson, we shift our focus to Data Warehouses, which are designed for analytics, reporting, and decision-making.


What is a Data Warehouse?

A data warehouse is a centralized system that stores historical data from multiple sources for analysis and reporting.

Unlike operational databases, data warehouses are optimized for:

  • Read-heavy workloads
  • Complex analytical queries
  • Large data volumes

OLTP vs OLAP

Aspect OLTP (Operational) OLAP (Analytical)
Purpose Daily transactions Analysis & reporting
Queries Simple, frequent Complex, long-running
Updates Frequent Rare / batch-based
Schema Highly normalized Denormalized

Why Not Use OLTP for Analytics?

Running heavy analytical queries on OLTP databases:

  • Slows down applications
  • Locks critical tables
  • Impacts user experience

Data warehouses solve this by separating analytics from operations.


Key Characteristics of a Data Warehouse

  • Subject-oriented – Organized by business topics
  • Integrated – Combines multiple data sources
  • Time-variant – Stores historical data
  • Non-volatile – Data is rarely updated

Common Data Sources

A data warehouse may collect data from:

  • Operational databases
  • CRM systems
  • ERP systems
  • APIs and logs

Fact Tables

A fact table stores measurable business data.

Examples:

  • Sales amount
  • Quantity sold
  • Revenue
sales_fact
-----------
date_id
product_id
customer_id
amount
quantity
  

Dimension Tables

Dimension tables provide descriptive context to facts.

Examples:

  • Time
  • Customer
  • Product
  • Location
product_dim
------------
product_id
product_name
category
brand
  

Star Schema

The star schema is the most common data warehouse design.

It consists of:

  • One central fact table
  • Multiple surrounding dimension tables

This structure simplifies queries and improves performance.


Snowflake Schema

A snowflake schema is a variation of the star schema.

Dimension tables are further normalized.

This saves space but increases query complexity.


Star vs Snowflake Schema

Aspect Star Schema Snowflake Schema
Query simplicity High Lower
Storage More Less
Performance Faster Slightly slower

Slowly Changing Dimensions (SCD)

Dimension data may change over time.

Examples:

  • Customer address changes
  • Product category changes

Data warehouses track these changes using Slowly Changing Dimensions.


Why SQL is Critical for Data Warehouses

SQL is the primary language for:

  • Analytical queries
  • Aggregations
  • Reporting
  • Dashboard backends

Strong SQL skills are essential for analytics roles.


Common Beginner Mistakes

  • Using OLTP schema for analytics
  • Over-normalizing warehouse tables
  • Ignoring historical data
  • Not planning for data growth

Real-World Use Cases

  • Business intelligence dashboards
  • Sales performance analysis
  • Customer behavior analytics
  • Financial reporting

What’s Next?

In the next lesson, we will explore ETL Basics, learning how data is extracted, transformed, and loaded into a data warehouse.