Data Warehouse Basics
So far, we have focused on databases that support day-to-day operations such as inserting, updating, and deleting records.
In this lesson, we shift our focus to Data Warehouses, which are designed for analytics, reporting, and decision-making.
What is a Data Warehouse?
A data warehouse is a centralized system that stores historical data from multiple sources for analysis and reporting.
Unlike operational databases, data warehouses are optimized for:
- Read-heavy workloads
- Complex analytical queries
- Large data volumes
OLTP vs OLAP
| Aspect | OLTP (Operational) | OLAP (Analytical) |
|---|---|---|
| Purpose | Daily transactions | Analysis & reporting |
| Queries | Simple, frequent | Complex, long-running |
| Updates | Frequent | Rare / batch-based |
| Schema | Highly normalized | Denormalized |
Why Not Use OLTP for Analytics?
Running heavy analytical queries on OLTP databases:
- Slows down applications
- Locks critical tables
- Impacts user experience
Data warehouses solve this by separating analytics from operations.
Key Characteristics of a Data Warehouse
- Subject-oriented – Organized by business topics
- Integrated – Combines multiple data sources
- Time-variant – Stores historical data
- Non-volatile – Data is rarely updated
Common Data Sources
A data warehouse may collect data from:
- Operational databases
- CRM systems
- ERP systems
- APIs and logs
Fact Tables
A fact table stores measurable business data.
Examples:
- Sales amount
- Quantity sold
- Revenue
sales_fact ----------- date_id product_id customer_id amount quantity
Dimension Tables
Dimension tables provide descriptive context to facts.
Examples:
- Time
- Customer
- Product
- Location
product_dim ------------ product_id product_name category brand
Star Schema
The star schema is the most common data warehouse design.
It consists of:
- One central fact table
- Multiple surrounding dimension tables
This structure simplifies queries and improves performance.
Snowflake Schema
A snowflake schema is a variation of the star schema.
Dimension tables are further normalized.
This saves space but increases query complexity.
Star vs Snowflake Schema
| Aspect | Star Schema | Snowflake Schema |
|---|---|---|
| Query simplicity | High | Lower |
| Storage | More | Less |
| Performance | Faster | Slightly slower |
Slowly Changing Dimensions (SCD)
Dimension data may change over time.
Examples:
- Customer address changes
- Product category changes
Data warehouses track these changes using Slowly Changing Dimensions.
Why SQL is Critical for Data Warehouses
SQL is the primary language for:
- Analytical queries
- Aggregations
- Reporting
- Dashboard backends
Strong SQL skills are essential for analytics roles.
Common Beginner Mistakes
- Using OLTP schema for analytics
- Over-normalizing warehouse tables
- Ignoring historical data
- Not planning for data growth
Real-World Use Cases
- Business intelligence dashboards
- Sales performance analysis
- Customer behavior analytics
- Financial reporting
What’s Next?
In the next lesson, we will explore ETL Basics, learning how data is extracted, transformed, and loaded into a data warehouse.