# Database Design
## Why Database Design is Important
A database can be built without any formal design process. Tables are created, columns are added as needs arise, and the application is shipped. This approach is common — and it works, for a while. The problems arrive later: a query that should take half a second starts taking forty. A simple update to a customer record requires touching eleven tables. A new business requirement means rebuilding half the schema from scratch. These are not bad luck — they are the predictable consequences of skipping design.
This lesson makes the case for why database design matters by working through the three categories of problems that poor design reliably produces: data anomalies, performance degradation, and unmaintainability.
## The Cost of Poor Design
Every structural problem in a database that causes pain in production was a design decision — or the absence of one — made at the beginning. Fixing structural problems after data is live is significantly more expensive than preventing them during design.
The cost of poor design is not theoretical. It shows up as engineering time spent writing workarounds, business decisions made on incorrect data, and system rewrites that could have been avoided entirely. Understanding these costs concretely is what motivates the design discipline this course teaches.
### Data Anomalies
A data anomaly is a situation where the same real-world fact is represented inconsistently in the database, or where a routine operation — insert, update, or delete — produces an unintended side effect. There are three classic anomaly types, and all three arise from the same root cause: storing more than one fact per row.
| Anomaly type | What happens | Example |
|---|---|---|
| Update anomaly | A fact stored in multiple rows must be changed in every row — if one is missed the database contains conflicting versions of the same fact | A product's price is stored on every order line item. The price is updated in three rows but missed in two. The database now reports two conflicting prices for the same product, depending on which row you look at. |
| Insert anomaly | A new fact cannot be recorded without also recording another unrelated fact | A new department cannot be added to the database until at least one employee is hired into it — because the table stores department and employee data together and employee_id is NOT NULL. |
| Delete anomaly | Deleting one fact unintentionally destroys another unrelated fact | The only employee in a department leaves the company. Deleting their row also deletes the only record of that department's existence, including its name, budget code, and location. |
A retailer stored unit_price directly on each order_items row rather than referencing the products table. When a supplier changed a wholesale price, a developer ran an UPDATE against order_items to reflect it. The UPDATE had a WHERE clause that was slightly too narrow — it updated 847 rows but missed 63 historical rows from a different sales channel. Those 63 rows now showed a price 12% lower than the current price. Reports comparing historical revenue to current margins produced numbers that could not be reconciled. The root cause was not the developer's WHERE clause — it was storing a fact that belongs to the product in the order item row.
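The incident can be reproduced in miniature. The sketch below uses Python's sqlite3 module with an invented three-row order_items table; the essential flaw is the copied unit_price column, not the specific schema:

```python
import sqlite3

# Denormalised schema: unit_price is copied onto every order line,
# so one product fact lives in many rows.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, unit_price REAL)"
)
con.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1, 42, 10.0), (2, 42, 10.0), (3, 42, 10.0)],
)

# A price change must now touch every row. This UPDATE's WHERE clause
# is slightly too narrow and misses order 3 -- the update anomaly.
con.execute(
    "UPDATE order_items SET unit_price = 12.0 "
    "WHERE product_id = 42 AND order_id < 3"
)

prices = {row[0] for row in con.execute(
    "SELECT DISTINCT unit_price FROM order_items WHERE product_id = 42")}
print(prices)  # two conflicting prices for the same product: {10.0, 12.0}
```

The structural fix is to keep unit_price in a products table and reference it by product_id; if the price at time of sale must be preserved for history, store it as a deliberate, clearly named snapshot rather than an accidental copy.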
### Performance Degradation
A poorly designed schema does not just produce wrong data — it also produces slow queries. Performance problems from poor design are structurally different from performance problems that can be fixed with an index. When the schema itself is the problem, indexes can help at the margins but cannot fix the fundamental issue.
- **Wide, unnormalised tables** — A table with 80 columns storing customer, order, product, and payment data in a single row forces every query to scan more data than it needs. Even a query that only wants the customer's name and email must read rows that contain dozens of irrelevant columns.
- **Unindexed join columns cause full scans** — Declaring a foreign key does not guarantee an index on the relationship column in every database system. Without one, queries that join orders to customers must scan every row of one table looking for matches rather than using an index lookup.
- **Duplicate data multiplies write cost** — Every INSERT or UPDATE that modifies a fact stored in multiple places must touch multiple rows. At high transaction volumes — thousands of inserts per second — this multiplier becomes a throughput bottleneck.
- **No separation of read and write concerns** — A schema designed only for OLTP workloads performs poorly for analytical queries that aggregate millions of rows. A schema not designed with any query pattern in mind performs poorly for everything. Design must consider how data will be queried, not just how it will be stored.
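The effect of an unindexed join column can be observed directly in a query planner. A minimal sketch using Python's sqlite3 module (table and index names are illustrative; automatic indexing is switched off so the plans stay deterministic):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA automatic_index = OFF")  # keep the demonstration deterministic
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports SCAN (full table) or SEARCH (index lookup)
    # for each table in the query; the detail string is the last column.
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT c.name FROM customers c "
         "JOIN orders o ON o.customer_id = c.id WHERE c.id = 1")

before = plan(query)   # orders side is a SCAN: every row examined
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)    # orders side becomes a SEARCH using the new index

print(before)
print(after)
```

Before the index, the plan reports a SCAN of orders; after CREATE INDEX it reports a SEARCH using idx_orders_customer. The same SCAN-versus-SEARCH distinction appears, under different spellings, in the planners of most relational databases.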
### Unmaintainability
The most expensive long-term consequence of poor design is not wrong data or slow queries — it is a system that becomes progressively harder to change. Business requirements evolve constantly. New product categories are added. Regulations require new data to be captured. A new payment provider needs a different data structure. A well-designed database accommodates these changes with additive modifications — a new table, a new column, a new relationship. A poorly designed database requires restructuring existing tables that are already in production with live data, often with no safe migration path.
An e-commerce company launched a loyalty points programme two years after their platform went live. Their customers table had no concept of loyalty tiers — customer type was encoded as a single VARCHAR column called "notes" that contained free-text entries like "VIP" or "wholesale" entered manually by staff. Adding a structured loyalty_tier column required a data migration that parsed thousands of inconsistent free-text notes, wrote custom mapping logic for every variation, and still left 8% of records unclassifiable. The migration took three weeks of engineering time. A loyalty_tier column with a CHECK constraint would have taken ten minutes to add at design time.
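The ten-minute alternative is a single structured column with a constraint. A sketch in SQLite via Python's sqlite3 module (the column name and tier values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Structured tier column with a CHECK constraint: the database itself
# rejects anything outside the allowed set, so free-text drift is impossible.
con.execute("""
    ALTER TABLE customers ADD COLUMN loyalty_tier TEXT
        CHECK (loyalty_tier IN ('standard', 'vip', 'wholesale'))
""")

con.execute("INSERT INTO customers (name, loyalty_tier) VALUES ('Ada', 'vip')")
try:
    # Inconsistent free-text spellings like 'V.I.P' can no longer sneak in.
    con.execute("INSERT INTO customers (name, loyalty_tier) VALUES ('Bob', 'V.I.P')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Because the constraint lives in the schema rather than in staff discipline, every application, script, and manual edit is held to the same rule.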
## What Good Design Prevents
Each problem category maps directly to a design technique this course covers. Understanding the problem is what gives the technique its purpose.
| Problem | Root cause | Design technique that prevents it |
|---|---|---|
| Update anomalies | The same fact is stored in more than one place | Normalisation — store each fact exactly once |
| Insert anomalies | Unrelated facts are forced into the same row | Entity separation — one table per real-world entity |
| Delete anomalies | Facts about different entities are stored together | Entity separation and referential integrity |
| Slow queries | Schema does not match query patterns; missing indexes | Physical design — indexing strategy and query-aware structure |
| Unmaintainability | Schema encodes assumptions that change over time | Logical design — flexible, normalised relational model |
| Inconsistent data | No constraints enforcing valid values | Integrity constraints — PRIMARY KEY, FOREIGN KEY, CHECK, NOT NULL |
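The last row of the table can be made concrete. Below is a sketch of a two-table schema in SQLite exercising all four constraint types; the names are invented, and note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

con.executescript("""
    CREATE TABLE departments (
        id     INTEGER PRIMARY KEY,          -- PRIMARY KEY: unique row identity
        name   TEXT NOT NULL,                -- NOT NULL: a name is mandatory
        budget REAL CHECK (budget >= 0)      -- CHECK: no negative budgets
    );
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL
                      REFERENCES departments (id)  -- FOREIGN KEY: must exist
    );
""")

con.execute("INSERT INTO departments VALUES (1, 'Engineering', 250000.0)")
con.execute("INSERT INTO employees VALUES (1, 'Ada', 1)")       # valid reference
try:
    con.execute("INSERT INTO employees VALUES (2, 'Bob', 99)")  # no department 99
except sqlite3.IntegrityError as e:
    print("foreign key rejected:", e)
```

Because employees references departments rather than copying its columns, the delete anomaly from earlier cannot occur silently either: the foreign key forces a deliberate decision about dependent rows before a department can disappear.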
## Design as Risk Reduction
Database design is ultimately a risk reduction activity. The risks it manages are concrete: the risk that data becomes inconsistent and business decisions are made on wrong numbers; the risk that performance degrades as data volumes grow; the risk that the schema becomes a liability that slows every new feature. None of these risks are zero even with perfect design — but good design reduces their probability and their severity significantly.
The investment in design is front-loaded. It takes time to model entities correctly, to normalise tables, to define constraints. But that investment is paid back continuously over the lifetime of the system — in queries that run fast, in data that stays consistent, and in a schema that can evolve without being rebuilt from scratch.
## Summary
| Concept | Key Point |
|---|---|
| Update anomaly | A fact stored in multiple rows becomes inconsistent when one row is updated and others are not |
| Insert anomaly | A new fact cannot be recorded without also recording an unrelated fact |
| Delete anomaly | Deleting one fact unintentionally destroys another unrelated fact stored in the same row |
| Performance degradation | Poor schema structure causes slow queries that indexes alone cannot fix |
| Unmaintainability | Schema that encodes rigid assumptions cannot accommodate evolving requirements without costly restructuring |
| Normalisation | Prevents anomalies by storing each fact in exactly one place |
| Design as risk reduction | Front-loaded investment that pays back continuously through data consistency, query performance, and schema flexibility |
## Practice Questions
Practice 1. A product's price is stored on every order_items row. The price changes and a developer updates most but not all rows. What type of anomaly has occurred and what is its root cause?
Practice 2. A company cannot add a new department to the database until at least one employee is assigned to it. What type of anomaly is this and what design decision caused it?
Practice 3. Why can performance problems caused by poor schema design not always be fixed by adding indexes?
Practice 4. Describe a real scenario where a delete anomaly could cause serious business harm.
Practice 5. Why is the cost of database design described as front-loaded, and what does it pay back over time?
## Quiz
Quiz 1. Which anomaly type makes it impossible to record a new fact without simultaneously recording an unrelated fact?
Quiz 2. A developer deletes the last employee in a department and the department record disappears from the database entirely. What type of anomaly is this?
Quiz 3. Which design technique directly prevents update anomalies by ensuring each fact is stored in exactly one place?
Quiz 4. A schema stores customer, order, product, and payment data all in one wide table with 80 columns. Queries that only need customer name and email are slow. Why can this not be fully fixed with indexes?
Quiz 5. Which of the following best describes why database design is called a risk reduction activity?
Next up — OLTP vs OLAP Systems — Understand the fundamental difference between systems designed for transactions and systems designed for analysis, and how that difference shapes every design decision.