# Database Design
## Why Database Design is Important
A database can be built without any formal design process. Tables are created, columns are added as needs arise, and the application is shipped. This approach is common — and it works, for a while. The problems arrive later: a query that should take half a second starts taking forty. A simple update to a customer record requires touching eleven tables. A new business requirement means rebuilding half the schema from scratch. These are not bad luck — they are the predictable consequences of skipping design.
This lesson makes the case for why database design matters by working through the three categories of problems that poor design reliably produces: data anomalies, performance degradation, and unmaintainability.
## The Cost of Poor Design
Every structural problem in a database that causes pain in production was a design decision — or the absence of one — made at the beginning. Fixing structural problems after data is live is significantly more expensive than preventing them during design.
The cost of poor design is not theoretical. It shows up as engineering time spent writing workarounds, business decisions made on incorrect data, and system rewrites that could have been avoided entirely. Understanding these costs concretely is what motivates the design discipline this course teaches.
### Data Anomalies
A data anomaly is a situation where the same real-world fact is represented inconsistently in the database, or where a routine operation — insert, update, or delete — produces an unintended side effect. There are three classic anomaly types, and all three arise from the same root cause: storing more than one fact per row.
| Anomaly type | What happens | Example |
|---|---|---|
| Update anomaly | A fact stored in multiple rows must be changed in every row — if one is missed the database contains conflicting versions of the same fact | A product's price is stored on every order line item. The price is updated in three rows but missed in two. The database now reports two conflicting prices for the same product, depending on which row you look at. |
| Insert anomaly | A new fact cannot be recorded without also recording another unrelated fact | A new department cannot be added to the database until at least one employee is hired into it — because the table stores department and employee data together and employee_id is NOT NULL. |
| Delete anomaly | Deleting one fact unintentionally destroys another unrelated fact | The only employee in a department leaves the company. Deleting their row also deletes the only record of that department's existence, including its name, budget code, and location. |
A retailer stored unit_price directly on each order_items row rather than referencing the products table. When a supplier changed a wholesale price, a developer ran an UPDATE against order_items to reflect it. The UPDATE had a WHERE clause that was slightly too narrow — it updated 847 rows but missed 63 historical rows from a different sales channel. Those 63 rows now showed a price 12% lower than the current price. Reports comparing historical revenue to current margins produced numbers that could not be reconciled. The root cause was not the developer's WHERE clause — it was storing a fact that belongs to the product in the order item row.
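The incident can be reproduced in miniature. The sketch below uses Python's sqlite3 module with an invented three-row order_items table; the essential flaw is the copied unit_price column, not the specific schema:

```python
import sqlite3

# Denormalised schema: unit_price is copied onto every order line,
# so one product fact lives in many rows.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, unit_price REAL)"
)
con.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1, 42, 10.0), (2, 42, 10.0), (3, 42, 10.0)],
)

# A price change must now touch every row. This UPDATE's WHERE clause
# is slightly too narrow and misses order 3 -- the update anomaly.
con.execute(
    "UPDATE order_items SET unit_price = 12.0 "
    "WHERE product_id = 42 AND order_id < 3"
)

prices = {row[0] for row in con.execute(
    "SELECT DISTINCT unit_price FROM order_items WHERE product_id = 42")}
print(prices)  # two conflicting prices for the same product: {10.0, 12.0}
```

The structural fix is to keep unit_price in a products table and reference it by product_id; if the price at time of sale must be preserved for history, store it as a deliberate, clearly named snapshot rather than an accidental copy.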
### Performance Degradation
A poorly designed schema does not just produce wrong data — it also produces slow queries. Performance problems from poor design are structurally different from performance problems that can be fixed with an index. When the schema itself is the problem, indexes can help at the margins but cannot fix the fundamental issue.
- **Wide, unnormalised tables** — A table with 80 columns storing customer, order, product, and payment data in a single row forces every query to scan more data than it needs. Even a query that only wants the customer's name and email must read rows that contain dozens of irrelevant columns.
- **Unindexed join columns cause full scans** — Declaring a foreign key does not guarantee an index on the relationship column in every database system. Without one, queries that join orders to customers must scan every row of one table looking for matches rather than using an index lookup.
- **Duplicate data multiplies write cost** — Every INSERT or UPDATE that modifies a fact stored in multiple places must touch multiple rows. At high transaction volumes — thousands of inserts per second — this multiplier becomes a throughput bottleneck.
- **No separation of read and write concerns** — A schema designed only for OLTP workloads performs poorly for analytical queries that aggregate millions of rows. A schema not designed with any query pattern in mind performs poorly for everything. Design must consider how data will be queried, not just how it will be stored.
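The effect of an unindexed join column can be observed directly in a query planner. A minimal sketch using Python's sqlite3 module (table and index names are illustrative; automatic indexing is switched off so the plans stay deterministic):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA automatic_index = OFF")  # keep the demonstration deterministic
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports SCAN (full table) or SEARCH (index lookup)
    # for each table in the query; the detail string is the last column.
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT c.name FROM customers c "
         "JOIN orders o ON o.customer_id = c.id WHERE c.id = 1")

before = plan(query)   # orders side is a SCAN: every row examined
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)    # orders side becomes a SEARCH using the new index

print(before)
print(after)
```

Before the index, the plan reports a SCAN of orders; after CREATE INDEX it reports a SEARCH using idx_orders_customer. The same SCAN-versus-SEARCH distinction appears, under different spellings, in the planners of most relational databases.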
### Unmaintainability
The most expensive long-term consequence of poor design is not wrong data or slow queries — it is a system that becomes progressively harder to change. Business requirements evolve constantly. New product categories are added. Regulations require new data to be captured. A new payment provider needs a different data structure. A well-designed database accommodates these changes with additive modifications — a new table, a new column, a new relationship. A poorly designed database requires restructuring existing tables that are already in production with live data, often with no safe migration path.
An e-commerce company launched a loyalty points programme two years after their platform went live. Their customers table had no concept of loyalty tiers — customer type was encoded as a single VARCHAR column called "notes" that contained free-text entries like "VIP" or "wholesale" entered manually by staff. Adding a structured loyalty_tier column required a data migration that parsed thousands of inconsistent free-text notes, wrote custom mapping logic for every variation, and still left 8% of records unclassifiable. The migration took three weeks of engineering time. A loyalty_tier column with a CHECK constraint would have taken ten minutes to add at design time.
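The ten-minute alternative is a single structured column with a constraint. A sketch in SQLite via Python's sqlite3 module (the column name and tier values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Structured tier column with a CHECK constraint: the database itself
# rejects anything outside the allowed set, so free-text drift is impossible.
con.execute("""
    ALTER TABLE customers ADD COLUMN loyalty_tier TEXT
        CHECK (loyalty_tier IN ('standard', 'vip', 'wholesale'))
""")

con.execute("INSERT INTO customers (name, loyalty_tier) VALUES ('Ada', 'vip')")
try:
    # Inconsistent free-text spellings like 'V.I.P' can no longer sneak in.
    con.execute("INSERT INTO customers (name, loyalty_tier) VALUES ('Bob', 'V.I.P')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Because the constraint lives in the schema rather than in staff discipline, every application, script, and manual edit is held to the same rule.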
## What Good Design Prevents
Each problem category maps directly to a design technique this course covers. Understanding the problem is what gives the technique its purpose.
| Problem | Root cause | Design technique that prevents it |
|---|---|---|
| Update anomalies | The same fact is stored in more than one place | Normalisation — store each fact exactly once |
| Insert anomalies | Unrelated facts are forced into the same row | Entity separation — one table per real-world entity |
| Delete anomalies | Facts about different entities are stored together | Entity separation and referential integrity |
| Slow queries | Schema does not match query patterns; missing indexes | Physical design — indexing strategy and query-aware structure |
| Unmaintainability | Schema encodes assumptions that change over time | Logical design — flexible, normalised relational model |
| Inconsistent data | No constraints enforcing valid values | Integrity constraints — PRIMARY KEY, FOREIGN KEY, CHECK, NOT NULL |
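The last row of the table can be made concrete. Below is a sketch of a two-table schema in SQLite exercising all four constraint types; the names are invented, and note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

con.executescript("""
    CREATE TABLE departments (
        id     INTEGER PRIMARY KEY,          -- PRIMARY KEY: unique row identity
        name   TEXT NOT NULL,                -- NOT NULL: a name is mandatory
        budget REAL CHECK (budget >= 0)      -- CHECK: no negative budgets
    );
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL
                      REFERENCES departments (id)  -- FOREIGN KEY: must exist
    );
""")

con.execute("INSERT INTO departments VALUES (1, 'Engineering', 250000.0)")
con.execute("INSERT INTO employees VALUES (1, 'Ada', 1)")       # valid reference
try:
    con.execute("INSERT INTO employees VALUES (2, 'Bob', 99)")  # no department 99
except sqlite3.IntegrityError as e:
    print("foreign key rejected:", e)
```

Because employees references departments rather than copying its columns, the delete anomaly from earlier cannot occur silently either: the foreign key forces a deliberate decision about dependent rows before a department can disappear.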
## Design as Risk Reduction
Database design is ultimately a risk reduction activity. The risks it manages are concrete: the risk that data becomes inconsistent and business decisions are made on wrong numbers; the risk that performance degrades as data volumes grow; the risk that the schema becomes a liability that slows every new feature. None of these risks are zero even with perfect design — but good design reduces their probability and their severity significantly.
The investment in design is front-loaded. It takes time to model entities correctly, to normalise tables, to define constraints. But that investment is paid back continuously over the lifetime of the system — in queries that run fast, in data that stays consistent, and in a schema that can evolve without being rebuilt from scratch.
## Summary
| Concept | Key Point |
|---|---|
| Update anomaly | A fact stored in multiple rows becomes inconsistent when one row is updated and others are not |
| Insert anomaly | A new fact cannot be recorded without also recording an unrelated fact |
| Delete anomaly | Deleting one fact unintentionally destroys another unrelated fact stored in the same row |
| Performance degradation | Poor schema structure causes slow queries that indexes alone cannot fix |
| Unmaintainability | Schema that encodes rigid assumptions cannot accommodate evolving requirements without costly restructuring |
| Normalisation | Prevents anomalies by storing each fact in exactly one place |
| Design as risk reduction | Front-loaded investment that pays back continuously through data consistency, query performance, and schema flexibility |
## Practice Questions
Practice 1. A product's price is stored on every order_items row. The price changes and a developer updates most but not all rows. What type of anomaly has occurred and what is its root cause?
Practice 2. A company cannot add a new department to the database until at least one employee is assigned to it. What type of anomaly is this and what design decision caused it?
Practice 3. Why can performance problems caused by poor schema design not always be fixed by adding indexes?
Practice 4. Describe a real scenario where a delete anomaly could cause serious business harm.
Practice 5. Why is the cost of database design described as front-loaded, and what does it pay back over time?
## Quiz
Quiz 1. Which anomaly type makes it impossible to record a new fact without simultaneously recording an unrelated fact?
Quiz 2. A developer deletes the last employee in a department and the department record disappears from the database entirely. What type of anomaly is this?
Quiz 3. Which design technique directly prevents update anomalies by ensuring each fact is stored in exactly one place?
Quiz 4. A schema stores customer, order, product, and payment data all in one wide table with 80 columns. Queries that only need customer name and email are slow. Why can this not be fully fixed with indexes?
Quiz 5. Which of the following best describes why database design is called a risk reduction activity?
Next up — OLTP vs OLAP Systems — Understand the fundamental difference between systems designed for transactions and systems designed for analysis, and how that difference shapes every design decision.