Database Design Lesson 2 – Why Database Design is Important | Dataplexa

Why Database Design is Important

A database can be built without any formal design process. Tables are created, columns are added as needs arise, and the application is shipped. This approach is common — and it works, for a while. The problems arrive later: a query that should take half a second starts taking forty. A simple update to a customer record requires touching eleven tables. A new business requirement means rebuilding half the schema from scratch. These are not bad luck — they are the predictable consequences of skipping design.

This lesson makes the case for why database design matters by working through the three categories of problems that poor design reliably produces: data anomalies, performance degradation, and unmaintainability.

The Cost of Poor Design

Core Principle

Every structural problem in a database that causes pain in production was a design decision — or the absence of one — made at the beginning. Fixing structural problems after data is live is significantly more expensive than preventing them during design.

The cost of poor design is not theoretical. It shows up as engineering time spent writing workarounds, business decisions made on incorrect data, and system rewrites that could have been avoided entirely. Understanding these costs concretely is what motivates the design discipline this course teaches.

Data Anomalies

A data anomaly is a situation where the same real-world fact is represented inconsistently in the database, or where a routine operation — insert, update, or delete — produces an unintended side effect. There are three classic anomaly types, and all three arise from the same root cause: storing more than one fact per row.

  • Update anomaly — A fact stored in multiple rows must be changed in every row; if one row is missed, the database contains conflicting versions of the same fact. Example: a product's price is stored on every order line item. The price is updated in three rows but missed in two, so the database now shows two conflicting prices for the same product depending on which row you look at.

  • Insert anomaly — A new fact cannot be recorded without also recording another, unrelated fact. Example: a new department cannot be added to the database until at least one employee is hired into it, because the table stores department and employee data together and employee_id is NOT NULL.

  • Delete anomaly — Deleting one fact unintentionally destroys another, unrelated fact. Example: the only employee in a department leaves the company. Deleting their row also deletes the only record of that department's existence, including its name, budget code, and location.
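The delete anomaly is easy to reproduce. The sketch below uses SQLite via Python's standard sqlite3 module; the table and column names (employees, dept_name, dept_budget) are illustrative, not from any real schema — the point is only that one table is holding facts about two entities.

```python
import sqlite3

# A single table that mixes employee facts with department facts --
# the design flaw behind insert and delete anomalies.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        dept_name     TEXT NOT NULL,
        dept_budget   TEXT NOT NULL
    )
""")
con.execute("INSERT INTO employees VALUES (1, 'Ada', 'Research', 'R-100')")

# Delete anomaly: removing the only Research employee also removes
# the only record that the Research department exists at all.
con.execute("DELETE FROM employees WHERE employee_id = 1")
rows = con.execute(
    "SELECT COUNT(*) FROM employees WHERE dept_name = 'Research'"
).fetchone()
print(rows[0])  # 0 -- the department's name and budget code are gone too
```

Note that no query can recover the Research department's budget code afterwards; the fact was never stored anywhere else.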
Real-world consequence — a retailer's pricing data

A retailer stored unit_price directly on each order_items row rather than referencing the products table. When a supplier changed a wholesale price, a developer ran an UPDATE against order_items to reflect it. The UPDATE had a WHERE clause that was slightly too narrow — it updated 847 rows but missed 63 historical rows from a different sales channel. Those 63 rows now showed a price 12% lower than the current price. Reports comparing historical revenue to current margins produced numbers that could not be reconciled. The root cause was not the developer's WHERE clause — it was storing a fact that belongs to the product in the order item row.
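The retailer's failure mode can be sketched in a few lines. The schema, channel names, and prices below are illustrative stand-ins for the scenario described above: unit_price copied onto every order_items row, then updated with a WHERE clause that is slightly too narrow.

```python
import sqlite3

# unit_price is stored on every order_items row instead of being
# referenced from a products table -- the root cause of the anomaly.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE order_items (
        item_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        channel    TEXT NOT NULL,
        unit_price REAL NOT NULL
    )
""")
con.executemany(
    "INSERT INTO order_items (product_id, channel, unit_price) VALUES (?, ?, ?)",
    [(42, 'web', 10.0), (42, 'web', 10.0), (42, 'retail', 10.0)],
)

# Update anomaly: the WHERE clause misses the 'retail' channel rows,
# which silently keep the old price.
con.execute(
    "UPDATE order_items SET unit_price = 12.0 "
    "WHERE product_id = 42 AND channel = 'web'"
)
prices = {p for (p,) in con.execute(
    "SELECT unit_price FROM order_items WHERE product_id = 42"
)}
print(sorted(prices))  # [10.0, 12.0] -- two "current" prices for one product
```

Had the price lived in a products table, the same UPDATE would have touched exactly one row and no WHERE clause could have been "too narrow".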

Performance Degradation

A poorly designed schema does not just produce wrong data — it also produces slow queries. Performance problems from poor design are structurally different from performance problems that can be fixed with an index. When the schema itself is the problem, indexes can help at the margins but cannot fix the fundamental issue.

  • Wide, unnormalised tables — A table with 80 columns storing customer, order, product, and payment data in a single row forces every query to scan more data than it needs. Even a query that only wants the customer's name and email must read rows that contain dozens of irrelevant columns.

  • Unindexed join columns cause full scans — Joins are only fast when the relationship column is indexed. A foreign key constraint declares the relationship, but not every engine creates an index on the referencing column automatically (MySQL's InnoDB does; PostgreSQL does not), so queries that join orders to customers may have to scan every row of one table looking for matches rather than using an index lookup.

  • Duplicate data multiplies write cost — Every INSERT or UPDATE that modifies a fact stored in multiple places must touch multiple rows. At high transaction volumes — thousands of inserts per second — this multiplier becomes a throughput bottleneck.

  • No separation of read and write concerns — A schema designed only for OLTP workloads performs poorly for analytical queries that aggregate millions of rows. A schema not designed with any query pattern in mind performs poorly for everything. Design must consider how data will be queried, not just how it will be stored.
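The full-scan point can be observed directly with SQLite's EXPLAIN QUERY PLAN. The sketch below (illustrative table names, empty tables — the planner's choice does not depend on row counts here) shows the same join query planned before and after an index exists on the join column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")

query = ("SELECT c.name, o.order_id FROM customers c "
         "JOIN orders o ON o.customer_id = c.customer_id "
         "WHERE c.customer_id = 7")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); the detail
# string says SCAN for a full scan and SEARCH ... INDEX for a lookup.
before = " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query))

# Add the index and re-plan the identical query.
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query))

print("SCAN" in before and "idx_orders_customer" not in before)  # True: full scan
print("idx_orders_customer" in after)                            # True: index lookup
```

The query text never changed; only the physical design did. That is the sense in which this class of problem belongs to design rather than to query tuning.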

Unmaintainability

The most expensive long-term consequence of poor design is not wrong data or slow queries — it is a system that becomes progressively harder to change. Business requirements evolve constantly. New product categories are added. Regulations require new data to be captured. A new payment provider needs a different data structure. A well-designed database accommodates these changes with additive modifications — a new table, a new column, a new relationship. A poorly designed database requires restructuring existing tables that are already in production with live data, often with no safe migration path.

Real-world consequence — a loyalty programme retrofit

An e-commerce company launched a loyalty points programme two years after their platform went live. Their customers table had no concept of loyalty tiers — customer type was encoded as a single VARCHAR column called "notes" that contained free-text entries like "VIP" or "wholesale" entered manually by staff. Adding a structured loyalty_tier column required a data migration that parsed thousands of inconsistent free-text notes, wrote custom mapping logic for every variation, and still left 8% of records unclassifiable. The migration took three weeks of engineering time. A loyalty_tier column with a CHECK constraint would have taken ten minutes to add at design time.
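The "ten minutes at design time" alternative looks like this. The tier names below are illustrative; the point is that a CHECK constraint rejects free-text drift at write time, so there is nothing to parse and reclassify years later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE customers (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        loyalty_tier TEXT NOT NULL DEFAULT 'standard'
            CHECK (loyalty_tier IN ('standard', 'vip', 'wholesale'))
    )
""")
con.execute("INSERT INTO customers (name, loyalty_tier) VALUES ('Acme', 'vip')")

# A variant spelling like 'V.I.P.' -- exactly the kind of entry that
# filled the free-text notes column -- is rejected at insert time.
try:
    con.execute(
        "INSERT INTO customers (name, loyalty_tier) VALUES ('Bmart', 'V.I.P.')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True -- the constraint blocked the inconsistent value
```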

What Good Design Prevents

Each problem category maps directly to a design technique this course covers. Understanding the problem is what gives the technique its purpose.

  • Update anomalies — Root cause: the same fact is stored in more than one place. Prevented by normalisation: store each fact exactly once.

  • Insert anomalies — Root cause: unrelated facts are forced into the same row. Prevented by entity separation: one table per real-world entity.

  • Delete anomalies — Root cause: facts about different entities are stored together. Prevented by entity separation and referential integrity.

  • Slow queries — Root cause: the schema does not match query patterns, or indexes are missing. Prevented by physical design: indexing strategy and query-aware structure.

  • Unmaintainability — Root cause: the schema encodes assumptions that change over time. Prevented by logical design: a flexible, normalised relational model.

  • Inconsistent data — Root cause: no constraints enforce valid values. Prevented by integrity constraints: PRIMARY KEY, FOREIGN KEY, CHECK, NOT NULL.
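Entity separation and referential integrity together eliminate the insert and delete anomalies from earlier in this lesson. A minimal sketch (illustrative names; note that SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# One table per real-world entity, linked by a foreign key.
con.execute(
    "CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)"
)
con.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        dept_id     INTEGER NOT NULL REFERENCES departments (dept_id)
    )
""")

# Insert anomaly gone: a department can exist with zero employees.
con.execute("INSERT INTO departments VALUES (1, 'Research')")

# Referential integrity: an employee cannot point at a department
# that does not exist, so orphan rows are impossible.
try:
    con.execute("INSERT INTO employees (name, dept_id) VALUES ('Ada', 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)  # False
```

Deleting the last employee now leaves the department row untouched, so the delete anomaly is gone as well.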

Design as Risk Reduction

Database design is ultimately a risk reduction activity. The risks it manages are concrete: the risk that data becomes inconsistent and business decisions are made on wrong numbers; the risk that performance degrades as data volumes grow; the risk that the schema becomes a liability that slows every new feature. None of these risks are zero even with perfect design — but good design reduces their probability and their severity significantly.

The investment in design is front-loaded. It takes time to model entities correctly, to normalise tables, to define constraints. But that investment is paid back continuously over the lifetime of the system — in queries that run fast, in data that stays consistent, and in a schema that can evolve without being rebuilt from scratch.

Summary

  • Update anomaly — A fact stored in multiple rows becomes inconsistent when one row is updated and others are not.

  • Insert anomaly — A new fact cannot be recorded without also recording an unrelated fact.

  • Delete anomaly — Deleting one fact unintentionally destroys another unrelated fact stored in the same row.

  • Performance degradation — Poor schema structure causes slow queries that indexes alone cannot fix.

  • Unmaintainability — A schema that encodes rigid assumptions cannot accommodate evolving requirements without costly restructuring.

  • Normalisation — Prevents anomalies by storing each fact in exactly one place.

  • Design as risk reduction — A front-loaded investment that pays back continuously through data consistency, query performance, and schema flexibility.

Practice Questions

Practice 1. A product's price is stored on every order_items row. The price changes and a developer updates most but not all rows. What type of anomaly has occurred and what is its root cause?



Practice 2. A company cannot add a new department to the database until at least one employee is assigned to it. What type of anomaly is this and what design decision caused it?



Practice 3. Why can performance problems caused by poor schema design not always be fixed by adding indexes?



Practice 4. Describe a real scenario where a delete anomaly could cause serious business harm.



Practice 5. Why is the cost of database design described as front-loaded, and what does it pay back over time?



Quiz

Quiz 1. Which anomaly type makes it impossible to record a new fact without simultaneously recording an unrelated fact?






Quiz 2. A developer deletes the last employee in a department and the department record disappears from the database entirely. What type of anomaly is this?






Quiz 3. Which design technique directly prevents update anomalies by ensuring each fact is stored in exactly one place?






Quiz 4. A schema stores customer, order, product, and payment data all in one wide table with 80 columns. Queries that only need customer name and email are slow. Why can this not be fully fixed with indexes?






Quiz 5. Which of the following best describes why database design is called a risk reduction activity?






Next up — OLTP vs OLAP Systems — Understand the fundamental difference between systems designed for transactions and systems designed for analysis, and how that difference shapes every design decision.