MS SQL Lesson 9 – SQL Server Data Types | Dataplexa

SQL Server Data Types

Every column in every table has a data type, and that single decision — made once at table creation — affects storage size, query performance, index efficiency, and data accuracy for the entire lifetime of the database. A column defined too broadly wastes space and slows comparisons. A column defined too narrowly causes truncation errors or forces expensive workarounds later. A type mismatch between a column and the value being compared against it silently disables index use, turning a millisecond lookup into a full table scan. This lesson covers every major data type category in SQL Server, explains when to use each one, and shows the real consequences of getting the choice wrong — using the DataplexaStore schema as the working example throughout.

Exact Numeric Types — Integers

Integer types store whole numbers with no decimal component. SQL Server offers four integer types that differ only in their storage size and the range of values they can hold. TINYINT uses 1 byte and holds 0 to 255 — suitable for small lookup codes and ratings. SMALLINT uses 2 bytes and holds −32,768 to 32,767. INT uses 4 bytes and holds approximately −2.1 billion to 2.1 billion — the right default for most primary keys and quantity columns. BIGINT uses 8 bytes and holds up to 9.2 quintillion — reserved for columns that genuinely need that range, such as financial transaction counters at scale. The principle is always to use the smallest type that comfortably covers the expected range — not because storage is expensive today, but because narrower types fit more rows per page, which means more rows fit in the Buffer Pool per megabyte of RAM, which directly improves query speed.

-- Compare the four integer types: storage size and value range

SELECT
    'TINYINT'   AS type_name, 1    AS bytes, 0           AS min_value, 255          AS max_value
UNION ALL SELECT
    'SMALLINT',                2,             -32768,                    32767
UNION ALL SELECT
    'INT',                     4,             -2147483648,               2147483647
UNION ALL SELECT
    'BIGINT',                  8,             -9223372036854775808,      9223372036854775807;

-- Check which integer type each DataplexaStore column uses
SELECT
    TABLE_NAME,
    COLUMN_NAME,
    DATA_TYPE,
    NUMERIC_PRECISION               AS digits
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('int','tinyint','smallint','bigint')
  AND TABLE_NAME IN ('Users','Products','Orders','OrderItems','Reviews')
ORDER BY TABLE_NAME, COLUMN_NAME;

type_name bytes min_value max_value
───────── ───── ──────────────────── ────────────────────
TINYINT 1 0 255
SMALLINT 2 -32768 32767
INT 4 -2147483648 2147483647
BIGINT 8 -9223372036854775808 9223372036854775807

TABLE_NAME COLUMN_NAME DATA_TYPE digits
────────── ──────────── ───────── ──────
OrderItems OrderID int 10
OrderItems OrderItemID int 10
OrderItems ProductID int 10
OrderItems Quantity int 10
Orders OrderID int 10
Orders UserID int 10
Products ProductID int 10
Products StockQty int 10
Reviews ProductID int 10
Reviews Rating tinyint 3
Reviews ReviewID int 10
Reviews UserID int 10
Users UserID int 10

Rating uses TINYINT because valid values are 1–5 — there is no reason to allocate 4 bytes for a column that will never exceed the value 5
All primary and foreign key columns use INT — they will comfortably hold millions of rows before approaching the 2.1 billion limit, and INT aligns naturally with IDENTITY(1,1)
If a primary key column ever approached the INT limit you would ALTER it to BIGINT — a painful migration on a large table, which is why starting with INT rather than TINYINT for keys is the pragmatic choice even when early row counts are small

Exact Numeric Types — Decimal and Money

For numbers with a fractional component where exact precision is required — prices, totals, tax amounts, measurements — SQL Server provides DECIMAL (also called NUMERIC, which is identical). DECIMAL(p, s) takes two parameters: precision (total number of significant digits) and scale (digits after the decimal point). DECIMAL(10, 2) stores up to 10 total digits with exactly 2 after the decimal point, covering values up to 99,999,999.99 — which is appropriate for currency in most retail systems. The storage cost ranges from 5 to 17 bytes depending on precision. SQL Server also offers MONEY (8 bytes, 4 decimal places) and SMALLMONEY (4 bytes), but DECIMAL is preferred in modern schemas because its precision and scale are explicit and portable across database platforms. Never use FLOAT or REAL for financial values — these are approximate types that introduce rounding errors that compound unpredictably over calculations.

-- Demonstrate why FLOAT is wrong for money and DECIMAL is right

DECLARE @float_price   FLOAT   = 29.99;
DECLARE @decimal_price DECIMAL(10,2) = 29.99;

-- Multiply by 3 — a simple calculation any e-commerce system does constantly
SELECT
    @float_price   * 3  AS float_result,    -- floating point arithmetic introduces error
    @decimal_price * 3  AS decimal_result;  -- exact arithmetic preserves every cent

-- Show DECIMAL storage sizes by precision
SELECT
    'DECIMAL(5,2)'   AS declaration, 5  AS bytes_used UNION ALL
SELECT 'DECIMAL(9,2)',                5              UNION ALL
SELECT 'DECIMAL(10,2)',               9              UNION ALL
SELECT 'DECIMAL(19,2)',               9              UNION ALL
SELECT 'DECIMAL(28,2)',               13             UNION ALL
SELECT 'DECIMAL(38,2)',               17;

float_result decimal_result
──────────── ──────────────
89.97000...4 89.97

declaration bytes_used
──────────── ──────────
DECIMAL(5,2) 5
DECIMAL(9,2) 5
DECIMAL(10,2) 9
DECIMAL(19,2) 9
DECIMAL(28,2) 13
DECIMAL(38,2) 17

The FLOAT result is not exactly 89.97 — it has a tiny rounding error from binary floating-point representation; in a system processing thousands of transactions this error accumulates and produces incorrect totals
DECIMAL(10,2) jumps from 5 bytes to 9 bytes — if your values will never exceed 99,999.99 you can use DECIMAL(7,2) and stay in the 5-byte tier, saving 4 bytes per row across potentially millions of order records
DataplexaStore uses DECIMAL(10,2) for Price and TotalAmount — covering values up to $99,999,999.99, which is appropriate for a retail platform without being wastefully large

String Types — CHAR, VARCHAR, NCHAR, NVARCHAR

SQL Server has four main string types and the differences between them matter significantly. CHAR(n) and NCHAR(n) are fixed-length — every row uses exactly n bytes (or 2n bytes for the Unicode variants) regardless of how much text is actually stored; short values are padded with spaces. VARCHAR(n) and NVARCHAR(n) are variable-length — only the actual characters plus a 2-byte overhead are stored. The N prefix indicates Unicode storage using UTF-16 encoding, which supports every human language. Use VARCHAR for columns that will only ever contain ASCII text such as internal codes, URLs, or English-only identifiers. Use NVARCHAR for any column that might hold names, addresses, descriptions, or user-generated content from anywhere in the world. The special size MAX — as in NVARCHAR(MAX) — allows up to 2 GB of text per value, but MAX columns cannot be indexed and should be reserved for genuinely large text like product descriptions or article bodies.

-- Compare fixed vs variable length string storage behaviour

DECLARE @char_val    CHAR(20)     = 'SQL';      -- always 20 bytes — padded with 17 spaces
DECLARE @varchar_val VARCHAR(20)  = 'SQL';      -- only 5 bytes (3 chars + 2 overhead)
DECLARE @nvarchar_val NVARCHAR(20) = N'SQL';    -- only 8 bytes (3 chars × 2 bytes + 2 overhead)
DECLARE @unicode_val  NVARCHAR(50) = N'مرحبا';  -- Arabic — only works in NVARCHAR, not VARCHAR

SELECT
    DATALENGTH(@char_val)     AS char_bytes,      -- 20 regardless of content
    DATALENGTH(@varchar_val)  AS varchar_bytes,   -- 3 + 2 overhead = 5
    DATALENGTH(@nvarchar_val) AS nvarchar_bytes,  -- (3 × 2) + 2 = 8
    DATALENGTH(@unicode_val)  AS arabic_bytes;    -- 5 chars × 2 bytes = 10

char_bytes varchar_bytes nvarchar_bytes arabic_bytes
────────── ───────────── ────────────── ────────────
20 3 6 10

CHAR wastes 17 bytes storing the word 'SQL' in a CHAR(20) column — for a table with 10 million rows that is 170 MB of wasted space storing nothing but spaces
VARCHAR and NVARCHAR only store what is there — the right default for any text column where values vary in length
The N prefix on string literals (N'مرحبا') is required when assigning Unicode text to an NVARCHAR variable or column — without it SQL Server treats the literal as VARCHAR and may corrupt characters outside the ASCII range

Date and Time Types

SQL Server provides six date and time types, each suited to a different need. DATE stores only a calendar date (3 bytes) — no time component, no timezone. It is the right choice for birthdays, hire dates, and any column where time of day is irrelevant. DATETIME2(n) stores both date and time with up to 100-nanosecond precision, uses 6–8 bytes depending on the fractional seconds precision specified, and has a range from year 0001 to year 9999 — it is the correct replacement for the older DATETIME type in all new schemas. DATETIMEOFFSET includes a timezone offset and is essential for applications serving users across multiple time zones. TIME(n) stores only the time of day. Never store dates as VARCHAR strings — string dates cannot be compared with date arithmetic, cannot be range-scanned efficiently by an index, and sort alphabetically rather than chronologically, producing silently wrong results.

-- Compare date and time types: storage, range, and precision

SELECT
    'DATE'              AS type_name, 3  AS bytes, '0001-01-01'             AS min_value,
    '9999-12-31'        AS max_value,  'date only'                          AS notes
UNION ALL SELECT
    'TIME(7)',                         5,  '00:00:00.0000000',
    '23:59:59.9999999',                    '100ns precision'
UNION ALL SELECT
    'DATETIME2(7)',                    8,  '0001-01-01 00:00:00.0000000',
    '9999-12-31 23:59:59.9999999',         'recommended for all new schemas'
UNION ALL SELECT
    'DATETIMEOFFSET',                  10, '0001-01-01 +00:00',
    '9999-12-31 +14:00',                   'includes timezone offset'
UNION ALL SELECT
    'DATETIME',                        8,  '1753-01-01',
    '9999-12-31',                          'legacy — use DATETIME2 instead';

-- Demonstrate why storing dates as strings is dangerous
DECLARE @date_string VARCHAR(10) = '2024-13-01'; -- month 13 does not exist
DECLARE @date_proper DATE;

-- This silently stores invalid data — no error raised
SET @date_string = '2024-13-01';

-- This raises an error immediately — SQL Server validates the date
BEGIN TRY
    SET @date_proper = CAST('2024-13-01' AS DATE);
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS validation_error;
END CATCH

type_name bytes min_value max_value notes
─────────────── ───── ───────────────────────────── ───────────────────────────── ─────────────────────────────
DATE 3 0001-01-01 9999-12-31 date only
TIME(7) 5 00:00:00.0000000 23:59:59.9999999 100ns precision
DATETIME2(7) 8 0001-01-01 00:00:00.0000000 9999-12-31 23:59:59.9999999 recommended for all new schemas
DATETIMEOFFSET 10 0001-01-01 +00:00 9999-12-31 +14:00 includes timezone offset
DATETIME 8 1753-01-01 9999-12-31 legacy — use DATETIME2 instead

validation_error
───────────────────────────────────────────────────────
Conversion failed when converting date and/or time from character string.

DATE uses only 3 bytes vs DATETIME2's 8 — for a column like JoinDate in the Users table where time of day is irrelevant, DATE saves 5 bytes per row and makes the column's intent immediately clear to every developer who reads the schema
The string '2024-13-01' is stored silently without error — it is just a string — but the moment anyone tries to use it in a date calculation the query fails; using a DATE column catches this at INSERT time, where it is cheapest to fix
DATETIME is kept for backward compatibility but its range starting in 1753 and its 3ms rounding make it unsuitable for new development — always use DATETIME2

Other Important Types

Beyond the numeric, string, and date categories, three other types appear regularly in real SQL Server schemas. BIT stores a boolean value — 0 or 1 — and is the correct type for any true/false flag such as IsActive, IsVerified, or IsDeleted. SQL Server packs up to 8 BIT columns into a single byte. UNIQUEIDENTIFIER stores a 16-byte GUID, typically generated with NEWID() or NEWSEQUENTIALID(). GUIDs are globally unique across servers and databases, making them useful for distributed systems where rows are created on multiple nodes, but their random nature causes severe index fragmentation when used as clustered index keys — use NEWSEQUENTIALID() rather than NEWID() if you must use a GUID as a primary key. UNIQUEIDENTIFIER columns are 4x larger than INT and should not be used as primary keys without a specific reason. XML and JSON (stored as NVARCHAR with FOR JSON and OPENJSON functions) round out the commonly used types.

-- Demonstrate BIT, UNIQUEIDENTIFIER, and their storage characteristics

-- BIT: correct type for boolean flags
DECLARE @is_active   BIT = 1;       -- 1 = true
DECLARE @is_verified BIT = 0;       -- 0 = false

SELECT
    @is_active              AS is_active,
    @is_verified            AS is_verified,
    DATALENGTH(@is_active)  AS bit_bytes;   -- 1 byte holds up to 8 BIT columns

-- UNIQUEIDENTIFIER: global uniqueness at a storage cost
DECLARE @random_guid     UNIQUEIDENTIFIER = NEWID();           -- random — causes fragmentation
DECLARE @sequential_guid UNIQUEIDENTIFIER = NEWSEQUENTIALID(); -- sequential — safe for PKs

SELECT
    @random_guid                        AS random_guid,
    DATALENGTH(@random_guid)            AS guid_bytes,          -- always 16 bytes
    DATALENGTH(CAST(1 AS INT))          AS int_bytes,           -- 4 bytes
    DATALENGTH(CAST(1 AS BIGINT))       AS bigint_bytes;        -- 8 bytes — still half a GUID

is_active is_verified bit_bytes
───────── ─────────── ─────────
1 0 1

random_guid guid_bytes int_bytes bigint_bytes
──────────────────────────────────── ────────── ───────── ────────────
7B3F2A1C-4D8E-4F2A-9C1B-3E5F7A2D8B4C 16 4 8

A GUID primary key is 16 bytes — 4x larger than INT — meaning every foreign key column pointing to it is also 16 bytes, every index entry is larger, and the Buffer Pool holds fewer rows per page
NEWID() generates random GUIDs that insert in random positions in the clustered index, causing constant page splits and index fragmentation — NEWSEQUENTIALID() generates GUIDs that always increase, avoiding this problem at the cost of being predictable
BIT columns are the only SQL Server type that can share physical storage — 8 BIT columns in a row occupy just 1 byte total, making them extremely efficient for flag-heavy schemas

Type Conversion and Implicit Casting Danger

SQL Server can convert between compatible types automatically in a process called implicit conversion. This looks convenient but has a significant hidden cost: when a comparison involves an implicit conversion on an indexed column, SQL Server cannot use the index and falls back to a full table scan. The classic case is comparing a numeric column to a string literal or passing a VARCHAR value into a parameter declared as NVARCHAR. The fix is always to match the literal or parameter type exactly to the column type, or use an explicit CAST or CONVERT to make the conversion visible and deliberate.

-- Demonstrate implicit conversion disabling an index

-- This uses the index on UserID — types match (INT = INT)
SELECT UserID, FullName
FROM Users
WHERE UserID = 5;                   -- INT literal compared to INT column — index used

-- This causes an implicit conversion — SQL Server converts the column, not the literal
-- Converting the column means SQL Server must evaluate every row before filtering
SELECT UserID, FullName
FROM Users
WHERE UserID = '5';                 -- VARCHAR literal compared to INT column — risky

-- Explicit CAST — makes the conversion visible and intentional
SELECT UserID, FullName
FROM Users
WHERE UserID = CAST('5' AS INT);    -- now the literal is cast, not the column — index used

-- Safe date filtering — always use typed literals or explicit casts
SELECT OrderID, OrderDate, TotalAmount
FROM Orders
WHERE OrderDate >= CAST('2024-01-01' AS DATE)   -- explicit cast keeps the index
  AND OrderDate <  CAST('2025-01-01' AS DATE);

-- WHERE UserID = 5 (index used):
UserID FullName
────── ──────────────
5 Maria Garcia

-- WHERE UserID = '5' (implicit conversion — same result, slower path):
UserID FullName
────── ──────────────
5 Maria Garcia

-- WHERE UserID = CAST('5' AS INT) (explicit — index used):
UserID FullName
────── ──────────────
5 Maria Garcia

-- Date range result:
OrderID OrderDate TotalAmount
─────── ─────────────────────────── ───────────
1 2024-03-15 10:22:00.0000000 1029.98
2 2024-03-18 14:45:00.0000000 89.99
3 2024-03-20 09:11:00.0000000 159.98

All three UserID queries return the same result — but the middle one silently converts every UserID value in the table before comparing, which means a full table scan even if a perfect index exists
Execution plans in SSMS show a yellow warning triangle on operators where implicit conversion has disabled index use — always investigate these warnings before deploying a query to production
The habit of matching literal types to column types costs nothing in development and prevents a category of performance problems that are difficult to diagnose in production under load

Summary Table

Type	Storage	Use For	Avoid When
TINYINT	1 byte	Ratings, status codes (0–255)	Primary keys — range too small
INT	4 bytes	Primary keys, foreign keys, quantities	Billion-row tables — use BIGINT
DECIMAL(p,s)	5–17 bytes	All currency and financial values	Never use FLOAT for money
NVARCHAR(n)	2n + 2 bytes	Names, addresses, user content	Internal codes — use VARCHAR
DATE	3 bytes	Birthdates, hire dates, join dates	When time of day is needed
DATETIME2	6–8 bytes	Timestamps, order dates, log entries	Replace legacy DATETIME with this
BIT	1 byte (8 columns)	Boolean flags — IsActive, IsVerified	Storing more than true/false

Practice Questions

Practice 1. Why should FLOAT never be used for financial values like prices or totals?

Practice 2. What is the difference between VARCHAR and NVARCHAR and when should each be used?

Practice 3. How does an implicit type conversion on an indexed column affect query performance?

Practice 4. Why is NEWSEQUENTIALID() preferred over NEWID() when using a UNIQUEIDENTIFIER as a primary key?

Practice 5. What is the recommended replacement for the legacy DATETIME type in new SQL Server schemas and why?

Quiz

Quiz 1. How many bytes does DECIMAL(10,2) use for storage?

9 bytes
5 bytes
13 bytes
8 bytes

Quiz 2. What does the N prefix on a string literal (e.g. N'hello') tell SQL Server?

Treat the literal as a Unicode NVARCHAR string
It declares a NULL value
It marks the string as NOT NULL
It converts the string to uppercase

Quiz 3. How many BIT columns can SQL Server pack into a single byte of storage?

8
1
4
16

Quiz 4. Which date type should you use for a column that only needs to store a calendar date with no time component?

DATE
DATETIME
DATETIME2
SMALLDATETIME

Quiz 5. A column is defined as INT and a WHERE clause filters it with WHERE UserID = '5'. What problem does this cause?

Implicit conversion disables index use — full table scan instead of index seek
SQL Server raises a type mismatch error
The query returns no rows
No problem — SQL Server handles it automatically

Next up - Creating Databases & Tables - Build real database objects from scratch and understand every option behind the CREATE statement.

← Previous Course Index Next →

MS SQL Server