MS SQL Lesson 14 – SELECT Queries | Dataplexa

Select Queries

The SELECT statement is the most used statement in SQL — it is how you ask the database a question and get an answer back. Every report, every dashboard, every API response that pulls data from SQL Server starts with SELECT. Understanding how to write SELECT queries correctly, efficiently, and expressively is the single skill that will serve you in every lesson that follows and in every real-world project you work on. This lesson covers the full anatomy of a SELECT statement using the DataplexaStore data inserted in the previous lesson — from the simplest column retrieval to column aliases, expressions, sorting, and removing duplicates.

The Basic SELECT

A SELECT statement has two required clauses: SELECT and FROM. SELECT specifies which columns to return. FROM specifies which table to read from. SQL Server evaluates FROM first — it identifies the data source — then applies the SELECT to determine which columns from that source appear in the result. The order you write clauses in is not the order SQL Server evaluates them in, and understanding this distinction matters when you write more complex queries. The most important habit to form immediately is avoiding SELECT * in any query that will run in production. Selecting every column forces SQL Server to read every column from every page even when your application only needs two or three values — wasting I/O, memory, and network bandwidth. Name the columns you actually need.

-- Basic SELECT — specify only the columns you need
USE DataplexaStore;

SELECT
    UserID,
    FullName,
    City,
    Country
FROM Users;

-- SELECT * retrieves every column — acceptable for exploration, never for production
SELECT * FROM Users;
-- Named columns:
UserID FullName City Country
────── ────────────── ──────────── ────────────
1 Alice Johnson New York USA
2 Bob Williams London UK
3 Clara Davis Toronto Canada
4 David Kim Seoul South Korea
5 Eva Martinez Madrid Spain
6 Frank Chen Shanghai China
7 Grace Patel Mumbai India
8 Henry Osei Accra Ghana
9 Isla Nguyen Ho Chi Minh Vietnam
10 James Wilson Sydney Australia
  • SQL Server returns rows in no guaranteed order when no ORDER BY is specified — the order you see here is the clustered index order (by UserID) but this is an implementation detail, not a contract; always use ORDER BY when order matters
  • SELECT * is useful in SSMS when exploring an unfamiliar table — in stored procedures, views, and application queries it is a maintenance liability because adding a column to the table silently changes what the query returns
  • The FROM clause is evaluated before SELECT — SQL Server first identifies the full set of rows in Users, then picks the four named columns from each row

Column Aliases

A column alias renames a column in the result set without changing anything in the underlying table. Aliases make output readable for end users, give computed expressions a meaningful name, and are required when a SELECT expression has no inherent column name — such as a calculation or a function call. The standard syntax is column AS alias. The AS keyword is optional but always include it — omitting it makes the alias look like a second column name and is a common source of confusion for developers reading the query later. Aliases defined in SELECT cannot be referenced in a WHERE clause in the same query because WHERE is evaluated before SELECT in SQL Server's logical processing order — but they can be used in ORDER BY.

-- Column aliases — rename columns and give expressions meaningful names

SELECT
    ProductID                                   AS id,
    ProductName                                 AS product,
    Category                                    AS category,
    Price                                       AS unit_price,
    StockQty                                    AS stock,
    Price * StockQty                            AS inventory_value,  -- computed expression
    CASE
        WHEN StockQty = 0  THEN 'Out of Stock'
        WHEN StockQty < 15 THEN 'Low Stock'
        ELSE 'In Stock'
    END                                         AS stock_status      -- CASE expression needs alias
FROM Products
ORDER BY inventory_value DESC;                  -- ORDER BY can reference aliases
id product category unit_price stock inventory_value stock_status
─── ──────────────────── ─────────── ────────── ───── ─────────────── ────────────
6 Standing Desk Home Office 349.99 8 2799.92 Low Stock
10 Monitor 27 inch Electronics 399.99 10 3999.90 Low Stock
7 Ergonomic Chair Home Office 249.99 12 2999.88 Low Stock
2 Mechanical Keyboard Electronics 89.99 15 1349.85 In Stock
4 Webcam HD Electronics 79.99 18 1439.82 In Stock
3 USB-C Hub Electronics 49.99 30 1499.70 In Stock
5 Desk Lamp Home Office 34.99 55 1924.45 In Stock
1 Wireless Mouse Electronics 29.99 42 1259.58 In Stock
9 Ballpoint Pen Set Stationery 8.99 200 1798.00 In Stock
8 Notebook A5 Stationery 12.99 120 1558.80 In Stock
  • inventory_value is a computed expression — Price * StockQty has no column name of its own so the alias gives it one; without the alias SQL Server would label it with the expression text or a blank header
  • The CASE expression evaluates each row's StockQty against threshold values and returns a descriptive label — a powerful pattern for transforming raw numeric values into business-readable categories
  • ORDER BY inventory_value DESC uses the alias defined in SELECT — this works because ORDER BY is evaluated after SELECT in SQL Server's logical processing order

Expressions and Functions in SELECT

The SELECT clause is not limited to raw column values — it can contain any valid T-SQL expression including arithmetic, string functions, date functions, and conversion functions. These expressions are evaluated row by row and the result appears as a column in the output. They never modify the underlying data — a SELECT expression that calculates a discounted price does not change the Price column in the Products table; it only computes a new value in the result set. This makes SELECT expressions safe for exploration and reporting without any risk of altering stored data.

-- Common expressions and functions used in SELECT clauses

SELECT
    FullName,
    Email,
    -- String functions
    UPPER(FullName)                             AS name_uppercase,
    LEN(Email)                                  AS email_length,
    LEFT(Email, CHARINDEX('@', Email) - 1)      AS email_username,  -- text before @

    -- Date functions
    JoinDate,
    DATEDIFF(DAY, JoinDate, GETDATE())          AS days_since_joined,
    YEAR(JoinDate)                              AS join_year,

    -- Conversion
    CAST(UserID AS NVARCHAR(10)) + ' - ' + FullName AS user_label
FROM Users
ORDER BY days_since_joined DESC;
FullName Email name_uppercase email_length email_username JoinDate days_since_joined join_year user_label
────────────── ──────────────────────────── ─────────────── ──────────── ────────────── ────────── ───────────────── ───────── ─────────────────
Alice Johnson alice.johnson@example.com ALICE JOHNSON 25 alice.johnson 2024-03-28 365 2024 1 - Alice Johnson
Bob Williams bob.williams@example.com BOB WILLIAMS 24 bob.williams 2024-03-28 365 2024 2 - Bob Williams
Clara Davis clara.davis@example.com CLARA DAVIS 23 clara.davis 2024-03-28 365 2024 3 - Clara Davis
...
  • CHARINDEX('@', Email) returns the position of the @ symbol — subtracting 1 and wrapping in LEFT extracts everything before it, giving the username portion of the email address
  • DATEDIFF(DAY, JoinDate, GETDATE()) calculates the number of days between the join date and today — this value changes every day without any data being updated, because GETDATE() returns the current timestamp at query execution time
  • CAST(UserID AS NVARCHAR(10)) converts the integer UserID to a string so it can be concatenated with the + operator — trying to concatenate an INT directly to a string raises a type mismatch error

Sorting with ORDER BY

Without ORDER BY, SQL Server returns rows in an undefined order — the physical order depends on how data was stored, which indexes were used, and which execution plan was chosen. This order can change between executions of the same query. ORDER BY is the only reliable way to guarantee the sequence of rows in a result set. You can sort by one or more columns, mix ascending (ASC, the default) and descending (DESC) directions, and reference column aliases defined in SELECT. When sorting by multiple columns, SQL Server sorts by the first column, then breaks ties using the second column, and so on down the list.

-- Sorting with ORDER BY — single column, multi-column, and mixed direction

-- Sort products by Category ascending, then Price descending within each category
SELECT
    ProductName,
    Category,
    Price,
    StockQty
FROM Products
ORDER BY
    Category ASC,       -- primary sort — alphabetical by category
    Price    DESC;      -- secondary sort — most expensive first within each category
ProductName Category Price StockQty
──────────────────── ─────────── ────── ────────
Monitor 27 inch Electronics 399.99 10
Mechanical Keyboard Electronics 89.99 15
Webcam HD Electronics 79.99 18
USB-C Hub Electronics 49.99 30
Wireless Mouse Electronics 29.99 42
Standing Desk Home Office 349.99 8
Ergonomic Chair Home Office 249.99 12
Desk Lamp Home Office 34.99 55
Notebook A5 Stationery 12.99 120
Ballpoint Pen Set Stationery 8.99 200
  • Within Electronics, products are sorted most expensive first — the secondary sort only applies within groups where the primary sort produces ties (identical Category values)
  • ASC is the default sort direction — writing ORDER BY Category and ORDER BY Category ASC produce identical results; the explicit ASC makes intent clear in multi-column sorts where directions are mixed
  • ORDER BY can reference column position numbers — ORDER BY 2, 3 DESC would sort by the second column then the third — but this is brittle: adding a column changes the positions and silently changes the sort order

Removing Duplicates with DISTINCT

DISTINCT removes duplicate rows from the result set, returning only unique combinations of the selected columns. It applies to the entire row of selected columns, not just the first one. If you select two columns with DISTINCT, a row is considered a duplicate only when both column values are identical to another row. DISTINCT is commonly used when exploring data — finding all unique categories, all countries represented in the customer base, or all distinct status values in the Orders table. It carries a sorting cost because SQL Server must compare all returned rows to identify duplicates, so it should only be used when duplicates genuinely need to be removed rather than as a workaround for a query that is joining incorrectly and producing unintended duplicates.

-- DISTINCT on a single column — unique categories in the Products table
SELECT DISTINCT Category
FROM Products
ORDER BY Category;

-- DISTINCT on multiple columns — unique Country and MembershipTier combinations
SELECT DISTINCT
    Country,
    MembershipTier
FROM Users
ORDER BY Country, MembershipTier;

-- Count of distinct values — how many unique countries have registered users?
SELECT COUNT(DISTINCT Country) AS unique_countries
FROM Users;
-- Distinct categories:
Category
───────────
Electronics
Home Office
Stationery

-- Distinct Country + MembershipTier combinations:
Country MembershipTier
──────────── ──────────────
Australia Standard
Canada VIP
China Standard
Ghana Standard
India VIP
South Korea Standard
Spain Premium
UK Premium
USA Standard
Vietnam Premium

-- Unique countries:
unique_countries
────────────────
10
  • Three distinct categories from 10 products — DISTINCT collapsed the five Electronics rows, three Home Office rows, and two Stationery rows into one row each
  • Every Country and MembershipTier combination is unique in this dataset — DISTINCT made no change here, but it correctly confirms there are no two users from the same country with the same tier
  • COUNT(DISTINCT column) counts unique non-NULL values — one of the most frequently used aggregation patterns for data exploration and reporting

TOP — Limiting Result Rows

TOP limits the number of rows returned by a query. It is useful for finding the highest or lowest values in a table — the most expensive products, the most recent orders, the top spending customers. TOP is evaluated after ORDER BY, so it reliably returns the first N rows of the sorted result rather than N arbitrary rows. TOP can take an absolute row count or a percentage with TOP (n) PERCENT. The WITH TIES option includes additional rows that share the same value as the last row returned — useful when you want all rows that tie for the final position rather than an arbitrary cut-off in the middle of a tie group.

-- TOP N — the 3 most expensive products
SELECT TOP 3
    ProductName,
    Category,
    Price
FROM Products
ORDER BY Price DESC;    -- ORDER BY is evaluated first — TOP takes from the sorted result

-- TOP WITH TIES — include all rows that tie for the last position
SELECT TOP 3 WITH TIES
    ProductName,
    Category,
    Price
FROM Products
ORDER BY Price DESC;

-- TOP PERCENT — the most expensive 30% of products
SELECT TOP 30 PERCENT
    ProductName,
    Price
FROM Products
ORDER BY Price DESC;
-- TOP 3:
ProductName Category Price
─────────────── ─────────── ──────
Monitor 27 inch Electronics 399.99
Standing Desk Home Office 349.99
Ergonomic Chair Home Office 249.99

-- TOP 3 WITH TIES (same result — no ties at position 3):
ProductName Category Price
─────────────── ─────────── ──────
Monitor 27 inch Electronics 399.99
Standing Desk Home Office 349.99
Ergonomic Chair Home Office 249.99

-- TOP 30 PERCENT (30% of 10 rows = 3 rows):
ProductName Price
─────────────── ──────
Monitor 27 inch 399.99
Standing Desk 349.99
Ergonomic Chair 249.99
  • TOP without ORDER BY returns an arbitrary set of rows — always pair TOP with ORDER BY to get a meaningful result
  • TOP PERCENT rounds up — 30% of 10 rows is 3.0, which rounds to 3; if the table had 11 rows, 30% would be 3.3 and SQL Server would return 4 rows
  • WITH TIES matters when multiple rows share the same value at the cut-off point — for example if two products were both priced at 249.99, TOP 3 WITH TIES would return 4 rows to include both ties at position 3

Summary Table

Clause / Feature Purpose Key Point
SELECT col FROM table Retrieve specific columns Never use SELECT * in production
column AS alias Rename column in result set Required for expressions — usable in ORDER BY
Expressions & functions Compute values row by row Never modifies stored data
ORDER BY col DIR Guarantee row sequence Only reliable way to control output order
DISTINCT Remove duplicate rows Applies to full row — not a single column fix
TOP N / PERCENT Limit rows returned Always pair with ORDER BY for meaningful results

Practice Questions

Practice 1. Why should SELECT * be avoided in production queries?



Practice 2. Why can a column alias defined in SELECT not be used in a WHERE clause in the same query?



Practice 3. Write a SELECT query that returns the ProductName and a discounted price column showing each product's Price reduced by 10%, aliased as discounted_price.



Practice 4. What is the difference between TOP 3 and TOP 3 WITH TIES?



Practice 5. DISTINCT is applied to a query selecting Country and MembershipTier. Two users are both from the UK with Premium tier. How many rows do they contribute to the result?



Quiz

Quiz 1. In SQL Server's logical processing order, which clause is evaluated first?






Quiz 2. A query selects ProductName and Category with DISTINCT. Three rows have ProductName = 'Notebook A5' and Category = 'Stationery'. How many rows appear in the result for this combination?






Quiz 3. Which clause must always accompany TOP to produce a meaningful result?






Quiz 4. What does DATEDIFF(DAY, JoinDate, GETDATE()) return?






Quiz 5. A table has 7 rows. A query uses TOP 40 PERCENT with ORDER BY. How many rows are returned?






Next up - WHERE & Filtering - Narrow down exactly which rows SQL Server returns using conditions, comparisons, and logical operators.