PostgreSQL Lesson 33 – String Functions | Dataplexa

String Functions

Real data is messy. Names come in inconsistent cases. Emails have trailing spaces. Phone numbers are formatted a dozen different ways. Addresses need to be split apart or combined. PostgreSQL has a rich library of string functions that let you clean, format, search, split, and transform text data entirely within SQL — no application code needed. This lesson covers every string function you will use regularly, all demonstrated on the Dataplexa Store dataset.

Case Conversion

UPPER converts all characters to uppercase. LOWER converts all to lowercase. INITCAP capitalises the first letter of each word — useful for normalising name data that came in with inconsistent casing.

-- Normalise customer names to consistent casing
SELECT first_name,
       UPPER(first_name)   AS upper_name,
       LOWER(first_name)   AS lower_name,
       INITCAP(first_name) AS initcap_name
FROM customers
LIMIT 4;

-- Use LOWER for case-insensitive comparisons
SELECT first_name, email FROM customers
WHERE LOWER(email) = LOWER('Alice.Morgan@Example.COM');

-- Standardise all emails to lowercase on insert/update
UPDATE customers SET email = LOWER(email);
first_name | upper_name | lower_name | initcap_name
------------+------------+------------+--------------
alice      | ALICE      | alice      | Alice
BOB        | BOB        | bob        | Bob
carol      | CAROL      | carol      | Carol

-- Case-insensitive match:
first_name |            email
------------+------------------------
Alice      | alice.morgan@example.com

UPDATE 50

Length and Padding

LENGTH returns the number of characters in a string. LPAD and RPAD pad a string to a target length with a fill character — useful for formatting output, generating codes, or aligning columns. CHAR_LENGTH is an alias for LENGTH that counts Unicode characters correctly.

-- Check email length — useful for data validation
SELECT email, LENGTH(email) AS email_length
FROM customers
ORDER BY email_length DESC
LIMIT 5;

-- Pad order ID to 8 characters with leading zeros
SELECT id,
       LPAD(id::TEXT, 8, '0') AS formatted_order_id
FROM orders
LIMIT 4;

-- Right-pad product names to 20 characters for aligned output
SELECT RPAD(name, 20, '.') AS padded_name, price
FROM products
LIMIT 4;
-- Email lengths:
              email            | email_length
+------------------------+--------------
olivia.jones@example.com |           24

-- Padded order IDs:
id  | formatted_order_id
-----+--------------------
 101 | 00000101
 102 | 00000102

-- Right-padded names:
     padded_name         | price
---------------------+--------
Wireless Mouse...... |  29.99
Standing Desk....... | 349.99

Trimming Whitespace

Data imported from spreadsheets or entered by users often has hidden leading or trailing spaces. TRIM removes both ends, LTRIM removes only the left (start), and RTRIM removes only the right (end). You can also trim specific characters instead of spaces.

-- Remove surrounding whitespace from imported data
SELECT TRIM('  hello world  ')   AS trimmed,
       LTRIM('  hello world  ')  AS ltrimmed,
       RTRIM('  hello world  ')  AS rtrimmed;

-- Trim specific characters (remove trailing dots and spaces)
SELECT TRIM(BOTH '. ' FROM 'Standing Desk...') AS cleaned;

-- Clean up phone numbers with leading/trailing spaces in bulk
UPDATE customers
SET phone = TRIM(phone)
WHERE phone != TRIM(phone);
    trimmed    |     ltrimmed     |    rtrimmed
-------------+-----------------+----------------
hello world | hello world     |  hello world

   cleaned
--------------
Standing Desk

UPDATE 8

Extracting Parts of a String

LEFT and RIGHT extract a fixed number of characters from the start or end. SUBSTRING extracts a portion starting at a position for a given length. SPLIT_PART splits a string by a delimiter and returns the nth piece — ideal for parsing email domains, SKU components, or CSV-style fields stored in a column.

-- Extract first 3 characters of category as a code
SELECT name, LEFT(category, 3) AS cat_code FROM products LIMIT 4;

-- Extract last 3 characters
SELECT name, RIGHT(name, 3) AS name_end FROM products LIMIT 4;

-- SUBSTRING: extract characters starting at position 1 for 5 characters
SELECT SUBSTRING('Wireless Mouse' FROM 1 FOR 8) AS extracted;  -- 'Wireless'

-- SPLIT_PART: extract domain from email
SELECT email,
       SPLIT_PART(email, '@', 1) AS username,
       SPLIT_PART(email, '@', 2) AS domain
FROM customers
LIMIT 5;

-- Extract state code from a combined 'city, ST' format
SELECT SPLIT_PART('New York, NY', ', ', 2) AS state_code;
-- LEFT category code:
      name       | cat_code
----------------+----------
Wireless Mouse | Ele
Standing Desk  | Fur

-- SPLIT_PART on email:
              email            |  username    |    domain
+------------------------+-------------+-------------
alice.morgan@example.com | alice.morgan | example.com
bob.smith@example.com    | bob.smith   | example.com

Searching and Replacing

POSITION finds the position of a substring within a string (returns 0 if not found). STRPOS is an alias with different argument order. REPLACE substitutes every occurrence of a target string with a replacement. REGEXP_REPLACE does the same with a regular expression pattern.

-- Find position of '@' in email
SELECT email,
       POSITION('@' IN email) AS at_position
FROM customers
LIMIT 3;

-- STRPOS: same as POSITION but different syntax
SELECT STRPOS('alice.morgan@example.com', '@') AS at_position;  -- returns 13

-- Replace 'Desk' with 'Workstation' in product names
SELECT name,
       REPLACE(name, 'Desk', 'Workstation') AS updated_name
FROM products
WHERE name ILIKE '%desk%';

-- REGEXP_REPLACE: remove all non-digit characters from phone numbers
SELECT phone,
       REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS digits_only
FROM customers
WHERE phone IS NOT NULL
LIMIT 4;
-- POSITION:
            email           | at_position
+------------------------+-------------
alice.morgan@example.com |          13

-- REPLACE:
     name      |     updated_name
--------------+--------------------
Standing Desk | Standing Workstation

-- Digits only:
    phone      | digits_only
--------------+------------
(555) 123-4567 | 5551234567

Concatenation

The || operator joins strings together. CONCAT joins two or more values and silently ignores NULLs (unlike || which returns NULL if any part is NULL). CONCAT_WS joins with a separator and also ignores NULLs.

-- || operator — returns NULL if any part is NULL
SELECT first_name || ' ' || last_name AS full_name FROM customers LIMIT 3;

-- CONCAT — ignores NULLs, never returns NULL
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM customers LIMIT 3;

-- CONCAT_WS — separator is first argument, ignores NULLs
SELECT CONCAT_WS(', ', last_name, first_name) AS name_formal FROM customers LIMIT 3;

-- Build a mailing label — CONCAT_WS skips NULL phone gracefully
SELECT CONCAT_WS(' | ', first_name, last_name, email, phone) AS contact_line
FROM customers
LIMIT 3;
-- || full name:
   full_name
--------------
Alice Morgan
Bob Smith

-- CONCAT_WS formal:
  name_formal
--------------
Morgan, Alice
Smith, Bob

-- Contact line (phone skipped if NULL):
Alice | Morgan | alice.morgan@example.com
Bob | Smith | bob.smith@example.com | (555) 123-4567

String Functions Quick Reference

Function What It Does Example
UPPER / LOWER Change case UPPER('hello') → 'HELLO'
INITCAP Capitalise each word INITCAP('hello world') → 'Hello World'
LENGTH Number of characters LENGTH('hello') → 5
TRIM / LTRIM / RTRIM Remove whitespace or characters TRIM(' hi ') → 'hi'
LEFT / RIGHT Extract n chars from start or end LEFT('hello', 3) → 'hel'
SUBSTRING Extract by position and length SUBSTRING('hello' FROM 2 FOR 3) → 'ell'
SPLIT_PART Split by delimiter, return nth part SPLIT_PART('a@b.com','@',2) → 'b.com'
REPLACE Replace all occurrences REPLACE('foo bar','bar','baz') → 'foo baz'
POSITION Find position of substring POSITION('@' IN 'a@b.com') → 2
CONCAT_WS Join with separator, skip NULLs CONCAT_WS(', ','a',NULL,'b') → 'a, b'
REGEXP_REPLACE Replace using regex pattern REGEXP_REPLACE(phone,'[^0-9]','','g')

🧪 Practice Questions

Answer based on what you learned in this lesson.

1. Which function splits a string by a delimiter and returns the nth piece — perfect for extracting the domain from an email address?




2. Which concatenation function joins values with a separator and automatically skips NULL values?




3. Which function capitalises the first letter of every word in a string?




4. Write the expression to strip all non-digit characters from a phone column using regex.




5. Write the expression to pad an order ID to 8 characters with leading zeros — for example turning 101 into 00000101.



🎯 Quiz — Test Your Understanding

Q1. What is the key difference between CONCAT and the || operator when one of the values is NULL?







Q2. Which expression correctly extracts the domain part from an email like alice@example.com?







Q3. Why is TRIM important when working with data imported from spreadsheets?







Q4. A user searches for an email but may have typed it in any case. How do you make the WHERE clause case-insensitive?







Q5. POSITION('@' IN email) returns what value if '@' does not exist in the string?






Next up: Date and time functions — working with timestamps, intervals, formatting dates, and calculating durations.