Speech AI Lesson 44 – Keyword Spotting | Dataplexa

Keyword Spotting

Keyword spotting is the task of detecting specific words or phrases in continuous audio.

Unlike full speech recognition, keyword spotting focuses only on listening for particular words, often called wake words or trigger words.

This task is critical for always-on, low-power Speech AI systems.

What Is Keyword Spotting?

Keyword spotting answers the question:

“Did the target word appear in this audio stream?”

Examples of keywords:

“Hey Siri”
“OK Google”
“Alexa”
Custom command words

The system ignores all other speech.

Why Keyword Spotting Is Different

Keyword spotting systems must be:

Always listening
Extremely fast
Low power
Highly precise

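In always-on devices, false activations are usually treated as more harmful than missed detections: a missed keyword can simply be repeated, while an unintended activation makes the device respond when nobody asked.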

How Keyword Spotting Works (High-Level)

Most systems follow this pipeline:

Audio → Feature Extraction → Keyword Model → Decision

The model outputs a probability that the keyword is present.
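The four stages above can be sketched as plain functions. This is a minimal illustration, not a real detector: the function names, the 16 kHz sample rate, the frame sizes, and the log-energy feature are assumptions, and `keyword_model` is a placeholder that merely squashes a number into the range 0 to 1 instead of running a trained network.

```python
import numpy as np

def extract_features(audio, frame_len=400, hop=160):
    """Split audio into overlapping frames and compute log energy per frame."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def keyword_model(features):
    """Placeholder for a trained model: maps mean log energy into (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-np.mean(features))))

def decide(probability, threshold=0.8):
    """Final stage: compare the model output with a threshold."""
    return probability > threshold

def spot_keyword(audio, threshold=0.8):
    """Audio -> Feature Extraction -> Keyword Model -> Decision."""
    features = extract_features(audio)
    probability = keyword_model(features)
    return probability, decide(probability, threshold)

# One second of silence at an assumed 16 kHz sample rate
audio = np.zeros(16000)
probability, detected = spot_keyword(audio)
print(probability, detected)
```

Silence gives a probability close to zero, so no detection is reported. In a real system only `keyword_model` would change substantially; the surrounding structure stays the same.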

Feature Extraction for Keyword Spotting

Keyword spotting does not need full linguistic information.

Instead, it relies on short-term acoustic patterns.

Common features:

MFCCs
Log-mel spectrograms
Energy features

Why This Code Exists

This code stands in for feature extraction. Instead of processing real audio, it generates a random array with the shape a feature extractor would produce.


import numpy as np

# Simulated feature window (time frames × features)
features = np.random.rand(30, 40)

print(features.shape)

What happens inside:

In a real system, audio is split into short overlapping frames
Each frame is summarized as a feature vector (here, 30 frames of 40 features each)

(30, 40)
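To show where an array of that shape could come from, here is a minimal NumPy-only sketch of log-mel feature extraction. The 16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, and 40 mel bands are assumed values chosen for illustration, and the input is random noise rather than speech. Production code would normally use an audio library instead of a hand-written filterbank.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel(audio, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_mels=40):
    """Frame the signal, take the power spectrum, apply mel filters, take the log."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)

rng = np.random.default_rng(0)
audio = rng.standard_normal(5040)   # about 0.3 s of simulated audio
features = log_mel(audio)
print(features.shape)               # (30, 40)
```

With these settings, 5040 samples yield exactly 30 frames, which matches the simulated shape used above.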

Sliding Window Detection

Keyword spotting systems scan audio continuously.

They use a sliding window to check overlapping segments.

Why This Code Exists

This example simulates scanning audio segments for keyword probability.


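import numpy as np

def keyword_probability(features):
    # Placeholder for a trained model: returns a random score in [0, 1)
    # and ignores the features entirely.
    return np.random.rand()

# Five simulated feature windows (30 time frames × 40 features each)
windows = [np.random.rand(30, 40) for _ in range(5)]

scores = [keyword_probability(w) for w in windows]
print(scores)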

What happens:

Each window is evaluated independently
A probability score is produced

Example output, rounded (the scores are random, so actual values differ on every run):

[0.12, 0.08, 0.91, 0.15, 0.05]

How to read this:

Higher values mean higher confidence that the keyword is present.
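The example above creates five unrelated windows. In a continuous stream, the windows are overlapping slices of one long feature array. The sketch below shows that slicing. The window length of 30 frames and hop of 10 frames are assumed values, and `sliding_windows` is a hypothetical helper name.

```python
import numpy as np

def sliding_windows(feature_stream, window_len=30, hop=10):
    """Yield (start_frame, window) pairs of overlapping slices."""
    for start in range(0, len(feature_stream) - window_len + 1, hop):
        yield start, feature_stream[start:start + window_len]

rng = np.random.default_rng(0)
stream = rng.random((100, 40))   # simulated stream: 100 frames × 40 features

starts = [start for start, _ in sliding_windows(stream)]
shapes = {window.shape for _, window in sliding_windows(stream)}

print(starts)   # [0, 10, 20, 30, 40, 50, 60, 70]
print(shapes)   # {(30, 40)}
```

Because the hop is smaller than the window length, neighbouring windows share 20 frames, so a keyword that straddles a window boundary still falls fully inside some window.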

Threshold-Based Decision

A detection occurs only if the probability exceeds a threshold.

Why This Code Exists

This logic limits false activations by ignoring every window whose score does not clear the threshold.


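# Example scores from the sliding-window step
scores = [0.12, 0.08, 0.91, 0.15, 0.05]
threshold = 0.8

# Indices of windows whose score exceeds the threshold
detections = [i for i, s in enumerate(scores) if s > threshold]
print(detections)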

What happens:

Low-confidence windows are ignored
Only strong signals trigger activation

[2]
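When windows overlap, a single spoken keyword can push several adjacent windows over the threshold. One possible way to avoid triggering more than once, shown here as an assumption and not as the method any particular product uses, is to merge consecutive above-threshold windows into one event. `group_detections` is a hypothetical helper.

```python
def group_detections(scores, threshold=0.8):
    """Merge runs of consecutive above-threshold windows into (first, last) index pairs."""
    events = []
    run_start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = i                     # a new run begins
        elif run_start is not None:
            events.append((run_start, i - 1))     # the run ended at the previous window
            run_start = None
    if run_start is not None:
        events.append((run_start, len(scores) - 1))
    return events

scores = [0.12, 0.85, 0.91, 0.15, 0.05, 0.88]
events = group_detections(scores)
print(events)   # [(1, 2), (5, 5)]
```

Three windows exceed the threshold here, but they form only two events, so the device would activate twice.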

Small Models, Big Impact

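Keyword spotting models are usually small enough to fit the memory and power budget of an embedded device. Common choices:

Small CNNs
Depthwise separable convolutional networks
Quantized models (for example, 8-bit weights)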

This allows them to run on:

Smart speakers
Wearables
IoT devices
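A quick parameter count shows why depthwise separable layers and quantization keep models small. The layer size (3×3 kernel, 64 input and 64 output channels) is an assumed example, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2D convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1×1 pointwise convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

kernel, c_in, c_out = 3, 64, 64
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)

print(standard)    # 36864
print(separable)   # 4672

# Storage: 4 bytes per float32 weight, 1 byte per int8 weight
standard_float32_bytes = standard * 4
separable_int8_bytes = separable * 1
print(standard_float32_bytes, separable_int8_bytes)   # 147456 4672
```

For this layer, the separable version needs roughly one eighth of the weights, and storing them as 8-bit integers cuts the size by another factor of four.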

False Positives vs False Negatives

Designing thresholds involves trade-offs:

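Lower threshold → more false positives (unintended activations)
Higher threshold → more false negatives (missed activations)

Production systems typically favor precision, accepting some missed activations to keep false activations rare.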
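The trade-off can be made concrete by counting both error types at several thresholds. The scores and labels below are invented for illustration. In practice they would come from running the model on a labelled validation set.

```python
def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) for a given threshold."""
    false_positives = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    false_negatives = sum(1 for s, y in zip(scores, labels) if s <= threshold and y)
    return false_positives, false_negatives

# Invented example: model score per window, and whether the keyword was really present
scores = [0.12, 0.55, 0.91, 0.65, 0.05, 0.75, 0.85, 0.30]
labels = [False, False, True, True, False, False, True, False]

results = {t: count_errors(scores, labels, t) for t in (0.5, 0.8, 0.9)}
for threshold, (fp, fn) in results.items():
    print(threshold, fp, fn)
# 0.5 2 0
# 0.8 0 1
# 0.9 0 2
```

Raising the threshold from 0.5 to 0.8 removes both false positives at the cost of one missed keyword. Raising it further to 0.9 only adds another miss.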

Noise and Robustness

Keyword spotting systems must handle:

Background conversations
TV or music
Different accents

Noise augmentation during training, such as mixing background audio into clean recordings, is a common way to build this robustness.
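One common form of noise augmentation mixes a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below scales the noise so the mixture reaches the requested SNR. The sine wave standing in for speech, the Gaussian noise, and the 10 dB target are all assumptions for illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power / noise power matches snr_db, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in for clean speech
noise = rng.standard_normal(16000)                           # stand-in for background noise

noisy = mix_at_snr(speech, noise, snr_db=10.0)

# Verify the SNR that was actually achieved
achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
print(round(float(achieved), 3))   # 10.0
```

During training, the SNR and the noise source would typically be varied per example so the model sees many acoustic conditions.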

Real-World Applications

Voice assistants
Hands-free control
Emergency keyword detection
Smart home activation

Practice

What task detects specific words from continuous audio?

What technique scans overlapping audio segments?

What value controls activation sensitivity?

Quick Quiz

Keyword spotting often detects:

Wake word
Full sentence
Paragraph

Keyword spotting models must be:

Large
Low power
Slow

What controls false activations?

Threshold
Fonts
Colors

Recap: Keyword spotting detects specific trigger words using lightweight models and threshold-based decisions.

Next up: You’ll learn about Speech Emotion Recognition and how systems infer emotions from voice.
