Speech AI Course
Noise Reduction
Until now, we assumed speech audio is clean and clear. In real life, that is almost never true.
Noise is one of the biggest challenges in Speech AI. If noise is not handled correctly, even the best models fail.
This lesson explains why noise exists, how it affects Speech AI, and how we reduce it in practice.
What Is Noise in Speech?
Noise is any unwanted sound that interferes with the target speech signal.
Examples of common noise sources include:
- Background conversations
- Traffic and street sounds
- Fan and AC noise
- Keyboard typing
- Microphone static
Speech AI models do not naturally ignore noise the way humans do.
Why Noise Hurts Speech AI Models
Noise directly alters the audio signal used for feature extraction.
As a result:
- Important speech features get masked
- Phoneme boundaries become unclear
- Word error rate increases
- Model confidence drops
This is why noise reduction is usually applied before feature extraction.
Types of Noise
Stationary Noise
Stationary noise remains relatively constant over time.
Examples:
- Fan noise
- Electrical hum
- Air conditioner sound
These noises are easier to remove.
Non-Stationary Noise
Non-stationary noise changes over time.
Examples:
- People talking
- Traffic sounds
- Sudden background events
These are much harder to suppress.
Signal-to-Noise Ratio (SNR)
Signal-to-Noise Ratio (SNR) measures how strong the speech signal is compared to noise.
Higher SNR means cleaner audio. Lower SNR means noisy audio.
Most Speech AI systems degrade sharply as SNR falls. At 0 dB, speech and noise have equal power, and recognition accuracy typically collapses.
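SNR is usually expressed in decibels: 10 times the base-10 logarithm of the ratio of signal power to noise power. Here is a minimal sketch of computing it, assuming we have separate access to the speech and noise signals (in a real recording, SNR must be estimated from the mixture):

```python
import numpy as np

def snr_db(signal, noise):
    # SNR in decibels: ratio of average signal power to average noise power
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    return 10 * np.log10(signal_power / noise_power)

# Synthetic example: a 440 Hz tone standing in for speech,
# mixed with white noise at two different levels
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)

quiet = snr_db(speech, 0.05 * rng.standard_normal(sr))  # high SNR: cleaner audio
loud = snr_db(speech, 0.5 * rng.standard_normal(sr))    # low SNR: noisier audio
```

Increasing the noise level by a factor of 10 lowers the SNR by 20 dB, which is why small changes in recording conditions can have a large impact.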
Basic Noise Reduction Strategy
Most noise reduction techniques follow this logic:
- Estimate the noise
- Separate noise from speech
- Suppress noise components
- Reconstruct clean speech
This process happens in either the time domain or frequency domain.
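A crude time-domain illustration of this estimate-then-suppress logic is a noise gate, which zeroes frames whose energy stays near an estimated noise floor. This is only a sketch; real systems apply much smoother suppression to avoid choppy output:

```python
import numpy as np

def noise_gate(audio, frame_len=400, threshold_ratio=2.0):
    # Chop the signal into fixed-length frames
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Estimate the noise floor from the quietest 10% of frames
    energies = np.mean(frames ** 2, axis=1)
    noise_floor = np.percentile(energies, 10)
    # Suppress frames whose energy is close to the noise floor
    keep = energies > threshold_ratio * noise_floor
    return (frames * keep[:, None]).ravel()
```

The frequency-domain counterpart of the same idea is spectral subtraction, covered next.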
Noise Reduction Using Spectral Subtraction
Spectral subtraction is one of the earliest and most intuitive noise reduction techniques.
It works by estimating the noise spectrum from speech-free (noise-only) portions of the audio and subtracting it from the signal spectrum.
import numpy as np
import librosa

# Load the noisy recording at a 16 kHz sample rate
audio, sr = librosa.load("noisy.wav", sr=16000)

# Move to the frequency domain and split magnitude from phase
stft = librosa.stft(audio)
magnitude, phase = np.abs(stft), np.angle(stft)

# Estimate noise from the first 10 frames, assuming they contain no speech
noise_estimate = np.mean(magnitude[:, :10], axis=1, keepdims=True)

# Subtract the estimate, clipping negative magnitudes to zero
clean_magnitude = np.maximum(magnitude - noise_estimate, 0)

# Reattach the original phase and return to the time domain
clean_stft = clean_magnitude * np.exp(1j * phase)
clean_audio = librosa.istft(clean_stft)
This method is simple but can introduce artifacts if noise estimation is inaccurate.
Filtering-Based Noise Reduction
Another common approach is filtering.
Filters remove frequency ranges that mostly contain noise.
For example, when the frequency range of speech is known, a band-pass filter can keep only that range, such as the 300–3400 Hz telephone speech band.
from scipy.signal import butter, lfilter

def bandpass_filter(data, lowcut, highcut, fs, order=4):
    # Normalize cutoff frequencies to the Nyquist frequency
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
    # Design a Butterworth band-pass filter and apply it
    b, a = butter(order, [low, high], btype='band')
    return lfilter(b, a, data)

# Keep the 300-3400 Hz band, where most speech energy lies
filtered_audio = bandpass_filter(audio, 300, 3400, sr)
Noise Reduction in Modern Speech AI
Modern systems go beyond classical signal processing.
They use:
- Deep learning based denoising
- Neural speech enhancement models
- Self-supervised noise suppression
Examples include:
- Denoising autoencoders
- U-Net based speech enhancers
- Transformer-based enhancement models
These models learn noise patterns directly from data.
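As a concrete illustration, a denoising autoencoder maps a noisy feature frame to an estimate of the clean frame. Below is a forward-pass sketch in NumPy with random, untrained weights; in a real system the weights are learned by minimizing the error between the decoder output and clean targets, and the layer sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_hidden = 257, 128  # e.g. one STFT magnitude frame -> bottleneck

# Random (untrained) weights; training would fit these to data
W_enc = rng.standard_normal((n_hidden, n_freq)) * 0.01
W_dec = rng.standard_normal((n_freq, n_hidden)) * 0.01

def denoise_frame(noisy_frame):
    hidden = np.maximum(W_enc @ noisy_frame, 0)  # encoder with ReLU
    return np.maximum(W_dec @ hidden, 0)         # decoder, non-negative magnitudes

noisy = np.abs(rng.standard_normal(n_freq))      # stand-in noisy magnitude frame
clean_estimate = denoise_frame(noisy)
```

U-Net and Transformer-based enhancers follow the same input-to-clean-output idea with far more capable architectures.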
Real-World Pipeline Placement
In production Speech AI pipelines, noise reduction is usually placed:
Audio Input → Noise Reduction → Feature Extraction → Model
Skipping this step often causes severe accuracy degradation.
Practice
What do we call unwanted sound in speech recordings?
Which metric compares speech strength to noise?
Which noise reduction method subtracts estimated noise from spectrum?
Quick Quiz
Which type of noise remains constant over time?
Noise reduction is typically applied at which stage?
Spectral subtraction operates mainly in which domain?
Recap: Noise reduction improves Speech AI performance by separating speech from unwanted background sounds.
Next up: You’ll learn how to evaluate Speech AI systems using objective metrics.