Speech AI Course
Noise Reduction
Until now, we assumed speech audio is clean and clear. In real life, that is almost never true.
Noise is one of the biggest challenges in Speech AI. If noise is not handled correctly, even the best models fail.
This lesson explains why noise exists, how it affects Speech AI, and how we reduce it in practice.
What Is Noise in Speech?
Noise is any unwanted sound that interferes with the target speech signal.
Examples of common noise sources include:
- Background conversations
- Traffic and street sounds
- Fan and AC noise
- Keyboard typing
- Microphone static
Speech AI models do not naturally ignore noise the way humans do.
Why Noise Hurts Speech AI Models
Noise directly alters the audio signal used for feature extraction.
As a result:
- Important speech features get masked
- Phoneme boundaries become unclear
- Word error rate increases
- Model confidence drops
This is why noise reduction is usually applied before feature extraction.
Types of Noise
Stationary Noise
Stationary noise remains relatively constant over time.
Examples:
- Fan noise
- Electrical hum
- Air conditioner sound
These noises are easier to remove.
Non-Stationary Noise
Non-stationary noise changes over time.
Examples:
- People talking
- Traffic sounds
- Sudden background events
These are much harder to suppress.
Signal-to-Noise Ratio (SNR)
Signal-to-Noise Ratio (SNR) measures how strong the speech signal is compared to noise.
Higher SNR means cleaner audio. Lower SNR means noisy audio.
Most Speech AI systems degrade sharply as SNR falls. At 0 dB, speech and noise have equal power, and recognition accuracy typically collapses.
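SNR is usually expressed in decibels: 10 times the base-10 logarithm of the ratio of signal power to noise power. Here is a minimal sketch of computing it, assuming we have separate access to the speech and noise signals (in a real recording, SNR must be estimated from the mixture):

```python
import numpy as np

def snr_db(signal, noise):
    # SNR in decibels: ratio of average signal power to average noise power
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    return 10 * np.log10(signal_power / noise_power)

# Synthetic example: a 440 Hz tone standing in for speech,
# mixed with white noise at two different levels
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)

quiet = snr_db(speech, 0.05 * rng.standard_normal(sr))  # high SNR: cleaner audio
loud = snr_db(speech, 0.5 * rng.standard_normal(sr))    # low SNR: noisier audio
```

Increasing the noise level by a factor of 10 lowers the SNR by 20 dB, which is why small changes in recording conditions can have a large impact.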
Basic Noise Reduction Strategy
Most noise reduction techniques follow this logic:
- Estimate the noise
- Separate noise from speech
- Suppress noise components
- Reconstruct clean speech
This process happens in either the time domain or frequency domain.
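A crude time-domain illustration of this estimate-then-suppress logic is a noise gate, which zeroes frames whose energy stays near an estimated noise floor. This is only a sketch; real systems apply much smoother suppression to avoid choppy output:

```python
import numpy as np

def noise_gate(audio, frame_len=400, threshold_ratio=2.0):
    # Chop the signal into fixed-length frames
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Estimate the noise floor from the quietest 10% of frames
    energies = np.mean(frames ** 2, axis=1)
    noise_floor = np.percentile(energies, 10)
    # Suppress frames whose energy is close to the noise floor
    keep = energies > threshold_ratio * noise_floor
    return (frames * keep[:, None]).ravel()
```

The frequency-domain counterpart of the same idea is spectral subtraction, covered next.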
Noise Reduction Using Spectral Subtraction
Spectral subtraction is one of the earliest and most intuitive noise reduction techniques.
It works by estimating the noise spectrum from speech-free (noise-only) portions of the audio and subtracting it from the signal spectrum.
import numpy as np
import librosa

# Load the noisy recording at a 16 kHz sample rate
audio, sr = librosa.load("noisy.wav", sr=16000)

# Move to the frequency domain and split magnitude from phase
stft = librosa.stft(audio)
magnitude, phase = np.abs(stft), np.angle(stft)

# Estimate noise from the first 10 frames, assuming they contain no speech
noise_estimate = np.mean(magnitude[:, :10], axis=1, keepdims=True)

# Subtract the estimate, clipping negative magnitudes to zero
clean_magnitude = np.maximum(magnitude - noise_estimate, 0)

# Reattach the original phase and return to the time domain
clean_stft = clean_magnitude * np.exp(1j * phase)
clean_audio = librosa.istft(clean_stft)
This method is simple but can introduce artifacts if noise estimation is inaccurate.
Filtering-Based Noise Reduction
Another common approach is filtering.
Filters remove frequency ranges that mostly contain noise.
For example, when the frequency range of speech is known, a band-pass filter can keep only that range, such as the 300–3400 Hz telephone speech band.
from scipy.signal import butter, lfilter

def bandpass_filter(data, lowcut, highcut, fs, order=4):
    # Normalize cutoff frequencies to the Nyquist frequency
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
    # Design a Butterworth band-pass filter and apply it
    b, a = butter(order, [low, high], btype='band')
    return lfilter(b, a, data)

# Keep the 300-3400 Hz band, where most speech energy lies
filtered_audio = bandpass_filter(audio, 300, 3400, sr)
Noise Reduction in Modern Speech AI
Modern systems go beyond classical signal processing.
They use:
- Deep learning based denoising
- Neural speech enhancement models
- Self-supervised noise suppression
Examples include:
- Denoising autoencoders
- U-Net based speech enhancers
- Transformer-based enhancement models
These models learn noise patterns directly from data.
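As a concrete illustration, a denoising autoencoder maps a noisy feature frame to an estimate of the clean frame. Below is a forward-pass sketch in NumPy with random, untrained weights; in a real system the weights are learned by minimizing the error between the decoder output and clean targets, and the layer sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_hidden = 257, 128  # e.g. one STFT magnitude frame -> bottleneck

# Random (untrained) weights; training would fit these to data
W_enc = rng.standard_normal((n_hidden, n_freq)) * 0.01
W_dec = rng.standard_normal((n_freq, n_hidden)) * 0.01

def denoise_frame(noisy_frame):
    hidden = np.maximum(W_enc @ noisy_frame, 0)  # encoder with ReLU
    return np.maximum(W_dec @ hidden, 0)         # decoder, non-negative magnitudes

noisy = np.abs(rng.standard_normal(n_freq))      # stand-in noisy magnitude frame
clean_estimate = denoise_frame(noisy)
```

U-Net and Transformer-based enhancers follow the same input-to-clean-output idea with far more capable architectures.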
Real-World Pipeline Placement
In production Speech AI pipelines, noise reduction is usually placed:
Audio Input → Noise Reduction → Feature Extraction → Model
Skipping this step often causes severe accuracy degradation.
Practice
What do we call unwanted sound in speech recordings?
Which metric compares speech strength to noise?
Which noise reduction method subtracts estimated noise from spectrum?
Quick Quiz
Which type of noise remains constant over time?
Noise reduction is typically applied at which stage?
Spectral subtraction operates mainly in which domain?
Recap: Noise reduction improves Speech AI performance by separating speech from unwanted background sounds.
Next up: You’ll learn how to evaluate Speech AI systems using objective metrics.