NLP Lesson 22 – Naive Bayes | Dataplexa

Naive Bayes for NLP

In the previous lesson, you learned the basics of Text Classification and saw how text is converted into numbers and passed to a classifier.

Now we focus on one of the most important and widely used algorithms in NLP: Naive Bayes.

Even though it is mathematically simple, Naive Bayes is extremely powerful for text-based problems and is still used in real-world systems today.


What Is Naive Bayes?

Naive Bayes is a probabilistic classification algorithm based on Bayes’ Theorem.

It predicts the class of a text by calculating how likely it is that the text belongs to each class, and then choosing the most probable one.

It is called “naive” because it assumes:

  • All words are independent of each other, given the class
  • The presence of one word does not change the probability of another

This assumption is unrealistic — but surprisingly, the algorithm still works very well for text.
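
The independence assumption can be seen in a tiny numeric sketch (the per-class word probabilities below are made up purely for illustration):

```python
# Toy illustration of the "naive" independence assumption.
# These per-class word probabilities are invented for the example.
p_word_given_spam = {"free": 0.20, "offer": 0.10}

# Because words are assumed independent given the class, the probability
# of seeing both words in a spam email is simply the product:
p_both = p_word_given_spam["free"] * p_word_given_spam["offer"]
print(p_both)  # 0.02 (up to floating-point rounding)
```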


Why Naive Bayes Works So Well for NLP

Text data has some special properties:

  • Very high dimensional (thousands of words)
  • Most word counts are zero
  • Word frequency matters more than word order (in classic NLP)

Naive Bayes handles these properties efficiently, which is why it is a classic choice for NLP tasks.


Bayes’ Theorem (Intuition, Not Math)

At its core, Bayes’ Theorem answers this question:

“What is the probability of a class given the observed words?”

Naive Bayes compares:

  • How often words appear in each class
  • How common each class is overall

Then it picks the class with the highest probability.
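
As a rough sketch of that comparison (all numbers below are invented for illustration):

```python
# Invented numbers: prior = how common each class is overall,
# likelihood = how probable the observed words are under each class.
priors = {"spam": 0.4, "not_spam": 0.6}
likelihoods = {"spam": 0.012, "not_spam": 0.0005}

# Score each class by prior * likelihood (proportional to the posterior),
# then pick the class with the highest score.
scores = {c: priors[c] * likelihoods[c] for c in priors}
best = max(scores, key=scores.get)
print(best)  # spam
```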


Simple Intuition with an Example

Suppose we have two classes:

  • Spam
  • Not Spam

Words like “free”, “win”, and “offer” appear more often in spam emails.

When a new email arrives, Naive Bayes checks:

  • How often each word appeared in spam emails
  • How often each word appeared in non-spam emails

Then it decides which class is more likely.
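
A minimal hand-rolled version of this spam check might look like the following (toy word lists, with add-one smoothing assumed so unseen words do not zero out the product):

```python
# Toy training data: words seen in spam and non-spam emails
spam_words = "free win free offer".split()
ham_words = "meeting report schedule free".split()
vocab_size = 6  # distinct words across both lists

def word_prob(word, class_words):
    # Add-one (Laplace) smoothing keeps unseen words from giving probability 0
    return (class_words.count(word) + 1) / (len(class_words) + vocab_size)

# Score a new email by multiplying per-word probabilities for each class
email = ["free", "offer"]
p_spam, p_ham = 1.0, 1.0
for w in email:
    p_spam *= word_prob(w, spam_words)
    p_ham *= word_prob(w, ham_words)

print("spam" if p_spam > p_ham else "not spam")  # spam
```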


Types of Naive Bayes Used in NLP

There are multiple Naive Bayes variants. For NLP, the most common ones are:

  • Multinomial Naive Bayes (most popular for text)
  • Bernoulli Naive Bayes
  • Gaussian Naive Bayes (rare for text)

In practice, Multinomial Naive Bayes is the default choice.
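
To see the two text-friendly variants side by side, here is a small sketch (toy sentences; in practice Bernoulli NB is usually paired with binary presence/absence features):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

texts = ["free offer win", "free free offer", "meeting at noon", "project report due"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

X = CountVectorizer().fit_transform(texts)

# Multinomial NB models word counts; Bernoulli NB binarizes to presence/absence
print(MultinomialNB().fit(X, labels).predict(X))
print(BernoulliNB().fit(X, labels).predict(X))
```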


Naive Bayes NLP Pipeline

The standard pipeline looks like this:

  1. Text cleaning
  2. Vectorization (Bag of Words or TF-IDF)
  3. Train Naive Bayes model
  4. Predict class for new text

You have already learned steps 1 and 2.


Simple Code Example: Naive Bayes for Text Classification

This example uses:

  • CountVectorizer for text → numbers
  • Multinomial Naive Bayes for classification

Where to run this code:

  • Google Colab (recommended)
  • Jupyter Notebook (Anaconda)
Python Example: Naive Bayes Text Classification

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labeled dataset: two positive and two negative sentences
texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]

labels = [1, 1, 0, 0]  # 1 = Positive, 0 = Negative

# Convert the sentences into word-count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train Multinomial Naive Bayes on the counts
model = MultinomialNB()
model.fit(X, labels)

# New text must be transformed with the SAME fitted vectorizer
test_text = ["This product is amazing"]
X_test = vectorizer.transform(test_text)

prediction = model.predict(X_test)
print("Prediction:", prediction)

Output Explanation:

  • The model learns word probabilities per class
  • It compares probabilities for each class
  • The class with the highest probability is selected
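
Those per-class probabilities can be inspected directly with predict_proba (the toy data is repeated here so the snippet runs on its own):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

# One probability per class; predict() simply returns the larger one
probs = model.predict_proba(vectorizer.transform(["This product is amazing"]))
print(probs)
```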

Why Multinomial Naive Bayes Is Ideal for Text

Multinomial Naive Bayes works on:

  • Word counts
  • Word frequencies

This matches perfectly with Bag of Words and TF-IDF representations.

That is why Naive Bayes + TF-IDF is a very common combination.
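
Swapping CountVectorizer for TfidfVectorizer in the earlier example is all it takes (a sketch using the same toy sentences):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]
labels = [1, 1, 0, 0]  # 1 = Positive, 0 = Negative

# TF-IDF weights replace the raw counts; the classifier code is unchanged
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)
pred = model.predict(vectorizer.transform(["This product is amazing"]))
print("Prediction:", pred)
```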


Advantages of Naive Bayes

  • Very fast training and prediction
  • Works well with small datasets
  • Handles high-dimensional text efficiently
  • Simple and interpretable

Limitations of Naive Bayes

  • Assumes word independence
  • Does not understand word order
  • Cannot capture deep semantics
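
The word-order limitation is easy to demonstrate: two sentences with opposite meanings can produce identical bag-of-words vectors (a small sketch):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Opposite meanings, same words -> same bag-of-words vector
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([
    "the movie was good not bad",
    "the movie was bad not good",
])

same = (X.toarray()[0] == X.toarray()[1]).all()
print(same)  # True
```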

These limitations lead us to more advanced models later.


Real-Life Applications

  • Email spam detection
  • Sentiment analysis
  • Topic classification
  • Document filtering

Many large systems still use Naive Bayes for fast baseline models.


Assignment / Homework

Theory:

  • Explain why Naive Bayes is called “naive”
  • Explain why it still works well for text

Practical:

  • Add more training sentences
  • Change test sentences
  • Observe prediction changes

Practice environment:

  • Google Colab
  • Jupyter Notebook

Practice Questions

Q1. Why is Naive Bayes suitable for NLP?

Because it handles high-dimensional sparse text data efficiently.

Q2. Which Naive Bayes variant is most common for text?

Multinomial Naive Bayes.

Quick Quiz

Q1. Does Naive Bayes consider word order?

No.

Q2. Can Naive Bayes work with TF-IDF?

Yes, it works very well with TF-IDF.

Quick Recap

  • Naive Bayes is a probabilistic classifier
  • It works extremely well for text
  • Multinomial NB is the default NLP choice
  • Fast, simple, and effective

In the next lesson, we will explore Logistic Regression for Text Classification and see how it differs from Naive Bayes.