Naive Bayes for NLP
In the previous lesson, you learned the basics of Text Classification and saw how text is converted into numbers and passed to a classifier.
Now we focus on one of the most important and widely used algorithms in NLP: Naive Bayes.
Even though it is mathematically simple, Naive Bayes is extremely powerful for text-based problems and is still used in real-world systems today.
What Is Naive Bayes?
Naive Bayes is a probabilistic classification algorithm based on Bayes’ Theorem.
It predicts the class of a text by calculating how likely the text is to belong to each class and choosing the most probable one.
It is called “naive” because it assumes:
- All words are independent of each other
- The presence of one word does not affect another
This assumption is unrealistic — but surprisingly, the algorithm still works very well for text.
Why Naive Bayes Works So Well for NLP
Text data has some special properties:
- Very high dimensional (thousands of words)
- Most word counts are zero
- Word frequency matters more than word order (in classic NLP)
Naive Bayes handles these properties efficiently, which is why it is a classic choice for NLP tasks.
Bayes’ Theorem (Intuition, Not Math)
At its core, Bayes’ Theorem answers this question:
“What is the probability of a class given the observed words?”
Naive Bayes compares:
- How often words appear in each class
- How common each class is overall
Then it picks the class with the highest probability.
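For readers who want to see the formula behind this intuition (feel free to skip the math), the rule Naive Bayes applies can be written as:

```latex
P(\text{class} \mid \text{words}) \;\propto\; P(\text{class}) \prod_{i} P(w_i \mid \text{class})
```

where each w_i is a word in the text. The independence assumption is what lets the word probabilities simply be multiplied together, and the class with the largest value wins.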
Simple Intuition with an Example
Suppose we have two classes:
- Spam
- Not Spam
Words like “free”, “win”, and “offer” appear more often in spam emails.
When a new email arrives, Naive Bayes checks:
- How often each word appeared in spam emails
- How often it appeared in non-spam emails
Then it decides which class is more likely.
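This decision process can be sketched with plain word counts. The tiny spam/ham corpus below is a made-up illustration, and Laplace (add-one) smoothing is used so unseen words do not zero out the score:

```python
from collections import Counter
import math

# Toy training data (hypothetical examples, not real emails)
spam = ["win free offer now", "free offer win"]
ham = ["meeting at noon", "see you at the meeting"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_score(words, counts, prior):
    total = sum(counts.values())
    # Laplace smoothing: add 1 to every count so unseen words
    # get a small probability instead of zero
    return math.log(prior) + sum(
        math.log((counts[w] + 1) / (total + len(vocab))) for w in words
    )

email = "free offer".split()
spam_score = log_score(email, spam_counts, 0.5)  # equal class priors
ham_score = log_score(email, ham_counts, 0.5)
print("spam" if spam_score > ham_score else "not spam")  # spam
```

Log probabilities are used instead of raw products, which is also what real implementations do to avoid numerical underflow on long texts.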
Types of Naive Bayes Used in NLP
There are multiple Naive Bayes variants. For NLP, the most common ones are:
- Multinomial Naive Bayes (most popular for text)
- Bernoulli Naive Bayes
- Gaussian Naive Bayes (rare for text)
In practice, Multinomial Naive Bayes is the default choice.
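A quick way to see the difference between the two text-oriented variants: Multinomial NB consumes word counts, while Bernoulli NB looks only at word presence or absence. A minimal sketch (the two sentences are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

texts = ["good good movie", "bad movie"]
labels = [1, 0]

# Multinomial NB: raw counts ("good" appearing twice matters)
X_counts = CountVectorizer().fit_transform(texts)
MultinomialNB().fit(X_counts, labels)

# Bernoulli NB: binary presence (did "good" occur at all?)
X_binary = CountVectorizer(binary=True).fit_transform(texts)
BernoulliNB().fit(X_binary, labels)

# Vocabulary is sorted alphabetically: bad, good, movie
print(X_counts.toarray())  # [[0 2 1], [1 0 1]]
print(X_binary.toarray())  # [[0 1 1], [1 0 1]]
```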
Naive Bayes NLP Pipeline
The standard pipeline looks like this:
1. Text cleaning
2. Vectorization (Bag of Words or TF-IDF)
3. Train a Naive Bayes model
4. Predict the class of new text
You have already learned steps 1 and 2.
Simple Code Example: Naive Bayes for Text Classification
This example uses:
- CountVectorizer for text → numbers
- Multinomial Naive Bayes for classification
Where to run this code:
- Google Colab (recommended)
- Jupyter Notebook (Anaconda)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labeled dataset
texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]
labels = [1, 1, 0, 0]  # 1 = Positive, 0 = Negative

# Step 1: convert text to word-count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Step 2: train Multinomial Naive Bayes on the counts
model = MultinomialNB()
model.fit(X, labels)

# Step 3: classify a new sentence
test_text = ["This product is amazing"]
X_test = vectorizer.transform(test_text)  # reuse the fitted vocabulary
prediction = model.predict(X_test)
print("Prediction:", prediction)  # Prediction: [1]
Output Explanation:
- The model learns word probabilities per class
- It compares probabilities for each class
- The class with the highest probability is selected
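If you want to see the probability comparison directly, `predict_proba` exposes it. This sketch retrains the same tiny model from the example above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love this product", "This is amazing",
         "I hate this item", "This is terrible"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

X_test = vectorizer.transform(["This product is amazing"])
probs = model.predict_proba(X_test)[0]  # [P(negative), P(positive)]
print(f"P(negative)={probs[0]:.2f}  P(positive)={probs[1]:.2f}")
```

The positive class receives the higher probability because "product" and "amazing" appeared only in positive training sentences.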
Why Multinomial Naive Bayes Is Ideal for Text
Multinomial Naive Bayes works on:
- Word counts
- Word frequencies
This matches perfectly with Bag of Words and TF-IDF representations.
That is why Naive Bayes + TF-IDF is a very common combination.
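That combination can be wired up in a few lines with scikit-learn's Pipeline, reusing the same toy sentences from the earlier example:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love this product", "This is amazing",
         "I hate this item", "This is terrible"]
labels = [1, 1, 0, 0]

# TF-IDF produces non-negative real-valued features, which
# Multinomial NB accepts even though it was designed for counts
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
clf.fit(texts, labels)
print(clf.predict(["This is amazing", "This is terrible"]))  # [1 0]
```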
Advantages of Naive Bayes
- Very fast training and prediction
- Works well with small datasets
- Handles high-dimensional text efficiently
- Simple and interpretable
Limitations of Naive Bayes
- Assumes word independence
- Does not understand word order
- Cannot capture deep semantics
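The word-order limitation is easy to demonstrate: under Bag of Words, two sentences with opposite meanings can produce identical feature vectors, so Naive Bayes cannot tell them apart:

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
X = vectorizer.fit_transform([
    "the movie was good not bad",
    "the movie was bad not good",
])
# Same words, different order: the count vectors are identical
print((X.toarray()[0] == X.toarray()[1]).all())  # True
```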
These limitations lead us to more advanced models later.
Real-Life Applications
- Email spam detection
- Sentiment analysis
- Topic classification
- Document filtering
Many large systems still use Naive Bayes for fast baseline models.
Assignment / Homework
Theory:
- Explain why Naive Bayes is called “naive”
- Explain why it still works well for text
Practical:
- Add more training sentences
- Change test sentences
- Observe prediction changes
Practice environment:
- Google Colab
- Jupyter Notebook
Practice Questions
Q1. Why is Naive Bayes suitable for NLP?
Q2. Which Naive Bayes variant is most common for text?
Quick Quiz
Q1. Does Naive Bayes consider word order?
Q2. Can Naive Bayes work with TF-IDF?
Quick Recap
- Naive Bayes is a probabilistic classifier
- It works extremely well for text
- Multinomial NB is the default NLP choice
- Fast, simple, and effective
In the next lesson, we will explore Logistic Regression for Text Classification and see how it differs from Naive Bayes.