NLP Lesson 22 – Naive Bayes | Dataplexa

Naive Bayes for NLP

In the previous lesson, you learned the basics of Text Classification and saw how text is converted into numbers and passed to a classifier.

Now we focus on one of the most important and widely used algorithms in NLP: Naive Bayes.

Even though it is mathematically simple, Naive Bayes is extremely powerful for text-based problems and is still used in real-world systems today.


What Is Naive Bayes?

Naive Bayes is a probabilistic classification algorithm based on Bayes’ Theorem.

It predicts the class of a text by calculating how likely it is that the text belongs to each class, and then choosing the most probable one.

It is called “naive” because it assumes:

  • All words are independent of each other, given the class
  • The presence of one word does not change the probability of another

This assumption is unrealistic — but surprisingly, the algorithm still works very well for text.
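
The independence assumption can be seen in a tiny numeric sketch (the per-class word probabilities below are made up purely for illustration):

```python
# Toy illustration of the "naive" independence assumption.
# These per-class word probabilities are invented for the example.
p_word_given_spam = {"free": 0.20, "offer": 0.10}

# Because words are assumed independent given the class, the probability
# of seeing both words in a spam email is simply the product:
p_both = p_word_given_spam["free"] * p_word_given_spam["offer"]
print(p_both)  # 0.02 (up to floating-point rounding)
```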


Why Naive Bayes Works So Well for NLP

Text data has some special properties:

  • Very high dimensional (thousands of words)
  • Most word counts are zero
  • Word frequency matters more than word order (in classic NLP)

Naive Bayes handles these properties efficiently, which is why it is a classic choice for NLP tasks.


Bayes’ Theorem (Intuition, Not Math)

At its core, Bayes’ Theorem answers this question:

“What is the probability of a class given the observed words?”

Naive Bayes compares:

  • How often words appear in each class
  • How common each class is overall

Then it picks the class with the highest probability.
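
As a rough sketch of that comparison (all numbers below are invented for illustration):

```python
# Invented numbers: prior = how common each class is overall,
# likelihood = how probable the observed words are under each class.
priors = {"spam": 0.4, "not_spam": 0.6}
likelihoods = {"spam": 0.012, "not_spam": 0.0005}

# Score each class by prior * likelihood (proportional to the posterior),
# then pick the class with the highest score.
scores = {c: priors[c] * likelihoods[c] for c in priors}
best = max(scores, key=scores.get)
print(best)  # spam
```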


Simple Intuition with an Example

Suppose we have two classes:

  • Spam
  • Not Spam

Words like “free”, “win”, and “offer” appear more often in spam emails.

When a new email arrives, Naive Bayes checks:

  • How often each word appeared in spam emails
  • How often each word appeared in non-spam emails

Then it decides which class is more likely.
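
A minimal hand-rolled version of this spam check might look like the following (toy word lists, with add-one smoothing assumed so unseen words do not zero out the product):

```python
# Toy training data: words seen in spam and non-spam emails
spam_words = "free win free offer".split()
ham_words = "meeting report schedule free".split()
vocab_size = 6  # distinct words across both lists

def word_prob(word, class_words):
    # Add-one (Laplace) smoothing keeps unseen words from giving probability 0
    return (class_words.count(word) + 1) / (len(class_words) + vocab_size)

# Score a new email by multiplying per-word probabilities for each class
email = ["free", "offer"]
p_spam, p_ham = 1.0, 1.0
for w in email:
    p_spam *= word_prob(w, spam_words)
    p_ham *= word_prob(w, ham_words)

print("spam" if p_spam > p_ham else "not spam")  # spam
```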


Types of Naive Bayes Used in NLP

There are multiple Naive Bayes variants. For NLP, the most common ones are:

  • Multinomial Naive Bayes (most popular for text)
  • Bernoulli Naive Bayes
  • Gaussian Naive Bayes (rare for text)

In practice, Multinomial Naive Bayes is the default choice.
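
To see the two text-friendly variants side by side, here is a small sketch (toy sentences; in practice Bernoulli NB is usually paired with binary presence/absence features):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

texts = ["free offer win", "free free offer", "meeting at noon", "project report due"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

X = CountVectorizer().fit_transform(texts)

# Multinomial NB models word counts; Bernoulli NB binarizes to presence/absence
print(MultinomialNB().fit(X, labels).predict(X))
print(BernoulliNB().fit(X, labels).predict(X))
```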


Naive Bayes NLP Pipeline

The standard pipeline looks like this:

  1. Text cleaning
  2. Vectorization (Bag of Words or TF-IDF)
  3. Train Naive Bayes model
  4. Predict class for new text

You have already learned steps 1 and 2.


Simple Code Example: Naive Bayes for Text Classification

This example uses:

  • CountVectorizer for text → numbers
  • Multinomial Naive Bayes for classification

Where to run this code:

  • Google Colab (recommended)
  • Jupyter Notebook (Anaconda)
Python Example: Naive Bayes Text Classification

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labeled dataset: two positive and two negative sentences
texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]

labels = [1, 1, 0, 0]  # 1 = Positive, 0 = Negative

# Convert the sentences into word-count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train Multinomial Naive Bayes on the counts
model = MultinomialNB()
model.fit(X, labels)

# New text must be transformed with the SAME fitted vectorizer
test_text = ["This product is amazing"]
X_test = vectorizer.transform(test_text)

prediction = model.predict(X_test)
print("Prediction:", prediction)

Output Explanation:

  • The model learns word probabilities per class
  • It compares probabilities for each class
  • The class with the highest probability is selected
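
Those per-class probabilities can be inspected directly with predict_proba (the toy data is repeated here so the snippet runs on its own):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

# One probability per class; predict() simply returns the larger one
probs = model.predict_proba(vectorizer.transform(["This product is amazing"]))
print(probs)
```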

Why Multinomial Naive Bayes Is Ideal for Text

Multinomial Naive Bayes works on:

  • Word counts
  • Word frequencies

This matches perfectly with Bag of Words and TF-IDF representations.

That is why Naive Bayes + TF-IDF is a very common combination.
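
Swapping CountVectorizer for TfidfVectorizer in the earlier example is all it takes (a sketch using the same toy sentences):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "I love this product",
    "This is amazing",
    "I hate this item",
    "This is terrible"
]
labels = [1, 1, 0, 0]  # 1 = Positive, 0 = Negative

# TF-IDF weights replace the raw counts; the classifier code is unchanged
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)
pred = model.predict(vectorizer.transform(["This product is amazing"]))
print("Prediction:", pred)
```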


Advantages of Naive Bayes

  • Very fast training and prediction
  • Works well with small datasets
  • Handles high-dimensional text efficiently
  • Simple and interpretable

Limitations of Naive Bayes

  • Assumes word independence
  • Does not understand word order
  • Cannot capture deep semantics
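
The word-order limitation is easy to demonstrate: two sentences with opposite meanings can produce identical bag-of-words vectors (a small sketch):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Opposite meanings, same words -> same bag-of-words vector
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([
    "the movie was good not bad",
    "the movie was bad not good",
])

same = (X.toarray()[0] == X.toarray()[1]).all()
print(same)  # True
```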

These limitations lead us to more advanced models later.


Real-Life Applications

  • Email spam detection
  • Sentiment analysis
  • Topic classification
  • Document filtering

Many large systems still use Naive Bayes for fast baseline models.


Assignment / Homework

Theory:

  • Explain why Naive Bayes is called “naive”
  • Explain why it still works well for text

Practical:

  • Add more training sentences
  • Change test sentences
  • Observe prediction changes

Practice environment:

  • Google Colab
  • Jupyter Notebook

Practice Questions

Q1. Why is Naive Bayes suitable for NLP?

Because it handles high-dimensional sparse text data efficiently.

Q2. Which Naive Bayes variant is most common for text?

Multinomial Naive Bayes.

Quick Quiz

Q1. Does Naive Bayes consider word order?

No.

Q2. Can Naive Bayes work with TF-IDF?

Yes, it works very well with TF-IDF.

Quick Recap

  • Naive Bayes is a probabilistic classifier
  • It works extremely well for text
  • Multinomial NB is the default NLP choice
  • Fast, simple, and effective

In the next lesson, we will explore Logistic Regression for Text Classification and see how it differs from Naive Bayes.