K-Nearest Neighbors (KNN)
In the previous lesson, we studied Support Vector Machines and saw how a model can draw a strong decision boundary by focusing on the most critical data points. SVM was all about margins and mathematical optimization.
In this lesson, we move to a completely different way of thinking. There is no training phase, no complex equations, and no model building in advance.
Welcome to K-Nearest Neighbors, commonly known as KNN.
The Core Idea Behind KNN
KNN works exactly the way humans think in everyday life. When you meet a new person, you often compare them with people you already know.
If most of the similar people you know behave in a certain way, you assume this new person might behave the same way.
KNN applies this idea to data. To predict the class of a new data point, it looks at the K most similar points in the dataset and lets them vote.
The class with the majority vote becomes the final prediction.
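Here is a minimal sketch of that voting idea in plain Python. The points, labels, and the new query point are made-up values chosen only to illustrate the mechanics; real KNN libraries do the same thing more efficiently.

import numpy as np

# Hypothetical, already-scaled training points and their classes (illustration only)
points = np.array([[0.2, 0.4], [0.1, 0.5], [0.9, 0.8], [0.8, 0.7], [0.3, 0.3]])
labels = np.array(["approved", "approved", "rejected", "rejected", "approved"])

new_point = np.array([0.25, 0.45])
k = 3

# Euclidean distance from the new point to every stored point
distances = np.linalg.norm(points - new_point, axis=1)

# Positions of the K closest points
nearest = np.argsort(distances)[:k]

# Majority vote among the K nearest neighbors
votes = labels[nearest]
values, counts = np.unique(votes, return_counts=True)
print(values[np.argmax(counts)])  # prints "approved" for this toy data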
Why KNN Is Called a Lazy Algorithm
Unlike most machine learning algorithms, KNN does not learn anything during training.
It simply stores the entire dataset in memory. When a new prediction is required, it performs all calculations at that moment.
Because of this behavior, KNN is called a lazy learner.
Using Our Dataset
We continue using the same dataset introduced earlier so that learning feels continuous and realistic.
Dataplexa ML Housing & Customer Dataset
Our task remains to predict whether a loan will be approved.
Preparing the Data
KNN is entirely based on distance. If one feature has a much larger numeric range than the others, it will dominate the distance calculation.
That is why feature scaling is not optional for KNN. It is mandatory.
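To see why, consider two customers described by annual income and age. The numbers below are hypothetical and chosen only to show the scale difference.

import numpy as np

# Hypothetical customers: [annual_income, age] (illustration only)
a = np.array([50000, 25])
b = np.array([51000, 60])

# Without scaling, the income gap (1000) swamps the age gap (35)
print(np.linalg.norm(a - b))  # about 1000.6, driven almost entirely by income

After standardization, both features contribute on a comparable scale. The code below applies exactly that to our dataset.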
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

# Separate the features from the target column
X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features so every column contributes equally to distances
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Training the KNN Model
Although we say KNN does not train in the traditional sense, we still create a model object and define the value of K.
The value of K controls how many neighbors participate in the voting.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
A small K makes the model sensitive to noise in individual points, while a large K gives smoother, more stable decisions but can wash out genuine local patterns.
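To see this trade-off on our data, one common approach is to score a few candidate values of K with cross-validation. This is a quick sketch rather than a full tuning procedure, and the best K will depend on the dataset.

from sklearn.model_selection import cross_val_score

# Compare a few candidate values of K with 5-fold cross-validation on the training set
for k in [1, 3, 5, 11, 21]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    print(k, scores.mean())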
Making Predictions
Now the model compares each test point with all training points to find its nearest neighbors.
y_pred = model.predict(X_test)
y_pred[:10]
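If you want to see which training points influenced a prediction, the fitted model exposes a kneighbors method that returns the distances and positions of the nearest neighbors. Here is a small sketch for the first test sample.

# Distances to, and positions of, the 5 nearest training points for the first test sample
distances, indices = model.kneighbors(X_test[:1])
print(distances)
print(indices)
print(y_train.iloc[indices[0]])  # classes of those neighbors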
Evaluating the Model
Let us evaluate how well KNN performs on unseen data.
from sklearn.metrics import accuracy_score, classification_report
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
KNN often performs well on small to medium datasets, but predictions slow down on very large ones, because each query must be compared against the stored training data.
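For larger datasets, scikit-learn can speed up the neighbor search with tree-based indexes. Switching the algorithm parameter is one option worth trying, although the best choice depends on the size and dimensionality of the data.

# Use a KD-tree index for faster neighbor lookups on larger, low-dimensional datasets
fast_model = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
fast_model.fit(X_train, y_train)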
Real-World Intuition
Imagine a bank officer reviewing a new loan application. Instead of using rules or formulas, the officer looks at five similar past customers.
If most of them repaid their loans successfully, the officer approves the new application.
This is exactly how KNN behaves.
Mini Practice
Suppose K is set to 1. The model only looks at the single closest customer.
Now imagine K is set to 20. The decision becomes more stable, but individual patterns may get diluted.
This trade-off is central to KNN.
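If you want to check this intuition on our dataset, a quick experiment is to fit both settings and compare their test accuracy. The exact numbers depend on the data, so treat this as a sketch.

# Quick check: compare K = 1 and K = 20 on the held-out test set
for k in [1, 20]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))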
Exercises
Exercise 1:
Why must features be scaled for KNN?
Exercise 2:
What happens if K is too small?
Quick Quiz
Q1. Does KNN learn parameters during training?
In the next lesson, we will study Naive Bayes and understand how probability-based learning works.