Python Lesson 30 – Python Libraries Overview | Dataplexa

Python Libraries Overview

Python's greatest strength is not just the language itself — it is the enormous ecosystem of libraries built around it. Whether you are building web applications, analyzing data, automating tasks, or training machine learning models, there is almost certainly a battle-tested library that handles the heavy lifting for you.

This lesson gives you a working overview of the most important libraries across every major domain — what they do, when to use them, and just enough code to see them in action. Think of this as your map of the Python ecosystem before you dive deep into any one area.

The Standard Library — Built In, Always Available

Python ships with a large collection of modules called the standard library. These require no installation — just import and use. You have already seen several of them throughout this course.

# A quick tour of essential standard library modules

import os           # file system, environment variables, paths
import sys          # Python runtime info, command-line arguments
import math         # mathematical functions and constants
import random       # random number generation
import collections  # specialized data structures
import itertools    # tools for working with iterators
import functools    # higher-order functions (lru_cache, reduce, partial)
import pathlib      # modern file path handling
import shutil       # high-level file operations (copy, move, delete)
import subprocess   # run shell commands from Python
import threading    # concurrent execution (covered in Lesson 38)
import logging      # production-quality logging

# Quick examples
import math
print(math.pi)           # 3.141592653589793
print(math.sqrt(144))    # 12.0
print(math.ceil(4.2))    # 5

import random
print(random.randint(1, 10))          # random int between 1 and 10
print(random.choice(["a","b","c"]))   # random item from a list

import collections
counter = collections.Counter("mississippi")
print(counter.most_common(3))   # [('i', 4), ('s', 4), ('p', 2)]
3.141592653589793
12.0
5
7
b
[('i', 4), ('s', 4), ('p', 2)]
  • The full standard library reference lives at docs.python.org/3/library
  • Always check the standard library first before reaching for a third-party package
  • collections alone is worth learning deeply — Counter, defaultdict, OrderedDict, and deque solve common problems elegantly
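To see why collections pays off, here is a short sketch using hypothetical word and event data — defaultdict removes key-existence boilerplate, and deque gives fast operations at both ends:

```python
from collections import defaultdict, deque

# defaultdict — group words by first letter with no key-existence checks
groups = defaultdict(list)
for word in ["apple", "ant", "bee", "bear", "cat"]:
    groups[word[0]].append(word)
print(dict(groups))   # {'a': ['apple', 'ant'], 'b': ['bee', 'bear'], 'c': ['cat']}

# deque — O(1) appends and pops at both ends (lists are O(n) at the front);
# maxlen keeps only the most recent items, a simple sliding window
recent = deque(maxlen=3)
for event in ["login", "click", "scroll", "logout"]:
    recent.append(event)
print(list(recent))   # ['click', 'scroll', 'logout']
```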

1. requests — HTTP for Humans

requests is one of the most downloaded Python packages of all time. It makes sending HTTP requests — GET, POST, PUT, DELETE — so simple that the official Python docs recommend it over the built-in urllib.

Install: pip install requests

Use it for: calling REST APIs, downloading files, scraping web pages, authenticating with OAuth tokens.

# requests — fetching data from a public API

import requests

# GET request — fetch JSON data from a public API
url = "https://httpbin.org/get"
response = requests.get(url, params={"name": "dataplexa"})

print("Status code:", response.status_code)   # 200 = success
print("Content type:", response.headers["Content-Type"])

data = response.json()                         # parse response body as JSON
print("URL called:", data["url"])

# POST request — send data to an endpoint
payload = {"username": "alice", "score": 99}
r = requests.post("https://httpbin.org/post", json=payload)
print("Posted:", r.json()["json"])
Status code: 200
Content type: application/json
URL called: https://httpbin.org/get?name=dataplexa
Posted: {'username': 'alice', 'score': 99}
  • response.status_code — 200 OK, 404 Not Found, 500 Server Error
  • response.json() — parses the response body as JSON directly
  • response.text — raw response as a string; response.content — raw bytes
  • Use response.raise_for_status() to automatically raise an exception on 4xx/5xx responses
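A common defensive pattern combines a timeout, raise_for_status(), and an except clause. The fetch_json helper below is hypothetical, not part of the requests API, and the hand-built Response at the end is only a trick to see raise_for_status() in isolation without touching the network:

```python
import requests

def fetch_json(url, params=None):
    """Hypothetical helper: GET a URL, raise on HTTP errors, return parsed JSON."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()        # raises requests.HTTPError on 4xx/5xx
    return response.json()

# raise_for_status() in isolation: build a bare Response object by hand
# and give it an error status code — no network needed
r = requests.Response()
r.status_code = 404
try:
    r.raise_for_status()
except requests.HTTPError as exc:
    print("Caught:", exc)
```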

2. pandas — Data Analysis

pandas is the backbone of data analysis in Python. It introduces the DataFrame — a two-dimensional table similar to a spreadsheet — and gives you powerful tools to load, clean, filter, group, and summarize data.

Install: pip install pandas

Use it for: reading CSV and Excel files, data cleaning, aggregation, merging datasets, exploratory analysis.

# pandas — loading and exploring a DataFrame

import pandas as pd   # pd is the universal alias

data = {
    "product": ["notebook", "pen", "desk", "lamp", "chair"],
    "price":   [4.99, 1.50, 89.99, 24.99, 149.99],
    "sold":    [120, 300, 15, 45, 22]
}

df = pd.DataFrame(data)   # create a DataFrame from a dict

print(df)
print("\nAverage price:", df["price"].mean())
print("Total revenue:", (df["price"] * df["sold"]).sum())

# Filter — items priced under $30
affordable = df[df["price"] < 30]
print("\nAffordable items:\n", affordable)
    product   price  sold
0  notebook    4.99   120
1       pen    1.50   300
2      desk   89.99    15
3      lamp   24.99    45
4     chair  149.99    22

Average price: 54.292
Total revenue: 6822.98

Affordable items:
     product  price  sold
0  notebook   4.99   120
1       pen   1.50   300
3      lamp  24.99    45
  • pd.read_csv("file.csv") and pd.read_excel("file.xlsx") load files in one line
  • df.head(), df.info(), df.describe() are the first three calls on any new dataset
  • df.groupby("column").agg({"value": "sum"}) is the pandas equivalent of SQL GROUP BY
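The groupby bullet above can be sketched concretely. The sales data here is hypothetical, invented just to show the SQL GROUP BY parallel; the named-aggregation syntax (total=("amount", "sum")) is standard pandas:

```python
import pandas as pd

# Hypothetical per-order sales data
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "amount": [100, 250, 175, 90, 60],
})

# SQL: SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region
summary = sales.groupby("region").agg(
    total=("amount", "sum"),      # sum of amounts per region
    orders=("amount", "count"),   # number of rows per region
)
print(summary)
```

Running this prints one row per region: north with total 335 over 3 orders, south with total 340 over 2 orders.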

3. NumPy — Numerical Computing

NumPy provides the ndarray — a fast, memory-efficient multi-dimensional array — and hundreds of mathematical operations that run at C speed. Nearly every scientific and machine learning library in Python is built on top of NumPy.

Install: pip install numpy

Use it for: matrix operations, linear algebra, statistical calculations, signal processing, anything needing fast numerical computation.

# NumPy — fast array operations

import numpy as np   # np is the universal alias

arr = np.array([10, 20, 30, 40, 50])

print("Array:", arr)
print("Mean:", arr.mean())
print("Sum:", arr.sum())
print("Max:", arr.max())

# Vectorized math — no loop needed
doubled = arr * 2
print("Doubled:", doubled)

# 2D array (matrix)
matrix = np.array([[1, 2], [3, 4]])
print("Matrix:\n", matrix)
print("Transpose:\n", matrix.T)
print("Determinant:", np.linalg.det(matrix))
Array: [10 20 30 40 50]
Mean: 30.0
Sum: 150
Max: 50
Doubled: [20 40 60 80 100]
Matrix:
[[1 2]
[3 4]]
Transpose:
[[1 3]
[2 4]]
Determinant: -2.0000000000000004
  • NumPy operations run on entire arrays at once — no Python loops, much faster
  • np.zeros(n), np.ones(n), np.arange(start, stop, step), np.linspace() create arrays quickly
  • NumPy is a prerequisite for pandas, scikit-learn, TensorFlow, and PyTorch
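The array constructors listed above, plus broadcasting — NumPy's rule for combining arrays of compatible shapes — look like this in practice:

```python
import numpy as np

# Quick array constructors
print(np.zeros(3))             # [0. 0. 0.]
print(np.arange(0, 10, 2))     # [0 2 4 6 8]
print(np.linspace(0, 1, 5))    # [0.   0.25 0.5  0.75 1.  ]

# Broadcasting — a (3,) row combined with a (3, 1) column
# produces a full 3x3 grid with no explicit loops
row = np.arange(3)
col = np.arange(3).reshape(3, 1)
print(row + col)               # 3x3 addition table
```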

4. matplotlib — Data Visualization

matplotlib is Python's foundational plotting library. It creates static charts — line plots, bar charts, histograms, scatter plots — and saves them as image files or displays them interactively.

Install: pip install matplotlib

Use it for: visualizing data distributions, trends, comparisons, and model results.

# matplotlib — creating a simple line chart

import matplotlib.pyplot as plt   # plt is the universal alias

months  = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12000, 15400, 13800, 17200, 19500, 21000]

plt.figure(figsize=(8, 4))
plt.plot(months, revenue, marker="o", color="#7c3aed", linewidth=2)
plt.title("Monthly Revenue 2024")
plt.xlabel("Month")
plt.ylabel("Revenue ($)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("revenue.png")   # save to file
# plt.show()                 # display interactively in a notebook or IDE
print("Chart saved.")
Chart saved.
  • plt.plot() line chart, plt.bar() bar chart, plt.hist() histogram, plt.scatter() scatter plot
  • plt.savefig("name.png") saves as PNG, PDF, SVG — format is inferred from the extension
  • seaborn (built on matplotlib) produces statistically-oriented, publication-quality charts with less code
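For completeness, here is the same savefig workflow with a bar chart, using hypothetical quarterly totals. The Agg backend line is an assumption for script or server environments with no display attached; in a notebook you can omit it:

```python
import matplotlib
matplotlib.use("Agg")              # headless backend: render to files only
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]        # hypothetical data
totals   = [41200, 50500, 47800, 60100]

plt.figure(figsize=(6, 4))
plt.bar(quarters, totals, color="#7c3aed")
plt.title("Revenue by Quarter")
plt.ylabel("Revenue ($)")
plt.tight_layout()
plt.savefig("quarters.png")
plt.close()    # free the figure — important when generating many charts in a loop
```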

5. Flask — Web Development

Flask is a lightweight web framework for building web applications and REST APIs. It is minimal by design — you add only what you need, making it ideal for APIs, microservices, and small to medium web applications.

Install: pip install flask

Use it for: REST APIs, web dashboards, backend services, prototyping.

# Flask — a minimal web API (save as app.py and run with: flask run)

from flask import Flask, jsonify, request

app = Flask(__name__)

products = [
    {"id": 1, "name": "notebook", "price": 4.99},
    {"id": 2, "name": "pen",      "price": 1.50},
]

@app.route("/products", methods=["GET"])
def get_products():
    return jsonify(products)   # return list as JSON response

@app.route("/products/<int:pid>", methods=["GET"])
def get_product(pid):
    match = next((p for p in products if p["id"] == pid), None)
    if match:
        return jsonify(match)
    return jsonify({"error": "not found"}), 404

# Run with: flask --app app run
# Visit:    http://127.0.0.1:5000/products
  • @app.route() maps a URL path to a Python function
  • jsonify() serializes a Python dict or list to a JSON HTTP response
  • Django is the alternative full-stack framework — larger, more opinionated, built-in admin, ORM, and auth
  • For high-performance async APIs, FastAPI is the modern choice — automatic docs, type hints, async support
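You can exercise Flask routes without starting a server at all by using the framework's built-in test client — a sketch with a minimal /ping route invented for illustration:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/ping")
def ping():
    return jsonify({"status": "ok"})

# test_client() issues requests directly to the app — no server, no port;
# ideal for unit tests and quick experiments
with app.test_client() as client:
    response = client.get("/ping")
    print(response.status_code)    # 200
    print(response.get_json())     # {'status': 'ok'}
```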

6. scikit-learn — Machine Learning

scikit-learn is the standard library for classical machine learning in Python — classification, regression, clustering, dimensionality reduction, and model evaluation. Its consistent API makes switching between algorithms trivial.

Install: pip install scikit-learn

Use it for: training and evaluating ML models, feature engineering, cross-validation, pipelines.

# scikit-learn — training a simple classifier

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load a built-in dataset
iris = load_iris()
X, y = iris.data, iris.target   # features and labels

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
Accuracy: 1.0
  • Every scikit-learn model follows the same API: .fit(X, y), .predict(X), and .score(X, y)
  • For deep learning — neural networks, image recognition, NLP — use TensorFlow or PyTorch
  • Lesson 44 covers ML with Python in full detail
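The consistent-API point is worth seeing directly: because every estimator shares fit/predict/score, swapping algorithms is a one-line change. A sketch reusing the iris split from above with two different classifiers:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same three calls, regardless of the algorithm behind them
for model in (LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, "accuracy:", model.score(X_test, y_test))
```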

Library Quick-Reference Map

Domain               | Library                         | Install
---------------------|---------------------------------|---------------------------
HTTP / APIs          | requests, httpx                 | pip install requests
Data Analysis        | pandas                          | pip install pandas
Numerical Computing  | numpy                           | pip install numpy
Visualization        | matplotlib, seaborn, plotly     | pip install matplotlib
Web Frameworks       | flask, django, fastapi          | pip install flask
Machine Learning     | scikit-learn                    | pip install scikit-learn
Deep Learning        | tensorflow, torch               | pip install tensorflow
Database             | sqlalchemy, sqlite3 (built-in)  | pip install sqlalchemy
Testing              | pytest, unittest (built-in)     | pip install pytest
Web Scraping         | beautifulsoup4, scrapy          | pip install beautifulsoup4

Practice Questions

Practice 1. Which Python library is most commonly used for sending HTTP requests to REST APIs?



Practice 2. What is the universal alias used when importing pandas?



Practice 3. What is the core data structure that NumPy provides?



Practice 4. What three methods does every scikit-learn model share in its API?



Practice 5. Which standard library module provides the Counter and defaultdict data structures?



Quiz

Quiz 1. Which library would you use to read a CSV file into a table-like structure for analysis?






Quiz 2. What makes NumPy arrays faster than Python lists for mathematical operations?






Quiz 3. Which web framework is described as lightweight and minimal, ideal for APIs and microservices?






Quiz 4. What does response.raise_for_status() do in the requests library?






Quiz 5. Which library would you choose for deep learning and neural networks instead of scikit-learn?






Next up — OOP Basics: classes, objects, attributes, and methods — the foundation of object-oriented programming in Python.