Pandas Lesson 19 – Merging | Dataplexa

Merging DataFrames in Pandas

In real-world data analysis, information is often spread across multiple datasets. To perform meaningful analysis, these datasets must be combined.

In this lesson, you will learn how to merge DataFrames in Pandas using different join techniques.


Why Do We Merge DataFrames?

Merging allows you to:

  • Combine related data from different sources
  • Add missing details to a dataset
  • Build complete analytical tables

This is very common in business, analytics, and data science projects.


Loading the Main Dataset

We continue using the Pandas sales dataset.

import pandas as pd

sales = pd.read_csv("dataplexa_pandas_sales.csv")

Creating a Second DataFrame

Let’s create a second DataFrame containing product category information.

product_info = pd.DataFrame({
    "product": ["Laptop", "Phone", "Tablet"],
    "category": ["Electronics", "Electronics", "Electronics"]
})

This table contains additional information not present in the sales data.


Basic Merge Using merge()

The merge() function is used to combine DataFrames.

Let’s merge the sales data with product information using the product column.

merged_df = pd.merge(
    sales,
    product_info,
    on="product"
)

merged_df.head()

Understanding Join Types

Pandas supports several types of joins:

  • Inner Join – keeps matching rows only
  • Left Join – keeps all left DataFrame rows
  • Right Join – keeps all right DataFrame rows
  • Outer Join – keeps all rows from both DataFrames

Inner Join

An inner join returns rows that exist in both DataFrames.

pd.merge(
    sales,
    product_info,
    on="product",
    how="inner"
)

Left Join

A left join keeps all rows from the left DataFrame and adds matching data from the right DataFrame.

pd.merge(
    sales,
    product_info,
    on="product",
    how="left"
)

Right Join

A right join keeps all rows from the right DataFrame.

pd.merge(
    sales,
    product_info,
    on="product",
    how="right"
)

Outer Join

An outer join keeps all rows from both DataFrames. Missing values are filled with NaN.

pd.merge(
    sales,
    product_info,
    on="product",
    how="outer"
)

Merging on Different Column Names

Sometimes column names differ between DataFrames.

Example:

pd.merge(
    sales,
    product_info,
    left_on="product",
    right_on="product"
)

Merging on Index

You can also merge using DataFrame indexes.

sales.set_index("product").merge(
    product_info.set_index("product"),
    left_index=True,
    right_index=True
)

Common Merge Issues

  • Duplicate column names
  • Missing keys
  • Unexpected NaN values

Always inspect your merged DataFrame using head() and info().


Practice Exercise

Try the following:

  • Create a second DataFrame with region descriptions
  • Merge it with the sales dataset
  • Try inner, left, and outer joins

What’s Next?

In the next lesson, you will learn about concatenating DataFrames, which allows you to stack datasets vertically or horizontally.