Pandas Lesson 20 – Concatenation | Dataplexa

Concatenation in Pandas

In many data projects, datasets are not combined by matching key columns. Instead, they are stacked on top of one another or placed side by side.

Pandas provides the concat() function to handle this scenario. In this lesson, you will learn how to concatenate DataFrames both vertically and horizontally.


What is Concatenation?

Concatenation means joining DataFrames along a particular axis:

  • Vertical concatenation – adding rows
  • Horizontal concatenation – adding columns

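Unlike merging, concatenation does not match rows on the values of key columns. It lines the DataFrames up along the chosen axis and aligns them on the labels of the other axis (column names when stacking rows, index values when adding columns).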
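
As a minimal sketch of the two directions, the following uses small toy DataFrames (top, bottom, and prices are hypothetical names and values, not part of the sales dataset):

```python
import pandas as pd

# Toy DataFrames (hypothetical data, not the lesson's sales dataset).
top = pd.DataFrame({"product": ["A", "B"], "units": [10, 20]})
bottom = pd.DataFrame({"product": ["C"], "units": [30]})
prices = pd.DataFrame({"price": [1.5, 2.5]})

# Vertical: the rows of `bottom` are placed below the rows of `top`.
stacked = pd.concat([top, bottom], axis=0)
print(stacked.shape)  # (3, 2)

# Horizontal: the columns of `prices` are placed next to those of `top`.
side_by_side = pd.concat([top, prices], axis=1)
print(side_by_side.shape)  # (2, 3)
```

The row count grows in the first case and the column count grows in the second.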


Loading the Dataset

We continue using the Pandas sales dataset.

import pandas as pd

sales = pd.read_csv("dataplexa_pandas_sales.csv")

Splitting the Dataset

To demonstrate concatenation, let’s split the dataset into two parts.

sales_part1 = sales.iloc[:5]
sales_part2 = sales.iloc[5:]

Now we have two DataFrames that originally belonged together.


Vertical Concatenation (Row-wise)

To stack DataFrames vertically, use axis=0 (which is the default).

combined_rows = pd.concat(
    [sales_part1, sales_part2]
)

combined_rows.head()

Because iloc slices keep their original index labels, this recreates the original dataset, index included.


Resetting the Index

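After concatenation, index values may repeat, for example when each input DataFrame has its own default index starting at 0. (The two parts created above keep their original labels, so they do not collide.) To renumber the rows from 0, use ignore_index=True.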

pd.concat(
    [sales_part1, sales_part2],
    ignore_index=True
)
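
To see repeated index values in practice, here is a small sketch with two toy DataFrames (jan and feb are hypothetical names) that each carry their own default index:

```python
import pandas as pd

# Two toy DataFrames, each with its own default index 0, 1.
jan = pd.DataFrame({"units": [10, 20]})
feb = pd.DataFrame({"units": [30, 40]})

# Without ignore_index, the original labels are kept and therefore repeat.
kept = pd.concat([jan, feb])
print(kept.index.tolist())  # [0, 1, 0, 1]

# With ignore_index=True, the result gets a fresh index 0..n-1.
renumbered = pd.concat([jan, feb], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

With repeated labels, a lookup such as kept.loc[0] returns two rows instead of one, which is usually the reason to renumber.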

Horizontal Concatenation (Column-wise)

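Let’s create another DataFrame with additional information. The ratings below line up one-to-one with the sales rows only if the dataset has exactly eight rows and a default index; any unmatched rows are filled with NaN.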

ratings = pd.DataFrame({
    "rating": [4.5, 4.2, 4.8, 4.0, 4.6, 4.3, 4.1, 4.7]
})

Now concatenate it with the sales DataFrame column-wise.

combined_columns = pd.concat(
    [sales, ratings],
    axis=1
)

combined_columns.head()
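
Horizontal concatenation pairs rows by index label, not by position. One way to make sure a new column lands on the intended rows is to build it with the same index as the existing DataFrame. The sketch below illustrates this with toy data; sales_demo and ratings_demo are hypothetical stand-ins, not the lesson's dataset:

```python
import pandas as pd

# Toy stand-in for the sales data, with a non-default index on purpose.
sales_demo = pd.DataFrame(
    {"product": ["A", "B", "C"], "units": [10, 20, 30]},
    index=[10, 11, 12],
)

# Reusing sales_demo.index guarantees each rating lands on its own row.
ratings_demo = pd.DataFrame(
    {"rating": [4.5, 4.2, 4.8]},
    index=sales_demo.index,
)

combined_demo = pd.concat([sales_demo, ratings_demo], axis=1)
print(combined_demo.shape)  # (3, 3)
print(combined_demo["rating"].isna().any())  # False
```

Had ratings_demo been created with the default index 0, 1, 2, none of its labels would match 10, 11, 12 and the result would contain six rows, each half-filled with NaN.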

Concatenating with Different Column Names

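When labels don’t match, Pandas fills the gaps with NaN. With axis=0 this happens for columns that exist in only one DataFrame; with axis=1 it happens for index values that are missing from one of them. In the example below, extra_data has only three rows, so the discount column is NaN for every sales row beyond the first three.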

extra_data = pd.DataFrame({
    "discount": [10, 15, 5]
})

pd.concat([sales, extra_data], axis=1)
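
The same NaN filling happens when stacking rows whose column names differ. A minimal sketch, assuming two toy DataFrames (store_a and store_b are hypothetical) that share only the product column:

```python
import pandas as pd

# Toy DataFrames that share only the "product" column.
store_a = pd.DataFrame({"product": ["A", "B"], "units": [10, 20]})
store_b = pd.DataFrame({"product": ["C"], "discount": [5]})

# The default join="outer" keeps the union of all columns and
# fills the cells that have no source value with NaN.
mixed = pd.concat([store_a, store_b], ignore_index=True)
print(mixed.columns.tolist())  # ['product', 'units', 'discount']
print(mixed["units"].isna().sum())  # 1
print(mixed["discount"].isna().sum())  # 2

# join="inner" keeps only the columns present in every input.
shared = pd.concat([store_a, store_b], ignore_index=True, join="inner")
print(shared.columns.tolist())  # ['product']
```

Whether the outer or inner behaviour is preferable depends on whether the extra columns carry information worth keeping.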

Concatenating Multiple DataFrames

You can concatenate more than two DataFrames at once.

pd.concat(
    [sales_part1, sales_part2, sales_part1],
    ignore_index=True
)

Difference Between merge() and concat()

  • merge() – combines rows by matching values in key columns (or indexes)
  • concat() – stacks DataFrames along an axis, aligning only on index or column labels

Choosing the correct method depends on how your data is structured.
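
The difference shows up clearly when two tables list the same items in a different order. The following sketch uses toy data (orders, names, and the product_id column are hypothetical):

```python
import pandas as pd

# Same products in both tables, but listed in a different order.
orders = pd.DataFrame({"product_id": [1, 2, 3], "units": [10, 20, 30]})
names = pd.DataFrame({"product_id": [3, 1, 2], "name": ["C", "A", "B"]})

# merge() matches rows on the key column, so each name finds its product.
merged = orders.merge(names, on="product_id")
print(merged["name"].tolist())  # ['A', 'B', 'C']

# concat(axis=1) pairs rows by index label (0, 1, 2) and ignores the
# values in product_id, so the names end up next to the wrong products.
glued = pd.concat([orders, names], axis=1)
print(glued["name"].tolist())  # ['C', 'A', 'B']
```

Note that glued also contains two product_id columns, since concat() does not treat any column as a key.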


Common Concatenation Issues

  • Duplicate indexes
  • Mismatched row counts
  • Unexpected NaN values

Always inspect the result using head() and info().
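
Beyond head() and info(), a few quick checks can catch each of these issues. This is one possible sketch with toy data (left and right are hypothetical), not a complete validation routine:

```python
import pandas as pd

left = pd.DataFrame({"units": [10, 20, 30]})
right = pd.DataFrame({"rating": [4.5, 4.2]})  # one row short on purpose

# Mismatched row counts: compare lengths before concatenating.
same_length = len(left) == len(right)
print(same_length)  # False

result = pd.concat([left, right], axis=1)

# Unexpected NaN values: count missing cells per column.
missing = result.isna().sum()
print(missing["rating"])  # 1

# Duplicate indexes: check after stacking...
stacked = pd.concat([left, left])
print(stacked.index.has_duplicates)  # True

# ...or ask concat() to raise if the indexes overlap.
try:
    pd.concat([left, left], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)  # True
```

verify_integrity=True is useful when repeated index values would indicate a data problem rather than something to renumber away.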


Practice Exercise

Try the following:

  • Split the dataset into three parts
  • Concatenate them back together
  • Add a new column using horizontal concatenation

What’s Next?

In the next lesson, you will learn how to work efficiently with large datasets and optimize memory usage in Pandas.