Concatenation in Pandas
In many data projects, datasets are not merged by matching columns, but instead stacked together or placed side by side.
Pandas provides the concat() function to handle this scenario.
In this lesson, you will learn how to concatenate DataFrames both vertically and horizontally.
What is Concatenation?
Concatenation means joining DataFrames along a particular axis:
- Vertical concatenation – adding rows
- Horizontal concatenation – adding columns
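Unlike merging, concatenation does not match rows on key columns. It lines the inputs up by their index labels (or column labels) along the chosen axis.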
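As a quick illustration of the two directions, here is a minimal sketch. The frames top, bottom, and prices and their columns are made up for this example and are not part of the sales dataset.

```python
import pandas as pd

# Small illustrative frames (hypothetical data, not the lesson's dataset)
top = pd.DataFrame({"product": ["A", "B"], "units": [3, 5]})
bottom = pd.DataFrame({"product": ["C"], "units": [7]})
prices = pd.DataFrame({"price": [9.99, 4.50]})

# Vertical: the rows of `bottom` are placed below the rows of `top`
stacked = pd.concat([top, bottom])

# Horizontal: the columns of `prices` are placed beside the columns of `top`
side_by_side = pd.concat([top, prices], axis=1)

print(stacked.shape)        # rows grow, columns stay the same
print(side_by_side.shape)   # columns grow, rows stay the same
```

Vertical concatenation changes the row count, while horizontal concatenation changes the column count.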
Loading the Dataset
We continue using the Pandas sales dataset.
import pandas as pd
sales = pd.read_csv("dataplexa_pandas_sales.csv")
Splitting the Dataset
To demonstrate concatenation, let’s split the dataset into two parts.
sales_part1 = sales.iloc[:5]
sales_part2 = sales.iloc[5:]
Now we have two DataFrames that originally belonged together.
Vertical Concatenation (Row-wise)
To stack DataFrames vertically, use axis=0 (which is the default).
combined_rows = pd.concat(
[sales_part1, sales_part2]
)
combined_rows.head()
This recreates the original dataset.
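One way to confirm that the round trip is exact is to compare the result with the original. The sketch below uses a small stand-in frame, sales_demo, because the CSV file is not available here; its column names are assumptions.

```python
import pandas as pd

# Stand-in for the sales data; the column names are assumptions
sales_demo = pd.DataFrame({
    "order_id": range(1, 9),
    "amount": [120, 80, 45, 200, 60, 75, 150, 90],
})

part1 = sales_demo.iloc[:5]
part2 = sales_demo.iloc[5:]

rebuilt = pd.concat([part1, part2])

# iloc slices keep their original index labels, so the pieces fit back together
round_trip_ok = rebuilt.equals(sales_demo)
print(round_trip_ok)
```

Because each slice keeps its original index labels, no labels repeat in this particular case.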
Resetting the Index
After concatenation, index values may repeat, because each input keeps its own index labels.
To renumber the rows from 0, use ignore_index=True.
pd.concat(
[sales_part1, sales_part2],
ignore_index=True
)
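Repeated labels are easiest to see when each input starts from its own 0-based index. A minimal sketch, with two hypothetical monthly extracts named jan and feb:

```python
import pandas as pd

# Hypothetical monthly extracts, each with its own 0-based index
jan = pd.DataFrame({"amount": [100, 200]})
feb = pd.DataFrame({"amount": [150, 250]})

kept = pd.concat([jan, feb])                           # labels: 0, 1, 0, 1
renumbered = pd.concat([jan, feb], ignore_index=True)  # labels: 0, 1, 2, 3

print(list(kept.index))
print(list(renumbered.index))
```

With repeated labels, a lookup such as kept.loc[0] returns two rows, which is rarely what you want.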
Horizontal Concatenation (Column-wise)
Let’s create another DataFrame with additional information.
ratings = pd.DataFrame({
"rating": [4.5, 4.2, 4.8, 4.0, 4.6, 4.3, 4.1, 4.7]
})
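Now concatenate it with the sales DataFrame column-wise. Rows are paired by index label, so this example assumes ratings has one value for each row of sales.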
combined_columns = pd.concat(
[sales, ratings],
axis=1
)
combined_columns.head()
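Because axis=1 pairs rows by index label and not by position, a frame whose index was changed by filtering or slicing may not line up as expected. The following sketch uses hypothetical frames orders and scores to show the effect and one possible fix.

```python
import pandas as pd

# Hypothetical frames; `orders` keeps labels 2, 3, 4 after slicing
orders = pd.DataFrame({"amount": [10, 20, 30, 40, 50]}).iloc[2:]
scores = pd.DataFrame({"rating": [4.5, 4.2, 4.8]})   # labels 0, 1, 2

# Labels are matched, not positions: only label 2 exists in both frames
misaligned = pd.concat([orders, scores], axis=1)

# Resetting the index first pairs the rows by position instead
aligned = pd.concat([orders.reset_index(drop=True), scores], axis=1)

print(misaligned.shape)
print(aligned.shape)
```

Calling reset_index(drop=True) is appropriate only when the rows really are in the same order in both frames.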
Concatenating with Different Column Names
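When the inputs do not line up, Pandas fills the gaps with NaN. Here extra_data has only three rows, so with axis=1 any sales row without a matching index label gets NaN in the discount column.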
extra_data = pd.DataFrame({
"discount": [10, 15, 5]
})
pd.concat([sales, extra_data], axis=1)
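The same NaN filling happens when you stack rows whose column names differ. A minimal sketch with two hypothetical frames, store_a and store_b, that share only the product column; the join parameter of concat() controls whether non-shared columns are kept.

```python
import pandas as pd

# Hypothetical frames that share only the "product" column
store_a = pd.DataFrame({"product": ["A", "B"], "amount": [10, 20]})
store_b = pd.DataFrame({"product": ["C"], "discount": [5]})

# Default join="outer": union of the columns, gaps filled with NaN
outer = pd.concat([store_a, store_b], ignore_index=True)

# join="inner": keep only the columns present in every input
inner = pd.concat([store_a, store_b], ignore_index=True, join="inner")

print(list(outer.columns))
print(list(inner.columns))
```

Note that a numeric column that receives NaN is typically converted to a float dtype.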
Concatenating Multiple DataFrames
You can concatenate more than two DataFrames at once.
pd.concat(
[sales_part1, sales_part2, sales_part1],
ignore_index=True
)
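When several frames are stacked, it can help to record which input each row came from. One option is the keys parameter of concat(), sketched here with three hypothetical quarterly frames q1, q2, and q3.

```python
import pandas as pd

# Hypothetical quarterly extracts
q1 = pd.DataFrame({"amount": [100, 200]})
q2 = pd.DataFrame({"amount": [150]})
q3 = pd.DataFrame({"amount": [175, 225]})

# keys= adds an outer index level naming the source of each row
labelled = pd.concat([q1, q2, q3], keys=["q1", "q2", "q3"])

# Rows from one source can then be selected by its key
q3_rows = labelled.loc["q3"]

print(labelled)
```

The keys parameter is an alternative to ignore_index=True; use whichever suits the way you need to look rows up later.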
Difference Between merge() and concat()
- merge() – combines based on keys
- concat() – combines by stacking along an axis, aligning on index or column labels
Choosing the correct method depends on how your data is structured.
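To make the difference concrete, the sketch below applies both functions to the same pair of frames. The frames orders and customers and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical frames; note that customer_id is in a different order in each
orders = pd.DataFrame({"customer_id": [2, 1, 3], "amount": [50, 20, 70]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})

# merge(): rows are paired by matching customer_id values
merged = orders.merge(customers, on="customer_id")

# concat(axis=1): rows are paired by index label, ignoring customer_id
glued = pd.concat([orders, customers], axis=1)

print(merged)
print(glued)
```

In the merged result each order sits next to the right customer name. In the concatenated result the first order (customer 2) sits next to Ann (customer 1), and customer_id appears twice, which suggests merge() is the better fit whenever rows must be matched by a key.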
Common Concatenation Issues
- Duplicate indexes
- Mismatched row counts
- Unexpected NaN values
Always inspect the result using head() and info().
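Beyond head() and info(), a few quick checks can catch these issues early. This is a sketch with two small hypothetical frames, a and b; verify_integrity is an existing concat() parameter that raises an error when index labels overlap.

```python
import pandas as pd

# Hypothetical inputs, each with its own 0-based index
a = pd.DataFrame({"amount": [1, 2]})
b = pd.DataFrame({"amount": [3]})

result = pd.concat([a, b])

# Duplicate indexes: label 0 appears in both inputs
has_duplicate_labels = not result.index.is_unique

# Unexpected NaN values: count missing entries per column
nan_counts = result.isna().sum()

# verify_integrity=True makes concat raise instead of returning duplicates
try:
    pd.concat([a, b], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

print(has_duplicate_labels, raised)
print(nan_counts)
```

Comparing len(result) with the sum of the input lengths is another simple way to spot mismatched row counts.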
Practice Exercise
Try the following:
- Split the dataset into three parts
- Concatenate them back together
- Add a new column using horizontal concatenation
What’s Next?
In the next lesson, you will learn how to work efficiently with large datasets and optimize memory usage in Pandas.