Selecting Data in Pandas
After exploring a DataFrame and understanding its structure, the next essential skill is selecting specific data.
In real-world analysis, you rarely work with the entire dataset at once. Instead, you select the columns, rows, or combinations of both that you need.
Why Data Selection Matters
Selecting data allows you to:
- Focus only on relevant information
- Perform calculations on specific columns
- Filter records for analysis and reporting
- Prepare data for cleaning or visualization
Pandas provides multiple ways to select data, each useful in different situations.
Loading the Dataset
We will continue using the same dataset from previous lessons.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
Selecting a Single Column
The simplest way to select data is by choosing one column.
You can select a column using square brackets:
df["Product"]
This returns a Pandas Series containing only the values from the selected column.
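As a minimal sketch of what that Series looks like in practice, the example below selects one column and runs a calculation on it. The small DataFrame is a hypothetical stand-in for dataplexa_pandas_sales.csv: only the column names come from this lesson, and the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; the values are invented.
df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard"],
    "Sales": [1200.0, 25.0, 75.0],
    "Quantity": [1, 5, 3],
})

# Single brackets with one column name return a Series.
sales = df["Sales"]
print(type(sales).__name__)  # Series
print(sales.name)            # Sales

# A Series supports calculations directly.
total_sales = sales.sum()
print(total_sales)           # 1300.0
```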
Selecting Multiple Columns
To select multiple columns, pass a list of column names.
df[["Product", "Sales", "Quantity"]]
The result is a new DataFrame with only those columns.
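The sketch below illustrates two details of list-based selection: the result follows the order of the names in the list, and a list with a single name still gives a DataFrame rather than a Series. The data is an invented stand-in for the lesson's CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; the values are invented.
df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard"],
    "Sales": [1200.0, 25.0, 75.0],
    "Quantity": [1, 5, 3],
})

# Columns come back in the order given in the list, not the original order.
subset = df[["Quantity", "Product"]]
print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['Quantity', 'Product']

# A one-element list still produces a DataFrame with a single column.
one_col_frame = df[["Sales"]]
print(one_col_frame.shape)    # (3, 1)
```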
Selecting Rows by Index
Sometimes you need specific rows instead of columns.
To select rows by their index position, use iloc.
df.iloc[0]
This selects the first row in the DataFrame.
To select multiple rows:
df.iloc[0:5]
This returns the first five rows.
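To make the return types concrete, here is a small sketch using an invented six-row DataFrame in place of the lesson's dataset. A single position gives one row as a Series, a slice gives a DataFrame, and negative positions count from the end.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; the values are invented.
df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset", "Webcam"],
    "Sales": [1200.0, 25.0, 75.0, 300.0, 60.0, 45.0],
    "Quantity": [1, 5, 3, 2, 4, 2],
})

# One position: a Series whose index holds the column names.
first_row = df.iloc[0]
print(first_row["Product"])  # Laptop

# A slice: a DataFrame; the stop position (5) is not included.
first_five = df.iloc[0:5]
print(len(first_five))       # 5

# Negative positions count from the end, as with Python lists.
last_row = df.iloc[-1]
print(last_row["Product"])   # Webcam
```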
Selecting Rows and Columns Together
You can combine row and column selection using iloc.
df.iloc[0:5, 1:4]
This selects:
- Rows at positions 0 through 4 (the stop position 5 is excluded)
- Columns at positions 1 through 3 (the stop position 4 is excluded)
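The following sketch checks the shape and column names that such a selection produces. The OrderID and Region columns are hypothetical, added only so the frame has enough columns to slice; the real dataset's column layout may differ.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV file. OrderID and Region are
# assumed columns, and all values are invented.
df = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105, 106],
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset", "Webcam"],
    "Sales": [1200.0, 25.0, 75.0, 300.0, 60.0, 45.0],
    "Quantity": [1, 5, 3, 2, 4, 2],
    "Region": ["East", "West", "East", "North", "South", "West"],
})

# Row positions 0-4 and column positions 1-3; both stops are excluded.
block = df.iloc[0:5, 1:4]
print(block.shape)          # (5, 3)
print(list(block.columns))  # ['Product', 'Sales', 'Quantity']
```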
Selecting Data by Labels
Another powerful method is loc, which selects by row and column labels instead of integer positions.
df.loc[0, "Sales"]
This selects the value in the Sales column for the row labeled 0, which is the first row when the DataFrame uses the default integer index.
Selecting multiple rows and columns (a loc slice includes its end label, so 0:4 returns five rows):
df.loc[0:4, ["Product", "Sales"]]
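Because loc and iloc treat the end of a slice differently, the same 0:4 gives a different number of rows with each. A short sketch on invented stand-in data:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; the values are invented.
df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset", "Webcam"],
    "Sales": [1200.0, 25.0, 75.0, 300.0, 60.0, 45.0],
    "Quantity": [1, 5, 3, 2, 4, 2],
})

# loc slices by label and includes the end label: labels 0, 1, 2, 3, 4.
by_label = df.loc[0:4, ["Product", "Sales"]]
print(len(by_label))     # 5

# iloc slices by position and excludes the stop: positions 0, 1, 2, 3.
by_position = df.iloc[0:4]
print(len(by_position))  # 4
```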
Difference Between loc and iloc
- loc uses labels (column names, index labels)
- iloc uses numeric positions
Understanding this difference is critical for accurate data selection.
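With the default index, labels and positions happen to match, so the difference is easy to miss. The sketch below sorts an invented stand-in DataFrame with sort_values so that the labels no longer line up with the positions.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; the values are invented.
df = pd.DataFrame({
    "Product": ["Mouse", "Laptop", "Keyboard"],
    "Sales": [25.0, 1200.0, 75.0],
})

# Sorting reorders the rows, but each row keeps its original index label.
sorted_df = df.sort_values("Sales", ascending=False)
print(list(sorted_df.index))  # [1, 2, 0]

# loc looks up the row labeled 0, which is now the last row.
by_label = sorted_df.loc[0, "Product"]
print(by_label)               # Mouse

# iloc looks up the row at position 0, which is the top row after sorting.
by_position = sorted_df.iloc[0, 0]
print(by_position)            # Laptop
```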
Practice Exercise
Using the dataset:
- Select the Sales column
- Select the Product and Quantity columns
- Select the first 10 rows
- Select rows 5 to 10 and only the Sales column
What’s Next?
Now that you know how to select data, the next lesson will focus on filtering data using conditions, which allows you to work with only specific records.