Exploring DataFrames
Once data is loaded into Pandas, the next critical step is understanding what the data contains.
In this lesson, you will learn how to explore a DataFrame to understand its structure, columns, data types, and overall quality.
Why Data Exploration Is Important
Before performing analysis or cleaning, you must first understand the data you are working with.
Exploring data helps you:
- Identify available columns
- Understand data types
- Detect missing or incorrect values
- Avoid mistakes during analysis
Loading the Dataset
Start by loading the dataset that you downloaded from the Dataplexa datasets page.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
Once loaded, the data is stored in a DataFrame called df.
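If you do not have the CSV file at hand, the same loading step can be tried on a small in-memory sample. This is only a sketch: the column names and values below are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = (
    "order_id,product,price\n"
    "1001,Laptop,899.99\n"
    "1002,Mouse,19.50\n"
)

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df)
```

Either way, the result is a DataFrame, so the rest of the lesson works the same on it.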
Viewing Column Names
To see all column names in the DataFrame, use:
df.columns
This helps you understand what information is available and how each column is named.
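As a minimal sketch, here is df.columns on a small stand-in DataFrame. The column names are assumptions for illustration; your dataset will have its own.

```python
import pandas as pd

# Hypothetical stand-in for the sales dataset; column names are assumed.
df = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "product": ["Laptop", "Mouse", "Monitor"],
    "price": [899.99, 19.50, 149.00],
})

# df.columns is an Index object; tolist() turns it into a plain Python list.
column_names = df.columns.tolist()
print(column_names)

# Membership checks work directly on the Index.
has_price = "price" in df.columns
print(has_price)
```

Converting to a list is optional, but it is handy when you want to loop over the names or compare them with an expected set of columns.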
Checking Data Types
Each column in a DataFrame has a data type (number, text, date, etc.).
To check data types, use:
df.dtypes
This is important because many operations depend on correct data types.
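The sketch below shows one way a wrong data type can get in the way. It assumes a hypothetical price column that was read in as text, and uses pd.to_numeric to convert it before calculating an average.

```python
import pandas as pd

# Hypothetical data: the prices are stored as text, not as numbers.
df = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "product": ["Laptop", "Mouse", "Monitor"],
    "price": ["899.99", "19.50", "149.00"],
})

print(df.dtypes)

# Check whether the column is numeric before doing arithmetic on it.
was_numeric = pd.api.types.is_numeric_dtype(df["price"])

# Convert the text values to numbers, then compute the average.
df["price"] = pd.to_numeric(df["price"])
average_price = df["price"].mean()

print(df.dtypes)
print(average_price)
```

Comparing the two df.dtypes printouts shows the price column changing from a text type to a floating-point type.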
Basic Information About the DataFrame
The info() method provides a quick summary of the dataset.
df.info()
It shows:
- Number of rows and columns
- Column names
- Data types
- Non-null value counts
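Note that info() prints its summary rather than returning it. If you want the text as a string, for example to save it in a log, one possible approach is to pass a buffer through the buf parameter. The data below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "product": ["Laptop", "Mouse", "Monitor"],
    "price": [899.99, None, 149.00],
})

# Write the summary into a text buffer instead of printing it directly.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```

In the printed summary, the non-null count for price is lower than the number of rows, which is a first hint that the column has missing values.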
Statistical Summary of Data
For numeric columns, Pandas can generate summary statistics using describe().
df.describe()
This includes:
- Count
- Mean
- Standard deviation
- Minimum and maximum values
- Quartiles
This helps you quickly understand distributions and ranges.
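A short sketch of describe() on made-up data follows. The result is itself a DataFrame, so individual statistics can be looked up with .loc. The column names are assumptions.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Monitor", "Keyboard"],
    "quantity": [2, 5, 1, 4],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# By default, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# Pick out a single statistic by row label and column name.
mean_price = summary.loc["mean", "price"]
max_quantity = summary.loc["max", "quantity"]
print(mean_price, max_quantity)
```

The text column product does not appear in the summary, because the default behavior skips non-numeric columns when numeric ones are present.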
Checking for Missing Values
Missing data is common in real-world datasets.
To check how many missing values exist in each column:
df.isnull().sum()
This allows you to decide whether data needs cleaning, which you will learn in later lessons.
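Here is a small sketch of the missing-value check on made-up data. isnull() marks each missing cell as True, and sum() counts the True values in each column.

```python
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "product": ["Laptop", None, "Monitor", "Mouse"],
    "price": [899.99, 19.50, None, None],
})

# Count the missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Add up the per-column counts to get the total for the whole DataFrame.
total_missing = int(missing_per_column.sum())
print(total_missing)
```

In this sample, product has one missing value and price has two.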
Understanding Dataset Shape
To confirm how large the dataset is, use:
df.shape
This returns a tuple containing:
- Total number of rows
- Total number of columns
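Because shape is a pair of numbers, it can be unpacked into two variables. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: 3 rows and 2 columns.
df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Monitor"],
    "price": [899.99, 19.50, 149.00],
})

# shape is an attribute, not a method, so it is used without parentheses.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

If you only need the row count, len(df) gives the same number as the first value of df.shape.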
Practice Exercise
Using the dataset:
- List all column names
- Check data types of each column
- Display basic information using info()
- Generate summary statistics
What’s Next?
Now that you understand the structure of your dataset, the next lesson will teach you how to select specific columns and rows from a DataFrame.