Reading Data into Pandas
In real-world data analysis, data rarely starts inside your code. Most of the time, it comes from external sources such as CSV files, Excel workbooks, or databases.
In this lesson, you will learn how to load external data files into Pandas so you can begin analyzing them.
Why Reading Data Is Important
Almost every data project follows this flow:
- Receive data from a file or source
- Load it into Pandas
- Explore and clean the data
- Analyze and transform the data
If you cannot read data correctly, all later analysis becomes unreliable.
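The four steps above can be sketched end to end on a tiny dataset. This is a minimal illustration under stated assumptions: the column names (region, sales) and values are hypothetical, io.StringIO stands in for a real file, and dropping missing values and summing per region are just one possible choice of cleaning and analysis.

```python
import io

import pandas as pd

# Step 1: "receive" data. A small hypothetical CSV held in memory stands in for a file.
raw = io.StringIO(
    "region,sales\n"
    "North,100\n"
    "South,\n"
    "North,50\n"
)

# Step 2: load it into pandas.
df = pd.read_csv(raw)

# Step 3: explore and clean (here: drop rows whose sales value is missing).
print(df.head())
clean = df.dropna(subset=["sales"])

# Step 4: analyze (here: total sales per region).
totals = clean.groupby("region")["sales"].sum()
print(totals)
```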
Common File Formats Used in Pandas
Pandas supports many data formats. The most commonly used are:
- CSV (Comma-Separated Values)
- Excel files (.xlsx)
- JSON files
- Plain text files (for example, tab-delimited)
In this course, we will primarily work with CSV files because they are simple, readable in any text editor, and widely used in industry.
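Each format has its own reader function. A minimal sketch, using small hypothetical in-memory strings instead of real files: pd.read_csv for CSV, pd.read_json for JSON, and pd.read_csv with sep="\t" for a tab-delimited text file. Excel files are read with pd.read_excel, which also needs an engine such as openpyxl installed for .xlsx files, so it is left out here.

```python
import io

import pandas as pd

# CSV: comma-separated values
csv_df = pd.read_csv(io.StringIO("product,units\nA,3\nB,5\n"))

# JSON: a list of records, one object per row
json_df = pd.read_json(
    io.StringIO('[{"product": "A", "units": 3}, {"product": "B", "units": 5}]')
)

# Plain text: a tab-delimited file, read with read_csv and a custom separator
txt_df = pd.read_csv(io.StringIO("product\tunits\nA\t3\nB\t5\n"), sep="\t")

print(csv_df)
print(json_df)
print(txt_df)
```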
Downloading the Course Dataset
To follow along with this course, you should download the official Pandas practice dataset provided by Dataplexa.
This dataset will be used across multiple lessons so you can see how Pandas operations work on real data.
Download the dataset file and place it in your working directory (the folder your script or notebook runs from) before proceeding.
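A wrong path is a common reason read_csv fails, so it can help to confirm the file is where Python will look for it. A minimal check using the standard library's pathlib, assuming the dataset keeps the filename used in this lesson:

```python
from pathlib import Path

# The directory that relative paths are resolved against
print("Working directory:", Path.cwd())

# Check whether the dataset file is present there
data_file = Path("dataplexa_pandas_sales.csv")
if data_file.exists():
    print("Found:", data_file.resolve())
else:
    print("Not found: move the file into the working directory or use its full path")
```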
Reading a CSV File Using Pandas
Once the dataset is downloaded, you can load it using the read_csv() function.
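import pandas as pd
# Load the CSV file (expected in the working directory) into a DataFrame
df = pd.read_csv("dataplexa_pandas_sales.csv")
# Display the DataFrame (pandas truncates the output for large datasets)
print(df)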
This command reads the CSV file and stores it as a DataFrame.
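To try read_csv without the course file, here is a minimal sketch that reads a few hypothetical sales rows from an in-memory string. io.StringIO behaves like an open text file, so read_csv accepts it; the column names are made up and do not describe the real course dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real course dataset has its own columns
sample = io.StringIO(
    "order_id,product,quantity,price\n"
    "1,Laptop,2,899.99\n"
    "2,Mouse,5,19.50\n"
    "3,Monitor,1,179.00\n"
)

df = pd.read_csv(sample)

# The result is a DataFrame with one column per CSV field
print(isinstance(df, pd.DataFrame))
print(df)
```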
Understanding What Happens Internally
When Pandas reads a CSV file:
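- Each column in the file becomes a DataFrame column, named from the header (the first line) by default
- Each remaining line becomes a row, labeled with a default integer index starting at 0
- Data types (integers, floats, text) are inferred automatically from the values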
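This automatic behavior is convenient, but the inferred types are not always what you intended (for example, codes with leading zeros can be read as integers, and dates are read as plain text unless you ask pandas to parse them), so always inspect the result.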
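A small sketch of why inspection matters, using hypothetical columns: store_code has leading zeros and is inferred as an integer, so the zeros are lost, while date stays as text. Checking df.dtypes reveals this, and read_csv's dtype and parse_dates parameters let you override the inference.

```python
import io

import pandas as pd

csv_text = (
    "store_code,date,revenue\n"
    "007,2024-01-15,250.5\n"
    "012,2024-01-16,310.0\n"
)

# Default inference: store_code becomes an integer (leading zeros lost), date stays text
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
print(df["store_code"].tolist())  # [7, 12]

# Override the inference explicitly
df_fixed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"store_code": str},
    parse_dates=["date"],
)
print(df_fixed.dtypes)
print(df_fixed["store_code"].tolist())  # ['007', '012']
```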
Viewing the First Few Rows
Printing a large DataFrame in full is rarely useful; pandas truncates long output and shows only a few rows from the start and end.
Instead, preview a few rows using head().
print(df.head())
By default, this shows the first 5 rows of the dataset.
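head() also accepts the number of rows to show. A minimal sketch on a hypothetical 100-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows
df = pd.DataFrame({
    "order_id": range(1, 101),
    "amount": [x * 1.5 for x in range(100)],
})

print(df.head())    # first 5 rows (the default)
print(df.head(10))  # first 10 rows
```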
Viewing the Last Few Rows
To inspect the end of the dataset, use tail().
print(df.tail())
This is useful for confirming that the whole file was loaded and, if the data is sorted by date, for checking the most recent records.
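Like head(), tail() takes an optional row count. Comparing the last rows with the final record you expect in the file is a quick completeness check; a sketch on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 100 orders
df = pd.DataFrame({
    "order_id": range(1, 101),
    "amount": [x * 1.5 for x in range(100)],
})

last = df.tail(3)  # last 3 rows
print(last)

# The final order_id should match the last record you expect in the file
print(last["order_id"].iloc[-1])
```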
Checking Dataset Size
Knowing how large your dataset is helps you plan your analysis and anticipate performance or memory limits.
print(df.shape)
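Note that shape is an attribute, not a method, so it is written without parentheses. It returns a tuple with two values, in this order:
- Total number of rows
- Total number of columns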
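Because shape is a tuple, it can be unpacked into two variables. A sketch using a small hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Monitor"],
    "quantity": [2, 5, 1],
})

print(df.shape)  # (3, 2)

# Unpack the tuple into separate variables
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# len(df) is a common shortcut for the row count
print(len(df) == n_rows)
```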
Common CSV Reading Issues
While reading CSV files, you may encounter:
- Incorrect file paths
- Missing headers
- Encoding errors
- Extra separators
Pandas provides additional parameters to handle these cases, which you will learn in later lessons.
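As a brief preview, here is a hedged sketch of how some of these issues are commonly handled with read_csv: a missing file raises FileNotFoundError, a file without a header row is read with header=None plus names, a semicolon-delimited file uses sep, and a non-UTF-8 file needs the matching encoding. All data is hypothetical and created in memory, and the filename in the path example is a made-up placeholder.

```python
import io

import pandas as pd

# Incorrect path: read_csv raises FileNotFoundError for a missing file
try:
    pd.read_csv("no_such_file.csv")  # hypothetical filename that does not exist
except FileNotFoundError as err:
    print("Path problem:", err)

# Missing headers: tell pandas there is no header row and supply column names
no_header = io.StringIO("Laptop,2\nMouse,5\n")
df1 = pd.read_csv(no_header, header=None, names=["product", "quantity"])

# A different separator: a semicolon-delimited file
semicolon = io.StringIO("product;quantity\nLaptop;2\nMouse;5\n")
df2 = pd.read_csv(semicolon, sep=";")

# Encoding: bytes written in Latin-1 (not UTF-8) must be decoded accordingly
latin1_bytes = "product,quantity\nCafé,3\n".encode("latin-1")
df3 = pd.read_csv(io.BytesIO(latin1_bytes), encoding="latin-1")

print(df1, df2, df3, sep="\n\n")
```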
Practice Exercise
Download the course dataset and load it into Pandas.
- Display the first 5 rows
- Display the last 5 rows
- Check the shape of the dataset
Expected Outcome
You should be comfortable loading external data files and previewing their contents.
What’s Next?
In the next lesson, you will learn how to explore DataFrames using built-in functions to understand columns, data types, and summary statistics.