Pandas Lesson 3 – Reading Data | Dataplexa

Reading Data into Pandas

In real-world data analysis, data rarely starts inside your code. Most of the time, it comes from external sources such as CSV files, Excel workbooks, or databases.

In this lesson, you will learn how to load external data files into Pandas so you can begin analyzing them.


Why Reading Data Is Important

Almost every data project follows this flow:

  • Receive data from a file or source
  • Load it into Pandas
  • Explore and clean the data
  • Analyze and transform the data

If you cannot read data correctly, all later analysis becomes unreliable.


Common File Formats Used in Pandas

Pandas supports many data formats. The most commonly used are:

  • CSV (Comma-Separated Values)
  • Excel files (.xlsx)
  • JSON files
  • Text files

In this course, we will primarily work with CSV files because they are simple, readable in any text editor, and widely used in industry.
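As a rough illustration of the formats listed above, the sketch below reads tiny in-memory samples instead of real files. The sample text and column names are made up for this example and are not part of the course dataset.

```python
import io
import pandas as pd

# Hypothetical in-memory samples standing in for real files.
csv_text = "product,units\nPen,10\nBook,3\n"
json_text = '[{"product": "Pen", "units": 10}, {"product": "Book", "units": 3}]'
tsv_text = "product\tunits\nPen\t10\nBook\t3\n"

df_csv = pd.read_csv(io.StringIO(csv_text))             # CSV
df_json = pd.read_json(io.StringIO(json_text))          # JSON
df_text = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-delimited text

# Excel files are read with pd.read_excel("file.xlsx"), which needs an
# optional engine such as openpyxl to be installed.

print(df_csv.shape, df_json.shape, df_text.shape)
```

All three calls return a DataFrame, so the rest of your analysis code does not depend on the original file format.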


Downloading the Course Dataset

To follow along with this course, you should download the official Pandas practice dataset provided by Dataplexa.

This dataset will be used across multiple lessons so you can see how Pandas operations work on real data.

Download the dataset file and place it in your working directory before proceeding.


Reading a CSV File Using Pandas

Once the dataset is downloaded, you can load it using the read_csv() function.

import pandas as pd

# Load the CSV file from the working directory into a DataFrame
df = pd.read_csv("dataplexa_pandas_sales.csv")
print(df)

This command reads the CSV file and stores it as a DataFrame.
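If you do not have the course file at hand yet, the following minimal sketch writes a small CSV to a temporary folder and reads it back the same way. The file name sample_sales.csv and its columns are hypothetical and only stand in for the real dataset.

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample data; the real course file has its own columns.
sample = "order_id,product,amount\n1,Pen,2.50\n2,Book,12.00\n3,Lamp,30.25\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_sales.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # read_csv takes a file path and returns a DataFrame
    sample_df = pd.read_csv(path)

print(type(sample_df).__name__)
print(sample_df)
```

The DataFrame lives in memory once it is loaded, so it remains usable after the temporary file is removed.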


Understanding What Happens Internally

When Pandas reads a CSV file:

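  • The first line is used as the column names by default
  • Each column in the file becomes a DataFrame column
  • Each remaining line becomes a row (record)
  • Data types are inferred automatically for each column

This automatic inference is convenient, but it can guess wrong (for example, dates are read as plain text unless you ask Pandas to parse them), so inspect the result before analyzing it.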
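The small sketch below shows type inference at work on made-up data. The column names and values are assumptions for illustration only; parse_dates is a standard read_csv() parameter.

```python
import io
import pandas as pd

# Hypothetical sample: the column names and values are made up.
raw = "order_date,units,price\n2024-01-05,3,9.99\n2024-01-06,1,4.50\n"

inferred = pd.read_csv(io.StringIO(raw))
print(inferred.dtypes)   # order_date is read as text, not as a date

# Asking Pandas to parse the column as dates changes its type.
parsed = pd.read_csv(io.StringIO(raw), parse_dates=["order_date"])
print(parsed.dtypes)
```

Here units is inferred as an integer column and price as a floating-point column, while order_date only becomes a datetime column when you request it.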


Viewing the First Few Rows

Printing an entire large dataset is rarely useful, and Pandas truncates long output anyway. Instead, preview a few rows using head().

df.head()

By default, this shows the first 5 rows of the dataset.
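head() also accepts the number of rows to show. A minimal sketch, using a small hypothetical DataFrame in place of the course dataset:

```python
import pandas as pd

# Hypothetical 8-row DataFrame standing in for the course dataset.
demo = pd.DataFrame({"order_id": range(1, 9), "units": [3, 1, 4, 1, 5, 9, 2, 6]})

print(demo.head())    # first 5 rows (default)
print(demo.head(3))   # first 3 rows
```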


Viewing the Last Few Rows

To inspect the end of the dataset, use tail().

df.tail()

This is useful for checking the most recently added records or spotting an incomplete final row.
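As an illustration of a completeness check, the sketch below reads made-up data whose last line is missing a value and uses tail() to reveal it. The data and column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file whose last line is incomplete (no units value).
raw = "order_id,product,units\n1,Pen,3\n2,Book,1\n3,Lamp,2\n4,Desk,\n"
orders = pd.read_csv(io.StringIO(raw))

print(orders.tail(2))                   # last 2 rows
print(orders.tail(2)["units"].isna())   # True marks a missing value
```

Like head(), tail() shows 5 rows by default and accepts a row count.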


Checking Dataset Size

Knowing how large your dataset is helps you plan your analysis and anticipate performance issues.

df.shape

This returns a tuple of two values, in this order:

  • Total number of rows
  • Total number of columns
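Because the result is a tuple, it can be unpacked into two variables. A minimal sketch with a hypothetical three-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns.
small = pd.DataFrame({"order_id": [1, 2, 3], "units": [3, 1, 4]})

n_rows, n_cols = small.shape   # shape is an attribute, so no parentheses
print(n_rows, n_cols)          # prints: 3 2
```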


Common CSV Reading Issues

While reading CSV files, you may encounter:

  • Incorrect file paths
  • Missing headers
  • Encoding errors
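  • Extra or unexpected separators (for example, semicolons instead of commas)

Pandas provides additional read_csv() parameters, such as header, encoding, and sep, to handle these cases. You will learn them in later lessons.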
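As a brief preview, the sketch below shows one possible way to handle each of the issues listed above. The data, column names, and file name are all hypothetical; sep, header, names, and encoding are standard read_csv() parameters.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with no header row.
raw = "1;Pen;3\n2;Book;1\n"
fixed = pd.read_csv(
    io.StringIO(raw),
    sep=";",                                  # non-comma separator
    header=None,                              # the data has no header row
    names=["order_id", "product", "units"],   # supply column names
)
print(fixed)

# Hypothetical bytes saved in Latin-1 rather than UTF-8.
raw_bytes = "name\ncafé\n".encode("latin-1")
latin = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(latin)

# An incorrect path raises FileNotFoundError.
try:
    pd.read_csv("no_such_file_12345.csv")
    path_error = False
except FileNotFoundError as exc:
    path_error = True
    print("File not found:", exc)
```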


Practice Exercise

Exercise

Download the course dataset and load it into Pandas.

  • Display the first 5 rows
  • Display the last 5 rows
  • Check the shape of the dataset

Expected Outcome

You should be comfortable loading external data files and previewing their contents.


What’s Next?

In the next lesson, you will learn how to explore DataFrames using built-in functions to understand columns, data types, and summary statistics.