Structured Arrays in NumPy
So far, we have worked with NumPy arrays that store values of a single data type. However, real-world data often contains multiple types such as numbers, strings, and dates in the same dataset.
Structured arrays allow NumPy to store different data types together, similar to rows in a table or records in a database.
What Is a Structured Array?
A structured array is a NumPy array where each element is a record containing multiple named fields with different data types.
You can think of it as a lightweight table with fixed columns.
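As a quick illustration of the "fixed columns" idea, the sketch below builds a small structured array and inspects its layout. The field names `label` and `x` are hypothetical and chosen only for this example; the dtype syntax is explained in the next section.

```python
import numpy as np

# Hypothetical two-field record layout, used only for illustration.
point_dtype = np.dtype([('label', 'U5'), ('x', 'f8')])

# Three zero-initialised records (empty strings and 0.0 values).
points = np.zeros(3, dtype=point_dtype)

field_names = point_dtype.names   # the fixed "columns" of every record
shape = points.shape              # one dimension: a sequence of records

print(field_names, shape)
```

Note that the array is one-dimensional: the fields live inside each element rather than forming a second axis.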
Creating a Structured Array
To create a structured array, you must define a data type (dtype) that specifies field names and their types.
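import numpy as np
# Each field is (name, type): 10-character Unicode string, 32-bit int, 64-bit float
dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
# Each record is a tuple whose values follow the field order
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)
print(employees)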
Output:
[('Alice', 30, 70000. )
('Bob', 25, 55000.5)
('Charlie', 35, 82000. )]
Each row is a record with named fields.
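The same kind of dtype can also be written in dictionary form, and an array can be allocated first and filled afterwards. The following is a minimal sketch of that alternative; the names `employee_dtype` and `staff` and the sample values are assumptions made for illustration.

```python
import numpy as np

# Same fields as the list-of-tuples form, written as a dictionary spec.
employee_dtype = np.dtype({
    'names': ['name', 'age', 'salary'],
    'formats': ['U10', 'i4', 'f8'],
})

# Allocate two empty records, then fill them one field at a time.
staff = np.zeros(2, dtype=employee_dtype)
staff['name'] = ['Dana', 'Eli']        # hypothetical sample values
staff['age'] = [41, 29]
staff['salary'] = [91000.0, 48000.0]

print(staff)
```

Filling by field can be convenient when the data already arrives as separate columns.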
Accessing Individual Fields
You can access a specific column by indexing the array with its field name.
print(employees['name'])
print(employees['salary'])
Output:
['Alice' 'Bob' 'Charlie']
[70000. 55000.5 82000. ]
This behaves like selecting a column in a table.
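A field selected this way is an ordinary NumPy array, so the usual operations apply to it. The sketch below rebuilds the `employees` array so that it runs on its own, then shows three things: a single field is a view into the parent array, a list of field names selects several columns, and reductions such as `mean()` work on a field.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0),
], dtype=dtype)

# A single field is a view: writing to it updates the parent array.
ages = employees['age']
ages += 1

# A list of field names selects several columns at once.
name_and_age = employees[['name', 'age']]

# Fields support the usual NumPy reductions.
mean_salary = employees['salary'].mean()

print(employees['age'], name_and_age.dtype.names, mean_salary)
```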
Accessing Individual Records
You can also access full rows using integer indexing.
print(employees[1])
Output:
('Bob', 25, 55000.5)
This returns a single record containing all fields.
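A record can itself be indexed by field name, and a whole record can be replaced by assigning a tuple. Here is a small self-contained sketch; the updated values for Alice are made up for the example.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0),
], dtype=dtype)

# Index a record, then pick one field out of it.
record = employees[1]
bob_name = record['name']
bob_age = int(record['age'])

# Replace a whole record by assigning a tuple in field order.
employees[0] = ('Alice', 31, 72000.0)   # hypothetical updated values

print(bob_name, bob_age, employees[0])
```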
Filtering Structured Arrays
Structured arrays support boolean filtering just like normal NumPy arrays.
high_salary = employees[employees['salary'] > 60000]
print(high_salary)
Output:
[('Alice', 30, 70000.)
('Charlie', 35, 82000.)]
This is useful for rule-based data selection.
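Conditions on different fields can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses. The following sketch assumes the same `employees` data as above and selects people who earn more than 60000 and are younger than 35.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0),
], dtype=dtype)

# Build one boolean mask from two field-wise conditions.
mask = (employees['salary'] > 60000) & (employees['age'] < 35)

selected = employees[mask]
selected_names = selected['name']

print(selected_names)
```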
Sorting Structured Arrays
You can sort structured arrays using a specific field.
sorted_by_age = np.sort(employees, order='age')
print(sorted_by_age)
Output:
[('Bob', 25, 55000.5)
('Alice', 30, 70000. )
('Charlie', 35, 82000. )]
np.sort returns a sorted copy and leaves the original employees array unchanged. Sorting by a named field is another way structured arrays behave like tables.
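The `order` argument also accepts a list of fields, which breaks ties using the later fields. Descending order is not available through `order`, so one common workaround is to reverse the result of `np.argsort` on a single field. The sketch below assumes an extra record, `Dave`, added only to create a tie on age.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
team = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0),
    ('Dave', 25, 61000.0),    # hypothetical record that ties with Bob on age
], dtype=dtype)

# Sort by age first, then by salary where ages are equal.
by_age_then_salary = np.sort(team, order=['age', 'salary'])

# Descending sort on one field: reverse the ascending index order.
descending_index = np.argsort(team['salary'])[::-1]
by_salary_desc = team[descending_index]

print(by_age_then_salary['name'])
print(by_salary_desc['name'])
```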
Why Use Structured Arrays?
- Efficient storage of mixed data types
- Typically faster and more compact than Python lists of dictionaries for column-wise operations
- Good for fixed-format data
- Works well with NumPy operations
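To make the comparison with a list of dictionaries concrete, the sketch below stores the same three records both ways. It does not measure speed; it only shows that the structured array packs each record into a fixed number of bytes and that a column total needs no explicit Python loop. The variable names are assumptions for this example.

```python
import numpy as np

# The same data as a plain Python list of dictionaries.
records = [
    {'name': 'Alice', 'age': 30, 'salary': 70000.0},
    {'name': 'Bob', 'age': 25, 'salary': 55000.5},
    {'name': 'Charlie', 'age': 35, 'salary': 82000.0},
]

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array(
    [(r['name'], r['age'], r['salary']) for r in records],
    dtype=dtype,
)

# Column total: explicit Python loop versus one vectorised call.
total_loop = sum(r['salary'] for r in records)
total_vectorised = employees['salary'].sum()

# Each record occupies 40 + 4 + 8 = 52 bytes (U10 uses 4 bytes per character).
bytes_per_record = employees.dtype.itemsize
total_bytes = employees.nbytes

print(total_loop, total_vectorised, bytes_per_record, total_bytes)
```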
Limitations of Structured Arrays
Structured arrays are not as flexible as Pandas DataFrames.
- Field structure is fixed
- No built-in missing value handling
- Limited string operations
For heavy data analysis, Pandas is usually a better choice.
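Two of these limitations are easy to see in a short sketch. Fixed-width string fields silently cut off longer values, and adding a field means building a new dtype and copying the existing columns across. The names `people`, `wider_dtype`, and `wider` are hypothetical, and the manual copy shown here is just one possible approach.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4')]

# 'U10' keeps at most 10 characters; the rest is dropped without warning.
people = np.array([('Alexandria-Long', 40)], dtype=dtype)
stored_name = people['name'][0]

# Adding a field: define a wider dtype, allocate, then copy each column.
wider_dtype = dtype + [('salary', 'f8')]
wider = np.zeros(people.shape, dtype=wider_dtype)
for field in people.dtype.names:
    wider[field] = people[field]

print(stored_name, wider.dtype.names)
```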
Practice Exercise
Task
- Create a structured array with fields: product, price, quantity
- Add at least three records
- Filter products with price above 100
- Sort by quantity
What’s Next?
In the next lesson, you will learn about NumPy memory layout and how arrays are stored in memory for performance optimization.