NumPy Lesson 24 – Structured Arrays | Dataplexa

Structured Arrays in NumPy

So far, we have worked with NumPy arrays that store values of a single data type. However, real-world data often contains multiple types such as numbers, strings, and dates in the same dataset.

Structured arrays allow NumPy to store different data types together, similar to rows in a table or records in a database.


What Is a Structured Array?

A structured array is a NumPy array where each element is a record containing multiple named fields with different data types.

You can think of it as a lightweight table with fixed columns.


Creating a Structured Array

To create a structured array, you must define a data type (dtype) that specifies field names and their types.

import numpy as np

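# Each field is (name, type code): 'U10' = Unicode string of up to 10 characters,
# 'i4' = 32-bit integer, 'f8' = 64-bit float
dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]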

employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

print(employees)

Output:

[('Alice', 30, 70000. ) ('Bob', 25, 55000.5) ('Charlie', 35, 82000. )]

Each row is a record with named fields.
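To check the layout NumPy built from that dtype list, you can inspect the array's dtype attribute. The sketch below rebuilds the same employees array so it runs on its own. It then prints the field names and compares the list form of the dtype with NumPy's equivalent dictionary form (the variable name dict_dtype is chosen for this example).

```python
import numpy as np

# Same field layout as above, declared as a list of (name, type) tuples
dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

print(employees.dtype.names)   # ('name', 'age', 'salary')
print(employees.shape)         # (3,): one entry per record
print(employees.dtype['age'])  # int32: the type of a single field

# Equivalent dictionary form: parallel lists of field names and formats
dict_dtype = np.dtype({'names': ['name', 'age', 'salary'],
                       'formats': ['U10', 'i4', 'f8']})
print(dict_dtype == employees.dtype)  # True
```

Both forms describe the same record layout, so you can use whichever reads more clearly in your code.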


Accessing Individual Fields

You can access a specific column by using the field name.

print(employees['name'])
print(employees['salary'])

Output:

['Alice' 'Bob' 'Charlie']
[70000.  55000.5 82000. ]

This behaves like selecting a column in a table.
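One detail worth knowing: a field selected by name is a view into the original array, not a copy. The sketch below shows that writing through such a view changes the original records. It also shows that a list of field names selects several columns at once, which in recent NumPy versions (1.16 and later) also returns a view. The new salary value is made up for illustration.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

# A single field is a view: writing to it changes the original array
salaries = employees['salary']
salaries[1] = 60000.0            # illustrative new value for Bob
print(employees[1])

# A list of field names selects several columns at once
subset = employees[['name', 'salary']]
print(subset.dtype.names)        # ('name', 'salary')

# Call .copy() when you want an independent column
ages = employees['age'].copy()
ages += 1
print(employees['age'])          # [30 25 35]: original ages are unchanged
```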


Accessing Individual Records

You can also access full rows using indexing.

print(employees[1])

Output:

('Bob', 25, 55000.5)

This returns a single record containing all fields.
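The returned record is a structured scalar of type numpy.void. Its fields can be read by name, and, unlike most NumPy scalars, it behaves like a view into the original array. The sketch below reads one field, updates another (the new age is illustrative), and converts the record to a plain Python tuple with .item().

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

record = employees[1]            # a structured scalar (numpy.void)
print(type(record))
print(record['age'])             # 25: read one field from the record

# Structured scalars act like views, so this updates employees itself
record['age'] = 26
print(employees['age'])          # [30 26 35]

# .item() converts the record into an ordinary Python tuple
print(record.item())             # ('Bob', 26, 55000.5)
```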


Filtering Structured Arrays

Structured arrays support boolean filtering just like normal NumPy arrays.

high_salary = employees[employees['salary'] > 60000]
print(high_salary)

Output:

[('Alice', 30, 70000.) ('Charlie', 35, 82000.)]

Any comparison on a field produces a boolean mask, so you can select the records that satisfy a rule, such as a salary threshold.
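Rules on several fields can be combined. The sketch below joins two masks with & (and), inverts a mask with ~ (not), and counts matches with sum(). The age limit of 35 is an arbitrary example value. Each comparison needs its own parentheses because & binds more tightly than > and <.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

# Each comparison yields a boolean array; & keeps rows where both are True
mask = (employees['salary'] > 60000) & (employees['age'] < 35)

print(employees[mask]['name'])   # ['Alice']
print(employees[~mask]['name'])  # ['Bob' 'Charlie']: rows that fail the rule
print(mask.sum())                # 1: number of matching records
```

Use | instead of & when a record should match if either condition holds.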


Sorting Structured Arrays

You can sort structured arrays using a specific field.

sorted_by_age = np.sort(employees, order='age')
print(sorted_by_age)

Output:

[('Bob', 25, 55000.5) ('Alice', 30, 70000. ) ('Charlie', 35, 82000. )]

Sorting by fields makes structured arrays behave like tables.
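The order argument also accepts a list of fields, where later fields break ties in earlier ones. np.sort always sorts in ascending order, so a common way to get descending order is to reverse the result. The sketch below adds one hypothetical record, Dana, so that two employees share an age. It also shows np.argsort, which returns the sorted positions instead of the sorted records.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0),
    ('Dana', 30, 65000.0),       # hypothetical extra record: ties with Alice on age
], dtype=dtype)

# Sort by age, then by salary to break ties
by_age_salary = np.sort(employees, order=['age', 'salary'])
print(by_age_salary['name'])     # ['Bob' 'Dana' 'Alice' 'Charlie']

# Reverse an ascending sort to get descending salaries
by_salary_desc = np.sort(employees, order='salary')[::-1]
print(by_salary_desc['name'])    # ['Charlie' 'Alice' 'Dana' 'Bob']

# argsort returns row positions, which can reorder any matching array
idx = np.argsort(employees['salary'])
print(employees['name'][idx])    # ['Bob' 'Dana' 'Alice' 'Charlie']
```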


Why Use Structured Arrays?

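  • Compact storage of mixed data types: every record has the same fixed size in one contiguous block of memory
  • Vectorized operations on a field are usually faster than looping over a Python list of dictionaries
  • Good for fixed-format data where every record has the same fields
  • Each field is a regular NumPy array, so functions such as np.mean and np.max work on it directly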
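The storage and vectorization points above can be checked directly. In the sketch below, itemsize reports the bytes per record. NumPy stores each 'U' character in 4 bytes, so 'U10' takes 40 bytes, plus 4 for 'i4' and 8 for 'f8'. The last two lines apply ordinary NumPy reductions to single fields.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

# 40 bytes (U10) + 4 bytes (i4) + 8 bytes (f8) = 52 bytes per record
print(employees.dtype.itemsize)  # 52
print(employees.nbytes)          # 156 for three records

# Each field is an ordinary NumPy array, so vectorized functions apply directly
print(employees['salary'].mean())
print(employees['age'].max())    # 35
```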

Limitations of Structured Arrays

Structured arrays are not as flexible as Pandas DataFrames.

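  • Field structure is fixed: adding or removing a field means creating a new array
  • No built-in missing value handling
  • Limited string operations, and each string field has a fixed maximum length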
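Two of these limitations show up quickly in practice. The first half of the sketch below assigns a name longer than the 'U10' width, and NumPy silently truncates it. The second half adds a hypothetical 'dept' field by building a new dtype and copying the existing columns into a fresh array. This is one manual approach, not the only one; the numpy.lib.recfunctions module also provides helpers such as append_fields for the same task.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([
    ('Alice', 30, 70000.0),
    ('Bob', 25, 55000.5),
    ('Charlie', 35, 82000.0)
], dtype=dtype)

# 'U10' holds at most 10 characters; longer strings are cut off without warning
employees['name'][2] = 'Christopher'
print(employees['name'][2])      # Christophe

# Adding a field requires a new array with an extended dtype
# ('dept' and its values are hypothetical, for illustration only)
new_dtype = np.dtype(employees.dtype.descr + [('dept', 'U10')])
extended = np.zeros(employees.shape, dtype=new_dtype)
for field in employees.dtype.names:
    extended[field] = employees[field]   # copy each existing column
extended['dept'] = ['Sales', 'IT', 'HR']
print(extended.dtype.names)      # ('name', 'age', 'salary', 'dept')
```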

For heavy data analysis, Pandas is usually a better choice.


Practice Exercise

Task

  • Create a structured array with fields: product, price, quantity
  • Add at least three records
  • Filter products with price above 100
  • Sort by quantity

What’s Next?

In the next lesson, you will learn about NumPy memory layout: how arrays are stored in memory and how that layout affects performance.