NumPy Lesson 14 – Random Module | Dataplexa

NumPy Random Module

The NumPy random module is used to generate random numbers, random arrays, and simulated datasets. It plays a critical role in data science, statistics, simulations, and machine learning.

In this lesson, you will learn how to generate random values, control randomness, and create reproducible datasets using NumPy.


Why Random Data Is Important

Random data is commonly used for:

  • Testing algorithms
  • Simulating real-world scenarios
  • Splitting training and testing datasets
  • Initializing machine learning models

NumPy provides fast and reliable tools to handle all of these cases.


Generating Random Numbers

The function np.random.rand() generates random floating-point numbers drawn uniformly from the half-open interval [0, 1): 0 can occur, but 1 cannot.

import numpy as np

random_value = np.random.rand()
print(random_value)

Output:

0.726413

Each time you run this code, the value changes.
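A value from np.random.rand() can be stretched and shifted to cover another range. The sketch below is one way to do that; the bounds low and high are hypothetical values chosen only for illustration.

```python
import numpy as np

low, high = 5.0, 10.0  # hypothetical bounds, not from the lesson

# np.random.rand() returns a float in [0, 1).
# Multiplying by the width and adding low moves it into the range low..high.
scaled_value = low + (high - low) * np.random.rand()
print(scaled_value)
```

The printed value differs on every run, but it always lies between 5 and 10.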


Generating Random Arrays

Pass one or more dimensions to np.random.rand() to generate an array of random values.

random_array = np.random.rand(5)
print(random_array)

Output:

[0.18 0.92 0.44 0.67 0.31]

Each element is a random float between 0 and 1.


Random Integers

To generate random integers, use np.random.randint().

random_integers = np.random.randint(1, 100, size=5)
print(random_integers)

Output:

[23 78 45 9 61]

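This generates 5 random integers from 1 to 99. The upper bound is exclusive, so 100 itself is never returned; pass 101 as the second argument to include it.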
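To see how np.random.randint() treats its bounds, you can draw a large sample and check the smallest and largest values. This is a small sketch; the sample size of 10,000 is an arbitrary choice.

```python
import numpy as np

# The second argument (high) is exclusive, so these values fall in 1..99.
samples = np.random.randint(1, 100, size=10_000)
print(samples.min(), samples.max())

# To allow 100 as a possible value, raise the upper bound to 101.
inclusive = np.random.randint(1, 101, size=10_000)
print(inclusive.min(), inclusive.max())
```

With this many draws the first line will almost certainly print 1 and 99, and the second 1 and 100.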


Generating Multi-Dimensional Random Arrays

Pass two dimensions to create a matrix filled with random values. The first argument is the number of rows and the second is the number of columns.

random_matrix = np.random.rand(3, 4)
print(random_matrix)

Output:

[[0.15 0.72 0.33 0.91]
 [0.84 0.26 0.58 0.11]
 [0.47 0.69 0.05 0.88]]

This is useful for simulating datasets with rows and columns.
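If you treat each row as one sample and each column as one feature, the shape of the array tells you how many of each you have. The following sketch assumes that row/column interpretation, which the lesson does not prescribe.

```python
import numpy as np

random_matrix = np.random.rand(3, 4)

# shape is (rows, columns): 3 samples with 4 features each
print(random_matrix.shape)   # (3, 4)

# axis=0 collapses the rows, leaving one mean per column (feature)
column_means = random_matrix.mean(axis=0)
print(column_means.shape)    # (4,)
```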


Setting a Random Seed

Random results change every time by default. To make results reproducible, you can set a seed.

np.random.seed(42)

values = np.random.rand(3)
print(values)

Output:

[0.37454012 0.95071431 0.73199394]

Using the same seed, followed by the same sequence of calls, always produces the same output.
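You can check reproducibility by setting the seed twice and comparing the results. The sketch below also shows np.random.default_rng(), the Generator interface that the NumPy documentation recommends for new code. A Generator seeded with 42 is reproducible as well, but it uses a different algorithm, so its numbers do not match those from np.random.seed(42).

```python
import numpy as np

# Legacy global seeding, as used in this lesson
np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)            # resetting the seed restarts the sequence
second = np.random.rand(3)

print(np.array_equal(first, second))   # True

# Generator interface: each generator carries its own seeded state
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
values_a = rng_a.random(3)
values_b = rng_b.random(3)

print(np.array_equal(values_a, values_b))   # True
```

A separate generator object keeps the seeded state local, so other code that calls np.random functions does not disturb your sequence.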


Normal (Gaussian) Distribution

NumPy can generate values from the standard normal distribution using np.random.randn().

normal_values = np.random.randn(5)
print(normal_values)

Output:

[ 0.49 -0.14  0.65  1.52 -0.23]

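These values are drawn from the standard normal distribution, which has a mean of 0 and a standard deviation of 1. A small sample like this one will not average exactly to zero.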
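To get a normal distribution with a different mean and standard deviation, multiply the standard normal values by the standard deviation and add the mean. The numbers 70 and 10 below are hypothetical and chosen only for illustration.

```python
import numpy as np

mean, std = 70.0, 10.0   # hypothetical target mean and standard deviation

# Scale by std, then shift by mean
values = mean + std * np.random.randn(100_000)

# With a large sample, both statistics land close to the targets
print(values.mean())
print(values.std())
```

np.random.normal(loc, scale, size) gives the same kind of result in a single call.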


Shuffling Data

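The np.random.shuffle() function randomly rearranges the elements of an array in place. It modifies the original array and returns None, so do not assign its result to a variable. For a multi-dimensional array, only the order along the first axis changes.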

data = np.array([10, 20, 30, 40, 50])
np.random.shuffle(data)

print(data)

Output:

[30 10 50 20 40]

Shuffling is commonly done before splitting a dataset into training and testing sets, so that neither set depends on the original ordering.
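When features and labels live in separate arrays, both must be reordered the same way. One common approach, sketched below, is to shuffle a set of indices with np.random.permutation() and apply them to both arrays. The data and the 80/20 split ratio are hypothetical.

```python
import numpy as np

features = np.arange(10)     # hypothetical data: 0, 1, ..., 9
labels = features * 10       # label i belongs to feature i

# permutation(n) returns the integers 0..n-1 in random order
indices = np.random.permutation(len(features))
features_shuffled = features[indices]
labels_shuffled = labels[indices]

# Keep the first 80% for training and the rest for testing
split = int(0.8 * len(features))
train_x, test_x = features_shuffled[:split], features_shuffled[split:]
train_y, test_y = labels_shuffled[:split], labels_shuffled[split:]

print(train_x, train_y)
print(test_x, test_y)
```

Unlike np.random.shuffle(), np.random.permutation() returns a new array and leaves the originals unchanged.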


Real-World Example

Imagine simulating test scores for 100 students:

scores = np.random.randint(35, 100, size=100)
print(scores[:10])

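This creates 100 integer scores from 35 to 99, because the upper bound of 100 is exclusive. The scores are uniformly distributed, which real exam results rarely are, but the dataset is sufficient for analysis and practice.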
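A simulated dataset like this can be explored with ordinary array operations. The sketch below fixes a seed so the scores stay the same between runs, then counts how many students reach a pass mark. Both the seed value and the pass mark of 50 are assumptions made for illustration.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the example is repeatable
scores = np.random.randint(35, 100, size=100)

# Bounds check: the lowest possible score is 35, the highest is 99
print(scores.min(), scores.max())

# A boolean mask marks every score at or above the hypothetical pass mark
pass_mark = 50
passed = scores >= pass_mark
print(passed.sum(), "of", scores.size, "students passed")
```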


Practice Exercise

Task

  • Create a random array of 10 values
  • Generate 5 random integers from 50 to 150, keeping in mind that the upper bound of np.random.randint() is exclusive
  • Set a seed and observe reproducible output

What’s Next?

In the next lesson, you will learn about Statistical Functions in NumPy such as mean, median, variance, and standard deviation.