NumPy Random Module
The NumPy random module is used to generate random numbers, random arrays, and simulated datasets. It plays a critical role in data science, statistics, simulations, and machine learning.
In this lesson, you will learn how to generate random values, control randomness, and create reproducible datasets using NumPy.
Why Random Data Is Important
Random data is commonly used for:
- Testing algorithms
- Simulating real-world scenarios
- Splitting training and testing datasets
- Initializing machine learning models
NumPy provides fast and reliable tools to handle all of these cases.
Generating Random Numbers
The function np.random.rand() generates random floating-point numbers drawn uniformly from the interval [0, 1): 0 can occur, but 1 cannot.
import numpy as np
random_value = np.random.rand()
print(random_value)
Output:
0.726413
Each time you run this code, the value changes.
Generating Random Arrays
You can generate an array of random values by passing the desired length (or several dimensions) as arguments.
random_array = np.random.rand(5)
print(random_array)
Output:
[0.18 0.92 0.44 0.67 0.31]
Each element is an independent random float in the interval [0, 1).
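If you need values in a different range, one common approach is to stretch and shift the output of np.random.rand(). The sketch below is a minimal illustration; the bounds low and high are hypothetical values chosen for the example.

```python
import numpy as np

low, high = 5.0, 10.0  # hypothetical bounds chosen for illustration

# rand() returns values in [0, 1); multiply by the width of the
# target range and add the lower bound to move them into place.
scaled_array = low + (high - low) * np.random.rand(5)

print(scaled_array)  # five floats between 5 and 10; values differ per run
```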
Random Integers
To generate random integers, use np.random.randint().
random_integers = np.random.randint(1, 100, size=5)
print(random_integers)
Output:
[23 78 45 9 61]
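This generates 5 random integers from 1 to 99. The lower bound (1) is inclusive, but the upper bound (100) is exclusive, so 100 itself is never returned.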
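A quick way to confirm how np.random.randint() treats its bounds is to draw a large sample and inspect the extremes. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

# Draw many integers so both ends of the range are very likely to appear.
samples = np.random.randint(1, 100, size=10_000)

print(samples.min())  # never below 1
print(samples.max())  # never above 99, because the upper bound is exclusive

# To allow 100 as a possible result, pass 101 as the upper bound.
inclusive_samples = np.random.randint(1, 101, size=10_000)
print(inclusive_samples.max())  # never above 100
```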
Generating Multi-Dimensional Random Arrays
You can create matrices filled with random values.
random_matrix = np.random.rand(3, 4)
print(random_matrix)
Output:
[[0.15 0.72 0.33 0.91]
 [0.84 0.26 0.58 0.11]
 [0.47 0.69 0.05 0.88]]
This is useful for simulating datasets with rows and columns.
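As a small sketch of working with such a matrix, the example below treats rows as samples and columns as features. That row/column interpretation is an assumption made for illustration, not something NumPy enforces.

```python
import numpy as np

# 3 rows (samples) by 4 columns (features); an assumed convention.
random_matrix = np.random.rand(3, 4)

print(random_matrix.shape)         # (3, 4)
print(random_matrix[0])            # the first row: 4 values
print(random_matrix.mean(axis=0))  # one mean per column: 4 values
```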
Setting a Random Seed
By default, random results change on every run. To make them reproducible, set a seed before generating any values.
np.random.seed(42)
values = np.random.rand(3)
print(values)
Output:
[0.37454012 0.95071431 0.73199394]
Using the same seed, followed by the same sequence of calls, produces the same output on every run.
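The sketch below shows reproducibility directly by resetting the seed and comparing two draws. It also shows np.random.default_rng(), NumPy's Generator-based interface, as an alternative way to get a seeded source of random numbers; that interface is outside the scope of this lesson and appears here only for comparison.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)  # reset to the same seed
second = np.random.rand(3)

print(np.array_equal(first, second))  # True

# Alternative: a self-contained generator object with its own seed.
rng = np.random.default_rng(42)
rng_values = rng.random(3)  # three floats in [0, 1)
print(rng_values)
```

Note that the generator object uses a different algorithm, so its values will not match those from np.random.seed(42) even though the seed number is the same.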
Normal (Gaussian) Distribution
NumPy can generate values from the standard normal distribution using np.random.randn().
normal_values = np.random.randn(5)
print(normal_values)
Output:
[ 0.49 -0.14 0.65 1.52 -0.23]
These values are drawn from a distribution with a mean of 0 and a standard deviation of 1, so most of them fall close to zero.
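To get a normal distribution with a different center and spread, you can multiply the output of np.random.randn() by the desired standard deviation and add the desired mean. The numbers below are hypothetical and chosen only for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

target_mean = 70.0  # hypothetical center
target_std = 10.0   # hypothetical spread

# Scale the standard normal values, then shift them.
shifted_values = target_mean + target_std * np.random.randn(100_000)

print(shifted_values.mean())  # close to 70
print(shifted_values.std())   # close to 10
```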
Shuffling Data
The np.random.shuffle() function randomly rearranges the elements of an array in place: it modifies the original array and returns None.
data = np.array([10, 20, 30, 40, 50])
np.random.shuffle(data)
print(data)
Output:
[30 10 50 20 40]
This is commonly used before splitting datasets.
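A minimal sketch of shuffling before a split is shown below. The 80/20 ratio and the variable names are assumptions made for illustration.

```python
import numpy as np

data = np.arange(10)     # stand-in dataset: the integers 0 through 9
np.random.shuffle(data)  # shuffles in place; do not assign the result

# Hypothetical 80/20 split into training and testing portions.
split_index = int(0.8 * len(data))
train_part = data[:split_index]
test_part = data[split_index:]

print(train_part)  # 8 shuffled values
print(test_part)   # the remaining 2 values
```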
Real-World Example
Imagine simulating test scores for 100 students:
scores = np.random.randint(35, 100, size=100)
print(scores[:10])
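This creates 100 integer scores from 35 to 99, with every score equally likely. That is enough for analysis practice, although real test scores usually cluster around an average.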
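Because np.random.randint() gives every score the same probability, you may prefer scores that cluster around an average. One possible approach, sketched below, draws from a normal distribution with np.random.normal() and clips the result to the valid range. The average of 70 and spread of 12 are hypothetical values.

```python
import numpy as np

np.random.seed(7)  # arbitrary seed so the run is repeatable

# Hypothetical class average and spread, chosen only for illustration.
raw_scores = np.random.normal(loc=70, scale=12, size=100)

# Round to whole marks and keep every score inside the 35 to 99 range.
bell_scores = np.clip(np.round(raw_scores), 35, 99).astype(int)

print(bell_scores[:10])
```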
Practice Exercise
Task
- Create a random array of 10 values
- Generate 5 random integers between 50 and 150 (the upper bound passed to np.random.randint() is exclusive)
- Set a seed and observe reproducible output
What’s Next?
In the next lesson, you will learn about Statistical Functions in NumPy such as mean, median, variance, and standard deviation.