NumPy Crash Course: The Math Engine Behind Everything¶
If you have ever used Python for anything involving numbers, you have used NumPy, whether you knew it or not. Pandas is built on NumPy. Scikit-learn is built on NumPy. TensorFlow, PyTorch, SciPy, Matplotlib, and roughly 90% of the Python scientific ecosystem either depend on NumPy directly or reimplement parts of it. If you are doing quantitative finance in Python and you do not understand NumPy, you are writing code against the grain of the entire ecosystem.
NumPy started in 2005 as a merger of two older libraries (Numeric and Numarray) and quickly became the foundation of scientific Python. Its superpower is deceptively simple: it gives you an N-dimensional array type that is memory-contiguous, type-homogeneous, and fast. All the math happens in C code. Your Python is just a thin interface. The result is performance that rivals compiled languages for numerical work, with a syntax that stays readable.
This article covers the NumPy you actually use: array creation, indexing, broadcasting, vectorization, linear algebra, and random number generation. Once you see how it works, you will understand why “loops are slow in Python” and what to do about it.
Python has many libraries. NumPy is the one that makes the others possible.
Why NumPy Exists¶
Python lists are flexible. You can put anything in them: integers, strings, other lists, objects. That flexibility has a cost. Each element is a full Python object, stored somewhere in memory, with a pointer from the list.
For 1 million integers, a Python list stores:
- 1 million integer objects (each ~28 bytes)
- 1 million pointers (8 bytes each)
- Plus list overhead
Total: around 36 MB for a million 8-byte integers.
NumPy stores the same data as a contiguous C array of 8 bytes per integer: 8 MB total. Less than a quarter of the memory. But the real win is speed.
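You can measure the gap yourself. Exact byte counts vary by Python version and platform, but the order of magnitude holds:

```python
import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

# List: the pointer array plus one full Python int object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# NumPy: one contiguous buffer, itemsize bytes per element
array_bytes = np_arr.nbytes

print(f"list : {list_bytes / 1e6:.1f} MB")
print(f"numpy: {array_bytes / 1e6:.1f} MB")
```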
The Loop vs Vectorization Example¶
import numpy as np
import time
n = 10_000_000
python_list = list(range(n))
numpy_array = np.arange(n)
# Python list comprehension
start = time.time()
result = [x * 2 + 1 for x in python_list]
print(f"Python list: {time.time() - start:.3f} seconds")
# NumPy vectorized
start = time.time()
result = numpy_array * 2 + 1
print(f"NumPy array: {time.time() - start:.3f} seconds")
Typical output:
Python list: 0.842 seconds
NumPy array: 0.015 seconds
56x faster. For 10 million elements. With shorter, more readable code.
Key Insight: NumPy’s speed advantage comes from two sources. First, it stores data as contiguous memory blocks, which is cache-friendly. Second, its operations are implemented in C and use CPU vector instructions (SIMD). When you write array * 2, you are dispatching to a highly optimized C loop, not iterating in Python.
Writing a Python for loop over 10 million numbers is like trying to empty a swimming pool with a teaspoon. NumPy hands you a firefighter’s hose.
Creating Arrays¶
From Python Lists¶
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([[1, 2, 3], [4, 5, 6]]) # 2D array
c = np.array([1.0, 2.0, 3.0], dtype=np.float32)
Pre-Sized Arrays¶
np.zeros(10) # Ten zeros
np.ones((3, 3)) # 3x3 matrix of ones
np.full((2, 4), 7) # 2x4 matrix of sevens
np.empty(5) # Uninitialized (fastest, but contents are garbage)
np.eye(4) # 4x4 identity matrix
Ranges¶
np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
np.linspace(0, 1, 11) # 11 evenly spaced values from 0 to 1
np.logspace(0, 3, 4) # [1, 10, 100, 1000]
Random Arrays¶
np.random.seed(42) # Legacy global seed; the modern default_rng API is covered later
np.random.rand(3, 3) # Uniform [0, 1), shape 3x3
np.random.randn(5) # Standard normal
np.random.randint(0, 100, 10) # 10 integers between 0 and 99
np.random.normal(loc=100, scale=20, size=1000) # Custom normal
From Finance Data¶
# Monthly returns for 5 stocks over 2 years
returns = np.random.normal(0.008, 0.05, size=(24, 5))
# Or loaded from CSV via NumPy directly
data = np.loadtxt("prices.csv", delimiter=",", skiprows=1)
Array Attributes: Know Your Data¶
Every NumPy array has these essential attributes:
arr = np.random.randn(252, 10) # 252 days, 10 stocks
arr.shape # (252, 10) - dimensions
arr.ndim # 2 - number of dimensions
arr.size # 2520 - total number of elements
arr.dtype # dtype('float64') - data type
arr.itemsize # 8 - bytes per element
arr.nbytes # 20160 - total memory
Data Types¶
Common dtypes for finance:
| dtype | Description | Use Case |
|---|---|---|
| np.float64 | 64-bit float (default) | Almost everything |
| np.float32 | 32-bit float | ML models, saving memory |
| np.int64 | 64-bit integer | Counts, IDs |
| np.bool_ | Boolean | Masks, conditions |
| np.datetime64 | Date/time | Time series |
Changing dtype:
arr_float = arr.astype(np.float32) # Convert to 32-bit float
arr_int = arr.astype(np.int64) # Convert to integer (truncates)
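Note that astype truncates toward zero rather than rounding, which matters for price data:

```python
import numpy as np

prices = np.array([1.9, -1.9, 2.5])

truncated = prices.astype(np.int64)
print(truncated)  # [ 1 -1  2]  -- truncation toward zero, not rounding

# Round first if rounding is what you actually want
rounded = np.round(prices).astype(np.int64)
```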
Indexing and Slicing¶
Basic Indexing¶
NumPy extends Python’s slicing to N dimensions:
arr = np.arange(24).reshape(4, 6)
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
arr[1] # Row 1: [6, 7, 8, 9, 10, 11]
arr[1, 2] # Element at row 1, column 2: 8
arr[:, 0] # All rows, column 0: [0, 6, 12, 18]
arr[1:3, 2:5] # Rows 1-2, columns 2-4
arr[::2, ::2] # Every other row and column
Boolean Indexing¶
This is where NumPy really shines for data analysis:
returns = np.random.normal(0.001, 0.02, 1000)
# Positive returns
positive = returns[returns > 0]
# Large moves (> 2 standard deviations)
mean = returns.mean()
std = returns.std()
outliers = returns[np.abs(returns - mean) > 2 * std]
# Set all negative values to zero
returns[returns < 0] = 0
Fancy Indexing¶
Select specific elements using arrays of indices:
arr = np.arange(10) * 10 # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
indices = [0, 2, 4, 6]
arr[indices] # [0, 20, 40, 60]
# 2D example
arr2d = np.arange(20).reshape(4, 5)
rows = [0, 1, 3]
cols = [1, 2, 4]
arr2d[rows, cols] # Elements at (0,1), (1,2), (3,4)
Key Insight: Boolean indexing and fancy indexing return copies, not views. Modifying the result does not modify the original. Basic slicing returns views, so modifying the slice modifies the original. This distinction trips up new users constantly.
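A quick way to see the view/copy distinction in action:

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]       # basic slice: a view into arr's buffer
view[0] = 99          # writes through to the original
print(arr[2])         # 99

copy = arr[arr > 50]  # boolean indexing: an independent copy
copy[0] = -1          # the original is unchanged

# A view keeps a reference to the array that owns the buffer
print(view.base is arr)  # True
```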
Broadcasting: The Hidden Superpower¶
Broadcasting is how NumPy handles operations between arrays of different shapes. It is the feature that makes NumPy code so concise.
The Rules¶
Two arrays are “broadcastable” if, when you align their shapes from the right:
1. The dimensions are equal, OR
2. One of the dimensions is 1
NumPy “stretches” the dimension of size 1 to match the other.
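You can check the rules without building any arrays, using np.broadcast_shapes (available since NumPy 1.20):

```python
import numpy as np

# Align shapes from the right; size-1 dimensions stretch
print(np.broadcast_shapes((252, 10), (10,)))  # (252, 10)
print(np.broadcast_shapes((3, 1), (1, 4)))    # (3, 4)

# Incompatible shapes raise a ValueError
try:
    np.broadcast_shapes((3,), (4,))
except ValueError as e:
    print("not broadcastable:", e)
```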
Simple Examples¶
# Scalar + array: scalar broadcasts to every element
np.array([1, 2, 3]) + 10 # [11, 12, 13]
# Array + smaller array
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # shape (3, 3)
b = np.array([10, 20, 30]) # shape (3,)
a + b
# [[11, 22, 33],
# [14, 25, 36],
# [17, 28, 39]]
Here, b is broadcast to each row of a.
Column vs Row Broadcasting¶
# Row vector (1, 3)
row = np.array([[10, 20, 30]])
# Column vector (3, 1)
col = np.array([[1], [2], [3]])
# Broadcast: outer product-like behavior
row + col
# [[11, 21, 31],
# [12, 22, 32],
# [13, 23, 33]]
Finance Example: Portfolio Returns¶
Suppose you have daily returns for 10 stocks (matrix of shape (252, 10)) and want weighted portfolio returns:
returns = np.random.normal(0.001, 0.02, size=(252, 10))
weights = np.array([0.15, 0.10, 0.10, 0.10, 0.05, 0.10, 0.10, 0.10, 0.10, 0.10])
# Broadcast: weights (10,) multiplied against each row of returns (252, 10)
weighted = returns * weights
# Portfolio daily return: sum across stocks
portfolio_returns = weighted.sum(axis=1) # shape (252,)
Broadcasting is NumPy’s equivalent of Excel’s $ anchoring. Once you get used to it, you write calculations in your head, not on paper.
Vectorization: The Core Philosophy¶
“Vectorization” means replacing Python loops with array operations. This is the single most important performance technique in scientific Python.
The Wrong Way¶
# Bad: Python loop
returns = np.random.normal(0.001, 0.02, 1000000)
squared = np.empty_like(returns)
for i in range(len(returns)):
squared[i] = returns[i] ** 2
The Right Way¶
# Good: vectorized
squared = returns ** 2
Universal Functions (ufuncs)¶
NumPy provides element-wise versions of every common math function:
np.sqrt(arr) # Square root
np.exp(arr) # e^x
np.log(arr) # Natural log
np.sin(arr) # Sine
np.abs(arr) # Absolute value
np.maximum(a, b) # Element-wise max
np.minimum(a, b) # Element-wise min
np.round(arr, 2) # Round to 2 decimal places
np.clip(arr, 0, 1) # Clip to range
All of these operate in compiled C, on entire arrays, in one call.
Finance Example: Log Returns¶
prices = np.array([100, 101.5, 99.8, 102.3, 103.7])
# Simple returns
returns = prices[1:] / prices[:-1] - 1
# Log returns (vectorized)
log_returns = np.log(prices[1:] / prices[:-1])
# Cumulative wealth from log returns
cumulative_wealth = np.exp(log_returns.cumsum())
Three one-liners do what would take a dozen lines in a loop.
Reductions and Aggregations¶
Reduce an array along an axis to compute statistics.
Basic Reductions¶
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr.sum() # 45 (all elements)
arr.sum(axis=0) # [12, 15, 18] (sum of each column)
arr.sum(axis=1) # [6, 15, 24] (sum of each row)
arr.mean() # 5.0
arr.std() # Standard deviation
arr.var() # Variance
arr.min() # 1
arr.max() # 9
arr.argmin() # Index of minimum: 0
arr.argmax() # Index of maximum: 8
arr.cumsum() # Cumulative sum
arr.cumprod() # Cumulative product
Axis Matters¶
returns = np.random.normal(0.001, 0.02, size=(252, 10)) # 252 days, 10 stocks
# Average daily return per stock (across all days)
avg_per_stock = returns.mean(axis=0) # shape (10,)
# Average daily return per day (across all stocks)
avg_per_day = returns.mean(axis=1) # shape (252,)
# Grand average
grand_avg = returns.mean() # scalar
Finance Example: Annualized Volatility¶
daily_returns = np.random.normal(0.0008, 0.012, size=(252, 5))
# Per-stock annualized volatility
annual_vol = daily_returns.std(axis=0) * np.sqrt(252)
print(f"Annual volatilities: {annual_vol}")
# Cross-sectional mean (average across stocks, per day)
equal_weight_portfolio = daily_returns.mean(axis=1)
Linear Algebra¶
NumPy’s linalg module handles matrix operations needed for portfolio theory, regression, PCA, and more.
Matrix Multiplication¶
A = np.random.randn(3, 4)
B = np.random.randn(4, 2)
C = A @ B # Matrix multiplication (Python 3.5+)
C = np.dot(A, B) # Equivalent
# Shape: (3, 2)
Common Operations¶
# Square matrix and right-hand side for the examples below
A = np.random.randn(4, 4)
b = np.random.randn(4)
# Transpose
A.T
# Inverse
np.linalg.inv(A) # A must be square
# Determinant
np.linalg.det(A)
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
# Solve Ax = b
x = np.linalg.solve(A, b)
# Singular Value Decomposition
U, S, Vt = np.linalg.svd(A)
# Cholesky decomposition (for Monte Carlo; cov_matrix must be positive definite)
L = np.linalg.cholesky(cov_matrix)
Finance Example: Portfolio Optimization (Markowitz)¶
# 5 stocks, covariance matrix
cov = np.array([
[0.04, 0.02, 0.01, 0.00, 0.01],
[0.02, 0.05, 0.02, 0.01, 0.01],
[0.01, 0.02, 0.03, 0.01, 0.01],
[0.00, 0.01, 0.01, 0.02, 0.01],
[0.01, 0.01, 0.01, 0.01, 0.03]
])
expected_returns = np.array([0.10, 0.12, 0.08, 0.06, 0.09])
# Minimum variance portfolio (analytical solution)
inv_cov = np.linalg.inv(cov)
ones = np.ones(5)
weights_min_var = inv_cov @ ones / (ones @ inv_cov @ ones)
print(f"Min variance weights: {weights_min_var}")
print(f"Sum of weights: {weights_min_var.sum():.4f}")
# Portfolio variance
portfolio_variance = weights_min_var @ cov @ weights_min_var
print(f"Portfolio std: {np.sqrt(portfolio_variance):.4f}")
That is Markowitz portfolio optimization in 5 lines. The math behind hundreds of research papers and trillions of dollars of portfolio allocations.
Random Number Generation¶
For Monte Carlo, simulations, and bootstrapping.
Modern API (preferred)¶
# Create a generator with a seed
rng = np.random.default_rng(seed=42)
# Various distributions
uniform = rng.uniform(0, 1, size=1000)
normal = rng.normal(loc=0, scale=1, size=1000)
lognormal = rng.lognormal(mean=0, sigma=1, size=1000)
exponential = rng.exponential(scale=1, size=1000)
binomial = rng.binomial(n=100, p=0.5, size=1000)
# Multivariate normal (for correlated returns)
mean = np.array([0.01, 0.02, 0.015])
cov = np.array([[0.04, 0.01, 0.02],
[0.01, 0.09, 0.03],
[0.02, 0.03, 0.06]])
samples = rng.multivariate_normal(mean, cov, size=10000)
Finance Example: Monte Carlo for Asian Option¶
rng = np.random.default_rng(42)
n_simulations = 100_000
n_steps = 252 # Daily steps in a year
S0 = 100 # Initial price
K = 100 # Strike
T = 1.0 # Time to maturity
r = 0.05 # Risk-free rate
sigma = 0.2 # Volatility
dt = T / n_steps
# Generate random shocks: shape (n_simulations, n_steps)
z = rng.standard_normal((n_simulations, n_steps))
# Build price paths using vectorized operations
log_returns = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
log_paths = np.cumsum(log_returns, axis=1)
paths = S0 * np.exp(log_paths)
# Average price along each path
avg_prices = paths.mean(axis=1)
# Asian call payoff
payoffs = np.maximum(avg_prices - K, 0)
# Discount and average
asian_call = np.exp(-r * T) * payoffs.mean()
print(f"Asian call price: ${asian_call:.4f}")
100,000 Monte Carlo simulations with 252 time steps each. 25.2 million random numbers. Entire calculation runs in under a second.
Key Insight: The modern np.random.default_rng() API is preferred over the legacy np.random.seed() + np.random.normal() style. It is faster, has better statistical properties, and allows multiple independent generators. Always use the modern API in new code.
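One way to get multiple independent streams (useful for parallel simulations) is to spawn child seeds from a single SeedSequence; a minimal sketch of that pattern:

```python
import numpy as np

# One parent seed, several statistically independent child streams
seed_seq = np.random.SeedSequence(42)
child_seqs = seed_seq.spawn(4)
rngs = [np.random.default_rng(s) for s in child_seqs]

# Each generator produces its own reproducible stream
draws = [rng.normal(size=3) for rng in rngs]
```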
Reshaping and Stacking¶
Reshape¶
arr = np.arange(12)
arr.reshape(3, 4) # 3 rows, 4 columns
arr.reshape(2, 2, 3) # 3D array
arr.reshape(-1, 4) # -1 means "figure it out"
Flatten¶
matrix = np.array([[1, 2, 3], [4, 5, 6]])
matrix.flatten() # [1, 2, 3, 4, 5, 6]
matrix.ravel() # Same, but returns a view when possible (faster)
Stacking¶
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.vstack([a, b]) # Vertical: shape (2, 3)
np.hstack([a, b]) # Horizontal: shape (6,)
np.column_stack([a, b]) # As columns: shape (3, 2)
# For 2D arrays
A = np.ones((3, 2))
B = np.zeros((3, 2))
np.hstack([A, B]) # shape (3, 4)
np.vstack([A, B]) # shape (6, 2)
Transposing¶
arr.T # Transpose (for 2D)
arr.transpose() # Same thing
arr.swapaxes(0, 1) # Swap specific axes
Performance Tips¶
1. Preallocate, Do Not Append¶
# Bad: growing the array is O(n^2)
result = np.array([])
for i in range(1000):
result = np.append(result, i)
# Good: preallocate
result = np.empty(1000)
for i in range(1000):
result[i] = i
# Best: vectorize
result = np.arange(1000)
2. Use Appropriate Data Types¶
# If values fit, float32 is half the memory and often faster
arr_big = np.random.randn(1_000_000).astype(np.float32)
3. Avoid Creating Intermediate Arrays¶
x = np.random.randn(1_000_000)
# Creates an intermediate array for x ** 2
y = np.sin(x ** 2)
# Avoid the intermediate with the out= parameter
y = np.empty_like(x)
np.multiply(x, x, out=y)
np.sin(y, out=y)
4. Use numexpr or numba for Complex Expressions¶
import numexpr as ne
# For large arrays, numexpr can be 2-10x faster
result = ne.evaluate("sin(x) * exp(-x**2)")
5. Know When to Leave NumPy¶
For GPU: use CuPy (drop-in NumPy replacement on NVIDIA GPUs).
For sparse arrays: use scipy.sparse.
For out-of-core: use Dask.
For SIMD: sometimes you need to write C extensions.
Common Pitfalls¶
- Views vs copies. Basic slicing returns a view (modifications affect the original). Boolean/fancy indexing returns a copy. Use .copy() explicitly when in doubt.
- Broadcasting errors. When two arrays have mismatched shapes, NumPy tries to broadcast. If it cannot, you get an error. Always check shapes before operations.
- Integer overflow. np.int8 holds values -128 to 127. Arithmetic can overflow silently. Use int64 for anything that might grow.
- NaN propagation. Any operation with NaN produces NaN. np.nan == np.nan is False. Use np.isnan() to detect.
- Axis confusion. axis=0 goes "down" (across rows, collapsing to one row of column aggregates). axis=1 goes "across" (through columns, collapsing to one column of row aggregates). This confuses everyone. Always double-check with a small example.
- Using Python math on NumPy arrays. math.sqrt(arr) errors on arrays. Use np.sqrt(arr).
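The NaN pitfall in particular is worth a demonstration; NumPy ships NaN-ignoring aggregations (np.nansum, np.nanmean, and friends) for exactly this case:

```python
import numpy as np

returns = np.array([0.01, np.nan, -0.02, 0.03])

print(returns.mean())        # nan -- NaN propagates through aggregations
print(np.nanmean(returns))   # ignores the NaN

# NaN never equals anything, including itself
print(np.nan == np.nan)      # False
print(np.isnan(returns))     # [False  True False False]

# Drop NaNs with a boolean mask
clean = returns[~np.isnan(returns)]
```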
Wrapping Up¶
NumPy is the foundation. Pandas sits on it. Scikit-learn sits on it. Every serious numerical Python library leverages it. If you are writing Python for finance, machine learning, or scientific computing, your productivity is roughly proportional to your NumPy fluency.
The core ideas: arrays are memory-contiguous, operations are vectorized, broadcasting eliminates explicit loops, and reductions happen in C. Master those four concepts and you have 80% of what you need. The remaining 20% (linear algebra, random numbers, reshaping) is reference material you look up when needed.
Vectorized NumPy code is often shorter than equivalent loopy code, and usually 10-100x faster. It reads like mathematical notation. Once you start thinking in arrays, you will never go back.
NumPy makes Python fast enough for most things. Where it is not, you have bigger problems than language choice.
Cheat Sheet¶
Key Questions & Answers¶
Why is NumPy faster than Python lists?¶
NumPy arrays store data as contiguous C arrays (not Python objects) and operate on them in compiled C code. Python lists store pointers to Python objects scattered across memory. For numerical work, NumPy is typically 10-100x faster due to better memory layout, SIMD instructions, and elimination of Python interpreter overhead per element.
What is broadcasting?¶
A mechanism that allows NumPy to apply operations to arrays of different shapes. Dimensions of size 1 get “stretched” to match the other array’s size. It is what lets you write array + scalar or add a (3,) vector to a (252, 3) matrix without explicit loops.
When should I use @, *, or np.dot?¶
Use * for element-wise multiplication (Hadamard product). Use @ or np.dot for matrix multiplication. In modern code, @ is preferred. For 1D arrays, @ gives the inner product.
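A quick check of the difference:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a * b)  # [ 4 10 18]  element-wise product
print(a @ b)  # 32          inner product: 1*4 + 2*5 + 3*6

A = np.ones((2, 3))
print((A * b).shape)  # (2, 3)  b broadcast across rows
print((A @ b).shape)  # (2,)    matrix-vector product
```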
How do I simulate correlated random variables?¶
Use rng.multivariate_normal(mean, cov, size=n). Alternatively, generate uncorrelated samples and apply a Cholesky decomposition of the covariance matrix: L = np.linalg.cholesky(cov); samples = uncorrelated @ L.T. Both approaches produce samples with the specified covariance structure.
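A sketch of the Cholesky route, checking that the sample covariance lands close to the target:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

L = np.linalg.cholesky(cov)               # cov = L @ L.T
uncorrelated = rng.standard_normal((100_000, 2))
samples = uncorrelated @ L.T              # rows now have covariance cov

print(np.cov(samples, rowvar=False))      # close to cov
```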
Key Concepts at a Glance¶
| Concept | Summary |
|---|---|
| ndarray | NumPy’s N-dimensional array type |
| dtype | Data type (float64, int32, bool, etc.) |
| shape | Tuple of dimensions: (rows, cols) for 2D |
| Vectorization | Replacing loops with array operations |
| Broadcasting | Automatic expansion of smaller arrays for operations |
| ufunc | Universal function (element-wise: sqrt, exp, sin, etc.) |
| axis | 0 = down columns; 1 = across rows; -1 = last axis |
| View vs copy | Basic slicing: view. Boolean/fancy: copy. |
| np.arange | Range as array |
| np.linspace | Evenly spaced values |
| np.random.default_rng | Modern random number generator |
| Matrix mult | A @ B (Python 3.5+) |
| Element-wise mult | A * B |
| Reductions | arr.sum(axis=...), arr.mean(), arr.std() |
| Cumulative | .cumsum(), .cumprod() |
| Reshape | .reshape(new_shape) |
| Transpose | .T |
| Cholesky | np.linalg.cholesky(cov) for correlated samples |
| Solve | np.linalg.solve(A, b) instead of inv(A) @ b |
Sources & Further Reading¶
- Harris, C.R. et al. (2020), Array Programming with NumPy, Nature
- McKinney, W., Python for Data Analysis, O’Reilly
- VanderPlas, J., Python Data Science Handbook, O’Reilly
- NumPy Documentation, User Guide
- NumPy Documentation, NumPy for Matlab Users
- Oliphant, T.E., Guide to NumPy, Continuum Press
- Python Data Science, NumPy Tutorials
- Quantitative Economics, NumPy
- Real Python, NumPy Tutorial
- SciPy Documentation, Linear Algebra