1. NumPy Internals — How It Really Works {#internals}
Understanding NumPy's internals is what separates average users from power users.
Memory Layout Deep Dive
import numpy as np
# ndarray internal structure:
# - data pointer: points to first element in memory
# - dtype: type of each element
# - shape: tuple of dimension sizes
# - strides: bytes to step in each dimension
# - flags: C/F contiguous, owndata, writeable
arr = np.arange(12, dtype=np.float64).reshape(3, 4)
print(arr.shape) # (3, 4)
print(arr.strides) # (32, 8) — 32 bytes per row, 8 bytes per col
print(arr.flags)
# C_CONTIGUOUS : True — row-major (C order)
# F_CONTIGUOUS : False
# OWNDATA : True — this array owns its data
# WRITEABLE : True
# ALIGNED : True
# How strides work:
# element at [i, j] is at: data_pointer + i*strides[0] + j*strides[1]
# arr[2, 3] = data_ptr + 2*32 + 3*8 = data_ptr + 64 + 24 = data_ptr + 88
# Transpose just swaps strides! No data copy!
arr_T = arr.T
print(arr_T.strides) # (8, 32) — strides swapped
print(arr_T.flags) # C_CONTIGUOUS : False, F_CONTIGUOUS : True
# Reshape works by changing shape/strides if array is contiguous
reshaped = arr.reshape(2, 6)
print(np.shares_memory(arr, reshaped)) # True — no copy made!
# When reshape CAN'T avoid a copy
arr_T_reshaped = arr.T.reshape(2, 6)
print(np.shares_memory(arr.T, arr_T_reshaped)) # False — needed a copy!
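The byte-offset formula above can be checked directly against a raw copy of the buffer — a minimal sketch:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
i, j = 2, 3
offset = i * arr.strides[0] + j * arr.strides[1]  # 2*32 + 3*8 = 88 bytes
# Read the float64 that lives at that byte offset in a C-order copy of the buffer
val = np.frombuffer(arr.tobytes(), dtype=np.float64, count=1, offset=offset)[0]
assert val == arr[2, 3]  # 11.0
```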
The ufunc Pipeline
import numpy as np
# When you call np.add(a, b), here's what happens internally:
# 1. Type resolution — find common dtype
# 2. If needed, cast inputs to common type
# 3. Loop over elements using C-level tight loop
# 4. SIMD instructions applied (vectorized at CPU level)
# 5. Result written to output buffer
# You can inspect the ufunc
print(np.add.nin) # 2 — number of inputs
print(np.add.nout) # 1 — number of outputs
print(np.add.ntypes) # Number of type loops
print(np.add.types) # All type combinations supported
# ['bb->b', 'BB->B', 'hh->h', ... 'dd->d', 'gg->g', ...]
# b=int8, B=uint8, h=int16, d=float64, g=longdouble (float128 on many platforms)
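Step 1 of the pipeline (type resolution) can be previewed without calling the ufunc — a small sketch using np.result_type:

```python
import numpy as np

a = np.arange(3, dtype=np.int32)
b = np.arange(3, dtype=np.float32)
common = np.result_type(a, b)  # int32 + float32 promotes to float64
out = np.add(a, b)             # the ufunc reaches the same conclusion
assert out.dtype == common == np.float64
```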
# Creating custom ufuncs with np.frompyfunc
def my_add(x, y):
return x + y
my_ufunc = np.frompyfunc(my_add, nin=2, nout=1)
arr = np.array([1, 2, 3, 4, 5])
print(my_ufunc(arr, arr)) # [2 4 6 8 10] — but returns object array
# For numeric (non-object) outputs, use np.vectorize — but it's not truly vectorized (still a Python loop)
vectorized_fn = np.vectorize(lambda x: x**2 + 2*x + 1)
print(vectorized_fn(arr)) # [4 9 16 25 36] — cleaner but not faster than loop!
2. Stride Tricks — Advanced Memory Manipulation {#strides}
Stride tricks let you build arrays with custom memory layouts — virtual views that duplicate no data.
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view
# --- ROLLING WINDOW (most common use case) ---
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Create overlapping windows of size 3 using as_strided
window_size = 3
shape = (len(arr) - window_size + 1, window_size)
strides = (arr.strides[0], arr.strides[0]) # Step by 1 element in both dims
windows = as_strided(arr, shape=shape, strides=strides)
print(windows)
# [[ 1 2 3]
# [ 2 3 4]
# [ 3 4 5]
# ...
# [ 8 9 10]]
# Better way (NumPy 1.20+) — sliding_window_view
windows = sliding_window_view(arr, window_shape=3)
print(windows.shape) # (8, 3)
# 2D sliding windows (for image convolution)
image = np.arange(25).reshape(5, 5)
patches = sliding_window_view(image, window_shape=(3, 3))
print(patches.shape) # (3, 3, 3, 3) — (3 positions y, 3 positions x, 3, 3)
# Compute mean of each patch
patch_means = patches.mean(axis=(-2, -1))
print(patch_means) # 3x3 array of means (like average pooling!)
# --- USING AS_STRIDED FOR A SLIDING-WINDOW MATRIX VIEW ---
# Memory-efficient Hankel-style matrix (constant anti-diagonals — no data copy!)
arr = np.array([1, 2, 3, 4, 5, 6, 7])
n = 4 # window size
m = len(arr) - n + 1
hankel_shape = (m, n)
hankel_strides = (arr.strides[0], arr.strides[0])
T = as_strided(arr, shape=hankel_shape, strides=hankel_strides)
print(T)
# [[1 2 3 4]
# [2 3 4 5]
# [3 4 5 6]
# [4 5 6 7]]
# All of this shares memory with original arr!
# ⚠️ WARNING: as_strided is dangerous!
# - Can access memory outside array bounds (segfault risk!)
# - Always prefer sliding_window_view when possible
# - When using as_strided, treat result as READ-ONLY
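A short sketch of why the read-only rule matters: every window aliases the same underlying buffer, so one write would land in several windows at once.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(5)
w = as_strided(arr, shape=(3, 3), strides=(arr.strides[0], arr.strides[0]))
assert np.shares_memory(arr, w)
# w[0, 1] and w[1, 0] are literally the same element (arr[1]);
# writing through w would corrupt several windows at once.
safe = w.copy()   # copy before any modification
safe[0, 1] = 99
assert w[0, 1] == 1  # original view untouched
```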
3. Vectorization — Writing Fast NumPy Code {#vectorization}
The #1 performance rule: eliminate Python loops using array operations.
import numpy as np
import time
# --- EXAMPLE 1: Distance Matrix ---
# SLOW — Python loop
def pairwise_distance_loop(X):
n = len(X)
D = np.zeros((n, n))
for i in range(n):
for j in range(n):
D[i, j] = np.sqrt(np.sum((X[i] - X[j])**2))
return D
# FAST — Vectorized using broadcasting
def pairwise_distance_vectorized(X):
# X has shape (n, d)
# Expand to (n, 1, d) and (1, n, d), then subtract
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :] # (n, n, d)
return np.sqrt(np.sum(diff**2, axis=2)) # (n, n)
# FASTEST — Using Einstein summation trick
def pairwise_distance_einsum(X):
# ||a - b||² = ||a||² + ||b||² - 2(a·b)
sq_norms = np.sum(X**2, axis=1) # (n,)
D_sq = sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2 * (X @ X.T)
D_sq = np.maximum(D_sq, 0) # Numerical stability
return np.sqrt(D_sq)
# Benchmark
X = np.random.randn(500, 10)
t0 = time.perf_counter(); pairwise_distance_loop(X); print(f"Loop: {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter(); pairwise_distance_vectorized(X); print(f"Vectorized: {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter(); pairwise_distance_einsum(X); print(f"Einsum: {time.perf_counter()-t0:.3f}s")
# Loop: ~2.5s, Vectorized: ~0.05s, Einsum: ~0.01s
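As a quick correctness check (a sketch with a fixed seed), the broadcasting and algebraic formulations should agree to floating-point tolerance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
# broadcasting formulation
diff = X[:, None, :] - X[None, :, :]
D_vec = np.sqrt((diff ** 2).sum(axis=2))
# algebraic formulation: ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq = (X ** 2).sum(axis=1)
D_alg = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
assert np.allclose(D_vec, D_alg, atol=1e-6)
```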
# --- EXAMPLE 2: Conditional Assignment (no loop!) ---
arr = np.random.randn(1_000_000)
# SLOW
result = np.zeros_like(arr)
for i, x in enumerate(arr):
result[i] = x if x > 0 else x * 0.01 # Leaky ReLU
# FAST
result = np.where(arr > 0, arr, arr * 0.01) # Same result, ~100x faster
# Even faster — using clip and arithmetic
result = np.maximum(arr, 0) + 0.01 * np.minimum(arr, 0)
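A quick equivalence check for the two loop-free leaky-ReLU forms:

```python
import numpy as np

x = np.random.default_rng(1).normal(size=1000)
leaky_where = np.where(x > 0, x, x * 0.01)
leaky_arith = np.maximum(x, 0) + 0.01 * np.minimum(x, 0)
assert np.allclose(leaky_where, leaky_arith)
```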
# --- EXAMPLE 3: Softmax (no loop!) ---
def softmax(x):
x_shifted = x - np.max(x, axis=-1, keepdims=True) # Numerical stability!
exp_x = np.exp(x_shifted)
return exp_x / exp_x.sum(axis=-1, keepdims=True)
logits = np.random.randn(32, 10) # 32 samples, 10 classes
probs = softmax(logits)
print(probs.shape) # (32, 10)
print(probs.sum(axis=1)) # All ones — correctly normalized
# --- EXAMPLE 4: One-Hot Encoding (no loop!) ---
def one_hot(labels, num_classes):
n = len(labels)
encoded = np.zeros((n, num_classes), dtype=np.int8)
encoded[np.arange(n), labels] = 1 # Fancy indexing!
return encoded
labels = np.array([0, 2, 1, 3, 2])
one_hot_encoded = one_hot(labels, num_classes=4)
print(one_hot_encoded)
# [[1 0 0 0]
# [0 0 1 0]
# [0 1 0 0]
# [0 0 0 1]
# [0 0 1 0]]
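Decoding is the mirror image — argmax along the class axis recovers the labels (a minimal self-contained sketch):

```python
import numpy as np

labels = np.array([0, 2, 1, 3, 2])
encoded = np.zeros((len(labels), 4), dtype=np.int8)
encoded[np.arange(len(labels)), labels] = 1
decoded = encoded.argmax(axis=1)  # inverse of one-hot
assert np.array_equal(decoded, labels)
```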
4. Performance Optimization — Profiling & Tuning {#performance}
import numpy as np
import time
# --- TIP 1: Use appropriate dtype ---
# float64 uses 2x the memory of float32; on CPU, float32 is often faster for memory-bound ops
# On GPU (e.g. via CuPy), float32 is typically far faster still
big_float64 = np.random.randn(10_000_000) # 80 MB
big_float32 = np.random.randn(10_000_000).astype(np.float32) # 40 MB
# Check dtype impact on speed
t0 = time.perf_counter()
np.sum(big_float64 ** 2)
print(f"float64: {time.perf_counter()-t0:.4f}s")
t0 = time.perf_counter()
np.sum(big_float32 ** 2)
print(f"float32: {time.perf_counter()-t0:.4f}s")
# --- TIP 2: Avoid unnecessary copies ---
arr = np.random.randn(1000, 1000)
# BAD — creates 3 temporary arrays
result = (arr * 2) + (arr * 3)
# GOOD — fewer temporaries (though arr * 3 still allocates one)
result = np.empty_like(arr)
np.multiply(arr, 2, out=result)
np.add(result, arr * 3, out=result)
# BETTER
result = arr * 5 # arr * 2 + arr * 3 = arr * 5 — simpler!
# --- TIP 3: Contiguous memory access ---
arr = np.random.randn(1000, 1000)
# Column-wise sum on C-order array (cache-unfriendly — jumps in memory)
t0 = time.perf_counter()
for _ in range(100):
col_sums = arr.sum(axis=0)
print(f"Column sum: {time.perf_counter()-t0:.4f}s")
# Row-wise sum on C-order array (cache-friendly — sequential access)
t0 = time.perf_counter()
for _ in range(100):
row_sums = arr.sum(axis=1)
print(f"Row sum: {time.perf_counter()-t0:.4f}s")
# TIP: If you do many column operations, use F-order!
arr_f = np.asfortranarray(arr) # F-contiguous copy
t0 = time.perf_counter()
for _ in range(100):
col_sums = arr_f.sum(axis=0)
print(f"Column sum (F-order): {time.perf_counter()-t0:.4f}s")
# --- TIP 4: np.einsum for complex ops ---
A = np.random.randn(100, 200)
B = np.random.randn(200, 150)
# Standard matmul
t0 = time.perf_counter()
C = A @ B
print(f"matmul: {time.perf_counter()-t0:.6f}s")
# einsum (sometimes faster for specific ops)
t0 = time.perf_counter()
C = np.einsum('ij,jk->ik', A, B, optimize=True) # optimize=True is key!
print(f"einsum: {time.perf_counter()-t0:.6f}s")
# --- TIP 5: np.einsum optimize parameter ---
A = np.random.randn(10, 100, 50)
B = np.random.randn(10, 50, 80)
C = np.random.randn(10, 80, 30)
# Without optimization — may use suboptimal order
result1 = np.einsum('bij,bjk,bkl->bil', A, B, C)
# With optimization — finds best contraction order
result2 = np.einsum('bij,bjk,bkl->bil', A, B, C, optimize='optimal')
print(np.allclose(result1, result2)) # True, but result2 is faster
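Before committing to a contraction, np.einsum_path reports the order that optimize would pick and its estimated cost:

```python
import numpy as np

A = np.random.randn(10, 100, 50)
B = np.random.randn(10, 50, 80)
C = np.random.randn(10, 80, 30)
path, report = np.einsum_path('bij,bjk,bkl->bil', A, B, C, optimize='optimal')
assert path[0] == 'einsum_path'  # first entry is the path marker
print(report)                    # human-readable cost summary
```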
# --- TIP 6: Pre-allocate result arrays ---
n_iterations = 1000
data_size = 10000
# BAD — appending to list then converting
results_bad = []
for i in range(n_iterations):
results_bad.append(np.random.randn(data_size).mean())
results_bad = np.array(results_bad)
# GOOD — pre-allocate
results_good = np.empty(n_iterations)
for i in range(n_iterations):
results_good[i] = np.random.randn(data_size).mean()
# BEST — fully vectorized
results_best = np.random.randn(n_iterations, data_size).mean(axis=1)
# --- TIP 7: Use np.maximum/minimum instead of np.clip when possible ---
arr = np.random.randn(1_000_000)
# clip (one function call)
t0 = time.perf_counter()
np.clip(arr, 0, 1)
print(f"clip: {time.perf_counter()-t0:.6f}s")
# ReLU using maximum (faster for single bound)
t0 = time.perf_counter()
np.maximum(arr, 0)
print(f"maximum: {time.perf_counter()-t0:.6f}s")
Profiling NumPy Code
import numpy as np
import cProfile
import pstats
import io
def my_function():
arr = np.random.randn(10000, 100)
result = np.zeros(10000)
for i in range(len(arr)):
result[i] = np.sum(arr[i] ** 2) # Intentionally slow!
return result
# Profile
pr = cProfile.Profile()
pr.enable()
my_function()
pr.disable()
# Print stats
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats(10) # Top 10 bottlenecks
print(s.getvalue())
# Line profiler (install with: pip install line_profiler)
# %load_ext line_profiler, then: %lprun -f my_function my_function() # In Jupyter
# Memory profiler (install with: pip install memory_profiler)
# %memit my_function() # In Jupyter
5. NumPy + Numba — JIT Compilation {#numba}
When vectorization isn't possible, Numba compiles Python to machine code.
import numpy as np
import numba as nb
import time
# Scenario: Complex logic that can't easily be vectorized
def python_loop(arr):
result = np.empty_like(arr)
for i in range(len(arr)):
if arr[i] > 0:
result[i] = np.sqrt(arr[i]) * np.log(arr[i] + 1)
else:
result[i] = arr[i] ** 2
return result
# Numba JIT — just add @nb.jit decorator!
@nb.jit(nopython=True, parallel=True, cache=True)
def numba_loop(arr):
result = np.empty_like(arr)
for i in nb.prange(len(arr)): # prange = parallel range
if arr[i] > 0:
result[i] = np.sqrt(arr[i]) * np.log(arr[i] + 1)
else:
result[i] = arr[i] ** 2
return result
# Vectorized NumPy (for comparison) — np.abs guards sqrt/log, since np.where evaluates both branches
def numpy_vectorized(arr):
result = np.where(arr > 0,
np.sqrt(np.abs(arr)) * np.log(np.abs(arr) + 1),
arr ** 2)
return result
arr = np.random.randn(10_000_000)
# Warmup Numba (first call compiles)
numba_loop(arr[:100])
t0 = time.perf_counter()
python_loop(arr[:100000]) # Subset, too slow for full array
print(f"Python (100k): {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter()
numpy_vectorized(arr)
print(f"NumPy vectorized: {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter()
numba_loop(arr)
print(f"Numba parallel: {time.perf_counter()-t0:.3f}s")
# Results approximate:
# Python (100k): 0.15s → extrapolated 15s for 10M
# NumPy vectorized: 0.18s
# Numba parallel: 0.03s → ~6x faster than NumPy!
# Numba for custom ufuncs
@nb.vectorize(['float64(float64)', 'float32(float32)'], target='cpu')
def leaky_relu(x):
return x if x > 0 else 0.01 * x
result = leaky_relu(arr) # Vectorized, uses SIMD
# GPU target (requires CUDA)
# @nb.vectorize(['float64(float64)'], target='cuda')
6. Advanced Indexing Patterns {#advanced-indexing}
import numpy as np
# --- MULTI-DIMENSIONAL FANCY INDEXING ---
arr = np.arange(100).reshape(10, 10)
# Select arbitrary (row, col) pairs — paired index arrays, one element per pair
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([0, 2, 1, 3, 0, 4])
print(arr[rows, cols]) # Values at (0,0),(0,2),(1,1),(1,3),(2,0),(2,4)
# Select block submatrix (not fancy indexing — this is two slices)
block = arr[2:5, 3:7]
# Select specific rows AND specific columns (fancy + slice mix)
selected_rows = arr[[0, 3, 7]] # 3 specific rows, all cols
selected_cols = arr[:, [1, 4, 8]] # All rows, 3 specific cols
submatrix = arr[np.ix_([0,3,7], [1,4,8])] # Specific rows AND cols via meshgrid
# np.ix_ — creates open meshgrid for indexing
print(submatrix.shape) # (3, 3)
# --- ADVANCED BOOLEAN OPERATIONS ---
arr = np.random.randn(5, 5)
# Create 2D mask
mask_rows = np.array([True, False, True, False, True])
mask_cols = np.array([True, True, False, False, True])
# Select rows using boolean
print(arr[mask_rows]) # 3 rows where mask is True
print(arr[:, mask_cols]) # Columns where mask is True
print(arr[np.ix_(mask_rows, mask_cols)]) # Submatrix at intersection
# --- SCATTER OPERATION (inverse of gather) ---
source = np.array([1, 2, 3])
target = np.zeros(10)
indices = np.array([2, 5, 8])
target[indices] = source # Scatter: place source values at indices
print(target) # [0 0 1 0 0 2 0 0 3 0]
# Scatter add (accumulate into target)
np.add.at(target, indices, source) # target[indices] += source (handles repeats!)
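A minimal sketch of why np.add.at exists: with a repeated index, buffered fancy-index assignment counts it only once.

```python
import numpy as np

t = np.zeros(4)
idx = np.array([0, 0, 1])
np.add.at(t, idx, 1.0)   # unbuffered: index 0 accumulates twice
assert t[0] == 2.0 and t[1] == 1.0

t2 = np.zeros(4)
t2[idx] += 1.0           # buffered: repeated index 0 counts only once
assert t2[0] == 1.0
```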
# --- MULTI-DIMENSIONAL TAKE AND PUT ---
arr = np.arange(20).reshape(4, 5)
# np.take — gather along axis
np.take(arr, [0, 2], axis=0) # Take rows 0 and 2
np.take(arr, [1, 3, 4], axis=1) # Take cols 1, 3, 4
# np.put — scatter into flattened array
arr = np.arange(20)
np.put(arr, [0, 5, 10, 15], [99, 88, 77, 66])
print(arr) # [99 1 2 3 4 88 6 7 8 9 77 ...]
# np.putmask — conditional put
arr = np.arange(10, dtype=float)
np.putmask(arr, arr > 5, arr ** 2) # Where condition, put squared value
# --- UNRAVEL AND RAVEL INDEX ---
arr = np.arange(24).reshape(4, 6)
# Get flat index of maximum
flat_idx = np.argmax(arr)
print(flat_idx) # 23
# Convert flat index to multi-dim indices
row, col = np.unravel_index(flat_idx, arr.shape)
print(row, col) # 3, 5
# Convert multi-dim indices to flat index
flat = np.ravel_multi_index((row, col), arr.shape)
print(flat) # 23
# Useful for getting top-k indices in a 2D array (argpartition leaves the top 3 unordered)
top3_flat = np.argpartition(arr.ravel(), -3)[-3:]
top3_2d = np.array(np.unravel_index(top3_flat, arr.shape)).T
print(top3_2d) # rows (3,3), (3,4), (3,5), in some order
7. Real-World Use Cases — Complete Pipelines {#real-world}
Use Case 1: ETL Data Pipeline
import numpy as np
# Simulate raw sensor data with missing values and outliers
np.random.seed(42)
n_sensors = 10
n_timestamps = 1000
# Raw data: temperatures in °C (some missing: -9999, some outliers)
raw_data = np.random.normal(25, 5, (n_timestamps, n_sensors))
raw_data[np.random.rand(*raw_data.shape) < 0.05] = -9999 # 5% missing
raw_data[np.random.rand(*raw_data.shape) < 0.02] += 100 # 2% outliers
print(f"Raw data shape: {raw_data.shape}")
print(f"Missing values: {np.sum(raw_data == -9999)}")
# STEP 1: Handle missing values — replace with NaN
data = raw_data.copy().astype(np.float64)
data[data == -9999] = np.nan
# STEP 2: Detect and remove outliers using IQR
def remove_outliers_iqr(arr, factor=3.0):
q1 = np.nanpercentile(arr, 25, axis=0)
q3 = np.nanpercentile(arr, 75, axis=0)
iqr = q3 - q1
lower = q1 - factor * iqr
upper = q3 + factor * iqr
mask = (arr < lower) | (arr > upper) # Outlier positions
arr_clean = arr.copy()
arr_clean[mask] = np.nan # Replace outliers with NaN
return arr_clean
data = remove_outliers_iqr(data)
print(f"After outlier removal — NaN count: {np.sum(np.isnan(data))}")
# STEP 3: Forward-fill NaN values (simple imputation)
def forward_fill(arr):
"""Fill NaN with previous valid value, column by column."""
result = arr.copy()
for col in range(arr.shape[1]):
col_data = result[:, col]
nan_mask = np.isnan(col_data)
not_nan_idx = np.where(~nan_mask)[0]
if len(not_nan_idx) == 0:
continue
# Use searchsorted to find previous valid value
fill_idx = np.searchsorted(not_nan_idx, np.where(nan_mask)[0], side='right') - 1
valid_fills = fill_idx >= 0
nan_positions = np.where(nan_mask)[0][valid_fills]
source_positions = not_nan_idx[fill_idx[valid_fills]]
col_data[nan_positions] = col_data[source_positions]
return result
data = forward_fill(data)
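For a single column, a common alternative idiom (a sketch, not the pipeline's method) forward-fills with np.maximum.accumulate on indices:

```python
import numpy as np

a = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
mask = np.isnan(a)
idx = np.where(~mask, np.arange(len(a)), 0)
np.maximum.accumulate(idx, out=idx)   # carry forward the last valid index
filled = a[idx]
assert np.isnan(filled[0])            # leading NaN has no predecessor
assert filled[2] == 1.0 and filled[4] == 3.0
```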
# STEP 4: Remaining NaN (at start) — fill with column median
col_medians = np.nanmedian(data, axis=0)
nan_positions = np.where(np.isnan(data))
data[nan_positions] = np.take(col_medians, nan_positions[1])
# STEP 5: Normalize to [0, 1] per sensor
col_min = data.min(axis=0)
col_max = data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min + 1e-8)
# STEP 6: Compute rolling statistics
window = 10
rolling_mean = np.array([
normalized[i:i+window].mean(axis=0)
for i in range(len(normalized) - window + 1)
])
# Vectorized version using stride tricks
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(normalized, window_shape=window, axis=0) # (991, 10, 10)
rolling_mean_fast = windows.mean(axis=-1) # same result as the loop above
print(f"Final data shape: {normalized.shape}")
print(f"No remaining NaN: {not np.any(np.isnan(normalized))}")
print(f"Range: [{normalized.min():.4f}, {normalized.max():.4f}]")
Use Case 2: Text Vectorization (Bag of Words)
import numpy as np
# Documents
documents = [
["the", "cat", "sat", "on", "the", "mat"],
["the", "dog", "sat", "on", "the", "log"],
["cat", "and", "dog", "are", "friends"],
]
# Build vocabulary
vocab = sorted(set(word for doc in documents for word in doc))
word2idx = {word: idx for idx, word in enumerate(vocab)}
print(f"Vocabulary size: {len(vocab)}")
print(f"Vocabulary: {vocab}")
# Create BoW matrix (term frequency)
bow_matrix = np.zeros((len(documents), len(vocab)), dtype=np.float32)
for doc_idx, doc in enumerate(documents):
for word in doc:
bow_matrix[doc_idx, word2idx[word]] += 1
print("BoW matrix:")
print(bow_matrix)
# TF-IDF
tf = bow_matrix / bow_matrix.sum(axis=1, keepdims=True) # Term Frequency
df = np.sum(bow_matrix > 0, axis=0) # Document Frequency
idf = np.log((len(documents) + 1) / (df + 1)) + 1 # Inverse Document Frequency
tfidf = tf * idf # TF-IDF
# Cosine similarity between documents
def cosine_similarity_matrix(X):
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_normalized = X / (norms + 1e-8)
return X_normalized @ X_normalized.T
similarity = cosine_similarity_matrix(tfidf)
print("\nDocument similarity matrix:")
print(np.round(similarity, 3))
8. Data Science Workflow with NumPy {#data-science}
import numpy as np
# Simulated dataset: House prices
np.random.seed(42)
n = 500
# Features: size(sqft), bedrooms, age, distance_to_city
size = np.random.uniform(500, 3000, n)
bedrooms = np.random.randint(1, 6, n)
age = np.random.uniform(0, 50, n)
distance = np.random.uniform(1, 50, n)
noise = np.random.normal(0, 20000, n)
# Target: price
price = (100 * size + 50000 * bedrooms - 1000 * age - 5000 * distance + 200000 + noise)
# --- EXPLORATORY DATA ANALYSIS ---
X = np.column_stack([size, bedrooms, age, distance])
y = price
print(f"Dataset: {X.shape[0]} samples, {X.shape[1]} features")
print(f"\nFeature statistics:")
feature_names = ['Size', 'Bedrooms', 'Age', 'Distance']
for i, name in enumerate(feature_names):
col = X[:, i]
print(f" {name}: mean={col.mean():.1f}, std={col.std():.1f}, "
f"min={col.min():.1f}, max={col.max():.1f}")
print(f"\nTarget (Price): mean={y.mean():,.0f}, std={y.std():,.0f}")
# Correlation with target
for i, name in enumerate(feature_names):
corr = np.corrcoef(X[:, i], y)[0, 1]
print(f" {name} correlation with Price: {corr:.4f}")
# --- TRAIN/TEST SPLIT (without sklearn) ---
indices = np.random.permutation(n)
train_size = int(0.8 * n)
train_idx = indices[:train_size]
test_idx = indices[train_size:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
# --- FEATURE SCALING ---
# Fit scaler on training data only
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std # Use training stats!
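A small sanity sketch of the scaling step (synthetic data here): after standardizing with training statistics, the training features have mean 0 and std 1:

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(5.0, 2.0, size=(100, 3))
mean_, std_ = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr_scaled = (X_tr - mean_) / std_
assert np.allclose(X_tr_scaled.mean(axis=0), 0.0, atol=1e-12)
assert np.allclose(X_tr_scaled.std(axis=0), 1.0, atol=1e-12)
```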
# --- PRINCIPAL COMPONENT ANALYSIS (PCA) from scratch ---
def pca_numpy(X, n_components):
"""Manual PCA using NumPy SVD."""
# Center the data
X_centered = X - X.mean(axis=0)
# Compute covariance matrix
cov = X_centered.T @ X_centered / (len(X) - 1)
# Eigendecomposition (for symmetric matrix, use eigh — more stable)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
# Sort by descending eigenvalue
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Explained variance ratio
explained_var_ratio = eigenvalues / eigenvalues.sum()
# Project data
components = eigenvectors[:, :n_components]
X_projected = X_centered @ components
return X_projected, explained_var_ratio[:n_components], components
X_pca, var_ratio, components = pca_numpy(X_train_scaled, n_components=2)
print(f"\nPCA explained variance: {var_ratio}")
print(f"Total explained: {var_ratio.sum():.2%}")
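The same spectrum can be reached through SVD of the centered data; a sketch cross-checking that squared singular values divided by (n - 1) match the covariance eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
evals = np.linalg.eigh(cov)[0][::-1]          # eigh returns ascending order
svals = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
assert np.allclose(svals**2 / (len(X) - 1), evals)
```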
# --- EVALUATION METRICS ---
def regression_metrics(y_true, y_pred):
errors = y_true - y_pred
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
mape = np.mean(np.abs(errors / y_true)) * 100
return {'MAE': mae, 'MSE': mse, 'RMSE': rmse, 'R²': r2, 'MAPE%': mape}
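A quick sanity sketch of the definitions above: a perfect prediction drives every error metric to its ideal value (R² = 1, MAE = 0):

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_pred = y_true.copy()                 # perfect prediction
errors = y_true - y_pred
r2 = 1 - np.sum(errors**2) / np.sum((y_true - y_true.mean())**2)
assert r2 == 1.0
assert np.mean(np.abs(errors)) == 0.0
```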
9. NumPy in Machine Learning — From Scratch {#ml}
Neural Network Forward Pass
import numpy as np
# A simple 2-layer neural network for binary classification
class SimpleNN:
def __init__(self, input_dim, hidden_dim, output_dim, seed=42):
rng = np.random.default_rng(seed)
# He initialization — scale sqrt(2 / fan_in), suited to ReLU layers
scale1 = np.sqrt(2.0 / input_dim)
scale2 = np.sqrt(2.0 / hidden_dim)
self.W1 = rng.normal(0, scale1, (input_dim, hidden_dim))
self.b1 = np.zeros((1, hidden_dim))
self.W2 = rng.normal(0, scale2, (hidden_dim, output_dim))
self.b2 = np.zeros((1, output_dim))
def relu(self, x):
return np.maximum(x, 0)
def relu_grad(self, x):
return (x > 0).astype(float)
def sigmoid(self, x):
return 1 / (1 + np.exp(-np.clip(x, -500, 500))) # Clip for stability
def forward(self, X):
self.X = X
self.Z1 = X @ self.W1 + self.b1 # Linear: (n, hidden)
self.A1 = self.relu(self.Z1) # ReLU: (n, hidden)
self.Z2 = self.A1 @ self.W2 + self.b2 # Linear: (n, output)
self.A2 = self.sigmoid(self.Z2) # Sigmoid: (n, output)
return self.A2
def compute_loss(self, y_pred, y_true):
# Binary cross-entropy
eps = 1e-8
return -np.mean(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
def backward(self, y_true):
n = len(y_true)
# Output layer gradient
dZ2 = (self.A2 - y_true) / n # (n, 1)
dW2 = self.A1.T @ dZ2 # (hidden, 1)
db2 = dZ2.sum(axis=0, keepdims=True) # (1, 1)
# Hidden layer gradient
dA1 = dZ2 @ self.W2.T # (n, hidden)
dZ1 = dA1 * self.relu_grad(self.Z1) # (n, hidden)
dW1 = self.X.T @ dZ1 # (input, hidden)
db1 = dZ1.sum(axis=0, keepdims=True) # (1, hidden)
return dW1, db1, dW2, db2
def update(self, dW1, db1, dW2, db2, lr=0.01):
self.W1 -= lr * dW1
self.b1 -= lr * db1
self.W2 -= lr * dW2
self.b2 -= lr * db2
def train(self, X, y, epochs=100, lr=0.01, batch_size=32):
n = len(X)
losses = []
rng = np.random.default_rng(42)
for epoch in range(epochs):
# Mini-batch SGD
indices = rng.permutation(n)
epoch_loss = 0
for i in range(0, n, batch_size):
batch_idx = indices[i:i+batch_size]
X_batch = X[batch_idx]
y_batch = y[batch_idx]
y_pred = self.forward(X_batch)
loss = self.compute_loss(y_pred, y_batch)
epoch_loss += loss
gradients = self.backward(y_batch)
self.update(*gradients, lr=lr)
losses.append(epoch_loss / (n / batch_size))
if epoch % 20 == 0:
print(f"Epoch {epoch}: Loss = {losses[-1]:.4f}")
return losses
# Usage
np.random.seed(42)
n = 1000
X = np.random.randn(n, 4)
y = ((X[:, 0] + X[:, 1] > 0) & (X[:, 2] > 0)).astype(float).reshape(-1, 1)
model = SimpleNN(input_dim=4, hidden_dim=8, output_dim=1)
losses = model.train(X, y, epochs=100, lr=0.1, batch_size=32)
# Evaluate
y_pred = model.forward(X)
accuracy = np.mean((y_pred > 0.5) == y)
print(f"Final accuracy: {accuracy:.4f}")
10. NumPy for Image Processing {#image}
import numpy as np
# Images are 3D arrays: (height, width, channels)
# channels: 1=grayscale, 3=RGB, 4=RGBA
# Create a synthetic RGB image
np.random.seed(42)
image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(f"Image shape: {image.shape}") # (480, 640, 3)
print(f"Image dtype: {image.dtype}") # uint8
print(f"Value range: [{image.min()}, {image.max()}]")
# --- CHANNEL OPERATIONS ---
R, G, B = image[:, :, 0], image[:, :, 1], image[:, :, 2]
# Convert to grayscale (weighted average — human perception)
grayscale = 0.2989 * R + 0.5870 * G + 0.1140 * B
grayscale = grayscale.astype(np.uint8)
print(f"Grayscale shape: {grayscale.shape}") # (480, 640)
# --- PIXEL OPERATIONS ---
# Brighten (clip to avoid overflow)
brighter = np.clip(image.astype(np.int32) + 50, 0, 255).astype(np.uint8)
# Darken
darker = (image * 0.5).astype(np.uint8)
# Invert colors
inverted = 255 - image
# Apply channel-specific adjustment (blue tint)
blue_tint = image.copy()
blue_tint[:, :, 2] = np.clip(blue_tint[:, :, 2].astype(int) + 80, 0, 255).astype(np.uint8)
# --- CROPPING ---
# Crop center 200x200
h, w = image.shape[:2]
cy, cx = h // 2, w // 2
crop = image[cy-100:cy+100, cx-100:cx+100]
# --- FLIPPING ---
flipped_h = image[:, ::-1] # Horizontal flip
flipped_v = image[::-1, :] # Vertical flip
rotated_180 = image[::-1, ::-1] # 180° rotation
# --- CONVOLUTION (manual 2D convolution) ---
def convolve2d(image, kernel):
"""Apply convolution to a 2D (grayscale) image."""
kh, kw = kernel.shape
ph, pw = kh // 2, kw // 2
# Pad image
padded = np.pad(image, ((ph, ph), (pw, pw)), mode='reflect')
# Sliding window view
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(padded, window_shape=(kh, kw))
# Apply kernel — technically cross-correlation (flip the kernel for true convolution)
return np.einsum('ijkl,kl->ij', windows, kernel)
# Gaussian blur kernel
def gaussian_kernel(size=5, sigma=1.0):
ax = np.linspace(-(size-1)/2, (size-1)/2, size)
gauss = np.exp(-ax**2 / (2 * sigma**2))
kernel = np.outer(gauss, gauss)
return kernel / kernel.sum()
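Two properties worth asserting about any blur kernel built this way: it sums to 1 (preserves overall brightness) and peaks at the center.

```python
import numpy as np

ax = np.linspace(-2.0, 2.0, 5)          # size=5, sigma=1.0
g = np.exp(-ax**2 / (2 * 1.0**2))
k = np.outer(g, g)
k = k / k.sum()
assert np.isclose(k.sum(), 1.0)         # brightness-preserving
assert k[2, 2] == k.max()               # center weight is largest
```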
# Edge detection (Sobel)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
edges_x = convolve2d(grayscale.astype(float), sobel_x)
edges_y = convolve2d(grayscale.astype(float), sobel_y)
edges = np.sqrt(edges_x**2 + edges_y**2)
edges = (edges / edges.max() * 255).astype(np.uint8)
# --- COLOR HISTOGRAMS ---
r_hist, bins = np.histogram(R.flatten(), bins=256, range=(0, 256)) # one unit-width bin per intensity value
g_hist, _ = np.histogram(G.flatten(), bins=256, range=(0, 256))
b_hist, _ = np.histogram(B.flatten(), bins=256, range=(0, 256))
# Histogram equalization (grayscale)
def histogram_equalize(img):
hist, bins = np.histogram(img.flatten(), 256, range=(0, 256))
cdf = hist.cumsum()
cdf_normalized = (cdf - cdf.min()) * 255 / (cdf.max() - cdf.min())
return cdf_normalized[img].astype(np.uint8)
equalized = histogram_equalize(grayscale)
11. NumPy for Financial Analysis {#finance}
import numpy as np
# Simulate stock price data (Geometric Brownian Motion)
np.random.seed(42)
n_days = 252 # Trading days in a year
n_stocks = 5
# Parameters
S0 = np.array([100, 150, 200, 75, 120]) # Initial prices
mu = np.array([0.10, 0.12, 0.08, 0.15, 0.09]) / 252 # Daily drift
sigma = np.array([0.20, 0.25, 0.15, 0.30, 0.18]) / np.sqrt(252) # Daily vol
# Simulate price paths
dt = 1
returns = np.random.normal(mu, sigma, (n_days, n_stocks))
price_paths = S0 * np.exp(np.cumsum(returns, axis=0)) # (252, 5)
price_paths = np.vstack([S0, price_paths]) # Add initial prices
print(f"Price paths shape: {price_paths.shape}") # (253, 5)
# --- RETURNS ANALYSIS ---
daily_returns = np.diff(price_paths, axis=0) / price_paths[:-1]
log_returns = np.diff(np.log(price_paths), axis=0)
# Annualized statistics
annual_return = daily_returns.mean(axis=0) * 252
annual_vol = daily_returns.std(axis=0, ddof=1) * np.sqrt(252)
sharpe_ratio = annual_return / annual_vol
print("\nStock Performance:")
stock_names = ['AAPL', 'MSFT', 'AMZN', 'TSLA', 'GOOGL']
for i, name in enumerate(stock_names):
print(f" {name}: Return={annual_return[i]:.2%}, Vol={annual_vol[i]:.2%}, "
f"Sharpe={sharpe_ratio[i]:.2f}")
# --- PORTFOLIO ANALYSIS ---
# Equal weight portfolio
n_assets = n_stocks
weights = np.ones(n_assets) / n_assets
# Portfolio returns
portfolio_returns = daily_returns @ weights
portfolio_annual_return = portfolio_returns.mean() * 252
portfolio_annual_vol = portfolio_returns.std(ddof=1) * np.sqrt(252)
portfolio_sharpe = portfolio_annual_return / portfolio_annual_vol
# Correlation matrix
corr_matrix = np.corrcoef(daily_returns.T)
cov_matrix = np.cov(daily_returns.T)
# Portfolio variance (using matrix algebra)
portfolio_variance = weights @ cov_matrix @ weights * 252
portfolio_vol_matrix = np.sqrt(portfolio_variance)
print(f"\nPortfolio: Return={portfolio_annual_return:.2%}, "
f"Vol={portfolio_vol_matrix:.2%}, Sharpe={portfolio_sharpe:.2f}")
# --- RISK METRICS ---
# Value at Risk (VaR) — 95% and 99%
VaR_95 = np.percentile(portfolio_returns, 5)
VaR_99 = np.percentile(portfolio_returns, 1)
print(f"\n1-Day VaR (95%): {VaR_95:.4f} ({VaR_95*100:.2f}%)")
print(f"1-Day VaR (99%): {VaR_99:.4f} ({VaR_99*100:.2f}%)")
# Expected Shortfall (CVaR) — average loss beyond VaR
ES_95 = portfolio_returns[portfolio_returns <= VaR_95].mean()
print(f"1-Day ES (95%): {ES_95:.4f} ({ES_95*100:.2f}%)")
# Maximum Drawdown
portfolio_cumulative = np.cumprod(1 + portfolio_returns)
running_max = np.maximum.accumulate(portfolio_cumulative)
drawdowns = (portfolio_cumulative - running_max) / running_max
max_drawdown = drawdowns.min()
print(f"Maximum Drawdown: {max_drawdown:.4f} ({max_drawdown*100:.2f}%)")
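The drawdown logic is easy to verify on a hand-checkable series (a sketch): a 50% crash from the running peak is the maximum drawdown.

```python
import numpy as np

r = np.array([0.10, -0.50, 0.20])
cum = np.cumprod(1 + r)                 # [1.1, 0.55, 0.66]
peak = np.maximum.accumulate(cum)       # running high-water mark
dd = (cum - peak) / peak
assert np.isclose(dd.min(), -0.5)       # worst drop: 1.1 -> 0.55
```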
# --- MONTE CARLO PORTFOLIO SIMULATION ---
n_simulations = 10000
sim_weights = np.random.dirichlet(np.ones(n_assets), n_simulations) # Random weights that sum to 1
# Compute returns and risks for each portfolio
sim_returns = sim_weights @ annual_return
sim_vols = np.sqrt(np.einsum('ij,jk,ik->i', sim_weights, cov_matrix * 252, sim_weights))
sim_sharpes = sim_returns / sim_vols
# Find optimal portfolios
max_sharpe_idx = np.argmax(sim_sharpes)
min_vol_idx = np.argmin(sim_vols)
print(f"\nOptimal Portfolios:")
print(f"Max Sharpe: {sim_sharpes[max_sharpe_idx]:.3f}, "
f"Return={sim_returns[max_sharpe_idx]:.2%}, Vol={sim_vols[max_sharpe_idx]:.2%}")
print(f"Min Vol: {sim_vols[min_vol_idx]:.2%}, "
f"Return={sim_returns[min_vol_idx]:.2%}")
print(f"Max Sharpe weights: {np.round(sim_weights[max_sharpe_idx], 3)}")
12. Comparison — NumPy vs Alternatives {#comparison}
| Feature | NumPy | Pandas | PyTorch Tensor | CuPy | JAX |
|---|---|---|---|---|---|
| Data types | Homogeneous | Heterogeneous columns | Homogeneous | Homogeneous | Homogeneous |
| Primary use | Numerical arrays | Tabular data | ML/Deep learning | GPU arrays | ML + autodiff |
| GPU support | ❌ | ❌ | ✅ | ✅ | ✅ |
| Autodiff | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speed (CPU) | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Memory | Efficient | Overhead | Overhead | GPU VRAM | Efficient |
| API compatibility | Native | Built on NumPy | NumPy-like | NumPy-compatible | NumPy-like |
| Labels/index | ❌ | ✅ | ❌ | ❌ | ❌ |
When to Use What?
NumPy → Default choice for array math, scientific computing, ML internals
Pandas → Tabular data with labels, CSV/Excel/SQL data
PyTorch → Deep learning, need autograd, GPU training
CuPy → Drop-in NumPy replacement for GPU (same API!)
JAX → Research ML, functional transformations (jit, grad, vmap)
Dask → NumPy arrays larger than RAM (distributed computing)
NumPy to CuPy Migration
# Almost identical API!
import numpy as np # CPU
import cupy as cp # GPU (if CUDA available)
# NumPy code
a = np.random.randn(1000, 1000)
b = np.random.randn(1000, 1000)
c = a @ b # CPU matrix multiplication
# CuPy code (GPU) — nearly identical!
a_gpu = cp.random.randn(1000, 1000)
b_gpu = cp.random.randn(1000, 1000)
c_gpu = a_gpu @ b_gpu # GPU matrix multiplication
# Transfer between CPU and GPU
a_gpu = cp.asarray(a) # NumPy → CuPy (CPU to GPU)
a_cpu = cp.asnumpy(a_gpu) # CuPy → NumPy (GPU to CPU)
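Because the APIs match, you can even write backend-agnostic functions by dispatching on the input's array module. A sketch using CuPy's `get_array_module` helper, with a plain-NumPy fallback (an assumption added here) so the same code runs on machines without CuPy:

```python
import numpy as np

try:
    import cupy as cp
    get_array_module = cp.get_array_module
except ImportError:  # CuPy not installed — everything falls back to NumPy
    get_array_module = lambda *args: np

def normalize(x):
    """Works unchanged on NumPy (CPU) and CuPy (GPU) arrays."""
    xp = get_array_module(x)  # numpy or cupy, depending on the input
    return (x - xp.mean(x)) / xp.std(x)

a = np.array([1.0, 2.0, 3.0, 4.0])
print(normalize(a))  # CPU here; pass a CuPy array and it runs on the GPU
```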
13. Best Practices & Pro Developer Guide {#best-practices}
import numpy as np
# --- BEST PRACTICE 1: Always set random seed for reproducibility ---
rng = np.random.default_rng(seed=42) # Use new Generator API
# NOT: np.random.seed(42) — global state, harder to reason about
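A minimal sketch of the Generator workflow — each generator owns its state, so re-seeding reproduces the exact stream regardless of what other code does with `np.random`:

```python
import numpy as np

# Each Generator carries its own state — no hidden global
rng = np.random.default_rng(seed=42)
sample = rng.normal(size=5)
ints = rng.integers(low=0, high=10, size=3)

# Same seed → identical stream, unaffected by any other RNG use
rng2 = np.random.default_rng(seed=42)
assert np.array_equal(rng2.normal(size=5), sample)
```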
# --- BEST PRACTICE 2: Use assertions to validate shapes ---
def matrix_multiply(A, B):
assert A.ndim == 2, f"A must be 2D, got {A.ndim}D"
assert B.ndim == 2, f"B must be 2D, got {B.ndim}D"
assert A.shape[1] == B.shape[0], \
f"Shape mismatch: {A.shape} vs {B.shape}"
return A @ B
# --- BEST PRACTICE 3: Use np.testing for unit tests ---
actual = np.array([1.0, 2.0, 3.0])
expected = np.array([1.0, 2.0, 3.0])
np.testing.assert_array_equal(actual, expected) # Exact equality
np.testing.assert_array_almost_equal(actual, expected, decimal=5) # ~5 decimal places
np.testing.assert_allclose(actual, expected, rtol=1e-5) # Relative tolerance
# --- BEST PRACTICE 4: Validate inputs defensively ---
def safe_normalize(arr: np.ndarray, axis: int = 0) -> np.ndarray:
arr = np.asarray(arr, dtype=np.float64) # Ensure correct type
if arr.ndim < 1:
raise ValueError("Input must be at least 1D")
norms = np.linalg.norm(arr, axis=axis, keepdims=True)
zero_norms = norms == 0
if np.any(zero_norms):
import warnings
warnings.warn("Zero-norm vectors detected; will produce NaN")
return arr / norms
# --- BEST PRACTICE 5: Document array shapes with comments ---
def attention_scores(Q, K, V, mask=None):
"""
Multi-head attention scores.
Args:
Q: (batch, seq_len, d_k)
K: (batch, seq_len, d_k)
V: (batch, seq_len, d_v)
mask: (batch, seq_len, seq_len) or None
Returns:
output: (batch, seq_len, d_v)
"""
d_k = Q.shape[-1]
# (batch, seq_len, seq_len)
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
if mask is not None:
scores = np.where(mask, scores, -1e9)
# (batch, seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
# (batch, seq_len, d_v)
return weights @ V
# --- BEST PRACTICE 6: Use context managers for precision control ---
with np.errstate(divide='ignore', invalid='ignore', over='ignore'):
result = np.log(np.array([0, 1, 2, -1])) # No warning for log(0) or log(-1)
# --- BEST PRACTICE 7: Profile before optimizing ---
# First make it correct, then profile, then optimize the bottleneck
# Use: cProfile, line_profiler, memory_profiler, timeit
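As a sketch, `timeit` is often all you need to confirm where the time actually goes before rewriting anything:

```python
import timeit
import numpy as np

data = np.random.default_rng(0).standard_normal(100_000)

# Python-level reduction vs the vectorized equivalent
t_loop = timeit.timeit(lambda: sum(x * x for x in data), number=10)
t_vec = timeit.timeit(lambda: float(np.sum(data * data)), number=10)
print(f"Python loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s")
```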
# --- BEST PRACTICE 8: Prefer NumPy functions over Python equivalents ---
arr = np.array([3, 1, 4, 1, 5, 9])
min_val = min(arr) # SLOW — Python-level loop, boxes each element into a Python object
max_val = max(arr) # SLOW
min_val = arr.min() # FAST — NumPy method
max_val = arr.max() # FAST
min_val = np.min(arr) # Also FAST
# --- BEST PRACTICE 9: Avoid Python loops — use vectorization ---
# See Section 3 for detailed examples
# --- BEST PRACTICE 10: Memory — float32 for large ML arrays ---
# Use float32 for ML (half the memory, negligible precision loss)
# Use float64 for scientific computing (full precision)
large_ml_array = np.zeros((10000, 512), dtype=np.float32) # 20MB not 40MB
14. Common Anti-Patterns — What NOT to Do {#antipatterns}
import numpy as np
# ❌ ANTI-PATTERN 1: Growing arrays with np.append in loops
# BAD — O(n²) time, creates new array each iteration
result = np.array([])
for i in range(10000):
result = np.append(result, i**2)
# GOOD — Pre-allocate or collect in list then convert once
result = np.array([i**2 for i in range(10000)]) # List comp then convert
# Or
result = np.arange(10000) ** 2 # Best — pure NumPy
# ❌ ANTI-PATTERN 2: Not using vectorization
data = np.random.randn(100000)
threshold = 2.0
# BAD
outliers = []
for x in data:
if abs(x) > threshold:
outliers.append(x)
# GOOD
outliers = data[np.abs(data) > threshold]
# ❌ ANTI-PATTERN 3: Accidental view modification
original = np.array([1, 2, 3, 4, 5])
modified = original[2:5] # View!
modified[:] = 99 # Modifies original!
print(original) # [1 2 99 99 99] — surprise!
# GOOD
modified = original[2:5].copy() # Explicit copy
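When in doubt, you can check whether a result is a view before mutating it — both `.base` and `np.shares_memory` work:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
view = original[2:5]
copy = original[2:5].copy()

print(view.base is original)             # True — view points at original's buffer
print(copy.base is None)                 # True — copy owns its own data
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
```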
# ❌ ANTI-PATTERN 4: Using Python comparison instead of np.isnan
arr = np.array([1, 2, np.nan, 4])
# BAD
result = [x for x in arr if x != np.nan] # Never filters anything — x != np.nan is True for every x, even NaN
# GOOD
result = arr[~np.isnan(arr)]
# ❌ ANTI-PATTERN 5: Forgetting that int arrays can overflow silently
age = np.array([200], dtype=np.uint8)
age += 100 # Silently wraps around!
print(age) # [44] — wrong!
# GOOD — Use larger dtype or check before operation
age = np.array([200], dtype=np.int32) # Won't overflow here
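One defensive pattern is to consult `np.iinfo` before an operation that might overflow and promote only when needed. The `safe_add` helper below is hypothetical, just for illustration:

```python
import numpy as np

def safe_add(arr, value):
    """Hypothetical helper: promote before adding if the result could wrap."""
    info = np.iinfo(arr.dtype)
    if arr.max() > info.max - value:
        arr = arr.astype(np.int64)  # promote before the operation
    return arr + value

age = np.array([200], dtype=np.uint8)
print(safe_add(age, 100))  # [300] — promoted to int64 instead of wrapping to 44
```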
# ❌ ANTI-PATTERN 6: Using np.vectorize thinking it's fast
np.vectorize(lambda x: x**2)(arr) # NOT vectorized! Just a Python loop with NumPy overhead
# GOOD
arr ** 2 # Truly vectorized
# ❌ ANTI-PATTERN 7: Reshape without checking contiguity
arr = np.arange(12).reshape(3, 4).T # Transpose — now F-contiguous!
# In-place shape assignment requires a no-copy reshape — impossible for this layout
try:
arr.shape = (12,) # Fails with error!
except AttributeError:
pass
# GOOD — Use reshape (handles non-contiguous automatically)
flat = arr.reshape(-1) # Works, may copy if needed
# Or explicitly make contiguous
flat = np.ascontiguousarray(arr).reshape(-1)
15. Complete Interview Question Bank {#interview}
Basic Level
Q1: What is the difference between np.array and np.asarray?
`np.array` always makes a copy by default. `np.asarray` avoids copying if the input is already a NumPy array with the right dtype. Use `asarray` when you don't want unnecessary copies.
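A quick demonstration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
print(np.shares_memory(a, np.asarray(a)))  # True — asarray returns a as-is
print(np.shares_memory(a, np.array(a)))    # False — array copies by default

# asarray still copies when a conversion is actually required
print(np.shares_memory(a, np.asarray(a, dtype=np.float32)))  # False
```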
Q2: What does -1 mean in arr.reshape(-1, 4)?
-1 is a placeholder meaning "infer this dimension automatically." NumPy calculates it as
total_elements / 4.
Q3: What is the output dtype of np.array([1, 2.0, True])?
`float64` — NumPy upcasts to the highest-precision type needed to represent all values.
Q4: What does arr.flags['WRITEABLE'] = False do?
Makes the array read-only. Any attempt to modify it will raise `ValueError: assignment destination is read-only`. Useful for protecting shared constants.
Q5: What is the difference between np.zeros and np.empty?
`np.zeros` initializes all elements to 0. `np.empty` allocates memory without initialization (contains garbage values) — faster when you plan to fill every element anyway.
Intermediate Level
Q6: Explain how memory is laid out for a (3, 4) array and how transpose works without copying.
A `(3, 4)` float64 array stores its 12 elements as 96 contiguous bytes in memory (row-major). Strides are `(32, 8)` — 32 bytes to the next row, 8 to the next column. Transpose swaps the strides to `(8, 32)` and the shape to `(4, 3)` — the underlying data doesn't move!
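Verifiable in a few lines:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
print(arr.strides)                   # (32, 8)
print(arr.T.strides)                 # (8, 32) — swapped, no data moved
print(np.shares_memory(arr, arr.T))  # True — same buffer
```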
Q7: What is broadcasting and when does it fail?
Broadcasting is NumPy's rule for performing operations on arrays with different shapes. Shapes are compared dimension by dimension from the right: dimensions must either be equal, or one must be 1. It fails when incompatible dimensions meet, e.g., `(3, 4)` and `(2,)` — trailing sizes 4 and 2 don't match.
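A short sketch of both the success and failure cases:

```python
import numpy as np

a = np.ones((3, 4))
print((a + np.arange(4)).shape)     # (3, 4) — trailing sizes 4 and 4 match
print((a + np.ones((3, 1))).shape)  # (3, 4) — size-1 dimension stretches

try:
    a + np.arange(2)                # trailing sizes 4 vs 2 — incompatible
except ValueError as e:
    print("broadcast error:", e)
```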
Q8: What is the difference between np.dot, np.inner, and np.outer?
`dot(a, b)`: matrix multiplication for 2D, sum of products for 1D. `inner(a, b)`: same as dot for 1D; for nD, contracts the last axis of a with the last axis of b. `outer(a, b)`: outer product — a matrix of all pairwise products, shape `(m, n)`.
Q9: When does ravel() return a copy vs a view?
Returns a view when the array is C-contiguous. Returns a copy for non-contiguous arrays (e.g., after transpose). You can check with `np.shares_memory(arr, arr.ravel())`.
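A two-line check:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
print(np.shares_memory(arr, arr.ravel()))      # True — contiguous, so a view
print(np.shares_memory(arr.T, arr.T.ravel()))  # False — transposed, forces a copy
```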
Q10: How would you implement a sliding window without explicit loops?
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(arr, window_shape=window_size)
window_means = windows.mean(axis=-1)
Advanced Level
Q11: What is the difference between np.linalg.solve and computing np.linalg.inv(A) @ b?
`solve` uses LU decomposition to find the solution directly — numerically stable and O(n³). `inv(A) @ b` computes the full inverse first (also O(n³)) and then multiplies — less stable and slower. `solve` should always be preferred.
Q12: Explain how np.einsum works with optimize=True.
Without `optimize`, einsum contracts in the order specified. With `optimize=True`, NumPy finds an optimal contraction order (like optimal parenthesization of matrix chain multiplication) using dynamic programming, which can dramatically reduce the number of operations for contractions of 3+ tensors.
Q13: What is the NumPy GIL situation?
NumPy operations release Python's GIL while their C code executes, so NumPy computations can run in parallel threads. This is why `concurrent.futures.ThreadPoolExecutor` can give speedups for NumPy workloads, unlike pure Python operations.
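A minimal sketch — sorting independent chunks in threads, where each `np.sort` call releases the GIL inside its C loop:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
chunks = [rng.standard_normal(200_000) for _ in range(4)]

# The GIL is released inside each np.sort, so the threads can overlap
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(np.sort, chunks))

print(all(np.all(r[:-1] <= r[1:]) for r in results))  # True — every chunk sorted
```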
Q14: What is the purpose of np.errstate?
It's a context manager to temporarily change how NumPy handles floating-point errors (divide by zero, overflow, underflow, invalid operations). Options: 'ignore', 'warn', 'raise', 'call', 'print', 'log'. Useful for suppressing expected warnings in numerical code.
Q15: How does np.memmap work and when should you use it?
Memory-mapped files map a disk file to an ndarray. Only the accessed portions are loaded from disk. Use when array is too large for RAM — common in genomics, large image datasets, and scientific simulations where data is much larger than available memory.
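A minimal sketch — the file path and shape below are purely illustrative:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big.dat")  # illustrative location

# Create a disk-backed array, write a slice, flush to disk
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 100))
mm[0, :5] = [1, 2, 3, 4, 5]
mm.flush()

# Reopen read-only — only the pages you touch are pulled from disk
ro = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 100))
print(ro[0, :5])  # [1. 2. 3. 4. 5.]
```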
Scenario-Based
Q16: You have a (1000, 100) array. How do you normalize each row to have zero mean and unit standard deviation?
mean = arr.mean(axis=1, keepdims=True) # (1000, 1)
std = arr.std(axis=1, keepdims=True, ddof=1) # (1000, 1)
normalized = (arr - mean) / (std + 1e-8) # Broadcasting: (1000, 100)
Q17: How would you efficiently compute all pairwise Euclidean distances between 1000 points in 128D?
# Using the identity ||a-b||² = ||a||² + ||b||² - 2(a·b)
sq_norms = np.sum(X**2, axis=1) # (1000,)
D_sq = sq_norms[:, None] + sq_norms[None, :] - 2 * (X @ X.T)
D = np.sqrt(np.maximum(D_sq, 0)) # (1000, 1000)
Q18: How do you find the index of the 5 largest values in a 2D array?
flat_indices = np.argpartition(arr.flat, -5)[-5:]
sorted_flat = flat_indices[np.argsort(arr.flat[flat_indices])[::-1]]
row_indices, col_indices = np.unravel_index(sorted_flat, arr.shape)
Q19: You receive a CSV with mixed types. How do you load just the numeric columns?
data = np.genfromtxt('data.csv', delimiter=',', skip_header=1,
usecols=(1, 2, 4), # Specific numeric columns
dtype=np.float64, filling_values=np.nan)
Q20: How would you implement batch normalization forward pass in NumPy?
def batch_norm(X, gamma, beta, eps=1e-5):
# X: (batch_size, features)
mean = X.mean(axis=0) # (features,)
var = X.var(axis=0) # (features,)
X_norm = (X - mean) / np.sqrt(var + eps) # (batch, features)
return gamma * X_norm + beta # Scale and shift
16. Conclusion & Learning Path {#conclusion}
What We've Covered
Across all three parts of this guide, you've learned:
Part 1: ndarray fundamentals, dtypes, array creation, indexing, slicing, fancy indexing, broadcasting, reshape, and manipulation
Part 2: All mathematical ufuncs, statistical functions, sorting, set operations, complete linear algebra (linalg), FFT, polynomials, random generation, file I/O, structured arrays, and masked arrays
Part 3: NumPy internals, stride tricks, vectorization patterns, performance optimization, Numba integration, real-world pipelines (ETL, NLP, image processing, finance, ML), and comprehensive interview questions
Your NumPy Learning Path
Week 1: Part 1 — Master arrays, indexing, broadcasting
Week 2: Part 2 — Statistical functions, linalg
Week 3: Part 3 — Vectorization, real-world projects
Week 4: Build something: implement k-means, PCA, or a neural network from scratch
Ongoing: Read NumPy source code, contribute to open source
Key Mental Models
Think in shapes — always know your array shape before and after every operation
Broadcasting first — before writing a loop, ask "can broadcasting solve this?"
Vectorize everything — loops are for prototyping; production code should be loop-free
Memory matters — view vs copy distinction affects both correctness and performance
dtype is a contract — choose the right dtype upfront; implicit conversions cost memory
When to Use NumPy (vs Alternatives)
✅ Use NumPy when:
- Working with numerical arrays
- Scientific computing
- Implementing ML algorithms from scratch
- Image/signal processing arrays
- Linear algebra operations
⬆️ Graduate to pandas when: labelled data, mixed types, SQL-like operations
⬆️ Graduate to PyTorch/JAX when: GPU, autograd, deep learning
⬆️ Graduate to Dask when: data > RAM, distributed computing
Final Advice
NumPy is the lingua franca of scientific Python. Master it deeply and everything else — pandas, scikit-learn, PyTorch — becomes easier because you understand what's happening under the hood.
The best way to truly learn NumPy is to implement algorithms from scratch: k-nearest neighbors, k-means clustering, logistic regression, PCA, a simple neural network. Do this, and NumPy will become second nature.
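For instance, PCA via SVD fits in a few lines — a from-scratch sketch, where `pca` is just an illustrative helper, not a library function:

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA via SVD: center, decompose, project onto top components."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T  # (n_samples, n_components)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Z = pca(X, n_components=2)
print(Z.shape)  # (100, 2)
```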
End of Part 3 — The Complete NumPy Guide
Series complete! Parts 1, 2, and 3 together form one of the most comprehensive NumPy references available.