1. NumPy Internals — How It Really Works {#internals}
Understanding NumPy's internals is what separates average users from power users.
Memory Layout Deep Dive
import numpy as np
# ndarray internal structure:
# - data pointer: points to first element in memory
# - dtype: type of each element
# - shape: tuple of dimension sizes
# - strides: bytes to step in each dimension
# - flags: C/F contiguous, owndata, writeable
arr = np.arange(12, dtype=np.float64).reshape(3, 4)
print(arr.shape) # (3, 4)
print(arr.strides) # (32, 8) — 32 bytes per row, 8 bytes per col
print(arr.flags)
# C_CONTIGUOUS : True — row-major (C order)
# F_CONTIGUOUS : False
# OWNDATA : True — this array owns its data
# WRITEABLE : True
# ALIGNED : True
# How strides work:
# element at [i, j] is at: data_pointer + i*strides[0] + j*strides[1]
# arr[2, 3] = data_ptr + 2*32 + 3*8 = data_ptr + 64 + 24 = data_ptr + 88
# Transpose just swaps strides! No data copy!
arr_T = arr.T
print(arr_T.strides) # (8, 32) — strides swapped
print(arr_T.flags) # C_CONTIGUOUS : False, F_CONTIGUOUS : True
# Reshape works by changing shape/strides if array is contiguous
reshaped = arr.reshape(2, 6)
print(np.shares_memory(arr, reshaped)) # True — no copy made!
# When reshape CAN'T avoid a copy
arr_T_reshaped = arr.T.reshape(2, 6)
print(np.shares_memory(arr.T, arr_T_reshaped)) # False — needed a copy!
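The byte-offset formula above can be checked directly against a raw copy of the buffer — a minimal sketch:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
i, j = 2, 3
offset = i * arr.strides[0] + j * arr.strides[1]  # 2*32 + 3*8 = 88 bytes
# Read the float64 that lives at that byte offset in a C-order copy of the buffer
val = np.frombuffer(arr.tobytes(), dtype=np.float64, count=1, offset=offset)[0]
assert val == arr[2, 3]  # 11.0
```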
The ufunc Pipeline
import numpy as np
# When you call np.add(a, b), here's what happens internally:
# 1. Type resolution — find common dtype
# 2. If needed, cast inputs to common type
# 3. Loop over elements using C-level tight loop
# 4. SIMD instructions applied (vectorized at CPU level)
# 5. Result written to output buffer
# You can inspect the ufunc
print(np.add.nin) # 2 — number of inputs
print(np.add.nout) # 1 — number of outputs
print(np.add.ntypes) # Number of type loops
print(np.add.types) # All type combinations supported
# ['bb->b', 'BB->B', 'hh->h', ... 'dd->d', 'gg->g', ...]
# b=int8, B=uint8, h=int16, d=float64, g=longdouble (float128 on many platforms)
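Step 1 of the pipeline (type resolution) can be previewed without calling the ufunc — a small sketch using np.result_type:

```python
import numpy as np

a = np.arange(3, dtype=np.int32)
b = np.arange(3, dtype=np.float32)
common = np.result_type(a, b)  # int32 + float32 promotes to float64
out = np.add(a, b)             # the ufunc reaches the same conclusion
assert out.dtype == common == np.float64
```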
# Creating custom ufuncs with np.frompyfunc
def my_add(x, y):
return x + y
my_ufunc = np.frompyfunc(my_add, nin=2, nout=1)
arr = np.array([1, 2, 3, 4, 5])
print(my_ufunc(arr, arr)) # [2 4 6 8 10] — but returns object array
# For numeric (non-object) outputs, use np.vectorize — but it's not truly vectorized (still a Python loop)
vectorized_fn = np.vectorize(lambda x: x**2 + 2*x + 1)
print(vectorized_fn(arr)) # [4 9 16 25 36] — cleaner but not faster than loop!
2. Stride Tricks — Advanced Memory Manipulation {#strides}
Stride tricks let you build arrays with custom memory layouts — virtual views that duplicate no data.
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view
# --- ROLLING WINDOW (most common use case) ---
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Create overlapping windows of size 3 using as_strided
window_size = 3
shape = (len(arr) - window_size + 1, window_size)
strides = (arr.strides[0], arr.strides[0]) # Step by 1 element in both dims
windows = as_strided(arr, shape=shape, strides=strides)
print(windows)
# [[ 1 2 3]
# [ 2 3 4]
# [ 3 4 5]
# ...
# [ 8 9 10]]
# Better way (NumPy 1.20+) — sliding_window_view
windows = sliding_window_view(arr, window_shape=3)
print(windows.shape) # (8, 3)
# 2D sliding windows (for image convolution)
image = np.arange(25).reshape(5, 5)
patches = sliding_window_view(image, window_shape=(3, 3))
print(patches.shape) # (3, 3, 3, 3) — (3 positions y, 3 positions x, 3, 3)
# Compute mean of each patch
patch_means = patches.mean(axis=(-2, -1))
print(patch_means) # 3x3 array of means (like average pooling!)
# --- USING AS_STRIDED FOR A SLIDING-WINDOW MATRIX VIEW ---
# Memory-efficient Hankel-style matrix (constant anti-diagonals — no data copy!)
arr = np.array([1, 2, 3, 4, 5, 6, 7])
n = 4 # window size
m = len(arr) - n + 1
hankel_shape = (m, n)
hankel_strides = (arr.strides[0], arr.strides[0])
T = as_strided(arr, shape=hankel_shape, strides=hankel_strides)
print(T)
# [[1 2 3 4]
# [2 3 4 5]
# [3 4 5 6]
# [4 5 6 7]]
# All of this shares memory with original arr!
# ⚠️ WARNING: as_strided is dangerous!
# - Can access memory outside array bounds (segfault risk!)
# - Always prefer sliding_window_view when possible
# - When using as_strided, treat result as READ-ONLY
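A short sketch of why the read-only rule matters: every window aliases the same underlying buffer, so one write would land in several windows at once.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(5)
w = as_strided(arr, shape=(3, 3), strides=(arr.strides[0], arr.strides[0]))
assert np.shares_memory(arr, w)
# w[0, 1] and w[1, 0] are literally the same element (arr[1]);
# writing through w would corrupt several windows at once.
safe = w.copy()   # copy before any modification
safe[0, 1] = 99
assert w[0, 1] == 1  # original view untouched
```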
3. Vectorization — Writing Fast NumPy Code {#vectorization}
The #1 performance rule: eliminate Python loops using array operations.
import numpy as np
import time
# --- EXAMPLE 1: Distance Matrix ---
# SLOW — Python loop
def pairwise_distance_loop(X):
n = len(X)
D = np.zeros((n, n))
for i in range(n):
for j in range(n):
D[i, j] = np.sqrt(np.sum((X[i] - X[j])**2))
return D
# FAST — Vectorized using broadcasting
def pairwise_distance_vectorized(X):
# X has shape (n, d)
# Expand to (n, 1, d) and (1, n, d), then subtract
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :] # (n, n, d)
return np.sqrt(np.sum(diff**2, axis=2)) # (n, n)
# FASTEST — Using Einstein summation trick
def pairwise_distance_einsum(X):
# ||a - b||² = ||a||² + ||b||² - 2(a·b)
sq_norms = np.sum(X**2, axis=1) # (n,)
D_sq = sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2 * (X @ X.T)
D_sq = np.maximum(D_sq, 0) # Numerical stability
return np.sqrt(D_sq)
# Benchmark
X = np.random.randn(500, 10)
t0 = time.perf_counter(); pairwise_distance_loop(X); print(f"Loop: {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter(); pairwise_distance_vectorized(X); print(f"Vectorized: {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter(); pairwise_distance_einsum(X); print(f"Einsum: {time.perf_counter()-t0:.3f}s")
# Loop: ~2.5s, Vectorized: ~0.05s, Einsum: ~0.01s
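As a quick correctness check (a sketch with a fixed seed), the broadcasting and algebraic formulations should agree to floating-point tolerance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
# broadcasting formulation
diff = X[:, None, :] - X[None, :, :]
D_vec = np.sqrt((diff ** 2).sum(axis=2))
# algebraic formulation: ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq = (X ** 2).sum(axis=1)
D_alg = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
assert np.allclose(D_vec, D_alg, atol=1e-6)
```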
# --- EXAMPLE 2: Conditional Assignment (no loop!) ---
arr = np.random.randn(1_000_000)
# SLOW
result = np.zeros_like(arr)
for i, x in enumerate(arr):
result[i] = x if x > 0 else x * 0.01 # Leaky ReLU
# FAST
result = np.where(arr > 0, arr, arr * 0.01) # Same result, ~100x faster
# Even faster — using clip and arithmetic
result = np.maximum(arr, 0) + 0.01 * np.minimum(arr, 0)
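A quick equivalence check for the two loop-free leaky-ReLU forms:

```python
import numpy as np

x = np.random.default_rng(1).normal(size=1000)
leaky_where = np.where(x > 0, x, x * 0.01)
leaky_arith = np.maximum(x, 0) + 0.01 * np.minimum(x, 0)
assert np.allclose(leaky_where, leaky_arith)
```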
# --- EXAMPLE 3: Softmax (no loop!) ---
def softmax(x):
x_shifted = x - np.max(x, axis=-1, keepdims=True) # Numerical stability!
exp_x = np.exp(x_shifted)
return exp_x / exp_x.sum(axis=-1, keepdims=True)
logits = np.random.randn(32, 10) # 32 samples, 10 classes
probs = softmax(logits)
print(probs.shape) # (32, 10)
print(probs.sum(axis=1)) # All ones — correctly normalized
# --- EXAMPLE 4: One-Hot Encoding (no loop!) ---
def one_hot(labels, num_classes):
n = len(labels)
encoded = np.zeros((n, num_classes), dtype=np.int8)
encoded[np.arange(n), labels] = 1 # Fancy indexing!
return encoded
labels = np.array([0, 2, 1, 3, 2])
one_hot_encoded = one_hot(labels, num_classes=4)
print(one_hot_encoded)
# [[1 0 0 0]
# [0 0 1 0]
# [0 1 0 0]
# [0 0 0 1]
# [0 0 1 0]]
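Decoding is the mirror image — argmax along the class axis recovers the labels (a minimal self-contained sketch):

```python
import numpy as np

labels = np.array([0, 2, 1, 3, 2])
encoded = np.zeros((len(labels), 4), dtype=np.int8)
encoded[np.arange(len(labels)), labels] = 1
decoded = encoded.argmax(axis=1)  # inverse of one-hot
assert np.array_equal(decoded, labels)
```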
4. Performance Optimization — Profiling & Tuning {#performance}
import numpy as np
import time
# --- TIP 1: Use appropriate dtype ---
# float64 uses 2x the memory of float32; on CPU, float32 is often faster for memory-bound ops
# On GPU (e.g. via CuPy), float32 is typically far faster still
big_float64 = np.random.randn(10_000_000) # 80 MB
big_float32 = np.random.randn(10_000_000).astype(np.float32) # 40 MB
# Check dtype impact on speed
t0 = time.perf_counter()
np.sum(big_float64 ** 2)
print(f"float64: {time.perf_counter()-t0:.4f}s")
t0 = time.perf_counter()
np.sum(big_float32 ** 2)
print(f"float32: {time.perf_counter()-t0:.4f}s")
# --- TIP 2: Avoid unnecessary copies ---
arr = np.random.randn(1000, 1000)
# BAD — creates 3 temporary arrays
result = (arr * 2) + (arr * 3)
# GOOD — fewer temporaries (though arr * 3 still allocates one)
result = np.empty_like(arr)
np.multiply(arr, 2, out=result)
np.add(result, arr * 3, out=result)
# BETTER
result = arr * 5 # arr * 2 + arr * 3 = arr * 5 — simpler!
# --- TIP 3: Contiguous memory access ---
arr = np.random.randn(1000, 1000)
# Column-wise sum on C-order array (cache-unfriendly — jumps in memory)
t0 = time.perf_counter()
for _ in range(100):
col_sums = arr.sum(axis=0)
print(f"Column sum: {time.perf_counter()-t0:.4f}s")
# Row-wise sum on C-order array (cache-friendly — sequential access)
t0 = time.perf_counter()
for _ in range(100):
row_sums = arr.sum(axis=1)
print(f"Row sum: {time.perf_counter()-t0:.4f}s")
# TIP: If you do many column operations, use F-order!
arr_f = np.asfortranarray(arr) # F-contiguous copy
t0 = time.perf_counter()
for _ in range(100):
col_sums = arr_f.sum(axis=0)
print(f"Column sum (F-order): {time.perf_counter()-t0:.4f}s")
# --- TIP 4: np.einsum for complex ops ---
A = np.random.randn(100, 200)
B = np.random.randn(200, 150)
# Standard matmul
t0 = time.perf_counter()
C = A @ B
print(f"matmul: {time.perf_counter()-t0:.6f}s")
# einsum (sometimes faster for specific ops)
t0 = time.perf_counter()
C = np.einsum('ij,jk->ik', A, B, optimize=True) # optimize=True is key!
print(f"einsum: {time.perf_counter()-t0:.6f}s")
# --- TIP 5: np.einsum optimize parameter ---
A = np.random.randn(10, 100, 50)
B = np.random.randn(10, 50, 80)
C = np.random.randn(10, 80, 30)
# Without optimization — may use suboptimal order
result1 = np.einsum('bij,bjk,bkl->bil', A, B, C)
# With optimization — finds best contraction order
result2 = np.einsum('bij,bjk,bkl->bil', A, B, C, optimize='optimal')
print(np.allclose(result1, result2)) # True, but result2 is faster
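Before committing to a contraction, np.einsum_path reports the order that optimize would pick and its estimated cost:

```python
import numpy as np

A = np.random.randn(10, 100, 50)
B = np.random.randn(10, 50, 80)
C = np.random.randn(10, 80, 30)
path, report = np.einsum_path('bij,bjk,bkl->bil', A, B, C, optimize='optimal')
assert path[0] == 'einsum_path'  # first entry is the path marker
print(report)                    # human-readable cost summary
```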
# --- TIP 6: Pre-allocate result arrays ---
n_iterations = 1000
data_size = 10000
# BAD — appending to list then converting
results_bad = []
for i in range(n_iterations):
results_bad.append(np.random.randn(data_size).mean())
results_bad = np.array(results_bad)
# GOOD — pre-allocate
results_good = np.empty(n_iterations)
for i in range(n_iterations):
results_good[i] = np.random.randn(data_size).mean()
# BEST — fully vectorized
results_best = np.random.randn(n_iterations, data_size).mean(axis=1)
# --- TIP 7: Use np.maximum/minimum instead of np.clip when possible ---
arr = np.random.randn(1_000_000)
# clip (one function call)
t0 = time.perf_counter()
np.clip(arr, 0, 1)
print(f"clip: {time.perf_counter()-t0:.6f}s")
# ReLU using maximum (faster for single bound)
t0 = time.perf_counter()
np.maximum(arr, 0)
print(f"maximum: {time.perf_counter()-t0:.6f}s")
Profiling NumPy Code
import numpy as np
import cProfile
import pstats
import io
def my_function():
arr = np.random.randn(10000, 100)
result = np.zeros(10000)
for i in range(len(arr)):
result[i] = np.sum(arr[i] ** 2) # Intentionally slow!
return result
# Profile
pr = cProfile.Profile()
pr.enable()
my_function()
pr.disable()
# Print stats
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats(10) # Top 10 bottlenecks
print(s.getvalue())
# Line profiler (install with: pip install line_profiler)
# %load_ext line_profiler, then: %lprun -f my_function my_function() # In Jupyter
# Memory profiler (install with: pip install memory_profiler)
# %memit my_function() # In Jupyter
5. NumPy + Numba — JIT Compilation {#numba}
When vectorization isn't possible, Numba compiles Python to machine code.
import numpy as np
import numba as nb
import time
# Scenario: Complex logic that can't easily be vectorized
def python_loop(arr):
result = np.empty_like(arr)
for i in range(len(arr)):
if arr[i] > 0:
result[i] = np.sqrt(arr[i]) * np.log(arr[i] + 1)
else:
result[i] = arr[i] ** 2
return result
# Numba JIT — just add @nb.jit decorator!
@nb.jit(nopython=True, parallel=True, cache=True)
def numba_loop(arr):
result = np.empty_like(arr)
for i in nb.prange(len(arr)): # prange = parallel range
if arr[i] > 0:
result[i] = np.sqrt(arr[i]) * np.log(arr[i] + 1)
else:
result[i] = arr[i] ** 2
return result
# Vectorized NumPy (for comparison) — np.abs guards sqrt/log, since np.where evaluates both branches
def numpy_vectorized(arr):
result = np.where(arr > 0,
np.sqrt(np.abs(arr)) * np.log(np.abs(arr) + 1),
arr ** 2)
return result
arr = np.random.randn(10_000_000)
# Warmup Numba (first call compiles)
numba_loop(arr[:100])
t0 = time.perf_counter()
python_loop(arr[:100000]) # Subset, too slow for full array
print(f"Python (100k): {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter()
numpy_vectorized(arr)
print(f"NumPy vectorized: {time.perf_counter()-t0:.3f}s")
t0 = time.perf_counter()
numba_loop(arr)
print(f"Numba parallel: {time.perf_counter()-t0:.3f}s")
# Results approximate:
# Python (100k): 0.15s → extrapolated 15s for 10M
# NumPy vectorized: 0.18s
# Numba parallel: 0.03s → ~6x faster than NumPy!
# Numba for custom ufuncs
@nb.vectorize(['float64(float64)', 'float32(float32)'], target='cpu')
def leaky_relu(x):
return x if x > 0 else 0.01 * x
result = leaky_relu(arr) # Vectorized, uses SIMD
# GPU target (requires CUDA)
# @nb.vectorize(['float64(float64)'], target='cuda')
6. Advanced Indexing Patterns {#advanced-indexing}
import numpy as np
# --- MULTI-DIMENSIONAL FANCY INDEXING ---
arr = np.arange(100).reshape(10, 10)
# Select arbitrary (row, col) pairs — paired index arrays, one element per pair
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([0, 2, 1, 3, 0, 4])
print(arr[rows, cols]) # Values at (0,0),(0,2),(1,1),(1,3),(2,0),(2,4)
# Select block submatrix (not fancy indexing — this is two slices)
block = arr[2:5, 3:7]
# Select specific rows AND specific columns (fancy + slice mix)
selected_rows = arr[[0, 3, 7]] # 3 specific rows, all cols
selected_cols = arr[:, [1, 4, 8]] # All rows, 3 specific cols
submatrix = arr[np.ix_([0,3,7], [1,4,8])] # Specific rows AND cols via meshgrid
# np.ix_ — creates open meshgrid for indexing
print(submatrix.shape) # (3, 3)
# --- ADVANCED BOOLEAN OPERATIONS ---
arr = np.random.randn(5, 5)
# Create 2D mask
mask_rows = np.array([True, False, True, False, True])
mask_cols = np.array([True, True, False, False, True])
# Select rows using boolean
print(arr[mask_rows]) # 3 rows where mask is True
print(arr[:, mask_cols]) # Columns where mask is True
print(arr[np.ix_(mask_rows, mask_cols)]) # Submatrix at intersection
# --- SCATTER OPERATION (inverse of gather) ---
source = np.array([1, 2, 3])
target = np.zeros(10)
indices = np.array([2, 5, 8])
target[indices] = source # Scatter: place source values at indices
print(target) # [0 0 1 0 0 2 0 0 3 0]
# Scatter add (accumulate into target)
np.add.at(target, indices, source) # target[indices] += source (handles repeats!)
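A minimal sketch of why np.add.at exists: with a repeated index, buffered fancy-index assignment counts it only once.

```python
import numpy as np

t = np.zeros(4)
idx = np.array([0, 0, 1])
np.add.at(t, idx, 1.0)   # unbuffered: index 0 accumulates twice
assert t[0] == 2.0 and t[1] == 1.0

t2 = np.zeros(4)
t2[idx] += 1.0           # buffered: repeated index 0 counts only once
assert t2[0] == 1.0
```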
# --- MULTI-DIMENSIONAL TAKE AND PUT ---
arr = np.arange(20).reshape(4, 5)
# np.take — gather along axis
np.take(arr, [0, 2], axis=0) # Take rows 0 and 2
np.take(arr, [1, 3, 4], axis=1) # Take cols 1, 3, 4
# np.put — scatter into flattened array
arr = np.arange(20)
np.put(arr, [0, 5, 10, 15], [99, 88, 77, 66])
print(arr) # [99 1 2 3 4 88 6 7 8 9 77 ...]
# np.putmask — conditional put
arr = np.arange(10, dtype=float)
np.putmask(arr, arr > 5, arr ** 2) # Where condition, put squared value
# --- UNRAVEL AND RAVEL INDEX ---
arr = np.arange(24).reshape(4, 6)
# Get flat index of maximum
flat_idx = np.argmax(arr)
print(flat_idx) # 23
# Convert flat index to multi-dim indices
row, col = np.unravel_index(flat_idx, arr.shape)
print(row, col) # 3, 5
# Convert multi-dim indices to flat index
flat = np.ravel_multi_index((row, col), arr.shape)
print(flat) # 23
# Useful for getting top-k indices in a 2D array (argpartition leaves the top 3 unordered)
top3_flat = np.argpartition(arr.ravel(), -3)[-3:]
top3_2d = np.array(np.unravel_index(top3_flat, arr.shape)).T
print(top3_2d) # rows (3,3), (3,4), (3,5), in some order
7. Real-World Use Cases — Complete Pipelines {#real-world}
Use Case 1: ETL Data Pipeline
import numpy as np
# Simulate raw sensor data with missing values and outliers
np.random.seed(42)
n_sensors = 10
n_timestamps = 1000
# Raw data: temperatures in °C (some missing: -9999, some outliers)
raw_data = np.random.normal(25, 5, (n_timestamps, n_sensors))
raw_data[np.random.rand(*raw_data.shape) < 0.05] = -9999 # 5% missing
raw_data[np.random.rand(*raw_data.shape) < 0.02] += 100 # 2% outliers
print(f"Raw data shape: {raw_data.shape}")
print(f"Missing values: {np.sum(raw_data == -9999)}")
# STEP 1: Handle missing values — replace with NaN
data = raw_data.copy().astype(np.float64)
data[data == -9999] = np.nan
# STEP 2: Detect and remove outliers using IQR
def remove_outliers_iqr(arr, factor=3.0):
q1 = np.nanpercentile(arr, 25, axis=0)
q3 = np.nanpercentile(arr, 75, axis=0)
iqr = q3 - q1
lower = q1 - factor * iqr
upper = q3 + factor * iqr
mask = (arr < lower) | (arr > upper) # Outlier positions
arr_clean = arr.copy()
arr_clean[mask] = np.nan # Replace outliers with NaN
return arr_clean
data = remove_outliers_iqr(data)
print(f"After outlier removal — NaN count: {np.sum(np.isnan(data))}")
# STEP 3: Forward-fill NaN values (simple imputation)
def forward_fill(arr):
"""Fill NaN with previous valid value, column by column."""
result = arr.copy()
for col in range(arr.shape[1]):
col_data = result[:, col]
nan_mask = np.isnan(col_data)
not_nan_idx = np.where(~nan_mask)[0]
if len(not_nan_idx) == 0:
continue
# Use searchsorted to find previous valid value
fill_idx = np.searchsorted(not_nan_idx, np.where(nan_mask)[0], side='right') - 1
valid_fills = fill_idx >= 0
nan_positions = np.where(nan_mask)[0][valid_fills]
source_positions = not_nan_idx[fill_idx[valid_fills]]
col_data[nan_positions] = col_data[source_positions]
return result
data = forward_fill(data)
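For a single column, a common alternative idiom (a sketch, not the pipeline's method) forward-fills with np.maximum.accumulate on indices:

```python
import numpy as np

a = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
mask = np.isnan(a)
idx = np.where(~mask, np.arange(len(a)), 0)
np.maximum.accumulate(idx, out=idx)   # carry forward the last valid index
filled = a[idx]
assert np.isnan(filled[0])            # leading NaN has no predecessor
assert filled[2] == 1.0 and filled[4] == 3.0
```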
# STEP 4: Remaining NaN (at start) — fill with column median
col_medians = np.nanmedian(data, axis=0)
nan_positions = np.where(np.isnan(data))
data[nan_positions] = np.take(col_medians, nan_positions[1])
# STEP 5: Normalize to [0, 1] per sensor
col_min = data.min(axis=0)
col_max = data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min + 1e-8)
# STEP 6: Compute rolling statistics
window = 10
rolling_mean = np.array([
normalized[i:i+window].mean(axis=0)
for i in range(len(normalized) - window + 1)
])
# Vectorized version using stride tricks
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(normalized, window_shape=window, axis=0) # (991, 10, 10)
rolling_mean_fast = windows.mean(axis=-1) # same result as the loop above
print(f"Final data shape: {normalized.shape}")
print(f"No remaining NaN: {not np.any(np.isnan(normalized))}")
print(f"Range: [{normalized.min():.4f}, {normalized.max():.4f}]")
Use Case 2: Text Vectorization (Bag of Words)
import numpy as np
# Documents
documents = [
["the", "cat", "sat", "on", "the", "mat"],
["the", "dog", "sat", "on", "the", "log"],
["cat", "and", "dog", "are", "friends"],
]
# Build vocabulary
vocab = sorted(set(word for doc in documents for word in doc))
word2idx = {word: idx for idx, word in enumerate(vocab)}
print(f"Vocabulary size: {len(vocab)}")
print(f"Vocabulary: {vocab}")
# Create BoW matrix (term frequency)
bow_matrix = np.zeros((len(documents), len(vocab)), dtype=np.float32)
for doc_idx, doc in enumerate(documents):
for word in doc:
bow_matrix[doc_idx, word2idx[word]] += 1
print("BoW matrix:")
print(bow_matrix)
# TF-IDF
tf = bow_matrix / bow_matrix.sum(axis=1, keepdims=True) # Term Frequency
df = np.sum(bow_matrix > 0, axis=0) # Document Frequency
idf = np.log((len(documents) + 1) / (df + 1)) + 1 # Inverse Document Frequency
tfidf = tf * idf # TF-IDF
# Cosine similarity between documents
def cosine_similarity_matrix(X):
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_normalized = X / (norms + 1e-8)
return X_normalized @ X_normalized.T
similarity = cosine_similarity_matrix(tfidf)
print("\nDocument similarity matrix:")
print(np.round(similarity, 3))
8. Data Science Workflow with NumPy {#data-science}
import numpy as np
# Simulated dataset: House prices
np.random.seed(42)
n = 500
# Features: size(sqft), bedrooms, age, distance_to_city
size = np.random.uniform(500, 3000, n)
bedrooms = np.random.randint(1, 6, n)
age = np.random.uniform(0, 50, n)
distance = np.random.uniform(1, 50, n)
noise = np.random.normal(0, 20000, n)
# Target: price
price = (100 * size + 50000 * bedrooms - 1000 * age - 5000 * distance + 200000 + noise)
# --- EXPLORATORY DATA ANALYSIS ---
X = np.column_stack([size, bedrooms, age, distance])
y = price
print(f"Dataset: {X.shape[0]} samples, {X.shape[1]} features")
print(f"\nFeature statistics:")
feature_names = ['Size', 'Bedrooms', 'Age', 'Distance']
for i, name in enumerate(feature_names):
col = X[:, i]
print(f" {name}: mean={col.mean():.1f}, std={col.std():.1f}, "
f"min={col.min():.1f}, max={col.max():.1f}")
print(f"\nTarget (Price): mean={y.mean():,.0f}, std={y.std():,.0f}")
# Correlation with target
for i, name in enumerate(feature_names):
corr = np.corrcoef(X[:, i], y)[0, 1]
print(f" {name} correlation with Price: {corr:.4f}")
# --- TRAIN/TEST SPLIT (without sklearn) ---
indices = np.random.permutation(n)
train_size = int(0.8 * n)
train_idx = indices[:train_size]
test_idx = indices[train_size:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
# --- FEATURE SCALING ---
# Fit scaler on training data only
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std # Use training stats!
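A small sanity sketch of the scaling step (synthetic data here): after standardizing with training statistics, the training features have mean 0 and std 1:

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(5.0, 2.0, size=(100, 3))
mean_, std_ = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr_scaled = (X_tr - mean_) / std_
assert np.allclose(X_tr_scaled.mean(axis=0), 0.0, atol=1e-12)
assert np.allclose(X_tr_scaled.std(axis=0), 1.0, atol=1e-12)
```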
# --- PRINCIPAL COMPONENT ANALYSIS (PCA) from scratch ---
def pca_numpy(X, n_components):
"""Manual PCA using NumPy SVD."""
# Center the data
X_centered = X - X.mean(axis=0)
# Compute covariance matrix
cov = X_centered.T @ X_centered / (len(X) - 1)
# Eigendecomposition (for symmetric matrix, use eigh — more stable)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
# Sort by descending eigenvalue
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Explained variance ratio
explained_var_ratio = eigenvalues / eigenvalues.sum()
# Project data
components = eigenvectors[:, :n_components]
X_projected = X_centered @ components
return X_projected, explained_var_ratio[:n_components], components
X_pca, var_ratio, components = pca_numpy(X_train_scaled, n_components=2)
print(f"\nPCA explained variance: {var_ratio}")
print(f"Total explained: {var_ratio.sum():.2%}")
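The same spectrum can be reached through SVD of the centered data; a sketch cross-checking that squared singular values divided by (n - 1) match the covariance eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
evals = np.linalg.eigh(cov)[0][::-1]          # eigh returns ascending order
svals = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
assert np.allclose(svals**2 / (len(X) - 1), evals)
```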
# --- EVALUATION METRICS ---
def regression_metrics(y_true, y_pred):
errors = y_true - y_pred
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
mape = np.mean(np.abs(errors / y_true)) * 100
return {'MAE': mae, 'MSE': mse, 'RMSE': rmse, 'R²': r2, 'MAPE%': mape}
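A quick sanity sketch of the definitions above: a perfect prediction drives every error metric to its ideal value (R² = 1, MAE = 0):

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_pred = y_true.copy()                 # perfect prediction
errors = y_true - y_pred
r2 = 1 - np.sum(errors**2) / np.sum((y_true - y_true.mean())**2)
assert r2 == 1.0
assert np.mean(np.abs(errors)) == 0.0
```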
9. NumPy in Machine Learning — From Scratch {#ml}
Neural Network Forward Pass
import numpy as np
# A simple 2-layer neural network for binary classification
class SimpleNN:
def __init__(self, input_dim, hidden_dim, output_dim, seed=42):
rng = np.random.default_rng(seed)
# He initialization — scale sqrt(2 / fan_in), suited to ReLU layers
scale1 = np.sqrt(2.0 / input_dim)
scale2 = np.sqrt(2.0 / hidden_dim)
self.W1 = rng.normal(0, scale1, (input_dim, hidden_dim))
self.b1 = np.zeros((1, hidden_dim))
self.W2 = rng.normal(0, scale2, (hidden_dim, output_dim))
self.b2 = np.zeros((1, output_dim))
def relu(self, x):
return np.maximum(x, 0)
def relu_grad(self, x):
return (x > 0).astype(float)
def sigmoid(self, x):
return 1 / (1 + np.exp(-np.clip(x, -500, 500))) # Clip for stability
def forward(self, X):
self.X = X
self.Z1 = X @ self.W1 + self.b1 # Linear: (n, hidden)
self.A1 = self.relu(self.Z1) # ReLU: (n, hidden)
self.Z2 = self.A1 @ self.W2 + self.b2 # Linear: (n, output)
self.A2 = self.sigmoid(self.Z2) # Sigmoid: (n, output)
return self.A2
def compute_loss(self, y_pred, y_true):
# Binary cross-entropy
eps = 1e-8
return -np.mean(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
def backward(self, y_true):
n = len(y_true)
# Output layer gradient
dZ2 = (self.A2 - y_true) / n # (n, 1)
dW2 = self.A1.T @ dZ2 # (hidden, 1)
db2 = dZ2.sum(axis=0, keepdims=True) # (1, 1)
# Hidden layer gradient
dA1 = dZ2 @ self.W2.T # (n, hidden)
dZ1 = dA1 * self.relu_grad(self.Z1) # (n, hidden)
dW1 = self.X.T @ dZ1 # (input, hidden)
db1 = dZ1.sum(axis=0, keepdims=True) # (1, hidden)
return dW1, db1, dW2, db2
def update(self, dW1, db1, dW2, db2, lr=0.01):
self.W1 -= lr * dW1
self.b1 -= lr * db1
self.W2 -= lr * dW2
self.b2 -= lr * db2
def train(self, X, y, epochs=100, lr=0.01, batch_size=32):
n = len(X)
losses = []
rng = np.random.default_rng(42)
for epoch in range(epochs):
# Mini-batch SGD
indices = rng.permutation(n)
epoch_loss = 0
for i in range(0, n, batch_size):
batch_idx = indices[i:i+batch_size]
X_batch = X[batch_idx]
y_batch = y[batch_idx]
y_pred = self.forward(X_batch)
loss = self.compute_loss(y_pred, y_batch)
epoch_loss += loss
gradients = self.backward(y_batch)
self.update(*gradients, lr=lr)
losses.append(epoch_loss / (n / batch_size))
if epoch % 20 == 0:
print(f"Epoch {epoch}: Loss = {losses[-1]:.4f}")
return losses
# Usage
np.random.seed(42)
n = 1000
X = np.random.randn(n, 4)
y = ((X[:, 0] + X[:, 1] > 0) & (X[:, 2] > 0)).astype(float).reshape(-1, 1)
model = SimpleNN(input_dim=4, hidden_dim=8, output_dim=1)
losses = model.train(X, y, epochs=100, lr=0.1, batch_size=32)
# Evaluate
y_pred = model.forward(X)
accuracy = np.mean((y_pred > 0.5) == y)
print(f"Final accuracy: {accuracy:.4f}")
10. NumPy for Image Processing {#image}
import numpy as np
# Images are 3D arrays: (height, width, channels)
# channels: 1=grayscale, 3=RGB, 4=RGBA
# Create a synthetic RGB image
np.random.seed(42)
image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(f"Image shape: {image.shape}") # (480, 640, 3)
print(f"Image dtype: {image.dtype}") # uint8
print(f"Value range: [{image.min()}, {image.max()}]")
# --- CHANNEL OPERATIONS ---
R, G, B = image[:, :, 0], image[:, :, 1], image[:, :, 2]
# Convert to grayscale (weighted average — human perception)
grayscale = 0.2989 * R + 0.5870 * G + 0.1140 * B
grayscale = grayscale.astype(np.uint8)
print(f"Grayscale shape: {grayscale.shape}") # (480, 640)
# --- PIXEL OPERATIONS ---
# Brighten (clip to avoid overflow)
brighter = np.clip(image.astype(np.int32) + 50, 0, 255).astype(np.uint8)
# Darken
darker = (image * 0.5).astype(np.uint8)
# Invert colors
inverted = 255 - image
# Apply channel-specific adjustment (blue tint)
blue_tint = image.copy()
blue_tint[:, :, 2] = np.clip(blue_tint[:, :, 2].astype(int) + 80, 0, 255).astype(np.uint8)
# --- CROPPING ---
# Crop center 200x200
h, w = image.shape[:2]
cy, cx = h // 2, w // 2
crop = image[cy-100:cy+100, cx-100:cx+100]
# --- FLIPPING ---
flipped_h = image[:, ::-1] # Horizontal flip
flipped_v = image[::-1, :] # Vertical flip
rotated_180 = image[::-1, ::-1] # 180° rotation
# --- CONVOLUTION (manual 2D convolution) ---
def convolve2d(image, kernel):
"""Apply convolution to a 2D (grayscale) image."""
kh, kw = kernel.shape
ph, pw = kh // 2, kw // 2
# Pad image
padded = np.pad(image, ((ph, ph), (pw, pw)), mode='reflect')
# Sliding window view
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(padded, window_shape=(kh, kw))
# Apply kernel — technically cross-correlation (flip the kernel for true convolution)
return np.einsum('ijkl,kl->ij', windows, kernel)
# Gaussian blur kernel
def gaussian_kernel(size=5, sigma=1.0):
ax = np.linspace(-(size-1)/2, (size-1)/2, size)
gauss = np.exp(-ax**2 / (2 * sigma**2))
kernel = np.outer(gauss, gauss)
return kernel / kernel.sum()
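Two properties worth asserting about any blur kernel built this way: it sums to 1 (preserves overall brightness) and peaks at the center.

```python
import numpy as np

ax = np.linspace(-2.0, 2.0, 5)          # size=5, sigma=1.0
g = np.exp(-ax**2 / (2 * 1.0**2))
k = np.outer(g, g)
k = k / k.sum()
assert np.isclose(k.sum(), 1.0)         # brightness-preserving
assert k[2, 2] == k.max()               # center weight is largest
```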
# Edge detection (Sobel)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
edges_x = convolve2d(grayscale.astype(float), sobel_x)
edges_y = convolve2d(grayscale.astype(float), sobel_y)
edges = np.sqrt(edges_x**2 + edges_y**2)
edges = (edges / edges.max() * 255).astype(np.uint8)
# --- COLOR HISTOGRAMS ---
r_hist, bins = np.histogram(R.flatten(), bins=256, range=(0, 256)) # one unit-width bin per intensity value
g_hist, _ = np.histogram(G.flatten(), bins=256, range=(0, 256))
b_hist, _ = np.histogram(B.flatten(), bins=256, range=(0, 256))
# Histogram equalization (grayscale)
def histogram_equalize(img):
hist, bins = np.histogram(img.flatten(), 256, range=(0, 256))
cdf = hist.cumsum()
cdf_normalized = (cdf - cdf.min()) * 255 / (cdf.max() - cdf.min())
return cdf_normalized[img].astype(np.uint8)
equalized = histogram_equalize(grayscale)
11. NumPy for Financial Analysis {#finance}
import numpy as np
# Simulate stock price data (Geometric Brownian Motion)
np.random.seed(42)
n_days = 252 # Trading days in a year
n_stocks = 5
# Parameters
S0 = np.array([100, 150, 200, 75, 120]) # Initial prices
mu = np.array([0.10, 0.12, 0.08, 0.15, 0.09]) / 252 # Daily drift
sigma = np.array([0.20, 0.25, 0.15, 0.30, 0.18]) / np.sqrt(252) # Daily vol
# Simulate price paths
dt = 1
returns = np.random.normal(mu, sigma, (n_days, n_stocks))
price_paths = S0 * np.exp(np.cumsum(returns, axis=0)) # (252, 5)
price_paths = np.vstack([S0, price_paths]) # Add initial prices
print(f"Price paths shape: {price_paths.shape}") # (253, 5)
# --- RETURNS ANALYSIS ---
daily_returns = np.diff(price_paths, axis=0) / price_paths[:-1]
log_returns = np.diff(np.log(price_paths), axis=0)
# Annualized statistics
annual_return = daily_returns.mean(axis=0) * 252
annual_vol = daily_returns.std(axis=0, ddof=1) * np.sqrt(252)
sharpe_ratio = annual_return / annual_vol
print("\nStock Performance:")
stock_names = ['AAPL', 'MSFT', 'AMZN', 'TSLA', 'GOOGL']
for i, name in enumerate(stock_names):
print(f" {name}: Return={annual_return[i]:.2%}, Vol={annual_vol[i]:.2%}, "
f"Sharpe={sharpe_ratio[i]:.2f}")
# --- PORTFOLIO ANALYSIS ---
# Equal weight portfolio
n_assets = n_stocks
weights = np.ones(n_assets) / n_assets
# Portfolio returns
portfolio_returns = daily_returns @ weights
portfolio_annual_return = portfolio_returns.mean() * 252
portfolio_annual_vol = portfolio_returns.std(ddof=1) * np.sqrt(252)
portfolio_sharpe = portfolio_annual_return / portfolio_annual_vol
# Correlation matrix
corr_matrix = np.corrcoef(daily_returns.T)
cov_matrix = np.cov(daily_returns.T)
# Portfolio variance (using matrix algebra)
portfolio_variance = weights @ cov_matrix @ weights * 252
portfolio_vol_matrix = np.sqrt(portfolio_variance)
print(f"\nPortfolio: Return={portfolio_annual_return:.2%}, "
f"Vol={portfolio_vol_matrix:.2%}, Sharpe={portfolio_sharpe:.2f}")
# --- RISK METRICS ---
# Value at Risk (VaR) — 95% and 99%
VaR_95 = np.percentile(portfolio_returns, 5)
VaR_99 = np.percentile(portfolio_returns, 1)
print(f"\n1-Day VaR (95%): {VaR_95:.4f} ({VaR_95*100:.2f}%)")
print(f"1-Day VaR (99%): {VaR_99:.4f} ({VaR_99*100:.2f}%)")
# Expected Shortfall (CVaR) — average loss beyond VaR
ES_95 = portfolio_returns[portfolio_returns <= VaR_95].mean()
print(f"1-Day ES (95%): {ES_95:.4f} ({ES_95*100:.2f}%)")
# Maximum Drawdown
portfolio_cumulative = np.cumprod(1 + portfolio_returns)
running_max = np.maximum.accumulate(portfolio_cumulative)
drawdowns = (portfolio_cumulative - running_max) / running_max
max_drawdown = drawdowns.min()
print(f"Maximum Drawdown: {max_drawdown:.4f} ({max_drawdown*100:.2f}%)")
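The drawdown logic is easy to verify on a hand-checkable series (a sketch): a 50% crash from the running peak is the maximum drawdown.

```python
import numpy as np

r = np.array([0.10, -0.50, 0.20])
cum = np.cumprod(1 + r)                 # [1.1, 0.55, 0.66]
peak = np.maximum.accumulate(cum)       # running high-water mark
dd = (cum - peak) / peak
assert np.isclose(dd.min(), -0.5)       # worst drop: 1.1 -> 0.55
```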
# --- MONTE CARLO PORTFOLIO SIMULATION ---
n_simulations = 10000
sim_weights = np.random.dirichlet(np.ones(n_assets), n_simulations) # Random weights that sum to 1
# Compute returns and risks for each portfolio
sim_returns = sim_weights @ annual_return
sim_vols = np.sqrt(np.einsum('ij,jk,ik->i', sim_weights, cov_matrix * 252, sim_weights))
sim_sharpes = sim_returns / sim_vols
# Find optimal portfolios
max_sharpe_idx = np.argmax(sim_sharpes)
min_vol_idx = np.argmin(sim_vols)
print(f"\nOptimal Portfolios:")
print(f"Max Sharpe: {sim_sharpes[max_sharpe_idx]:.3f}, "
f"Return={sim_returns[max_sharpe_idx]:.2%}, Vol={sim_vols[max_sharpe_idx]:.2%}")
print(f"Min Vol: {sim_vols[min_vol_idx]:.2%}, "
f"Return={sim_returns[min_vol_idx]:.2%}")
print(f"Max Sharpe weights: {np.round(sim_weights[max_sharpe_idx], 3)}")
12. Comparison — NumPy vs Alternatives {#comparison}
| Feature | NumPy | Pandas | PyTorch Tensor | CuPy | JAX |
|---|---|---|---|---|---|
| Data types | Homogeneous | Heterogeneous columns | Homogeneous | Homogeneous | Homogeneous |
| Primary use | Numerical arrays | Tabular data | ML/Deep learning | GPU arrays | ML + autodiff |
| GPU support | ❌ | ❌ | ✅ | ✅ | ✅ |
| Autodiff | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speed (CPU) | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Memory | Efficient | Overhead | Overhead | GPU VRAM | Efficient |
| API compatibility | Native | Built on NumPy | NumPy-like | NumPy-compatible | NumPy-like |
| Labels/index | ❌ | ✅ | ❌ | ❌ | ❌ |
When to Use What?
NumPy → Default choice for array math, scientific computing, ML internals
Pandas → Tabular data with labels, CSV/Excel/SQL data
PyTorch → Deep learning, need autograd, GPU training
CuPy → Drop-in NumPy replacement for GPU (same API!)
JAX → Research ML, functional transformations (jit, grad, vmap)
Dask → NumPy arrays larger than RAM (distributed computing)
NumPy to CuPy Migration
# Almost identical API!
import numpy as np # CPU
import cupy as cp # GPU (if CUDA available)
# NumPy code
a = np.random.randn(1000, 1000)
b = np.random.randn(1000, 1000)
c = a @ b # CPU matrix multiplication
# CuPy code (GPU) — nearly identical!
a_gpu = cp.random.randn(1000, 1000)
b_gpu = cp.random.randn(1000, 1000)
c_gpu = a_gpu @ b_gpu # GPU matrix multiplication
# Transfer between CPU and GPU
a_gpu = cp.asarray(a) # NumPy → CuPy (CPU to GPU)
a_cpu = cp.asnumpy(a_gpu) # CuPy → NumPy (GPU to CPU)
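Because the APIs match, you can even write backend-agnostic functions by dispatching on the input's array module. A sketch using CuPy's `get_array_module` helper, with a plain-NumPy fallback (an assumption added here) so the same code runs on machines without CuPy:

```python
import numpy as np

try:
    import cupy as cp
    get_array_module = cp.get_array_module
except ImportError:  # CuPy not installed — everything falls back to NumPy
    get_array_module = lambda *args: np

def normalize(x):
    """Works unchanged on NumPy (CPU) and CuPy (GPU) arrays."""
    xp = get_array_module(x)  # numpy or cupy, depending on the input
    return (x - xp.mean(x)) / xp.std(x)

a = np.array([1.0, 2.0, 3.0, 4.0])
print(normalize(a))  # CPU here; pass a CuPy array and it runs on the GPU
```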
13. Best Practices & Pro Developer Guide {#best-practices}
import numpy as np
# --- BEST PRACTICE 1: Always set random seed for reproducibility ---
rng = np.random.default_rng(seed=42) # Use new Generator API
# NOT: np.random.seed(42) — global state, harder to reason about
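A minimal sketch of the Generator workflow — each generator owns its state, so re-seeding reproduces the exact stream regardless of what other code does with `np.random`:

```python
import numpy as np

# Each Generator carries its own state — no hidden global
rng = np.random.default_rng(seed=42)
sample = rng.normal(size=5)
ints = rng.integers(low=0, high=10, size=3)

# Same seed → identical stream, unaffected by any other RNG use
rng2 = np.random.default_rng(seed=42)
assert np.array_equal(rng2.normal(size=5), sample)
```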
# --- BEST PRACTICE 2: Use assertions to validate shapes ---
def matrix_multiply(A, B):
assert A.ndim == 2, f"A must be 2D, got {A.ndim}D"
assert B.ndim == 2, f"B must be 2D, got {B.ndim}D"
assert A.shape[1] == B.shape[0], \
f"Shape mismatch: {A.shape} vs {B.shape}"
return A @ B
# --- BEST PRACTICE 3: Use np.testing for unit tests ---
actual = np.array([1.0, 2.0, 3.0])
expected = np.array([1.0, 2.0, 3.0])
np.testing.assert_array_equal(actual, expected) # Exact equality
np.testing.assert_array_almost_equal(actual, expected, decimal=5) # ~5 decimal places
np.testing.assert_allclose(actual, expected, rtol=1e-5) # Relative tolerance
# --- BEST PRACTICE 4: Validate inputs defensively ---
def safe_normalize(arr: np.ndarray, axis: int = 0) -> np.ndarray:
arr = np.asarray(arr, dtype=np.float64) # Ensure correct type
if arr.ndim < 1:
raise ValueError("Input must be at least 1D")
norms = np.linalg.norm(arr, axis=axis, keepdims=True)
zero_norms = norms == 0
if np.any(zero_norms):
import warnings
warnings.warn("Zero-norm vectors detected; will produce NaN")
return arr / norms
# --- BEST PRACTICE 5: Document array shapes with comments ---
def attention_scores(Q, K, V, mask=None):
"""
Multi-head attention scores.
Args:
Q: (batch, seq_len, d_k)
K: (batch, seq_len, d_k)
V: (batch, seq_len, d_v)
mask: (batch, seq_len, seq_len) or None
Returns:
output: (batch, seq_len, d_v)
"""
d_k = Q.shape[-1]
# (batch, seq_len, seq_len)
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
if mask is not None:
scores = np.where(mask, scores, -1e9)
# (batch, seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
# (batch, seq_len, d_v)
return weights @ V
# --- BEST PRACTICE 6: Use context managers for precision control ---
with np.errstate(divide='ignore', invalid='ignore', over='ignore'):
result = np.log(np.array([0, 1, 2, -1])) # No warning for log(0) or log(-1)
# --- BEST PRACTICE 7: Profile before optimizing ---
# First make it correct, then profile, then optimize the bottleneck
# Use: cProfile, line_profiler, memory_profiler, timeit
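As a sketch, `timeit` is often all you need to confirm where the time actually goes before rewriting anything:

```python
import timeit
import numpy as np

data = np.random.default_rng(0).standard_normal(100_000)

# Python-level reduction vs the vectorized equivalent
t_loop = timeit.timeit(lambda: sum(x * x for x in data), number=10)
t_vec = timeit.timeit(lambda: float(np.sum(data * data)), number=10)
print(f"Python loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s")
```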
# --- BEST PRACTICE 8: Prefer NumPy functions over Python equivalents ---
arr = np.array([3, 1, 4, 1, 5, 9])
min_val = min(arr) # SLOW — Python-level loop, boxes each element into a Python object
max_val = max(arr) # SLOW
min_val = arr.min() # FAST — NumPy method
max_val = arr.max() # FAST
min_val = np.min(arr) # Also FAST
# --- BEST PRACTICE 9: Avoid Python loops — use vectorization ---
# See Section 3 for detailed examples
# --- BEST PRACTICE 10: Memory — float32 for large ML arrays ---
# Use float32 for ML (half the memory, negligible precision loss)
# Use float64 for scientific computing (full precision)
large_ml_array = np.zeros((10000, 512), dtype=np.float32) # 20MB not 40MB
14. Common Anti-Patterns — What NOT to Do {#antipatterns}
import numpy as np
# ❌ ANTI-PATTERN 1: Growing arrays with np.append in loops
# BAD — O(n²) time, creates new array each iteration
result = np.array([])
for i in range(10000):
result = np.append(result, i**2)
# GOOD — Pre-allocate or collect in list then convert once
result = np.array([i**2 for i in range(10000)]) # List comp then convert
# Or
result = np.arange(10000) ** 2 # Best — pure NumPy
# ❌ ANTI-PATTERN 2: Not using vectorization
data = np.random.randn(100000)
threshold = 2.0
# BAD
outliers = []
for x in data:
if abs(x) > threshold:
outliers.append(x)
# GOOD
outliers = data[np.abs(data) > threshold]
# ❌ ANTI-PATTERN 3: Accidental view modification
original = np.array([1, 2, 3, 4, 5])
modified = original[2:5] # View!
modified[:] = 99 # Modifies original!
print(original) # [1 2 99 99 99] — surprise!
# GOOD
modified = original[2:5].copy() # Explicit copy
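When in doubt, you can check whether a result is a view before mutating it — both `.base` and `np.shares_memory` work:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
view = original[2:5]
copy = original[2:5].copy()

print(view.base is original)             # True — view points at original's buffer
print(copy.base is None)                 # True — copy owns its own data
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
```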
# ❌ ANTI-PATTERN 4: Using Python comparison instead of np.isnan
arr = np.array([1, 2, np.nan, 4])
# BAD
result = [x for x in arr if x != np.nan] # Never filters anything — x != np.nan is True for every x, even NaN
# GOOD
result = arr[~np.isnan(arr)]
# ❌ ANTI-PATTERN 5: Forgetting that int arrays can overflow silently
age = np.array([200], dtype=np.uint8)
age += 100 # Silently wraps around!
print(age) # [44] — wrong!
# GOOD — Use larger dtype or check before operation
age = np.array([200], dtype=np.int32) # Won't overflow here
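One defensive pattern is to consult `np.iinfo` before an operation that might overflow and promote only when needed. The `safe_add` helper below is hypothetical, just for illustration:

```python
import numpy as np

def safe_add(arr, value):
    """Hypothetical helper: promote before adding if the result could wrap."""
    info = np.iinfo(arr.dtype)
    if arr.max() > info.max - value:
        arr = arr.astype(np.int64)  # promote before the operation
    return arr + value

age = np.array([200], dtype=np.uint8)
print(safe_add(age, 100))  # [300] — promoted to int64 instead of wrapping to 44
```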
# ❌ ANTI-PATTERN 6: Using np.vectorize thinking it's fast
np.vectorize(lambda x: x**2)(arr) # NOT vectorized! Just a Python loop with NumPy overhead
# GOOD
arr ** 2 # Truly vectorized
# ❌ ANTI-PATTERN 7: Reshape without checking contiguity
arr = np.arange(12).reshape(3, 4).T # Transpose — now F-contiguous!
# In-place shape assignment requires a no-copy reshape — impossible for this layout
try:
arr.shape = (12,) # Fails with error!
except AttributeError:
pass
# GOOD — Use reshape (handles non-contiguous automatically)
flat = arr.reshape(-1) # Works, may copy if needed
# Or explicitly make contiguous
flat = np.ascontiguousarray(arr).reshape(-1)
15. Complete Interview Question Bank {#interview}
Basic Level
Q1: What is the difference between np.array and np.asarray?
`np.array` always makes a copy by default. `np.asarray` avoids copying if the input is already a NumPy array with the right dtype. Use `asarray` when you don't want unnecessary copies.
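A quick demonstration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
print(np.shares_memory(a, np.asarray(a)))  # True — asarray returns a as-is
print(np.shares_memory(a, np.array(a)))    # False — array copies by default

# asarray still copies when a conversion is actually required
print(np.shares_memory(a, np.asarray(a, dtype=np.float32)))  # False
```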
Q2: What does -1 mean in arr.reshape(-1, 4)?
-1 is a placeholder meaning "infer this dimension automatically." NumPy calculates it as
total_elements / 4.
Q3: What is the output dtype of np.array([1, 2.0, True])?
`float64` — NumPy upcasts to the highest-precision type needed to represent all values.
Q4: What does arr.flags['WRITEABLE'] = False do?
Makes the array read-only. Any attempt to modify it will raise `ValueError: assignment destination is read-only`. Useful for protecting shared constants.
Q5: What is the difference between np.zeros and np.empty?
`np.zeros` initializes all elements to 0. `np.empty` allocates memory without initialization (contains garbage values) — faster when you plan to fill every element anyway.
Intermediate Level
Q6: Explain how memory is laid out for a (3, 4) array and how transpose works without copying.
A `(3, 4)` float64 array stores its 12 elements as 96 contiguous bytes in memory (row-major). Strides are `(32, 8)` — 32 bytes to the next row, 8 to the next column. Transpose swaps the strides to `(8, 32)` and the shape to `(4, 3)` — the underlying data doesn't move!
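Verifiable in a few lines:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
print(arr.strides)                   # (32, 8)
print(arr.T.strides)                 # (8, 32) — swapped, no data moved
print(np.shares_memory(arr, arr.T))  # True — same buffer
```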
Q7: What is broadcasting and when does it fail?
Broadcasting is NumPy's rule for performing operations on arrays with different shapes. Shapes are compared dimension by dimension from the right: dimensions must either be equal, or one must be 1. It fails when incompatible dimensions meet, e.g., `(3, 4)` and `(2,)` — trailing sizes 4 and 2 don't match.
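A short sketch of both the success and failure cases:

```python
import numpy as np

a = np.ones((3, 4))
print((a + np.arange(4)).shape)     # (3, 4) — trailing sizes 4 and 4 match
print((a + np.ones((3, 1))).shape)  # (3, 4) — size-1 dimension stretches

try:
    a + np.arange(2)                # trailing sizes 4 vs 2 — incompatible
except ValueError as e:
    print("broadcast error:", e)
```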
Q8: What is the difference between np.dot, np.inner, and np.outer?
`dot(a, b)`: matrix multiplication for 2D, sum of products for 1D. `inner(a, b)`: same as dot for 1D; for nD, contracts the last axis of a with the last axis of b. `outer(a, b)`: outer product — a matrix of all pairwise products, shape `(m, n)`.
Q9: When does ravel() return a copy vs a view?
Returns a view when the array is C-contiguous. Returns a copy for non-contiguous arrays (e.g., after transpose). You can check with `np.shares_memory(arr, arr.ravel())`.
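A two-line check:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
print(np.shares_memory(arr, arr.ravel()))      # True — contiguous, so a view
print(np.shares_memory(arr.T, arr.T.ravel()))  # False — transposed, forces a copy
```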
Q10: How would you implement a sliding window without explicit loops?
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(arr, window_shape=window_size)
window_means = windows.mean(axis=-1)
Advanced Level
Q11: What is the difference between np.linalg.solve and computing np.linalg.inv(A) @ b?
`solve` uses LU decomposition to find the solution directly — numerically stable and O(n³). `inv(A) @ b` computes the full inverse first (also O(n³)) and then multiplies — less stable and slower. `solve` should always be preferred.
Q12: Explain how np.einsum works with optimize=True.
Without `optimize`, einsum contracts in the order specified. With `optimize=True`, NumPy finds an optimal contraction order (like optimal parenthesization of matrix chain multiplication) using dynamic programming, which can dramatically reduce the number of operations for contractions of 3+ tensors.
Q13: What is the NumPy GIL situation?
NumPy operations release Python's GIL while their C code executes, so NumPy computations can run in parallel threads. This is why `concurrent.futures.ThreadPoolExecutor` can give speedups for NumPy workloads, unlike pure Python operations.
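A minimal sketch — sorting independent chunks in threads, where each `np.sort` call releases the GIL inside its C loop:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
chunks = [rng.standard_normal(200_000) for _ in range(4)]

# The GIL is released inside each np.sort, so the threads can overlap
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(np.sort, chunks))

print(all(np.all(r[:-1] <= r[1:]) for r in results))  # True — every chunk sorted
```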
Q14: What is the purpose of np.errstate?
It's a context manager to temporarily change how NumPy handles floating-point errors (divide by zero, overflow, underflow, invalid operations). Options: 'ignore', 'warn', 'raise', 'call', 'print', 'log'. Useful for suppressing expected warnings in numerical code.
Q15: How does np.memmap work and when should you use it?
Memory-mapped files map a disk file to an ndarray. Only the accessed portions are loaded from disk. Use when array is too large for RAM — common in genomics, large image datasets, and scientific simulations where data is much larger than available memory.
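A minimal sketch — the file path and shape below are purely illustrative:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big.dat")  # illustrative location

# Create a disk-backed array, write a slice, flush to disk
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 100))
mm[0, :5] = [1, 2, 3, 4, 5]
mm.flush()

# Reopen read-only — only the pages you touch are pulled from disk
ro = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 100))
print(ro[0, :5])  # [1. 2. 3. 4. 5.]
```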
Scenario-Based
Q16: You have a (1000, 100) array. How do you normalize each row to have zero mean and unit standard deviation?
mean = arr.mean(axis=1, keepdims=True) # (1000, 1)
std = arr.std(axis=1, keepdims=True, ddof=1) # (1000, 1)
normalized = (arr - mean) / (std + 1e-8) # Broadcasting: (1000, 100)
Q17: How would you efficiently compute all pairwise Euclidean distances between 1000 points in 128D?
# Using the identity ||a-b||² = ||a||² + ||b||² - 2(a·b)
sq_norms = np.sum(X**2, axis=1) # (1000,)
D_sq = sq_norms[:, None] + sq_norms[None, :] - 2 * (X @ X.T)
D = np.sqrt(np.maximum(D_sq, 0)) # (1000, 1000)
Q18: How do you find the index of the 5 largest values in a 2D array?
flat_indices = np.argpartition(arr.flat, -5)[-5:]
sorted_flat = flat_indices[np.argsort(arr.flat[flat_indices])[::-1]]
row_indices, col_indices = np.unravel_index(sorted_flat, arr.shape)
Q19: You receive a CSV with mixed types. How do you load just the numeric columns?
data = np.genfromtxt('data.csv', delimiter=',', skip_header=1,
usecols=(1, 2, 4), # Specific numeric columns
dtype=np.float64, filling_values=np.nan)
Q20: How would you implement batch normalization forward pass in NumPy?
def batch_norm(X, gamma, beta, eps=1e-5):
# X: (batch_size, features)
mean = X.mean(axis=0) # (features,)
var = X.var(axis=0) # (features,)
X_norm = (X - mean) / np.sqrt(var + eps) # (batch, features)
return gamma * X_norm + beta # Scale and shift
16. Conclusion & Learning Path {#conclusion}
What We've Covered
Across all three parts of this guide, you've learned:
Part 1: ndarray fundamentals, dtypes, array creation, indexing, slicing, fancy indexing, broadcasting, reshape, and manipulation
Part 2: All mathematical ufuncs, statistical functions, sorting, set operations, complete linear algebra (linalg), FFT, polynomials, random generation, file I/O, structured arrays, and masked arrays
Part 3: NumPy internals, stride tricks, vectorization patterns, performance optimization, Numba integration, real-world pipelines (ETL, NLP, image processing, finance, ML), and comprehensive interview questions
Your NumPy Learning Path
Week 1: Part 1 — Master arrays, indexing, broadcasting
Week 2: Part 2 — Statistical functions, linalg
Week 3: Part 3 — Vectorization, real-world projects
Week 4: Build something: implement k-means, PCA, or a neural network from scratch
Ongoing: Read NumPy source code, contribute to open source
Key Mental Models
Think in shapes — always know your array shape before and after every operation
Broadcasting first — before writing a loop, ask "can broadcasting solve this?"
Vectorize everything — loops are for prototyping; production code should be loop-free
Memory matters — view vs copy distinction affects both correctness and performance
dtype is a contract — choose the right dtype upfront; implicit conversions cost memory
When to Use NumPy (vs Alternatives)
✅ Use NumPy when:
- Working with numerical arrays
- Scientific computing
- Implementing ML algorithms from scratch
- Image/signal processing arrays
- Linear algebra operations
⬆️ Graduate to pandas when: labelled data, mixed types, SQL-like operations
⬆️ Graduate to PyTorch/JAX when: GPU, autograd, deep learning
⬆️ Graduate to Dask when: data > RAM, distributed computing
Final Advice
NumPy is the lingua franca of scientific Python. Master it deeply and everything else — pandas, scikit-learn, PyTorch — becomes easier because you understand what's happening under the hood.
The best way to truly learn NumPy is to implement algorithms from scratch: k-nearest neighbors, k-means clustering, logistic regression, PCA, a simple neural network. Do this, and NumPy will become second nature.
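For instance, PCA via SVD fits in a few lines — a from-scratch sketch, where `pca` is just an illustrative helper, not a library function:

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA via SVD: center, decompose, project onto top components."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T  # (n_samples, n_components)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Z = pca(X, n_components=2)
print(Z.shape)  # (100, 2)
```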
End of Part 3 — The Complete NumPy Guide
Series complete! Parts 1, 2, and 3 together form one of the most comprehensive NumPy references available.