1. Introduction
Every real-world program needs to store and manage collections of data. Python gives you four powerful, built-in data structures for this:
Structure | Think of it as... | Real-world example |
|---|---|---|
List | A shopping cart — ordered and changeable |
|
Tuple | GPS coordinates — fixed, read-only |
|
Set | A bag of unique coins — no duplicates |
|
Dictionary | A contact book — name maps to number |
|
Why does this matter?
Choosing the wrong data structure can make your code:
Slower — searching a list vs a set is O(n) vs O(1)
Buggier — allowing duplicates when you shouldn't
Harder to read — using
data[0]instead ofdata["name"]
By the end of this guide you'll know exactly which one to use, when, and why.
2. Quick Comparison Overview
Feature | List | Tuple | Set | Dictionary |
|---|---|---|---|---|
Syntax |
|
|
|
|
Ordered | ✅ Yes | ✅ Yes | ❌ No | ✅ Yes (Python 3.7+) |
Mutable | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
Allows Duplicates | ✅ Yes | ✅ Yes | ❌ No | ❌ No (keys) |
Indexed Access | ✅ Yes | ✅ Yes | ❌ No | ✅ By key |
Hashable (usable as dict key) | ❌ No | ✅ Yes | ❌ No | ❌ No |
Lookup Speed | O(n) | O(n) | O(1) | O(1) |
Memory | Medium | Low | Medium | High |
Use Case | General collection | Fixed data | Unique items | Key-value mapping |
3. LIST — Complete Deep Dive
A list is an ordered, mutable collection that can hold any data type — even mixed types. It is the most versatile and commonly used data structure in Python.
3.1 Creating Lists
# Empty list
empty = []
empty = list()
# List with values
numbers = [1, 2, 3, 4, 5]
names = ["Alice", "Bob", "Charlie"]
mixed = [1, "hello", 3.14, True, None] # Mixed types — Python allows this
# Nested list (2D list / matrix)
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# List from other iterables
from_string = list("Python") # ['P', 'y', 't', 'h', 'o', 'n']
from_range = list(range(1, 6)) # [1, 2, 3, 4, 5]
from_tuple = list((1, 2, 3)) # [1, 2, 3]
from_set = list({3, 1, 2}) # [1, 2, 3] — order not guaranteed
# List repetition
zeros = [0] * 5 # [0, 0, 0, 0, 0]
pattern = [1, 2] * 3 # [1, 2, 1, 2, 1, 2]
# List comprehension (most Pythonic)
squares = [x**2 for x in range(1, 6)] # [1, 4, 9, 16, 25]
print(type(numbers)) # <class 'list'>
print(len(numbers)) # 5
3.2 Indexing and Slicing
fruits = ["apple", "banana", "cherry", "date", "elderberry"]
# 0 1 2 3 4 ← Positive index
# -5 -4 -3 -2 -1 ← Negative index
# Indexing — access single element
print(fruits[0]) # apple — first element
print(fruits[2]) # cherry
print(fruits[-1]) # elderberry — last element
print(fruits[-2]) # date — second to last
# Slicing — a[start:stop:step] (stop is EXCLUSIVE)
print(fruits[1:3]) # ['banana', 'cherry'] — index 1 and 2
print(fruits[:3]) # ['apple', 'banana', 'cherry'] — first 3
print(fruits[2:]) # ['cherry', 'date', 'elderberry'] — from index 2 onward
print(fruits[::2]) # ['apple', 'cherry', 'elderberry'] — every 2nd
print(fruits[::-1]) # ['elderberry', 'date', 'cherry', 'banana', 'apple'] — reversed
print(fruits[1:4:2]) # ['banana', 'date'] — from 1 to 3, step 2
# Nested list access
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matrix[0]) # [1, 2, 3]
print(matrix[1][2]) # 6 — row 1, column 2
print(matrix[-1][-1]) # 9 — last row, last column
3.3 Adding Elements
lst = [1, 2, 3]
# append() — add ONE item to the END — O(1)
lst.append(4)
print(lst) # [1, 2, 3, 4]
lst.append([5, 6]) # Adds the LIST as a single element!
print(lst) # [1, 2, 3, 4, [5, 6]]
# insert(index, item) — insert at specific position — O(n)
lst = [1, 2, 3, 4]
lst.insert(2, 99) # Insert 99 at index 2
print(lst) # [1, 2, 99, 3, 4]
lst.insert(0, 0) # Insert at beginning
print(lst) # [0, 1, 2, 99, 3, 4]
lst.insert(100, 5) # Index beyond length → appends to end
print(lst) # [0, 1, 2, 99, 3, 4, 5]
# extend(iterable) — add ALL items from another iterable — O(k)
lst = [1, 2, 3]
lst.extend([4, 5, 6]) # Adds each element individually
print(lst) # [1, 2, 3, 4, 5, 6]
lst.extend("abc") # Strings are iterable!
print(lst) # [1, 2, 3, 4, 5, 6, 'a', 'b', 'c']
lst.extend(range(3)) # Works with any iterable
print(lst) # [1, 2, 3, 4, 5, 6, 'a', 'b', 'c', 0, 1, 2]
# + operator — concatenation — creates NEW list
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
print(c) # [1, 2, 3, 4, 5, 6]
print(a) # [1, 2, 3] — unchanged
3.4 Removing Elements
lst = [10, 20, 30, 20, 40, 50]
# remove(value) — removes FIRST occurrence of value — O(n)
lst.remove(20)
print(lst) # [10, 30, 20, 40, 50]
# lst.remove(99) # ValueError if not found!
# pop() — removes and RETURNS last item — O(1)
last = lst.pop()
print(last) # 50
print(lst) # [10, 30, 20, 40]
# pop(index) — removes and RETURNS item at index — O(n)
item = lst.pop(1)
print(item) # 30
print(lst) # [10, 20, 40]
# del — delete by index or slice — O(n)
lst = [10, 20, 30, 40, 50]
del lst[2]
print(lst) # [10, 20, 40, 50]
del lst[1:3] # Delete slice
print(lst) # [10, 50]
del lst # Delete the entire list!
# clear() — remove ALL elements — O(1)
lst = [1, 2, 3, 4, 5]
lst.clear()
print(lst) # []
3.5 Searching and Counting
lst = [10, 20, 30, 20, 40, 20, 50]
# in / not in — membership check — O(n)
print(20 in lst) # True
print(99 in lst) # False
print(99 not in lst) # True
# index(value) — returns index of FIRST occurrence — O(n)
print(lst.index(20)) # 1
print(lst.index(20, 2)) # 3 — search from index 2 onward
print(lst.index(20, 2, 5)) # 3 — search between index 2 and 4
# lst.index(99) # ValueError if not found!
# count(value) — count occurrences — O(n)
print(lst.count(20)) # 3
print(lst.count(99)) # 0 — no error, returns 0
3.6 Sorting and Reversing
lst = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
# sort() — in-place sort — O(n log n)
lst.sort()
print(lst) # [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
lst.sort(reverse=True)
print(lst) # [9, 6, 5, 5, 4, 3, 3, 2, 1, 1]
# sorted() — returns NEW sorted list, original unchanged
original = [3, 1, 4, 1, 5]
new_list = sorted(original)
print(original) # [3, 1, 4, 1, 5] — unchanged
print(new_list) # [1, 1, 3, 4, 5]
# Sort with custom key
words = ["banana", "Apple", "cherry", "Date"]
words.sort() # ['Apple', 'Date', 'banana', 'cherry'] — uppercase first
words.sort(key=str.lower) # ['Apple', 'banana', 'cherry', 'Date'] — case-insensitive
words.sort(key=len) # Sort by length
print(words) # ['Date', 'Apple', 'banana', 'cherry']
# Sort list of dicts
students = [
{"name": "Alice", "score": 92},
{"name": "Bob", "score": 87},
{"name": "Charlie", "score": 95}
]
students.sort(key=lambda s: s["score"], reverse=True)
for s in students:
print(f"{s['name']}: {s['score']}")
# Charlie: 95 / Alice: 92 / Bob: 87
# reverse() — in-place reversal — O(n)
lst = [1, 2, 3, 4, 5]
lst.reverse()
print(lst) # [5, 4, 3, 2, 1]
# Reversed without modifying
print(lst[::-1]) # [1, 2, 3, 4, 5]
print(list(reversed(lst))) # [1, 2, 3, 4, 5]
3.7 Copying Lists
import copy
original = [1, [2, 3], [4, 5]]
# Assignment — NOT a copy, just another reference!
ref = original
ref[0] = 99
print(original) # [99, [2, 3], [4, 5]] — BOTH changed!
# Shallow copy — copies top level only
original = [1, [2, 3], [4, 5]]
shallow1 = original.copy()
shallow2 = original[:]
shallow3 = list(original)
shallow1[0] = 99 # Changes only shallow1
print(original) # [1, [2, 3], [4, 5]] — safe
shallow1[1][0] = 999 # Modifies nested list — SHARED reference!
print(original) # [1, [999, 3], [4, 5]] — oops!
# Deep copy — copies everything recursively
original = [1, [2, 3], [4, 5]]
deep = copy.deepcopy(original)
deep[1][0] = 999
print(original) # [1, [2, 3], [4, 5]] — completely safe ✅
3.8 Other Useful List Operations
lst = [3, 1, 4, 1, 5, 9]
# Aggregate functions
print(len(lst)) # 6 — number of elements
print(sum(lst)) # 23 — sum of all elements
print(min(lst)) # 1 — minimum value
print(max(lst)) # 9 — maximum value
# zip() — pair elements from multiple lists
names = ["Alice", "Bob", "Charlie"]
scores = [92, 87, 95]
grades = ["A", "B", "A"]
for name, score, grade in zip(names, scores, grades):
print(f"{name}: {score} ({grade})")
# enumerate() — index + value together
for i, name in enumerate(names, start=1):
print(f"{i}. {name}")
# 1. Alice / 2. Bob / 3. Charlie
# any() and all()
numbers = [2, 4, 6, 8, 10]
print(any(n > 9 for n in numbers)) # True — at least one > 9
print(all(n % 2 == 0 for n in numbers)) # True — all are even
# List flattening
nested = [[1, 2], [3, 4], [5, 6]]
flat = [item for sublist in nested for item in sublist]
print(flat) # [1, 2, 3, 4, 5, 6]
# Remove duplicates while preserving order
seen = set()
unique = [x for x in [3, 1, 4, 1, 5, 9, 2, 6, 5] if not (x in seen or seen.add(x))]
print(unique) # [3, 1, 4, 5, 9, 2, 6]
3.9 Complete List Methods Reference
Method | Description | Returns | Complexity |
|---|---|---|---|
| Add x to end | None | O(1) |
| Insert x at index i | None | O(n) |
| Add all items from iterable | None | O(k) |
| Remove first x | None | O(n) |
| Remove & return item at i | item | O(1)/O(n) |
| Remove all items | None | O(1) |
| First index of x | int | O(n) |
| Count of x | int | O(n) |
| In-place sort | None | O(n log n) |
| In-place reverse | None | O(n) |
| Shallow copy | list | O(n) |
4. TUPLE — Complete Deep Dive
A tuple is an ordered, immutable collection. Once created, it cannot be changed. Think of tuples as "read-only lists" that are faster and safer.
4.1 Creating Tuples
# Empty tuple
empty = ()
empty = tuple()
# Single-element tuple — THE COMMA IS CRITICAL!
single = (42,) # ✅ This is a tuple
not_tuple = (42) # ❌ This is just an int!
print(type(single)) # <class 'tuple'>
print(type(not_tuple)) # <class 'int'>
# Multiple elements
coords = (10.5, 20.3)
rgb = (255, 128, 0)
person = ("Alice", 30, "NYC")
mixed = (1, "hello", 3.14, True, None)
# Without parentheses — tuple packing
point = 3, 4 # Still a tuple!
print(type(point)) # <class 'tuple'>
# From iterable
from_list = tuple([1, 2, 3]) # (1, 2, 3)
from_str = tuple("Python") # ('P', 'y', 't', 'h', 'o', 'n')
from_range = tuple(range(1, 6)) # (1, 2, 3, 4, 5)
# Nested tuple
nested = ((1, 2), (3, 4), (5, 6))
print(type(coords)) # <class 'tuple'>
print(len(person)) # 3
4.2 Accessing Tuple Elements
person = ("Alice", 30, "New York", "Engineer")
# Indexing — same as list
print(person[0]) # Alice
print(person[-1]) # Engineer
print(person[1:3]) # (30, 'New York')
print(person[::-1]) # ('Engineer', 'New York', 30, 'Alice')
# Nested access
matrix = ((1, 2, 3), (4, 5, 6), (7, 8, 9))
print(matrix[1]) # (4, 5, 6)
print(matrix[1][2]) # 6
4.3 Tuple Unpacking — The Superpower
# Basic unpacking — number of variables must match
coords = (10, 20)
x, y = coords
print(x, y) # 10 20
person = ("Alice", 30, "NYC")
name, age, city = person
print(name, age, city) # Alice 30 NYC
# Swap using tuple unpacking
a, b = 10, 20
a, b = b, a
print(a, b) # 20 10
# Extended unpacking with * (star expression)
numbers = (1, 2, 3, 4, 5)
first, *rest = numbers
print(first) # 1
print(rest) # [2, 3, 4, 5]
*start, last = numbers
print(start) # [1, 2, 3, 4]
print(last) # 5
first, *middle, last = numbers
print(first) # 1
print(middle) # [2, 3, 4]
print(last) # 5
# Ignore values with _
name, _, city = ("Alice", 30, "NYC") # Ignore age
print(name, city) # Alice NYC
_, second, *_ = (1, 2, 3, 4, 5)
print(second) # 2
# Nested unpacking
data = ((1, 2), (3, 4))
(a, b), (c, d) = data
print(a, b, c, d) # 1 2 3 4
# In for loops
points = [(1, 2), (3, 4), (5, 6)]
for x, y in points:
distance = (x**2 + y**2) ** 0.5
print(f"({x},{y}) → distance: {distance:.2f}")
4.4 Tuple Methods
Tuples only have two methods because they're immutable:
t = (10, 20, 30, 20, 40, 20)
# count(value) — count occurrences
print(t.count(20)) # 3
print(t.count(99)) # 0
# index(value) — first index of value
print(t.index(20)) # 1
print(t.index(20, 2)) # 3 — search from index 2
# t.index(99) # ValueError if not found!
# Built-in functions work on tuples
print(len(t)) # 6
print(sum(t)) # 140
print(min(t)) # 10
print(max(t)) # 40
print(sorted(t)) # [10, 20, 20, 20, 30, 40] — returns a LIST
4.5 Immutability — What It Means
t = (1, 2, 3)
# You CANNOT modify a tuple
# t[0] = 99 # ❌ TypeError: 'tuple' object does not support item assignment
# t.append(4) # ❌ AttributeError: 'tuple' object has no attribute 'append'
# del t[0] # ❌ TypeError
# BUT if the tuple contains mutable objects (like a list), those CAN be changed
mixed = (1, [2, 3], "hello")
mixed[1].append(4) # Modifies the list INSIDE the tuple
print(mixed) # (1, [2, 3, 4], 'hello') — tuple itself unchanged
mixed[1][0] = 99
print(mixed) # (1, [99, 3, 4], 'hello')
# "Modify" a tuple by creating a new one
t = (1, 2, 3)
t = t + (4, 5) # Creates a NEW tuple
print(t) # (1, 2, 3, 4, 5)
t = t[:2] + (99,) + t[3:] # "Replace" index 2 with 99
print(t) # (1, 2, 99, 4, 5)
4.6 Why Use Tuples? Key Advantages
# 1. Tuples are FASTER than lists
import timeit
list_time = timeit.timeit("[1, 2, 3, 4, 5]", number=10_000_000)
tuple_time = timeit.timeit("(1, 2, 3, 4, 5)", number=10_000_000)
print(f"List: {list_time:.3f}s")
print(f"Tuple: {tuple_time:.3f}s") # Consistently faster
# 2. Tuples use LESS MEMORY
import sys
lst = [1, 2, 3, 4, 5]
tpl = (1, 2, 3, 4, 5)
print(sys.getsizeof(lst)) # 88 bytes
print(sys.getsizeof(tpl)) # 72 bytes
# 3. Tuples can be DICT KEYS (lists cannot)
locations = {
(28.6139, 77.2090): "New Delhi",
(19.0760, 72.8777): "Mumbai",
(13.0827, 80.2707): "Chennai"
}
print(locations[(28.6139, 77.2090)]) # New Delhi
# {[1,2]: "a"} # ❌ TypeError — list is unhashable
# 4. Tuples as function return values (multiple returns)
def get_stats(numbers):
return min(numbers), max(numbers), sum(numbers) / len(numbers)
minimum, maximum, average = get_stats([3, 1, 4, 1, 5, 9])
print(f"Min={minimum}, Max={maximum}, Avg={average:.2f}")
# 5. Named Tuples — best of both worlds
from collections import namedtuple
# Define the structure
Employee = namedtuple("Employee", ["name", "department", "salary"])
Point3D = namedtuple("Point3D", ["x", "y", "z"])
# Create instances
emp = Employee("Alice", "Engineering", 95000)
p = Point3D(1.0, 2.5, -3.0)
# Access by name OR index
print(emp.name) # Alice
print(emp[0]) # Alice
print(emp.salary) # 95000
print(p.x, p.y, p.z) # 1.0 2.5 -3.0
# Still immutable — behaves like a tuple
print(emp._asdict()) # OrderedDict([('name', 'Alice'), ...])
emp2 = emp._replace(salary=100000) # Creates new namedtuple
print(emp2) # Employee(name='Alice', department='Engineering', salary=100000)
print(emp) # Unchanged
# Python 3.6+ dataclass (modern alternative to namedtuple)
from dataclasses import dataclass
@dataclass(frozen=True) # frozen=True makes it immutable like a tuple
class Config:
host: str
port: int
debug: bool = False
cfg = Config("localhost", 5432)
print(cfg.host) # localhost
print(cfg) # Config(host='localhost', port=5432, debug=False)
5. SET — Complete Deep Dive
A set is an unordered collection of unique elements. Sets are perfect when you need to eliminate duplicates, test membership quickly, or perform mathematical set operations.
5.1 Creating Sets
# Empty set — MUST use set(), not {} (that creates a dict!)
empty = set()
print(type(empty)) # <class 'set'>
print(type({})) # <class 'dict'> ← Be careful!
# Set with values
fruits = {"apple", "banana", "cherry"}
numbers = {1, 2, 3, 4, 5}
mixed = {1, "hello", 3.14, True} # Allows mixed types
# Duplicates are automatically removed!
s = {1, 2, 2, 3, 3, 3, 4}
print(s) # {1, 2, 3, 4} — only unique values
# From other iterables
from_list = set([1, 2, 2, 3, 3]) # {1, 2, 3}
from_str = set("Mississippi") # {'M', 'i', 's', 'p'} — unique chars
from_tuple = set((1, 2, 2, 3)) # {1, 2, 3}
# frozenset — immutable set (can be used as dict key)
frozen = frozenset([1, 2, 3, 4])
print(type(frozen)) # <class 'frozenset'>
# frozen.add(5) # ❌ AttributeError — frozenset is immutable
# Set comprehension
even_squares = {x**2 for x in range(1, 11) if x % 2 == 0}
print(even_squares) # {4, 16, 36, 64, 100}
print(type(fruits)) # <class 'set'>
5.2 Adding and Removing Elements
s = {1, 2, 3}
# add(element) — add ONE element — O(1)
s.add(4)
print(s) # {1, 2, 3, 4}
s.add(2) # Adding duplicate — silently ignored
print(s) # {1, 2, 3, 4} — unchanged
# update(iterable) — add multiple elements — O(k)
s.update([5, 6, 7])
print(s) # {1, 2, 3, 4, 5, 6, 7}
s.update("ab", {8, 9}) # Can accept multiple iterables
print(s) # {1, 2, 3, 4, 5, 6, 7, 8, 9, 'a', 'b'}
s = {1, 2, 3, 4, 5}
# remove(element) — remove element, raises KeyError if not found — O(1)
s.remove(3)
print(s) # {1, 2, 4, 5}
# s.remove(99) # ❌ KeyError: 99
# discard(element) — remove element, NO error if not found — O(1)
s.discard(4)
print(s) # {1, 2, 5}
s.discard(99) # ✅ No error
print(s) # {1, 2, 5}
# pop() — remove and return ARBITRARY element (sets are unordered) — O(1)
popped = s.pop()
print(popped) # Some element (unpredictable!)
print(s)
# clear() — remove all elements — O(1)
s = {1, 2, 3}
s.clear()
print(s) # set()
5.3 Set Operations — Mathematical Power
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
# Union — all elements from both sets
print(A | B) # {1, 2, 3, 4, 5, 6, 7}
print(A.union(B)) # Same result
print(A.union(B, {8, 9})) # Can chain multiple sets
# Intersection — elements in BOTH sets
print(A & B) # {3, 4, 5}
print(A.intersection(B)) # Same result
# Difference — elements in A but NOT in B
print(A - B) # {1, 2}
print(A.difference(B)) # Same result
print(B - A) # {6, 7}
# Symmetric Difference — elements in EITHER but NOT BOTH
print(A ^ B) # {1, 2, 6, 7}
print(A.symmetric_difference(B)) # Same result
# In-place versions (modify the set directly)
A |= B # A = A | B (union update)
A &= B # A = A & B (intersection update)
A -= B # A = A - B (difference update)
A ^= B # A = A ^ B (symmetric difference update)
# Subset and Superset checks
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
print(A.issubset(B)) # True — A ⊆ B (A is subset of B)
print(A <= B) # True — same as issubset
print(A < B) # True — strict subset (A ⊂ B, A ≠ B)
print(B.issuperset(A)) # True — B ⊇ A
print(B >= A) # True — same as issuperset
print(B > A) # True — strict superset
print(A <= A) # True — every set is a subset of itself
print(A < A) # False — not a STRICT subset
# Disjoint — do sets share ANY elements?
X = {1, 2, 3}
Y = {4, 5, 6}
Z = {3, 4, 5}
print(X.isdisjoint(Y)) # True — no common elements
print(X.isdisjoint(Z)) # False — 3 is common
5.4 Set Methods Reference
s = {3, 1, 4, 1, 5, 9, 2, 6}
# Viewing
print(len(s)) # 7 (unique count)
print(3 in s) # True
print(99 not in s) # True
print(min(s)) # 1
print(max(s)) # 9
print(sum(s)) # 30
print(sorted(s)) # [1, 2, 3, 4, 5, 6, 9] — returns a LIST
# Iteration (order is NOT guaranteed)
for item in s:
print(item) # Some order — don't rely on it
# Convert to sorted list when you need order
sorted_items = sorted(s)
Method | Description | Returns |
|---|---|---|
| Add element x | None |
| Add all from iterables | None |
| Remove x, KeyError if missing | None |
| Remove x, no error if missing | None |
| Remove/return arbitrary element | element |
| Remove all elements | None |
| All elements from sets | new set |
| Common elements | new set |
| Elements not in others | new set |
| In either, not both | new set |
| Is self ⊆ other? | bool |
| Is self ⊇ other? | bool |
| No common elements? | bool |
| Shallow copy | new set |
| Immutable version | frozenset |
5.5 Key Set Use Cases
# Use Case 1: Remove duplicates from a list
emails = ["a@test.com", "b@test.com", "a@test.com", "c@test.com", "b@test.com"]
unique_emails = list(set(emails))
print(unique_emails) # ['a@test.com', 'b@test.com', 'c@test.com'] — order not guaranteed
# Preserve order while deduplicating
seen = set()
unique_ordered = [e for e in emails if not (e in seen or seen.add(e))]
print(unique_ordered) # ['a@test.com', 'b@test.com', 'c@test.com']
# Use Case 2: Fast membership testing
banned_users = {"spammer123", "bot456", "fake789"}
user = "spammer123"
if user in banned_users: # O(1) — instant, regardless of set size!
print("Access denied!")
# Use Case 3: Find common interests (intersection)
alice_skills = {"Python", "SQL", "Machine Learning", "Docker"}
bob_skills = {"JavaScript", "Python", "Docker", "Kubernetes"}
common = alice_skills & bob_skills
print(f"Common skills: {common}") # {'Python', 'Docker'}
# Use Case 4: Find unique words in text
text = "the quick brown fox jumps over the lazy dog the fox"
unique_words = set(text.split())
print(f"Unique words: {len(unique_words)}") # 8
# Use Case 5: Missing elements
required_cols = {"name", "email", "age", "salary"}
available_cols = {"name", "email", "phone"}
missing = required_cols - available_cols
print(f"Missing: {missing}") # {'age', 'salary'}
6. DICTIONARY — Complete Deep Dive
A dictionary is an ordered (Python 3.7+), mutable collection of key-value pairs. Keys must be unique and hashable. Think of it as a lookup table — you get a value instantly by its key.
6.1 Creating Dictionaries
# Empty dict
empty = {}
empty = dict()
# Basic dict
person = {"name": "Alice", "age": 30, "city": "NYC"}
# Different ways to create
# 1. From keyword arguments
d1 = dict(name="Alice", age=30, city="NYC")
# 2. From list of tuples
d2 = dict([("name", "Alice"), ("age", 30), ("city", "NYC")])
# 3. From two lists using zip
keys = ["name", "age", "city"]
values = ["Alice", 30, "NYC"]
d3 = dict(zip(keys, values))
# 4. fromkeys() — create dict with same value for all keys
d4 = dict.fromkeys(["a", "b", "c"], 0)
print(d4) # {'a': 0, 'b': 0, 'c': 0}
d5 = dict.fromkeys(["x", "y", "z"])
print(d5) # {'x': None, 'y': None, 'z': None}
# 5. Dict comprehension
squared = {x: x**2 for x in range(1, 6)}
print(squared) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Valid key types — must be hashable
valid_keys = {
42: "int key",
3.14: "float key",
True: "bool key",
"name": "str key",
(1, 2): "tuple key"
}
# ❌ Invalid key types — unhashable
# {[1,2]: "a"} # TypeError: unhashable type: 'list'
# {{}: "a"} # TypeError: unhashable type: 'dict'
# Nested dict
company = {
"name": "TechCorp",
"founded": 2010,
"address": {
"street": "123 Main St",
"city": "San Francisco",
"state": "CA"
},
"employees": ["Alice", "Bob", "Charlie"],
"is_active": True
}
print(type(person)) # <class 'dict'>
print(len(person)) # 3
6.2 Accessing Values
person = {"name": "Alice", "age": 30, "city": "NYC"}
# Direct access — KeyError if key doesn't exist
print(person["name"]) # Alice
print(person["age"]) # 30
# print(person["email"]) # ❌ KeyError: 'email'
# get(key, default) — safe access, returns default if key missing
print(person.get("name")) # Alice
print(person.get("email")) # None — no error
print(person.get("email", "N/A")) # N/A — custom default
# Nested access
company = {
"address": {
"city": "SF",
"zip": "94102"
}
}
print(company["address"]["city"]) # SF
print(company.get("address", {}).get("city", "Unknown")) # SF — safe nested access
# Check if key exists
print("name" in person) # True
print("email" in person) # False
print("email" not in person) # True
# in checks KEYS only (not values)
print(30 in person) # False — 30 is a value, not a key
print(30 in person.values()) # True — check values explicitly
6.3 Adding and Updating
person = {"name": "Alice", "age": 30}
# Add new key-value pair
person["email"] = "alice@example.com"
person["city"] = "NYC"
print(person) # {'name': 'Alice', 'age': 30, 'email': 'alice@...', 'city': 'NYC'}
# Update existing key
person["age"] = 31
print(person["age"]) # 31
# update() — add/update multiple key-value pairs
person.update({"phone": "555-1234", "age": 32})
person.update(country="USA", state="NY") # Keyword args also work
print(person)
# setdefault(key, default) — add key ONLY if it doesn't exist
person.setdefault("nickname", "Ali") # Adds 'nickname': 'Ali'
print(person.get("nickname")) # Ali
person.setdefault("name", "Bob") # Does NOT change 'name' — it already exists!
print(person.get("name")) # Alice — unchanged
# Merging dicts (Python 3.9+)
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "d": 4}
d3 = {"b": 99, "e": 5}
merged = d1 | d2
print(merged) # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
# Later key wins when duplicates
merged = d1 | d3
print(merged) # {'a': 1, 'b': 99, 'e': 5} — d3's 'b' overwrites d1's
d1 |= d2 # In-place merge (like +=)
print(d1) # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
# Python 3.5+ alternative
merged = {**d1, **d2} # Unpack both dicts
6.4 Removing Elements
d = {"a": 1, "b": 2, "c": 3, "d": 4}
# del — remove by key, KeyError if missing
del d["b"]
print(d) # {'a': 1, 'c': 3, 'd': 4}
# del d["z"] # ❌ KeyError
# pop(key, default) — remove and RETURN value
val = d.pop("c")
print(val) # 3
print(d) # {'a': 1, 'd': 4}
val = d.pop("z", "not found") # ✅ No error with default
print(val) # not found
# popitem() — remove and return LAST inserted key-value pair (Python 3.7+)
d = {"x": 10, "y": 20, "z": 30}
item = d.popitem()
print(item) # ('z', 30) — last inserted
print(d) # {'x': 10, 'y': 20}
# clear() — remove everything
d.clear()
print(d) # {}
6.5 Iterating Over Dictionaries
person = {"name": "Alice", "age": 30, "city": "NYC"}
# Iterate over KEYS (default behavior)
for key in person:
print(key) # name age city
# Iterate over KEYS explicitly
for key in person.keys():
print(key)
# Iterate over VALUES
for value in person.values():
print(value) # Alice 30 NYC
# Iterate over KEY-VALUE PAIRS — most common!
for key, value in person.items():
print(f"{key}: {value}")
# name: Alice
# age: 30
# city: NYC
# With index
for i, (key, value) in enumerate(person.items(), start=1):
print(f"{i}. {key} = {value}")
# Dict comprehension — transform while iterating
upper_keys = {k.upper(): v for k, v in person.items()}
print(upper_keys) # {'NAME': 'Alice', 'AGE': 30, 'CITY': 'NYC'}
# Filter while iterating
adults = {name: age for name, age in {"Alice": 30, "Bob": 17, "Charlie": 25}.items() if age >= 18}
print(adults) # {'Alice': 30, 'Charlie': 25}
6.6 All Dictionary Methods
d = {"a": 1, "b": 2, "c": 3}
# View objects (dynamic — reflect changes)
print(d.keys()) # dict_keys(['a', 'b', 'c'])
print(d.values()) # dict_values([1, 2, 3])
print(d.items()) # dict_items([('a', 1), ('b', 2), ('c', 3)])
# Convert to list if needed
keys = list(d.keys()) # ['a', 'b', 'c']
values = list(d.values()) # [1, 2, 3]
items = list(d.items()) # [('a', 1), ('b', 2), ('c', 3)]
# Copying
shallow_copy = d.copy()
# dict.fromkeys()
template = dict.fromkeys(["name", "age", "email"], "")
print(template) # {'name': '', 'age': '', 'email': ''}
Method | Description | Returns |
|---|---|---|
| Get value (KeyError if missing) | value |
| Safe get | value or default |
| All keys | dict_keys view |
| All values | dict_values view |
| All key-value pairs | dict_items view |
| Add/update from other | None |
| Get or set default | value |
| Remove & return | value |
| Remove & return last pair | (key, value) |
| Remove all | None |
| Shallow copy | dict |
| Create from keys | dict |
6.7 Advanced Dictionary Patterns
from collections import defaultdict, Counter, OrderedDict
# defaultdict — auto-creates default value for missing keys
word_count = defaultdict(int)
text = "the quick brown fox jumps over the lazy dog the"
for word in text.split():
word_count[word] += 1 # No KeyError! Default int is 0
print(dict(word_count))
# Group items by category
from collections import defaultdict
category_items = defaultdict(list)
products = [("fruit", "apple"), ("veggie", "carrot"), ("fruit", "banana"), ("veggie", "broccoli")]
for category, item in products:
category_items[category].append(item)
print(dict(category_items))
# {'fruit': ['apple', 'banana'], 'veggie': ['carrot', 'broccoli']}
# Counter — specialized dict for counting
from collections import Counter
text = "hello world"
char_count = Counter(text)
print(char_count) # Counter({'l': 3, 'o': 2, ...})
print(char_count.most_common(3)) # [('l', 3), ('o', 2), ('h', 1)]
# Counter arithmetic
c1 = Counter(["a", "b", "a", "c"])
c2 = Counter(["a", "b", "b", "d"])
print(c1 + c2) # Counter({'a': 3, 'b': 3, 'c': 1, 'd': 1})
print(c1 - c2) # Counter({'a': 1, 'c': 1})
print(c1 & c2) # Counter({'a': 1, 'b': 1}) — min of each
# OrderedDict — maintains insertion order (less needed in Python 3.7+)
od = OrderedDict()
od["first"] = 1
od["second"] = 2
od["third"] = 3
print(list(od.keys())) # ['first', 'second', 'third']
# Move to end / beginning
od.move_to_end("first")
print(list(od.keys())) # ['second', 'third', 'first']
od.move_to_end("first", last=False)
print(list(od.keys())) # ['first', 'second', 'third']
# Nested defaultdict
nested = defaultdict(lambda: defaultdict(int))
nested["Alice"]["Python"] += 5
nested["Alice"]["SQL"] += 3
nested["Bob"]["Python"] += 2
print(dict(nested["Alice"])) # {'Python': 5, 'SQL': 3}
7. Intermediate Usage — Combining All Four
7.1 Converting Between Data Structures
# List ↔ Tuple ↔ Set ↔ Dict
lst = [1, 2, 2, 3, 3, 4]
# List → Set (removes duplicates)
s = set(lst) # {1, 2, 3, 4}
# List → Tuple
t = tuple(lst) # (1, 2, 2, 3, 3, 4)
# Set → List (order not guaranteed)
back_to_list = list(s) # [1, 2, 3, 4] — some order
# Tuple → List (to make it mutable)
mutable = list(t)
# List of pairs → Dict
pairs = [("name", "Alice"), ("age", 30), ("city", "NYC")]
d = dict(pairs)
print(d) # {'name': 'Alice', 'age': 30, 'city': 'NYC'}
# Dict → List of keys, values, or pairs
keys = list(d.keys()) # ['name', 'age', 'city']
values = list(d.values()) # ['Alice', 30, 'NYC']
items = list(d.items()) # [('name', 'Alice'), ('age', 30), ...]
# Dict → Set of keys
key_set = set(d) # {'name', 'age', 'city'}
7.2 Nesting Data Structures
# List of Dicts — most common in real data work
employees = [
{"id": 1, "name": "Alice", "dept": "Engineering", "salary": 90000},
{"id": 2, "name": "Bob", "dept": "Marketing", "salary": 70000},
{"id": 3, "name": "Charlie", "dept": "Engineering", "salary": 95000},
{"id": 4, "name": "Diana", "dept": "Marketing", "salary": 75000},
]
# Filter engineers
engineers = [e for e in employees if e["dept"] == "Engineering"]
# Get all names
names = [e["name"] for e in employees]
print(names) # ['Alice', 'Bob', 'Charlie', 'Diana']
# Average salary
avg_salary = sum(e["salary"] for e in employees) / len(employees)
print(f"Average: ${avg_salary:,.0f}")
# Group by department
from collections import defaultdict
by_dept = defaultdict(list)
for emp in employees:
by_dept[emp["dept"]].append(emp["name"])
print(dict(by_dept))
# {'Engineering': ['Alice', 'Charlie'], 'Marketing': ['Bob', 'Diana']}
# Sort by salary descending
sorted_emps = sorted(employees, key=lambda e: e["salary"], reverse=True)
# Dict of Lists
schedule = {
"Monday": ["Alice", "Bob"],
"Tuesday": ["Charlie", "Diana"],
"Wednesday": ["Alice", "Charlie"],
}
# Who works Monday?
print(schedule["Monday"]) # ['Alice', 'Bob']
# All unique workers
all_workers = set(name for names in schedule.values() for name in names)
print(all_workers) # {'Alice', 'Bob', 'Charlie', 'Diana'}
# Dict of Dicts
inventory = {
"laptop": {"quantity": 50, "price": 999.99},
"mouse": {"quantity": 150, "price": 29.99},
"keyboard": {"quantity": 100, "price": 79.99},
}
# Total value
total = sum(item["quantity"] * item["price"] for item in inventory.values())
print(f"Total inventory value: ${total:,.2f}")
7.3 Unpacking with Zip
# Zip multiple iterables
names = ("Alice", "Bob", "Charlie")
scores = (92, 87, 95)
grades = ("A", "B", "A")
# zip stops at the shortest iterable
for name, score, grade in zip(names, scores, grades):
print(f"{name}: {score} ({grade})")
# Build dict using zip
person = dict(zip(["name", "age", "city"], ["Alice", 30, "NYC"]))
print(person)
# Unzip — separate a list of tuples into separate tuples
data = [("Alice", 92), ("Bob", 87), ("Charlie", 95)]
names, scores = zip(*data) # * unpacks the list
print(names) # ('Alice', 'Bob', 'Charlie')
print(scores) # (92, 87, 95)
# zip_longest — use when lists have different lengths
from itertools import zip_longest
a = [1, 2, 3]
b = [4, 5]
for x, y in zip_longest(a, b, fillvalue=0):
print(x, y)
# 1 4
# 2 5
# 3 0
8. Advanced Concepts
8.1 List Under the Hood — Dynamic Array
import sys
# Python list is a dynamic array — doubles capacity when full
lst = []
prev_size = 0
for i in range(20):
lst.append(i)
curr_size = sys.getsizeof(lst)
if curr_size != prev_size:
print(f"n={len(lst):2d} → size={curr_size} bytes")
prev_size = curr_size
# Output shows capacity doublings:
# n= 1 → size=88
# n= 5 → size=120
# n= 9 → size=184
# n=17 → size=248
8.2 Dict Under the Hood — Hash Table
# Dicts use hash tables — this is why keys must be hashable
# hash() returns the hash value of an object
print(hash("hello")) # Some integer (varies)
print(hash(42)) # 42
print(hash((1, 2, 3))) # Some integer
# hash([1,2,3]) # ❌ TypeError — lists are unhashable
# Dict lookup is O(1) because:
# 1. Compute hash(key)
# 2. Find bucket using hash % table_size
# 3. Compare key (handle collisions)
# Demonstration of O(1) lookup
import time
small_dict = {i: i**2 for i in range(10)}
large_dict = {i: i**2 for i in range(10_000_000)}
# Both lookups are equally fast!
start = time.time()
_ = small_dict.get(9)
print(f"Small dict: {time.time()-start:.8f}s")
start = time.time()
_ = large_dict.get(9_999_999)
print(f"Large dict: {time.time()-start:.8f}s")
# Times are nearly identical — true O(1)!
8.3 Memory-Efficient Patterns
# Generator expressions — don't build entire list in memory
import sys
# List: all values in memory at once
lst_expr = [x**2 for x in range(10000)]
print(sys.getsizeof(lst_expr)) # ~87,624 bytes
# Generator: one value at a time
gen_expr = (x**2 for x in range(10000))
print(sys.getsizeof(gen_expr)) # 112 bytes — tiny!
# But you can still iterate and use it
total = sum(gen_expr) # Consumes the generator
# For huge datasets — read and process in chunks
def chunk(lst, size):
"""Split list into chunks of given size."""
for i in range(0, len(lst), size):
yield lst[i:i+size]
data = list(range(100))
for batch in chunk(data, 10):
print(f"Processing batch of {len(batch)}")
8.4 Sorting Complex Structures
from operator import attrgetter, itemgetter
students = [
{"name": "Charlie", "grade": "A", "score": 92},
{"name": "Alice", "grade": "B", "score": 87},
{"name": "Bob", "grade": "A", "score": 95},
{"name": "Diana", "grade": "B", "score": 87},
]
# Sort by single key
by_score = sorted(students, key=itemgetter("score"), reverse=True)
# Sort by multiple keys (grade ASC, then score DESC)
multi_sort = sorted(students, key=lambda s: (s["grade"], -s["score"]))
for s in multi_sort:
print(f"{s['name']}: {s['grade']} / {s['score']}")
# Alice: B / 87
# Bob: A / 95
# Charlie: A / 92
# Diana: B / 87 ← Wait, both A grade sorted by score descending
# Stable sort — equal elements maintain original order
by_name = sorted(students, key=itemgetter("name"))
# Sort with None values (handle gracefully)
data = [{"score": 92}, {"score": None}, {"score": 87}]
sorted_data = sorted(data, key=lambda x: (x["score"] is None, x["score"] or 0))
9. Real-World Use Cases
9.1 Data Processing Pipeline
# Process sales records using all four structures
raw_data = [
{"product": "Laptop", "region": "North", "sales": 150, "revenue": 149850},
{"product": "Mouse", "region": "South", "sales": 500, "revenue": 14995},
{"product": "Keyboard", "region": "North", "sales": 300, "revenue": 23997},
{"product": "Laptop", "region": "South", "sales": 120, "revenue": 119880},
{"product": "Mouse", "region": "North", "sales": 450, "revenue": 13455},
{"product": "Keyboard", "region": "South", "sales": 200, "revenue": 15998},
]
# 1. Get unique products (Set)
unique_products = {record["product"] for record in raw_data}
print(f"Products: {unique_products}")
# 2. Total revenue per product (Dict)
revenue_by_product = {}
for record in raw_data:
product = record["product"]
revenue_by_product[product] = revenue_by_product.get(product, 0) + record["revenue"]
print("Revenue by product:", revenue_by_product)
# 3. Top-performing product (Dict + max)
top_product = max(revenue_by_product, key=revenue_by_product.get)
print(f"Top product: {top_product} (${revenue_by_product[top_product]:,.0f})")
# 4. Summary tuple (immutable snapshot)
summary = (
len(raw_data),
sum(r["revenue"] for r in raw_data),
max(r["revenue"] for r in raw_data),
)
total_records, total_rev, max_rev = summary
print(f"Records: {total_records}, Total: ${total_rev:,.0f}, Max: ${max_rev:,.0f}")
9.2 Config Management
# Application configuration using dict
DEFAULT_CONFIG = {
"database": {
"host": "localhost",
"port": 5432,
"name": "mydb",
"pool_size": 10
},
"cache": {
"host": "localhost",
"port": 6379,
"ttl": 3600
},
"app": {
"debug": False,
"secret_key": "change-me-in-production",
"allowed_hosts": ["localhost", "127.0.0.1"],
"cors_origins": {"https://myapp.com", "https://api.myapp.com"}
}
}
def get_config(env="development"):
"""Get configuration for given environment."""
config = dict(DEFAULT_CONFIG) # Start with defaults
if env == "production":
config["app"]["debug"] = False
config["database"]["pool_size"] = 20
return config
config = get_config("production")
db_host = config["database"]["host"]
print(f"Connecting to {db_host}:{config['database']['port']}")
9.3 Caching with Dictionary
# Memoization — cache expensive function results
def create_cache():
cache = {}
def memoize(func):
def wrapper(*args):
if args not in cache:
cache[args] = func(*args)
return cache[args]
return wrapper
return memoize
memoize = create_cache()
@memoize
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
print(fibonacci(50)) # Instant! Without cache: exponential time
# Built-in LRU Cache
from functools import lru_cache
@lru_cache(maxsize=128)
def expensive_calculation(n):
import time
time.sleep(0.1) # Simulate slow computation
return n ** 2
print(expensive_calculation(5)) # Slow first time
print(expensive_calculation(5)) # Instant — from cache!
print(expensive_calculation.cache_info())
9.4 API Response Handling
import json
# Typical API response as dict
api_response = {
"status": "success",
"code": 200,
"data": {
"users": [
{"id": 1, "name": "Alice", "roles": ["admin", "editor"], "active": True},
{"id": 2, "name": "Bob", "roles": ["viewer"], "active": True},
{"id": 3, "name": "Charlie", "roles": ["editor"], "active": False},
],
"pagination": {"page": 1, "per_page": 10, "total": 3}
}
}
# Extract active users
users = api_response["data"]["users"]
active = [u for u in users if u["active"]]
admin_ids = {u["id"] for u in users if "admin" in u["roles"]} # Set
# Get all unique roles
all_roles = set()
for user in users:
all_roles.update(user["roles"])
print(f"All roles: {all_roles}") # {'admin', 'editor', 'viewer'}
# Build lookup by ID
user_by_id = {u["id"]: u for u in users}
print(user_by_id[1]["name"]) # Alice
# Safe navigation
page_info = api_response.get("data", {}).get("pagination", {})
print(f"Page {page_info.get('page', 1)} of {page_info.get('total', 0)}")
10. Practical Examples
Example 1: Student Database System
"""
Student management system demonstrating all four data structures.
"""
# --- Data Storage ---
students_db = {} # Dict: {student_id: student_record}
enrolled_courses = {} # Dict: {student_id: set of courses}
all_courses = {"Python101", "DataScience201", "WebDev301", "Algorithms401"}
# --- Functions ---
def add_student(student_id, name, age, email):
"""Add a new student. Returns error message if ID exists."""
if student_id in students_db:
return f"Error: Student ID {student_id} already exists"
students_db[student_id] = {
"id": student_id,
"name": name,
"age": age,
"email": email,
"scores": [], # List — ordered, mutable
"badges": set(), # Set — unique achievements
}
enrolled_courses[student_id] = set()
return f"Student {name} added successfully"
def enroll(student_id, course):
"""Enroll student in a course."""
if student_id not in students_db:
return "Student not found"
if course not in all_courses:
return f"Course '{course}' does not exist"
enrolled_courses[student_id].add(course) # Set handles duplicates
return f"Enrolled {students_db[student_id]['name']} in {course}"
def add_score(student_id, subject, score):
"""Add a score for a student."""
if student_id not in students_db:
return "Student not found"
students_db[student_id]["scores"].append((subject, score)) # Tuple: immutable pair
if score >= 90:
students_db[student_id]["badges"].add("High Achiever")
if score == 100:
students_db[student_id]["badges"].add("Perfect Score")
def get_report(student_id):
"""Generate student report."""
if student_id not in students_db:
return "Student not found"
s = students_db[student_id]
scores = s["scores"]
avg = sum(sc for _, sc in scores) / len(scores) if scores else 0
# Build report using f-string
report = f"""
╔══════════════════════════════════╗
STUDENT REPORT
╠══════════════════════════════════╣
ID: {s['id']}
Name: {s['name']}
Age: {s['age']}
Email: {s['email']}
Average: {avg:.1f}
Badges: {', '.join(s['badges']) or 'None'}
Courses: {', '.join(enrolled_courses[student_id]) or 'None'}
Scores:
"""
for subject, score in sorted(scores, key=lambda x: -x[1]):
report += f" • {subject:<20} {score}\n"
report += "╚══════════════════════════════════╝"
return report
def get_top_students(n=3):
"""Return top N students by average score."""
averages = []
for sid, data in students_db.items():
scores = data["scores"]
if scores:
avg = sum(sc for _, sc in scores) / len(scores)
averages.append((data["name"], avg))
return sorted(averages, key=lambda x: -x[1])[:n]
def find_common_enrollments(id1, id2):
"""Find courses both students are enrolled in."""
if id1 not in students_db or id2 not in students_db:
return set()
return enrolled_courses[id1] & enrolled_courses[id2] # Set intersection
# --- Usage ---
print(add_student(1, "Alice Johnson", 22, "alice@college.edu"))
print(add_student(2, "Bob Smith", 21, "bob@college.edu"))
print(add_student(3, "Charlie Brown", 23, "charlie@college.edu"))
print(enroll(1, "Python101"))
print(enroll(1, "DataScience201"))
print(enroll(2, "Python101"))
print(enroll(2, "WebDev301"))
print(enroll(3, "Python101"))
print(enroll(3, "DataScience201"))
add_score(1, "Python101", 95)
add_score(1, "DataScience201", 88)
add_score(2, "Python101", 100)
add_score(2, "WebDev301", 72)
add_score(3, "Python101", 78)
add_score(3, "DataScience201", 91)
print(get_report(1))
print("\nTop Students:")
for name, avg in get_top_students():
print(f" {name}: {avg:.1f}")
common = find_common_enrollments(1, 3)
print(f"\nAlice & Charlie share: {common}")
Example 2: Inventory Management
"""
Inventory tracker using List, Tuple, Set, and Dict together.
"""
from collections import defaultdict
from datetime import datetime
# Inventory as dict of dicts
inventory = {}
# Transaction log as list of tuples (immutable records)
transactions = [] # List of tuples: (timestamp, item, action, qty, price)
# Low stock alerts as set
low_stock_items = set()
LOW_STOCK_THRESHOLD = 10
def add_item(item_id, name, quantity, price, category):
"""Add a new item to inventory."""
if item_id in inventory:
raise ValueError(f"Item {item_id} already exists")
inventory[item_id] = {
"name": name,
"quantity": quantity,
"price": price,
"category": category,
"created_at": datetime.now().isoformat()
}
# Log as immutable tuple
transactions.append((datetime.now().isoformat(), item_id, "ADD", quantity, price))
_check_stock(item_id)
print(f"Added: {name} (qty={quantity}, price=${price:.2f})")
def sell(item_id, qty):
"""Record a sale."""
if item_id not in inventory:
raise KeyError(f"Item {item_id} not found")
item = inventory[item_id]
if item["quantity"] < qty:
raise ValueError(f"Insufficient stock. Available: {item['quantity']}")
item["quantity"] -= qty
transactions.append((datetime.now().isoformat(), item_id, "SELL", qty, item["price"]))
_check_stock(item_id)
revenue = qty * item["price"]
print(f"Sold {qty}x {item['name']} — Revenue: ${revenue:.2f}")
def restock(item_id, qty):
"""Restock an item."""
if item_id not in inventory:
raise KeyError(f"Item {item_id} not found")
inventory[item_id]["quantity"] += qty
transactions.append((datetime.now().isoformat(), item_id, "RESTOCK", qty, 0))
low_stock_items.discard(item_id) # Remove from alerts if restocked
print(f"Restocked {inventory[item_id]['name']} by {qty}")
def _check_stock(item_id):
"""Check and update low stock alerts."""
if inventory[item_id]["quantity"] <= LOW_STOCK_THRESHOLD:
low_stock_items.add(item_id)
def get_summary():
"""Generate inventory summary."""
# Unique categories (set)
categories = {item["category"] for item in inventory.values()}
# Revenue by category (defaultdict)
rev_by_cat = defaultdict(float)
for t in transactions:
ts, item_id, action, qty, price = t
if action == "SELL":
category = inventory[item_id]["category"]
rev_by_cat[category] += qty * price
# Total inventory value
total_value = sum(item["quantity"] * item["price"] for item in inventory.values())
print("\n📊 INVENTORY SUMMARY")
print(f"Total items: {len(inventory)}")
print(f"Categories: {', '.join(categories)}")
print(f"Total value: ${total_value:,.2f}")
print(f"Transactions: {len(transactions)}")
print(f"Low stock: {len(low_stock_items)} items")
print("\n💰 Revenue by Category:")
for cat, rev in sorted(rev_by_cat.items(), key=lambda x: -x[1]):
print(f" {cat:<15} ${rev:,.2f}")
if low_stock_items:
print(f"\n⚠️ LOW STOCK ALERTS:")
for item_id in low_stock_items:
item = inventory[item_id]
print(f" {item['name']} — only {item['quantity']} left!")
# Demo
add_item("L001", "Laptop", 50, 999.99, "Electronics")
add_item("M001", "Mouse", 200, 29.99, "Accessories")
add_item("K001", "Keyboard", 100, 79.99, "Accessories")
add_item("M002", "Monitor", 30, 499.99, "Electronics")
sell("L001", 45) # Sells most laptops → low stock
sell("M001", 190) # Sells most mice → low stock
sell("K001", 50)
restock("L001", 100)
get_summary()
11. Edge Cases and Errors
Common Mistakes
# ❌ MISTAKE 1: Empty set vs empty dict
empty_set = set() # ✅ Correct
empty_set2 = {} # ❌ This is a DICT, not a set!
print(type(empty_set)) # <class 'set'>
print(type(empty_set2)) # <class 'dict'>
# ❌ MISTAKE 2: Single-element tuple
not_tuple = (42) # ❌ Just an int
real_tuple = (42,) # ✅ Need the trailing comma!
print(type(not_tuple)) # <class 'int'>
print(type(real_tuple)) # <class 'tuple'>
# ❌ MISTAKE 3: Modifying list while iterating
numbers = [1, 2, 3, 4, 5, 6]
for n in numbers:
if n % 2 == 0:
numbers.remove(n) # Skips elements!
print(numbers) # [1, 3, 5] — looks right but can be wrong!
# ✅ FIX: Filter instead
numbers = [n for n in numbers if n % 2 != 0]
# OR iterate a copy
for n in numbers[:]:
if n % 2 == 0:
numbers.remove(n)
# ❌ MISTAKE 4: Mutable default argument
def add_to(item, lst=[]): # WRONG — list persists between calls!
lst.append(item)
return lst
print(add_to(1)) # [1]
print(add_to(2)) # [1, 2] — BUG! Expected [2]
# ✅ FIX:
def add_to(item, lst=None):
if lst is None:
lst = []
lst.append(item)
return lst
# ❌ MISTAKE 5: dict.fromkeys() with mutable default
d = dict.fromkeys(["a", "b", "c"], [])
d["a"].append(1)
print(d) # {'a': [1], 'b': [1], 'c': [1]} — ALL lists are the SAME object!
# ✅ FIX: Use dict comprehension
d = {k: [] for k in ["a", "b", "c"]}
d["a"].append(1)
print(d) # {'a': [1], 'b': [], 'c': []} ✅
# ❌ MISTAKE 6: Forgetting that set is unordered
s = {3, 1, 4, 1, 5}
# s[0] # ❌ TypeError: 'set' object is not subscriptable
# To access by index, convert to sorted list first
s_list = sorted(s) # [1, 3, 4, 5]
print(s_list[0]) # 1 ✅
# ❌ MISTAKE 7: KeyError on dict access
config = {"host": "localhost"}
# port = config["port"] # ❌ KeyError: 'port'
port = config.get("port", 5432) # ✅ Returns default 5432
# ❌ MISTAKE 8: Assuming dict order in old Python
# In Python < 3.7, dicts were UNORDERED
# In Python 3.7+, dicts maintain INSERTION ORDER
# Don't rely on dict order if supporting old Python!
# ❌ MISTAKE 9: Shallow copy of nested structures
original = {"data": [1, 2, 3]}
copy1 = original.copy()
copy1["data"].append(999)
print(original) # {'data': [1, 2, 3, 999]} — modified!
import copy
original = {"data": [1, 2, 3]}
copy2 = copy.deepcopy(original)
copy2["data"].append(999)
print(original) # {'data': [1, 2, 3]} ✅
# ❌ MISTAKE 10: Using list when set is better for membership
banned = ["user1", "user2", "user3", "user4", "user5"] # List
if "user3" in banned: # O(n) — scans entire list
print("Banned")
# ✅ Use set for O(1) membership testing
banned = {"user1", "user2", "user3", "user4", "user5"} # Set
if "user3" in banned: # O(1) — instant hash lookup
print("Banned")
Error Handling
# Handle common errors gracefully
data = [1, 2, 3]
# IndexError — list
try:
item = data[10]
except IndexError as e:
print(f"Index out of range: {e}")
item = None
# KeyError — dict
config = {"host": "localhost"}
try:
port = config["port"]
except KeyError:
port = 5432 # Use default
# TypeError — unhashable type
try:
s = {[1, 2, 3]} # List as set element
except TypeError as e:
print(f"Can't add to set: {e}")
# ValueError — tuple unpacking
try:
a, b = (1, 2, 3) # Too many values
except ValueError as e:
print(f"Unpacking error: {e}")
# AttributeError — wrong method
try:
t = (1, 2, 3)
t.append(4)
except AttributeError as e:
print(f"Tuples don't support: {e}")
12. Pro Developer Insights
Best Practices
# 1. Choose the right structure from the start
# Bad: Using list when uniqueness matters
user_ids_bad = [101, 102, 101, 103] # Duplicate 101!
# Good: Use set
user_ids_good = {101, 102, 103}
# 2. Use dict.get() — never bare dict[key] for optional keys
settings = {"theme": "dark"}
# Bad
try:
font = settings["font"]
except KeyError:
font = "Arial"
# Good
font = settings.get("font", "Arial")
# 3. Use tuple for multiple return values
def parse_coords(string):
# Good — named tuple makes it self-documenting
from collections import namedtuple
Coords = namedtuple("Coords", ["lat", "lng"])
lat, lng = map(float, string.split(","))
return Coords(lat, lng)
coords = parse_coords("28.61, 77.21")
print(coords.lat) # 28.61 — much clearer than coords[0]
# 4. Prefer set operations over nested loops
# Bad: O(n*m) — nested loop
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
common_bad = [x for x in list1 if x in list2]
# Good: O(n+m) — set intersection
common_good = list(set(list1) & set(list2))
# 5. Use defaultdict to avoid KeyError in accumulation patterns
from collections import defaultdict
# Bad
word_groups = {}
for word in ["apple", "ant", "ball", "bear", "cat"]:
first = word[0]
if first not in word_groups:
word_groups[first] = []
word_groups[first].append(word)
# Good
word_groups = defaultdict(list)
for word in ["apple", "ant", "ball", "bear", "cat"]:
word_groups[word[0]].append(word)
# 6. Unpack tuples for readability
# Bad
def process(record):
print(record[0], record[1])
# Good
def process(record):
name, score = record
print(name, score)
# 7. Use frozenset when you need a set as a dict key
# Cache results indexed by a set of inputs
cache = {}
def process_features(features):
key = frozenset(features) # Hashable set!
if key not in cache:
cache[key] = sum(features) # Expensive computation
return cache[key]
print(process_features([1, 2, 3]))
print(process_features([3, 1, 2])) # Same result — order doesn't matter
13. Performance Comparison
Benchmark: Membership Testing
import timeit
# Build structures
data_list = list(range(100_000))
data_tuple = tuple(range(100_000))
data_set = set(range(100_000))
data_dict = {i: i for i in range(100_000)}
n = 10_000
# Membership: x in structure
list_time = timeit.timeit(lambda: 99_999 in data_list, number=n)
tuple_time = timeit.timeit(lambda: 99_999 in data_tuple, number=n)
set_time = timeit.timeit(lambda: 99_999 in data_set, number=n)
dict_time = timeit.timeit(lambda: 99_999 in data_dict, number=n)
print(f"List: {list_time:.4f}s") # ~0.2s — O(n)
print(f"Tuple: {tuple_time:.4f}s") # ~0.2s — O(n)
print(f"Set: {set_time:.6f}s") # ~0.00001s — O(1)
print(f"Dict: {dict_time:.6f}s") # ~0.00001s — O(1)
# Set and Dict are ~10,000x faster for membership!
Complexity Reference
Operation | List | Tuple | Set | Dict |
|---|---|---|---|---|
Access by index | O(1) | O(1) | N/A | N/A |
Access by key | N/A | N/A | N/A | O(1) avg |
Search ( | O(n) | O(n) | O(1) avg | O(1) avg |
Append/Add | O(1) | N/A | O(1) avg | O(1) avg |
Insert at index | O(n) | N/A | N/A | N/A |
Delete by index | O(n) | N/A | N/A | N/A |
Delete by key | N/A | N/A | O(1) avg | O(1) avg |
Iteration | O(n) | O(n) | O(n) | O(n) |
Sort | O(n log n) | N/A | N/A | N/A |
Memory (relative) | Medium | Low | Medium | High |
When to Use Which
Situation | Best Choice | Why |
|---|---|---|
Ordered sequence of items | List | Mutable, indexed |
Fixed coordinates / DB row | Tuple | Immutable, hashable |
Unique tags / IDs | Set | Auto-deduplication |
Config / JSON-like data | Dict | Named access |
Fast membership testing | Set or Dict | O(1) lookup |
Function multiple returns | Tuple | Unpacking |
Counting occurrences | Counter (Dict) | Specialized |
Group items by category | defaultdict(list) | Auto-init |
Immutable dict key | Tuple or frozenset | Hashable |
LRU Cache | Dict | O(1) access |
14. Data Science Perspective
Working with DataFrames (Pandas)
import pandas as pd
from collections import Counter
# Dict → DataFrame (most common pattern in Data Science)
data = {
"name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
"age": [25, 30, 35, 28, 32],
"salary": [70000, 85000, 95000, 78000, 90000],
"dept": ["Engineering", "Marketing", "Engineering", "HR", "Engineering"],
"skills": [
["Python", "SQL", "ML"],
["Excel", "PowerPoint", "SQL"],
["Python", "Docker", "K8s"],
["Excel", "Communication"],
["Python", "TensorFlow", "SQL"]
]
}
df = pd.DataFrame(data)
# Set operations on DataFrame columns
engineering = set(df[df["dept"] == "Engineering"]["name"])
high_earners = set(df[df["salary"] > 80000]["name"])
print(engineering & high_earners) # Engineers who earn > 80k
# Find all unique skills
all_skills = set()
for skill_list in df["skills"]:
all_skills.update(skill_list)
print(f"All skills: {all_skills}")
# Count skill frequency
skill_counter = Counter()
for skill_list in df["skills"]:
skill_counter.update(skill_list)
print(skill_counter.most_common(3)) # Top 3 skills
# Dict comprehension for salary lookup
salary_lookup = dict(zip(df["name"], df["salary"]))
print(salary_lookup["Alice"]) # 70000
# Group by using defaultdict
from collections import defaultdict
dept_salaries = defaultdict(list)
for _, row in df.iterrows():
dept_salaries[row["dept"]].append(row["salary"])
# Average salary per department
dept_avg = {dept: sum(salaries)/len(salaries)
for dept, salaries in dept_salaries.items()}
print(dept_avg)
Data Cleaning Patterns
# Typical data cleaning using dict, set, list, tuple
raw_records = [
("Alice", "25", "alice@email.com", "Engineering"),
("Bob", "abc", "bob@email.com", "Marketing"), # Invalid age
("Charlie","30", None, "Engineering"), # Missing email
("Alice", "25", "alice@email.com", "Engineering"), # Duplicate
("Diana", "28", "diana@email.com", "HR"),
]
def clean_records(records):
seen = set() # Track duplicates
cleaned = [] # Clean results
errors = [] # Error log
for record in records:
name, age_str, email, dept = record
# Deduplication
fingerprint = (name, age_str, email or "")
if fingerprint in seen:
errors.append({"type": "duplicate", "record": record})
continue
seen.add(fingerprint)
# Validation
try:
age = int(age_str)
except (ValueError, TypeError):
errors.append({"type": "invalid_age", "record": record})
continue
if email is None:
errors.append({"type": "missing_email", "record": record})
continue
# Build clean record (tuple — immutable, memory-efficient)
cleaned.append({
"name": name.strip().title(),
"age": age,
"email": email.lower().strip(),
"dept": dept
})
return cleaned, errors
clean, errors = clean_records(raw_records)
print(f"Clean records: {len(clean)}")
print(f"Errors: {len(errors)}")
for err in errors:
print(f" [{err['type']}] {err['record']}")
15. Interview Questions
Basic Level
Q1: What is the difference between a List and a Tuple?
Both are ordered sequences, but a list is mutable (can be changed) while a tuple is immutable (cannot be changed). Tuples are faster, use less memory, and can be used as dictionary keys.
Q2: How do you remove duplicates from a list?
lst = [1, 2, 2, 3, 3, 4]
# Method 1: Set (doesn't preserve order)
unique = list(set(lst))
# Method 2: Preserve order (Python 3.7+)
unique = list(dict.fromkeys(lst))
# Method 3: Preserve order (any Python 3)
seen = set()
unique = [x for x in lst if not (x in seen or seen.add(x))]
Q3: What is the difference between remove(), pop(), and del in a list?
remove(x)removes the first occurrence of value x.pop(i)removes and returns the item at index i (defaults to last).del lst[i]deletes by index but returns nothing.
Q4: How do you create an empty set?
s = set() # ✅ Correct
s = {} # ❌ This creates an empty DICT!
Q5: What are dict keys required to be?
Dict keys must be hashable — immutable types like
str,int,float,bool,tuple. Lists, dicts, and sets cannot be keys because they are mutable.
Intermediate Level
Q6: What is the difference between dict.get() and dict[key]?
dict[key]raises aKeyErrorif the key doesn't exist.dict.get(key, default)returns the default value (orNone) if the key is missing — no exception.
Q7: How do you merge two dictionaries in Python 3.9+?
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "d": 4}
merged = d1 | d2 # New dict
d1 |= d2 # In-place merge
merged = {**d1, **d2} # Python 3.5+ unpacking
Q8: What is the difference between set.remove() and set.discard()?
remove(x)raises aKeyErrorif x is not in the set.discard(x)silently does nothing if x is missing. Usediscard()when absence is not an error.
Q9: When would you use namedtuple over a regular tuple?
When you want named access instead of index-based access.
namedtuplemakes code more readable (point.xvspoint[0]) while keeping tuple immutability and performance.
Q10: How is a set implemented internally in Python?
A set is implemented as a hash table (similar to dict but only stores keys). This gives O(1) average-case lookup, insert, and delete. Elements must be hashable.
Advanced Level
Q11: What is the time complexity difference between searching in a list vs a set?
List: O(n) — must scan every element. Set: O(1) average — uses hash table lookup. For 1 million elements, set lookup is ~1,000,000x faster than list.
Q12: What is a frozenset and when would you use it?
A
frozensetis an immutable set. Since it's hashable, it can be used as a dictionary key or as a set element. Useful when you need set operations but the set itself shouldn't change.
# Use case: Cache keyed by a set of features
cache = {}
def get_result(features):
key = frozenset(features) # Hashable!
if key not in cache:
cache[key] = expensive_compute(features)
return cache[key]
Q13: Explain the memory layout difference between list and tuple.
Both store references to objects. But tuple is allocated as a fixed block of memory (no overhead for resizing), while list maintains extra capacity for future appends (over-allocates). Tuple is ~15-20% smaller than an equivalent list.
Q14: How does Python's dict maintain insertion order?
Since Python 3.7, dictionaries are implemented as compact hash tables that store keys in insertion order by maintaining a separate array of indices. The hash table maps hash values to this index array, preserving insertion order while keeping O(1) lookup.
Scenario-Based
Q15: You have a list of 10 million user IDs. You need to check if a given ID is blocked. Which structure do you use?
# Use SET — O(1) lookup instead of O(n) for list
blocked_ids = set(load_blocked_ids_from_db()) # Convert once
def is_blocked(user_id):
return user_id in blocked_ids # O(1) — instant!
Q16: How would you group a list of words by their first letter?
from collections import defaultdict
words = ["apple", "ant", "ball", "bear", "cat", "cobra"]
groups = defaultdict(list)
for word in words:
groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'ant'], 'b': ['ball', 'bear'], 'c': ['cat', 'cobra']}
Q17: Find all words that appear in both documents.
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "a fox and a dog are quick and brown"
words1 = set(doc1.split())
words2 = set(doc2.split())
common = words1 & words2
print(common) # {'fox', 'quick', 'brown', 'dog'}
Q18: How would you implement a simple LRU Cache using a dictionary?
from collections import OrderedDict
class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = OrderedDict()
def get(self, key):
if key not in self.cache:
return -1
self.cache.move_to_end(key) # Mark as recently used
return self.cache[key]
def put(self, key, value):
self.cache[key] = value
self.cache.move_to_end(key)
if len(self.cache) > self.capacity:
self.cache.popitem(last=False) # Remove least recently used
cache = LRUCache(3)
cache.put(1, "one")
cache.put(2, "two")
cache.put(3, "three")
print(cache.get(1)) # "one" — moves to end
cache.put(4, "four") # Removes 2 (least recently used)
print(cache.get(2)) # -1 — evicted
16. Conclusion
Summary of All Four
┌─────────────────────────────────────────────────────────────┐
│ CHOOSING YOUR DATA STRUCTURE │
├─────────────┬───────────────────────────────────────────────┤
│ LIST │ Use when order matters and you need to │
│ [1,2,3] │ add/remove/modify elements. The workhorse. │
├─────────────┼───────────────────────────────────────────────┤
│ TUPLE │ Use when data shouldn't change. Function │
│ (1,2,3) │ returns, DB rows, coordinates, dict keys. │
├─────────────┼───────────────────────────────────────────────┤
│ SET │ Use when uniqueness matters or you need │
│ {1,2,3} │ fast membership testing / set operations. │
├─────────────┼───────────────────────────────────────────────┤
│ DICT │ Use when you need to label data or look up │
│ {"k":"v"} │ values by a meaningful key. JSON, configs. │
└─────────────┴───────────────────────────────────────────────┘
Key Takeaways
List — ordered, mutable, allows duplicates, indexed — your default go-to
Tuple — ordered, immutable, allows duplicates, indexed — for fixed data
Set — unordered, mutable, unique only, no indexing — for fast membership & math ops
Dict — ordered (3.7+), mutable, unique keys, keyed access — for labeled data
Final Rules to Live By
Need to add/remove items? → List (or Dict if labeled)
Need uniqueness? → Set
Need instant lookup by key? → Dict or Set
Data shouldn't change? → Tuple or frozenset
Working with key-value data (JSON, configs)? → Dict
Returning multiple values from a function? → Tuple
Removing duplicates from a list? → Convert to Set
Counting occurrences? →
collections.CounterGrouping by category? →
collections.defaultdict(list)Need set as dict key? → frozenset
"Use the right tool for the right job — and Python gives you four excellent tools."
Happy Coding! 🐍
Written by an experienced Python Developer and Data Scientist. Drop a comment below for questions or feedback!