Back to all posts

Python List, Tuple, Set & Dictionary: The Ultimate Complete Guide

Master Python's four core data structures — List, Tuple, Set, and Dictionary — with real examples, methods, performance tips, and interview questions.

1. Introduction

Every real-world program needs to store and manage collections of data. Python gives you four powerful, built-in data structures for this:

Structure

Think of it as...

Real-world example

List

A shopping cart — ordered and changeable

["milk", "eggs", "bread"]

Tuple

GPS coordinates — fixed, read-only

(28.6139, 77.2090)

Set

A bag of unique coins — no duplicates

{"USD", "EUR", "INR"}

Dictionary

A contact book — name maps to number

{"Alice": "9876543210"}

Why does this matter?

Choosing the wrong data structure can make your code:

  • Slower — searching a list vs a set is O(n) vs O(1)

  • Buggier — allowing duplicates when you shouldn't

  • Harder to read — using data[0] instead of data["name"]

By the end of this guide you'll know exactly which one to use, when, and why.


2. Quick Comparison Overview

Feature

List

Tuple

Set

Dictionary

Syntax

[1, 2, 3]

(1, 2, 3)

{1, 2, 3}

{"k": "v"}

Ordered

✅ Yes

✅ Yes

❌ No

✅ Yes (Python 3.7+)

Mutable

✅ Yes

❌ No

✅ Yes

✅ Yes

Allows Duplicates

✅ Yes

✅ Yes

❌ No

❌ No (keys)

Indexed Access

✅ Yes

✅ Yes

❌ No

✅ By key

Hashable (usable as dict key)

❌ No

✅ Yes

❌ No

❌ No

Lookup Speed

O(n)

O(n)

O(1)

O(1)

Memory

Medium

Low

Medium

High

Use Case

General collection

Fixed data

Unique items

Key-value mapping


3. LIST — Complete Deep Dive

A list is an ordered, mutable collection that can hold any data type — even mixed types. It is the most versatile and commonly used data structure in Python.

3.1 Creating Lists

Python
# Empty list
empty = []
empty = list()

# List with values
numbers = [1, 2, 3, 4, 5]
names   = ["Alice", "Bob", "Charlie"]
mixed   = [1, "hello", 3.14, True, None]   # Mixed types — Python allows this

# Nested list (2D list / matrix)
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# List from other iterables
from_string  = list("Python")         # ['P', 'y', 't', 'h', 'o', 'n']
from_range   = list(range(1, 6))      # [1, 2, 3, 4, 5]
from_tuple   = list((1, 2, 3))        # [1, 2, 3]
from_set     = list({3, 1, 2})        # [1, 2, 3] — order not guaranteed

# List repetition
zeros   = [0] * 5                     # [0, 0, 0, 0, 0]
pattern = [1, 2] * 3                  # [1, 2, 1, 2, 1, 2]

# List comprehension (most Pythonic)
squares = [x**2 for x in range(1, 6)] # [1, 4, 9, 16, 25]

print(type(numbers))   # <class 'list'>
print(len(numbers))    # 5

3.2 Indexing and Slicing

Python
fruits = ["apple", "banana", "cherry", "date", "elderberry"]
#           0         1          2        3          4         ← Positive index
#          -5        -4         -3       -2         -1         ← Negative index

# Indexing — access single element
print(fruits[0])     # apple      — first element
print(fruits[2])     # cherry
print(fruits[-1])    # elderberry — last element
print(fruits[-2])    # date       — second to last

# Slicing — a[start:stop:step]  (stop is EXCLUSIVE)
print(fruits[1:3])   # ['banana', 'cherry'] — index 1 and 2
print(fruits[:3])    # ['apple', 'banana', 'cherry'] — first 3
print(fruits[2:])    # ['cherry', 'date', 'elderberry'] — from index 2 onward
print(fruits[::2])   # ['apple', 'cherry', 'elderberry'] — every 2nd
print(fruits[::-1])  # ['elderberry', 'date', 'cherry', 'banana', 'apple'] — reversed
print(fruits[1:4:2]) # ['banana', 'date'] — from 1 to 3, step 2

# Nested list access
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matrix[0])        # [1, 2, 3]
print(matrix[1][2])     # 6 — row 1, column 2
print(matrix[-1][-1])   # 9 — last row, last column

3.3 Adding Elements

Python
lst = [1, 2, 3]

# append() — add ONE item to the END — O(1)
lst.append(4)
print(lst)          # [1, 2, 3, 4]
lst.append([5, 6])  # Adds the LIST as a single element!
print(lst)          # [1, 2, 3, 4, [5, 6]]

# insert(index, item) — insert at specific position — O(n)
lst = [1, 2, 3, 4]
lst.insert(2, 99)   # Insert 99 at index 2
print(lst)          # [1, 2, 99, 3, 4]
lst.insert(0, 0)    # Insert at beginning
print(lst)          # [0, 1, 2, 99, 3, 4]
lst.insert(100, 5)  # Index beyond length → appends to end
print(lst)          # [0, 1, 2, 99, 3, 4, 5]

# extend(iterable) — add ALL items from another iterable — O(k)
lst = [1, 2, 3]
lst.extend([4, 5, 6])   # Adds each element individually
print(lst)              # [1, 2, 3, 4, 5, 6]
lst.extend("abc")       # Strings are iterable!
print(lst)              # [1, 2, 3, 4, 5, 6, 'a', 'b', 'c']
lst.extend(range(3))    # Works with any iterable
print(lst)              # [1, 2, 3, 4, 5, 6, 'a', 'b', 'c', 0, 1, 2]

# + operator — concatenation — creates NEW list
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
print(c)    # [1, 2, 3, 4, 5, 6]
print(a)    # [1, 2, 3] — unchanged

3.4 Removing Elements

Python
lst = [10, 20, 30, 20, 40, 50]

# remove(value) — removes FIRST occurrence of value — O(n)
lst.remove(20)
print(lst)          # [10, 30, 20, 40, 50]
# lst.remove(99)    # ValueError if not found!

# pop() — removes and RETURNS last item — O(1)
last = lst.pop()
print(last)         # 50
print(lst)          # [10, 30, 20, 40]

# pop(index) — removes and RETURNS item at index — O(n)
item = lst.pop(1)
print(item)         # 30
print(lst)          # [10, 20, 40]

# del — delete by index or slice — O(n)
lst = [10, 20, 30, 40, 50]
del lst[2]
print(lst)          # [10, 20, 40, 50]
del lst[1:3]        # Delete slice
print(lst)          # [10, 50]
del lst             # Delete the entire list!

# clear() — remove ALL elements — O(1)
lst = [1, 2, 3, 4, 5]
lst.clear()
print(lst)          # []

3.5 Searching and Counting

Python
lst = [10, 20, 30, 20, 40, 20, 50]

# in / not in — membership check — O(n)
print(20 in lst)        # True
print(99 in lst)        # False
print(99 not in lst)    # True

# index(value) — returns index of FIRST occurrence — O(n)
print(lst.index(20))    # 1
print(lst.index(20, 2)) # 3 — search from index 2 onward
print(lst.index(20, 2, 5)) # 3 — search between index 2 and 4
# lst.index(99)         # ValueError if not found!

# count(value) — count occurrences — O(n)
print(lst.count(20))    # 3
print(lst.count(99))    # 0 — no error, returns 0

3.6 Sorting and Reversing

Python
lst = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# sort() — in-place sort — O(n log n)
lst.sort()
print(lst)                        # [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]

lst.sort(reverse=True)
print(lst)                        # [9, 6, 5, 5, 4, 3, 3, 2, 1, 1]

# sorted() — returns NEW sorted list, original unchanged
original = [3, 1, 4, 1, 5]
new_list = sorted(original)
print(original)    # [3, 1, 4, 1, 5] — unchanged
print(new_list)    # [1, 1, 3, 4, 5]

# Sort with custom key
words = ["banana", "Apple", "cherry", "Date"]
words.sort()                           # ['Apple', 'Date', 'banana', 'cherry'] — uppercase first
words.sort(key=str.lower)             # ['Apple', 'banana', 'cherry', 'Date'] — case-insensitive
words.sort(key=len)                   # Sort by length
print(words)                           # ['Date', 'Apple', 'banana', 'cherry']

# Sort list of dicts
students = [
    {"name": "Alice", "score": 92},
    {"name": "Bob",   "score": 87},
    {"name": "Charlie", "score": 95}
]
students.sort(key=lambda s: s["score"], reverse=True)
for s in students:
    print(f"{s['name']}: {s['score']}")
# Charlie: 95 / Alice: 92 / Bob: 87

# reverse() — in-place reversal — O(n)
lst = [1, 2, 3, 4, 5]
lst.reverse()
print(lst)          # [5, 4, 3, 2, 1]

# Reversed without modifying
print(lst[::-1])    # [1, 2, 3, 4, 5]
print(list(reversed(lst)))  # [1, 2, 3, 4, 5]

3.7 Copying Lists

Python
import copy

original = [1, [2, 3], [4, 5]]

# Assignment — NOT a copy, just another reference!
ref = original
ref[0] = 99
print(original)   # [99, [2, 3], [4, 5]] — BOTH changed!

# Shallow copy — copies top level only
original = [1, [2, 3], [4, 5]]
shallow1 = original.copy()
shallow2 = original[:]
shallow3 = list(original)

shallow1[0] = 99          # Changes only shallow1
print(original)            # [1, [2, 3], [4, 5]] — safe

shallow1[1][0] = 999      # Modifies nested list — SHARED reference!
print(original)            # [1, [999, 3], [4, 5]] — oops!

# Deep copy — copies everything recursively
original = [1, [2, 3], [4, 5]]
deep = copy.deepcopy(original)
deep[1][0] = 999
print(original)            # [1, [2, 3], [4, 5]] — completely safe ✅

3.8 Other Useful List Operations

Python
lst = [3, 1, 4, 1, 5, 9]

# Aggregate functions
print(len(lst))       # 6    — number of elements
print(sum(lst))       # 23   — sum of all elements
print(min(lst))       # 1    — minimum value
print(max(lst))       # 9    — maximum value

# zip() — pair elements from multiple lists
names  = ["Alice", "Bob", "Charlie"]
scores = [92, 87, 95]
grades = ["A", "B", "A"]

for name, score, grade in zip(names, scores, grades):
    print(f"{name}: {score} ({grade})")

# enumerate() — index + value together
for i, name in enumerate(names, start=1):
    print(f"{i}. {name}")
# 1. Alice  /  2. Bob  /  3. Charlie

# any() and all()
numbers = [2, 4, 6, 8, 10]
print(any(n > 9 for n in numbers))    # True  — at least one > 9
print(all(n % 2 == 0 for n in numbers)) # True — all are even

# List flattening
nested = [[1, 2], [3, 4], [5, 6]]
flat = [item for sublist in nested for item in sublist]
print(flat)   # [1, 2, 3, 4, 5, 6]

# Remove duplicates while preserving order
seen = set()
unique = [x for x in [3, 1, 4, 1, 5, 9, 2, 6, 5] if not (x in seen or seen.add(x))]
print(unique)   # [3, 1, 4, 5, 9, 2, 6]

3.9 Complete List Methods Reference

Method

Description

Returns

Complexity

append(x)

Add x to end

None

O(1)

insert(i, x)

Insert x at index i

None

O(n)

extend(iterable)

Add all items from iterable

None

O(k)

remove(x)

Remove first x

None

O(n)

pop(i=-1)

Remove & return item at i

item

O(1)/O(n)

clear()

Remove all items

None

O(1)

index(x, start, end)

First index of x

int

O(n)

count(x)

Count of x

int

O(n)

sort(key, reverse)

In-place sort

None

O(n log n)

reverse()

In-place reverse

None

O(n)

copy()

Shallow copy

list

O(n)


4. TUPLE — Complete Deep Dive

A tuple is an ordered, immutable collection. Once created, it cannot be changed. Think of tuples as "read-only lists" that are faster and safer.

4.1 Creating Tuples

Python
# Empty tuple
empty = ()
empty = tuple()

# Single-element tuple — THE COMMA IS CRITICAL!
single = (42,)          # ✅ This is a tuple
not_tuple = (42)        # ❌ This is just an int!
print(type(single))     # <class 'tuple'>
print(type(not_tuple))  # <class 'int'>

# Multiple elements
coords  = (10.5, 20.3)
rgb     = (255, 128, 0)
person  = ("Alice", 30, "NYC")
mixed   = (1, "hello", 3.14, True, None)

# Without parentheses — tuple packing
point = 3, 4            # Still a tuple!
print(type(point))      # <class 'tuple'>

# From iterable
from_list  = tuple([1, 2, 3])      # (1, 2, 3)
from_str   = tuple("Python")       # ('P', 'y', 't', 'h', 'o', 'n')
from_range = tuple(range(1, 6))    # (1, 2, 3, 4, 5)

# Nested tuple
nested = ((1, 2), (3, 4), (5, 6))

print(type(coords))     # <class 'tuple'>
print(len(person))      # 3

4.2 Accessing Tuple Elements

Python
person = ("Alice", 30, "New York", "Engineer")

# Indexing — same as list
print(person[0])       # Alice
print(person[-1])      # Engineer
print(person[1:3])     # (30, 'New York')
print(person[::-1])    # ('Engineer', 'New York', 30, 'Alice')

# Nested access
matrix = ((1, 2, 3), (4, 5, 6), (7, 8, 9))
print(matrix[1])       # (4, 5, 6)
print(matrix[1][2])    # 6

4.3 Tuple Unpacking — The Superpower

Python
# Basic unpacking — number of variables must match
coords = (10, 20)
x, y = coords
print(x, y)     # 10  20

person = ("Alice", 30, "NYC")
name, age, city = person
print(name, age, city)   # Alice  30  NYC

# Swap using tuple unpacking
a, b = 10, 20
a, b = b, a
print(a, b)     # 20  10

# Extended unpacking with * (star expression)
numbers = (1, 2, 3, 4, 5)

first, *rest = numbers
print(first)    # 1
print(rest)     # [2, 3, 4, 5]

*start, last = numbers
print(start)    # [1, 2, 3, 4]
print(last)     # 5

first, *middle, last = numbers
print(first)    # 1
print(middle)   # [2, 3, 4]
print(last)     # 5

# Ignore values with _
name, _, city = ("Alice", 30, "NYC")   # Ignore age
print(name, city)   # Alice  NYC

_, second, *_ = (1, 2, 3, 4, 5)
print(second)       # 2

# Nested unpacking
data = ((1, 2), (3, 4))
(a, b), (c, d) = data
print(a, b, c, d)   # 1  2  3  4

# In for loops
points = [(1, 2), (3, 4), (5, 6)]
for x, y in points:
    distance = (x**2 + y**2) ** 0.5
    print(f"({x},{y}) → distance: {distance:.2f}")

4.4 Tuple Methods

Tuples only have two methods because they're immutable:

Python
t = (10, 20, 30, 20, 40, 20)

# count(value) — count occurrences
print(t.count(20))     # 3
print(t.count(99))     # 0

# index(value) — first index of value
print(t.index(20))     # 1
print(t.index(20, 2))  # 3 — search from index 2
# t.index(99)          # ValueError if not found!

# Built-in functions work on tuples
print(len(t))          # 6
print(sum(t))          # 140
print(min(t))          # 10
print(max(t))          # 40
print(sorted(t))       # [10, 20, 20, 20, 30, 40] — returns a LIST

4.5 Immutability — What It Means

Python
t = (1, 2, 3)

# You CANNOT modify a tuple
# t[0] = 99     # ❌ TypeError: 'tuple' object does not support item assignment
# t.append(4)   # ❌ AttributeError: 'tuple' object has no attribute 'append'
# del t[0]      # ❌ TypeError

# BUT if the tuple contains mutable objects (like a list), those CAN be changed
mixed = (1, [2, 3], "hello")
mixed[1].append(4)        # Modifies the list INSIDE the tuple
print(mixed)               # (1, [2, 3, 4], 'hello') — tuple itself unchanged
mixed[1][0] = 99
print(mixed)               # (1, [99, 3, 4], 'hello')

# "Modify" a tuple by creating a new one
t = (1, 2, 3)
t = t + (4, 5)             # Creates a NEW tuple
print(t)   # (1, 2, 3, 4, 5)
t = t[:2] + (99,) + t[3:]  # "Replace" index 2 with 99
print(t)   # (1, 2, 99, 4, 5)

4.6 Why Use Tuples? Key Advantages

Python
# 1. Tuples are FASTER than lists
import timeit
list_time  = timeit.timeit("[1, 2, 3, 4, 5]", number=10_000_000)
tuple_time = timeit.timeit("(1, 2, 3, 4, 5)", number=10_000_000)
print(f"List:  {list_time:.3f}s")
print(f"Tuple: {tuple_time:.3f}s")  # Consistently faster

# 2. Tuples use LESS MEMORY
import sys
lst = [1, 2, 3, 4, 5]
tpl = (1, 2, 3, 4, 5)
print(sys.getsizeof(lst))   # 88 bytes
print(sys.getsizeof(tpl))   # 72 bytes

# 3. Tuples can be DICT KEYS (lists cannot)
locations = {
    (28.6139, 77.2090): "New Delhi",
    (19.0760, 72.8777): "Mumbai",
    (13.0827, 80.2707): "Chennai"
}
print(locations[(28.6139, 77.2090)])   # New Delhi
# {[1,2]: "a"}  # ❌ TypeError — list is unhashable

# 4. Tuples as function return values (multiple returns)
def get_stats(numbers):
    return min(numbers), max(numbers), sum(numbers) / len(numbers)

minimum, maximum, average = get_stats([3, 1, 4, 1, 5, 9])
print(f"Min={minimum}, Max={maximum}, Avg={average:.2f}")

# 5. Named Tuples — best of both worlds
from collections import namedtuple

# Define the structure
Employee = namedtuple("Employee", ["name", "department", "salary"])
Point3D  = namedtuple("Point3D",  ["x", "y", "z"])

# Create instances
emp = Employee("Alice", "Engineering", 95000)
p   = Point3D(1.0, 2.5, -3.0)

# Access by name OR index
print(emp.name)          # Alice
print(emp[0])            # Alice
print(emp.salary)        # 95000
print(p.x, p.y, p.z)    # 1.0  2.5  -3.0

# Still immutable — behaves like a tuple
print(emp._asdict())     # OrderedDict([('name', 'Alice'), ...])
emp2 = emp._replace(salary=100000)  # Creates new namedtuple
print(emp2)              # Employee(name='Alice', department='Engineering', salary=100000)
print(emp)               # Unchanged

# Python 3.6+ dataclass (modern alternative to namedtuple)
from dataclasses import dataclass

@dataclass(frozen=True)   # frozen=True makes it immutable like a tuple
class Config:
    host: str
    port: int
    debug: bool = False

cfg = Config("localhost", 5432)
print(cfg.host)    # localhost
print(cfg)         # Config(host='localhost', port=5432, debug=False)

5. SET — Complete Deep Dive

A set is an unordered collection of unique elements. Sets are perfect when you need to eliminate duplicates, test membership quickly, or perform mathematical set operations.

5.1 Creating Sets

Python
# Empty set — MUST use set(), not {} (that creates a dict!)
empty = set()
print(type(empty))      # <class 'set'>
print(type({}))         # <class 'dict'> ← Be careful!

# Set with values
fruits  = {"apple", "banana", "cherry"}
numbers = {1, 2, 3, 4, 5}
mixed   = {1, "hello", 3.14, True}   # Allows mixed types

# Duplicates are automatically removed!
s = {1, 2, 2, 3, 3, 3, 4}
print(s)    # {1, 2, 3, 4} — only unique values

# From other iterables
from_list  = set([1, 2, 2, 3, 3])          # {1, 2, 3}
from_str   = set("Mississippi")             # {'M', 'i', 's', 'p'} — unique chars
from_tuple = set((1, 2, 2, 3))             # {1, 2, 3}

# frozenset — immutable set (can be used as dict key)
frozen = frozenset([1, 2, 3, 4])
print(type(frozen))     # <class 'frozenset'>
# frozen.add(5)         # ❌ AttributeError — frozenset is immutable

# Set comprehension
even_squares = {x**2 for x in range(1, 11) if x % 2 == 0}
print(even_squares)    # {4, 16, 36, 64, 100}

print(type(fruits))    # <class 'set'>

5.2 Adding and Removing Elements

Python
s = {1, 2, 3}

# add(element) — add ONE element — O(1)
s.add(4)
print(s)    # {1, 2, 3, 4}
s.add(2)    # Adding duplicate — silently ignored
print(s)    # {1, 2, 3, 4} — unchanged

# update(iterable) — add multiple elements — O(k)
s.update([5, 6, 7])
print(s)    # {1, 2, 3, 4, 5, 6, 7}
s.update("ab", {8, 9})   # Can accept multiple iterables
print(s)    # {1, 2, 3, 4, 5, 6, 7, 8, 9, 'a', 'b'}

s = {1, 2, 3, 4, 5}

# remove(element) — remove element, raises KeyError if not found — O(1)
s.remove(3)
print(s)      # {1, 2, 4, 5}
# s.remove(99)  # ❌ KeyError: 99

# discard(element) — remove element, NO error if not found — O(1)
s.discard(4)
print(s)      # {1, 2, 5}
s.discard(99) # ✅ No error
print(s)      # {1, 2, 5}

# pop() — remove and return ARBITRARY element (sets are unordered) — O(1)
popped = s.pop()
print(popped)   # Some element (unpredictable!)
print(s)

# clear() — remove all elements — O(1)
s = {1, 2, 3}
s.clear()
print(s)    # set()

5.3 Set Operations — Mathematical Power

Python
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}

# Union — all elements from both sets
print(A | B)             # {1, 2, 3, 4, 5, 6, 7}
print(A.union(B))        # Same result
print(A.union(B, {8, 9})) # Can chain multiple sets

# Intersection — elements in BOTH sets
print(A & B)                # {3, 4, 5}
print(A.intersection(B))    # Same result

# Difference — elements in A but NOT in B
print(A - B)                # {1, 2}
print(A.difference(B))      # Same result
print(B - A)                # {6, 7}

# Symmetric Difference — elements in EITHER but NOT BOTH
print(A ^ B)                             # {1, 2, 6, 7}
print(A.symmetric_difference(B))         # Same result

# In-place versions (modify the set directly)
A |= B         # A = A | B (union update)
A &= B         # A = A & B (intersection update)
A -= B         # A = A - B (difference update)
A ^= B         # A = A ^ B (symmetric difference update)

# Subset and Superset checks
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}

print(A.issubset(B))      # True  — A ⊆ B (A is subset of B)
print(A <= B)             # True  — same as issubset
print(A < B)              # True  — strict subset (A ⊂ B, A ≠ B)
print(B.issuperset(A))    # True  — B ⊇ A
print(B >= A)             # True  — same as issuperset
print(B > A)              # True  — strict superset

print(A <= A)             # True  — every set is a subset of itself
print(A < A)              # False — not a STRICT subset

# Disjoint — do sets share ANY elements?
X = {1, 2, 3}
Y = {4, 5, 6}
Z = {3, 4, 5}
print(X.isdisjoint(Y))    # True  — no common elements
print(X.isdisjoint(Z))    # False — 3 is common

5.4 Set Methods Reference

Python
s = {3, 1, 4, 1, 5, 9, 2, 6}

# Viewing
print(len(s))         # 7 (unique count)
print(3 in s)         # True
print(99 not in s)    # True
print(min(s))         # 1
print(max(s))         # 9
print(sum(s))         # 30
print(sorted(s))      # [1, 2, 3, 4, 5, 6, 9] — returns a LIST

# Iteration (order is NOT guaranteed)
for item in s:
    print(item)   # Some order — don't rely on it

# Convert to sorted list when you need order
sorted_items = sorted(s)

Method

Description

Returns

add(x)

Add element x

None

update(*others)

Add all from iterables

None

remove(x)

Remove x, KeyError if missing

None

discard(x)

Remove x, no error if missing

None

pop()

Remove/return arbitrary element

element

clear()

Remove all elements

None

union(*others)

All elements from sets

new set

intersection(*others)

Common elements

new set

difference(*others)

Elements not in others

new set

symmetric_difference(other)

In either, not both

new set

issubset(other)

Is self ⊆ other?

bool

issuperset(other)

Is self ⊇ other?

bool

isdisjoint(other)

No common elements?

bool

copy()

Shallow copy

new set

frozenset(s)

Immutable version

frozenset

5.5 Key Set Use Cases

Python
# Use Case 1: Remove duplicates from a list
emails = ["a@test.com", "b@test.com", "a@test.com", "c@test.com", "b@test.com"]
unique_emails = list(set(emails))
print(unique_emails)   # ['a@test.com', 'b@test.com', 'c@test.com'] — order not guaranteed

# Preserve order while deduplicating
seen = set()
unique_ordered = [e for e in emails if not (e in seen or seen.add(e))]
print(unique_ordered)  # ['a@test.com', 'b@test.com', 'c@test.com']

# Use Case 2: Fast membership testing
banned_users = {"spammer123", "bot456", "fake789"}
user = "spammer123"
if user in banned_users:    # O(1) — instant, regardless of set size!
    print("Access denied!")

# Use Case 3: Find common interests (intersection)
alice_skills = {"Python", "SQL", "Machine Learning", "Docker"}
bob_skills   = {"JavaScript", "Python", "Docker", "Kubernetes"}
common = alice_skills & bob_skills
print(f"Common skills: {common}")   # {'Python', 'Docker'}

# Use Case 4: Find unique words in text
text = "the quick brown fox jumps over the lazy dog the fox"
unique_words = set(text.split())
print(f"Unique words: {len(unique_words)}")   # 8

# Use Case 5: Missing elements
required_cols = {"name", "email", "age", "salary"}
available_cols = {"name", "email", "phone"}
missing = required_cols - available_cols
print(f"Missing: {missing}")   # {'age', 'salary'}

6. DICTIONARY — Complete Deep Dive

A dictionary is an ordered (Python 3.7+), mutable collection of key-value pairs. Keys must be unique and hashable. Think of it as a lookup table — you get a value instantly by its key.

6.1 Creating Dictionaries

Python
# Empty dict
empty = {}
empty = dict()

# Basic dict
person = {"name": "Alice", "age": 30, "city": "NYC"}

# Different ways to create
# 1. From keyword arguments
d1 = dict(name="Alice", age=30, city="NYC")

# 2. From list of tuples
d2 = dict([("name", "Alice"), ("age", 30), ("city", "NYC")])

# 3. From two lists using zip
keys   = ["name", "age", "city"]
values = ["Alice", 30, "NYC"]
d3 = dict(zip(keys, values))

# 4. fromkeys() — create dict with same value for all keys
d4 = dict.fromkeys(["a", "b", "c"], 0)
print(d4)   # {'a': 0, 'b': 0, 'c': 0}
d5 = dict.fromkeys(["x", "y", "z"])
print(d5)   # {'x': None, 'y': None, 'z': None}

# 5. Dict comprehension
squared = {x: x**2 for x in range(1, 6)}
print(squared)   # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Valid key types — must be hashable
valid_keys = {
    42: "int key",
    3.14: "float key",
    True: "bool key",
    "name": "str key",
    (1, 2): "tuple key"
}

# ❌ Invalid key types — unhashable
# {[1,2]: "a"}   # TypeError: unhashable type: 'list'
# {{}: "a"}      # TypeError: unhashable type: 'dict'

# Nested dict
company = {
    "name": "TechCorp",
    "founded": 2010,
    "address": {
        "street": "123 Main St",
        "city": "San Francisco",
        "state": "CA"
    },
    "employees": ["Alice", "Bob", "Charlie"],
    "is_active": True
}

print(type(person))    # <class 'dict'>
print(len(person))     # 3

6.2 Accessing Values

Python
person = {"name": "Alice", "age": 30, "city": "NYC"}

# Direct access — KeyError if key doesn't exist
print(person["name"])    # Alice
print(person["age"])     # 30
# print(person["email"]) # ❌ KeyError: 'email'

# get(key, default) — safe access, returns default if key missing
print(person.get("name"))           # Alice
print(person.get("email"))          # None — no error
print(person.get("email", "N/A"))   # N/A — custom default

# Nested access
company = {
    "address": {
        "city": "SF",
        "zip": "94102"
    }
}
print(company["address"]["city"])           # SF
print(company.get("address", {}).get("city", "Unknown"))  # SF — safe nested access

# Check if key exists
print("name" in person)          # True
print("email" in person)         # False
print("email" not in person)     # True

# in checks KEYS only (not values)
print(30 in person)              # False — 30 is a value, not a key
print(30 in person.values())     # True  — check values explicitly

6.3 Adding and Updating

Python
person = {"name": "Alice", "age": 30}

# Add new key-value pair
person["email"] = "alice@example.com"
person["city"]  = "NYC"
print(person)   # {'name': 'Alice', 'age': 30, 'email': 'alice@...', 'city': 'NYC'}

# Update existing key
person["age"] = 31
print(person["age"])   # 31

# update() — add/update multiple key-value pairs
person.update({"phone": "555-1234", "age": 32})
person.update(country="USA", state="NY")   # Keyword args also work
print(person)

# setdefault(key, default) — add key ONLY if it doesn't exist
person.setdefault("nickname", "Ali")   # Adds 'nickname': 'Ali'
print(person.get("nickname"))          # Ali
person.setdefault("name", "Bob")       # Does NOT change 'name' — it already exists!
print(person.get("name"))             # Alice — unchanged

# Merging dicts (Python 3.9+)
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "d": 4}
d3 = {"b": 99, "e": 5}

merged = d1 | d2
print(merged)      # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Later key wins when duplicates
merged = d1 | d3
print(merged)      # {'a': 1, 'b': 99, 'e': 5} — d3's 'b' overwrites d1's

d1 |= d2           # In-place merge (like +=)
print(d1)          # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Python 3.5+ alternative
merged = {**d1, **d2}  # Unpack both dicts

6.4 Removing Elements

Python
d = {"a": 1, "b": 2, "c": 3, "d": 4}

# del — remove by key, KeyError if missing
del d["b"]
print(d)         # {'a': 1, 'c': 3, 'd': 4}
# del d["z"]     # ❌ KeyError

# pop(key, default) — remove and RETURN value
val = d.pop("c")
print(val)       # 3
print(d)         # {'a': 1, 'd': 4}
val = d.pop("z", "not found")   # ✅ No error with default
print(val)       # not found

# popitem() — remove and return LAST inserted key-value pair (Python 3.7+)
d = {"x": 10, "y": 20, "z": 30}
item = d.popitem()
print(item)      # ('z', 30) — last inserted
print(d)         # {'x': 10, 'y': 20}

# clear() — remove everything
d.clear()
print(d)         # {}

6.5 Iterating Over Dictionaries

Python
person = {"name": "Alice", "age": 30, "city": "NYC"}

# Iterate over KEYS (default behavior)
for key in person:
    print(key)      # name  age  city

# Iterate over KEYS explicitly
for key in person.keys():
    print(key)

# Iterate over VALUES
for value in person.values():
    print(value)    # Alice  30  NYC

# Iterate over KEY-VALUE PAIRS — most common!
for key, value in person.items():
    print(f"{key}: {value}")
# name: Alice
# age: 30
# city: NYC

# With index
for i, (key, value) in enumerate(person.items(), start=1):
    print(f"{i}. {key} = {value}")

# Dict comprehension — transform while iterating
upper_keys = {k.upper(): v for k, v in person.items()}
print(upper_keys)   # {'NAME': 'Alice', 'AGE': 30, 'CITY': 'NYC'}

# Filter while iterating
adults = {name: age for name, age in {"Alice": 30, "Bob": 17, "Charlie": 25}.items() if age >= 18}
print(adults)   # {'Alice': 30, 'Charlie': 25}

6.6 All Dictionary Methods

Python
d = {"a": 1, "b": 2, "c": 3}

# View objects (dynamic — reflect changes)
print(d.keys())    # dict_keys(['a', 'b', 'c'])
print(d.values())  # dict_values([1, 2, 3])
print(d.items())   # dict_items([('a', 1), ('b', 2), ('c', 3)])

# Convert to list if needed
keys   = list(d.keys())   # ['a', 'b', 'c']
values = list(d.values()) # [1, 2, 3]
items  = list(d.items())  # [('a', 1), ('b', 2), ('c', 3)]

# Copying
shallow_copy = d.copy()

# dict.fromkeys()
template = dict.fromkeys(["name", "age", "email"], "")
print(template)   # {'name': '', 'age': '', 'email': ''}

Method

Description

Returns

d[key]

Get value (KeyError if missing)

value

d.get(key, default)

Safe get

value or default

d.keys()

All keys

dict_keys view

d.values()

All values

dict_values view

d.items()

All key-value pairs

dict_items view

d.update(other)

Add/update from other

None

d.setdefault(key, val)

Get or set default

value

d.pop(key, default)

Remove & return

value

d.popitem()

Remove & return last pair

(key, value)

d.clear()

Remove all

None

d.copy()

Shallow copy

dict

dict.fromkeys(keys, val)

Create from keys

dict

6.7 Advanced Dictionary Patterns

Python
from collections import defaultdict, Counter, OrderedDict

# defaultdict — auto-creates default value for missing keys
word_count = defaultdict(int)
text = "the quick brown fox jumps over the lazy dog the"
for word in text.split():
    word_count[word] += 1     # No KeyError! Default int is 0
print(dict(word_count))

# Group items by category
from collections import defaultdict
category_items = defaultdict(list)
products = [("fruit", "apple"), ("veggie", "carrot"), ("fruit", "banana"), ("veggie", "broccoli")]
for category, item in products:
    category_items[category].append(item)
print(dict(category_items))
# {'fruit': ['apple', 'banana'], 'veggie': ['carrot', 'broccoli']}

# Counter — specialized dict for counting
from collections import Counter
text = "hello world"
char_count = Counter(text)
print(char_count)                    # Counter({'l': 3, 'o': 2, ...})
print(char_count.most_common(3))     # [('l', 3), ('o', 2), ('h', 1)]

# Counter arithmetic
c1 = Counter(["a", "b", "a", "c"])
c2 = Counter(["a", "b", "b", "d"])
print(c1 + c2)    # Counter({'a': 3, 'b': 3, 'c': 1, 'd': 1})
print(c1 - c2)    # Counter({'a': 1, 'c': 1})
print(c1 & c2)    # Counter({'a': 1, 'b': 1}) — min of each

# OrderedDict — maintains insertion order (less needed in Python 3.7+)
od = OrderedDict()
od["first"]  = 1
od["second"] = 2
od["third"]  = 3
print(list(od.keys()))   # ['first', 'second', 'third']

# Move to end / beginning
od.move_to_end("first")
print(list(od.keys()))   # ['second', 'third', 'first']
od.move_to_end("first", last=False)
print(list(od.keys()))   # ['first', 'second', 'third']

# Nested defaultdict
nested = defaultdict(lambda: defaultdict(int))
nested["Alice"]["Python"] += 5
nested["Alice"]["SQL"] += 3
nested["Bob"]["Python"] += 2
print(dict(nested["Alice"]))   # {'Python': 5, 'SQL': 3}

7. Intermediate Usage — Combining All Four

7.1 Converting Between Data Structures

Python
# List ↔ Tuple ↔ Set ↔ Dict

lst = [1, 2, 2, 3, 3, 4]

# List → Set (removes duplicates)
s = set(lst)        # {1, 2, 3, 4}

# List → Tuple
t = tuple(lst)      # (1, 2, 2, 3, 3, 4)

# Set → List (order not guaranteed)
back_to_list = list(s)   # [1, 2, 3, 4] — some order

# Tuple → List (to make it mutable)
mutable = list(t)

# List of pairs → Dict
pairs = [("name", "Alice"), ("age", 30), ("city", "NYC")]
d = dict(pairs)
print(d)            # {'name': 'Alice', 'age': 30, 'city': 'NYC'}

# Dict → List of keys, values, or pairs
keys   = list(d.keys())         # ['name', 'age', 'city']
values = list(d.values())       # ['Alice', 30, 'NYC']
items  = list(d.items())        # [('name', 'Alice'), ('age', 30), ...]

# Dict → Set of keys
key_set = set(d)                # {'name', 'age', 'city'}

7.2 Nesting Data Structures

Python
# List of Dicts — most common in real data work
employees = [
    {"id": 1, "name": "Alice",   "dept": "Engineering", "salary": 90000},
    {"id": 2, "name": "Bob",     "dept": "Marketing",   "salary": 70000},
    {"id": 3, "name": "Charlie", "dept": "Engineering", "salary": 95000},
    {"id": 4, "name": "Diana",   "dept": "Marketing",   "salary": 75000},
]

# Filter engineers
engineers = [e for e in employees if e["dept"] == "Engineering"]

# Get all names
names = [e["name"] for e in employees]
print(names)    # ['Alice', 'Bob', 'Charlie', 'Diana']

# Average salary
avg_salary = sum(e["salary"] for e in employees) / len(employees)
print(f"Average: ${avg_salary:,.0f}")

# Group by department
from collections import defaultdict
by_dept = defaultdict(list)
for emp in employees:
    by_dept[emp["dept"]].append(emp["name"])
print(dict(by_dept))
# {'Engineering': ['Alice', 'Charlie'], 'Marketing': ['Bob', 'Diana']}

# Sort by salary descending
sorted_emps = sorted(employees, key=lambda e: e["salary"], reverse=True)

# Dict of Lists
schedule = {
    "Monday":    ["Alice", "Bob"],
    "Tuesday":   ["Charlie", "Diana"],
    "Wednesday": ["Alice", "Charlie"],
}
# Who works Monday?
print(schedule["Monday"])   # ['Alice', 'Bob']
# All unique workers
all_workers = set(name for names in schedule.values() for name in names)
print(all_workers)   # {'Alice', 'Bob', 'Charlie', 'Diana'}

# Dict of Dicts
inventory = {
    "laptop":  {"quantity": 50,  "price": 999.99},
    "mouse":   {"quantity": 150, "price": 29.99},
    "keyboard": {"quantity": 100, "price": 79.99},
}
# Total value
total = sum(item["quantity"] * item["price"] for item in inventory.values())
print(f"Total inventory value: ${total:,.2f}")

7.3 Unpacking with Zip

Python
# Zip multiple iterables
names  = ("Alice", "Bob", "Charlie")
scores = (92, 87, 95)
grades = ("A", "B", "A")

# zip stops at the shortest iterable
for name, score, grade in zip(names, scores, grades):
    print(f"{name}: {score} ({grade})")

# Build dict using zip
person = dict(zip(["name", "age", "city"], ["Alice", 30, "NYC"]))
print(person)

# Unzip — separate a list of tuples into separate tuples
data = [("Alice", 92), ("Bob", 87), ("Charlie", 95)]
names, scores = zip(*data)    # * unpacks the list
print(names)    # ('Alice', 'Bob', 'Charlie')
print(scores)   # (92, 87, 95)

# zip_longest — use when lists have different lengths
from itertools import zip_longest
a = [1, 2, 3]
b = [4, 5]
for x, y in zip_longest(a, b, fillvalue=0):
    print(x, y)
# 1 4
# 2 5
# 3 0

8. Advanced Concepts

8.1 List Under the Hood — Dynamic Array

Python
import sys

# Python list is a dynamic array — doubles capacity when full
lst = []
prev_size = 0
for i in range(20):
    lst.append(i)
    curr_size = sys.getsizeof(lst)
    if curr_size != prev_size:
        print(f"n={len(lst):2d} → size={curr_size} bytes")
        prev_size = curr_size

# Output shows capacity doublings:
# n= 1 → size=88
# n= 5 → size=120
# n= 9 → size=184
# n=17 → size=248

8.2 Dict Under the Hood — Hash Table

Python
# Dicts use hash tables — this is why keys must be hashable
# hash() returns the hash value of an object
print(hash("hello"))    # Some integer (varies)
print(hash(42))         # 42
print(hash((1, 2, 3)))  # Some integer
# hash([1,2,3])         # ❌ TypeError — lists are unhashable

# Dict lookup is O(1) because:
# 1. Compute hash(key)
# 2. Find bucket using hash % table_size
# 3. Compare key (handle collisions)

# Demonstration of O(1) lookup
import time
small_dict = {i: i**2 for i in range(10)}
large_dict = {i: i**2 for i in range(10_000_000)}

# Both lookups are equally fast!
start = time.time()
_ = small_dict.get(9)
print(f"Small dict: {time.time()-start:.8f}s")

start = time.time()
_ = large_dict.get(9_999_999)
print(f"Large dict: {time.time()-start:.8f}s")
# Times are nearly identical — true O(1)!

8.3 Memory-Efficient Patterns

Python
# Generator expressions — don't build entire list in memory
import sys

# List: all values in memory at once
lst_expr = [x**2 for x in range(10000)]
print(sys.getsizeof(lst_expr))   # ~87,624 bytes

# Generator: one value at a time
gen_expr = (x**2 for x in range(10000))
print(sys.getsizeof(gen_expr))   # 112 bytes — tiny!

# But you can still iterate and use it
total = sum(gen_expr)   # Consumes the generator

# For huge datasets — read and process in chunks
def chunk(lst, size):
    """Split list into chunks of given size."""
    for i in range(0, len(lst), size):
        yield lst[i:i+size]

data = list(range(100))
for batch in chunk(data, 10):
    print(f"Processing batch of {len(batch)}")

8.4 Sorting Complex Structures

Python
from operator import attrgetter, itemgetter

students = [
    {"name": "Charlie", "grade": "A", "score": 92},
    {"name": "Alice",   "grade": "B", "score": 87},
    {"name": "Bob",     "grade": "A", "score": 95},
    {"name": "Diana",   "grade": "B", "score": 87},
]

# Sort by single key
by_score = sorted(students, key=itemgetter("score"), reverse=True)

# Sort by multiple keys (grade ASC, then score DESC)
multi_sort = sorted(students, key=lambda s: (s["grade"], -s["score"]))
for s in multi_sort:
    print(f"{s['name']}: {s['grade']} / {s['score']}")
# Alice: B / 87
# Bob: A / 95
# Charlie: A / 92
# Diana: B / 87  ← Wait, both A grade sorted by score descending

# Stable sort — equal elements maintain original order
by_name = sorted(students, key=itemgetter("name"))

# Sort with None values (handle gracefully)
data = [{"score": 92}, {"score": None}, {"score": 87}]
sorted_data = sorted(data, key=lambda x: (x["score"] is None, x["score"] or 0))

9. Real-World Use Cases

9.1 Data Processing Pipeline

Python
# Process sales records using all four structures
raw_data = [
    {"product": "Laptop",   "region": "North", "sales": 150, "revenue": 149850},
    {"product": "Mouse",    "region": "South", "sales": 500, "revenue": 14995},
    {"product": "Keyboard", "region": "North", "sales": 300, "revenue": 23997},
    {"product": "Laptop",   "region": "South", "sales": 120, "revenue": 119880},
    {"product": "Mouse",    "region": "North", "sales": 450, "revenue": 13455},
    {"product": "Keyboard", "region": "South", "sales": 200, "revenue": 15998},
]

# 1. Get unique products (Set)
unique_products = {record["product"] for record in raw_data}
print(f"Products: {unique_products}")

# 2. Total revenue per product (Dict)
revenue_by_product = {}
for record in raw_data:
    product = record["product"]
    revenue_by_product[product] = revenue_by_product.get(product, 0) + record["revenue"]
print("Revenue by product:", revenue_by_product)

# 3. Top-performing product (Dict + max)
top_product = max(revenue_by_product, key=revenue_by_product.get)
print(f"Top product: {top_product} (${revenue_by_product[top_product]:,.0f})")

# 4. Summary tuple (immutable snapshot)
summary = (
    len(raw_data),
    sum(r["revenue"] for r in raw_data),
    max(r["revenue"] for r in raw_data),
)
total_records, total_rev, max_rev = summary
print(f"Records: {total_records}, Total: ${total_rev:,.0f}, Max: ${max_rev:,.0f}")

9.2 Config Management

Python
# Application configuration using dict
DEFAULT_CONFIG = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "mydb",
        "pool_size": 10
    },
    "cache": {
        "host": "localhost",
        "port": 6379,
        "ttl": 3600
    },
    "app": {
        "debug": False,
        "secret_key": "change-me-in-production",
        "allowed_hosts": ["localhost", "127.0.0.1"],
        "cors_origins": {"https://myapp.com", "https://api.myapp.com"}
    }
}

def get_config(env="development"):
    """Get configuration for given environment."""
    config = dict(DEFAULT_CONFIG)    # Start with defaults

    if env == "production":
        config["app"]["debug"] = False
        config["database"]["pool_size"] = 20

    return config

config = get_config("production")
db_host = config["database"]["host"]
print(f"Connecting to {db_host}:{config['database']['port']}")

9.3 Caching with Dictionary

Python
# Memoization — cache expensive function results
def create_cache():
    cache = {}

    def memoize(func):
        def wrapper(*args):
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]
        return wrapper

    return memoize

memoize = create_cache()

@memoize
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))   # Instant! Without cache: exponential time

# Built-in LRU Cache
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_calculation(n):
    import time
    time.sleep(0.1)    # Simulate slow computation
    return n ** 2

print(expensive_calculation(5))   # Slow first time
print(expensive_calculation(5))   # Instant — from cache!
print(expensive_calculation.cache_info())

9.4 API Response Handling

Python
import json

# Typical API response as dict
api_response = {
    "status": "success",
    "code": 200,
    "data": {
        "users": [
            {"id": 1, "name": "Alice", "roles": ["admin", "editor"],  "active": True},
            {"id": 2, "name": "Bob",   "roles": ["viewer"],           "active": True},
            {"id": 3, "name": "Charlie", "roles": ["editor"],         "active": False},
        ],
        "pagination": {"page": 1, "per_page": 10, "total": 3}
    }
}

# Extract active users
users      = api_response["data"]["users"]
active     = [u for u in users if u["active"]]
admin_ids  = {u["id"] for u in users if "admin" in u["roles"]}  # Set

# Get all unique roles
all_roles  = set()
for user in users:
    all_roles.update(user["roles"])
print(f"All roles: {all_roles}")   # {'admin', 'editor', 'viewer'}

# Build lookup by ID
user_by_id = {u["id"]: u for u in users}
print(user_by_id[1]["name"])    # Alice

# Safe navigation
page_info = api_response.get("data", {}).get("pagination", {})
print(f"Page {page_info.get('page', 1)} of {page_info.get('total', 0)}")

10. Practical Examples

Example 1: Student Database System

Python
"""
Student management system demonstrating all four data structures.
"""

# --- Data Storage ---
students_db = {}  # Dict: {student_id: student_record}
enrolled_courses = {}  # Dict: {student_id: set of courses}
all_courses = {"Python101", "DataScience201", "WebDev301", "Algorithms401"}

# --- Functions ---
def add_student(student_id, name, age, email):
    """Add a new student. Returns error message if ID exists."""
    if student_id in students_db:
        return f"Error: Student ID {student_id} already exists"

    students_db[student_id] = {
        "id": student_id,
        "name": name,
        "age": age,
        "email": email,
        "scores": [],       # List — ordered, mutable
        "badges": set(),    # Set — unique achievements
    }
    enrolled_courses[student_id] = set()
    return f"Student {name} added successfully"

def enroll(student_id, course):
    """Enroll student in a course."""
    if student_id not in students_db:
        return "Student not found"
    if course not in all_courses:
        return f"Course '{course}' does not exist"

    enrolled_courses[student_id].add(course)  # Set handles duplicates
    return f"Enrolled {students_db[student_id]['name']} in {course}"

def add_score(student_id, subject, score):
    """Add a score for a student."""
    if student_id not in students_db:
        return "Student not found"

    students_db[student_id]["scores"].append((subject, score))  # Tuple: immutable pair
    if score >= 90:
        students_db[student_id]["badges"].add("High Achiever")
    if score == 100:
        students_db[student_id]["badges"].add("Perfect Score")

def get_report(student_id):
    """Generate student report."""
    if student_id not in students_db:
        return "Student not found"

    s = students_db[student_id]
    scores = s["scores"]
    avg = sum(sc for _, sc in scores) / len(scores) if scores else 0

    # Build report using f-string
    report = f"""
╔══════════════════════════════════╗
  STUDENT REPORT
╠══════════════════════════════════╣
  ID:       {s['id']}
  Name:     {s['name']}
  Age:      {s['age']}
  Email:    {s['email']}
  Average:  {avg:.1f}
  Badges:   {', '.join(s['badges']) or 'None'}
  Courses:  {', '.join(enrolled_courses[student_id]) or 'None'}
  Scores:
"""
    for subject, score in sorted(scores, key=lambda x: -x[1]):
        report += f"    • {subject:<20} {score}\n"

    report += "╚══════════════════════════════════╝"
    return report

def get_top_students(n=3):
    """Return top N students by average score."""
    averages = []
    for sid, data in students_db.items():
        scores = data["scores"]
        if scores:
            avg = sum(sc for _, sc in scores) / len(scores)
            averages.append((data["name"], avg))

    return sorted(averages, key=lambda x: -x[1])[:n]

def find_common_enrollments(id1, id2):
    """Find courses both students are enrolled in."""
    if id1 not in students_db or id2 not in students_db:
        return set()
    return enrolled_courses[id1] & enrolled_courses[id2]   # Set intersection

# --- Usage ---
print(add_student(1, "Alice Johnson", 22, "alice@college.edu"))
print(add_student(2, "Bob Smith",     21, "bob@college.edu"))
print(add_student(3, "Charlie Brown", 23, "charlie@college.edu"))

print(enroll(1, "Python101"))
print(enroll(1, "DataScience201"))
print(enroll(2, "Python101"))
print(enroll(2, "WebDev301"))
print(enroll(3, "Python101"))
print(enroll(3, "DataScience201"))

add_score(1, "Python101", 95)
add_score(1, "DataScience201", 88)
add_score(2, "Python101", 100)
add_score(2, "WebDev301", 72)
add_score(3, "Python101", 78)
add_score(3, "DataScience201", 91)

print(get_report(1))

print("\nTop Students:")
for name, avg in get_top_students():
    print(f"  {name}: {avg:.1f}")

common = find_common_enrollments(1, 3)
print(f"\nAlice & Charlie share: {common}")

Example 2: Inventory Management

Python
"""
Inventory tracker using List, Tuple, Set, and Dict together.
"""
from collections import defaultdict
from datetime import datetime

# Inventory as dict of dicts
inventory = {}

# Transaction log as list of tuples (immutable records)
transactions = []   # List of tuples: (timestamp, item, action, qty, price)

# Low stock alerts as set
low_stock_items = set()
LOW_STOCK_THRESHOLD = 10

def add_item(item_id, name, quantity, price, category):
    """Add a new item to inventory."""
    if item_id in inventory:
        raise ValueError(f"Item {item_id} already exists")

    inventory[item_id] = {
        "name": name,
        "quantity": quantity,
        "price": price,
        "category": category,
        "created_at": datetime.now().isoformat()
    }

    # Log as immutable tuple
    transactions.append((datetime.now().isoformat(), item_id, "ADD", quantity, price))
    _check_stock(item_id)
    print(f"Added: {name} (qty={quantity}, price=${price:.2f})")

def sell(item_id, qty):
    """Record a sale."""
    if item_id not in inventory:
        raise KeyError(f"Item {item_id} not found")

    item = inventory[item_id]
    if item["quantity"] < qty:
        raise ValueError(f"Insufficient stock. Available: {item['quantity']}")

    item["quantity"] -= qty
    transactions.append((datetime.now().isoformat(), item_id, "SELL", qty, item["price"]))
    _check_stock(item_id)
    revenue = qty * item["price"]
    print(f"Sold {qty}x {item['name']} — Revenue: ${revenue:.2f}")

def restock(item_id, qty):
    """Restock an item."""
    if item_id not in inventory:
        raise KeyError(f"Item {item_id} not found")

    inventory[item_id]["quantity"] += qty
    transactions.append((datetime.now().isoformat(), item_id, "RESTOCK", qty, 0))
    low_stock_items.discard(item_id)   # Remove from alerts if restocked
    print(f"Restocked {inventory[item_id]['name']} by {qty}")

def _check_stock(item_id):
    """Check and update low stock alerts."""
    if inventory[item_id]["quantity"] <= LOW_STOCK_THRESHOLD:
        low_stock_items.add(item_id)

def get_summary():
    """Generate inventory summary."""
    # Unique categories (set)
    categories = {item["category"] for item in inventory.values()}

    # Revenue by category (defaultdict)
    rev_by_cat = defaultdict(float)
    for t in transactions:
        ts, item_id, action, qty, price = t
        if action == "SELL":
            category = inventory[item_id]["category"]
            rev_by_cat[category] += qty * price

    # Total inventory value
    total_value = sum(item["quantity"] * item["price"] for item in inventory.values())

    print("\n📊 INVENTORY SUMMARY")
    print(f"Total items:    {len(inventory)}")
    print(f"Categories:     {', '.join(categories)}")
    print(f"Total value:    ${total_value:,.2f}")
    print(f"Transactions:   {len(transactions)}")
    print(f"Low stock:      {len(low_stock_items)} items")

    print("\n💰 Revenue by Category:")
    for cat, rev in sorted(rev_by_cat.items(), key=lambda x: -x[1]):
        print(f"  {cat:<15} ${rev:,.2f}")

    if low_stock_items:
        print(f"\n⚠️  LOW STOCK ALERTS:")
        for item_id in low_stock_items:
            item = inventory[item_id]
            print(f"  {item['name']} — only {item['quantity']} left!")

# Demo
add_item("L001", "Laptop",   50,  999.99, "Electronics")
add_item("M001", "Mouse",    200, 29.99,  "Accessories")
add_item("K001", "Keyboard", 100, 79.99,  "Accessories")
add_item("M002", "Monitor",  30,  499.99, "Electronics")

sell("L001", 45)      # Sells most laptops → low stock
sell("M001", 190)     # Sells most mice → low stock
sell("K001", 50)

restock("L001", 100)

get_summary()

11. Edge Cases and Errors

Common Mistakes

Python
# ❌ MISTAKE 1: Empty set vs empty dict
empty_set  = set()   # ✅ Correct
empty_set2 = {}      # ❌ This is a DICT, not a set!
print(type(empty_set))   # <class 'set'>
print(type(empty_set2))  # <class 'dict'>

# ❌ MISTAKE 2: Single-element tuple
not_tuple = (42)     # ❌ Just an int
real_tuple = (42,)   # ✅ Need the trailing comma!
print(type(not_tuple))   # <class 'int'>
print(type(real_tuple))  # <class 'tuple'>

# ❌ MISTAKE 3: Modifying list while iterating
numbers = [1, 2, 3, 4, 5, 6]
for n in numbers:
    if n % 2 == 0:
        numbers.remove(n)   # Skips elements!
print(numbers)   # [1, 3, 5] — looks right but can be wrong!

# ✅ FIX: Filter instead
numbers = [n for n in numbers if n % 2 != 0]
# OR iterate a copy
for n in numbers[:]:
    if n % 2 == 0:
        numbers.remove(n)

# ❌ MISTAKE 4: Mutable default argument
def add_to(item, lst=[]):    # WRONG — list persists between calls!
    lst.append(item)
    return lst

print(add_to(1))   # [1]
print(add_to(2))   # [1, 2] — BUG! Expected [2]

# ✅ FIX:
def add_to(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst

# ❌ MISTAKE 5: dict.fromkeys() with mutable default
d = dict.fromkeys(["a", "b", "c"], [])
d["a"].append(1)
print(d)   # {'a': [1], 'b': [1], 'c': [1]} — ALL lists are the SAME object!

# ✅ FIX: Use dict comprehension
d = {k: [] for k in ["a", "b", "c"]}
d["a"].append(1)
print(d)   # {'a': [1], 'b': [], 'c': []} ✅

# ❌ MISTAKE 6: Forgetting that set is unordered
s = {3, 1, 4, 1, 5}
# s[0]   # ❌ TypeError: 'set' object is not subscriptable
# To access by index, convert to sorted list first
s_list = sorted(s)   # [1, 3, 4, 5]
print(s_list[0])     # 1 ✅

# ❌ MISTAKE 7: KeyError on dict access
config = {"host": "localhost"}
# port = config["port"]          # ❌ KeyError: 'port'
port = config.get("port", 5432)  # ✅ Returns default 5432

# ❌ MISTAKE 8: Assuming dict order in old Python
# In Python < 3.7, dicts were UNORDERED
# In Python 3.7+, dicts maintain INSERTION ORDER
# Don't rely on dict order if supporting old Python!

# ❌ MISTAKE 9: Shallow copy of nested structures
original = {"data": [1, 2, 3]}
copy1 = original.copy()
copy1["data"].append(999)
print(original)   # {'data': [1, 2, 3, 999]} — modified!

import copy
original = {"data": [1, 2, 3]}
copy2 = copy.deepcopy(original)
copy2["data"].append(999)
print(original)   # {'data': [1, 2, 3]} ✅

# ❌ MISTAKE 10: Using list when set is better for membership
banned = ["user1", "user2", "user3", "user4", "user5"]   # List
if "user3" in banned:   # O(n) — scans entire list
    print("Banned")

# ✅ Use set for O(1) membership testing
banned = {"user1", "user2", "user3", "user4", "user5"}   # Set
if "user3" in banned:   # O(1) — instant hash lookup
    print("Banned")

Error Handling

Python
# Handle common errors gracefully
data = [1, 2, 3]

# IndexError — list
try:
    item = data[10]
except IndexError as e:
    print(f"Index out of range: {e}")
    item = None

# KeyError — dict
config = {"host": "localhost"}
try:
    port = config["port"]
except KeyError:
    port = 5432  # Use default

# TypeError — unhashable type
try:
    s = {[1, 2, 3]}   # List as set element
except TypeError as e:
    print(f"Can't add to set: {e}")

# ValueError — tuple unpacking
try:
    a, b = (1, 2, 3)   # Too many values
except ValueError as e:
    print(f"Unpacking error: {e}")

# AttributeError — wrong method
try:
    t = (1, 2, 3)
    t.append(4)
except AttributeError as e:
    print(f"Tuples don't support: {e}")

12. Pro Developer Insights

Best Practices

Python
# 1. Choose the right structure from the start
# Bad: Using list when uniqueness matters
user_ids_bad = [101, 102, 101, 103]   # Duplicate 101!
# Good: Use set
user_ids_good = {101, 102, 103}

# 2. Use dict.get() — never bare dict[key] for optional keys
settings = {"theme": "dark"}
# Bad
try:
    font = settings["font"]
except KeyError:
    font = "Arial"
# Good
font = settings.get("font", "Arial")

# 3. Use tuple for multiple return values
def parse_coords(string):
    # Good — named tuple makes it self-documenting
    from collections import namedtuple
    Coords = namedtuple("Coords", ["lat", "lng"])
    lat, lng = map(float, string.split(","))
    return Coords(lat, lng)

coords = parse_coords("28.61, 77.21")
print(coords.lat)   # 28.61 — much clearer than coords[0]

# 4. Prefer set operations over nested loops
# Bad: O(n*m) — nested loop
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
common_bad = [x for x in list1 if x in list2]

# Good: O(n+m) — set intersection
common_good = list(set(list1) & set(list2))

# 5. Use defaultdict to avoid KeyError in accumulation patterns
from collections import defaultdict

# Bad
word_groups = {}
for word in ["apple", "ant", "ball", "bear", "cat"]:
    first = word[0]
    if first not in word_groups:
        word_groups[first] = []
    word_groups[first].append(word)

# Good
word_groups = defaultdict(list)
for word in ["apple", "ant", "ball", "bear", "cat"]:
    word_groups[word[0]].append(word)

# 6. Unpack tuples for readability
# Bad
def process(record):
    print(record[0], record[1])

# Good
def process(record):
    name, score = record
    print(name, score)

# 7. Use frozenset when you need a set as a dict key
# Cache results indexed by a set of inputs
cache = {}
def process_features(features):
    key = frozenset(features)  # Hashable set!
    if key not in cache:
        cache[key] = sum(features)  # Expensive computation
    return cache[key]

print(process_features([1, 2, 3]))
print(process_features([3, 1, 2]))  # Same result — order doesn't matter

13. Performance Comparison

Benchmark: Membership Testing

Python
import timeit

# Build structures
data_list  = list(range(100_000))
data_tuple = tuple(range(100_000))
data_set   = set(range(100_000))
data_dict  = {i: i for i in range(100_000)}

n = 10_000

# Membership: x in structure
list_time  = timeit.timeit(lambda: 99_999 in data_list,  number=n)
tuple_time = timeit.timeit(lambda: 99_999 in data_tuple, number=n)
set_time   = timeit.timeit(lambda: 99_999 in data_set,   number=n)
dict_time  = timeit.timeit(lambda: 99_999 in data_dict,  number=n)

print(f"List:   {list_time:.4f}s")    # ~0.2s   — O(n)
print(f"Tuple:  {tuple_time:.4f}s")   # ~0.2s   — O(n)
print(f"Set:    {set_time:.6f}s")     # ~0.00001s — O(1)
print(f"Dict:   {dict_time:.6f}s")    # ~0.00001s — O(1)
# Set and Dict are ~10,000x faster for membership!

Complexity Reference

Operation

List

Tuple

Set

Dict

Access by index

O(1)

O(1)

N/A

N/A

Access by key

N/A

N/A

N/A

O(1) avg

Search (in)

O(n)

O(n)

O(1) avg

O(1) avg

Append/Add

O(1)

N/A

O(1) avg

O(1) avg

Insert at index

O(n)

N/A

N/A

N/A

Delete by index

O(n)

N/A

N/A

N/A

Delete by key

N/A

N/A

O(1) avg

O(1) avg

Iteration

O(n)

O(n)

O(n)

O(n)

Sort

O(n log n)

N/A

N/A

N/A

Memory (relative)

Medium

Low

Medium

High

When to Use Which

Situation

Best Choice

Why

Ordered sequence of items

List

Mutable, indexed

Fixed coordinates / DB row

Tuple

Immutable, hashable

Unique tags / IDs

Set

Auto-deduplication

Config / JSON-like data

Dict

Named access

Fast membership testing

Set or Dict

O(1) lookup

Function multiple returns

Tuple

Unpacking

Counting occurrences

Counter (Dict)

Specialized

Group items by category

defaultdict(list)

Auto-init

Immutable dict key

Tuple or frozenset

Hashable

LRU Cache

Dict

O(1) access


14. Data Science Perspective

Working with DataFrames (Pandas)

Python
import pandas as pd
from collections import Counter

# Dict → DataFrame (most common pattern in Data Science)
data = {
    "name":    ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "age":     [25, 30, 35, 28, 32],
    "salary":  [70000, 85000, 95000, 78000, 90000],
    "dept":    ["Engineering", "Marketing", "Engineering", "HR", "Engineering"],
    "skills":  [
        ["Python", "SQL", "ML"],
        ["Excel", "PowerPoint", "SQL"],
        ["Python", "Docker", "K8s"],
        ["Excel", "Communication"],
        ["Python", "TensorFlow", "SQL"]
    ]
}

df = pd.DataFrame(data)

# Set operations on DataFrame columns
engineering = set(df[df["dept"] == "Engineering"]["name"])
high_earners = set(df[df["salary"] > 80000]["name"])
print(engineering & high_earners)   # Engineers who earn > 80k

# Find all unique skills
all_skills = set()
for skill_list in df["skills"]:
    all_skills.update(skill_list)
print(f"All skills: {all_skills}")

# Count skill frequency
skill_counter = Counter()
for skill_list in df["skills"]:
    skill_counter.update(skill_list)
print(skill_counter.most_common(3))   # Top 3 skills

# Dict comprehension for salary lookup
salary_lookup = dict(zip(df["name"], df["salary"]))
print(salary_lookup["Alice"])   # 70000

# Group by using defaultdict
from collections import defaultdict
dept_salaries = defaultdict(list)
for _, row in df.iterrows():
    dept_salaries[row["dept"]].append(row["salary"])

# Average salary per department
dept_avg = {dept: sum(salaries)/len(salaries)
            for dept, salaries in dept_salaries.items()}
print(dept_avg)

Data Cleaning Patterns

Python
# Typical data cleaning using dict, set, list, tuple
raw_records = [
    ("Alice",  "25", "alice@email.com",  "Engineering"),
    ("Bob",    "abc", "bob@email.com",   "Marketing"),    # Invalid age
    ("Charlie","30", None,               "Engineering"),  # Missing email
    ("Alice",  "25", "alice@email.com",  "Engineering"),  # Duplicate
    ("Diana",  "28", "diana@email.com",  "HR"),
]

def clean_records(records):
    seen = set()          # Track duplicates
    cleaned = []          # Clean results
    errors = []           # Error log

    for record in records:
        name, age_str, email, dept = record

        # Deduplication
        fingerprint = (name, age_str, email or "")
        if fingerprint in seen:
            errors.append({"type": "duplicate", "record": record})
            continue
        seen.add(fingerprint)

        # Validation
        try:
            age = int(age_str)
        except (ValueError, TypeError):
            errors.append({"type": "invalid_age", "record": record})
            continue

        if email is None:
            errors.append({"type": "missing_email", "record": record})
            continue

        # Build clean record (tuple — immutable, memory-efficient)
        cleaned.append({
            "name":  name.strip().title(),
            "age":   age,
            "email": email.lower().strip(),
            "dept":  dept
        })

    return cleaned, errors

clean, errors = clean_records(raw_records)
print(f"Clean records: {len(clean)}")
print(f"Errors: {len(errors)}")
for err in errors:
    print(f"  [{err['type']}] {err['record']}")

15. Interview Questions

Basic Level

Q1: What is the difference between a List and a Tuple?

Both are ordered sequences, but a list is mutable (can be changed) while a tuple is immutable (cannot be changed). Tuples are faster, use less memory, and can be used as dictionary keys.

Q2: How do you remove duplicates from a list?

Python
lst = [1, 2, 2, 3, 3, 4]

# Method 1: Set (doesn't preserve order)
unique = list(set(lst))

# Method 2: Preserve order (Python 3.7+)
unique = list(dict.fromkeys(lst))

# Method 3: Preserve order (any Python 3)
seen = set()
unique = [x for x in lst if not (x in seen or seen.add(x))]

Q3: What is the difference between remove(), pop(), and del in a list?

remove(x) removes the first occurrence of value x. pop(i) removes and returns the item at index i (defaults to last). del lst[i] deletes by index but returns nothing.

Q4: How do you create an empty set?

Python
s = set()    # ✅ Correct
s = {}       # ❌ This creates an empty DICT!

Q5: What are dict keys required to be?

Dict keys must be hashable — immutable types like str, int, float, bool, tuple. Lists, dicts, and sets cannot be keys because they are mutable.


Intermediate Level

Q6: What is the difference between dict.get() and dict[key]?

dict[key] raises a KeyError if the key doesn't exist. dict.get(key, default) returns the default value (or None) if the key is missing — no exception.

Q7: How do you merge two dictionaries in Python 3.9+?

Python
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "d": 4}
merged = d1 | d2          # New dict
d1 |= d2                  # In-place merge
merged = {**d1, **d2}     # Python 3.5+ unpacking

Q8: What is the difference between set.remove() and set.discard()?

remove(x) raises a KeyError if x is not in the set. discard(x) silently does nothing if x is missing. Use discard() when absence is not an error.

Q9: When would you use namedtuple over a regular tuple?

When you want named access instead of index-based access. namedtuple makes code more readable (point.x vs point[0]) while keeping tuple immutability and performance.

Q10: How is a set implemented internally in Python?

A set is implemented as a hash table (similar to dict but only stores keys). This gives O(1) average-case lookup, insert, and delete. Elements must be hashable.


Advanced Level

Q11: What is the time complexity difference between searching in a list vs a set?

List: O(n) — must scan every element. Set: O(1) average — uses hash table lookup. For 1 million elements, set lookup is ~1,000,000x faster than list.

Q12: What is a frozenset and when would you use it?

A frozenset is an immutable set. Since it's hashable, it can be used as a dictionary key or as a set element. Useful when you need set operations but the set itself shouldn't change.

Python
# Use case: Cache keyed by a set of features
cache = {}
def get_result(features):
    key = frozenset(features)   # Hashable!
    if key not in cache:
        cache[key] = expensive_compute(features)
    return cache[key]

Q13: Explain the memory layout difference between list and tuple.

Both store references to objects. But tuple is allocated as a fixed block of memory (no overhead for resizing), while list maintains extra capacity for future appends (over-allocates). Tuple is ~15-20% smaller than an equivalent list.

Q14: How does Python's dict maintain insertion order?

Since Python 3.7, dictionaries are implemented as compact hash tables that store keys in insertion order by maintaining a separate array of indices. The hash table maps hash values to this index array, preserving insertion order while keeping O(1) lookup.


Scenario-Based

Q15: You have a list of 10 million user IDs. You need to check if a given ID is blocked. Which structure do you use?

Python
# Use SET — O(1) lookup instead of O(n) for list
blocked_ids = set(load_blocked_ids_from_db())  # Convert once

def is_blocked(user_id):
    return user_id in blocked_ids   # O(1) — instant!

Q16: How would you group a list of words by their first letter?

Python
from collections import defaultdict

words = ["apple", "ant", "ball", "bear", "cat", "cobra"]
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'ant'], 'b': ['ball', 'bear'], 'c': ['cat', 'cobra']}

Q17: Find all words that appear in both documents.

Python
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "a fox and a dog are quick and brown"

words1 = set(doc1.split())
words2 = set(doc2.split())
common = words1 & words2
print(common)   # {'fox', 'quick', 'brown', 'dog'}

Q18: How would you implement a simple LRU Cache using a dictionary?

Python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)   # Mark as recently used
        return self.cache[key]

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # Remove least recently used

cache = LRUCache(3)
cache.put(1, "one")
cache.put(2, "two")
cache.put(3, "three")
print(cache.get(1))    # "one" — moves to end
cache.put(4, "four")   # Removes 2 (least recently used)
print(cache.get(2))    # -1 — evicted

16. Conclusion

Summary of All Four

SQL
┌─────────────────────────────────────────────────────────────┐
│                    CHOOSING YOUR DATA STRUCTURE             │
├─────────────┬───────────────────────────────────────────────┤
│ LIST        │ Use when order matters and you need to        │
│ [1,2,3]     │ add/remove/modify elements. The workhorse.   │
├─────────────┼───────────────────────────────────────────────┤
│ TUPLE       │ Use when data shouldn't change. Function      │
│ (1,2,3)     │ returns, DB rows, coordinates, dict keys.    │
├─────────────┼───────────────────────────────────────────────┤
│ SET         │ Use when uniqueness matters or you need       │
│ {1,2,3}     │ fast membership testing / set operations.    │
├─────────────┼───────────────────────────────────────────────┤
│ DICT        │ Use when you need to label data or look up   │
│ {"k":"v"}   │ values by a meaningful key. JSON, configs.   │
└─────────────┴───────────────────────────────────────────────┘

Key Takeaways

  • List — ordered, mutable, allows duplicates, indexed — your default go-to

  • Tuple — ordered, immutable, allows duplicates, indexed — for fixed data

  • Set — unordered, mutable, unique only, no indexing — for fast membership & math ops

  • Dict — ordered (3.7+), mutable, unique keys, keyed access — for labeled data

Final Rules to Live By

  1. Need to add/remove items? → List (or Dict if labeled)

  2. Need uniqueness? → Set

  3. Need instant lookup by key? → Dict or Set

  4. Data shouldn't change? → Tuple or frozenset

  5. Working with key-value data (JSON, configs)? → Dict

  6. Returning multiple values from a function? → Tuple

  7. Removing duplicates from a list? → Convert to Set

  8. Counting occurrences?collections.Counter

  9. Grouping by category?collections.defaultdict(list)

  10. Need set as dict key? → frozenset

"Use the right tool for the right job — and Python gives you four excellent tools."


Happy Coding! 🐍


Written by an experienced Python Developer and Data Scientist. Drop a comment below for questions or feedback!

0 likes

Rate this post

No rating

Tap a star to rate

0 comments

Latest comments

0 comments

No comments yet.

Keep building your data skillset

Explore more SQL, Python, analytics, and engineering tutorials.