Introduction
What is JSON?
JSON stands for JavaScript Object Notation. It is a lightweight data format used to store and exchange data between systems. Even though it has "JavaScript" in its name, JSON is completely language-independent. It works perfectly with Python, Java, Go, Node.js, and almost every other language.
JSON looks like a Python dictionary. It stores data as key-value pairs and is very easy to read for both humans and machines.
Here is a simple JSON example:
{
"name": "Alice",
"age": 30,
"is_active": true,
"skills": ["Python", "SQL", "Machine Learning"]
}
Why JSON is Important
JSON is the backbone of modern software. Most web APIs, configuration systems, and data pipelines rely on JSON in some form. As a Python developer or data scientist, you will deal with JSON almost daily.
Here are some reasons JSON is critical:
It is the most common format for REST APIs
It is human-readable and easy to debug
It is supported natively in Python without third-party libraries
It works across all platforms and languages
It is lightweight and fast to transfer over a network
Real-World Usage
JSON is used everywhere in software:
REST APIs return responses in JSON format
Configuration files (like package.json, settings.json) are written in JSON
NoSQL databases like MongoDB store data as JSON-like documents
Log files and event data are often in JSON format
Machine learning pipelines pass metadata in JSON
Webhooks send payloads as JSON
CI/CD systems like GitHub Actions use JSON and YAML for configuration
Basic Concepts
JSON Data Types
JSON supports these data types. Each one maps to a Python type:
| JSON Type | Python Type |
|---|---|
| string | str |
| number (int) | int |
| number (float) | float |
| boolean (true) | True (bool) |
| boolean (false) | False (bool) |
| null | None |
| array | list |
| object | dict |
These mappings are important. When Python converts JSON to Python objects and back, it follows these rules automatically.
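The mappings can be verified with a quick round trip using only the standard library:

```python
import json

# Parse a JSON document covering every JSON type
doc = '{"s": "hi", "i": 1, "f": 1.5, "t": true, "n": null, "a": [1, 2], "o": {"k": "v"}}'
data = json.loads(doc)

# Each JSON value arrives as the expected Python type
assert isinstance(data["s"], str)
assert isinstance(data["i"], int)
assert isinstance(data["f"], float)
assert data["t"] is True
assert data["n"] is None
assert isinstance(data["a"], list)
assert isinstance(data["o"], dict)
```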
JSON Syntax Rules
Keys must always be strings and must be wrapped in double quotes
Strings must use double quotes, not single quotes
No trailing commas after the last item
No comments are allowed inside JSON
Boolean values are lowercase: true, false (not True, False)
Null is lowercase: null (not None)
Valid JSON example:
{
"user_id": 101,
"username": "john_doe",
"is_admin": false,
"score": 98.5,
"tags": ["python", "developer"],
"address": null
}
Python's Built-in JSON Module
Python provides a built-in module called json. You do not need to install anything. Just import it:
import json
The json module provides four main functions you will use constantly:
json.loads() — Convert a JSON string to a Python object
json.dumps() — Convert a Python object to a JSON string
json.load() — Read JSON from a file
json.dump() — Write JSON to a file
Think of the "s" at the end as "string". loads and dumps deal with strings. load and dump deal with files.
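All four can be seen in one quick round trip (a minimal sketch; the file path is a temporary throwaway):

```python
import json
import os
import tempfile

data = {"name": "Alice", "age": 30}

# String round trip: dumps -> loads
text = json.dumps(data)
assert json.loads(text) == data

# File round trip: dump -> load
path = os.path.join(tempfile.mkdtemp(), "demo.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f)
with open(path, "r", encoding="utf-8") as f:
    assert json.load(f) == data
```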
All Functions and Features
1. json.loads() — Parse JSON String to Python
This function takes a JSON-formatted string and converts it into a Python object.
import json
json_string = '{"name": "Alice", "age": 30, "active": true}'
data = json.loads(json_string)
print(data) # {'name': 'Alice', 'age': 30, 'active': True}
print(type(data)) # <class 'dict'>
print(data["name"]) # Alice
Line-by-line explanation:
Line 1: Import the json module
Line 2: A raw JSON string (must use double quotes inside)
Line 3: json.loads() parses the string and returns a Python dict
Lines 4–6: We can now access the data like a normal dictionary
Real-world use: You receive an API response as a string. You use json.loads() to convert it into a Python dictionary so you can work with the data.
2. json.dumps() — Convert Python Object to JSON String
This function converts a Python object (dict, list, etc.) into a JSON-formatted string.
import json
data = {
"name": "Bob",
"age": 25,
"active": False,
"score": None
}
json_string = json.dumps(data)
print(json_string)
# Output: {"name": "Bob", "age": 25, "active": false, "score": null}
Notice how Python's False became false and None became null. This is the automatic type conversion.
Using indent for Pretty Printing:
json_string = json.dumps(data, indent=4)
print(json_string)
Output:
{
"name": "Bob",
"age": 25,
"active": false,
"score": null
}
Using sort_keys:
json_string = json.dumps(data, indent=4, sort_keys=True)
This sorts all the keys alphabetically. Useful when you need consistent output.
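Because equal dictionaries then always serialize to the same string, sort_keys=True is handy for diffing or hashing serialized data. A small sketch:

```python
import json

a = {"b": 1, "a": 2}
b = {"a": 2, "b": 1}

# Without sort_keys, output follows insertion order, so equal dicts can differ
assert json.dumps(a) != json.dumps(b)

# With sort_keys=True, equal dicts always serialize identically
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True) == '{"a": 2, "b": 1}'
```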
Using separators (compact output):
json_string = json.dumps(data, separators=(",", ":"))
# {"name":"Bob","age":25,"active":false,"score":null}
Using separators=(",", ":") removes all extra whitespace. This gives you the smallest possible JSON string — useful for network transfer where size matters.
3. json.load() — Read JSON from a File
This function reads a JSON file and converts it directly into a Python object.
import json
with open("config.json", "r") as file:
data = json.load(file)
print(data)
Line-by-line explanation:
Line 2: Open the file in read mode using a context manager (with statement)
Line 3: json.load() reads the file and parses the JSON automatically
Line 4: data is now a Python dictionary or list depending on the file
The context manager ensures the file is closed properly after reading, even if an error occurs.
Real-world use: Your application has a settings.json or config.json file. At startup, you read it with json.load() to load all the configuration values.
4. json.dump() — Write Python Object to a File
This function converts a Python object to JSON and writes it directly to a file.
import json
data = {
"database": "postgres",
"host": "localhost",
"port": 5432
}
with open("config.json", "w") as file:
json.dump(data, file, indent=4)
Line-by-line explanation:
Lines 2–6: A Python dictionary with config data
Line 7: Open the file in write mode
Line 8: json.dump() converts data to JSON and writes it to the file; indent=4 makes it human-readable in the file
Real-world use: Your application generates a report or saves user preferences. You use json.dump() to write that data to a file so it persists.
5. json.JSONDecodeError — Handling Parsing Errors
When you try to parse invalid JSON, Python raises json.JSONDecodeError.
import json
bad_json = "{'name': 'Alice'}" # Wrong: single quotes
try:
data = json.loads(bad_json)
except json.JSONDecodeError as e:
print(f"Error: {e}")
print(f"Line: {e.lineno}, Column: {e.colno}")
Output:
Error: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Line: 1, Column: 2
Always wrap json.loads() in a try-except block when parsing data from external sources.
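That pattern can be wrapped in a small helper that returns a fallback instead of raising (a sketch; the name safe_loads is hypothetical):

```python
import json

def safe_loads(text, default=None):
    """Parse JSON, returning `default` instead of raising on bad input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return default

assert safe_loads('{"ok": true}') == {"ok": True}
assert safe_loads("{'bad': 'quotes'}", default={}) == {}
```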
6. Custom Serialization with default Parameter
By default, Python's json module cannot serialize objects like datetime, Decimal, or instances of custom classes. The default parameter lets you handle this.
import json
from datetime import datetime
data = {
"username": "alice",
"created_at": datetime(2024, 5, 15, 10, 30)
}
def custom_serializer(obj):
if isinstance(obj, datetime):
return obj.isoformat()
raise TypeError(f"Object of type {type(obj)} is not JSON serializable")
json_string = json.dumps(data, default=custom_serializer, indent=4)
print(json_string)
Output:
{
"username": "alice",
"created_at": "2024-05-15T10:30:00"
}
7. Custom Deserialization with object_hook
The object_hook parameter lets you customize how JSON objects are converted when parsing.
import json
from datetime import datetime
def date_parser(obj):
if "created_at" in obj:
obj["created_at"] = datetime.fromisoformat(obj["created_at"])
return obj
json_string = '{"username": "alice", "created_at": "2024-05-15T10:30:00"}'
data = json.loads(json_string, object_hook=date_parser)
print(data["created_at"]) # 2024-05-15 10:30:00
print(type(data["created_at"])) # <class 'datetime.datetime'>
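Alongside object_hook, json.loads() also accepts parse_float and parse_int hooks, which control how bare numbers are decoded — useful for exact decimal arithmetic:

```python
import json
from decimal import Decimal

# Decode every JSON float as Decimal instead of a binary float
data = json.loads('{"price": 19.99, "qty": 3}', parse_float=Decimal)

assert data["price"] == Decimal("19.99")   # exact, no binary rounding
assert isinstance(data["qty"], int)        # integers are untouched
```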
8. json.JSONEncoder Class — Full Custom Encoder
For more control, you can subclass json.JSONEncoder.
import json
from decimal import Decimal
class CustomEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, Decimal):
return float(obj)
if isinstance(obj, set):
return list(obj)
return super().default(obj)
data = {
"price": Decimal("19.99"),
"tags": {"python", "json", "tutorial"}
}
json_string = json.dumps(data, cls=CustomEncoder, indent=4)
print(json_string)
Output:
{
"price": 19.99,
"tags": ["python", "json", "tutorial"]
}
9. json.JSONDecoder Class — Full Custom Decoder
import json
class UppercaseDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(object_hook=self.parse_object, *args, **kwargs)

    def parse_object(self, obj):
        # Uppercase every string value in each decoded object
        return {k: v.upper() if isinstance(v, str) else v
                for k, v in obj.items()}

json_string = '{"name": "alice", "city": "london"}'
data = json.loads(json_string, cls=UppercaseDecoder)
print(data)  # {'name': 'ALICE', 'city': 'LONDON'}
Intermediate Usage
Reading and Writing Nested JSON
import json
user_json = '''
{
"user": {
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"zip": "10001"
},
"orders": [
{"order_id": 101, "total": 59.99},
{"order_id": 102, "total": 120.00}
]
}
}
'''
data = json.loads(user_json)
# Access nested data
city = data["user"]["address"]["city"]
print(city) # New York
# Loop through orders
for order in data["user"]["orders"]:
print(f"Order {order['order_id']}: ${order['total']}")
Updating JSON Data
import json
# Read
with open("users.json", "r") as f:
data = json.load(f)
# Modify
data["users"].append({"id": 3, "name": "Charlie"})
# Write back
with open("users.json", "w") as f:
json.dump(data, f, indent=4)
Merging Two JSON Objects
import json
json1 = '{"name": "Alice", "age": 30}'
json2 = '{"city": "NYC", "age": 31}'
dict1 = json.loads(json1)
dict2 = json.loads(json2)
merged = {**dict1, **dict2}
print(merged)
# {'name': 'Alice', 'age': 31, 'city': 'NYC'}
When both dicts have the same key (age), the second one wins.
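Note that {**dict1, **dict2} is a shallow merge: a nested dict in dict2 replaces the whole nested dict in dict1 rather than combining with it. A recursive merge is a short sketch (the name deep_merge is hypothetical):

```python
def deep_merge(a: dict, b: dict) -> dict:
    """Merge b into a recursively; values from b win on conflicts."""
    out = dict(a)
    for key, val in b.items():
        if isinstance(out.get(key), dict) and isinstance(val, dict):
            out[key] = deep_merge(out[key], val)
        else:
            out[key] = val
    return out

base = {"db": {"host": "localhost", "port": 5432}}
override = {"db": {"port": 5433}}
assert deep_merge(base, override) == {"db": {"host": "localhost", "port": 5433}}
```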
Filtering JSON Data
import json
json_data = '''
[
{"name": "Alice", "score": 88},
{"name": "Bob", "score": 45},
{"name": "Charlie", "score": 92},
{"name": "Diana", "score": 70}
]
'''
students = json.loads(json_data)
# Filter students with score above 75
top_students = [s for s in students if s["score"] > 75]
print(json.dumps(top_students, indent=2))
Working with JSON Lines (JSONL) Format
JSONL is a format where each line is a separate JSON object — very common in log files and streaming data.
import json
# Reading JSONL
with open("events.jsonl", "r") as f:
events = [json.loads(line) for line in f if line.strip()]
# Writing JSONL
events = [
{"event": "login", "user": "alice"},
{"event": "purchase", "user": "bob", "amount": 50}
]
with open("events.jsonl", "w") as f:
for event in events:
f.write(json.dumps(event) + "\n")
Safely Accessing Nested Keys
Avoid KeyError when accessing nested JSON by using .get() with a default value.
data = {
"user": {
"name": "Alice",
"address": {
"city": "NYC"
}
}
}
# Risky - raises KeyError if "phone" doesn't exist
# phone = data["user"]["phone"]
# Safe - returns a default if the key doesn't exist
phone = data.get("user", {}).get("phone", "Not provided")
print(phone) # Not provided
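Chained .get() calls get verbose for deep paths; a small helper can walk a list of keys instead (the name get_nested is hypothetical):

```python
def get_nested(data, keys, default=None):
    """Follow a sequence of keys into nested dicts, returning default on any miss."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

data = {"user": {"address": {"city": "NYC"}}}
assert get_nested(data, ["user", "address", "city"]) == "NYC"
assert get_nested(data, ["user", "phone"], "Not provided") == "Not provided"
```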
Advanced Concepts
Custom Serialization for Complex Python Objects
Serializing Custom Classes:
import json
class User:
def __init__(self, name, age, email):
self.name = name
self.age = age
self.email = email
class UserEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, User):
return {
"__type__": "User",
"name": obj.name,
"age": obj.age,
"email": obj.email
}
return super().default(obj)
def user_decoder(obj):
if obj.get("__type__") == "User":
return User(obj["name"], obj["age"], obj["email"])
return obj
# Serialize
user = User("Alice", 30, "alice@example.com")
json_str = json.dumps(user, cls=UserEncoder)
print(json_str)
# Deserialize back to User object
restored = json.loads(json_str, object_hook=user_decoder)
print(restored.name) # Alice
print(type(restored)) # <class '__main__.User'>
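For simple record classes, the standard-library dataclasses module offers a lighter route: dataclasses.asdict() produces a plain dict that json.dumps() handles directly, and the parsed dict can be unpacked back into the constructor. A minimal sketch:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:
    name: str
    age: int
    email: str

user = User("Alice", 30, "alice@example.com")

# asdict() -> plain dict -> JSON string
json_str = json.dumps(asdict(user))
assert json.loads(json_str) == {"name": "Alice", "age": 30, "email": "alice@example.com"}

# And back: unpack the parsed dict into the constructor
restored = User(**json.loads(json_str))
assert restored == user
```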
Serializing NumPy and Pandas Objects
import json
import numpy as np
class NumpyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.integer):
return int(obj)
if isinstance(obj, np.floating):
return float(obj)
if isinstance(obj, np.ndarray):
return obj.tolist()
return super().default(obj)
data = {
"scores": np.array([88, 92, 75, 95]),
"mean": np.float64(87.5),
"count": np.int64(4)
}
json_str = json.dumps(data, cls=NumpyEncoder, indent=4)
print(json_str)
For Pandas DataFrame:
import pandas as pd
df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [88, 92]})
# To JSON string
json_str = df.to_json(orient="records", indent=2)
print(json_str)
# From JSON string back to DataFrame (wrap in StringIO for newer pandas)
from io import StringIO
df_restored = pd.read_json(StringIO(json_str), orient="records")
Performance Optimization
1. ujson (UltraJSON) — typically 2–5x faster than the standard json module:
import ujson # pip install ujson
data = {"key": "value", "numbers": list(range(1000))}
json_str = ujson.dumps(data)
parsed = ujson.loads(json_str)
2. orjson — generally the fastest option; returns bytes:
import orjson # pip install orjson
data = {"key": "value"}
json_bytes = orjson.dumps(data) # Returns bytes, not string
parsed = orjson.loads(json_bytes) # Accepts bytes or str
3. Streaming Large JSON with ijson:
import ijson # pip install ijson
with open("huge_data.json", "rb") as f:
for item in ijson.items(f, "records.item"):
# Process one item at a time
process(item)
JSON Schema Validation
import json
import jsonschema # pip install jsonschema
from jsonschema import validate
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer", "minimum": 0},
"email": {"type": "string"}
},
"required": ["name", "age"]
}
valid_data = {"name": "Alice", "age": 30}
invalid_data = {"name": "Bob", "age": -5}
# Valid - no error
validate(instance=valid_data, schema=schema)
print("Valid data passed")
# Invalid - raises ValidationError
try:
validate(instance=invalid_data, schema=schema)
except jsonschema.ValidationError as e:
print(f"Validation Error: {e.message}")
Using Pydantic for JSON Validation
from pydantic import BaseModel
from typing import Optional, List
class Address(BaseModel):
city: str
zip_code: str
class User(BaseModel):
name: str
age: int
email: str
address: Optional[Address] = None
skills: List[str] = []
# Parse JSON into a validated Pydantic model
json_str = '''
{
"name": "Alice",
"age": 30,
"email": "alice@example.com",
"address": {"city": "NYC", "zip_code": "10001"},
"skills": ["Python", "SQL"]
}
'''
user = User.model_validate_json(json_str)
print(user.name) # Alice
print(user.address.city) # NYC
# Convert back to JSON
json_output = user.model_dump_json(indent=4)
print(json_output)
Deep Copying JSON Data
import json
import copy
original = {"name": "Alice", "scores": [88, 92, 75]}
# Shallow copy - nested objects are still shared
shallow = original.copy()
# Deep copy using JSON trick
deep = json.loads(json.dumps(original))
# Best deep copy using copy module
deep2 = copy.deepcopy(original)
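The JSON round-trip trick only works for JSON-serializable data, and it silently changes some Python types along the way — which is why copy.deepcopy() is the safer default:

```python
import json

original = {"point": (1, 2), 3: "three"}
copied = json.loads(json.dumps(original))

# Tuples come back as lists, and non-string keys become strings
assert copied == {"point": [1, 2], "3": "three"}
```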
Real-World Use Cases
Use Case 1 — Consuming a REST API
import requests
response = requests.get("https://api.example.com/users/1")
response.raise_for_status() # Raise error if HTTP error occurred
data = response.json() # requests parses JSON directly
for item in data["results"]:
print(item["name"])
Use Case 2 — Configuration Files
config.json:
{
"database": {
"host": "localhost",
"port": 5432,
"name": "myapp_db"
},
"cache": {
"backend": "redis",
"timeout": 300
},
"debug": false
}
Python code to read config:
import json
import os
def load_config(env="development"):
config_file = f"config.{env}.json"
if not os.path.exists(config_file):
config_file = "config.json"
with open(config_file, "r", encoding="utf-8") as f:
config = json.load(f)
return config
config = load_config()
db_host = config["database"]["host"]
print(f"Connecting to: {db_host}")
Use Case 3 — Data Pipeline (ETL)
import json
from datetime import datetime
def extract(input_file):
with open(input_file, "r") as f:
return json.load(f)
def transform(data):
transformed = []
for record in data:
if record.get("status") == "active":
transformed.append({
"id": record["id"],
"full_name": f"{record['first_name']} {record['last_name']}",
"email": record["email"].lower().strip(),
"processed_at": datetime.now().isoformat()
})
return transformed
def load(data, output_file):
with open(output_file, "w") as f:
json.dump(data, f, indent=4)
print(f"Loaded {len(data)} records to {output_file}")
# Run the pipeline
raw_data = extract("raw_users.json")
clean_data = transform(raw_data)
load(clean_data, "processed_users.json")
Use Case 4 — JSON Structured Logging
import json
import logging
from datetime import datetime, timezone
class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
"level": record.levelname,
"message": record.getMessage(),
"module": record.module,
"function": record.funcName,
"line": record.lineno
}
if record.exc_info:
log_entry["exception"] = self.formatException(record.exc_info)
return json.dumps(log_entry)
logger = logging.getLogger("myapp")
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("User logged in")
logger.error("Database connection failed")
Practical Examples
Practical Example 1 — Student Grade Report System
import json
from statistics import mean, stdev
from datetime import datetime
students_json = '''
[
{"id": 1, "name": "Alice", "scores": [88, 92, 85, 90]},
{"id": 2, "name": "Bob", "scores": [70, 65, 72, 68]},
{"id": 3, "name": "Charlie", "scores": [95, 98, 92, 96]},
{"id": 4, "name": "Diana", "scores": [55, 60, 58, 62]}
]
'''
# Step 1: Parse the JSON
students = json.loads(students_json)
# Step 2: Process each student
report = {
"generated_at": datetime.now().isoformat(),
"total_students": len(students),
"results": []
}
for student in students:
avg = mean(student["scores"])
if avg >= 90:
grade = "A"
elif avg >= 80:
grade = "B"
elif avg >= 70:
grade = "C"
else:
grade = "F"
report["results"].append({
"id": student["id"],
"name": student["name"],
"average": round(avg, 2),
"grade": grade,
"highest": max(student["scores"]),
"lowest": min(student["scores"])
})
# Step 3: Add class summary
all_averages = [r["average"] for r in report["results"]]
report["class_summary"] = {
"class_average": round(mean(all_averages), 2),
"std_deviation": round(stdev(all_averages), 2),
"top_student": max(report["results"], key=lambda x: x["average"])["name"]
}
# Step 4: Save to file
with open("grade_report.json", "w") as f:
json.dump(report, f, indent=4)
# Step 5: Print summary
print(json.dumps(report["class_summary"], indent=4))
Practical Example 2 — API Client with Error Handling
import json
import urllib.request
import urllib.error
from typing import Optional, Dict, Any
class APIClient:
def __init__(self, base_url: str, api_key: Optional[str] = None):
self.base_url = base_url.rstrip("/")
self.api_key = api_key
def get(self, endpoint: str) -> Optional[Dict[str, Any]]:
url = f"{self.base_url}/{endpoint.lstrip('/')}"
headers = {"Content-Type": "application/json"}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
request = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(request, timeout=10) as response:
raw_data = response.read()
return json.loads(raw_data)
except urllib.error.HTTPError as e:
print(f"HTTP Error {e.code}: {e.reason}")
return None
except urllib.error.URLError as e:
print(f"Network Error: {e.reason}")
return None
except json.JSONDecodeError as e:
print(f"Invalid JSON response: {e}")
return None
def post(self, endpoint: str, payload: Dict) -> Optional[Dict]:
url = f"{self.base_url}/{endpoint.lstrip('/')}"
data = json.dumps(payload).encode("utf-8")
headers = {
"Content-Type": "application/json",
"Accept": "application/json"
}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
request = urllib.request.Request(
url, data=data, headers=headers, method="POST"
)
try:
with urllib.request.urlopen(request, timeout=10) as response:
return json.loads(response.read())
except Exception as e:
print(f"Error: {e}")
return None
# Usage
client = APIClient("https://jsonplaceholder.typicode.com")
user = client.get("/users/1")
if user:
print(f"User: {user['name']}, Email: {user['email']}")
Practical Example 3 — JSON-Based Cache System
import json
import os
import time
from typing import Any, Optional
class JSONCache:
def __init__(self, cache_file: str = "cache.json", ttl: int = 3600):
self.cache_file = cache_file
self.ttl = ttl
self._load()
def _load(self):
if os.path.exists(self.cache_file):
with open(self.cache_file, "r") as f:
self.cache = json.load(f)
else:
self.cache = {}
def _save(self):
with open(self.cache_file, "w") as f:
json.dump(self.cache, f, indent=2)
def get(self, key: str) -> Optional[Any]:
entry = self.cache.get(key)
if entry is None:
return None
if time.time() > entry["expires_at"]:
del self.cache[key]
self._save()
return None
return entry["value"]
def set(self, key: str, value: Any) -> None:
self.cache[key] = {
"value": value,
"expires_at": time.time() + self.ttl,
"created_at": time.time()
}
self._save()
def delete(self, key: str) -> bool:
if key in self.cache:
del self.cache[key]
self._save()
return True
return False
def clear_expired(self):
now = time.time()
expired_keys = [k for k, v in self.cache.items()
if v["expires_at"] < now]
for key in expired_keys:
del self.cache[key]
if expired_keys:
self._save()
return len(expired_keys)
# Usage
cache = JSONCache(ttl=300) # 5 minute TTL
cache.set("user:101", {"name": "Alice", "email": "alice@example.com"})
user = cache.get("user:101")
if user:
print(f"Cache hit: {user['name']}")
else:
print("Cache miss - fetch from database")
Edge Cases and Errors
Common Mistake 1 — Single Quotes in JSON
# WRONG - This is NOT valid JSON
bad_json = "{'name': 'Alice'}"
# json.loads(bad_json) # Raises JSONDecodeError
# RIGHT - JSON requires double quotes
good_json = '{"name": "Alice"}'
data = json.loads(good_json)
Common Mistake 2 — Trailing Commas
# WRONG - Trailing comma is invalid in JSON
bad_json = '{"name": "Alice", "age": 30,}'
# RIGHT
good_json = '{"name": "Alice", "age": 30}'
Common Mistake 3 — Serializing Non-Serializable Objects
import json
from datetime import datetime
data = {"time": datetime.now()}
# WRONG - TypeError
# json.dumps(data)
# RIGHT
def custom_serializer(obj):
if isinstance(obj, datetime):
return obj.isoformat()
raise TypeError(f"Type {type(obj)} not serializable")
json.dumps(data, default=custom_serializer)
Common Mistake 4 — Wrong File Encoding
# WRONG - Can fail on Windows
with open("data.json", "r") as f:
data = json.load(f)
# RIGHT - Always specify encoding
with open("data.json", "r", encoding="utf-8") as f:
data = json.load(f)
Common Mistake 5 — Mutating JSON While Iterating
import json
data = json.loads('[{"id": 1}, {"id": 2}, {"id": 3}]')
# WRONG - Do not modify list while iterating
# for item in data:
# if item["id"] == 2:
# data.remove(item)
# RIGHT - Filter to create a new list
data = [item for item in data if item["id"] != 2]
Debugging Tips
1. Use indent=4 with default=str for quick debugging:
print(json.dumps(data, indent=4, default=str))
2. Detailed error location:
try:
data = json.loads(raw_text)
except json.JSONDecodeError as e:
print(f"JSON Error at position {e.pos}: {e.msg}")
print(f"Around: {raw_text[max(0, e.pos-20):e.pos+20]}")
3. Handle BOM (Byte Order Mark) in some files:
with open("data.json", "r", encoding="utf-8-sig") as f:
data = json.load(f)
Pro Developer Insights
1. Always Validate External JSON Input
import json
from jsonschema import validate, ValidationError
def safe_parse_user(json_string: str) -> dict:
try:
data = json.loads(json_string)
except json.JSONDecodeError as e:
raise ValueError(f"Invalid JSON: {e}")
schema = {
"type": "object",
"required": ["name", "email"],
"properties": {
"name": {"type": "string", "minLength": 1},
"email": {"type": "string"}
}
}
try:
validate(data, schema)
except ValidationError as e:
raise ValueError(f"Schema validation failed: {e.message}")
return data
2. Atomic File Writes to Prevent Corruption
import json
import os
import tempfile
def safe_json_write(filepath: str, data: dict) -> None:
dir_name = os.path.dirname(filepath) or "."
with tempfile.NamedTemporaryFile(
"w",
dir=dir_name,
suffix=".tmp",
delete=False,
encoding="utf-8"
) as tmp_file:
json.dump(data, tmp_file, indent=4)
tmp_path = tmp_file.name
os.replace(tmp_path, filepath) # Atomic rename
safe_json_write("config.json", {"version": "2.0"})
3. Never Log Sensitive Data from JSON
import json
import logging
logger = logging.getLogger(__name__)
def process_user(user_data: dict):
# WRONG - logs password in plaintext
# logger.debug(json.dumps(user_data))
# RIGHT - remove sensitive fields before logging
safe_log = {k: v for k, v in user_data.items()
if k not in ("password", "api_key", "token", "secret")}
logger.debug(json.dumps(safe_log))
Comparison with Alternatives
JSON vs YAML vs CSV vs XML vs pickle
| Feature | JSON | YAML | CSV | XML | pickle |
|---|---|---|---|---|---|
| Readability | Good | Excellent | Best (flat) | Poor | None (binary) |
| Comments | No | Yes | No | Yes | No |
| Nesting | Yes | Yes | No | Yes | Yes |
| Speed | Fast | Slower | Fast | Slow | Fast |
| Cross-language | Yes | Yes | Yes | Yes | Python only |
| Security | Safe | Safe | Safe | Safe | NEVER untrusted |
| Best for | APIs, web | Config/DevOps | Reports | Enterprise | Python-internal |
Security Warning: Never use pickle to deserialize data from untrusted sources. It can execute arbitrary code.
Data Science Perspective
JSON in Data Analysis with Pandas
import pandas as pd
import json
from pandas import json_normalize
# Read JSON directly into DataFrame
df = pd.read_json("data.json")
# From a JSON string (wrap in StringIO for newer pandas)
from io import StringIO
json_str = '[{"name": "Alice", "score": 88}, {"name": "Bob", "score": 72}]'
df = pd.read_json(StringIO(json_str))
# Normalize nested JSON into flat DataFrame
nested_json = [
{"name": "Alice", "address": {"city": "NYC", "zip": "10001"}},
{"name": "Bob", "address": {"city": "LA", "zip": "90001"}}
]
flat_df = json_normalize(nested_json)
print(flat_df.columns.tolist())
# ['name', 'address.city', 'address.zip']
# Export DataFrame to JSON
df.to_json("output.json", orient="records", indent=2)
JSON in Machine Learning — Saving Experiment Results
import json
from datetime import datetime
def save_experiment(model_name, params, metrics):
result = {
"experiment_id": f"exp_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
"model_name": model_name,
"parameters": params,
"metrics": metrics,
"timestamp": datetime.now().isoformat()
}
try:
with open("experiments.json", "r") as f:
experiments = json.load(f)
except FileNotFoundError:
experiments = []
experiments.append(result)
with open("experiments.json", "w") as f:
json.dump(experiments, f, indent=4)
save_experiment(
"RandomForest",
{"n_estimators": 100, "max_depth": 5},
{"accuracy": 0.92, "f1_score": 0.91}
)
Interview Questions
Basic Level
Q1. What is the difference between json.loads() and json.load()? json.loads() parses a JSON string. json.load() reads from a file object. The "s" stands for "string".
Q2. What JSON values does json.dumps() produce for Python None, True, and False? None → null, True → true, False → false.
Q3. What error does Python raise when parsing invalid JSON? json.JSONDecodeError, which is a subclass of ValueError.
Q4. How do you make JSON output human-readable? Use the indent parameter: json.dumps(data, indent=4)
Q5. What happens if you try to json.dumps() a datetime object? It raises TypeError: Object of type datetime is not JSON serializable.
Intermediate Level
Q6. How do you serialize a Python class to JSON? Create a custom encoder by subclassing json.JSONEncoder and overriding the default() method. Pass it with cls=YourEncoder to json.dumps().
Q7. What is the object_hook parameter in json.loads()? A function called for every JSON object (dict) parsed. Used for custom deserialization, like converting date strings back to datetime objects.
Q8. How do you read a very large JSON file without running out of memory? Use the ijson library for streaming parsing. It reads and processes the file one item at a time.
Q9. What is JSON Lines (JSONL) and when do you use it? JSONL is a format where each line is a separate valid JSON object. Used for log files and large datasets because you can process line by line without loading the whole file.
Q10. How do you perform an atomic write of a JSON file? Write to a temp file first, then use os.replace() to atomically rename it to the target file.
Advanced Level
Q11. Why should you never use pickle instead of JSON for external data? Unpickling untrusted data can execute arbitrary Python code — a serious security vulnerability. JSON only handles data types, not executable code.
Q12. How does Pydantic improve on the standard json module? Pydantic adds type validation, automatic type coercion, detailed error messages, IDE support via type hints, and makes JSON schemas self-documenting.
Q13. How would you handle circular references when serializing to JSON? The standard json module raises ValueError for circular references. You must detect cycles manually or restructure the data to avoid them.
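The circular-reference check is easy to see in action:

```python
import json

a = {"name": "a"}
a["self"] = a  # the dict now references itself

try:
    json.dumps(a)
    raised = False
except ValueError as e:
    raised = True
    message = str(e)

assert raised
assert "Circular reference" in message
```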
Q14. How do you efficiently serialize NumPy arrays to JSON? Create a custom JSONEncoder that converts numpy.ndarray to list and numpy number types to Python int or float.
Scenario-Based Questions
Q15. You receive JSON from untrusted users. What steps do you take? (1) Wrap json.loads() in try-except. (2) Validate with jsonschema or Pydantic. (3) Sanitize strings. (4) Limit JSON depth. (5) Never pass raw data to a database.
Q16. Your ETL pipeline processes a 50GB JSON file. Server has 8GB RAM. How? Use ijson for streaming parsing. Process records one at a time and write results immediately. Never load the whole file.
Q17. Your application's JSON config file got corrupted during a power failure. How do you prevent this in future? Use the atomic write pattern: write to a temp file, then use os.replace() to rename it atomically.
Conclusion
Summary of Key Learnings
json.loads() and json.dumps() are for strings. json.load() and json.dump() are for files.
Python automatically maps JSON types to Python types: null → None, true → True, false → False.
Custom objects like datetime, Decimal, and numpy arrays need custom encoders.
Always wrap json.loads() in try-except when parsing data from external sources.
For large files, use ijson for streaming. For speed, use orjson or ujson.
Use Pydantic for production APIs where you need validation and type safety.
Always use encoding="utf-8" when reading and writing JSON files.
JSONL format is better than a single large JSON array for streaming and log data.
Never use pickle for data that crosses system boundaries — JSON is the safe choice.
Use atomic writes to prevent JSON file corruption in production.
When to Use JSON in Real Projects
Use JSON when:
Building or consuming REST APIs
Creating configuration files for your application
Storing document-style data in NoSQL databases
Passing structured data between microservices
Saving machine learning model configurations and experiment results
Generating reports and exporting data
Final Practical Advice
Start with the basics: loads, dumps, load, dump. Get comfortable with those four functions first. Then learn how to handle custom objects and edge cases.
In production code, always validate your JSON input. Use Pydantic for APIs. Use ijson for large files. Use orjson if performance matters.
JSON is simple on the surface but there are many edge cases and pitfalls. By understanding this guide fully, you are now equipped to handle JSON in any real-world Python project — from a simple script to a production microservice handling millions of requests.
Happy Coding!