Database me table hume:
Rows aur columns ke form me dikhti hai.
Lekin internally data store hone ka tareeka alag hota hai.
Do main storage types hote hain:
1️⃣ Row-Based Storage
2️⃣ Columnar Storage
Row-Based Storage (Traditional / OLTP Friendly)
Isme data row by row store hota hai.
Example table:
| Transaction_ID | Product_ID | Quantity | Price |
|---|---|---|---|
| 1 | 101 | 2 | 500 |
| 2 | 102 | 1 | 300 |
Internally storage aise hota hai:
Block 1 → (1, 101, 2, 500)
Block 2 → (2, 102, 1, 300)
Har row ek unit me store hoti hai.
Kab Useful Hai?
✔ OLTP systems
✔ Transactional processing
✔ Insert / Update heavy workload
Example:
New sale insert karna
→ Ek hi row update hogi
Row-based storage perfect hai.
Problem in Analytics
Agar tum query karo:
SELECT Product_ID, Quantity FROM Sales;
Aur table me 100 columns hain…
Row-based system:
👉 Har row ka 100 columns read karega
👉 Memory me bhejega
👉 Fir unwanted columns ignore karega
Data zyada process hota hai → slow query
Columnar Storage (Modern DW Friendly)
Isme data column by column store hota hai.
Example:
Block 1 → All Transaction_ID
Block 2 → All Product_ID
Block 3 → All Quantity
Block 4 → All Price
Agar tum query karo:
SELECT Product_ID, Quantity FROM Sales;
System sirf:
✔ Product_ID block
✔ Quantity block
Read karega.
Baaki 98 columns touch bhi nahi karega.
🚀 Why Columnar Storage Faster Hai?
1️⃣ Less Data Read
Agar 100 columns me se 5 chahiye
→ Sirf 5 columns read honge
Memory load kam
Processing fast
2️⃣ Better Compression
Columnar storage me:
✔ Ek block me same data type hota hai
✔ Similar values clustered hote hain
Example:
Gender column:
M M M M F F F F
Isko compress karna easy hai.
Less storage
Faster scan
3️⃣ Aggregation Fast
Data Warehouse queries mostly:
SUM()
COUNT()
AVG()
GROUP BY
Columnar storage me:
Pure column contiguous hota hai
Aggregation super fast hoti hai.
Real Example
Imagine:
Table = 100 columns
Rows = 100 million
Tum query karte ho 5 columns.
Row-based:
100 million × 100 columns scan
Column-based:
100 million × 5 columns scan
Huge difference 🚀
OLTP vs OLAP Storage Logic
| Feature | Row-Based | Column-Based |
|---|---|---|
| Best For | OLTP | OLAP |
| Insert Speed | Fast | Moderate |
| Update Speed | Fast | Moderate |
| Read Aggregation | Slower | Very Fast |
| Compression | Low | High |
Kaun Use Karta Hai Columnar Storage?
Modern cloud DW systems use it:
✔ Snowflake
✔ Azure Synapse
✔ Redshift
✔ BigQuery
Ye sab columnar storage use karte hain.
Simple Analogy
Row-based:
Ek file me student ka pura record ek line me likha hai.
Column-based:
Ek file me sirf marks,
ek file me sirf names,
ek file me sirf roll numbers.
Agar tumhe sirf marks chahiye?
Column-based best.