Back to all posts

Row-Based vs Column-Based (Columnar) Storage

Database me table hume: Rows aur columns ke form me dikhti hai. Lekin internally data store hone ka tareeka alag hota hai. Do main storage types hote hain:…

Database me table hume:

Rows aur columns ke form me dikhti hai.

Lekin internally data store hone ka tareeka alag hota hai.

Do main storage types hote hain:

1️⃣ Row-Based Storage
2️⃣ Columnar Storage


Row-Based Storage (Traditional / OLTP Friendly)

Isme data row by row store hota hai.

Example table:

Transaction_IDProduct_IDQuantityPrice
11012500
21021300

Internally storage aise hota hai:

Block 1 → (1, 101, 2, 500)
Block 2 → (2, 102, 1, 300)

Har row ek unit me store hoti hai.


Kab Useful Hai?

✔ OLTP systems
✔ Transactional processing
✔ Insert / Update heavy workload

Example:

New sale insert karna
→ Ek hi row update hogi

Row-based storage perfect hai.


Problem in Analytics

Agar tum query karo:

CSS
SELECT Product_ID, Quantity FROM Sales;

Aur table me 100 columns hain…

Row-based system:

👉 Har row ka 100 columns read karega
👉 Memory me bhejega
👉 Fir unwanted columns ignore karega

Data zyada process hota hai → slow query


Columnar Storage (Modern DW Friendly)

Isme data column by column store hota hai.

Example:

Block 1 → All Transaction_ID
Block 2 → All Product_ID
Block 3 → All Quantity
Block 4 → All Price

Agar tum query karo:

CSS
SELECT Product_ID, Quantity FROM Sales;

System sirf:

✔ Product_ID block
✔ Quantity block

Read karega.

Baaki 98 columns touch bhi nahi karega.


🚀 Why Columnar Storage Faster Hai?

1️⃣ Less Data Read

Agar 100 columns me se 5 chahiye
→ Sirf 5 columns read honge

Memory load kam
Processing fast


2️⃣ Better Compression

Columnar storage me:

✔ Ek block me same data type hota hai
✔ Similar values clustered hote hain

Example:

Gender column:

M M M M F F F F

Isko compress karna easy hai.

Less storage
Faster scan


3️⃣ Aggregation Fast

Data Warehouse queries mostly:

SUM()
COUNT()
AVG()
GROUP BY

Columnar storage me:

Pure column contiguous hota hai
Aggregation super fast hoti hai.


Real Example

Imagine:

Table = 100 columns
Rows = 100 million

Tum query karte ho 5 columns.

Row-based:

100 million × 100 columns scan

Column-based:

100 million × 5 columns scan

Huge difference 🚀


OLTP vs OLAP Storage Logic

FeatureRow-BasedColumn-Based
Best ForOLTPOLAP
Insert SpeedFastModerate
Update SpeedFastModerate
Read AggregationSlowerVery Fast
CompressionLowHigh

Kaun Use Karta Hai Columnar Storage?

Modern cloud DW systems use it:

✔ Snowflake
✔ Azure Synapse
✔ Redshift
✔ BigQuery

Ye sab columnar storage use karte hain.


Simple Analogy

Row-based:

Ek file me student ka pura record ek line me likha hai.

Column-based:

Ek file me sirf marks,
ek file me sirf names,
ek file me sirf roll numbers.

Agar tumhe sirf marks chahiye?
Column-based best.

Keep building your data skillset

Explore more SQL, Python, analytics, and engineering tutorials.