Back to all posts

Indexes in Data Warehouse

Database Me Data Kaise Store Hota Hai? Table ke rows disk par randomly store hote hain. Example: Row Customer_ID Sales 1 8 100 2 5 200 3 4 300 4 8 150 Agar…

Database Me Data Kaise Store Hota Hai?

Table ke rows disk par randomly store hote hain.

Example:

RowCustomer_IDSales
18100
25200
34300
48150

Agar tum query karo:

SQL
SELECT * FROM Sales WHERE Customer_ID = 5;

Toh kya hoga?

Database:

👉 Row 1 check karega
👉 Row 2 check karega
👉 Row 3 check karega
👉 Row 4 check karega

Isko bolte hain:

🔴 Full Table Scan

Large table me ye bahut slow hota hai.


Index Kya Hota Hai?

Index ek alag structure hota hai jo:

✔ Data ko sorted order me store karta hai
✔ Pointer rakhta hai ki actual row kahan hai

Example index on Customer_ID:

Customer_IDPointer (Row)
4Row 3
5Row 2
8Row 1
8Row 4

Ab agar Customer_ID = 5 search karna hai:

Database directly pointer se Row 2 par jump karega.

No full scan required.

Result:

⚡ Fast data retrieval

Downsides of Indexes

Indexes free nahi hote.

1️⃣ Storage space extra lagta hai
2️⃣ INSERT / UPDATE slow ho jata hai
3️⃣ Too many indexes → maintenance heavy

Isliye blindly indexes nahi lagane chahiye.


Types of Indexes (Important for Data Warehouse)

Main 2 important types explain kar raha hoon:

  • B-Tree Index
  • Bitmap Index

B-Tree Index – Standard Index

Ye database ka default index hota hai.

Agar tum sirf likho:

C++
CREATE INDEX idx_customer ON sales(customer_id);

Toh mostly B-Tree index create hota hai.


B-Tree Kaise Kaam Karta Hai?

Ye multi-level tree structure hota hai.

Simple samjho:

Data alphabetical ya sorted order me arrange hota hai.

Example:

A
→ AD
→ Adam (Row 20)

Binary search jaisa kaam karta hai.

Full table scan nahi karta.
Direct correct branch me jump karta hai.


Kab Use Kare?

✔ High cardinality column
(Zyada distinct values)

Example:

  • Customer_ID
  • Order_ID
  • Transaction_ID
  • Surrogate Keys

Ye almost unique hote hain.


Data Warehouse Me Kahan Useful?

Fact table me:

  • Primary Key
  • Foreign Keys
  • Date Key

Join fast karne ke liye B-Tree best hai.


⚠ Downsides

  • Extra storage lagta hai
  • INSERT / UPDATE slow ho jata hai
  • Har column par blindly use nahi karna chahiye

Bitmap Index – Data Warehouse Special

Bitmap index specially data warehouse ke liye powerful hota hai.


Kab Use Kare?

✔ Large table
✔ Low cardinality column (Kam distinct values)

Example:

  • Gender (M/F)
  • Payment_Type (Visa/MasterCard)
  • Status (Active/Inactive)
  • Region (North/South/East/West)

Bitmap Kaise Kaam Karta Hai?

Example:

RowPayment_Type
1Visa
2Visa
3Visa
4MasterCard

Bitmap create hoga:

Visa → 1 1 1 0
MasterCard → 0 0 0 1

Agar query karein:

SQL
SELECT * FROM sales WHERE payment_type = 'Visa';

System directly bitmap check karega.

Row 1,2,3 → fetch
Row 4 → ignore

No full scan.


Bitmap Ka Biggest Advantage

Data bits me store hota hai.

Storage efficient
Filtering extremely fast
Multiple bitmap combine kar sakte ho:

Example:

Gender = 'F' AND Region = 'North'

Bitwise AND operation hota hai → super fast.


Bitmap Ka Downside

  • Frequent update ke liye suitable nahi
  • OLTP systems ke liye achha nahi
  • Data warehouse me better fit hota hai

B-Tree vs Bitmap – Clear Comparison

FeatureB-TreeBitmap
Default indexYesNo
Best forHigh cardinalityLow cardinality
Good for OLTPYesNo
Good for DWYesYes (better for filters)
Update friendlyYesNot much
Filtering speedGoodExcellent

Practical Data Warehouse Guidelines

Fact Table:

✔ Foreign keys par B-Tree
✔ Date key par B-Tree
✔ Surrogate key par B-Tree

Dimension Table:

✔ Primary key par B-Tree
✔ Low cardinality columns par Bitmap (if supported)

Avoid:

❌ High update column par bitmap
❌ Har column par index
❌ Small table par unnecessary index


Simple Rule to Remember

High distinct values → B-Tree
Low distinct values → Bitmap

Data Warehouse me:

Most queries filter karte hain
Isliye bitmap powerful hota hai.


Keep building your data skillset

Explore more SQL, Python, analytics, and engineering tutorials.