Database Me Data Kaise Store Hota Hai?
Table ke rows disk par randomly store hote hain.
Example:
| Row | Customer_ID | Sales |
|---|---|---|
| 1 | 8 | 100 |
| 2 | 5 | 200 |
| 3 | 4 | 300 |
| 4 | 8 | 150 |
Agar tum query karo:
SELECT * FROM Sales WHERE Customer_ID = 5;
Toh kya hoga?
Database:
👉 Row 1 check karega
👉 Row 2 check karega
👉 Row 3 check karega
👉 Row 4 check karega
Isko bolte hain:
🔴 Full Table Scan
Large table me ye bahut slow hota hai.
Index Kya Hota Hai?
Index ek alag structure hota hai jo:
✔ Data ko sorted order me store karta hai
✔ Pointer rakhta hai ki actual row kahan hai
Example index on Customer_ID:
| Customer_ID | Pointer (Row) |
|---|---|
| 4 | Row 3 |
| 5 | Row 2 |
| 8 | Row 1 |
| 8 | Row 4 |
Ab agar Customer_ID = 5 search karna hai:
Database directly pointer se Row 2 par jump karega.
No full scan required.
Result:
⚡ Fast data retrieval
Downsides of Indexes
Indexes free nahi hote.
1️⃣ Storage space extra lagta hai
2️⃣ INSERT / UPDATE slow ho jata hai
3️⃣ Too many indexes → maintenance heavy
Isliye blindly indexes nahi lagane chahiye.
Types of Indexes (Important for Data Warehouse)
Main 2 important types explain kar raha hoon:
- B-Tree Index
- Bitmap Index
B-Tree Index – Standard Index
Ye database ka default index hota hai.
Agar tum sirf likho:
CREATE INDEX idx_customer ON sales(customer_id);
Toh mostly B-Tree index create hota hai.
B-Tree Kaise Kaam Karta Hai?
Ye multi-level tree structure hota hai.
Simple samjho:
Data alphabetical ya sorted order me arrange hota hai.
Example:
A
→ AD
→ Adam (Row 20)
Binary search jaisa kaam karta hai.
Full table scan nahi karta.
Direct correct branch me jump karta hai.
Kab Use Kare?
✔ High cardinality column
(Zyada distinct values)
Example:
- Customer_ID
- Order_ID
- Transaction_ID
- Surrogate Keys
Ye almost unique hote hain.
Data Warehouse Me Kahan Useful?
Fact table me:
- Primary Key
- Foreign Keys
- Date Key
Join fast karne ke liye B-Tree best hai.
⚠ Downsides
- Extra storage lagta hai
- INSERT / UPDATE slow ho jata hai
- Har column par blindly use nahi karna chahiye
Bitmap Index – Data Warehouse Special
Bitmap index specially data warehouse ke liye powerful hota hai.
Kab Use Kare?
✔ Large table
✔ Low cardinality column (Kam distinct values)
Example:
- Gender (M/F)
- Payment_Type (Visa/MasterCard)
- Status (Active/Inactive)
- Region (North/South/East/West)
Bitmap Kaise Kaam Karta Hai?
Example:
| Row | Payment_Type |
|---|---|
| 1 | Visa |
| 2 | Visa |
| 3 | Visa |
| 4 | MasterCard |
Bitmap create hoga:
Visa → 1 1 1 0
MasterCard → 0 0 0 1
Agar query karein:
SELECT * FROM sales WHERE payment_type = 'Visa';
System directly bitmap check karega.
Row 1,2,3 → fetch
Row 4 → ignore
No full scan.
Bitmap Ka Biggest Advantage
Data bits me store hota hai.
Storage efficient
Filtering extremely fast
Multiple bitmap combine kar sakte ho:
Example:
Gender = 'F' AND Region = 'North'
Bitwise AND operation hota hai → super fast.
Bitmap Ka Downside
- Frequent update ke liye suitable nahi
- OLTP systems ke liye achha nahi
- Data warehouse me better fit hota hai
B-Tree vs Bitmap – Clear Comparison
| Feature | B-Tree | Bitmap |
|---|---|---|
| Default index | Yes | No |
| Best for | High cardinality | Low cardinality |
| Good for OLTP | Yes | No |
| Good for DW | Yes | Yes (better for filters) |
| Update friendly | Yes | Not much |
| Filtering speed | Good | Excellent |
Practical Data Warehouse Guidelines
Fact Table:
✔ Foreign keys par B-Tree
✔ Date key par B-Tree
✔ Surrogate key par B-Tree
Dimension Table:
✔ Primary key par B-Tree
✔ Low cardinality columns par Bitmap (if supported)
Avoid:
❌ High update column par bitmap
❌ Har column par index
❌ Small table par unnecessary index
Simple Rule to Remember
High distinct values → B-Tree
Low distinct values → Bitmap
Data Warehouse me:
Most queries filter karte hain
Isliye bitmap powerful hota hai.