Dimensional Modeling me ek important topic hota hai:
👉 Natural Key
👉 Surrogate Key
Dono primary key ke type hain, lekin inka purpose aur usage alag hota hai.
Chalo simple language me samjhte hain.
What Is a Natural Key?
Natural Key wo key hoti hai jo source system se directly aati hai.
Example:
Product_ID = "PRD-AXZ-10987"
Transaction_ID = "TXN-2024-00056789"
Ye usually:
- Alphanumeric hoti hai
- Long aur bulky hoti hai
- Source system generate karta hai
Ye business ke liye meaningful hoti hai.
🔍 Problem with Natural Keys
Natural keys ke kuch issues hote hain:
❌ Bahut lambi hoti hain
❌ Storage zyada leti hain
❌ Join performance slow hoti hai
❌ Multiple systems me duplicate ho sakti hain
❌ Kabhi kabhi change bhi ho sakti hain
Example:
System A me Customer_ID = 100
System B me Customer_ID = 100
But dono alag customer ho sakte hain.
Yaha conflict ho jayega.
What Is a Surrogate Key?
Surrogate Key ek artificial key hoti hai jo:
👉 ETL process ke during generate hoti hai
👉 Simple integer hoti hai
👉 Usually auto-increment number hoti hai
Example:
Product_SK = 1
Customer_SK = 25
Date_SK = 20240101
Ye business meaning nahi rakhti.
Bas internal identification ke liye hoti hai.
Example: Product Dimension
Natural Key Version:
| Product_ID | Product_Name |
|---|---|
| PRD-AXZ-10987 | Laptop |
Surrogate Key Version:
| Product_SK | Product_ID | Product_Name |
|---|---|---|
| 1 | PRD-AXZ-10987 | Laptop |
Yaha:
Product_SK = Surrogate Key
Product_ID = Natural Key
Fact table me hum Product_SK use karte hain.
Why Surrogate Keys Are Better?
1️⃣ Better Performance
Integer joins:
✔ Faster
✔ Less storage
✔ Better indexing
Compare:
Join on: "PRD-AXZ-10987"
vs
Join on: 1
Obviously integer fast hoga.
2️⃣ Multiple Source Systems Integration
Agar 2 systems same ID use kare:
Surrogate key unique bana deta hai.
System A → Customer_SK = 1
System B → Customer_SK = 2
Problem solved.
3️⃣ Handle Missing Values Easily
Agar koi value missing hai:
We can use:
-1 → Unknown
0 → Not Available
Ye easily manageable hai.
4️⃣ Slowly Changing Dimensions (SCD)
Jab dimension value change hoti hai:
Surrogate key new row create karne me help karti hai.
Example:
Customer address change ho gaya.
Old record → SK = 1
New record → SK = 2
History maintain ho jati hai.
Natural key se ye manage karna difficult hota.
Simple Analogy
Natural Key = Aadhaar Number (real-world identity)
Surrogate Key = Internal employee number system ke andar
Business me Aadhaar meaningful hai
System me employee ID useful hai