Statistics are like SQL Server's "mind". They tell SQL Server how the data in a table is distributed — for example, how many values are unique, which columns hold heavily repeated data, and what range the values fall into.
Main Purpose of Statistics
The job of statistics is to help the Query Optimizer choose the best execution plan.
In other words, when SQL Server receives a query, it estimates:
- How many rows will be returned,
- Which index will be best to use,
- Which join type will run fastest (Nested Loop, Merge, or Hash Join).
All of these decisions are made on the basis of statistics.
Example:
Suppose you have a table Employees with a column DepartmentID.
Now, if the table has 100,000 rows and 90% of the people are in DepartmentID = 1 —
then if you write:
SELECT * FROM Employees WHERE DepartmentID = 1;
SQL Server knows this query will return a huge number of rows,
so it decides "a Table Scan is better" (because most of the data qualifies).
But if you write:
SELECT * FROM Employees WHERE DepartmentID = 10;
and the statistics say department 10 has only 50 rows —
then SQL Server will say "an Index Seek is better".
How Statistics Work Internally
- When you create an index, SQL Server automatically creates statistics for it.
- For non-indexed columns you can also create statistics manually:
CREATE STATISTICS stat_EmpSalary ON Employees(Salary);
- These statistics store a histogram (a chart of the data distribution, in effect).
The histogram shows how many values fall into each range.
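To see the histogram for yourself, you can inspect a statistics object directly (a quick sketch; stat_EmpSalary is the statistic created above):

```sql
-- Show the header, density vector, and histogram for a statistics object
DBCC SHOW_STATISTICS ('dbo.Employees', stat_EmpSalary);

-- The HISTOGRAM result set lists, for each step:
--   RANGE_HI_KEY        : upper bound value of the range
--   RANGE_ROWS          : rows strictly inside the range
--   EQ_ROWS             : rows equal to RANGE_HI_KEY
--   DISTINCT_RANGE_ROWS : distinct values inside the range
```

These per-step counts are exactly what the optimizer reads when it estimates how many rows a predicate like `Salary > 50000` will return.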
When Statistics Go Outdated
If the table's data is being inserted/updated/deleted frequently,
its statistics can become outdated.
The result? SQL Server picks the wrong execution plan → the query runs slowly.
Fix:
- Keep Auto Update Statistics ON (it is the default).
- Or update them manually:
UPDATE STATISTICS Employees;
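A few common variants of the manual update (a sketch; the table and statistic names are the illustrative ones used above):

```sql
-- Update every statistic on one table, reading all rows (most accurate, most IO)
UPDATE STATISTICS dbo.Employees WITH FULLSCAN;

-- Update a single statistic with a sample (cheaper on very large tables)
UPDATE STATISTICS dbo.Employees stat_EmpSalary WITH SAMPLE 25 PERCENT;

-- Update all statistics in the current database that SQL Server thinks need it
EXEC sp_updatestats;
```

FULLSCAN gives the best histogram but costs the most IO; sampling is the usual compromise on large tables.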
The procedure below automates this check-and-update cycle: Mode 1 reports statistics whose modification counter has crossed a threshold, and Mode 2 rebuilds them with FULLSCAN and logs the outcome.
CREATE OR ALTER PROCEDURE Update_Outdated_Statistics
@Mode INT=2, -- 1 = Show data, 2 = Update
@Threshold DECIMAL(5,2) = 0.20, -- Default 20%
@SchemaName SYSNAME = NULL, -- Optional schema filter
@TableName SYSNAME = NULL -- Optional table filter
AS
BEGIN
SET NOCOUNT ON;
DECLARE @StartTime DATETIME = GETDATE();
PRINT '====================================================';
PRINT 'Process started at: ' + CONVERT(VARCHAR(20), @StartTime, 120);
PRINT 'Threshold set to: ' + CAST(@Threshold * 100 AS NVARCHAR) + '%';
PRINT '====================================================';
-- Create log table if not exists
IF OBJECT_ID('dbo.StatsUpdateLog', 'U') IS NULL
BEGIN
CREATE TABLE dbo.StatsUpdateLog (
LogID INT IDENTITY(1,1) PRIMARY KEY,
TableName SYSNAME,
StatName SYSNAME,
UpdatedOn DATETIME DEFAULT GETDATE(),
Status NVARCHAR(50)
);
END
-- Mode 1: Show outdated stats
IF @Mode = 1
BEGIN
PRINT 'Step 1: Checking outdated statistics...';
SELECT
OBJECT_SCHEMA_NAME(s.object_id) AS SchemaName,
t.name AS TableName,
s.name AS StatName,
sp.last_updated,
sp.rows,
sp.rows_sampled,
sp.modification_counter,
CASE
WHEN sp.modification_counter > (sp.rows * @Threshold) THEN 'Update Needed'
ELSE 'OK'
END AS Status
FROM sys.stats AS s
INNER JOIN sys.objects AS t ON s.object_id = t.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE t.type = 'U'
AND (@SchemaName IS NULL OR OBJECT_SCHEMA_NAME(s.object_id) = @SchemaName)
AND (@TableName IS NULL OR t.name = @TableName)
ORDER BY sp.last_updated DESC;
PRINT 'Report completed. Use @Mode = 2 to update outdated statistics.';
END
-- Mode 2: Update outdated stats
ELSE IF @Mode = 2
BEGIN
PRINT 'Step 2: Updating outdated statistics...';
DECLARE @sql NVARCHAR(MAX) = N'';
SELECT @sql = @sql +
'BEGIN TRY ' + CHAR(13) +
'UPDATE STATISTICS ' + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + '.' +
QUOTENAME(OBJECT_NAME(s.object_id)) +
' (' + QUOTENAME(s.name) + ') WITH FULLSCAN; ' + CHAR(13) +
'INSERT INTO dbo.StatsUpdateLog (TableName, StatName, Status) ' +
'VALUES (''' + OBJECT_NAME(s.object_id) + ''', ''' + s.name + ''', ''Updated''); ' + CHAR(13) +
'END TRY BEGIN CATCH ' +
'INSERT INTO dbo.StatsUpdateLog (TableName, StatName, Status) ' +
-- ERROR_MESSAGE() must be embedded in the generated SQL so it runs inside the CATCH block
-- at execution time; concatenating it here would evaluate to NULL and wipe out @sql
'VALUES (''' + OBJECT_NAME(s.object_id) + ''', ''' + s.name + ''', ''Failed: '' + LEFT(ERROR_MESSAGE(), 40)); ' +
'END CATCH;' + CHAR(13)
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
INNER JOIN sys.objects AS t ON s.object_id = t.object_id
WHERE t.type = 'U'
AND sp.modification_counter > (sp.rows * @Threshold)
AND (@SchemaName IS NULL OR OBJECT_SCHEMA_NAME(s.object_id) = @SchemaName)
AND (@TableName IS NULL OR t.name = @TableName);
IF LEN(@sql) > 0
BEGIN
PRINT 'Running updates...';
EXEC sp_executesql @sql;
PRINT 'All outdated statistics have been updated.';
END
ELSE
BEGIN
PRINT 'All statistics are already up to date.';
END
END
ELSE
BEGIN
PRINT 'Invalid parameter! Use @Mode = 1 for report or @Mode = 2 to update.';
END
DECLARE @EndTime DATETIME = GETDATE();
PRINT '====================================================';
PRINT 'Process completed at: ' + CONVERT(VARCHAR(20), @EndTime, 120);
PRINT 'Total Duration (sec): ' + CAST(DATEDIFF(SECOND, @StartTime, @EndTime) AS NVARCHAR);
PRINT '====================================================';
END;
GO
Summary of the Columns in the Mode 1 Report Query
| Column Name | Meaning | Example Value | Notes |
|---|---|---|---|
| t.name | Table name | EmployeeMaster | From sys.objects — the actual name of the user table |
| s.name | Statistic name | _WA_Sys_00000002_7B264821 | SQL Server automatically creates statistics for each index or sampled column |
| sp.last_updated | When the statistics were last updated | 2025-10-05 14:20:35.450 | Shows how stale the stats are — if they are very old, they should be refreshed |
| sp.rows | Total rows in the table (when the stats were built) | 1000000 | This count is a snapshot taken at stats-creation time |
| sp.rows_sampled | How many rows were sampled when the stats were built | 200000 | With FULLSCAN, rows_sampled = rows; otherwise it is lower |
| sp.modification_counter | Rows changed (insert/update/delete) since the last stats update | 300000 | A high value means the data has changed → a stats update is needed |
| Status (calculated column) | Whether the stats need an update | Update Needed / OK | If modification_counter > (rows * 0.20) → "Update Needed" |
Statistics in @Table Variable vs #Temp Table
SQL Server gives us two kinds of temporary storage:
1️⃣ Table Variable → @MyTable
2️⃣ Temporary Table → #MyTempTable
Both exist to store data temporarily, but the behavior of statistics is completely different between them, and that difference has a direct impact on performance.
1️⃣ Statistics in Table Variable (@Table)
🔹 Default Behavior
SQL Server does not create statistics for a @table variable at all (before SQL Server 2019).
That means:
👉 The Query Optimizer always assumes there is exactly 1 row.
Whether you insert 10 rows or a million —
the optimizer says: "As far as I know, there is only 1 row."
🔹 Example
DECLARE @Emp TABLE (EmpID INT, Salary INT);
INSERT INTO @Emp
SELECT EmpID, Salary
FROM Employees; -- Assume 500,000 rows
SELECT *
FROM @Emp
WHERE Salary > 50000;
❌ Problem
The optimizer thinks:
👉 "There will be just 1 row, no index needed, a nested loop is best."
Reality:
👉 500,000 rows are being processed → slow query, high CPU, wrong join method.
🔹 Why No Statistics?
Table variables:
- Were built as lightweight, memory-friendly objects
- Are treated as cheap, short-lived structures
- SQL Server does not run cost-based estimation on them
Therefore:
👉 Row count estimate = 1 (by default)
⚠️ SQL Server 2019+ Improvement (Deferred Compilation)
SQL Server 2019 introduced Table Variable Deferred Compilation.
This means:
- The row count is no longer fixed at query compile time
- SQL Server looks at the actual row count at first execution
- And then builds a better execution plan
But:
👉 For complex queries, joins, and large datasets it is still weaker than a temp table.
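A quick way to see deferred compilation in action (a sketch, assuming a database where you can change the compatibility level; Employees is the sample table used above):

```sql
-- Compatibility level 150 (SQL Server 2019) enables table variable deferred compilation
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;

DECLARE @Emp TABLE (EmpID INT, Salary INT);
INSERT INTO @Emp SELECT EmpID, Salary FROM Employees;

-- Under level 150 the estimated row count in the actual plan reflects the real
-- row count; under level 140 and below the estimate would be 1 row
SELECT * FROM @Emp WHERE Salary > 50000;
```

Compare the estimated vs actual row counts in the execution plan under compatibility levels 140 and 150 to see the difference.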
2️⃣ Statistics in Temporary Table (#Temp Table)
🔹 Default Behavior
For a #temp table, SQL Server:
✅ Automatically creates statistics
✅ Understands the data distribution
✅ Chooses the best execution plan
🔹 Example
CREATE TABLE #Emp (EmpID INT, Salary INT);
INSERT INTO #Emp
SELECT EmpID, Salary
FROM Employees; -- 500,000 rows
SELECT *
FROM #Emp
WHERE Salary > 50000;
✅ What will the optimizer do?
- Look at the histogram on the Salary column
- Estimate how many rows will be returned
- Decide between:
- Index Seek vs Table Scan
- Hash Join / Merge Join / Nested Loop
Result:
👉 Better plan, faster execution, lower resource usage
📊 Comparison: @Table vs #Temp (Statistics Perspective)
| Feature | @Table Variable | #Temp Table |
|---|---|---|
| Statistics created? | ❌ No (pre-2019) | ✅ Yes |
| Row count estimation | ❌ Always 1 row | ✅ Actual data based |
| Histogram available | ❌ No | ✅ Yes |
| Execution plan quality | ❌ Often wrong | ✅ Optimized |
| Suitable for | Small data (few rows) | Large data / complex joins |
| Indexes allowed | Limited | Full indexing supported |
| Recompile based on data | ❌ No | ✅ Yes |
🧠 Real-Life Office Scenario
Suppose:
- You are building a Sales Dashboard
- There are 1 million transactions
- You are storing an intermediate result set
❌ Using @Table
DECLARE @Sales TABLE (OrderID INT, Amount DECIMAL(10,2));
➡ The optimizer assumes: 1 row
➡ The wrong join type gets chosen
➡ Dashboard slow, timeouts, client complains 😐
✅ Using #Temp Table
CREATE TABLE #Sales (OrderID INT, Amount DECIMAL(10,2));
➡ SQL Server builds statistics
➡ Correct row estimation
➡ Best join type
➡ Dashboard fast 🚀
⚡ When Should You Use What?
✅ Use @Table when:
- The data is very small (1–100 rows)
- Simple logic, no heavy joins
- Scalar operations, small lookups
- Functions / short stored procedures
✅ Use #Temp Table when:
- The data is large (1000+ rows)
- Multiple joins / aggregations
- Performance critical queries
- Reporting / ETL / analytics
- You need indexes + accurate statistics
🛠️ Best Practices (Production Level)
✔ Large dataset? → Always prefer #Temp Table
✔ Multiple joins? → Avoid @Table
✔ Debugging & tuning? → Use #Temp so you can see stats and execution plan
✔ SQL 2019+? → Table variables have improved, but they still don't match temp tables
✔ ETL / Reporting? → #Temp only
Statistics are created on a #temp table even if you never add a PRIMARY KEY / index.
But… the presence of an index still makes a real difference to performance and the execution plan.
Let's walk through it step by step.
🔹 Case 1: #Temp Table without Primary Key / Index
CREATE TABLE #Emp
(
EmpID INT,
Salary INT
);
INSERT INTO #Emp
SELECT EmpID, Salary
FROM Employees;
✔ What happens?
1️⃣ SQL Server:
- Automatically creates column-level statistics
- Builds a histogram (for the data distribution)
2️⃣ The Query Optimizer:
- Can tell how many rows there are
- Can estimate:
- how many rows will remain after the filter
- which join type will work best
❌ But what's the problem?
- There is no index → to locate data:
a Table Scan is needed
- Meaning:
- The statistics are correct
- But the access path can still be slow
👉 You get stats, but no seek → only a scan
🔹 Case 2: #Temp Table with Primary Key / Index
CREATE TABLE #Emp
(
EmpID INT PRIMARY KEY,
Salary INT
);
✔ What extra benefit do you get?
1️⃣ The moment you create the Primary Key:
- A Clustered Index is created (by default)
- SQL Server also creates index statistics
2️⃣ The Query Optimizer:
- Uses the histogram
- And uses the index
👉 Result: Index Seek + accurate row estimation = fast query 🚀
📊 Comparison
| Scenario | Statistics | Index | Execution Plan | Performance |
|---|---|---|---|---|
| #Temp without PK | ✅ Yes | ❌ No | Table Scan | Medium / Slow (large data) |
| #Temp with PK | ✅ Yes | ✅ Yes | Index Seek/Table Scan | Fast |
| @Table variable | ❌ No (pre-2019) | Limited | Poor plan | Slow |
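If you want to confirm that auto-created statistics really do appear on a #temp table, you can look them up in tempdb's catalog (a sketch; run it in one session, and Employees is the sample table used above):

```sql
CREATE TABLE #Emp (EmpID INT, Salary INT);
INSERT INTO #Emp SELECT EmpID, Salary FROM Employees;

-- Filtering on Salary triggers auto-creation of a _WA_Sys_... statistic on that column
SELECT * FROM #Emp WHERE Salary > 50000;

-- Temp tables live in tempdb, so their statistics show up in tempdb.sys.stats
SELECT s.name AS StatName, s.auto_created
FROM tempdb.sys.stats AS s
WHERE s.object_id = OBJECT_ID('tempdb..#Emp');
```

The auto_created = 1 rows are the column statistics SQL Server built on its own; a @table variable queried the same way would show none.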
Instead of a comma, the pipe | is used as the separator — the logic stays the same, only the split character changes.
Below is the final clean version, which:
✅ Accepts | separated SPs / queries
✅ Compares CPU, Time, and IO
✅ Flags the BEST run at the end
✅ Performance Comparator ( | Separated Queries / SPs )
🔧 Step 1: Log Table (Same)
IF OBJECT_ID('dbo.QueryPerformanceLog', 'U') IS NULL
BEGIN
CREATE TABLE dbo.QueryPerformanceLog
(
LogID INT IDENTITY(1,1) PRIMARY KEY,
TestName NVARCHAR(200),
QueryText NVARCHAR(MAX),
CPU_Time_ms BIGINT,
Elapsed_Time_ms BIGINT,
Logical_Reads BIGINT,
ExecutionTime DATETIME DEFAULT GETDATE()
);
END
GO
🔧 Step 2: Stored Procedure ( | Separator )
CREATE OR ALTER PROCEDURE dbo.Compare_Query_Performance
(
@TestName NVARCHAR(200),
@QueryList NVARCHAR(MAX) -- | separated SPs / Queries
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE
@Query NVARCHAR(MAX),
@Pos INT = 1,
@NextPos INT,
@Len INT = LEN(@QueryList);
PRINT '===========================================';
PRINT 'Performance Test Started: ' + @TestName;
PRINT '===========================================';
WHILE @Pos <= @Len
BEGIN
SET @NextPos = CHARINDEX('|', @QueryList, @Pos);
IF @NextPos = 0
SET @NextPos = @Len + 1;
SET @Query = LTRIM(RTRIM(SUBSTRING(@QueryList, @Pos, @NextPos - @Pos)));
PRINT '-------------------------------------------';
PRINT 'Executing: ' + @Query;
PRINT '-------------------------------------------';
-- Metrics BEFORE execution
DECLARE
@CPU_Before BIGINT,
@Elapsed_Before BIGINT,
@LogicalReads_Before BIGINT;
SELECT
@CPU_Before = SUM(total_worker_time),
@Elapsed_Before = SUM(total_elapsed_time),
@LogicalReads_Before = SUM(total_logical_reads)
FROM sys.dm_exec_query_stats;
-- Execute SP / Query
EXEC sp_executesql @Query;
-- Metrics AFTER execution
DECLARE
@CPU_After BIGINT,
@Elapsed_After BIGINT,
@LogicalReads_After BIGINT;
SELECT
@CPU_After = SUM(total_worker_time),
@Elapsed_After = SUM(total_elapsed_time),
@LogicalReads_After = SUM(total_logical_reads)
FROM sys.dm_exec_query_stats;
-- Save result
INSERT INTO dbo.QueryPerformanceLog
(
TestName,
QueryText,
CPU_Time_ms,
Elapsed_Time_ms,
Logical_Reads
)
VALUES
(
@TestName,
@Query,
(@CPU_After - @CPU_Before) / 1000,
(@Elapsed_After - @Elapsed_Before) / 1000,
(@LogicalReads_After - @LogicalReads_Before)
);
SET @Pos = @NextPos + 1;
END
PRINT '===========================================';
PRINT 'Test Completed.';
PRINT '===========================================';
-- Final Comparison with BEST Flag
SELECT
TestName,
QueryText,
CPU_Time_ms,
Elapsed_Time_ms,
Logical_Reads,
-- Flag the run with the lowest combined CPU + elapsed time + logical reads
CASE
WHEN (CPU_Time_ms + Elapsed_Time_ms + Logical_Reads)
= MIN(CPU_Time_ms + Elapsed_Time_ms + Logical_Reads) OVER ()
THEN 'BEST'
ELSE 'NOT BEST'
END AS PerformanceFlag,
ExecutionTime
FROM dbo.QueryPerformanceLog
WHERE TestName = @TestName
ORDER BY ExecutionTime DESC;
-- Clear the log so the next comparison starts fresh
TRUNCATE TABLE dbo.QueryPerformanceLog;
END;
GO
▶️ Usage Examples
1️⃣ Multiple Stored Procedures
EXEC dbo.Compare_Query_Performance
@TestName = 'SP_Comparison_Test',
@QueryList =
'EXEC dbo.GetEmployees_V1 |
EXEC dbo.GetEmployees_V2 |
EXEC dbo.GetEmployees_V3';
2️⃣ Multiple Queries
EXEC dbo.Compare_Query_Performance
@TestName = 'Query_Comparison_Test',
@QueryList =
'SELECT * FROM Employees WHERE Salary > 50000 |
SELECT EmpID, EmpName FROM Employees WHERE Salary > 50000 |
SELECT EmpID FROM Employees WHERE Salary > 50000';
3️⃣ Variable Based Queries
DECLARE @Q1 NVARCHAR(MAX) = 'SELECT * FROM Employees WHERE DepartmentID = 1';
DECLARE @Q2 NVARCHAR(MAX) = 'SELECT EmpID, EmpName FROM Employees WHERE DepartmentID = 1';
EXEC dbo.Compare_Query_Performance
@TestName = 'Variable_Query_Test',
@QueryList = @Q1 + ' | ' + @Q2;
🧠 Important Points
🔹 Metrics are captured via DMV deltas (sys.dm_exec_query_stats) → CPU, Elapsed Time, Logical Reads
🔹 The deltas cover every cached query on the instance, so run comparisons on a quiet server for reliable numbers
🔹 BEST flag → the run with the lowest combined CPU + Time + IO