
Statistics in SQL Server


Statistics are like SQL Server's "mind". They tell SQL Server how the data in a table is distributed — how many values are unique, which columns hold heavily repeated data, and which ranges the values fall into.


Main Purpose of Statistics

The job of statistics is to help the Query Optimizer choose the best execution plan.
In other words, when SQL Server receives a query, it estimates:

  • How many rows will be returned,
  • Which index would be best to use,
  • Which join type will run fastest (Nested Loop, Merge, or Hash Join).

All of these decisions are made on the basis of statistics.
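To see which statistics SQL Server already holds for a table, you can query the sys.stats catalog view. This is a general sketch — the table name Employees matches the example below; substitute your own:

SQL
-- List every statistics object on a table, with when it was last updated.
-- auto_created = 1 means the optimizer built it itself (_WA_Sys_... names).
SELECT
    s.name         AS StatName,
    s.auto_created AS AutoCreated,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Employees');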


Example:

Suppose you have a table called Employees with a column DepartmentID.
Now, if the table has 100,000 rows and 90% of the people are in DepartmentID = 1 —
then if you write:

SQL
SELECT * FROM Employees WHERE DepartmentID = 1;

SQL Server knows this query will return a large number of rows,
so it decides "a Table Scan is better" (because most of the data qualifies).

But if you write:

SQL
SELECT * FROM Employees WHERE DepartmentID = 10;

and the statistics say that department 10 has only 50 rows —
then SQL Server concludes "an Index Seek will be better".


How Statistics Work Internally

  • When you create an index, SQL Server automatically creates statistics as well.
  • For non-indexed columns you can also create statistics manually: CREATE STATISTICS stat_EmpSalary ON Employees(Salary);
  • Each statistics object stores a histogram (like a chart of the data distribution).
    The histogram says how many values fall in each range.
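You can inspect the histogram yourself with DBCC SHOW_STATISTICS. A minimal sketch, using the stat_EmpSalary statistics created in the example above:

SQL
-- Header, density vector, and histogram steps for one statistics object
DBCC SHOW_STATISTICS ('dbo.Employees', stat_EmpSalary);

-- Only the histogram (one row per range step:
-- RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS)
DBCC SHOW_STATISTICS ('dbo.Employees', stat_EmpSalary) WITH HISTOGRAM;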

When Statistics Go Outdated

If the data in a table keeps getting updated/inserted/deleted,
the statistics can become outdated.
The result? SQL Server picks a bad execution plan → the query runs slowly.

Fix:

  • Keep Auto Update Statistics ON (it is the default).
  • Or update them manually: UPDATE STATISTICS Employees;

The stored procedure below automates this: it reports statistics whose modification counter has crossed a threshold, and can refresh them.

SQL
CREATE OR ALTER PROCEDURE Update_Outdated_Statistics
    @Mode INT = 2,                            -- 1 = Show report, 2 = Update
    @Threshold DECIMAL(5,2) = 0.20,           -- Default 20%
    @SchemaName SYSNAME = NULL,               -- Optional schema filter
    @TableName SYSNAME = NULL                 -- Optional table filter
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME = GETDATE();
    PRINT '====================================================';
    PRINT 'Process started at: ' + CONVERT(VARCHAR(20), @StartTime, 120);
    PRINT 'Threshold set to: ' + CAST(@Threshold * 100 AS NVARCHAR) + '%';
    PRINT '====================================================';

    -- Create log table if not exists
    IF OBJECT_ID('dbo.StatsUpdateLog', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.StatsUpdateLog (
            LogID INT IDENTITY(1,1) PRIMARY KEY,
            TableName SYSNAME,
            StatName SYSNAME,
            UpdatedOn DATETIME DEFAULT GETDATE(),
            Status NVARCHAR(50)
        );
    END

    -- Mode 1: Show outdated stats
    IF @Mode = 1
    BEGIN
        PRINT 'Step 1: Checking outdated statistics...';

        SELECT 
            OBJECT_SCHEMA_NAME(s.object_id) AS SchemaName,
            t.name AS TableName,
            s.name AS StatName,
            sp.last_updated,
            sp.rows,
            sp.rows_sampled,
            sp.modification_counter,
            CASE 
                WHEN sp.modification_counter > (sp.rows * @Threshold) THEN 'Update Needed'
                ELSE 'OK'
            END AS Status
        FROM sys.stats AS s
        INNER JOIN sys.objects AS t ON s.object_id = t.object_id
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE t.type = 'U'
          AND (@SchemaName IS NULL OR OBJECT_SCHEMA_NAME(s.object_id) = @SchemaName)
          AND (@TableName IS NULL OR t.name = @TableName)
        ORDER BY sp.last_updated DESC;

        PRINT 'Report completed. Use @Mode = 2 to update outdated statistics.';
    END

    -- Mode 2: Update outdated stats
    ELSE IF @Mode = 2
    BEGIN
        PRINT 'Step 2: Updating outdated statistics...';

        DECLARE @sql NVARCHAR(MAX) = N'';
        DECLARE @ErrorMsg NVARCHAR(4000);

        SELECT @sql = @sql + 
            'BEGIN TRY ' + CHAR(13) +
            'UPDATE STATISTICS ' + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + '.' + 
            QUOTENAME(OBJECT_NAME(s.object_id)) + 
            ' (' + QUOTENAME(s.name) + ') WITH FULLSCAN; ' + CHAR(13) +
            'INSERT INTO dbo.StatsUpdateLog (TableName, StatName, Status) ' +
            'VALUES (''' + OBJECT_NAME(s.object_id) + ''', ''' + s.name + ''', ''Updated''); ' + CHAR(13) +
            'END TRY BEGIN CATCH ' +
            'INSERT INTO dbo.StatsUpdateLog (TableName, StatName, Status) ' +
            -- ERROR_MESSAGE() must run inside the dynamic CATCH block at execution
            -- time; calling it here while building the string would return NULL
            -- and make the whole @sql NULL.
            'VALUES (''' + OBJECT_NAME(s.object_id) + ''', ''' + s.name + ''', ''Failed: '' + LEFT(ERROR_MESSAGE(), 40)); ' +
            'END CATCH;' + CHAR(13)
        FROM sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        INNER JOIN sys.objects AS t ON s.object_id = t.object_id
        WHERE t.type = 'U'
          AND sp.modification_counter > (sp.rows * @Threshold)
          AND (@SchemaName IS NULL OR OBJECT_SCHEMA_NAME(s.object_id) = @SchemaName)
          AND (@TableName IS NULL OR t.name = @TableName);

        IF LEN(@sql) > 0
        BEGIN
            PRINT 'Running updates...';
            EXEC sp_executesql @sql;
            PRINT 'All outdated statistics have been updated.';
        END
        ELSE
        BEGIN
            PRINT 'All statistics are already up to date.';
        END
    END
    ELSE
    BEGIN
        PRINT 'Invalid parameter! Use @Mode = 1 for report or @Mode = 2 to update.';
    END

    DECLARE @EndTime DATETIME = GETDATE();
    PRINT '====================================================';
    PRINT 'Process completed at: ' + CONVERT(VARCHAR(20), @EndTime, 120);
    PRINT 'Total Duration (sec): ' + CAST(DATEDIFF(SECOND, @StartTime, @EndTime) AS NVARCHAR);
    PRINT '====================================================';
END;
GO
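A couple of usage sketches for the procedure above (the schema and table names are placeholders for your own):

SQL
-- Mode 1: report only (list statistics and whether they need an update)
EXEC Update_Outdated_Statistics @Mode = 1;

-- Mode 2: update stats that changed by more than 10%, for one table only
EXEC Update_Outdated_Statistics
     @Mode = 2,
     @Threshold = 0.10,
     @SchemaName = N'dbo',
     @TableName = N'Employees';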

Summary of the Columns in the Report Query

Column Name             | Meaning                                                         | Example Value             | Notes
t.name                  | Table name                                                      | EmployeeMaster            | From sys.objects — the actual name of the user table
s.name                  | Statistic name                                                  | _WA_Sys_00000002_7B264821 | SQL Server creates statistics automatically for every index or column
sp.last_updated         | When the statistics were last updated                           | 2025-10-05 14:20:35.450   | Shows how old the stats are — very stale stats should be refreshed
sp.rows                 | Total rows in the table (when the stats were built)             | 1000000                   | A snapshot taken at the time the statistics were built
sp.rows_sampled         | Rows sampled while building the statistics                      | 200000                    | With FULLSCAN, rows_sampled = rows; otherwise it is lower
sp.modification_counter | Rows changed (insert/update/delete) since the last stats update | 300000                    | A high value means the data has changed → stats update needed
Status (calculated)     | Tells you whether the stats need an update                      | Update Needed / OK        | If modification_counter > (rows * 0.20) → "Update Needed"

Statistics in @Table Variable vs #Temp Table

In SQL Server we use two kinds of temporary storage:

1️⃣ Table Variable → @MyTable
2️⃣ Temporary Table → #MyTempTable

Both exist to store data temporarily, but the behavior of statistics is completely different between them, and that has a direct impact on performance.


1️⃣ Statistics in Table Variable (@Table)

🔹 Default Behavior

SQL Server simply does not create statistics for a @table variable (before SQL Server 2019).

Which means:
👉 The Query Optimizer always assumes there is only 1 row.

Whether you insert 10 rows or a million —
the optimizer says: "As far as I can tell, there is just 1 row".


🔹 Example

SQL
DECLARE @Emp TABLE (EmpID INT, Salary INT);

INSERT INTO @Emp
SELECT EmpID, Salary
FROM Employees;   -- Assume 500,000 rows

SELECT *
FROM @Emp
WHERE Salary > 50000;

❌ Problem

The optimizer will think:
👉 "There will only be 1 row, no index needed, a nested loop is best."

Reality:
👉 500,000 rows are being processed → slow query, high CPU, wrong join method.


🔹 Why No Statistics?

Table variables:

  • Were designed to be lightweight, memory-resident objects
  • Are treated as cheap, short-lived structures
  • Do not get cost-based estimation from SQL Server

That's why:
👉 Row count estimate = 1 (by default)
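One common workaround (not covered above, but widely used) is the RECOMPILE hint: the statement is recompiled after the table variable is populated, so the optimizer sees the real row count:

SQL
DECLARE @Emp TABLE (EmpID INT, Salary INT);

INSERT INTO @Emp
SELECT EmpID, Salary
FROM Employees;

-- OPTION (RECOMPILE) lets the optimizer use the actual cardinality of @Emp
-- instead of the default 1-row guess (at the cost of a compile per execution).
SELECT *
FROM @Emp
WHERE Salary > 50000
OPTION (RECOMPILE);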


⚠️ SQL Server 2019+ Improvement (Deferred Compilation)

SQL Server 2019 introduced Table Variable Deferred Compilation.

Meaning:

  • The plan is not locked in when the query is compiled
  • SQL Server looks at the actual row count at execution time
  • And then builds a better execution plan

But:
👉 For complex queries, joins, and large datasets it is still weaker than a temp table.
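Deferred compilation only applies under database compatibility level 150 or higher; a quick sketch to check and, if needed, raise it:

SQL
-- Check the current compatibility level of this database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Table Variable Deferred Compilation needs level 150 (SQL Server 2019) or above
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;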


2️⃣ Statistics in Temporary Table (#Temp Table)

🔹 Default Behavior

For a #temp table, SQL Server:
✅ Automatically creates statistics
✅ Understands the data distribution
✅ Chooses the best execution plan


🔹 Example

SQL
CREATE TABLE #Emp (EmpID INT, Salary INT);

INSERT INTO #Emp
SELECT EmpID, Salary
FROM Employees;   -- 500,000 rows

SELECT *
FROM #Emp
WHERE Salary > 50000;

✅ What will the optimizer do?

  • Look at the histogram on the Salary column
  • Estimate: "How many rows will be returned?"
  • Decide between:
    • Index Seek vs Table Scan
    • Hash Join / Merge Join / Nested Loop

Result:
👉 Better plan, faster execution, lower resource usage


📊 Comparison: @Table vs #Temp (Statistics Perspective)

Feature                 | @Table Variable       | #Temp Table
Statistics created?     | ❌ No (before 2019)   | ✅ Yes
Row count estimation    | ❌ Always 1 row       | ✅ Based on actual data
Histogram available     | ❌ No                 | ✅ Yes
Execution plan quality  | ❌ Often wrong        | ✅ Optimized
Suitable for            | Small data (few rows) | Large data / complex joins
Indexes allowed         | Limited               | Full indexing supported
Recompile based on data | ❌ No                 | ✅ Yes

🧠 Real-Life Office Scenario

Suppose:

  • You are building a Sales Dashboard
  • You have 1 million transaction rows
  • You are storing an intermediate result

❌ Using @Table

SQL
DECLARE @Sales TABLE (OrderID INT, Amount DECIMAL(10,2));

➡ The optimizer assumes: 1 row
➡ The wrong join type gets picked
➡ Dashboard slow, timeouts, client complains 😐


✅ Using #Temp Table

SQL
CREATE TABLE #Sales (OrderID INT, Amount DECIMAL(10,2));

➡ SQL Server builds statistics
➡ Correct row estimation
➡ Best join type
➡ Dashboard fast 🚀


⚡ When Should You Use What?

✅ Use @Table when:

  • The data is very small (1–100 rows)
  • Simple logic, no heavy joins
  • Scalar operations, small lookups
  • Functions / short stored procedures

✅ Use #Temp Table when:

  • The data is large (1000+ rows)
  • Multiple joins / aggregations
  • Performance-critical queries
  • Reporting / ETL / analytics
  • You need indexes + accurate statistics

🛠️ Best Practices (Production Level)

✔ Large dataset? → Always prefer a #Temp Table
✔ Multiple joins? → Avoid @Table
✔ Debugging & tuning? → Use #Temp so you can see the stats and execution plan
✔ SQL 2019+? → Table variables have improved, but still aren't equal to temp tables
✔ ETL / Reporting? → #Temp only

Statistics are still created on a #temp table even if you never add a PRIMARY KEY / index.

But... it does make a difference to performance and the execution plan.
Let's go through it step by step.


🔹 Case 1: #Temp Table without Primary Key / Index

SQL
CREATE TABLE #Emp
(
    EmpID INT,
    Salary INT
);

INSERT INTO #Emp
SELECT EmpID, Salary
FROM Employees;

✔ What happens?

1️⃣ SQL Server:

  • Automatically creates column-level statistics
  • Builds a histogram (for the data distribution)

2️⃣ The Query Optimizer:

  • Can tell how many rows there are
  • Can estimate:
    • how many rows will remain after the filter
    • which join type will work best

❌ But what is the problem?

  • There is no index → to find the data:
    • a Table Scan is required
  • In other words:
    • The statistics are correct
    • But the access path can be slow

👉 You get the stats, but no seek → only a scan


🔹 Case 2: #Temp Table with Primary Key / Index

SQL
CREATE TABLE #Emp
(
    EmpID INT PRIMARY KEY,
    Salary INT
);

✔ What extra benefit do you get?

1️⃣ The moment you create the Primary Key:

  • A Clustered Index is created (by default)
  • SQL Server creates index statistics as well

2️⃣ The Query Optimizer:

  • Looks at the histogram
  • And uses the index too

👉 Result: Index Seek + accurate row estimation = fast query 🚀
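As a quick sketch, a range filter on the key column can now use a seek (same #Emp definition as in Case 2):

SQL
-- With the clustered index on EmpID, this range predicate can be answered
-- by an Index Seek instead of a full table scan.
SELECT EmpID, Salary
FROM #Emp
WHERE EmpID BETWEEN 100 AND 200;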


📊 Comparison

Scenario         | Statistics       | Index   | Execution Plan          | Performance
#Temp without PK | ✅ Yes           | ❌ No   | Table Scan              | Medium / slow (large data)
#Temp with PK    | ✅ Yes           | ✅ Yes  | Index Seek / Table Scan | Fast
@Table variable  | ❌ No (pre-2019) | Limited | Poor plan               | Slow

Instead of a comma, we use the pipe | as the separator.
The logic stays the same; only the split character changes.

Below is the final clean version, which:

✅ Accepts |-separated SPs / queries
✅ Compares CPU, Time, and IO
✅ Flags the BEST performer at the end


✅ Performance Comparator ( | Separated Queries / SPs )

🔧 Step 1: Log Table

SQL
IF OBJECT_ID('dbo.QueryPerformanceLog', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.QueryPerformanceLog
    (
        LogID INT IDENTITY(1,1) PRIMARY KEY,
        TestName NVARCHAR(200),
        QueryText NVARCHAR(MAX),
        CPU_Time_ms BIGINT,
        Elapsed_Time_ms BIGINT,
        Logical_Reads BIGINT,
        ExecutionTime DATETIME DEFAULT GETDATE()
    );
END
GO


🔧 Step 2: Stored Procedure ( | Separator )

SQL
CREATE OR ALTER PROCEDURE dbo.Compare_Query_Performance
(
    @TestName NVARCHAR(200),
    @QueryList NVARCHAR(MAX)   -- | separated SPs / Queries
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE 
        @Query NVARCHAR(MAX),
        @Pos INT = 1,
        @NextPos INT,
        @Len INT = LEN(@QueryList);

    PRINT '===========================================';
    PRINT 'Performance Test Started: ' + @TestName;
    PRINT '===========================================';

    WHILE @Pos <= @Len
    BEGIN
        SET @NextPos = CHARINDEX('|', @QueryList, @Pos);
        IF @NextPos = 0 
            SET @NextPos = @Len + 1;

        SET @Query = LTRIM(RTRIM(SUBSTRING(@QueryList, @Pos, @NextPos - @Pos)));

        PRINT '-------------------------------------------';
        PRINT 'Executing: ' + @Query;
        PRINT '-------------------------------------------';

        -- Metrics BEFORE execution
        DECLARE 
            @CPU_Before BIGINT,
            @Elapsed_Before BIGINT,
            @LogicalReads_Before BIGINT;

        SELECT  
            @CPU_Before = SUM(total_worker_time),
            @Elapsed_Before = SUM(total_elapsed_time),
            @LogicalReads_Before = SUM(total_logical_reads)
        FROM sys.dm_exec_query_stats;

        -- Execute SP / Query
        EXEC sp_executesql @Query;

        -- Metrics AFTER execution
        DECLARE 
            @CPU_After BIGINT,
            @Elapsed_After BIGINT,
            @LogicalReads_After BIGINT;

        SELECT  
            @CPU_After = SUM(total_worker_time),
            @Elapsed_After = SUM(total_elapsed_time),
            @LogicalReads_After = SUM(total_logical_reads)
        FROM sys.dm_exec_query_stats;

        -- Save result
        INSERT INTO dbo.QueryPerformanceLog
        (
            TestName,
            QueryText,
            CPU_Time_ms,
            Elapsed_Time_ms,
            Logical_Reads
        )
        VALUES
        (
            @TestName,
            @Query,
            (@CPU_After - @CPU_Before) / 1000,        
            (@Elapsed_After - @Elapsed_Before) / 1000,
            (@LogicalReads_After - @LogicalReads_Before)
        );

        SET @Pos = @NextPos + 1;
    END

    PRINT '===========================================';
    PRINT 'Test Completed.';
    PRINT '===========================================';

    -- Final Comparison with BEST Flag
    SELECT 
        TestName,
        QueryText,
        CPU_Time_ms,
        Elapsed_Time_ms,
        Logical_Reads,
        CASE 
            WHEN CPU_Time_ms = MIN(CPU_Time_ms) OVER()
             AND Elapsed_Time_ms = MIN(Elapsed_Time_ms) OVER()
             AND Logical_Reads = MIN(Logical_Reads) OVER()
            THEN 'BEST'
            ELSE 'NOT BEST'
        END AS PerformanceFlag,
        ExecutionTime
    FROM dbo.QueryPerformanceLog
    WHERE TestName = @TestName
    ORDER BY ExecutionTime DESC;

    TRUNCATE TABLE dbo.QueryPerformanceLog;
END;
GO


▶️ Usage Examples

1️⃣ Multiple Stored Procedures

SQL
EXEC dbo.Compare_Query_Performance
    @TestName = 'SP_Comparison_Test',
    @QueryList = 
    'EXEC dbo.GetEmployees_V1 |
     EXEC dbo.GetEmployees_V2 |
     EXEC dbo.GetEmployees_V3';


2️⃣ Multiple Queries

SQL
EXEC dbo.Compare_Query_Performance
    @TestName = 'Query_Comparison_Test',
    @QueryList = 
    'SELECT * FROM Employees WHERE Salary > 50000 |
     SELECT EmpID, EmpName FROM Employees WHERE Salary > 50000 |
     SELECT EmpID FROM Employees WHERE Salary > 50000';


3️⃣ Variable Based Queries

SQL
DECLARE @Q1 NVARCHAR(MAX) = 'SELECT * FROM Employees WHERE DepartmentID = 1';
DECLARE @Q2 NVARCHAR(MAX) = 'SELECT EmpID, EmpName FROM Employees WHERE DepartmentID = 1';

EXEC dbo.Compare_Query_Performance
    @TestName = 'Variable_Query_Test',
    @QueryList = @Q1 + ' | ' + @Q2;


🧠 Important Points

🔹 Metrics are captured via DMVs → CPU, Elapsed Time, Logical Reads
🔹 BEST flag → the entry with the lowest CPU + Time + IO


