
Statistics in SQL Server


Statistics are like SQL Server's "mind". They tell SQL Server how the data in a table is distributed — how many values are unique, which columns hold heavily repeated data, and which ranges the values fall into.


Main Purpose of Statistics

The job of statistics is to help the Query Optimizer choose the best execution plan.
In other words, when SQL Server receives a query, it estimates:

  • How many rows will be returned,
  • Which index would be best to use,
  • Which join type will run fastest (Nested Loop, Merge, or Hash Join).

All of these decisions are made on the basis of statistics.
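To see which statistics SQL Server already holds for a table, you can query the sys.stats catalog view. This is a general sketch — the table name Employees matches the example below; substitute your own:

SQL
-- List every statistics object on a table, with when it was last updated.
-- auto_created = 1 means the optimizer built it itself (_WA_Sys_... names).
SELECT
    s.name         AS StatName,
    s.auto_created AS AutoCreated,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Employees');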


Example:

Suppose you have a table called Employees with a column DepartmentID.
Now, if the table has 100,000 rows and 90% of the people are in DepartmentID = 1 —
then if you write:

SQL
SELECT * FROM Employees WHERE DepartmentID = 1;

SQL Server knows this query will return a large number of rows,
so it decides "a Table Scan is better" (because most of the data qualifies).

But if you write:

SQL
SELECT * FROM Employees WHERE DepartmentID = 10;

and the statistics say that department 10 has only 50 rows —
then SQL Server concludes "an Index Seek will be better".


How Statistics Work Internally

  • When you create an index, SQL Server automatically creates statistics as well.
  • For non-indexed columns you can also create statistics manually: CREATE STATISTICS stat_EmpSalary ON Employees(Salary);
  • Each statistics object stores a histogram (like a chart of the data distribution).
    The histogram says how many values fall in each range.
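You can inspect the histogram yourself with DBCC SHOW_STATISTICS. A minimal sketch, using the stat_EmpSalary statistics created in the example above:

SQL
-- Header, density vector, and histogram steps for one statistics object
DBCC SHOW_STATISTICS ('dbo.Employees', stat_EmpSalary);

-- Only the histogram (one row per range step:
-- RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS)
DBCC SHOW_STATISTICS ('dbo.Employees', stat_EmpSalary) WITH HISTOGRAM;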

When Statistics Go Outdated

If the data in a table keeps getting updated/inserted/deleted,
the statistics can become outdated.
The result? SQL Server picks a bad execution plan → the query runs slowly.

Fix:

  • Keep Auto Update Statistics ON (it is the default).
  • Or update them manually: UPDATE STATISTICS Employees;

The stored procedure below automates this: it reports statistics whose modification counter has crossed a threshold, and can refresh them.

SQL
CREATE OR ALTER PROCEDURE Update_Outdated_Statistics
    @Mode INT = 2,                            -- 1 = Show report, 2 = Update
    @Threshold DECIMAL(5,2) = 0.20,           -- Default 20%
    @SchemaName SYSNAME = NULL,               -- Optional schema filter
    @TableName SYSNAME = NULL                 -- Optional table filter
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME = GETDATE();
    PRINT '====================================================';
    PRINT 'Process started at: ' + CONVERT(VARCHAR(20), @StartTime, 120);
    PRINT 'Threshold set to: ' + CAST(@Threshold * 100 AS NVARCHAR) + '%';
    PRINT '====================================================';

    -- Create log table if not exists
    IF OBJECT_ID('dbo.StatsUpdateLog', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.StatsUpdateLog (
            LogID INT IDENTITY(1,1) PRIMARY KEY,
            TableName SYSNAME,
            StatName SYSNAME,
            UpdatedOn DATETIME DEFAULT GETDATE(),
            Status NVARCHAR(50)
        );
    END

    -- Mode 1: Show outdated stats
    IF @Mode = 1
    BEGIN
        PRINT 'Step 1: Checking outdated statistics...';

        SELECT 
            OBJECT_SCHEMA_NAME(s.object_id) AS SchemaName,
            t.name AS TableName,
            s.name AS StatName,
            sp.last_updated,
            sp.rows,
            sp.rows_sampled,
            sp.modification_counter,
            CASE 
                WHEN sp.modification_counter > (sp.rows * @Threshold) THEN 'Update Needed'
                ELSE 'OK'
            END AS Status
        FROM sys.stats AS s
        INNER JOIN sys.objects AS t ON s.object_id = t.object_id
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE t.type = 'U'
          AND (@SchemaName IS NULL OR OBJECT_SCHEMA_NAME(s.object_id) = @SchemaName)
          AND (@TableName IS NULL OR t.name = @TableName)
        ORDER BY sp.last_updated DESC;

        PRINT 'Report completed. Use @Mode = 2 to update outdated statistics.';
    END

    -- Mode 2: Update outdated stats
    ELSE IF @Mode = 2
    BEGIN
        PRINT 'Step 2: Updating outdated statistics...';

        DECLARE @sql NVARCHAR(MAX) = N'';
        DECLARE @ErrorMsg NVARCHAR(4000);

        SELECT @sql = @sql + 
            'BEGIN TRY ' + CHAR(13) +
            'UPDATE STATISTICS ' + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + '.' + 
            QUOTENAME(OBJECT_NAME(s.object_id)) + 
            ' (' + QUOTENAME(s.name) + ') WITH FULLSCAN; ' + CHAR(13) +
            'INSERT INTO dbo.StatsUpdateLog (TableName, StatName, Status) ' +
            'VALUES (''' + OBJECT_NAME(s.object_id) + ''', ''' + s.name + ''', ''Updated''); ' + CHAR(13) +
            'END TRY BEGIN CATCH ' +
            'INSERT INTO dbo.StatsUpdateLog (TableName, StatName, Status) ' +
            -- ERROR_MESSAGE() must run inside the dynamic CATCH block at execution
            -- time; calling it here while building the string would return NULL
            -- and make the whole @sql NULL.
            'VALUES (''' + OBJECT_NAME(s.object_id) + ''', ''' + s.name + ''', ''Failed: '' + LEFT(ERROR_MESSAGE(), 40)); ' +
            'END CATCH;' + CHAR(13)
        FROM sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        INNER JOIN sys.objects AS t ON s.object_id = t.object_id
        WHERE t.type = 'U'
          AND sp.modification_counter > (sp.rows * @Threshold)
          AND (@SchemaName IS NULL OR OBJECT_SCHEMA_NAME(s.object_id) = @SchemaName)
          AND (@TableName IS NULL OR t.name = @TableName);

        IF LEN(@sql) > 0
        BEGIN
            PRINT 'Running updates...';
            EXEC sp_executesql @sql;
            PRINT 'All outdated statistics have been updated.';
        END
        ELSE
        BEGIN
            PRINT 'All statistics are already up to date.';
        END
    END
    ELSE
    BEGIN
        PRINT 'Invalid parameter! Use @Mode = 1 for report or @Mode = 2 to update.';
    END

    DECLARE @EndTime DATETIME = GETDATE();
    PRINT '====================================================';
    PRINT 'Process completed at: ' + CONVERT(VARCHAR(20), @EndTime, 120);
    PRINT 'Total Duration (sec): ' + CAST(DATEDIFF(SECOND, @StartTime, @EndTime) AS NVARCHAR);
    PRINT '====================================================';
END;
GO
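A couple of usage sketches for the procedure above (the schema and table names are placeholders for your own):

SQL
-- Mode 1: report only (list statistics and whether they need an update)
EXEC Update_Outdated_Statistics @Mode = 1;

-- Mode 2: update stats that changed by more than 10%, for one table only
EXEC Update_Outdated_Statistics
     @Mode = 2,
     @Threshold = 0.10,
     @SchemaName = N'dbo',
     @TableName = N'Employees';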

Summary of the Columns in the Report Query

Column Name             | Meaning                                                         | Example Value             | Notes
t.name                  | Table name                                                      | EmployeeMaster            | From sys.objects — the actual name of the user table
s.name                  | Statistic name                                                  | _WA_Sys_00000002_7B264821 | SQL Server creates statistics automatically for every index or column
sp.last_updated         | When the statistics were last updated                           | 2025-10-05 14:20:35.450   | Shows how old the stats are — very stale stats should be refreshed
sp.rows                 | Total rows in the table (when the stats were built)             | 1000000                   | A snapshot taken at the time the statistics were built
sp.rows_sampled         | Rows sampled while building the statistics                      | 200000                    | With FULLSCAN, rows_sampled = rows; otherwise it is lower
sp.modification_counter | Rows changed (insert/update/delete) since the last stats update | 300000                    | A high value means the data has changed → stats update needed
Status (calculated)     | Tells you whether the stats need an update                      | Update Needed / OK        | If modification_counter > (rows * 0.20) → "Update Needed"

Statistics in @Table Variable vs #Temp Table

In SQL Server we use two kinds of temporary storage:

1️⃣ Table Variable → @MyTable
2️⃣ Temporary Table → #MyTempTable

Both exist to store data temporarily, but the behavior of statistics is completely different between them, and that has a direct impact on performance.


1️⃣ Statistics in Table Variable (@Table)

🔹 Default Behavior

SQL Server simply does not create statistics for a @table variable (before SQL Server 2019).

Which means:
👉 The Query Optimizer always assumes there is only 1 row.

Whether you insert 10 rows or a million —
the optimizer says: "As far as I can tell, there is just 1 row".


🔹 Example

SQL
DECLARE @Emp TABLE (EmpID INT, Salary INT);

INSERT INTO @Emp
SELECT EmpID, Salary
FROM Employees;   -- Assume 500,000 rows

SELECT *
FROM @Emp
WHERE Salary > 50000;

❌ Problem

The optimizer will think:
👉 "There will only be 1 row, no index needed, a nested loop is best."

Reality:
👉 500,000 rows are being processed → slow query, high CPU, wrong join method.


🔹 Why No Statistics?

Table variables:

  • Were designed to be lightweight, memory-resident objects
  • Are treated as cheap, short-lived structures
  • Do not get cost-based estimation from SQL Server

That's why:
👉 Row count estimate = 1 (by default)
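One common workaround (not covered above, but widely used) is the RECOMPILE hint: the statement is recompiled after the table variable is populated, so the optimizer sees the real row count:

SQL
DECLARE @Emp TABLE (EmpID INT, Salary INT);

INSERT INTO @Emp
SELECT EmpID, Salary
FROM Employees;

-- OPTION (RECOMPILE) lets the optimizer use the actual cardinality of @Emp
-- instead of the default 1-row guess (at the cost of a compile per execution).
SELECT *
FROM @Emp
WHERE Salary > 50000
OPTION (RECOMPILE);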


⚠️ SQL Server 2019+ Improvement (Deferred Compilation)

SQL Server 2019 introduced Table Variable Deferred Compilation.

Meaning:

  • The plan is not locked in when the query is compiled
  • SQL Server looks at the actual row count at execution time
  • And then builds a better execution plan

But:
👉 For complex queries, joins, and large datasets it is still weaker than a temp table.
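Deferred compilation only applies under database compatibility level 150 or higher; a quick sketch to check and, if needed, raise it:

SQL
-- Check the current compatibility level of this database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Table Variable Deferred Compilation needs level 150 (SQL Server 2019) or above
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;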


2️⃣ Statistics in Temporary Table (#Temp Table)

🔹 Default Behavior

For a #temp table, SQL Server:
✅ Automatically creates statistics
✅ Understands the data distribution
✅ Chooses the best execution plan


🔹 Example

SQL
CREATE TABLE #Emp (EmpID INT, Salary INT);

INSERT INTO #Emp
SELECT EmpID, Salary
FROM Employees;   -- 500,000 rows

SELECT *
FROM #Emp
WHERE Salary > 50000;

✅ What will the optimizer do?

  • Look at the histogram on the Salary column
  • Estimate: "How many rows will be returned?"
  • Decide between:
    • Index Seek vs Table Scan
    • Hash Join / Merge Join / Nested Loop

Result:
👉 Better plan, faster execution, lower resource usage


📊 Comparison: @Table vs #Temp (Statistics Perspective)

Feature                 | @Table Variable       | #Temp Table
Statistics created?     | ❌ No (before 2019)   | ✅ Yes
Row count estimation    | ❌ Always 1 row       | ✅ Based on actual data
Histogram available     | ❌ No                 | ✅ Yes
Execution plan quality  | ❌ Often wrong        | ✅ Optimized
Suitable for            | Small data (few rows) | Large data / complex joins
Indexes allowed         | Limited               | Full indexing supported
Recompile based on data | ❌ No                 | ✅ Yes

🧠 Real-Life Office Scenario

Suppose:

  • You are building a Sales Dashboard
  • You have 1 million transaction rows
  • You are storing an intermediate result

❌ Using @Table

SQL
DECLARE @Sales TABLE (OrderID INT, Amount DECIMAL(10,2));

➡ The optimizer assumes: 1 row
➡ The wrong join type gets picked
➡ Dashboard slow, timeouts, client complains 😐


✅ Using #Temp Table

SQL
CREATE TABLE #Sales (OrderID INT, Amount DECIMAL(10,2));

➡ SQL Server builds statistics
➡ Correct row estimation
➡ Best join type
➡ Dashboard fast 🚀


⚡ When Should You Use What?

✅ Use @Table when:

  • The data is very small (1–100 rows)
  • Simple logic, no heavy joins
  • Scalar operations, small lookups
  • Functions / short stored procedures

✅ Use #Temp Table when:

  • The data is large (1000+ rows)
  • Multiple joins / aggregations
  • Performance-critical queries
  • Reporting / ETL / analytics
  • You need indexes + accurate statistics

🛠️ Best Practices (Production Level)

✔ Large dataset? → Always prefer a #Temp Table
✔ Multiple joins? → Avoid @Table
✔ Debugging & tuning? → Use #Temp so you can see the stats and execution plan
✔ SQL 2019+? → Table variables have improved, but still aren't equal to temp tables
✔ ETL / Reporting? → #Temp only

Statistics are still created on a #temp table even if you never add a PRIMARY KEY / index.

But... it does make a difference to performance and the execution plan.
Let's go through it step by step.


🔹 Case 1: #Temp Table without Primary Key / Index

SQL
CREATE TABLE #Emp
(
    EmpID INT,
    Salary INT
);

INSERT INTO #Emp
SELECT EmpID, Salary
FROM Employees;

✔ What happens?

1️⃣ SQL Server:

  • Automatically creates column-level statistics
  • Builds a histogram (for the data distribution)

2️⃣ The Query Optimizer:

  • Can tell how many rows there are
  • Can estimate:
    • how many rows will remain after the filter
    • which join type will work best

❌ But what is the problem?

  • There is no index → to find the data:
    • a Table Scan is required
  • In other words:
    • The statistics are correct
    • But the access path can be slow

👉 You get the stats, but no seek → only a scan


🔹 Case 2: #Temp Table with Primary Key / Index

SQL
CREATE TABLE #Emp
(
    EmpID INT PRIMARY KEY,
    Salary INT
);

✔ What extra benefit do you get?

1️⃣ The moment you create the Primary Key:

  • A Clustered Index is created (by default)
  • SQL Server creates index statistics as well

2️⃣ The Query Optimizer:

  • Looks at the histogram
  • And uses the index too

👉 Result: Index Seek + accurate row estimation = fast query 🚀
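As a quick sketch, a range filter on the key column can now use a seek (same #Emp definition as in Case 2):

SQL
-- With the clustered index on EmpID, this range predicate can be answered
-- by an Index Seek instead of a full table scan.
SELECT EmpID, Salary
FROM #Emp
WHERE EmpID BETWEEN 100 AND 200;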


📊 Comparison

Scenario         | Statistics       | Index   | Execution Plan          | Performance
#Temp without PK | ✅ Yes           | ❌ No   | Table Scan              | Medium / slow (large data)
#Temp with PK    | ✅ Yes           | ✅ Yes  | Index Seek / Table Scan | Fast
@Table variable  | ❌ No (pre-2019) | Limited | Poor plan               | Slow

Instead of a comma, we use the pipe | as the separator.
The logic stays the same; only the split character changes.

Below is the final clean version, which:

✅ Accepts |-separated SPs / queries
✅ Compares CPU, Time, and IO
✅ Flags the BEST performer at the end


✅ Performance Comparator ( | Separated Queries / SPs )

🔧 Step 1: Log Table

SQL
IF OBJECT_ID('dbo.QueryPerformanceLog', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.QueryPerformanceLog
    (
        LogID INT IDENTITY(1,1) PRIMARY KEY,
        TestName NVARCHAR(200),
        QueryText NVARCHAR(MAX),
        CPU_Time_ms BIGINT,
        Elapsed_Time_ms BIGINT,
        Logical_Reads BIGINT,
        ExecutionTime DATETIME DEFAULT GETDATE()
    );
END
GO


🔧 Step 2: Stored Procedure ( | Separator )

SQL
CREATE OR ALTER PROCEDURE dbo.Compare_Query_Performance
(
    @TestName NVARCHAR(200),
    @QueryList NVARCHAR(MAX)   -- | separated SPs / Queries
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE 
        @Query NVARCHAR(MAX),
        @Pos INT = 1,
        @NextPos INT,
        @Len INT = LEN(@QueryList);

    PRINT '===========================================';
    PRINT 'Performance Test Started: ' + @TestName;
    PRINT '===========================================';

    WHILE @Pos <= @Len
    BEGIN
        SET @NextPos = CHARINDEX('|', @QueryList, @Pos);
        IF @NextPos = 0 
            SET @NextPos = @Len + 1;

        SET @Query = LTRIM(RTRIM(SUBSTRING(@QueryList, @Pos, @NextPos - @Pos)));

        PRINT '-------------------------------------------';
        PRINT 'Executing: ' + @Query;
        PRINT '-------------------------------------------';

        -- Metrics BEFORE execution
        DECLARE 
            @CPU_Before BIGINT,
            @Elapsed_Before BIGINT,
            @LogicalReads_Before BIGINT;

        SELECT  
            @CPU_Before = SUM(total_worker_time),
            @Elapsed_Before = SUM(total_elapsed_time),
            @LogicalReads_Before = SUM(total_logical_reads)
        FROM sys.dm_exec_query_stats;

        -- Execute SP / Query
        EXEC sp_executesql @Query;

        -- Metrics AFTER execution
        DECLARE 
            @CPU_After BIGINT,
            @Elapsed_After BIGINT,
            @LogicalReads_After BIGINT;

        SELECT  
            @CPU_After = SUM(total_worker_time),
            @Elapsed_After = SUM(total_elapsed_time),
            @LogicalReads_After = SUM(total_logical_reads)
        FROM sys.dm_exec_query_stats;

        -- Save result
        INSERT INTO dbo.QueryPerformanceLog
        (
            TestName,
            QueryText,
            CPU_Time_ms,
            Elapsed_Time_ms,
            Logical_Reads
        )
        VALUES
        (
            @TestName,
            @Query,
            (@CPU_After - @CPU_Before) / 1000,        
            (@Elapsed_After - @Elapsed_Before) / 1000,
            (@LogicalReads_After - @LogicalReads_Before)
        );

        SET @Pos = @NextPos + 1;
    END

    PRINT '===========================================';
    PRINT 'Test Completed.';
    PRINT '===========================================';

    -- Final Comparison with BEST Flag
    SELECT 
        TestName,
        QueryText,
        CPU_Time_ms,
        Elapsed_Time_ms,
        Logical_Reads,
        CASE 
            WHEN CPU_Time_ms = MIN(CPU_Time_ms) OVER()
             AND Elapsed_Time_ms = MIN(Elapsed_Time_ms) OVER()
             AND Logical_Reads = MIN(Logical_Reads) OVER()
            THEN 'BEST'
            ELSE 'NOT BEST'
        END AS PerformanceFlag,
        ExecutionTime
    FROM dbo.QueryPerformanceLog
    WHERE TestName = @TestName
    ORDER BY ExecutionTime DESC;

    TRUNCATE TABLE dbo.QueryPerformanceLog;
END;
GO


▶️ Usage Examples

1️⃣ Multiple Stored Procedures

SQL
EXEC dbo.Compare_Query_Performance
    @TestName = 'SP_Comparison_Test',
    @QueryList = 
    'EXEC dbo.GetEmployees_V1 |
     EXEC dbo.GetEmployees_V2 |
     EXEC dbo.GetEmployees_V3';


2️⃣ Multiple Queries

SQL
EXEC dbo.Compare_Query_Performance
    @TestName = 'Query_Comparison_Test',
    @QueryList = 
    'SELECT * FROM Employees WHERE Salary > 50000 |
     SELECT EmpID, EmpName FROM Employees WHERE Salary > 50000 |
     SELECT EmpID FROM Employees WHERE Salary > 50000';


3️⃣ Variable Based Queries

SQL
DECLARE @Q1 NVARCHAR(MAX) = 'SELECT * FROM Employees WHERE DepartmentID = 1';
DECLARE @Q2 NVARCHAR(MAX) = 'SELECT EmpID, EmpName FROM Employees WHERE DepartmentID = 1';

EXEC dbo.Compare_Query_Performance
    @TestName = 'Variable_Query_Test',
    @QueryList = @Q1 + ' | ' + @Q2;


🧠 Important Points

🔹 Metrics are captured via DMVs → CPU, Elapsed Time, Logical Reads
🔹 BEST flag → the entry with the lowest CPU + Time + IO


