✅ What are Pivot and Unpivot?
- Pivot = Convert rows into columns
- Unpivot = Convert columns into rows
🌀 Sample DataFrame
Let’s start with a small DataFrame to work with:
from pyspark.sql.functions import first, expr

data = [("Himanshu", "Math", 85), ("Himanshu", "Science", 90),
        ("Amit", "Math", 78), ("Amit", "Science", 83),
        ("Neha", "Math", 92), ("Neha", "Science", 88)]
columns = ["Name", "Subject", "Score"]
df = spark.createDataFrame(data, columns)
df.show()
🔍 Output:
| Name | Subject | Score |
|---|---|---|
| Himanshu | Math | 85 |
| Himanshu | Science | 90 |
| Amit | Math | 78 |
| Amit | Science | 83 |
| Neha | Math | 92 |
| Neha | Science | 88 |
🔄 Pivot in PySpark
Goal: Turn Subjects into columns and fill with Score.
pivotDF = df.groupBy("Name").pivot("Subject").agg(first("Score"))
pivotDF.show()
🧾 Output:
| Name | Math | Science |
|---|---|---|
| Himanshu | 85 | 90 |
| Amit | 78 | 83 |
| Neha | 92 | 88 |
✅ Explanation:
- groupBy("Name"): groups the rows by student name.
- pivot("Subject"): turns the distinct values in "Subject" into column headers.
- agg(first("Score")): picks the score (there is only one score per student–subject pair).
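Conceptually, the same reshaping can be sketched in plain Python (an illustration of the logic only, not how Spark executes it):

```python
# Plain-Python sketch of groupBy("Name").pivot("Subject").agg(first("Score"))
rows = [("Himanshu", "Math", 85), ("Himanshu", "Science", 90),
        ("Amit", "Math", 78), ("Amit", "Science", 83),
        ("Neha", "Math", 92), ("Neha", "Science", 88)]

pivoted = {}
for name, subject, score in rows:
    # setdefault keeps the first score seen, mirroring first("Score")
    pivoted.setdefault(name, {}).setdefault(subject, score)

print(pivoted["Himanshu"])  # {'Math': 85, 'Science': 90}
```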
🔁 Unpivot in PySpark
PySpark has no built-in unpivot() before Spark 3.4 (which added DataFrame.unpivot()), but we can simulate it using selectExpr with SQL's stack function.
Let’s unpivot the above pivotDF back to its original format.
unpivotDF = pivotDF.select(
"Name",
expr("stack(2, 'Math', Math, 'Science', Science) as (Subject, Score)")
)
unpivotDF.show()
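SQL's stack(n, k1, v1, k2, v2, …) emits n rows per input row, pairing each label with its value. In plain Python the idea looks like this (an illustration, not the Spark implementation):

```python
def stack_rows(n, *pairs):
    """Mimic SQL stack(): split the flat key/value arguments into n (key, value) rows."""
    return [(pairs[2 * i], pairs[2 * i + 1]) for i in range(n)]

# One pivoted row: Name='Himanshu', Math=85, Science=90
print(stack_rows(2, "Math", 85, "Science", 90))
# [('Math', 85), ('Science', 90)]
```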
⚡ Dynamic way (no hardcoded column names):
# 🔹 Step 1: Dynamically get all columns to unpivot (except the id column)
id_column = "Name"
value_columns = [c for c in pivotDF.columns if c != id_column]
# 🔹 Step 2: Create stack expression
stack_expr = "stack({0}, {1}) as (Subject, Score)".format(
    len(value_columns),
    ', '.join("'{0}', `{0}`".format(c) for c in value_columns)
)
# 🔹 Step 3: Use selectExpr with dynamic stack
unpivotDF = pivotDF.selectExpr(id_column, stack_expr)
# 🔹 Step 4: Show result
unpivotDF.show(truncate=False)
🧾 Output:
| Name | Subject | Score |
|---|---|---|
| Himanshu | Math | 85 |
| Himanshu | Science | 90 |
| Amit | Math | 78 |
| Amit | Science | 83 |
| Neha | Math | 92 |
| Neha | Science | 88 |
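Step 2 above is plain string building, so the generated expression can be inspected on its own (assuming the Math and Science columns from the pivoted DataFrame):

```python
value_columns = ["Math", "Science"]

# Build "'col', `col`" pairs; backticks protect column names containing spaces or dots
pairs = ", ".join("'{0}', `{0}`".format(c) for c in value_columns)
stack_expr = "stack({0}, {1}) as (Subject, Score)".format(len(value_columns), pairs)

print(stack_expr)
# stack(2, 'Math', `Math`, 'Science', `Science`) as (Subject, Score)
```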
🔁 Summary
| Operation | What It Does | PySpark Code Example |
|---|---|---|
| Pivot | Rows ➡ Columns | df.groupBy().pivot() |
| Unpivot | Columns ➡ Rows (using stack) | df.select(expr("stack(...)")) |
🎯 Real-world use case:
Scenario: HR wants to see attendance status by day.
| Employee | Day | Status |
|---|---|---|
| Himanshu | Monday | Present |
| Himanshu | Tuesday | Absent |
Pivoted:
| Employee | Monday | Tuesday |
|---|---|---|
| Himanshu | Present | Absent |
And vice versa (Unpivoted).
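The attendance round trip, sketched in plain Python to show the mechanics (in PySpark it would follow the same groupBy/pivot and stack pattern shown above):

```python
attendance = [("Himanshu", "Monday", "Present"), ("Himanshu", "Tuesday", "Absent")]

# Pivot: Day values become columns
pivoted = {}
for emp, day, status in attendance:
    pivoted.setdefault(emp, {})[day] = status

# Unpivot: columns become (Employee, Day, Status) rows again
unpivoted = [(emp, day, status)
             for emp, days in pivoted.items()
             for day, status in days.items()]

print(pivoted)    # {'Himanshu': {'Monday': 'Present', 'Tuesday': 'Absent'}}
print(unpivoted)  # [('Himanshu', 'Monday', 'Present'), ('Himanshu', 'Tuesday', 'Absent')]
```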
# Sample pivoted data (illustrative sales figures)
data = [("Laptop", 1000, 1500),
        ("Mobile", 800, 1200)]
columns = ["Product", "USA", "India"]
pivotDF = spark.createDataFrame(data, columns)
# 🔹 Step 1: Dynamically get all columns to unpivot (except the id column)
id_column = "Product"
value_columns = [c for c in pivotDF.columns if c != id_column]
# 🔹 Step 2: Create stack expression
stack_expr = "stack({0}, {1}) as (Country, Amount)".format(
    len(value_columns),
    ', '.join("'{0}', `{0}`".format(c) for c in value_columns)
)
# 🔹 Step 3: Use selectExpr with dynamic stack
unpivotDF = pivotDF.selectExpr(id_column, stack_expr)
# 🔹 Step 4: Show result
unpivotDF.show(truncate=False)
