Groupby in Pyspark

Function Description Example count() Counts the number of rows per group df.groupBy("col").count() mean() Returns the average value per group df.groupBy("c…

Mar 19, 2026 2 min read

Function	Description	Example
`count()`	Counts the number of rows per group	`df.groupBy("col").count()`
`mean()`	Returns the average value per group	`df.groupBy("col").mean("value")`
`max()`	Returns the maximum value per group	`df.groupBy("col").max("value")`
`min()`	Returns the minimum value per group	`df.groupBy("col").min("value")`
`sum()`	Returns the total sum per group	`df.groupBy("col").sum("value")`
`avg()`	Returns the average value per group	`df.groupBy("col").avg("value")`
`agg()`	Performs multiple aggregations at once	`df.groupBy("col").agg(sum("val"), avg("val"))`
`pivot()`	Converts unique values in a column into new columns	`df.groupBy("col").pivot("col2").sum("val")`

# Sample Data
data = 

# Creating DataFrame
columns = 
df = spark.createDataFrame(data, columns)

# Show original DataFrame
print("Original DataFrame:")
df.show()

# 1. Count the number of employees in each department
print("Count of employees per department:")
df.groupBy("Department").count().show()

# 2. Get the mean (average) salary per department
print("Mean (Average) salary per department:")
df.groupBy("Department").mean("Salary").show()

# 3. Get the maximum salary per department
print("Maximum salary per department:")
df.groupBy("Department").max("Salary").show()

# 4. Get the minimum salary per department
print("Minimum salary per department:")
df.groupBy("Department").min("Salary").show()

# 5. Get the total (sum) salary per department
print("Total salary per department:")
df.groupBy("Department").sum("Salary").show()

# 6. Get the average salary per department (same as mean)
print("Average salary per department:")
df.groupBy("Department").avg("Salary").show()

# 7. Perform multiple aggregations at once using agg()
print("Multiple aggregations per department:")
df.groupBy("Department").agg(
    count("Name").alias("Employee_Count"),
    sum("Salary").alias("Total_Salary"),
    avg("Salary").alias("Avg_Salary"),
    max("Salary").alias("Max_Salary"),
    min("Salary").alias("Min_Salary")
).show()

# 8. Pivot: Convert unique values in 'Name' column into new columns
print("Pivot DataFrame based on Employee Names:")
df.groupBy("Department").pivot("Name").sum("Salary").show()

Latest comments