Author
SQLDataDev Editorial Team
Mar 19, 2026 3 min read
These functions are commonly used with groupBy(), agg(), or select() to compute aggregates such as sums, averages, maximums, minimums, and counts. PySpark functions come from pyspark.sql.functions, which includes a wide variety of operations: aggregation, date/time, string manipulation, and more.
🔹 1. Aggregation Functions
These are used to perform calculations on a group of rows.
| Function | Description | Example |
| --- | --- | --- |
| `count()` | Count the number of rows | `df.select(count("*"))` |
| `sum()` | Sum of column values | `df.select(sum("salary"))` |
| `avg()` | Average of column values | `df.select(avg("salary"))` |
| `max()` | Maximum value | `df.select(max("salary"))` |
| `min()` | Minimum value | `df.select(min("salary"))` |
| `mean()` | Alias for `avg()` | `df.select(mean("salary"))` |
```python
# Note: importing sum, max, and min this way shadows Python's built-ins
# of the same name; importing the module as an alias (e.g. F) avoids this.
from pyspark.sql.functions import count, sum, avg, max, min

df.select(count("*"), sum("salary"), avg("salary")).show()
```
🔹 2. Date/Time Functions
These operate on date and timestamp columns.
```python
# col is required for the datediff arguments, so it must be imported too
from pyspark.sql.functions import col, current_date, datediff, year

df.select(
    current_date(),                                # today's date
    datediff(col("end_date"), col("start_date")),  # days between two dates
    year("start_date"),                            # extract the year
).show()
```