PySpark DataFrames provide two methods, sort() and orderBy(), for arranging rows by one or more columns in ascending or descending order.
1. Understanding sort() in PySpark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Create a SparkSession (the entry point for DataFrame operations)
spark = SparkSession.builder.appName("SortExample").getOrCreate()

# Sample data (illustrative values)
simpleData = [("James", "Sales", "NY", 90000), ("Michael", "Sales", "NV", 86000),
              ("Maria", "Finance", "CA", 90000), ("Jen", "Marketing", "NY", 79000)]
columns = ["employee_name", "department", "state", "salary"]

# Create DataFrame
df = spark.createDataFrame(data=simpleData, schema=columns)
df.show(truncate=False)
# Using sort()
df.sort(df.department.asc(), df.state.desc()).show(truncate=False)
df.sort(col("department").asc(), col("state").desc()).show(truncate=False)
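To make the chained asc()/desc() semantics concrete, here is a plain-Python analogy (a sketch, not PySpark itself) of sorting by department ascending and state descending. Because Python's sorted() is stable, a multi-column sort with mixed directions can be expressed as two passes, secondary key first:

```python
# Plain-Python analogy of df.sort(col("department").asc(), col("state").desc())
rows = [("Sales", "NY"), ("Sales", "CA"), ("Finance", "DE"), ("Finance", "CA")]

# Strings cannot be negated for a single composite key, so sort twice,
# relying on sorted() being stable: secondary key (state desc) first,
# then primary key (department asc).
rows_sorted = sorted(rows, key=lambda r: r[1], reverse=True)  # state desc
rows_sorted = sorted(rows_sorted, key=lambda r: r[0])         # department asc

print(rows_sorted)
# → [('Finance', 'DE'), ('Finance', 'CA'), ('Sales', 'NY'), ('Sales', 'CA')]
```

Within each department the states now appear in descending order, matching what the PySpark call produces.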
2. Understanding orderBy() in PySpark
# Using orderBy()
df.orderBy(col("department").asc(), col("state").desc()).show(truncate=False)
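orderBy() mirrors SQL's ORDER BY clause. As a sketch using the standard-library sqlite3 module (the emp table name and the values here are illustrative, not from the original), the equivalent SQL would be:

```python
import sqlite3

# In-memory database with an illustrative emp table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (department TEXT, state TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Sales", "NY"), ("Sales", "CA"), ("Finance", "DE")])

# SQL equivalent of df.orderBy(col("department").asc(), col("state").desc())
result = conn.execute(
    "SELECT department, state FROM emp ORDER BY department ASC, state DESC"
).fetchall()
print(result)
# → [('Finance', 'DE'), ('Sales', 'NY'), ('Sales', 'CA')]
conn.close()
```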
3. Difference Between sort() and orderBy()
Both sort() and orderBy() sort a DataFrame, and internally orderBy() is simply an alias for sort(), so the two behave identically and perform the same. orderBy() is often preferred in SQL-style code because its name matches the SQL ORDER BY clause, which can improve readability.
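The alias relationship can be illustrated with plain Python (a minimal sketch, not PySpark's actual source): binding a second class attribute to the same function object makes the two names fully interchangeable.

```python
class MiniFrame:
    """Toy class illustrating how a method alias works."""
    def __init__(self, rows):
        self.rows = rows

    def sort(self, key):
        # Return a new MiniFrame with rows sorted by the given key
        return MiniFrame(sorted(self.rows, key=key))

    # The alias: orderBy points at the very same function object as sort
    orderBy = sort

mf = MiniFrame([3, 1, 2])
print(MiniFrame.orderBy is MiniFrame.sort)   # → True (same function object)
print(mf.orderBy(key=lambda x: x).rows)      # → [1, 2, 3], same as sort()
```

Because both names resolve to one function, there is no behavioral or performance difference between calling one or the other.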