
orderBy() and sort() in PySpark


PySpark provides two functions, sort() and orderBy(), to arrange data in a structured manner.

1. Understanding sort() in PySpark

sort() orders the rows of a DataFrame by one or more columns, and each column can be sorted ascending (asc()) or descending (desc()).

Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("SortExample").getOrCreate()

# Sample data (illustrative rows)
simpleData = [("James", "Sales", "NY", 90000),
              ("Michael", "Sales", "NV", 86000),
              ("Robert", "Sales", "CA", 81000),
              ("Maria", "Finance", "CA", 90000),
              ("Raman", "Finance", "NY", 99000),
              ("Scott", "Marketing", "NV", 83000)]
columns = ["employee_name", "department", "state", "salary"]

# Create DataFrame
df = spark.createDataFrame(data=simpleData, schema=columns)
df.show(truncate=False)

# Using sort(): department ascending, then state descending
df.sort(df.department.asc(), df.state.desc()).show(truncate=False)

# Equivalent, referencing columns with col()
df.sort(col("department").asc(), col("state").desc()).show(truncate=False)

2. Understanding orderBy() in PySpark

Python
# Using orderBy()
df.orderBy(col("department").asc(), col("state").desc()).show(truncate=False)

3. Difference Between sort() and orderBy()

Both sort() and orderBy() serve the same purpose. Internally, orderBy() is an alias for sort(), so the two behave identically; orderBy() is often preferred because it mirrors the SQL ORDER BY clause and reads more naturally in SQL-style code.
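The multi-column ordering both functions produce can be sketched without Spark, using Python's stable sorted(): applying the secondary key first and the primary key second yields "department ascending, state descending" just like the DataFrame calls above. The rows here are illustrative, not output from a real Spark job.

```python
# Illustrative rows: (name, department, state)
rows = [
    ("James", "Sales", "NY"),
    ("Maria", "Finance", "CA"),
    ("Robert", "Sales", "CA"),
    ("Raman", "Finance", "NY"),
]

# Sort by the secondary key first (state, descending)...
by_state_desc = sorted(rows, key=lambda r: r[2], reverse=True)

# ...then by the primary key (department, ascending). Because
# Python's sort is stable, the state ordering is preserved
# within each department.
result = sorted(by_state_desc, key=lambda r: r[1])

for name, dept, state in result:
    print(dept, state, name)
# Finance NY Raman
# Finance CA Maria
# Sales NY James
# Sales CA Robert
```

Spark's sort is distributed rather than a single stable in-memory sort, but the resulting order of the keys is the same.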
