
orderBy() and sort() in PySpark


PySpark provides two functions, sort() and orderBy(), to arrange data in a structured manner.

1. Understanding sort() in PySpark

Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("SortExample").getOrCreate()

# Sample data (illustrative values)
simpleData = [("James", "Sales", "NY"),
              ("Michael", "Sales", "CA"),
              ("Maria", "Finance", "CA"),
              ("Robert", "Finance", "NY")]
columns = ["employee_name", "department", "state"]

# Create DataFrame
df = spark.createDataFrame(data=simpleData, schema=columns)
df.show(truncate=False)

# Using sort(): department ascending, state descending
df.sort(df.department.asc(), df.state.desc()).show(truncate=False)
df.sort(col("department").asc(), col("state").desc()).show(truncate=False)

2. Understanding orderBy() in PySpark

Python
# Using orderBy()
df.orderBy(col("department").asc(), col("state").desc()).show(truncate=False)

3. Difference Between sort() and orderBy()

Both sort() and orderBy() serve the same purpose of sorting data. Internally, orderBy() is an alias for sort(), so the two functions behave identically; orderBy() is often preferred in SQL-style code because it mirrors the SQL ORDER BY clause and reads more naturally to SQL users.
